How to Fix Change Control – Understanding DevOps’ Secret Weapon
by Bob Aiello with Dovid Aiello
In many organizations, change control is badly broken. Very often, change control focuses only on the calendar and fails to realize its true potential: ensuring that changes can be delivered as frequently as necessary in a secure and reliable way. In our consulting practice, we hear many complaints of boring two-hour meetings where nothing seems to actually get done. Change control is often perceived as being little more than a rubber stamp and, as one esteemed colleague famously claimed in public, its purpose is to “prevent change”. We disagree and believe that change control can be a valuable function that helps identify and mitigate technical risk. That said, very few organizations have effective change control practices. Here’s how to fix change control in your company and realize the benefits of DevOps’ secret weapon.
We have previously written about the dysfunction that often resides in the operations organization. This dysfunction often involves change control. A poorly managed change control function wastes time, but that is only the tip of the “dysfunctional” iceberg. Far more serious is the missed opportunity to identify and mitigate serious technical risks. This failure often results in incidents and problems that cost the company both profitability and reputation. The bigger problem is the missed opportunity to roll out changes faster, which would enable secure and reliable systems, not to mention deliver business functionality. When change control fails, most senior managers declare that they are going to stop allowing changes, which is actually the worst thing that you can decide to do. Slowing down changes almost always means that you bunch many changes together and open change windows less frequently, such as on a bimonthly basis. When we help teams fix their change control, the first thing we push for is making changes more frequently while keeping each change very small. The cadence that we most often find works well is moving from bimonthly change windows to twice weekly, ideally during the week. There is something magical about moving from bimonthly to twice a week that often eliminates much of the noise and frustration.
One important approach is to identify which changes are routine and low-risk, categorizing them as “pre-approved” or standard changes. Changing a toner cartridge on a printer is a good example, as it is a task that has been done many times before. Communicating that the printer will be down for this activity is important, but the change does not require a discussion during the change control meeting. Standard changes should ideally be fully automated and, if you are using the ITIL framework, listed in your service catalogue [1]. Getting all of the easy changes pre-approved means that your change control meeting can focus on the changes that actually require some analysis of technical risk.
Normal changes follow your normal change control process. Emergency changes should only be used when there is a true emergency and not just someone trying to bypass the change control process. Occasionally, someone may miss the normal change control deadline and then you may need an “out-of-cycle” change that would have been a normal change had the person made the deadline. One effective way to ensure that folks are not just using emergency changes to bypass the change control process is to require that all emergency changes be reviewed by your highest-ranking IT manager – such as the CTO.
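The categories described above can be sketched as a small routing function. This is only an illustration in Python, not a feature of any particular change management tool: the `ChangeRequest` fields, the order in which the checks are applied, and the wording of the review descriptions are all our own assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"          # pre-approved, listed in the service catalogue
    NORMAL = "normal"              # follows the normal change control process
    OUT_OF_CYCLE = "out-of-cycle"  # a normal change that missed the deadline
    EMERGENCY = "emergency"        # reserved for true emergencies


@dataclass
class ChangeRequest:
    # All field names are hypothetical; adapt them to your own ticketing system.
    title: str
    in_service_catalogue: bool = False
    is_emergency: bool = False
    missed_deadline: bool = False


def classify(change: ChangeRequest) -> ChangeType:
    """Assign a change to one category (the precedence order is an assumption)."""
    if change.is_emergency:
        return ChangeType.EMERGENCY
    if change.in_service_catalogue:
        return ChangeType.STANDARD
    if change.missed_deadline:
        return ChangeType.OUT_OF_CYCLE
    return ChangeType.NORMAL


def required_review(change_type: ChangeType) -> str:
    """Describe who needs to look at a change of the given type."""
    reviews = {
        ChangeType.STANDARD: "pre-approved; communicate the downtime, no meeting needed",
        ChangeType.NORMAL: "risk review in the regular change control meeting",
        ChangeType.OUT_OF_CYCLE: "normal risk review, held outside the regular schedule",
        ChangeType.EMERGENCY: "review by the highest-ranking IT manager, such as the CTO",
    }
    return reviews[change_type]


if __name__ == "__main__":
    toner = ChangeRequest("Replace printer toner", in_service_catalogue=True)
    print(classify(toner).value, "->", required_review(classify(toner)))
```

Routing every emergency change to the most senior reviewer is what makes the emergency path unattractive as a shortcut around the normal process.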
Another effective approach is to distinguish between the change control board (CCB) and the change advisory board (CAB). Frankly, this has been an area of confusion for many organizations. The CCB is responsible for the change control process. The change advisory board should be composed of sharp subject matter experts who can tell you the potential impact of making (or not making) a particular change. Make sure that you ask them who else might be impacted and should be brought into the discussion. Unfortunately, we have seen many organizations rename their CCB to CAB (it is a cool name) and, in doing so, lose the input from the change advisory folks. Keep your CCB separate from your CAB. The CCB handles the process, while the CAB advises on the technical risk of making (or perhaps not making) a particular change.
In reviewing each change, make sure that the change is described clearly and in sufficient detail to understand each step. We see many change requests that are just high-level descriptions which can be open to interpretation by the person making the changes and consequently result in human errors that lead to incidents and problems.
Testing criteria, as well as verification and validation (V&V) criteria, should always be specified. By testing, we refer to continuous testing, beginning with unit testing and extending into other forms of testing, including regression, functional/nonfunctional, and integration testing. (We are huge fans of API and service virtualization testing, but that is the subject of another article.) Verification usually refers to whether or not the change meets the requirements, and validation ensures that the system does what it needs to do. Some folks refer to fitness for purpose and fitness for use. If you want effective DevOps, you must have robust continuous testing, and the change control process is the right toll gate to ensure that testing has been implemented and, especially, automated. We’d be remiss if we did not mention the importance of asking about backout plans. In short, always have a plan B.
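The review criteria above (clearly described steps, testing, V&V criteria, and a backout plan) lend themselves to a simple toll-gate check. The following Python sketch is a hypothetical illustration only: the `ChangeRequest` fields and the wording of the findings are assumptions, and a real check would live in whatever ticketing system you already use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeRequest:
    # Hypothetical fields; map them to your own change request form.
    title: str
    steps: List[str] = field(default_factory=list)
    automated_tests: List[str] = field(default_factory=list)
    verification_criteria: str = ""
    validation_criteria: str = ""
    backout_plan: str = ""


def toll_gate_findings(change: ChangeRequest) -> List[str]:
    """Return the gaps that should be resolved before the change is approved."""
    findings = []
    if not change.steps:
        findings.append("No step-by-step description; a high-level summary invites human error.")
    if not change.automated_tests:
        findings.append("No automated tests listed.")
    if not change.verification_criteria.strip():
        findings.append("No verification criteria (does the change meet the requirements?).")
    if not change.validation_criteria.strip():
        findings.append("No validation criteria (does the system do what it needs to do?).")
    if not change.backout_plan.strip():
        findings.append("No backout plan; always have a plan B.")
    return findings


if __name__ == "__main__":
    vague = ChangeRequest(title="Upgrade the web tier")
    for finding in toll_gate_findings(vague):
        print("-", finding)
```

An empty list of findings does not mean the change is safe; it only means the request contains enough information for the risk discussion to begin.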
Change control done well is indeed DevOps’ secret weapon. Making changes more often should be your goal, and keeping those changes as tiny and isolated as possible will help to reduce the risk of making them. We like to have everyone share a screen and have the cross-functional DevOps team ensure that every change is executed successfully. Every change should be automated. If this is not possible, then make sure that you have a four-eyes policy where one person makes the change and another person observes and verifies that the manual step has been completed successfully. Always record the session so that others can see what you’re doing, and then review the recordings to identify areas where you can improve your deployment processes.
The best organizations have processes which are transparent and allow others to learn and help continuously improve the deployment process. Change control can help you get to a place where you can safely make changes as often as you need to, helping to deliver secure and reliable systems.
What change control practices do you believe are most effective? Drop us a line and share your best practices!
[1] The service catalogue is an automated set of jobs that perform routine “low risk” tasks such as taking backups and changing toner cartridges.
Since the request is in the service catalogue, the change may be designated as being “standard” (pre-approved) and then there is no need to perform risk assessment in change control.