Part 3 of 4 – Bob Aiello on Agile Configuration Management – The First Seven Things
Wall Street got a dramatic reminder of the value of strong configuration management (CM) stewardship on August 1, 2012, when Knight Capital Group experienced an incident that resulted in the erroneous purchase of stocks worth over 7 billion dollars. Knight had little choice but to sell as many of the stocks as possible, resulting in a 440 million dollar loss, which I reported on in StickyMinds. Bloomberg published an article on August 14th claiming that the glitch was due to “software that was inadvertently reactivated when a new program was installed, according to two people briefed on the matter.” The Bloomberg article went on to say, “Once triggered on Aug. 1, the dormant system started multiplying stock trades by one thousand, according to the sources, who requested anonymity because the firm hasn’t commented publicly on what caused the error. Knight’s staff looked through eight sets of software before determining what happened, the people said.” This incident highlights the importance of configuration management best practices. This article describes some of the essential IT controls that could potentially have prevented this mistake (and I personally guarantee that they would cost a lot less than 440 million dollars to implement). First, here’s a quick description of the regulatory environment within which Knight Capital and other financial services firms must operate on a daily basis.
Federal regulatory agencies, including the Federal Financial Institutions Examination Council (FFIEC) and the Office of the Comptroller of the Currency (OCC), monitor the IT controls established by banks and other financial institutions that are required to comply with section 404 of the Sarbanes-Oxley Act of 2002. The ISACA COBIT framework is the de facto standard that describes the essential controls that need to be established, including both change and configuration management.
In a regulatory environment, all changes need to be controlled. This is usually accomplished by having the proposed changes reviewed by a Change Control Board (CCB). Each request for change (RFC) must be reviewed with an assessment of the potential downstream impact (e.g. risks) of the change. Once approved, releases can be deployed and then verified. In CM terminology, there is a requirement for a physical and functional configuration audit, which means that the deployed binaries (called configuration items) must be verified to be the correct version and also that they are functioning as desired. Obviously, software must be thoroughly tested before it is approved for promotion to production.
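The physical configuration audit described above can be partly automated. The following is a minimal sketch, not a description of any particular firm's tooling: it assumes the approved release is recorded as a baseline mapping each relative file path to a SHA-256 digest, and the names `sha256_of` and `physical_audit` are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def physical_audit(deploy_root: Path, baseline: dict[str, str]) -> dict[str, str]:
    """Compare deployed files against the approved baseline (assumed format).

    Returns relative path -> finding for every discrepancy; an empty
    result means every configuration item in the baseline matched.
    """
    findings = {}
    for rel_path, expected_digest in baseline.items():
        target = deploy_root / rel_path
        if not target.is_file():
            findings[rel_path] = "missing"
        elif sha256_of(target) != expected_digest:
            findings[rel_path] = "wrong version"
    return findings
```

A check like this only covers the physical half of the audit. Confirming that the deployed items function as desired (the functional audit) still requires tests run against the deployed system.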
Automating the application build, package and deployment is essential for success and this is precisely what DevOps is all about. In classic CM terminology, status accounting is the function that tracks a configuration item (CI) throughout its lifecycle and this would absolutely include retiring (uninstalling) any assets determined to be no longer needed.
Apparently, Knight Capital lacked the necessary procedures to accurately track changes and deploy their code. According to the Bloomberg report, Knight did not know exactly what code had been deployed to their production servers and, most importantly, how to retire assets that were no longer being utilized.
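One simple control for spotting assets that should have been retired is to list everything on a server that the approved baseline does not mention. The sketch below is illustrative only and makes no claim about what Knight actually ran: it assumes the baseline is available as a set of relative paths, and `untracked_assets` is a hypothetical helper name.

```python
from pathlib import Path


def untracked_assets(deploy_root: Path, baseline_paths: set[str]) -> list[str]:
    """List files under deploy_root that the approved baseline does not mention.

    Anything returned here is a candidate for review: it is either
    missing from the baseline record or was never retired.
    """
    deployed = {
        path.relative_to(deploy_root).as_posix()
        for path in deploy_root.rglob("*")
        if path.is_file()
    }
    return sorted(deployed - baseline_paths)
```

Running a report like this after every deployment gives status accounting a concrete output: an empty list means the server holds nothing beyond the approved release.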
Now before anyone starts to feel too smug, let’s consider the fact that most of the financial services firms on Wall Street lack the basic configuration management procedures to ensure that this same problem cannot occur on their servers. Financial services firms are not the only companies lacking CM best practices. I have also seen medical firms (including those responsible for surgical operating room equipment), government agencies and many other companies with mission-critical systems that lack these basic competencies.
It’s time for IT controls to be implemented on all computer systems that matter. This just makes sense and obviously improves productivity and quality. Not to mention the savings benefit – proper CM controls can prevent many types of errors which, though easily overlooked, can cause millions of dollars of losses in just minutes!
DevOps focuses on improving communication and collaboration between software developers and the operations professionals who help to maintain reliable and dependable systems. In our consulting practice, we often assess and evaluate existing practices and then make recommendations for improvement. Our focus is often on configuration and release management and, lately, today’s popular new star, DevOps best practices as well. Bringing different technology groups together can result in some interesting challenges. We often feel like we are doing group therapy for a very dysfunctional family and many of the challenges encountered highlight the biases that people often bring into the workplace. This article will describe how to identify these behavioral issues and utilize positive psychology to help develop high performance teams.
We all come to work with the sum of our own past experiences and personal views which, by definition, means that we are predisposed to having specific viewpoints and maybe even more than a few biases. Many professionals come into meetings with their own agenda based upon their experiences, most business-related, some not. When conducting an assessment, we are typically asking participants to explain what they believe works well in their organization and what can be improved. In practice, getting people comfortable results in better and more useful information. When we bring developers into a room to talk about their experiences, we get a very different view than when we speak with their counterparts in operations or other departments including QA and testing. The stories we hear initially may sound like a bad marriage that cannot be saved. Fortunately, our experience is that there is also a great deal of synergy in bringing different viewpoints together. The key is to get the main issues on the table and facilitate effective and open communication.
Developers are often pressured to rapidly create new and exciting product features, using technology that itself is changing at a breathtaking rate. The QA and testing group is charged with ensuring that applications are defect-free and members often have to work under considerable pressure, including ever-shrinking timelines. The operations group must ensure that systems are reliable and available on a consistent basis. Each of these stakeholders has a very different set of goals and objectives. Developers want to roll out changes constantly, delivering new and exciting features, while operations and QA may find themselves challenged to keep up with the demand for new releases. The frustration we hear reflects the somewhat self-focused perceptions from each side of the table as their differing perspectives cause an impasse.
Developers are highly skilled and often much more technically knowledgeable than their counterparts in QA and operations. This makes for some challenging dynamics in terms of mutual respect and collaboration. The operations and QA professionals often feel that developers are the immature children who lack discipline and constantly try to bypass established and necessary IT controls. This clashing of views and values is often a source of conflict within the organization with decisions being made based upon positional power by senior executives who may not be completely aware of all of the details of each challenge. The fact is that this conflict can be very constructive and lead to high performance if managed effectively.
Psychologists Martin Seligman and Mihaly Csikszentmihalyi have developed an approach, known as Positive Psychology, which focuses on encouraging positive and effective behaviors that will help to bring out the best in each stakeholder during these challenging situations. By focusing on developing desirable behaviors, positive psychology moves from just identifying behavioral dysfunction to promoting effective and high performance behaviors. The first area to focus on is honest and open communication. Martin Seligman uses the term bravery to describe speaking up or taking the initiative, a person’s capacity to exhibit valor and courage. Integrity and honesty, along with perseverance and diligence, are also desirable traits that need to be modeled and encouraged in positive organizations. Successful organizations value and encourage these characteristics and their active expression. Positive organizations encourage their employees to take initiative and ensure that employees feel safe – even when reporting a potential problem or issue. Dysfunctional organizations punish the whistleblower, while effective organizations recognize the importance of being able to evaluate the risks or problems that have been brought to their attention and actively solicit such self-monitoring efforts.
We typically meet with each stakeholder separately and document their views, including frustrations and challenges. We then put together a report that synthesizes all of our findings including existing challenges and suggestions for improvements. The truth is that dysfunctional behavior must be identified and understood. But the next step is to bring all stakeholders to the table to look for solutions and suggest positive ideas for making improvements. Sometimes, this feels a little like horse trading. We may get one group that is convinced that only open source tools are appropriate for use while another team may be very interested in the features and support that come from commercial products. We often facilitate the evaluation and selection of the right tools and processes with appropriate transparency, collaboration and communication.
Positive psychology focuses on promoting the right kinds of behaviors that you need for a high performance team. Obviously, this has to start with understanding existing views and experiences. Clearly, bringing stakeholders to the table and getting their management to support, reward and model collaborative behavior is the key to any high performance team and successful organization!
 Seligman, M. E. P., & Csikszentmihalyi, M. (2000). Positive psychology: An introduction. American Psychologist, 55, 5–14
 Seligman, Martin, Authentic Happiness: Using the New Positive Psychology to Realize Your Potential for Lasting Fulfillment, Free Press, New York 2002
 Abramson, L. Y.; Seligman, M. E. P.; Teasdale, J. D. (1978). “Learned helplessness in humans: Critique and reformulation”. Journal of Abnormal Psychology 87
 Deming, W. Edwards (1986). Out of the Crisis. MIT Press
 Aiello, Bob and Leslie Sachs. 2010. Configuration Management Best Practices: Practical Methods that Work in the Real World. Addison-Wesley Professional.
Computer systems today have reached a level of complexity that is truly amazing. We expect websites and packaged software to have an incredible number of features and we also expect systems to practically anticipate our every need and response. Creating feature rich systems is not an easy job and neither is writing the deployment infrastructure that empowers the organization to deliver new features continuously while maintaining a high level of reliability and quality. Software engineers and architects do an amazing job designing a systems architecture to fully represent all of the parts of the system that are created during the development lifecycle. One of the biggest challenges is fully understanding how each part of the system depends upon the others.
Software today is often designed and implemented as components that fit together and run seamlessly as a complete system. One of the biggest challenges that we have today is being able to successfully update one or more components without any risk of a downstream impact on the other parts of the system. There is a lot of complexity involved in creating a structure that makes this possible. How do we go about understanding and managing this complexity?
The first step is to create a logical model of the system to help all of the other stakeholders understand how the different parts of the system are assembled and work together. In my work, I often find that we have many specialists but very few people who understand the entire system end-to-end. In deployment engineering, I often have developers who are concerned with deploying the entire release just to fix a specific bug. I understand their concern but truthfully, managing patches to a release can actually be a lot more complicated than deploying a full release. The next thing that I often hear is that deploying a full release requires that you test the entire system.
The truth is that you have to retest the entire system even if you just deploy a patch, unless you fully understand how that patch impacts the other components of the system. The point here is that managing component dependencies is essential and it is not a trivial task. I recommend that organizations develop their software to be discoverable by embedding immutable version IDs and having a formal way to represent component dependencies, such as descriptive XML files that could be shipped with the code to help with understanding how each part of the system depends upon the others. The only time when you will be able to understand and document these dependencies is during the software and system development lifecycle.
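To make the idea of a descriptive XML file concrete, here is a minimal sketch. The manifest schema (`system`, `component`, `depends-on`), the component names and version numbers, and the function names are all invented for illustration; they are not a standard format. Given such a manifest, the code works out which components depend, directly or transitively, on a component that is about to be patched, which is the set that needs retesting.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

# Hypothetical manifest shipped with the release; the schema is an assumption.
MANIFEST = """\
<system>
  <component id="market-data" version="1.4.0"/>
  <component id="pricing" version="2.3.1">
    <depends-on id="market-data"/>
  </component>
  <component id="order-router" version="5.0.2">
    <depends-on id="pricing"/>
  </component>
  <component id="reporting" version="3.1.0"/>
</system>
"""


def load_dependents(manifest_xml: str) -> dict[str, set[str]]:
    """Map each component id to the components that depend on it directly."""
    dependents = defaultdict(set)
    for component in ET.fromstring(manifest_xml).iter("component"):
        for dep in component.iter("depends-on"):
            dependents[dep.get("id")].add(component.get("id"))
    return dependents


def impacted_by(manifest_xml: str, changed: str) -> set[str]:
    """Return every component that depends on `changed`, directly or transitively."""
    dependents = load_dependents(manifest_xml)
    impacted, queue = set(), deque([changed])
    while queue:  # breadth-first walk over the "is depended on by" edges
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted
```

With this sample manifest, patching `market-data` flags `pricing` and `order-router` for retesting, while patching `reporting` flags nothing. The result is only as trustworthy as the manifest, which is why the dependencies need to be recorded while the system is being built.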
Many systems are developed by teams of highly qualified consultants who work under extreme pressure to develop feature-rich software in a very short period of time. Once these technology experts are done they move on to the next project and you may find that you no longer have anyone from the original development team who really understands all of the internal component dependencies. When software is being written you have a unique opportunity to document dependencies and design a strategy for managing patches or full baselined releases in an automated way. This is exactly the same challenge that quality engineers face when they develop robust automated tests, including service virtualization testing, which is becoming a popular practice within continuous testing. Systems have to be designed to ensure that we can manage complexity.
Complexity is not bad. We need to develop strategies to understand and manage the complexity inherent in writing complex software systems. The first step is to design systems to be fully verifiable using automated test harnesses. We can use this same approach to understand and document component dependencies and then develop strategies to be able to reliably update software in patches (or verifiable full baselined releases) while ensuring that we fully understand component dependencies. Designing a logical model is an important part of this effort but ensuring that we have some mechanism such as a descriptive XML file is a must-have for documenting and managing component dependencies. There is no magic here and very few technologies allow you to reverse engineer component dependencies. You need to design your systems to have these capabilities.
The good news is that if you do this right you will find that your systems are easier to test and update. More importantly, you will be able to continuously deliver new features to your customers while maintaining a high level of systems reliability. How do you manage component dependencies? Drop me a line and share your best practices!!