Wall Street got a dramatic reminder of the value of strong configuration management (CM) stewardship on August 1, 2012, when Knight Capital Group experienced an incident that resulted in the erroneous purchase of stocks worth over 7 billion dollars. Knight had little choice but to sell as many of the stocks as possible, resulting in a 440 million dollar loss, which I reported on in StickyMinds. Bloomberg published an article on August 14th claiming that the software glitch was due to “software that was inadvertently reactivated when a new program was installed, according to two people briefed on the matter.” The Bloomberg article went on to say, “Once triggered on Aug. 1, the dormant system started multiplying stock trades by one thousand, according to the sources, who requested anonymity because the firm hasn’t commented publicly on what caused the error. Knight’s staff looked through eight sets of software before determining what happened, the people said.”
This incident highlights the importance of Configuration Management Best Practices. This article describes some of the essential IT controls that could potentially have prevented this mistake (and I personally guarantee that they would cost a lot less than 440 million dollars to implement). First, here’s a quick description of the regulatory environment within which Knight Capital and other financial services firms must operate on a daily basis.
Federal regulatory agencies, including the Federal Financial Institutions Examination Council (FFIEC) and the Office of the Comptroller of the Currency (OCC), monitor the IT controls established by banks and other financial institutions that are required to comply with Section 404 of the Sarbanes-Oxley Act of 2002. The ISACA COBIT framework is the de facto standard describing the essential controls that need to be established, including both change and configuration management.
In a regulatory environment, all changes need to be controlled. This is usually accomplished by having the proposed changes reviewed by a Change Control Board (CCB). Each request for change (RFC) must be reviewed with an assessment of the potential downstream impact (e.g., risks) of the change. Once approved, releases can be deployed and then verified. In CM terminology, there is a requirement for a physical and functional configuration audit, which means that the deployed binaries (called configuration items) must be verified to be the correct version and to be functioning as desired. Obviously, software must be thoroughly tested before it is approved for promotion to production.
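To make the physical configuration audit concrete, here is a minimal Python sketch of one way it could be automated. It assumes (my assumption, not something from the Bloomberg report) that each approved release ships with a manifest mapping every file to its expected SHA-256 digest. The function names `sha256_of` and `audit_deployment` are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_deployment(deploy_dir, manifest):
    """Compare the files under deploy_dir with an approved manifest.

    manifest is assumed to map a relative POSIX-style path to the
    SHA-256 digest recorded when the release was approved.
    """
    root = Path(deploy_dir)
    deployed = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    return {
        # Approved files that never made it to the server.
        "missing": sorted(set(manifest) - set(deployed)),
        # Files on the server that no approved release accounts for.
        "unexpected": sorted(set(deployed) - set(manifest)),
        # Files whose contents differ from the approved version.
        "mismatched": sorted(
            name
            for name in set(manifest) & set(deployed)
            if manifest[name] != deployed[name]
        ),
    }
```

The "unexpected" list is the interesting one in a case like this: leftover code that no approved release accounts for would show up there. Note that this only covers the physical half of the audit; confirming that the binaries function as desired still requires testing.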
Automating the application build, packaging, and deployment is essential for success, and this is precisely what DevOps is all about. In classic CM terminology, status accounting is the function that tracks a configuration item (CI) throughout its lifecycle, and this would absolutely include retiring (uninstalling) any assets determined to be no longer needed.
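Status accounting can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the three-state lifecycle, the `ConfigurationItem` class, and the `retired_but_present` helper are all hypothetical names of my own, and a real CM database would track far more detail.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    APPROVED = "approved"
    DEPLOYED = "deployed"
    RETIRED = "retired"


# Allowed lifecycle transitions (a deliberately simplified lifecycle).
ALLOWED = {
    State.APPROVED: {State.DEPLOYED, State.RETIRED},
    State.DEPLOYED: {State.RETIRED},
    State.RETIRED: set(),
}


@dataclass
class ConfigurationItem:
    name: str
    version: str
    state: State = State.APPROVED
    history: list = field(default_factory=list)

    def transition(self, new_state: State, rfc: str) -> None:
        """Move the CI to new_state, recording the RFC that authorized it."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed"
            )
        self.history.append((self.state, new_state, rfc))
        self.state = new_state


def retired_but_present(items, installed_names):
    """Return names of retired CIs that are still installed on a server."""
    installed = set(installed_names)
    return sorted(
        ci.name
        for ci in items
        if ci.state is State.RETIRED and ci.name in installed
    )
```

For example, if a hypothetical CI named `legacy-router` has been moved to `RETIRED` but still appears in a server's installed list, `retired_but_present` reports it, which flags an asset that was retired on paper but never actually uninstalled.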
Apparently, Knight Capital lacked the necessary procedures to accurately track changes and deploy their code. According to the Bloomberg report, Knight did not know exactly what code had been deployed to their production servers and, most importantly, how to retire assets that were no longer being utilized.
Now, before anyone starts to feel too smug, let’s consider the fact that most of the financial services firms on Wall Street lack the basic configuration management procedures to ensure that this same problem cannot occur on their servers. Financial services firms are not the only companies lacking CM Best Practices. I have also seen medical firms (including those responsible for surgical operating room equipment), government agencies, and many other companies with mission-critical systems that lack these basic competencies.
It’s time for IT controls to be implemented on all computer systems that matter. This just makes sense and obviously improves productivity and quality. Not to mention the savings: proper CM controls can prevent many types of errors which, though easily overlooked, can cause millions of dollars in losses in just minutes!