Continuous Integration – It’s Not Just for Developers!


By Bob Aiello

Recently, I was having a conversation with a group of colleagues who are some of the smartest people with whom I have had the privilege of working in the discipline of DevOps. This team of amazing people is collaborating to write the IEEE P2675 DevOps Standard for Building Reliable and Secure Systems Including Application Build, Package and Deployment. We have over one hundred IT experts signed up to help with this project and about thirty who are actively engaged each week with drafting the standard, which will then be ready for review in the next month or two. We are not defining DevOps per se, as lots of folks who like to attend conferences have done a fine job of doing that already. What we are doing is defining how to do DevOps (and to some extent Agile) in organizations that must utilize industry standards and frameworks to adhere to regulatory requirements that are commonly found in industries such as banking, finance, medical, pharmaceutical and defense. I usually describe this effort as explaining how to do DevOps and still pass an audit. One of the topics that keeps coming up in our discussions is whether or not our work is too focused on development and not enough on operations. Recently, one of my colleagues suggested that continuous integration was an example of a key DevOps practice that was entirely focused on development. I disagree completely with this point of view, and this article is the first of a series explaining how continuous integration, delivery and deployment are fundamental to the work done by any operations team.
DevOps is a set of principles and practices that are intended to help development and operations collaborate and communicate more effectively. We have written articles previously (and in our Agile ALM DevOps book) on DevOps principles and practices. The most fundamental concern for the DevOps practitioner is to understand that different stakeholders hold diverse views and often bring to the table varying types of expertise. DevOps is all about ensuring that we all share our knowledge and help each team perform more effectively. In organizations that prioritize DevOps excellence, teams are constantly sharing their screens and sharing their knowledge and expertise. Ops is a key player in this effort and often “the adult in the room” who understands how the application is actually going to perform in a production environment with a production workload.

In continuous integration (CI), developers are constantly integrating their code (often by merging source code and components), ensuring that the code that they write can work together and that a bad commit does not break the build. CI is a seminal process for any DevOps function, and very few folks would think that you are doing DevOps if you did not implement continuous integration. To demonstrate that Ops does indeed do continuous integration, I could use the low-hanging fruit of describing how infrastructure as code (IaC) is written using Terraform or AWS CloudFormation and, no doubt, such efforts are a valid example of systems engineers who need to integrate their code continuously. But there are more dramatic examples of Ops using continuous integration which present greater challenges. For example, operations engineers often have to manage patching servers at the operating system, middleware and application levels. Hardware changes may also have to be integrated into the systems, which can have far-reaching impact, including managing changes to storage and networking infrastructure. Configuration changes, from firewalls to communication protocols, also have to be managed – and continuously integrated. These changes are coming from engineers in different groups, often working in silos, and are every bit as complicated as merging code between developers. The work also has to be done using a full systems lifecycle, which is often sadly overlooked.

Let’s take a look at how we might implement a systems lifecycle in support of continuous integration for Ops. We have been in many conversations where we are told that patches must be applied to a server to ensure compliance with security and regulatory requirements. We don’t doubt the importance of such efforts, but we often see that there is a lack of discipline and due diligence in how patches are deployed to production servers. The first step should always be to obtain a description of the patches (e.g., release notes) that are going to be applied and have them reviewed by the folks who are able to understand the potential downstream impact of the required patches. Sometimes, patches require changes to the code which may necessitate a full development lifecycle. These changes always require due diligence, including both functional and non-functional testing. Knowing what to test should start with understanding what the patch is actually changing. Too often, the folks who downloaded the code for the patch neglect to also download and circulate the release notes describing what the patch is actually changing. (Trust me, it’s right there on the website right next to where you downloaded the patch.) Once you have ensured that the release notes for the patch have been reviewed by the right stakeholders (often including both developers and operations systems engineers), then you are in a much better position to work with your QA and testing experts on an appropriate strategy to test those patches (don’t forget load testing). Obviously, you want to promote a patch through a systems lifecycle just like you would promote any piece of code, so start with patching the Dev servers and work your way up to UAT and then schedule the change window for updating the production servers.
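The promotion path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Patch` class, its field names and the `promote` function are hypothetical, and a real implementation would call your own deployment tooling rather than append to a list. What it shows is the gating idea: no deployment before the release notes are reviewed, and no promotion until testing has passed in every environment already patched.

```python
from dataclasses import dataclass, field
from typing import Optional

# Environment order taken from the text: Dev first, then UAT, then production.
ENVIRONMENTS = ["dev", "uat", "prod"]


@dataclass
class Patch:
    """Hypothetical record of one patch moving through the lifecycle."""
    name: str
    release_notes_reviewed: bool = False              # stakeholders have read the notes
    tests_passed: set = field(default_factory=set)    # environments where testing passed
    deployed_to: list = field(default_factory=list)   # environments patched so far


def next_environment(patch: Patch) -> Optional[str]:
    """Return the next environment in the lifecycle, or None when finished."""
    remaining = [env for env in ENVIRONMENTS if env not in patch.deployed_to]
    return remaining[0] if remaining else None


def promote(patch: Patch) -> str:
    """Record deployment to the next environment if the gates allow it."""
    if not patch.release_notes_reviewed:
        raise ValueError("release notes have not been reviewed")
    target = next_environment(patch)
    if target is None:
        raise ValueError("patch is already in production")
    # Every environment already patched must have passed testing first.
    untested = [env for env in patch.deployed_to if env not in patch.tests_passed]
    if untested:
        raise ValueError("testing not complete in: " + ", ".join(untested))
    patch.deployed_to.append(target)  # stand-in for the real deployment step
    return target
```

In use, a patch starts with `promote(patch)` returning `"dev"`, and each later call succeeds only after the previous environment has been added to `tests_passed`.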

Keep in mind that these patches are often coming at different levels of the system, including the operating system (perhaps low level), middleware and then application frameworks. You probably need to consider related configuration changes that are required and you may need to coordinate these changes with upgrading your storage or networking system. You’ll want to take an iterative approach and continuously integrate these changes – just as if you were integrating application changes. Remember in DevOps we take a wide systems view and we also ensure that all of the smart people around us are well informed and have an opportunity to share their knowledge and expertise. What other examples of Ops practicing continuous integration can you think of? Drop me a line and share your views!

What is ISO Certification Who Needs It and Why

by Ken Lynch

The main purpose of ISO certification is to offer potential clients an independent assessment of a company’s conformity with ISO standards. In recent years, the use of technology in business has increased, and potential clients are concerned about data security. ISO certification is, therefore, a way of quelling the fears of potential investors. Security professionals are aware that compliance does not go hand in hand with security. Compliance, therefore, gives future customers a way to use the business’s controls as evidence that clients’ needs are being met.

What is ISO?

In 1946, delegates from twenty-five different countries met at the Institution of Civil Engineers in London. These delegates created an organization, the International Organization for Standardization (ISO), tasked with forming and unifying industrial standards.

Different types of ISO certification

ISO standards influence the workings of different industries. For many IT companies, meeting ISO standards is a way of meeting the regulations set out by this organization. In the IT industry, there exist three types of standards that assist an organization in compliance, namely ISO 27001, ISO 31000 and ISO 9001.

ISO 27001 standard

This standard sets out requirements for an information security management system (ISMS). For organizations looking to meet ISO certification, creating an ISMS is necessary. This standard is concerned with ensuring the security, reliability, and availability of information as part of risk management. As a result, it is concerned with assuring consumers. For certification of this standard, there are two stages. The first stage involves a collection of documents by auditors to ensure that a firm’s ISMS is ready for review. The documentation collected by auditors includes a company’s ISMS scope, data security procedure, risk identification and response process, risk review report, company assets, company policies and compliance requirements.

ISO 31000 standard

This standard outlines the requirements for enterprise risk management (ERM). The risk control process requires that senior management and the board assess the impact and likelihood of each risk occurring in order to determine proper controls to manage risks. When assessing a company’s ERM for certification, auditors look at documents that detail management’s approach to risk identification and mitigation.

ISO 9001 standard

This standard spells out the requirements for a quality management system (QMS). A QMS details the techniques and responsibilities for quality control. ISO 9001 mainly applies to industries that need quality controls. However, it can also offer a new direction for compliance. Audits in this standard review product, process, and system. The documentation collected by auditors covers both mandatory and non-mandatory information. Mandatory documents include document control techniques, internal audit methodology, corrective and preventative action policies and control of non-conformance procedures. Certification of this standard can be overwhelming for many companies.

Why is ISO certification necessary?

There is a difference between ISO conformity and ISO certification. ISO conformity means that an organization complies with ISO standards. Any company that carries out audits internally, for instance, can implement ISO conformity as part of business operations.

ISO certification offers customers assurance about quality control and data management. A certified company is one that conforms to ISO standards. Certification also assures outsiders that a company meets requirements established by a group of experts. Due to the many standards ISO establishes, there is a need for companies to be direct in stating which ISO standard they meet.

In addition, ISO certification enables companies to use the opinion of an autonomous third party as evidence of compliance.

What does ISO accredited mean?

ISO establishes standards but does not issue certificates or take part in the certification process. The Committee on Conformity Assessment (CASCO) determines the standards used for certification, which are in turn used by certification organizations. CASCO, therefore, establishes standards that third parties must use to determine whether a company meets ISO standards.

ISO accreditation is different from ISO certification. ISO certification happens after organization policies, techniques and documents are reviewed by an independent third party. When choosing a certification body, an organization should ensure that the third party employs CASCO standards and ensure that they are accredited.

However, companies should not assume that non-accredited third parties are incapable of reviewing their company. Accreditation refers to autonomous capability confirmation. In simple words, accredited bodies are those that have been reviewed independently to ensure they meet CASCO standards. This ensures that accredited bodies can properly review other organizations to determine whether they meet ISO standards.

How automating GRC can ease the burden of ISO certification

The process of managing a company to ensure compliance can confuse managers. Using an automated solution allows a company to determine its controls and conduct a gap analysis so it can manage its workload better.
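At its simplest, a gap analysis is a comparison between the controls a standard requires and the controls a company has actually implemented. The sketch below illustrates that idea only; the function name and the control names in the usage note are hypothetical placeholders, not identifiers from any ISO standard or GRC product.

```python
def gap_analysis(required, implemented):
    """Compare two sets of control names and report where they differ.

    required    -- controls the chosen standard expects (caller-supplied)
    implemented -- controls the company can currently evidence
    """
    required = set(required)
    implemented = set(implemented)
    return {
        # Work still to be done before an audit.
        "missing": sorted(required - implemented),
        # Controls that already satisfy a requirement.
        "covered": sorted(required & implemented),
        # Controls in place that map to no requirement in this standard.
        "unmapped": sorted(implemented - required),
    }
```

For example, calling `gap_analysis({"access-review", "backup-policy"}, {"backup-policy"})` would list `access-review` under `missing`, which is the workload a manager then has to plan for.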

Author Bio

Ken Lynch is an enterprise software startup veteran who has always been fascinated by what drives workers to work and how to make work more engaging. Ken founded Reciprocity to pursue just that. He has propelled Reciprocity’s success with this mission-based goal of engaging employees with the governance, risk, and compliance goals of their company in order to create more socially minded corporate citizens. Ken earned his BS in Computer Science and Electrical Engineering from MIT.

How to Assess and Improve DevOps


By Bob Aiello and Dovid Aiello

Many of my esteemed colleagues are sharing their approaches to implementing continuous delivery and other DevOps best practices. Their approach often assumes that the organization is willing to completely discard its existing processes and procedures and start from what we often refer to as a “green field”. Internet startups can afford to take this approach. But, if you are implementing DevOps in a large financial services firm or a growing ecommerce retailer, you may not be able to stop the train while you lay down new tracks and switches. In my work, I have always had to help teams improve their CM, ALM & DevOps best practices while simultaneously supporting the existing production operations – introducing new processes without disrupting the continuous implementation of new releases and patches to support the business. If you would like to succeed in the real world, then you need to be able to assess your existing practices and come up with a strategy for improving while you still support a vibrant ongoing business. This article explains how to conduct the DevOps assessment.

Too often, DevOps practitioners begin the journey by administering a questionnaire that seeks to assess an organization’s DevOps process maturity. The development managers who typically provide the answers usually become quite adept at gaming the questionnaire. For all appearances, they are doing just great at managing their source code, automating builds and delivering releases to production on a fairly rapid basis. However, if you speak to the folks doing the actual work, you may get a very different impression.

My assessments start with a pretty robust dance card that includes everyone from the CTO to the engineers who do the actual work. If you are going to implement DevOps, make sure that your dance card includes both developers and the operations engineers. You will also want to speak with the data security engineers, QA testers and, most importantly, the end users or their representatives. I usually start by asking what is going well and what could be improved. If I can get the assessment participants to trust me then I can usually identify which practices really need to be improved and which ones should not be changed – at least not in the beginning.

It is important to start by respecting the fact that the organization has evolved and adapted organically, establishing many practices and procedures along their way. You are there to identify what should be improved, but make sure that you start by understanding that the existing approaches may very well be there for good reasons. I believe that you need to respect the journey that the organization has taken to get to this point and then you can focus on what should be improved.

I usually interview participants individually or in small groups and I ask what is going well and what could be improved. I probe with an ear to identifying their understanding and application of the six core configuration management best practices of source code management, build engineering, environment management, change control, release management and deployment engineering. I then try to help the team determine specific initiatives that should be addressed on their journey to implementing DevOps.

My report synthesizes common themes and helps to highlight which items are more urgent than others. It is essential to create an environment where everyone is comfortable being open and truthful, so I never report on what one particular person has stated – instead my results group items along the six core CM best practices. One practical consideration is to pick one or two items that can be achieved easily and quickly. Don’t tackle the tough ones first. After you have achieved some small successes, addressing the bigger challenges will become a lot more practical.
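If you keep interview notes in a structured form, grouping them by practice while dropping names is easy to automate. The following is a minimal sketch, assuming each note is recorded as a (participant, practice, comment) tuple; that record shape and the `summarize` function are illustrative choices, not a prescribed assessment tool.

```python
# The six core CM practices named in the article.
CM_PRACTICES = (
    "source code management",
    "build engineering",
    "environment management",
    "change control",
    "release management",
    "deployment engineering",
)


def summarize(findings):
    """Group findings by practice and count how many people raised each one.

    findings -- iterable of (participant, practice, comment) tuples.
    Returns a list of (practice, number_of_participants), most raised first.
    Participant names are used only for counting and never appear in the output.
    """
    raised_by = {practice: set() for practice in CM_PRACTICES}
    for participant, practice, _comment in findings:
        if practice not in raised_by:
            raise ValueError("unknown practice: " + practice)
        raised_by[practice].add(participant)
    counts = [(practice, len(people)) for practice, people in raised_by.items() if people]
    # Sort by count (descending), then alphabetically for a stable report order.
    return sorted(counts, key=lambda item: (-item[1], item[0]))
```

Themes raised by many participants rise to the top of the list, which gives a starting point for deciding which items are more urgent.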

Use the information from your assessment to help plan the roadmap for your own process improvement initiative.

Some common initiatives that I often see are:

  • Need for training when adopting new technologies from Bitbucket to Ansible
  • Automation of common tasks that everyone has become accustomed to doing manually
  • Monitoring and addressing failed builds in the continuous integration server
  • Improving coverage for automated testing from unit through functional/regression and don’t forget non-functional testing (including performance)
  • Improving communication and collaboration by breaking down those silos (hint: get everyone to share a screen and show what they are working on)
  • Bringing in a vendor every now and then and making sure their demo is focused on teaching best practices (instead of just a sales pitch)
  • Making it cool to show off what your team is doing

Most of all, remember that change takes time, so take an agile iterative approach to process improvement. As always, view DevOps initiatives as a “team sport” and feel free to drop me a line and share your experiences!


How to Fix Change Control – Understanding DevOps’ Secret Weapon


by Bob Aiello with Dovid Aiello

In many organizations, Change Control is badly broken. Very often, change control only focuses on the calendar and fails to realize its true potential of ensuring that changes can be delivered as frequently as necessary in a secure and reliable way. In our consulting practice, we hear many complaints of boring two-hour meetings where nothing seems to actually get done. Change control is often perceived as being little more than a rubber stamp and, as one esteemed colleague famously claimed publicly, the purpose of change control is to “prevent change”. We disagree and believe that change control can be a valuable function that helps identify and mitigate technical risk. That said, very few organizations have effective change control practices. Here’s how to fix change control in your company and realize the benefits of DevOps’ secret weapon.

We have previously written about the dysfunction that often resides in the operations organization. This dysfunction often involves change control. The poorly managed change control function wastes time, but that is only the tip of the “dysfunctional” iceberg. Far more serious is the missed opportunity to identify and mitigate serious technical risks. This failure often results in incidents and problems – often at a loss to the company, in terms of both profitability and reputation. The bigger problem is the missed opportunity to roll out changes faster, which enables secure and reliable systems, not to mention the delivery of business functionality. When change control fails, most senior managers declare that they are going to stop allowing changes – which is actually the worst thing that you can decide to do. Slowing down changes almost always means that you are going to bunch together many changes and allow change windows less frequently, such as on a bimonthly basis. When we help teams fix their change control, the first thing we push for is making more frequent changes, but keeping them to very tiny changes. The cadence that we find works well is most often moving from bimonthly change windows to twice weekly – ideally during the week. There is something magical about moving from bimonthly to twice a week that often eliminates much of the noise and frustration.

One important approach is to identify which changes are routine and low-risk, categorizing them as “pre-approved” or standard changes. Changing a toner cartridge on a printer is a good example, as it is a task that has been done many times before. Communication that the printer will be down for this activity is important, but it does not require a discussion during the change control meeting. Standard changes should be, ideally, fully automated and if you are using the ITIL framework, listed in your [1] service catalogue. Getting all of the easy changes pre-approved means that your change control meeting can now focus on the changes which actually require some analysis of technical risk.

Normal changes follow your normal change control process. Emergency changes should only be used when there is a true emergency and not just someone trying to bypass the change control process. Occasionally, someone may miss the normal change control deadline and then you may need an “out-of-cycle” change that would have been a normal change had the person made the deadline. One effective way to ensure that folks are not just using emergency changes to bypass the change control process is to require that all emergency changes be reviewed by your highest-ranking IT manager – such as the CTO.
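The change categories described above lend themselves to a simple triage rule. The sketch below is one possible encoding, not a definitive process: the `ChangeRequest` fields are hypothetical, the precedence (service catalogue entries first, then declared emergencies, then missed deadlines) is an assumption, and so is the choice to send out-of-cycle changes through the same CCB approval as normal ones.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"          # pre-approved, routine and low-risk
    NORMAL = "normal"              # follows the regular change control process
    OUT_OF_CYCLE = "out-of-cycle"  # a normal change that missed the deadline
    EMERGENCY = "emergency"        # reserved for true emergencies


@dataclass
class ChangeRequest:
    """Hypothetical change record; field names are illustrative."""
    summary: str
    in_service_catalogue: bool = False
    declared_emergency: bool = False
    missed_deadline: bool = False
    reviewed_by_senior_it_manager: bool = False  # e.g. the CTO


def classify(request: ChangeRequest) -> ChangeType:
    """Assign a change type using the assumed precedence described above."""
    if request.in_service_catalogue:
        return ChangeType.STANDARD
    if request.declared_emergency:
        return ChangeType.EMERGENCY
    if request.missed_deadline:
        return ChangeType.OUT_OF_CYCLE
    return ChangeType.NORMAL


def may_proceed(request: ChangeRequest, ccb_approved: bool = False) -> bool:
    """Decide whether the change can go ahead."""
    kind = classify(request)
    if kind is ChangeType.STANDARD:
        return True  # pre-approved, no meeting discussion needed
    if kind is ChangeType.EMERGENCY:
        # Emergency changes require review by the highest-ranking IT manager.
        return request.reviewed_by_senior_it_manager
    return ccb_approved  # normal and out-of-cycle changes need approval
```

Requiring the senior manager flag for every emergency is what discourages using that category as a shortcut around the normal process.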

Another effective approach is to distinguish between the change control board (CCB) and the change advisory board (CAB). Frankly, this has been an area of confusion for many organizations. The CCB is responsible for the change control process. The change advisory board should be comprised of sharp subject matter experts who can tell you the potential impact of making (or not making) a particular change. Make sure that you ask them who else might be impacted and should be brought into the discussion. We have seen many organizations, unfortunately, rename their CCB to CAB (cool name it is) and in doing so, lose the input from the change advisory folks. Keep your CCB separate from your CAB. The CCB handles the process – while the CAB advises on the technical risk of making (or perhaps not making) a particular change.

In reviewing each change, make sure that the change is described clearly and in sufficient detail to understand each step. We see many change requests that are just high-level descriptions which can be open to interpretation by the person making the changes and consequently result in human errors that lead to incidents and problems.

Testing, as well as verification and validation (V&V), criteria should always be specified. By testing, we refer to continuous testing beginning with unit testing and extending into other forms of testing, including regression, functional/nonfunctional, and integration testing. (We are huge fans of API and service virtualization testing, but that is the subject of another article.) Verification usually refers to whether or not the change meets the requirements, and validation ensures that the system does what it needs to do. Some folks refer to fitness for purpose and fitness for use. If you want effective DevOps, you must have robust continuous testing, and the change control process is the right toll gate to ensure that testing has been implemented and especially automated. We’d be remiss if we did not mention the importance of asking about backout plans. In short, always have a plan B.
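A toll gate like this can be partly automated by rejecting change requests that arrive without the information the reviewers need. Here is a minimal sketch, assuming a change request is represented as a dictionary; the field names are hypothetical and should be replaced by whatever your change management tool actually records.

```python
# Items a change request must carry before it reaches the meeting.
# These names are illustrative, not taken from any particular tool.
REQUIRED_ITEMS = (
    "steps",                  # detailed, step-by-step description of the change
    "test_plan",              # which tests will be run, ideally automated
    "verification_criteria",  # how we confirm the change meets requirements
    "validation_criteria",    # how we confirm the system does what it needs to
    "backout_plan",           # plan B if the change fails
)


def missing_items(change):
    """Return the names of required items that are absent or empty."""
    missing = []
    for name in REQUIRED_ITEMS:
        value = change.get(name)
        if isinstance(value, str):
            value = value.strip()  # whitespace-only text counts as empty
        if not value:
            missing.append(name)
    return missing
```

A request for which `missing_items` returns an empty list is complete enough to discuss; anything else can be sent back to the requester before it takes up meeting time.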

Change control done well is indeed DevOps’ secret weapon. Making changes more often should be your goal, and keeping those changes as tiny and isolated as possible will help to reduce the risk of making changes. We like to have everyone share a screen and have that DevOps cross-functional team ensure that every change is executed successfully. Every change should be automated. If this is not possible, then make sure that you have a 4-eyes policy where one person makes the change and another person observes and verifies that the manual step has been completed successfully. Always record the session – and allow others to see what you’re doing and then review the recordings to identify areas where you can improve your deployment processes.

The best organizations have processes which are transparent and allow others to learn and help continuously improve the deployment process. Change control can help you get to a place where you can safely make changes as often as you need to, helping to deliver secure and reliable systems.

What change control practices do you believe are most effective? Drop us a line and share your best practices!



[1] The service catalogue is an automated set of jobs that perform routine “low risk” tasks such as taking backups and changing toner cartridges.
Since the request is in the service catalogue, the change may be designated as being “standard” (pre-approved) and then there is no need to perform risk assessment in change control.

DevOps for Hadoop


By Bob Aiello

Apache Hadoop is a framework that enables the analysis of large datasets (or what some folks are calling “Big Data”), using clusters of computers or cloud-based “elastic” resources offered by most cloud-based providers. Hadoop is designed to scale up from a single server to thousands of machines, allowing you to start with a simple installation and scale to a huge implementation capable of processing petabytes of structured (or even unstructured) complex data. Hadoop has many moving parts and is both elegant in its simplicity and challenging in its potential complexity. This article explores some of the essential concerns that the DevOps practitioner needs to consider in automating the systems and applications build, package and deployment of components in the Hadoop framework.

For this article, I read several books [1][2] and implemented a Hadoop sandbox using both Hortonworks and Cloudera. In some ways, I found the technology to be very familiar, as I have supported very large Java-based trading systems which were deployed in web containers from Tomcat to WebSphere. At the same time, I was humbled by the sheer number of components and dependencies that must be understood by the Hadoop administrator as well as the DevOps practitioner. In short, I am a kid in a candy store with Hadoop. Come join me as we sample everything from the lollipops to the chocolates 🙂

To begin learning Hadoop, you can set up a single-node cluster. This configuration is adequate for simple queries and more than functional for learning about Hadoop and then scaling to a cluster setup when you are ready to implement your production environment. I found the Hortonworks sandbox to be particularly easy to implement using VirtualBox (although almost everything that I do these days is on Docker containers). Both Cloudera Manager and Apache Ambari are administration consoles that help to deploy and manage the Hadoop framework.

I used Ambari, which has features that help provision, manage, and monitor Hadoop clusters, including supporting the Hadoop Distributed File System (HDFS), Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. I used the Ambari dashboard, which helps to view cluster health (including heatmaps) and also provides the ability to view MapReduce, Pig and Hive applications including performance and other resources. HDFS can also be supported on Amazon Simple Storage Service (S3) buckets, Azure blobs and OpenStack Object Storage (Swift).

One of the most interesting things about Hadoop is that you can use “commodity hardware” which does not necessarily mean cheap, but does mean that you use the servers that you are able to obtain and support and they do not all have to be the exact same type of machine. Obviously, the DevOps practitioner will need to pay attention to the requirements for provisioning, and especially supporting, the many required configuration files. This is where configuration tools including Chef, Puppet, CFEngine, Bcfg2 and SaltStack can be especially helpful, although I am finding myself migrating towards using Ansible for configuration management as well as both infrastructure and application deployments.

Logfiles, as well as the many environment settings, need to be monitored. The administration console provides a wide array of alerts which can identify potential resource and operational issues before there is end-user impact. The Hadoop framework has daemon processes running, each of which requires one or more specific open ports for communication.
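Checking that the expected ports are reachable is one of the simpler monitoring tasks to script. The sketch below uses only the Python standard library; the function names are my own, and the mapping of daemon names to port numbers is deliberately left to the caller because the actual ports depend on your Hadoop version and configuration.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def closed_ports(host, expected_ports, timeout=1.0):
    """Return the subset of expected_ports that cannot be reached.

    expected_ports -- dict mapping a daemon name to its port number,
                      supplied from your own cluster configuration.
    """
    return {
        name: port
        for name, port in expected_ports.items()
        if not port_is_open(host, port, timeout)
    }
```

An empty result from `closed_ports` means every listed daemon accepted a connection; a non-empty result names the daemons worth investigating before end users notice.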

Anyone who reads my books and articles knows that I emphasize processes and procedures which ensure that you can prove that you have the right code in production, detect unauthorized changes and have the code “self-heal” by returning to a known baseline (obviously while adhering to change control and other regulatory requirements). Monitoring baselines in a complex Java environment utilizing web containers including Tomcat, JBoss and WebSphere can be very difficult, especially because there are so many files which are dynamically changing and therefore should not be monitored. Identifying the files which should be monitored (using cryptographic hashes including MAC SHA1 and MD5) can take some work and should be put in place from the very beginning of the development lifecycle in all environments, from development test to production.
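To make the baseline idea concrete, here is a minimal sketch that records a hash for each monitored file and later reports what has changed. It is an illustration under assumptions, not my production tooling: it uses SHA-256 from Python's `hashlib` rather than the SHA1 or MD5 hashes mentioned above, and the default file patterns (`*.jar`, `*.xml`) are only an example of selecting files that should stay fixed while skipping logs and other dynamic files.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root, patterns=("*.jar", "*.xml")):
    """Map each monitored file (path relative to root) to its hash."""
    root = Path(root)
    baseline = {}
    for pattern in patterns:
        for path in root.rglob(pattern):
            if path.is_file():
                baseline[path.relative_to(root).as_posix()] = file_digest(path)
    return baseline


def compare(baseline, current):
    """Report files that were modified, removed or added since the baseline."""
    return {
        "modified": sorted(p for p in baseline if p in current and baseline[p] != current[p]),
        "missing": sorted(p for p in baseline if p not in current),
        "unexpected": sorted(p for p in current if p not in baseline),
    }
```

You would call `build_baseline` once after an approved deployment, store the result, and then run `compare(stored, build_baseline(root))` on a schedule. Any non-empty list is a candidate unauthorized change.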

In fact, getting Hadoop to work in development and QA testing environments does take some effort, giving you an opportunity to start working on your production deployment and monitoring procedures. The lessons learned while setting up the lower environments (e.g., development test) can help you begin building the automation to support your production environments. I had a little trouble getting the Cloudera framework to run successfully using Docker containers, but I am confident that I will get this to work in the coming days. Ultimately, I would like to use Docker containers to support Hadoop – most likely with Kubernetes for orchestration. You can also run Hadoop on hosted services such as AWS EMR or use AWS EC2 instances (which could get pretty expensive). Ultimately, you want to run a lean environment that has the capability of scaling to meet peak usage needs, but can also scale down when all of those expensive resources are not needed.

Hadoop is a pretty complex system with many components and as complicated as the DevOps requirements are, I cannot imagine how impossible it would be to manage a production Hadoop environment without DevOps principles and practices. I am interested in hearing your views on best practices around supporting Hadoop in production as well as other complex systems. Drop me a line to share your best practices!


[1] White, Tom. Hadoop: The Definitive Guide, 4th edition. O’Reilly Media, 2015
[2] Venner, Jason, et al. Pro Apache Hadoop, 2nd edition. Apress, 2014
[3] Aiello, Robert and Leslie Sachs. Configuration Management Best Practices: Practical Methods that Work in the Real World. Addison-Wesley, 2010
[4] Aiello, Robert and Leslie Sachs. Agile Application Lifecycle Management: Using DevOps to Drive Process Improvement. Addison-Wesley, 2016

The Magic of DevOps


By Bob Aiello with Dovid Aiello

Recently, a well-respected colleague of mine reacted to an article that I had written regarding the Equifax data breach and suggested that I had made it sound as if DevOps “magically” could solve problems. I was stunned at first when I saw his comments, because everything that I have written has always gone into specific details on how to implement DevOps and CM best practices including the core functions of source code management, build engineering, environment management, change control, and release and deployment engineering. At first, I responded to my colleague that he should get a copy of my book to see the detail in which I prescribe these principles and practices. My colleague reminded me that he not only had a copy of my CM best practices book, but had reviewed it as well – and as I recall it was a pretty positive review. So how then could he possibly believe that I viewed DevOps as magically solving anything? The more I pondered this incident, the more I realized that DevOps does indeed have some magic and the effective DevOps practitioner actually does have some tricks up his sleeve. So, unlike most magicians I am fine with sharing some of my magic and I hope that you will write back and share your best practices as well.

DevOps is a set of principles and practices intended to help improve communication and collaboration between teams, including development and operations, but equally important are other groups including quality assurance, testing, information security and of course the business user and the groups who help us by representing their interests. DevOps is all about sharing often conflicting ideas and the synergy we enjoy from this collaboration. At the core of DevOps systems and application delivery is a set of practices based upon configuration management, including source code management, build engineering, environment management, change control, and release and deployment engineering.

Source code management is fundamental. Without the right tools and processes you could easily lose your source code, not to mention have a very difficult time tracking changes to code baselines. With robust version control systems and effective processes, you gain the traceability to know who made changes to the code – and to back them out if necessary. When you have code in version control you can scan that code using a variety of tools and identify open source (and commercial) code components which may have one or more vulnerabilities as identified in CVEs and the VulnDB database. I have written automation to traverse version control repositories and scan for licensing, security and operational risks – actually identifying specific code bases which had zero-day vulnerabilities much like the one which impacted Equifax recently. Build engineering is a fundamental part of this effort, as the build process itself may bring in dependencies which may also have vulnerabilities. Scanning source code is a good start, but you get a much more accurate picture when you scan components which have been compiled and linked (depending upon the technology). Taking a strong DevOps approach means that your security professionals can work directly with your developers to identify security vulnerabilities and determine the best course of action to update the code. With the right release and deployment automation you can ensure that your changes are delivered to production as quickly as necessary.
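To make the scanning idea concrete, here is a minimal Python sketch of the matching step: given an inventory of components per code base and a list of advisories, report which code bases need attention. The repository names, the `Component` record and the `find_vulnerable` function are hypothetical, and the data is illustrative only. A real scan would build the inventory from your repositories and load advisories from a CVE or VulnDB feed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One third-party component found in a code base (hypothetical record)."""
    name: str
    version: str


def find_vulnerable(inventory, advisories):
    """Return (repo, component, advisory_ids) for each known-vulnerable component.

    inventory  -- dict mapping repo name to a list of Component
    advisories -- dict mapping (name, version) to a list of advisory IDs
    """
    findings = []
    for repo, components in sorted(inventory.items()):
        for component in components:
            ids = advisories.get((component.name, component.version), [])
            if ids:
                findings.append((repo, component, ids))
    return findings


# Illustrative data only (assumed repo names and contents).
inventory = {
    "billing-portal": [Component("struts2-core", "2.3.31"),
                       Component("commons-lang3", "3.5")],
    "intranet-wiki": [Component("commons-lang3", "3.5")],
}
advisories = {("struts2-core", "2.3.31"): ["CVE-2017-5638"]}

for repo, component, ids in find_vulnerable(inventory, advisories):
    print(f"{repo}: {component.name} {component.version} -> {', '.join(ids)}")
```

The lookup itself is trivial. The hard part in practice is keeping the inventory accurate, which is why scanning built artifacts as well as source gives a better picture.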

Environment management is the function which is most often forgotten, and understanding your environment dependencies is an absolute must-have. Sadly, we often forget to capture this information during the development process, and discovering it after the product has been deployed can be a slow and painful process. Similarly, change control should be your first line of defense for identifying and mitigating technical risk, but most companies simply view change control as a “rubber stamp” which focuses mostly on avoiding calendar collisions. Change control done well can help you identify and mitigate technical risk, deliver changes more quickly and avoid the costly mistakes that are so often the cause of major outages.

As news reports emerge claiming that Equifax actually knew that they had code which contained the Struts vulnerability, the focus should be on why the code was not updated. Sadly, many companies do not have sufficient automation and processes in place to be able to safely update their systems without introducing significant technical risk. I have known of companies who could not respond effectively to a disaster, because their deployment procedures were not fully automated and failing over to a DR site resulted in a week-long set of outages. Companies may “test” their DR procedures, but that does not guarantee that they can actually be effectively used in a real disaster. You need to be able to build your infrastructure (e.g. infrastructure as code) and deploy your applications without any chance of a step being missed which could result in a systems outage.
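One way to picture a deployment in which no step can be missed is to express the procedure as an ordered list of steps in code and have a single function run them all, halting loudly on the first failure. The Python sketch below is only an illustration of that idea. The step names are hypothetical, and real steps would call your own provisioning and deployment tools.

```python
def run_deployment(steps):
    """Run every step in order and stop at the first failure.

    steps -- list of (name, callable) pairs; each callable returns True on success.
    Returns the list of step names that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step '{name}' failed after {completed}")
        completed.append(name)
    return completed


def make_step(label, log):
    """Build a stand-in step that records its label and reports success."""
    def action():
        log.append(label)
        return True
    return action


# Hypothetical steps standing in for real provisioning and deployment calls.
log = []
steps = [
    ("provision_server", make_step("provision", log)),
    ("install_runtime", make_step("runtime", log)),
    ("deploy_application", make_step("deploy", log)),
    ("run_smoke_test", make_step("smoke", log)),
]

print(run_deployment(steps))
```

Because the same list drives every environment, a failover to a DR site runs exactly the steps that were exercised in test, in the same order.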

DevOps and CM best practices actually give you the specific capabilities required to identify security vulnerabilities and update your code as often as needed. The first step is to assess your current practices and identify specific steps to improve your processes and procedures. I would like to say that really there is no magic – just good process engineering, picking the right tools and of course rolling up your sleeves and automating each and every step. But maybe the truth is that there is some magic here. Taking a DevOps approach and sharing the different views of development, operations, security and other key stakeholders can make this all possible. Please drop me a line and share your challenges and best practices too. Between us, we can see magic happen as we implement DevOps and CM best practices!

Bob Aiello


How to Maintain 143 Million Customers


by Phil Galardi

So what recently happened to 143 Million Americans anyway? Well, you probably heard that it was a cyber security incident related to an open source software component called Apache Struts. What exactly is Apache Struts? Why was it so easily hacked? Could it be prevented using some common best practices? And, what can you do to protect your organization now and in the future?

Apache Struts is a common framework for developing Java web applications. It’s one of the most commonly used open source components, with plenty of community support. There were 634 commits in the last 12 months at the time of this blog, meaning that folks from all over the world are actively participating in efforts to fix bugs, add features and functions, and remediate vulnerabilities.

According to Lgtm, the folks who discovered the vulnerability that impacted nearly half of Americans, more than 65% of Fortune 100 companies are using Struts, meaning 65% of the Fortune 100 could be exposed to remote attacks (similar to Equifax) if the vulnerability is not fixed.

Initially, the suspected vulnerability was a zero-day (CVE-2017-9805) that had affected the Struts framework since 2008. However, recent speculation points to a more likely culprit (CVE-2017-5638), which was reported in March 2017. If the latter is the case, Equifax and any other organizations properly managing open source components would have had visibility into this issue and could have remediated it before the attack occurred. At this time, Equifax has not issued a public statement pinpointing the exploit.

The Apache Struts Project Management Committee lists 5 steps of advice to anyone utilizing Struts as well as all open source libraries. To paraphrase, these are:

  1. Know what is in your code by having a component bill of materials for each of your product versions.
  2. Keep open source components up to date and have a process in place to quickly roll out security fixes.
  3. Close the gap: your open source components are going to have security vulnerabilities if left unchecked.
  4. Establish security layers in your applications, such that a public facing layer (such as Struts) should never allow access to back-end data.
  5. Monitor your code for zero-day vulnerability alerts. Again, back to #1. If you know what is in your code, you can monitor it. You can reduce incident response time, and notify your customers quickly (or catch it before it’s too late).

Certainly, you can prevent Apache Struts vulnerabilities from ever making their way into your web applications by not using the component. However, based on metrics from Black Duck Software for Struts, we see that it would take an estimated 102 years of effort to build on your own. You probably won’t need every line of code. Even so, there are huge advantages to using open source software in your applications.

Best practices dictate identifying the open source components in your applications at the time of build and integrating that identification into your CI tools when possible. This provides you with an inventory, or bill of materials, for all the open source components your developers are using. You can further drive automation by monitoring each application’s bill of materials and creating policies around what actually gets built. For example, you could warn developers that a particular component (OpenSSL 1.0.1 through 1.0.1f) is not acceptable to use when they build with it, and ultimately fail builds containing critical vulnerabilities.
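As a rough illustration of such a policy, the Python sketch below checks a bill of materials against a banned version range and reports violations that a CI job could use to fail the build. The policy table, function names and sample bill of materials are assumptions made for this example. They are not taken from any particular CI or scanning product.

```python
import re


def parse_openssl_version(text):
    """Split a version such as '1.0.1f' into (1, 0, 1, 'f') for comparison."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]*)", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch, letter = match.groups()
    return int(major), int(minor), int(patch), letter


# Hypothetical policy: each entry bans an inclusive version range of one component.
BANNED_RANGES = {"openssl": [("1.0.1", "1.0.1f")]}


def check_build(bill_of_materials):
    """Return a list of policy violations; an empty list means the build may proceed."""
    violations = []
    for name, version in bill_of_materials:
        for low, high in BANNED_RANGES.get(name, []):
            parsed = parse_openssl_version(version)
            if parse_openssl_version(low) <= parsed <= parse_openssl_version(high):
                violations.append(f"{name} {version} is banned ({low} through {high})")
    return violations


# Illustrative bill of materials for one build.
bom = [("openssl", "1.0.1e"), ("zlib", "1.2.11")]
for problem in check_build(bom):
    print("FAIL:", problem)
```

A CI job could exit with a non-zero status whenever `check_build` returns anything, or only print warnings during a grace period before the policy is enforced.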

What can you do now about this latest vulnerability? According to Mike Pittenger, VP of Security Strategy at Black Duck Software, if you don’t need the REST plug-in for Apache Struts, you can remove it. Otherwise, users are advised to update to version 2.3.34 or 2.5.13 as soon as possible.
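If you already have an inventory of Struts versions in use, checking them against this advice is easy to automate. The Python sketch below assumes simple dotted numeric version strings and treats any branch other than 2.3 or 2.5 as needing manual review. Both of those are assumptions of this example, and the function names are hypothetical.

```python
# Fixed releases named in the advice above, keyed by (major, minor) branch.
FIXED_RELEASES = {(2, 3): (2, 3, 34), (2, 5): (2, 5, 13)}


def parse_version(text):
    """Turn '2.3.31' into (2, 3, 31); extra parts such as '2.5.10.1' are kept."""
    return tuple(int(part) for part in text.split("."))


def needs_update(version_text):
    """Return True if the version is older than the fixed release for its branch.

    Branches not listed in FIXED_RELEASES are flagged (True) so that a person
    reviews them; that conservative default is an assumption of this sketch.
    """
    version = parse_version(version_text)
    fixed = FIXED_RELEASES.get(version[:2])
    if fixed is None:
        return True
    return version < fixed


for candidate in ["2.3.31", "2.3.34", "2.5.10.1", "2.5.13"]:
    status = "update needed" if needs_update(candidate) else "ok"
    print(candidate, status)
```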

So back to keeping your customers happy? Protect their data, maintain the security of your applications, and don’t forget about open source components and applying best practices.

About the author:

Phil Galardi has over 15 years of experience in technology and engineering: 8 years as an application developer, 3 years in application lifecycle management, and currently helping organizations improve, manage, and secure their SDLC. With experience spanning multiple vertical markets, Phil understands what is required to build secure software from each aspect of people, process, and technology. While he loves coffee, he doesn’t get the same feelings of joy from completing expense reports.


Personality Matters – Development in the Cloud


By Leslie Sachs

Developing software in the cloud involves working with people who are likely in a different location and employed by an entirely different company. These folks may have very different priorities than you do – and getting what you need may be quite a challenge at times. Development in the Cloud likely involves working in an environment where you do not always have full control of the resources you need. You may feel that you are the customer and deserve priority service. But the reality is that interfacing with all of the stakeholders in a cloud-based development environment presents unique challenges. Your ‘people skills’ may very well determine whether or not you get what you need when you need it. Read on if you would like to be more effective when developing in the Cloud.

Control of Resources

Development in the Cloud has a number of challenges, but none more apparent than the obvious loss of control over essential resources. Development in the cloud involves relying upon another entity and the services that they provide. This may turn out great for you, or it may be the worst decision of your career. One issue to address early on is how you feel about having to rely upon others for essential resources and its implicit loss of control. This situation may result in some stress and, at times, considerable anxiety, for technology managers who are responsible for the reliability of their companies’ systems.

Anxiety in the Cloud

Seasoned IT professionals know all too well that bad things happen. Systems can crash or have other serious outages that can threaten your profitability. When you have control over your resources, you usually have a stronger sense of security. With the loss of control, you may experience anxiety. As a manager, you need to assess both your own and upper management’s tolerance for risk. Risk is not inherently bad. But risk needs to be identified and then mitigated as best as is practical. One way to do that is to establish a Service-level Agreement (SLA).

Setting the SLA

The prudent manager doing development in the cloud will examine closely the Service-level Agreements that govern the terms of the Cloud-based resources upon which that team depends. One may have to choose, however, between working with a large established service provider and a smaller company, willing to work harder for your business. This is where you need to be a savvy consumer and technology guru, too. If you’re thinking that ironing out all of these terms is going to be easy, then think again. The one thing that you can be certain about, though, is that communication is key.

Communication as Key

Make sure that you establish an effective communications plan to support your Cloud development effort, including announcing outages and service interruptions. [1] You should consider the established communications practices of your service provider within the context of the culture of your organization. Alignment of communication styles is essential here. Plan to not only receive communications, but to process, filter and then distribute essential information to all of your stakeholders. Remember, also, that even weekend outages may impact the productivity of your developers. The worst part is that you may not have a specific dedicated resource at the service provider with whom to partner.

Faceless and Nameless Partners

Many large Cloud-based providers have well-established service organizations, but you as a manager need to consider how you feel about working with partners whom you do not know and may never actually meet. The faceless and nameless support person may be just fine for some people, especially if they do a great job. But you need to consider how you will feel if you cannot reach a specific person in charge when there is a problem impacting your system. This may seem like a non-issue if you are the customer. Or is it?

Customer Focus

If you are paying a service provider then you will most likely be expecting to be treated as a customer. Some Internet Service Providers (ISPs) may have excellent service while others may act like they are a utility with an absolute monopoly. At CM Best Practices Consulting, we’ve had some experiences with ISPs who provided horrible service resulting in an unreliable platform supporting the website for our book on Configuration Management Best Practices. Poor service aside, there are certainly advantages to considering cloud services as a utility.

Cloud as a Utility

When we need more electricity, most of us just assume that the electric company will provide as much as we need. So the cloud as a utility certainly has some advantages. If you need to scale up and add another hundred developers, giving each one a virtual image on a server farm can be as easy as providing your credit card number. However, knowing that additional resources are there for the asking carries its own special risk: failing to plan for the resources that you need. You still need to plan strategically.

Planning and Cost

Unplanned costs can be as dangerous as running up bills on your credit card. In fact, they may actually be on your credit card. From a personality perspective, you should consider whether or not using Cloud-based services is just a convenient excuse to avoid having to plan out the amount of resources you really need. This approach can get expensive and ultimately impact the success of your project. Development in the cloud does not mean that you have to stay in the cloud. In fact, sometimes cloud-based development is just a short-term solution to respond to either a seasonal need or a temporary shortage. You should always consider whether or not it is time to get out of the clouds.

Bringing it Back In-house

Many Cloud-based development efforts are extremely successful. Others are not. Ultimately, smart technology professionals always consider their Plan-B. If you find that you are awake at night thinking about all of the time lost due to challenges in the Cloud, then you may want to consider bringing the development back in from the Cloud. Just keep in mind that every approach has its risks and you probably cannot implement a couple hundred development, test or production servers overnight, either. Many managers actually use a hybrid approach of having some servers in-house, supplemented by a virtual farm via a cloud-based service provider. Having your core servers in-house may be your long term goal anyway. Smart managers consider what works best from each of these perspectives.

Being pragmatic in the Cloud means that you engage in any technology effort while keeping both eyes open to the risks and potential advantages of each approach. Cloud-based development has some unique challenges and may not be the right choice for everyone. You need to consider how these issues fit with you and your organization when making the choice to develop in the cloud.

[1] Aiello, Robert and Leslie Sachs. Configuration Management Best Practices: Practical Methods that Work in the Real World. Addison-Wesley, 2010, p. 155.

How DevOps Could Have Saved Equifax


by Bob Aiello

Equifax is the latest large firm to make unwanted headlines due to exposure of clients’ personal data; a reported 143 million people may have had their Social Security numbers, birth dates, credit card numbers and other personal information stolen. According to published accounts, the breach occurred through a vulnerability in the Apache Struts web framework which is used by many organizations. The incident was an embarrassment to a company whose entire business revolves around providing a clear, and presumably confidential, financial profile of consumers that lenders and other businesses use to make credit decisions.

Large organizations often have hundreds of major systems using thousands of commercial and open source components – each of which could potentially have a security vulnerability. The Apache organization issued a statement about the most recent incident. There were also many alerts issued about the potential risks in the Apache Struts framework, but large organizations which receive alerts via Common Vulnerabilities and Exposures (CVEs) and VulnDB may find it very difficult to identify exactly which software components are vulnerable to attack and be unable to quickly fix the problem and/or deploy the updated code that prevents hackers from exploiting known vulnerabilities.

So how best to handle these scenarios in large organizations?

The first step is to get all of your code stored and baselined in a secure version control system (VCS). Then you need to be able to scan the code using any of the products on the market which can identify vulnerabilities as reported in CVEs and the VulnDB database. There are costs involved with implementing an automated solution, but the cost of not doing so could be far greater.

One approach could be to clone each and every repo in your version control system (e.g. Bitbucket) and then programmatically scan the baselined source code, identifying the projects which contain these vulnerabilities. You can get better results if you scan code that has been compiled, as the build process may pull in additional components. But even just scanning source code will help you start the conversation among your security experts, operations engineers and the developers who wrote the code. Suddenly, you can find that needle lost in your haystack pretty quickly and begin taking steps to update the software. Obviously, another key ingredient is having the capability to immediately roll out that fix through a fully-automated application build, package and deployment process, what many folks are referring to as continuous delivery.
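The scanning half of that approach can be sketched in a few lines of Python. The example below assumes the repositories have already been cloned under one base directory and that the projects use Maven. It only reads dependencies whose version is written literally in a `pom.xml`, so it will miss versions set through properties or pulled in transitively at build time. The function names and the sample project are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# XML namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def dependencies_in_pom(pom_path):
    """Return (artifactId, version) pairs declared directly in one pom.xml."""
    root = ET.parse(pom_path).getroot()
    found = []
    for dep in root.findall(".//m:dependency", POM_NS):
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        version = dep.findtext("m:version", default="", namespaces=POM_NS)
        found.append((artifact, version))
    return found


def scan_checkouts(base_dir, artifact_id):
    """Walk every checkout under base_dir and report where artifact_id is used."""
    hits = []
    for folder, _subdirs, files in os.walk(base_dir):
        if "pom.xml" in files:
            pom_path = os.path.join(folder, "pom.xml")
            for artifact, version in dependencies_in_pom(pom_path):
                if artifact == artifact_id:
                    hits.append((os.path.relpath(folder, base_dir), version))
    return sorted(hits)


# A tiny sample project so the sketch can run without real repositories.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.apache.struts</groupId>
      <artifactId>struts2-core</artifactId>
      <version>2.3.31</version>
    </dependency>
  </dependencies>
</project>
"""

with tempfile.TemporaryDirectory() as base:
    repo = os.path.join(base, "billing-portal")
    os.makedirs(repo)
    with open(os.path.join(repo, "pom.xml"), "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_POM)
    print(scan_checkouts(base, "struts2-core"))
```

Even a crude report like this tells you which teams to call first. Scanning the built artifacts afterwards fills in what the source-only pass cannot see.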

Implementing these tools and processes does take some time and effort, but, as the Equifax data breach has painfully demonstrated, effective DevOps is clearly worth it.

What is your strategy for identifying security issues buried deep in a few hundred thousand lines of code? It is actually not that hard to fix this issue as long as you can work across development, operations and other stakeholders to implement effective CM best practices including:

Source Code Management
Build Engineering
Environment Management
Lean and Effective Change Control
Release and Deployment Engineering

Yeah – I am saying that you need DevOps today!

Bob Aiello


How DevOps can eliminate the risk of ransomware and cyber attacks


By Bob Aiello

Reports of global cyberattacks, said to have impacted more than 200,000 users in over seventy countries, have certainly garnered much attention lately. You might think that the global IT “sky” was falling and there is no possible way to protect yourself. That just isn’t true. The first thing that you need to understand is that any computer system can be hacked, including yours. In fact, you should assume that your systems will be hacked and you need to plan for how you will respond. Experts are certainly telling us all to refrain from clicking on suspicious attachments and to keep our Windows patches up-to-date. None of that advice is necessarily wrong, but it fails to address the real problem here. In order to properly avoid such devastating problems, you really need to understand the root cause. There certainly is plenty of blame to go around, starting with who made the malware tools used in the attack. There is widespread speculation that the tool used in the attack was stolen from the National Security Agency (NSA), which leads one to question whether those agencies in the business of securing our national infrastructure are really up to the job. This global cyberattack was felt by thousands of people around the world.

Hospitals across the UK were impacted, which, in turn, impacted medical care, even delaying surgical procedures. Other organizations hit were FedEx in the United States, the Spanish telecom company Telefónica, the French automaker Renault, and Deutsche Bahn, Germany’s federal railway system. I have supported many large organizations relying upon Windows servers, often running complex critical systems. Building and upgrading Windows software can be very complex and that is the key consideration here. It is often not trivial to rebuild a Windows machine and get to a place where you have software fully functioning as required. I have seen teams tinker with their Windows build machines and actually get to a place where they simply could not build another Windows machine with the same configuration. Part of the problem is that very few technology professionals really understand how Microsoft Windows actually works. In DevOps, we need to be able to provision a server and get to a known state without having to resort to heroic efforts.

With infrastructure as code, we build and provision machines from scratch using a programmatic interface. Many use cloud-based resources which can be provisioned in minutes and taken down just as easily when no longer needed. Container-based deployments are actually taking this capability to a new level, making infrastructure as code routine. From there, we can establish the deployment pipeline within a reasonable period of time, which enables us to deploy known baselines of code very quickly. Backing up our data is also essential, although in practice you may still lose some transactions. If you are supporting a global trading system, then obviously there must be strategies to create transaction logs which can “replay” your buys and sells, restoring you to a known state.
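The replay idea can be shown with a small Python sketch: start from the last backup and re-apply every logged transaction that happened after it. The record layout (a sequence number plus side, symbol and quantity) and the function names are assumptions made for this illustration. A real trading system would have a far richer log format and stricter ordering guarantees.

```python
def apply_transaction(positions, transaction):
    """Apply one buy or sell to a dict of symbol -> share count."""
    side, symbol, quantity = transaction
    if side == "BUY":
        change = quantity
    elif side == "SELL":
        change = -quantity
    else:
        raise ValueError(f"unknown side: {side!r}")
    positions[symbol] = positions.get(symbol, 0) + change


def restore(snapshot, snapshot_sequence, transaction_log):
    """Rebuild current positions from a backup plus the log entries after it.

    snapshot          -- positions as of the last backup
    snapshot_sequence -- sequence number of the last transaction in the backup
    transaction_log   -- list of (sequence, (side, symbol, quantity)) in order
    """
    positions = dict(snapshot)  # copy so the backup itself is never modified
    for sequence, transaction in transaction_log:
        if sequence > snapshot_sequence:
            apply_transaction(positions, transaction)
    return positions


# Illustrative data: the backup already reflects transactions 1 and 2.
backup = {"ACME": 100}
transaction_log = [
    (1, ("BUY", "ACME", 150)),
    (2, ("SELL", "ACME", 50)),
    (3, ("BUY", "ACME", 25)),
    (4, ("SELL", "ACME", 40)),
    (5, ("BUY", "INIT", 10)),
]

print(restore(backup, 2, transaction_log))
```

Skipping entries already covered by the backup is what makes the replay safe to run against a restored system. Applying transactions 1 and 2 a second time would leave you in a state that never existed.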

What we need now is for corporate executives to understand the importance of having competent IT executives implement best practices capable of addressing these inevitable risks. We know how to do this. In the coming months, a group of dedicated IT professionals will be completing the first draft of an industry standard for DevOps, designed to cover many of these considerations. Let’s hope the folks running institutions from hospitals to retail chains take notice and actually commit to adopting DevOps best practices.