We recently caught up with Gene Kim (@realgenekim) as he was putting the final touches on a draft of the upcoming DevOps Cookbook. We wanted to get his thoughts on what causes IT deployment failure, the DevOps movement, and what separates high-performing IT organizations from low-performing ones. If you’re following DevOps, no doubt you are familiar with Gene. In case you are not, Gene is an award-winning CTO, researcher, and author. He founded the security firm Tripwire and was its CTO for 13 years. To date, he’s written three books, including The Visible Ops Handbook and The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win. Gene is a huge fan of IT operations and how it can enable developers to maximize throughput of features from “code complete” to “in production,” without causing chaos and disruption to the IT environment. Now, let’s jump into our discussion.
George: When organizations decide they want to move to DevOps, or continuous delivery, what is a barrier they immediately run into? And is DevOps something all types of companies can embrace?
Gene Kim: In my mind it’s the unicorns (leading web app companies) that are showing the way out. Just to show why I’m so interested in this: you know, I’ve been studying high-performing IT organizations since 1999. What I’ve noticed is that there is this downward spiral that happens in almost every IT organization. It leads to more fragile applications in production, longer deployment times, a build-up of technical debt, and the business going slower and slower. I think one of the reasons I cared so much about it is that it led to preordained failure, where people downstream especially (operations, test, security) felt trapped in a system where they were powerless to change the outcomes. No matter what we did, failure was preordained. I think the way out of the downward spiral is what organizations like Google, Amazon, Twitter, LinkedIn, and GitHub are doing. These organizations are doing tens, hundreds, or even thousands of deploys a day, in a world where most of us are stuck with one deploy every nine months with catastrophic outcomes. I think the first objection is “we’re not an Amazon, we’re not a Google; we’re a bank, we’re a telco, we’re a retailer,” so it doesn’t apply to them. However, it turns out that if you look at who’s adopting DevOps, it’s Nordstrom, Walmart, General Motors, SAP. One of the statistics I saw was that Google has 15,000 engineers, or developers. HSBC, the European bank, has 15,000 developers. I think every company is an IT company regardless of what business we think we are in. It just shows that anything we try to get done relies on IT. So for DevOps not to be a core competency is absurd.
George: And increasingly software has to become a core competency, as more devices become API-enabled? All companies are software companies now.
Gene Kim: I love that quote from Jeff Immelt, CEO of GE: “Our biggest fear is being disintermediated by software. So I’m going back to school on software and big data.” I imagine they run into that same objection internally: “Hey, we’re not Netflix, we’re not Amazon.” Right? They are trying to sell the idea of moving to DevOps or becoming more agile. Anyone can do it and benefit from it. Even in general conversations, that was the first objection that always came up: well, that’s good for so-and-so, but we’re not so-and-so. I think that makes it a belief issue. So that’s what we’re talking about here, right? “It can’t work here.” I think it’s a series of problems, and it’s very predictable what the bottlenecks or obstacles are. The first one is that it takes too long to do the deployments; actually, you can go a step before that. One of the biggest problems is that operations is always in the way. Whenever dev or test wants a test environment, it’s not uncommon in typical enterprises that it takes 40 weeks to get one. Back in my Tripwire days, I used to think that developers didn’t like testing their code, because that’s what I saw. The truth is even worse than that: the developers really want to test their code. Their problem is they can never get an environment when they want or need one. So the survival mechanism kicks in, and they’ll always go to the guy they know who’s really good at getting an environment, their Corporal Klinger or Nash. They don’t want to ask any questions about where it came from, because, you know, someone is missing a server somewhere. This person probably just stole an environment from another project. With virtualization and infrastructure as code, there’s almost no excuse not to be able to create environments on demand. One of the things that happened is that Puppet Labs reached out to me to help design their DevOps benchmark. This was awesome; when I found out about it, I jumped out of my chair, because I saw it as a logical continuation of ten years of benchmarking we had done.
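For readers who want a concrete picture of “environments on demand,” here is a minimal sketch, assuming a containerized application: a short Python script that shells out to the Docker CLI to create and tear down a disposable test environment. The image name, container name, and port are hypothetical placeholders for illustration, not anything Gene described.

```python
import subprocess

# Illustrative assumptions: the image, container name, and port are placeholders.
IMAGE = "registry.example.com/myapp:latest"   # hypothetical application image
NAME = "myapp-test-env"
PORT = "8080"


def create_test_environment():
    """Spin up a disposable test environment as a Docker container."""
    subprocess.run(
        ["docker", "run", "-d", "--rm", "--name", NAME, "-p", f"{PORT}:8080", IMAGE],
        check=True,
    )
    print(f"Test environment '{NAME}' is listening on port {PORT}")


def destroy_test_environment():
    """Tear the environment down once testing is finished."""
    subprocess.run(["docker", "stop", NAME], check=True)


if __name__ == "__main__":
    create_test_environment()
    # ... run tests against http://localhost:8080 here ...
    destroy_test_environment()
```

Because a definition like this lives in version control alongside the application, any developer or tester can stand up an identical environment in minutes rather than waiting weeks for one to be provisioned.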
George: What have you learned that separates high performers and makes DevOps possible?
Gene Kim: There were three findings that blew me away. The first one is that the high performers were massively outperforming their non-high-performing peers: thirty times more frequent code deployments, and they were able to complete deployments, from code committed to running in production, 8,000 times faster. In other words, they could do a deployment measured in minutes or hours, whereas for low performers it took weeks, months, or quarters. Furthermore, they actually had better outcomes: twice the change success rate, and when they caused outages they could fix them 12 times faster. There were 4,200 complete responses, and it just showed that you could be more agile and more reliable at the same time. I think if we connect some dots, they’re also going to be more secure; they’re probably going to have a faster find-and-fix cycle time. We didn’t actually test that, but if it were true it would mirror what we found in the previous benchmarking work I had done. So that’s surprise number one: the extent to which they were outperforming their non-high-performing peers. The second surprise: we found two behaviors that were present in every high performer but universally absent in every non-high performer. The first one was whether they were checking all their production artifacts into version control, meaning they could reproduce a production environment based on what was in version control, versus it being an indescribable asset that only one person understood, if they knew about it at all. The second one was whether there is an automated code deployment process, which includes being able to create environments on demand. I think it’s very typical in most enterprises that whenever you want to do a new release or deploy, it takes days or weeks to complete, involving tons of hand-offs from the developers to the DBAs, to the release team, back to the developers, to the security people, the firewall change, and so forth. All of that creates delays and is error prone. In the high performers there is usually a one-click deploy or an automatic deploy. If you look at Google, Amazon, Twitter, Facebook, in general they have a continuous deployment process, so that when a developer checks in code it automatically runs all the tests, then puts it into an integration test environment and runs all the tests there. There might be some manual work, but then in general it automatically migrates to production. Clearly, that’s what’s required to do many hundreds of deploys a day.
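As a rough illustration of the pipeline Gene describes (code is checked in, tests run automatically, the build is promoted to an integration environment and then to production), here is a minimal, hypothetical sketch in Python. The test commands, directory layout, and deploy script are assumptions made for illustration; real pipelines are normally expressed in a CI/CD system rather than a hand-rolled script.

```python
import subprocess
import sys


def run(cmd):
    """Run one pipeline step; stop the whole pipeline if it fails."""
    print("==> " + " ".join(cmd))
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit("Pipeline failed at step: " + " ".join(cmd))


def pipeline():
    # Stage 1: unit tests run automatically on every check-in.
    run(["pytest", "tests/unit"])            # hypothetical test suite location

    # Stage 2: deploy to an integration environment and test there.
    run(["./deploy.sh", "integration"])      # hypothetical deploy script
    run(["pytest", "tests/integration"])

    # Stage 3: promote the same build to production (the "one-click" deploy).
    run(["./deploy.sh", "production"])


if __name__ == "__main__":
    pipeline()
```

The point of the sketch is the shape of the flow, not the specific tools: every hand-off Gene mentions (developers, DBAs, release team, security) becomes an automated, repeatable step, which is what makes deploying many times a day feasible.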
George: That is incredible to learn, Gene. A lot of fascinating findings. Thanks so much for your time.
Gene Kim: Thank you, George. Always a pleasure speaking with you.