The DevOps Scorecard

Mid-last year our team switched from doing Agile to doing DevOps. As we forayed into the journey trying to learn about DevOps and practice it at the same time, a lot of questions arose in the team. How was this different from agile and most importantly How were we going to be successful?

That’s when we wrote down what would be the success criteria for our team: “Ship code frequently without causing a customer outage“.

As the team matured we started evaluating a more granular way to track success. Could the team mantra be broken down into quantifiable success metrics that could be represented in a scorecard? Based on our experience the DevOps scorecard should contain these 9 metrics to track DevOps team success:

Deployment frequency: How often were we deploying code and getting new code in the hands of our customers? This metric should trend up or remain stable from week to week. Example: Twice a week, 50 times a day
Change volume: For each deployment how many user stories and new lines of code were we shipping? Example: 3 new features per day, Average 500 lines of new code per week. Another parameter to consider in addition to volume is complexity of change.
Lead Time (from Dev to Deploy): How long does it take on an average to get the code from development complete through a cycle of A/B testing to 100% deploy and upgrade on production? Lead time should reduce as the team gets a better hold of the lifecycle.
Percentage of failed deployments: What percentage of deployments failed causing an outage or a negative user reaction? This metric should decrease over time. Example: 9% deployments failed this month as opposed to 15% last month. This metric should be reviewed in combination with the change volume. If the change volume is low or remained the same but the percent of failed deployments increased, then there maybe a disfunction somewhere.
Mean time to recovery: When we did fail, how long did it take us to recover? This is a true indicator of how good we are getting with handling change and this should ideally reduce over time. You can expect some spikes in this number due to complex issues not encountered before. Example: On an average it took the team 15 minutes to resolve each last week, 14 minutes this week.
Customer Ticket Volume: Number of alerts generated by customers to indicate issues in the service. This is a basic indicator of customer satisfaction. Example: 54 tickets were generated this week as opposed to 38 while the user volume remained steady is not a good thing
% Change in User Volume: Number of new users signing up, interacting with my service and generating traffic. As new users sign up is my infrastructure able to handle the demand? Example: This week the number of customers spiked by 30% due to an external event causing volume of requests to go up
Availability: What is the overall uptime for my service and did I violate any SLAs? Example: 99.9% uptime consistently for the last 3 months even with change in user volume
Performance (Response Time): Is my service performing within my predetermined thresholds? This metric should remain stable irrespective of % change in user volume or any new deployment. Example: Sub 5 second response time from all geographies and devices

Would love to hear thoughts on what other critical metrics DevOps teams are using

payal_chakravarty

Comments

Bertrand says

November 13, 2014 at 5:11 am

Hi,

One that I find quite practical and critical as well to offer a good feedback of the overall quality of the deliveries is the amount a new incidents or bugs reported by testers and categorize them by environment (assembly, pre-prod., prod.)

That also allows us to usually get an overview on what we are doing right and where to enhance tests and change people’s focus. Typically, if we see more bugs in pre-prod. than in QA, that means we do have a serious testing problem in that specific environment (wrong config, wrong set of tests, environment failure, lack of competency…).
it’s usually a lot more useful in the beginning of a project but it’s always good to have.

Regards,

B.

Log in to Reply
Douglas da Silva Lima says

October 31, 2015 at 7:48 pm

Hi,
I am very interested in DevOps Performance indicators, but I have no idea how to collect them, may I use a automated tool or what process can I use.

Log in to Reply
- Payal Chakravarty says
  
  April 20, 2016 at 3:00 am
  
  We wrote scripts to track the metrics by pulling data from Jenkins, Chef, Rational , Github, APM tool and ticket management.
  
  Log in to Reply
William Drew says

April 4, 2016 at 3:55 pm

I’m confused by the first sentence. “Mid-last year our team switched from doing Agile to doing DevOps.” Can’t I use them together? You would think that they would complement each other.

Log in to Reply
- LWWZ says
  
  June 23, 2016 at 9:05 am
  
  Yes, See my comment above.
  
  Log in to Reply
LWWZ says

June 23, 2016 at 9:03 am

The premise of this article is disheartening. You don’t “switch” from doing agile to doing DevOps. DevOps is an Engineering culture of mutual understanding, shared business objectives, communication, collaboration and empathy that enables the business to be more effective which “sometimes” has the side affect of resulting in continuous deployment, continuous integration and automation.

DevOps can happen in any development methodology whether it’s waterfall, agile, scrum or kanban.

Log in to Reply
- Payal Chakravarty says
  
  September 15, 2016 at 6:37 pm
  
  The reason I said switched from doing Agile to doing DevOps is that when we were practicing “Agile” we were shipping very 3 – 6 months – there were still silos between Dev and Ops. As we adopted the DevOps culture we were able to speed up our delivery, shipping code as often as multiple times a week or day.
  
  Log in to Reply

Comments

Trackbacks

Leave a Reply Cancel reply

Sign up for our newsletter!Stay informed on the latest DevOps news

Comments

Trackbacks

Leave a Reply Cancel reply

Sign up for our newsletter!
Stay informed on the latest DevOps news