In my past experience as an SRE, I learned some valuable lessons about how to respond to and learn from incidents. If you want the TL;DR, I’ll summarize them here: Declare and run retros for the small incidents. It’s less stressful, and action items become much more actionable. Decrease the time it takes to analyze an […]
Choosing an Incident Management Platform
When you’re feeling the stress and pain of manually managing incidents and incident response, making the decision to find an incident management tool is a no-brainer. But how do you choose the one that will work best for you, your team and your business? You might be asking yourself, “Where do I start? What do […]
Why You Should Embrace Incidents and Ditch MTTR
The cliché is that everyone in IT hates incidents, and the natural reaction when assembling incident response metrics is to look for numbers that you can lower over time. Fewer incidents and shorter incident response times must be better, we think. You might already be familiar with the common metrics associated with these goals, including […]
Best Practices for Cloud Incident Response
Cloud computing is now mainstream, with almost all organizations running at least some resources in the public cloud—whether software-as-a-service (SaaS), platform-as-a-service (PaaS) or infrastructure-as-a-service (IaaS). Security teams have been scrambling to adapt to cloud environments, and with the growing adoption of DevSecOps, they are working together with DevOps teams to secure cloud systems from the […]
Why AIOps Is Critical for Pandemic Business Recovery
As businesses worked to stay afloat over the last year, innovation in many areas fell behind. Business leaders struggled to understand the short-term and long-term impact the COVID-19 pandemic would have on their business, while employees worried about losing their jobs and customers scrambled to rearrange budgets. As we near the end and can finally […]
The Single-Sentence Postmortem
Do you have four seconds? In the time it takes you to read this sentence you could be done with your incident’s postmortem. “But my postmortem has 8 sections! I have to construct a critical timeline, restate my followup tasks, show the major milestones, evaluate root cause…” Stop right there. First of all, your incident […]
Report: The State of DevOps Automation
In the race to accelerate digital transformation initiatives, organizations are encountering more incidents, more downtime, and longer resolution times. In fact, 90.4% of organizations saw an increase in incidents since the pandemic began, according to a recent Transposit report. In a completely digital economy, service downtime takes a higher toll. For ITOps teams, working remotely […]
How to Eliminate Incident Inefficiencies
In today’s complex, dynamic IT environments, the proliferation of disparate IT Ops, NOC, DevOps and SRE teams and tools is a given – and usually considered a necessity. This leads to the inevitable truth that when an incident happens, often the biggest challenge is collaborating between these teams to understand what happened and resolve the […]
XDR: The DevOps Transformation of Security Infrastructure
eXtended detection and response (XDR) is a security technology that unites multiple security systems into one. Organizations are transitioning from traditional systems such as endpoint detection and response (EDR) and security information and event management (SIEM) to XDR, in a move that is analogous to the transition from agile to DevOps work processes. XDR can […]
Leading Effective Incident Response Without Interminable Bridge Calls
There are easier ways to manage incident response without creating war rooms and packing IT staff onto bridge calls. Your phone vibrates at 11 p.m., and you know that can only mean another major incident with one of the business’ critical systems. You get geared up for the war room, dial into the bridge call […]









