Traditionally, a security operations center (SOC) is a physical facility where an organization performs information security activities. The SOC team analyzes and monitors the organization’s security systems. A SOC aims to protect businesses from security breaches by identifying, analyzing and responding to cybersecurity threats. The SOC team consists of administrators, security analysts and security engineers. […]
Embrace Change in Production to Improve MTTR
DevOps creates value through constant deployment changes. When done right, more frequent deployments allow teams to continuously improve their products or services while making it easier to manage the effects of the change. One way to do this is with continuous integration and continuous deployment (CI/CD). CI/CD helps to reduce the risks to each build […]
Kintaba Creates the First Conference Dedicated to Incident Response – Techstrong TV
John Egan, Co-founder and CEO at Kintaba, talks about IRConf, the first conference fully dedicated to Incident Response. The company behind the conference is Kintaba, a modern incident management platform for the entire organization. The video is below, followed by a transcript of the conversation. Alan Shimel: Hey, everyone. Welcome to Techstrong TV. We […]
The Rogers Outage of 2022: Takeaways for SREs
When, eight years from now, folks are creating lists of the top IT incidents of the 2020s, there’s a good chance that they’ll include the Rogers outage of 2022. The failure, which made internet and cellular network service unavailable for more than 12 million users across Canada, was one of the most significant outages in […]
Why More Incidents Are Better
Ask most SREs how many incidents they’d have to respond to in a perfect world, and their answer would probably be ‘zero.’ After all, making software and infrastructure so reliable that incidents never occur is the dream that SREs are theoretically chasing. Reducing the number of actual incidents as much as possible is a noble […]
Improving Observability With ML-Enabled Anomaly Detection
Nowadays, DevOps and SRE teams have many tools to access and analyze logging data. However, two challenges prevent these teams from resolving issues in a timely manner: they aren’t equipped with all the data they need, and detecting and resolving issues is reactive and manual. In this article, I’m going to break down […]
5 Tips If You’re the First SRE Hire
Site reliability engineers (SREs) have a considerable set of tasks to juggle no matter where they work or how long their company has had an SRE practice. But if you’re the very first one to join an organization—as many SREs are these days, given that the trend is trickling down into smaller and smaller companies—you […]
SRE Vs. DevOps: The Wrong Question?
The age-old question about the competition between DevOps and SRE sets up a false dichotomy. DevOps is a methodology while SRE is a team within operations. Although the two are often pitted against one another, developers also embody the skills and the capabilities to implement fixes. Traditionally, application changes were reactive. The process of SRE […]
Prioritizing Scalability, Reliability and Security in Engineering
As digital products and services become more deeply embedded in critical industries and infrastructure, and the implications of problems grow in scope, engineering organizations are renewing their focus on building platforms that are scalable, reliable and secure. More importantly, they are recognizing that this isn’t the responsibility of an SRE or CSO; it’s an ethos […]
Verica Launches Database for Sharing IT Incident Reports
Verica, a provider of an incident management platform, announced this week that it is launching a database that makes incident reports publicly available via a single repository. Courtney Nash, senior research analyst at Verica, said the Verica Open Incident Database (VOID) initiative is part of an effort to enable organizations to learn from outages and […]