It seems SREs are of two minds when it comes to AIOps. On one hand, AIOps’ potential is pretty exciting. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier. But on the other hand, some SREs may choose to view AIOps with disdain and distrust. They might […]
Analyzing SRE Job Postings
You can find plenty of high-level definitions out on the internet about what site reliability engineering means and what site reliability engineers do. But if you want to understand what it’s actually like to work as a site reliability engineer, there is perhaps no better source than job descriptions. SRE job ads explain what real […]
The Evolution of Incident Management
Have you ever thought about the history of incident management? If you’re an SRE, you might be so caught up in the day-to-day work of managing reliability and responding to incidents that you never take time to step back and reflect on the evolution of your role and your responsibilities. And that’s a shame because […]
Site Reliability Engineering (SRE) Comes of Age in 2022
The site reliability engineer (SRE) role is still gathering steam across organizations. In January 2022, LinkedIn listed SRE as the 21st job with the highest global demand throughout the past five years. That’s pretty high for such a specific tech role. And, looking to the future, it appears the SRE practice will only continue to […]
CNCF Takes LitmusChaos Platform to the Incubation Level
The technical oversight committee (TOC) for the Cloud Native Computing Foundation (CNCF) announced today it is elevating the open source LitmusChaos application testing platform to the incubation level. LitmusChaos is a chaos engineering platform donated to the CNCF by ChaosNative in 2020. Since then, it has seen adoption within production environments by more than 25 […]
Best of 2021 – Hiring When You’re Not FAANG
As we close out 2021, we at staging-devopsy.kinsta.cloud wanted to highlight the most popular articles of the year. Following is the twentieth in our series of the Best of 2021. Most of us are not FAANG, and we shouldn’t act like we are. Instead of using that, I’m going to use the term we used […]
Prioritizing Scalability, Reliability and Security in Engineering
As digital products and services are more deeply embedded in critical industries and infrastructure and the implications of problems grow in scope, engineering organizations are renewing their focus on building platforms that are scalable, reliable and secure. More importantly, they are recognizing that this isn’t the responsibility of an SRE or CSO; it’s an ethos […]
Defining Availability, Maintainability and Reliability in SRE
In the world of reliability engineering, you’ll frequently encounter the three “-ability” words: Availability, maintainability and reliability. They sound similar and have similar meanings. In fact, these words may seem so similar that it can be tempting to use them interchangeably. That would be a mistake. Availability, maintainability and reliability all have distinct—if related—meanings, and […]
Say Goodbye to Late-Night SRE Wake-Up Calls
Thousands of SREs, on-call engineers and DevOps pros all over the world dread nothing more than the late-night incident alert. The pager buzzing at 2:00 a.m. can cause panic for SREs and leave IT and DevOps teams with quite a mess to contain. But incident response doesn’t have to cause panic if you have the […]
Why Observability and Monitoring Matter to SREs
Site reliability engineers, or SREs, do many things. They help developers build reliability into applications. They manage SLAs and SLOs. They play a leading role in incident management and incident response. For all of these tasks, SREs draw heavily on observability and monitoring. Although other parts of the IT organization also typically help to manage […]