SRE - Tagged - Page 11 of 17 - staging-devopsy.kinsta.cloud

What Does AIOps Mean for SREs?

March 15, 2022 by JJ Tang

It seems SREs are of two minds when it comes to AIOps. On one hand, AIOps’ potential is pretty exciting. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier. But on the other hand, some SREs may choose to view AIOps with disdain and distrust. They might […]

Analyzing SRE Job Postings

February 17, 2022 by JP Cheung

You can find plenty of high-level definitions out on the internet about what site reliability engineering means and what site reliability engineers do. But if you want to understand what it’s actually like to work as a site reliability engineer, there is perhaps no better source than job descriptions. SRE job ads explain what real […]

The Evolution of Incident Management

February 8, 2022 by JJ Tang

Have you ever thought about the history of incident management? If you’re an SRE, you might be so caught up in the day-to-day work of managing reliability and responding to incidents that you never take time to step back and reflect on the evolution of your role and your responsibilities. And that’s a shame because […]

Site Reliability Engineering (SRE) Comes of Age in 2022

January 25, 2022 by Bill Doerrfeld

The site reliability engineer (SRE) role is still gathering steam across organizations. In January 2022, LinkedIn listed SRE as the 21st job with the highest global demand throughout the past five years. That’s pretty high for such a specific tech role. And, looking to the future, it appears the SRE practice will only continue to […]

CNCF Takes LitmusChaos Platform to the Incubation Level

January 11, 2022 by Mike Vizard

The technical oversight committee (TOC) for the Cloud Native Computing Foundation (CNCF) announced today it is elevating the open source LitmusChaos application testing platform to the incubation level. LitmusChaos is a chaos engineering platform donated to the CNCF by ChaosNative in 2020. Since then, it has seen adoption within production environments by more than 25 […]

Best of 2021 – Hiring When You’re Not FAANG

December 29, 2021 by Don Macvittie

As we close out 2021, we at staging-devopsy.kinsta.cloud wanted to highlight the most popular articles of the year. Following is the twentieth in our series of the Best of 2021. Most of us are not FAANG, and we shouldn’t act like we are. Instead of using that, I’m going to use the term we used […]

Prioritizing Scalability, Reliability and Security in Engineering

November 12, 2021 by Cristina Buenahora Bustamante

As digital products and services are more deeply embedded in critical industries and infrastructure and the implications of problems grow in scope, engineering organizations are renewing their focus on building platforms that are scalable, reliable and secure. More importantly, they are recognizing that this isn’t the responsibility of an SRE or CSO, it’s an ethos […]

Defining Availability, Maintainability and Reliability in SRE

October 25, 2021 by JJ Tang

In the world of reliability engineering, you’ll frequently encounter the three “-ability” words: Availability, maintainability and reliability. They sound similar and have similar meanings. In fact, these words may seem so similar that it can be tempting to use them interchangeably. That would be a mistake. Availability, maintainability and reliability all have distinct—if related—meanings, and […]

Say Goodbye to Late-Night SRE Wake-Up Calls

October 20, 2021 by Pradeep Padala

Thousands of SREs, on-call engineers and DevOps pros all over the world dread nothing more than the late-night incident alert. The pager buzzing at 2:00 a.m. can cause panic for SREs and leave IT and DevOps teams with quite a mess to contain. But incident response doesn’t have to cause panic if you have the […]

Why Observability and Monitoring Matter to SREs

October 18, 2021 by JJ Tang

Site reliability engineers, or SREs, do many things. They help developers build reliability into applications. They manage SLAs and SLOs. They play a leading role in incident management and incident response. For all of these tasks, SREs draw heavily on observability and monitoring. Although other parts of the IT organization also typically help to manage […]
