SRE - Tagged - Page 9 of 17 - staging-devopsy.kinsta.cloud

The Curious Connection Between Cloud Repatriation and SRE Ops

September 13, 2022 by Lori MacVittie

I have a fondness for philosophy. I’m about three classes short of a degree, and every few years I tell myself one day I’ll finish it. Thus, I am very familiar with what is known in statistics—and logic—as a post hoc fallacy, from which we get the saying “correlation is not causation.” This is the […]

Study: Real-Time Data Has A Transformative Effect

September 6, 2022 by Bill Doerrfeld

We all know that data is the heart of the modern digital business. Real-time data is arguably the next evolution: up-to-date data can enable energy savings and optimizations that reduce cost. Real-time data processing also aids fraud detection and improves health care responsiveness. In DevOps especially, access to real-time data is necessary to reduce […]

NewOps? AIOps? NoOps? Just Don’t Call Me Late for Dinner

August 25, 2022 by Alan Shimel

Once again, we are seeing an uptick in the use of the term NoOps. I always felt that NoOps was a misleading and, in fact, empty phrase. I don’t care what magic you think you have; you are never going to eliminate the need to operate your software, applications and infrastructure. I first heard the […]

Everything is DevOps

August 24, 2022 by Don MacVittie

It’s funny how we in IT tend to be absolutists, shoving everything into the current bucket. XML was going to eliminate programming—even though it wasn’t even particularly good at data representation. NoCode was going to eliminate programming (yeah, that’s a theme, I’ll skip the other 50 or so ‘going to end programming’ cycles we’ve been […]

The Rogers Outage of 2022: Takeaways for SREs

August 15, 2022 by JP Cheung

When, eight years from now, folks are creating lists of the top IT incidents of the 2020s, there’s a good chance that they’ll include the Rogers outage of 2022. The failure, which made internet and cellular network service unavailable for more than 12 million users across Canada, was one of the most significant outages in […]

5 Ways to Prevent an Outage

August 15, 2022 by Ashley Stirrup

In today’s always-on, ever-connected world, we all expect 100% availability. What gets in the way of this? The devil is in the details. Over time, everything breaks: Disks, nodes, containers, networks, DNS servers and configuration issues can all lead to major outages. Amazon’s 13 minutes of downtime in August 2021 translated to almost $5 million […]

Simplifying DevOps and Infrastructure-as-Code (IaC)

July 25, 2022 by Evgeny Zislis

Let’s face it, tools for infrastructure-as-code (IaC) configuration management may be vital, but they’re difficult to use. But what if it’s not so much the tools themselves as it is our understanding of them? What if we looked at provisioning and configuration systems differently and did so using vocabulary that was less confusing and more […]

Why More Incidents Are Better

July 15, 2022 by Andre King

Ask most SREs how many incidents they’d have to respond to in a perfect world, and their answer would probably be ‘zero.’ After all, making software and infrastructure so reliable that incidents never occur is the dream that SREs are theoretically chasing. Reducing the number of actual incidents as much as possible is a noble […]

5 Mean-Time Reliability Metrics To Follow

July 7, 2022 by Bill Doerrfeld

Most folks working in DevOps or SRE roles are familiar with metrics like mean-time-to-recovery (MTTR). Keeping track of the average time a team takes to respond to incidents is crucial to identifying bottlenecks in the support process. It’s also something executives like to show higher-ups when sharing a snapshot of overall platform performance. However, focusing […]

How to Adopt an SRE Practice (When You’re not Google)

June 14, 2022 by Jemiah Sius

Site reliability engineering (SRE) isn’t a new term or practice. The practice of applying software engineering skills and principles to operations problems and tasks happened even before site reliability engineer was a defined job title. But organizing a proactive approach to building and maintaining software drives long-term success in improving operational efficiency, data-driven roadmap planning […]

