I have a fondness for philosophy. I’m about three classes short of a degree, and every few years I tell myself one day I’ll finish it. Thus, I am very familiar with what is known in statistics—and logic—as a post hoc fallacy, from which we get the saying “correlation is not causation.” This is the […]
Study: Real-Time Data Has A Transformative Effect
We all know that data is the heart of the modern digital business. And real-time data is arguably the next evolution—having up-to-date data can empower energy savings and optimizations to reduce cost. Real-time data processing also aids fraud detection and improves health care responsiveness. Especially in DevOps, access to real-time data is necessary to reduce […]
NewOps? AIOps? NoOps? Just Don’t Call Me Late for Dinner
Once again, we are seeing an uptick in the use of the term NoOps. I always felt that NoOps was a misleading and, in fact, empty phrase. I don’t care what magic you think you have; you are never going to eliminate the need to operate your software, applications and infrastructure. I first heard the […]
Everything is DevOps
It’s funny how we in IT tend to be absolutists, shoving everything into the current bucket. XML was going to eliminate programming—even though it wasn’t even particularly good at data representation. NoCode was going to eliminate programming (yeah, that’s a theme, I’ll skip the other 50 or so ‘going to end programming’ cycles we’ve been […]
The Rogers Outage of 2022: Takeaways for SREs
When, eight years from now, folks are creating lists of the top IT incidents of the 2020s, there’s a good chance that they’ll include the Rogers outage of 2022. The failure, which made internet and cellular network service unavailable for more than 12 million users across Canada, was one of the most significant outages in […]
5 Ways to Prevent an Outage
In today’s always-on, ever-connected world, we all expect 100% availability. What gets in the way of this? The devil is in the details. Over time, everything breaks: Disks, nodes, containers, networks, DNS servers and configuration issues can all lead to major outages. Amazon’s 13 minutes of downtime in August 2021 translated to almost $5 million […]
Simplifying DevOps and Infrastructure-as-Code (IaC)
Let’s face it, tools for infrastructure-as-code (IaC) configuration management may be vital, but they’re difficult to use. But what if it’s not so much the tools themselves as it is our understanding of them? What if we looked at provisioning and configuration systems differently and did so using vocabulary that was less confusing and more […]
Why More Incidents Are Better
Ask most SREs how many incidents they’d have to respond to in a perfect world, and their answer would probably be ‘zero.’ After all, making software and infrastructure so reliable that incidents never occur is the dream that SREs are theoretically chasing. Reducing the number of actual incidents as much as possible is a noble […]
5 Mean-Time Reliability Metrics To Follow
Most folks working in DevOps or SRE roles are familiar with metrics like mean-time-to-recovery (MTTR). Keeping track of the average time a team takes to respond to incidents is crucial to identifying bottlenecks in the support process. It’s also something executives like to show higher-ups when sharing a snapshot of overall platform performance. However, focusing […]
How to Adopt an SRE Practice (When You’re not Google)
Site reliability engineering (SRE) isn’t a new term or practice. The practice of applying software engineering skills and principles to operations problems and tasks happened even before site reliability engineer was a defined job title. But organizing a proactive approach to building and maintaining software drives long-term success in improving operational efficiency, data-driven roadmap planning […]