As Gartner’s Milind Govekar recently said, “There is no business strategy without a cloud strategy.” But in 2021, some of the world’s biggest brands suffered disruptions in their services due to outages from their cloud service providers, including Google, Slack, Venmo, Disney+, Facebook/Meta, Tinder, iRobot, The Washington Post and Sony. While the outages do not negate the […]
Dependability is Crucial for Real-Time Experiences
Millions of people around the world experienced a major disruption after an October 2021 outage saw Facebook, WhatsApp and Instagram go offline. A few days prior, the popular workplace collaboration platform, Slack, also went down for some users. And in the last few weeks of 2021, Amazon’s AWS caused considerable global chaos after significant networking […]
Unreliable Server Scare | Information Batteries | ARM IPO PDQ
In this week’s The Long View: We worry about chips failing randomly, we ponder a new way of thinking about workload shifting, and we grok Arm’s IPO.
Defining Availability, Maintainability and Reliability in SRE
In the world of reliability engineering, you’ll frequently encounter the three “-ability” words: Availability, maintainability and reliability. They sound similar and have similar meanings. In fact, these words may seem so similar that it can be tempting to use them interchangeably. That would be a mistake. Availability, maintainability and reliability all have distinct—if related—meanings, and […]
The Key Benefits of Observability for SREs
In today’s technology landscape, organizations strive to champion innovative ideas, techniques and technologies to achieve success and outshine their competitors. For this reason, site reliability engineering (SRE) has become one of the fastest-growing enterprise roles and a set of organizational practices for fast and reliable software delivery. SREs use various tools and practices at their […]
How Real-Time Debugging Improves Reliability
When designing and building software, service reliability is always at the top of the list of critical focus areas for development teams. Every team that builds software typically has, either directly or indirectly, service level agreements with their customers. These are, essentially, agreed-upon metrics or performance criteria that teams use to measure and ensure the […]
Ten Years Later — What Is Cloud Native?
Ever since the term cloud native made its debut in 2010, its usage has grown in popularity. Today, cloud native is used as a qualifier for innovative, leading-edge applications, platforms and systems or cloud-savvy organizations and processes. However, what is cloud native exactly? There are some definitions, there are many opinions, but first and […]
What It Really Takes to Build a Reliable App
What makes an app reliable? If you ask most IT professionals that question, their minds immediately go to uptime. That’s not surprising; after all, it’s fashionable for software and infrastructure vendors these days to boast about how many nines of uptime they guarantee. To be sure, uptime is one component of delivering a reliable application […]
Testing’s Role in Delivering Digital Transformation
“We have a quality philosophy now…but it took the rise of the middle child to make it happen.” (Ann Lewis, ExxonMobil) Ann Lewis’ metaphor of testers being the “middle child” between software development and IT operations was present throughout Accelerate 2017, Tricentis’ annual user conference. Nearly every head in the room nodded as Lewis described […]
In Determining DevOps Issues, Sometimes It’s the Process
It is pretty much standard DevOps process that when a server instance starts having problems, you simply kill it and start another. This is in line with the idea that servers are cattle and there isn’t a ton of difference between them. But it creates a problem that no amount of CI/CD or automated provisioning […]