If you’re a software engineer, you’ve likely heard all about shift left, a practice that can streamline certain aspects of software development. But shift left isn’t just for developers. It can be equally valuable for site reliability engineers (SREs). Although the main mission of SREs is ensuring the reliability of software after it has been […]
SRE’s Guide to Pragmatic Incident Response
In my past experience as an SRE, I learned some valuable lessons about how to respond to and learn from incidents. If you want the TL;DR, I’ll summarize them here: Declare and run retros for the small incidents. It’s less stressful, and action items become much more actionable. Decrease the time it takes to analyze an […]
The Key Benefits of Observability for SREs
In today’s technology landscape, organizations strive to champion innovative ideas, techniques and technologies to achieve success and outshine their competitors. For this reason, site reliability engineering (SRE) has become one of the fastest-growing enterprise roles and a set of organizational practices for fast and reliable software delivery. SREs use various tools and practices at their […]
Site Reliability Engineers to the DevOps Fore
Improving Resiliency by Creating Chaos
In the digital economy, preventing downtime is paramount. When digital systems fail, the consequences for business can be huge. The cost of downtime can run to thousands of dollars per minute for large businesses. That’s without taking into account the impact of customer dissatisfaction and reputational damage for the company and the IT careers involved. […]
Where Do SREs Go From Here?
Charlene O’Hanlon talks with Leo Vasiliou, director of product marketing at Catchpoint, about the results of a study of nearly 300 site reliability engineers (SREs) that the company fielded with VMware Tanzu and DevOps Institute. This year’s report underscores the challenges of multi-cloud, calls out the underutilization of AIOps and shows a systemic shift in core […]
SREs Say AIOps Doesn’t Live Up to the Hype
What can you expect when investing in artificial intelligence for IT operations (AIOps)? Real-time visibility across huge volumes of information. Lightning-fast event correlation and anomaly detection. Automated remediation and self-healing, without Ops personnel having to lift a finger. It all sounds amazing—at least, if you believe the marketing hype. But do those kinds of AIOps […]
Code Ownership Is Key to Accelerate Debugging
App stability is a fundamental part of every app experience. Broadly speaking, app stability is a measurement of the number of total app sessions that are crash-free or the percentage of daily active users who do not experience an error. End users have little patience for apps that crash or fail. Every application, whether it’s […]
Survey Reveals Slight Decline in Level of SRE Toil
The amount of routine toil that site reliability engineers (SREs) perform declined slightly in the last year even though IT environments in general are becoming more complex to manage. An annual survey of 300 SREs conducted by Catchpoint, a provider of an IT monitoring platform, in collaboration with VMware and the DevOps Institute suggests that […]
Using Chaos Engineering to Build Resilient Systems
Over the past decade, chaos engineering has become one of the most popular approaches in DevOps. It’s uniquely adapted to complex cloud-based systems and has the potential to succeed where more conventional approaches may not. Chaos Engineering Explained Traditionally, DevOps teams worked with systems hosted on single servers. They would analyze logs with a fine-toothed […]










