Although site reliability engineering is a relatively new discipline, site reliability engineers (SREs) have become crucial to DevOps teams, helping to solve an array of operational problems such as network availability and user experience. However, in recent years, some people have questioned the longevity of such a role. This article will discuss several reasons why site reliability engineering roles […]
Best of 2022: Day in the Life of a Site Reliability Engineer (SRE)
As we close out 2022, we at staging-devopsy.kinsta.cloud wanted to highlight the most popular articles of the year. Following is the latest in our series of the Best of 2022. By now, most of us are familiar with the concept of site reliability engineering (SRE). The term was originally coined by Google and SRE has […]
Best of 2022: SRE Vs. Platform Engineering: What’s the Difference?
As we close out 2022, we at staging-devopsy.kinsta.cloud wanted to highlight the most popular articles of the year. Following is the latest in our series of the Best of 2022. Site reliability engineering (SRE) teams and platform engineering teams share similar goals—like maximizing automation and reducing toil—and similar methodologies. But they have different priorities, and […]
Blue-Green Deployment: What Are the Options?
Blue-green deployment is a change management strategy for software releases. Blue-green deployments require two identically configured hardware environments. One environment is active and serves end users while the other remains idle. Blue-green deployments are typically used for applications with strict uptime requirements. First, new code is released to staging environments and fully tested. Once the […]
Switching From FluentD to Vector Log Aggregation Tool
Log files are extremely important to the data analysis process because they contain essential information about usage patterns, activities and operations within an operating system, application, server or device. This data is relevant to a number of use cases across an organization, from resource management, application troubleshooting, regulatory compliance and SIEM to business analytics and […]
SRE Survey Reveals Major Technical and Cultural Challenges
Catchpoint, in partnership with Blameless, today published an annual survey of 559 site reliability engineers (SREs) that found 59% of respondents didn’t view tool sprawl as a major concern. Another 40% said tool sprawl is a moderate (33%) to serious (8%) problem. More than half of respondents said they build as much as 30% […]
Of Max and Min: The Non-Interference Prime Directive (for Visibility)
Last issue, I cited CPU utilization as an example of a metric that is often misused to describe/explain/infer system performance and asserted that improved visibility can help to overcome such misuse. In this issue, I will expand on what improved visibility means and present two case studies that illustrate a recent antipattern that I’ve noticed […]
Of Max and Min: When Performance Engineering Plans Go Awry
Modern software developers have access to powerful tools and services that allow them to quickly develop, demo and deploy fully functional applications. But what happens when the single-user prototype satisfies all required functionality but the user says the system seems slow? Or if the initial implementation turns out fine for one user, but multiple users […]
Scaling Predictive Analytics With AIOps to Drive Next-Gen SRE
Enterprise systems are only as valuable as they are reliable, in the sense that they don’t suffer excessive breakdowns. Otherwise, companies experience costly downtime and added stress for engineers due to the additional burden of managing issues. This critical function of ensuring systems run reliably and optimally, at production scale and with minimum human intervention, […]
Kintaba Creates the First Conference Dedicated to Incident Response – Techstrong TV
John Egan, Co-founder and CEO at Kintaba, talks about IRConf, the first conference fully dedicated to Incident Response. The company behind the conference is Kintaba — a modern incident management platform for the entire organization. The video is below followed by a transcript of the conversation. Alan Shimel: Hey, everyone. Welcome to Techstrong TV. We […]