SRE - Tagged - staging-devopsy.kinsta.cloud

When the Structure Becomes the Culture

June 9, 2026 by Andrea Valenti

Why micro teams and rotation reshape culture, not just throughput, in modern SRE. Most SRE leaders design teams around the systems they own. We designed ours around movement. We introduced micro teams expecting a throughput story: smaller groups, tighter scope, faster work. Some of that arrived. What we had not budgeted for was how much […]

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

June 5, 2026 by Jyostna Seelam

In complex software systems, our traditional definition of operational health has always been comfortably binary. For over a decade, site reliability engineering (SRE) teams have relied on the industry-standard ‘Four Golden Signals’ (latency, traffic, errors and saturation) as the ultimate truth of platform stability. If our API response times are hovering below 100 ms, […]

Agentic SRE: The Next Frontier of Reliability

May 29, 2026 by Neel Shah

Agentic SRE is the evolution of site reliability engineering where AI agents help observe systems, reason over telemetry and take bounded operational actions under human-defined guardrails.

On-Call: The Silent Force Shaping Engineering Culture

May 27, 2026 by Heinrich Hartmann

There is a silent force shaping engineering culture inside every technology organization. It affects productivity, team morale, psychological safety, and long-term retention. And yet, it is rarely discussed in executive meetings or reflected in meaningful KPIs. That force is on-call. On-call is one of the most direct touchpoints engineers have with the reality of the […]

The Five Biggest Mistakes Organizations Make When Implementing SRE

May 12, 2026 by Akash Thakur

From cargo-culting Google’s playbook to rushing AI-powered observability into production before the fundamentals are in place, here’s where SRE transformations quietly go wrong, and how to course-correct.

Lightrun Adds Ability to Dynamically Pull Telemetry Data from Live Apps

March 30, 2026 by Mike Vizard

Lightrun has extended its namesake artificial intelligence (AI)-based site reliability engineering (SRE) platform with the ability to dynamically pull missing telemetry evidence from live application environments without deploying additional instrumentation. Company CEO Ilan Peleg said the Lightrun AI SRE platform includes a sandbox deployed via a software development kit […]

PagerDuty Extends Scope and Reach of AI SRE Platform

March 23, 2026 by Mike Vizard

PagerDuty has extended the capabilities and reach of its artificial intelligence (AI) agents so that they can be invoked directly from within the Slack messaging platform. Additionally, the AI SRE Agent embedded within the PagerDuty Operations Cloud platform can now use the Model Context Protocol (MCP) and an expanded library of application […]

Komodor Extends Reach of AI SRE Orchestration Framework

March 18, 2026 by Mike Vizard

Komodor today extended the reach of its orchestration framework for artificial intelligence (AI) agents by adding support for Model Context Protocol (MCP) servers and the OpenAPI specification. Company CTO Itiel Shwartz said those capabilities will make it possible for IT teams to more broadly orchestrate AI agents that are being used to investigate and remediate […]

Five Great DevOps Job Opportunities

March 16, 2026 by Mike Vizard

Weekly DevOps jobs roundup, this week highlighting top roles in Massachusetts, New Jersey, Chicago, Charlotte and Seattle, with pay ranges and hiring trends to help DevOps pros advance careers.

AI Is Forcing DevOps Teams to Rethink Observability Data Management

March 12, 2026 by Alan Shimel

As AI coding tools accelerate software delivery, they are also intensifying a problem DevOps and SRE teams have been dealing with for years: the unchecked growth of observability data. In this conversation, the founders of Sawmills argue that telemetry volume is no longer just a cost issue. It is becoming a data quality problem that […]
