AI
Stop Worshipping ‘Global Availability’: A Practical SLI/SLO Bucketing Playbook
Global availability hides real failures. Learn why bucketed SLIs give a truer picture of reliability—and how SRE teams can align alerts with real business impact ...
AWS Sage: Your AI Assistant’s Gateway to the Cloud
AWS Sage is an open-source MCP server that gives AI assistants unified, intelligent access to AWS with cross-service discovery, dependency mapping, impact analysis, and automated incident investigation ...
Autoptic Unfurls AI Change Resilience Agents to Analyze Telemetry Data
Autoptic launches AI-powered change resilience agents that analyze DevOps telemetry to detect root causes, prevent incidents, and boost confidence before production changes ...
Testlio Adds AI Tool to Generate Dashboards on App Testing Service
Testlio enhances its testing platform with AI-generated dashboards that correlate over 100 signals, helping DevOps and business leaders spot quality, risk, and AI-related issues faster ...
DevOps Didn’t Fail — We Just Finally Gave it the Tools it Deserved
Charity Majors delivers a wake-up jab to DevOps, sparking debate on its evolution and success. Is it a failure or just getting interesting? Alan dives into the discussion ...
GitLab Delivers on AI Agents Promise to Automate DevOps Workflows
GitLab today made generally available an agentic artificial intelligence (AI) platform that automates software engineering tasks ranging from planning to application security. Coinciding with the release of version 18.8 of the core ...
Yes, We Know AI Isn’t a Person
Let’s get the obvious out of the way right up front. AI isn’t a person. Thank you, Captain Obvious. We’re all on the same page. And yet, when we announced that AI ...
Your AI Agents Have a Blind Spot: What DevOps Teams Need to Know About Cross-LLM Security
Explore the challenges of AI agents in DevOps pipelines, highlighting the importance of model-aware detection to improve security and reduce vulnerabilities ...
Lessons from 2025: The Year “Agent Mitigation” Became a Thing
Explore the emergence of agent mitigation as a formal discipline in response to 2025's AI failures, highlighting best practices for secure and reliable AI agent deployment ...
Part 3: The Zero-Touch Infrastructure: Architecting Systems That Fix Themselves
Discover how autonomous SRE transforms incident management and system reliability, enabling self-healing systems that reduce reliance on human intervention ...
Part 2: From Reactive to Predictive: Training LLMs on Your Incident History
Discover how to harness incident history and AI to predict and prevent operational issues before they escalate, improving efficiency in Site Reliability Engineering ...
Part 1: Death of the Toil: How AI Agents Are Replacing Traditional Runbooks
Part one of a three-part series: Discover how AI-driven reasoning agents are revolutionizing SRE practices by eliminating traditional toil and enhancing incident management ...

