Multi-Agent System Promises Faster Bug Detection and Resolution

IT outages cost companies over $14,000 per minute. IBM Research’s Project ALICE uses multiple AI agents to help engineers find bugs faster and restore systems.

Software bugs are expensive. When a critical system goes down, every minute of downtime costs revenue, frustrates customers, and strains engineering teams scrambling to find the problem.

The challenge isn’t just fixing bugs. It’s finding them in the first place.

Modern cloud systems are complex, with layers of interconnected software spanning dozens of microservices. A slight change in one part of the stack can cascade into a significant incident affecting thousands of customers. Engineers often spend hours, sometimes days, sifting through logs, metrics, and traces to pinpoint the root cause.

Project ALICE, a new experimental system from IBM Research, aims to change that. Short for “Agentic Logic for Incident and Codebug Elimination,” it’s a multi-agent system that automates the detective work of tracking down software bugs.

How Project ALICE Works

Instead of relying on a single AI agent, ALICE uses multiple specialized agents that work together sequentially. Each agent handles a specific part of the investigation.

When an incident occurs, an incident analysis agent starts by gathering observability data from the affected system. This includes metrics, logs, and traces that show what was happening at the time of the failure.

Next, a code context agent builds a dependency graph that maps how different software components connect. It identifies which microservices are most likely relevant to the problem, narrowing down the search area for engineers.
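
As a rough illustration of that narrowing step, the sketch below walks a small dependency graph outward from the failing service and keeps only the services within a couple of hops. The service names, the `DEPENDENCIES` map, and the `candidate_services` helper are all hypothetical. The article does not describe how ALICE builds or ranks its graph, so this is a plain breadth-first search, not IBM's method.

```python
from collections import deque

# Hypothetical call graph: each service maps to the services it depends on.
DEPENDENCIES = {
    "checkout": ["payments", "cart"],
    "cart": ["inventory"],
    "payments": ["fraud-check"],
    "inventory": [],
    "fraud-check": [],
    "search": ["inventory"],
}


def candidate_services(graph, failing_service, max_hops=2):
    """Return the failing service plus its dependencies within max_hops.

    Uses breadth-first search, so closer dependencies come first.
    """
    hops = {failing_service: 0}
    queue = deque([failing_service])
    while queue:
        service = queue.popleft()
        if hops[service] == max_hops:
            continue  # do not expand beyond the hop limit
        for dependency in graph.get(service, []):
            if dependency not in hops:
                hops[dependency] = hops[service] + 1
                queue.append(dependency)
    return list(hops)  # dict insertion order matches BFS visit order


suspects = candidate_services(DEPENDENCIES, "checkout")
print(suspects)
```

Here `search` never appears among the suspects because `checkout` does not depend on it, which is the kind of pruning that shrinks the search area for engineers.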

Finally, a code analysis agent, powered partly by IBM’s CodeLLM DevKit, works to pinpoint where the bug actually lives in the codebase. It generates a detailed report and creates a GitHub issue with the exact observability information and code details engineers need to fix the problem.
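
The three-stage hand-off described above can be sketched as a simple sequential pipeline in which each agent adds its findings to a shared record. Everything in this example is a stand-in: the `Investigation` record, the agent functions, and the hard-coded log line are assumptions made for illustration, and none of it reflects ALICE's actual code or the CodeLLM DevKit API.

```python
from dataclasses import dataclass, field


@dataclass
class Investigation:
    """Evidence accumulated as the incident moves from agent to agent."""
    incident_id: str
    observability: dict = field(default_factory=dict)
    suspect_services: list = field(default_factory=list)
    report: str = ""


def incident_analysis_agent(inv):
    # Stand-in for gathering metrics, logs, and traces (hard-coded here).
    inv.observability = {"logs": ["checkout: HTTP 500 calling payments"]}
    return inv


def code_context_agent(inv):
    # Stand-in for dependency mapping: keep services named in the logs.
    known_services = ["checkout", "payments", "search"]
    text = " ".join(inv.observability["logs"])
    inv.suspect_services = [s for s in known_services if s in text]
    return inv


def code_analysis_agent(inv):
    # Stand-in for code analysis: summarize the findings for an issue report.
    inv.report = f"{inv.incident_id}: inspect {', '.join(inv.suspect_services)}"
    return inv


PIPELINE = [incident_analysis_agent, code_context_agent, code_analysis_agent]


def run_pipeline(incident_id):
    inv = Investigation(incident_id)
    for agent in PIPELINE:  # strictly sequential hand-off
        inv = agent(inv)
    return inv


result = run_pipeline("INC-1")
print(result.report)
```

Passing one record through every stage means the final report can cite the same observability evidence the first agent collected, which mirrors the article's description of an issue that carries both observability and code details.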

These agents communicate via the Model Context Protocol (MCP), an open standard that enables them to interact with external models and tools that also use the protocol.
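
For readers unfamiliar with MCP, its messages follow JSON-RPC 2.0, and a client asks a server to run a tool with the `tools/call` method. The sketch below builds one such request. The `fetch_logs` tool and its arguments are hypothetical, and the article does not say which tools ALICE's agents expose or call.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP-style JSON-RPC 2.0 request that invokes a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "fetch_logs" and its arguments are made up for this example.
request = build_tool_call(1, "fetch_logs", {"service": "checkout", "minutes": 15})
wire_message = json.dumps(request)
print(wire_message)
```

Because the envelope is the same regardless of which tool is called, an agent written against the protocol can work with any MCP-compatible server without custom glue code.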

“Project ALICE shows how agentic AI can take on the hard tasks, such as finding the real source of failure in systems too complex for any one engineer,” said Mitch Ashley, VP and Practice Lead, Software Lifecycle Engineering, The Futurum Group. “Coordinating specialized agents across observability, dependency mapping, and code analysis saves hours of manual investigation. This is the direction ITOps is moving: multi-agent systems that understand context, share evidence, and hand off work cleanly among humans and digital coworkers.”

Early Results Show Promise

IBM’s site reliability engineers have started testing ALICE in their own workflows. The early data looks encouraging.

Using ITBench scenarios, a set of standardized IT operations benchmarks, the team measured a 10% to 25% improvement in root-cause identification when agentic code analysis was added to the debugging process. The research was presented at the NeurIPS 2025 conference this month.

That improvement might sound modest, but in IT operations it can translate into significant time and cost savings. The benchmark measures how often the root cause is found rather than how quickly, but if better identification shortened incidents by even 15%, an hour-long outage would end about nine minutes sooner.
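
To put a dollar figure on that, the back-of-the-envelope calculation below combines the $14,000-per-minute cost cited at the top of this article with an assumed 15% reduction in outage length. The 15% time saving is an assumption for illustration only. The ITBench result reports improved root-cause identification, not a measured reduction in downtime.

```python
COST_PER_MINUTE_USD = 14_000  # lower bound cited in the article


def downtime_savings(outage_minutes, percent_faster,
                     cost_per_minute=COST_PER_MINUTE_USD):
    """Return (minutes saved, dollars saved) for a given percentage speed-up."""
    minutes_saved = outage_minutes * percent_faster / 100
    return minutes_saved, minutes_saved * cost_per_minute


# Assumed scenario: a 60-minute outage resolved 15% faster.
minutes, dollars = downtime_savings(outage_minutes=60, percent_faster=15)
print(f"{minutes:.0f} minutes, ${dollars:,.0f}")
```

Under those assumptions, a one-hour incident ends nine minutes sooner, which works out to about $126,000 in avoided downtime cost.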

Beyond Bug Detection

The long-term vision for ALICE goes beyond finding bugs. The IBM Research team wants the system to autonomously identify bugs, plan a fix, and apply it. Eventually, it could even scan for potential failure points before incidents happen, a shift from reactive to proactive system management.

This is particularly important given that roughly 27% of unplanned outages are caused by software updates. If ALICE can detect problematic changes as they’re made to a codebase, it could flag them within seconds, before they turn into incidents that affect customers.

The work on ALICE is part of a broader effort at IBM Research to build automation tools for anyone who relies on software to monitor systems. The team recently developed an “undo button” feature for agents that are still learning optimal problem-solving strategies. They’ve also partnered with Kaggle to create leaderboards that help engineers evaluate which models and agents work best for specific IT operations challenges.

What This Means for DevOps Teams

For DevOps professionals, systems like ALICE represent a practical application of agentic AI. Rather than replacing engineers, these multi-agent systems handle the tedious, time-consuming work of log analysis and code investigation. This frees up human engineers to focus on strategic decisions, architecture improvements, and complex problems that still require human judgment.

The key difference with ALICE is its multi-agent approach. Each agent specializes in one aspect of the debugging process, allowing for more thorough investigation than a single general-purpose agent could provide.

As these systems mature and prove their value in production environments, more organizations will likely adopt similar multi-agent architectures for IT operations. The challenge will be integration: getting these tools to work smoothly with existing observability platforms, incident management systems, and development workflows.

For now, Project ALICE remains experimental. But the early results suggest that multi-agent systems could meaningfully reduce the time and cost of managing modern software infrastructure.