Ai2 released a family of open-source coding agents that can be trained on private codebases for as little as $400. The models, called SERA (Soft-verified Efficient Repository Agents), perform on par with the best open-source coding agents while requiring far less compute and infrastructure.
The strongest model, SERA-32B, solves 54.2% of problems in SWE-Bench Verified, a standard benchmark for evaluating coding agents. That puts it ahead of previous open-source leaders while requiring just 40 GPU-days of training on two NVIDIA Hopper GPUs.
But the bigger news is what Ai2 is releasing alongside the models: the complete training method, generated data, and tools needed to adapt these agents to your own codebase. This means development teams can now train coding agents that understand their internal APIs, custom frameworks, and organization-specific conventions.
The Cost Barrier Is Gone
Most coding agents today are expensive to build and impossible to customize. Training a competitive model typically requires large-scale reinforcement learning infrastructure and substantial compute budgets. Ai2’s approach changes that math dramatically.
The method reproduces the performance of SWE-smith (a previous synthetic data approach) at 57 times lower cost. It matches SkyRL, an open-source RL system, at 26 times lower cost. Reproducing the performance of top industry models like Devstral Small 2 costs around $12,000—still significant, but within reach for many organizations.
The key innovation is something Ai2 calls “soft-verified generation.” Traditional methods generate pairs of incorrect and corrected code, then carefully test each pair to confirm correctness. That verification step demands complex infrastructure and drives up cost.
Soft-verified generation takes a different approach. It generates patches that are only partially correct—but that’s fine. Just as different code can reach the same correct solution, agents can learn from imperfect examples. This removes the testing bottleneck and dramatically reduces the cost of generating synthetic training data.
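To make the trade-off concrete, here is a minimal, hypothetical sketch of the idea (not Ai2's actual pipeline): instead of running a full test suite against every generated patch, a cheap "soft" check accepts any candidate that still parses, and functional correctness is never verified. The candidate patches and the check itself are invented for illustration.

```python
import ast

def soft_verify(patched_source: str) -> bool:
    """Cheap 'soft' check: accept a generated patch if the patched
    file still parses as valid Python. No test suite is executed."""
    try:
        ast.parse(patched_source)
        return True
    except SyntaxError:
        return False

# Two hypothetical model-generated fixes for the same bug.
candidate_a = "def area(r):\n    return 3.14159 * r ** 2\n"  # plausible, parses
candidate_b = "def area(r):\n    return 3.14159 * r **\n"    # truncated generation

# Only the syntactically valid candidate enters the training pool.
# Whether it is *functionally* correct is never checked -- skipping
# that expensive step is where the cost saving comes from.
training_pool = [p for p in (candidate_a, candidate_b) if soft_verify(p)]
```

A real pipeline would use richer checks (does the patch apply cleanly, does the file import), but the principle is the same: tolerate partially correct data rather than pay for exact verification.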
Learning Your Codebase
The real test comes when adapting these models to specific repositories. Ai2 tested this on Django, SymPy, and Sphinx—the three largest repositories in SWE-Bench.
The results show that smaller, specialized models can match or exceed larger general-purpose models. SERA-32B, after training on just 8,000 samples from Django, achieved 52.23% accuracy—better than the 100B+ parameter teacher model that scored 51.20%. The cost: $1,300 in compute.
This matters because closed models haven’t had access to your internal code. They don’t know your data pipelines, your API conventions, or your specific engineering practices. Training on that data makes them useful for real work.
The training uses standard supervised fine-tuning—no custom RL infrastructure required. Ai2 designed the pipeline so that teams without deep ML expertise can run it.
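Because the training is plain supervised fine-tuning, the data preparation reduces to flattening agent trajectories into prompt/completion pairs. The sketch below assumes a hypothetical trajectory schema (`repo`, `issue`, `patch`); Ai2's actual data format may differ.

```python
def to_sft_example(trajectory: dict) -> dict:
    """Flatten one agent trajectory (hypothetical schema) into the
    prompt/completion pair that standard SFT trainers consume."""
    prompt = (
        f"Repository: {trajectory['repo']}\n"
        f"Issue: {trajectory['issue']}\n"
        "Produce a patch that resolves the issue.\n"
    )
    return {"prompt": prompt, "completion": trajectory["patch"]}

# Toy example with an invented issue and a truncated patch body.
example = to_sft_example({
    "repo": "django/django",
    "issue": "QuerySet raises on an edge case",
    "patch": "--- a/django/db/models/query.py\n+++ b/django/db/models/query.py\n",
})
```

Pairs in this shape can be fed directly to any off-the-shelf fine-tuning trainer, which is the point: no custom RL loop, reward model, or rollout infrastructure is involved.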
Performance and Speed
SERA models work with Claude Code out of the box. Ai2 collaborated with NVIDIA to optimize inference performance across their accelerator lineup.
Running in BF16 precision on four H100 GPUs, SERA reaches about 1,950 output tokens per second with a 16K context window. At FP8 precision, throughput jumps to 3,700 tokens per second with minimal accuracy loss. On next-generation Blackwell B200 systems, SERA scales to around 8,600 tokens per second.
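The reported numbers imply roughly a 1.9x speedup from dropping to FP8 on the same H100 hardware, and about 4.4x moving from BF16/H100 to B200, a quick check:

```python
# Relative speedups implied by the reported throughput figures
# (output tokens/sec for SERA at 16K context).
bf16_h100 = 1950  # BF16 on four H100s
fp8_h100 = 3700   # FP8 on four H100s
b200 = 8600       # Blackwell B200 systems

fp8_speedup = fp8_h100 / bf16_h100   # ~1.9x from precision alone
b200_speedup = b200 / bf16_h100      # ~4.4x vs the BF16/H100 baseline

print(f"FP8 on H100: {fp8_speedup:.1f}x")
print(f"B200:        {b200_speedup:.1f}x")
```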
At 32K context length, SERA-32B achieves 49.5% on SWE-Bench Verified—comparable to Devstral Small 2 (50.0%) and GLM-4.5-Air (50.5%). At 64K context, SERA-32B reaches 54.2%.
What You Get
The release includes models ranging from 8B to 32B parameters, all generated training data, the complete training recipe, and a CLI tool that launches an inference server with just 2 lines of code.
Everything is open. You can reproduce the results, inspect the data, and customize the approach for your needs. The training pipeline is intentionally simple—standard supervised fine-tuning on generated trajectories.
“Coding agents are increasingly differentiated by how cheaply and precisely they can be adapted to real codebases, not by raw benchmark scores. Ai2’s SERA release makes repository-aware agents affordable and practical by publishing the full training recipe and data approach, turning customization into a default capability rather than a research exercise,” according to Mitch Ashley, VP and practice lead, software lifecycle engineering, The Futurum Group.
“As AI-generated code volume grows, value moves to agents that understand local context and engineering intent. Open, reproducible pipelines like SERA bring agent tuning into everyday development workflows, raising expectations for openness, adaptability, and cost transparency across developer platforms.”
Why This Matters
Bringing the cost of strong coding agents down from hundreds of thousands of dollars to a few hundred or a few thousand changes who can participate. Research labs, small development teams, and individual organizations can now build agents that understand their specific codebases.
The fact that a single Ai2 researcher built SERA shows how accessible the approach is. You don’t need a large ML team or distributed training infrastructure to get state-of-the-art results.
The full release—models, training recipes, CLI tools, and data—is available now. For teams looking to deploy coding agents that understand their internal systems, the barrier to entry just dropped considerably.