AI coding agents are becoming more capable, but evaluating them is harder than it looks. Most benchmarks focus on a single dimension of agent capability; for instance, the popular SWE-Bench benchmark focuses only on fixing issues in open-source Python repositories. Real-world software engineering involves fixing bugs, of course, but it is a lot more […]
An Operational Approach to Benchmarking Microservices
Benchmarking microservices often falls to the bottom of the engineering team’s to-do list. However, teams that skip it are missing out on a key component. The fundamental goal of benchmarking is to better understand the software and to test the effects of various optimization techniques for microservices. In this post, I’ll describe an effective approach to benchmarking microservices. […]


