Injecting failure into your infrastructure to test your services’ resilience has been gaining popularity. Most people think of Netflix and their use of “Chaos Monkey”. In fact, they have an entire “Simian Army” of tests and drills they run to make their systems more reliable. At PagerDuty, we couldn’t just copy what has been published […]

