Today, any business that deploys software faces an obscene number of expenses. It has to pay for cloud hosting, assuming of course that it’s among the 92% of companies that use the cloud. It needs trustworthy networks to connect its apps to its employees and customers. It probably pays for a suite of cybersecurity software in a bid to stay ahead of the ever-growing list of cybersecurity threats. All of these costs, which are unavoidable for any company that wants to host modern software, add up to higher overall IT bills.
But there’s a source of excess spending that has grown tremendously in recent years: observability. If a company deploys software, it also needs to monitor and observe that software to ensure it meets performance and availability requirements, and the cost of doing so has skyrocketed.
In fact, according to Charity Majors of Honeycomb, the total cost of observability at an organization today is equivalent to up to 30% of total infrastructure spending. That’s a truly astounding figure, if you think about it. For every dollar spent on the infrastructure that powers applications, 30 cents ends up paying just for the tools and services needed to make sure the apps running on that infrastructure are doing what they’re supposed to.
On top of this, spending on observability tools has become very unpredictable, meaning companies have no idea what they are going to be paying next week, next month or next year to get the insights they need to manage their applications and infrastructure. In this sense, observability is worse than taxes. With taxes, at least you know ahead of time what you’re going to have to pay in most cases.
Getting Observability Spending Under Control
Observability costs won’t go down on their own, but there are practical steps that organizations can take to get observability spending under control–and they can do it without sacrificing the critical visibility they need to manage complex systems.
To prove the point, we’ll discuss the reasons why observability has become so expensive. It’s partly about technology, but it also has to do with who gets to make decisions about adopting observability tools and how cultural inertia inside organizations impedes the ability of practitioners to migrate to more cost-efficient observability solutions.
Then, we’ll talk about actionable strategies for getting observability costs under control. By taking advantage of some new tools and strategies, observability spending can be reined in while also–and this is the best part–actually increasing the organization’s ability to observe and manage complex systems.
Why Observability Costs So Much
There’s no one simple reason why observability costs for the typical organization today have increased. Instead, several factors have conspired to bloat observability bills.
Drowning in Data
One issue is the sheer volume of observability data that organizations have to collect.
That’s partly because businesses deploy so many individual applications–207 on average, according to Okta–each of which generates observability data, to say nothing of the infrastructure on which the apps are hosted. But it’s even more because cloud-native architectures have vastly increased the number of application components and services that we need to observe.
Instead of having one monolithic app and one host server to monitor, today, companies might have two dozen microservices, a Kubernetes orchestration plane, a bunch of nodes and perhaps an underlying IaaS platform, all of which they have to observe just to keep a single app running healthily. That adds up to tremendously more observability data than businesses had to manage in the past.
Viewed from one perspective, having more observability data is a good thing. The more data there is, and the more granularity that can be associated with particular parts of the stack, the more insights DevOps teams can glean, at least in theory. But the problem is that, in many cases, only a small fraction–perhaps as little as 1%–of the data that is collected and ingested into observability tools is actually ever used to detect or troubleshoot performance issues.
Granted, part of the reason we collect so much data but use so little is that it’s impossible to know ahead of time which observability data we’ll actually need. But that’s not an excuse for paying for data that never serves a useful purpose. If we’re going to spend so much on observability data, we might as well use more of it while also finding ways to pay less for it.
Inefficient Data Collection
The problem of ever-growing data volumes is compounded by the challenge of inefficient data collectors, which translate to higher infrastructure costs.
Here’s why: The traditional way to deploy observability tools is to rely on agent-based data collectors that run essentially as standalone applications alongside the applications they are monitoring. As a result, the observability agents consume a not-insignificant amount of CPU and memory, which requires more infrastructure resources to support them and, thus, drives up costs.
Hence, one of the grand ironies of our times: To figure out whether their applications are consuming resources efficiently, businesses deploy observability software that actually decreases resource efficiency, while also bloating costs.
Complex Pricing Models
The pricing models of the typical observability tool are, in a word, complex. Most vendors charge in part based on how much observability data is ingested into their tools (this is usually the biggest contributor to overall cost), but they might also charge based on how many agents are deployed, how many users the tool has within the organization and which features those users want to access. The costs might also vary depending on where the data is stored, how long it is retained, how often it needs to be accessed and so on.
Complex pricing makes it hard in many cases for organizations to cost-optimize their observability spending. Determining exactly which features and usage tiers are needed can be a tough job, and companies might easily end up overspending as a result (after all, who wants to risk underspending and ending up with observability gaps that lead to critical failures?). Plus, unlike public cloud services, there are no cost-optimization tools designed to help businesses “rightsize” their observability tool deployments or architectures.
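To make the complexity concrete, here’s a back-of-the-envelope estimate of a monthly bill under a made-up pricing model. Every rate, tier and volume below is purely hypothetical, not any vendor’s actual pricing, but the shape of the math (ingest plus hosts plus seats plus retention) mirrors what many tools charge for:

```python
# Back-of-the-envelope observability bill estimator.
# All rates, tiers and volumes are hypothetical, for illustration only.

GB_INGESTED_PER_DAY = 500    # logs + metrics + traces across all apps
HOSTS_MONITORED = 120
SEATS = 25
RETENTION_DAYS = 30

RATE_PER_GB = 0.10           # $ per GB ingested
RATE_PER_HOST = 15.00        # $ per monitored host per month
RATE_PER_SEAT = 20.00        # $ per user per month
RETENTION_SURCHARGE = 0.02   # $ per GB per day retained beyond 7 days

ingest = GB_INGESTED_PER_DAY * 30 * RATE_PER_GB
hosts = HOSTS_MONITORED * RATE_PER_HOST
seats = SEATS * RATE_PER_SEAT
retention = GB_INGESTED_PER_DAY * max(RETENTION_DAYS - 7, 0) * RETENTION_SURCHARGE

total = ingest + hosts + seats + retention
print(f"ingest ${ingest:,.0f} + hosts ${hosts:,.0f} + seats ${seats:,.0f} "
      f"+ retention ${retention:,.0f} = ${total:,.0f}/month")
```

Four line items means four variables to mispredict. In this toy model, doubling the daily ingest pushes the bill up by more than 40%, which previews the unpredictability problem discussed below.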
Organizational Inertia
Historically, most observability and APM solutions were designed first and foremost for developers. To use them, developers had to instrument observability into applications. Then, it fell on the operations or DevOps team to connect the apps to APM tools and figure out what to do with the observability data they generated.
There’s nothing wrong with involving developers in the observability process. However, the pitfall of the developer-centric approach to APM and observability that organizations have tended to follow is that developers typically don’t actually know very much about what the operations team needs to observe. After all, developers build apps; they don’t monitor or troubleshoot them. The result of this is that developers design apps to generate all sorts of observability data that may or may not be useful (which is why, again, organizations end up paying for a whole bunch of data they don’t actually use).
This might not be much of an issue if it were easy for the operations team to come in and say, “Hey, we’re collecting all this data and it’s dumb. Let’s be more strategic.” But sadly, it’s not. Organizations being organizations, effecting change is not easy, especially if the change means getting people to adopt new tools or practices. So, businesses end up stuck with observability solutions that may have worked well in the past, but that are not efficient by modern standards, due simply to organizational inertia.
Cost Unpredictability
As mentioned, the problem with observability costs today is not only that they’re too damn high. It’s also that they’re too damn unpredictable.
Observability vendors don’t hide their costs, so those are easy enough to predict. But those costs, again, are based largely on how much data is ingested into the observability tool. And how many engineers do you know who can predict with any kind of accuracy how many logs and metrics their applications will generate from one day to the next?
Most can’t because data volume is tied to application demand, and that’s inherently unpredictable. After all, if we knew exactly how many requests our applications were going to receive from one moment to the next, we probably wouldn’t really need observability at all. But we don’t, and part of the core purpose of observability tools is to ensure that we can detect the issues that arise due to unexpected software usage patterns.
I am not exactly criticizing observability vendors here, but rather pointing out that their pricing models capitalize on a fundamental limitation of modern applications: The amount of observability data they’ll produce is unpredictable in most cases, which can lead to wild fluctuations in observability spending.
Keeping Observability Costs in Check
Just as there’s no one single cause of high or unpredictable observability costs, there’s no one trick that can rein the spending in. But there are several strategies that, used in unison, will bring observability costs under control:
Analyze Data at the Source
The more data is moved in order to be observed, the more it will end up costing. By the same token, the more data is analyzed at its source, the lower the cost will be.
This is why, for instance, Kubernetes observability will cost much less if the data is collected and analyzed right inside the nodes instead of being shipped out to an observability platform first. Not only will the results be faster (because there is no need to wait for the data to move), but non-critical data can be filtered out on the spot, which reduces how much data is ingested into the observability tools and cuts down on overall costs.
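Here’s a minimal sketch of the idea in Python, assuming a hypothetical ship_to_backend function that stands in for whatever exporter sends data off the node. Raw log lines get analyzed locally, and only a compact summary leaves:

```python
import json
from collections import Counter

def ship_to_backend(payload: dict) -> None:
    """Hypothetical stand-in for the exporter that sends data off the node."""
    print("shipping:", json.dumps(payload))

def summarize_node_logs(log_lines) -> None:
    """Analyze raw JSON log lines locally; ship only a compact summary."""
    errors_by_service = Counter()
    total = 0
    for line in log_lines:
        total += 1
        record = json.loads(line)
        if record.get("level") in ("ERROR", "FATAL"):
            errors_by_service[record.get("service", "unknown")] += 1
    # Only this aggregate leaves the node; the raw lines never do.
    ship_to_backend({
        "lines_seen": total,
        "errors_by_service": dict(errors_by_service),
    })

# Example: 10,000 raw lines reduce to one small payload.
sample = [json.dumps({"level": "ERROR", "service": "checkout"})] * 3 \
       + [json.dumps({"level": "INFO", "service": "checkout"})] * 9997
summarize_node_logs(sample)
```

In this toy example, 10,000 raw lines collapse into a single small payload, so the raw lines never hit the ingest meter.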
Keep Data at the Source
In a similar vein, not all data needs to leave its source at all. There might be data that needs to be kept handy in case it’s needed later but that doesn’t need to be analyzed right now. Instead of moving it into external storage, which costs more, keep it right where it originated: inside the nodes. It’s there if you need it, but you don’t pay extra to store it elsewhere.
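A sketch of what that can look like, again in Python with a hypothetical forward function: full-fidelity records land in a size-capped local spool on the node, and only critical records are sent onward:

```python
import json
import logging
from logging.handlers import RotatingFileHandler

def forward(record: dict) -> None:
    """Hypothetical stand-in for whatever ships data to the backend."""
    print("forwarded:", json.dumps(record))

# Full-fidelity data stays on the node in a size-capped local spool.
spool = logging.getLogger("local-spool")
spool.setLevel(logging.INFO)
spool.addHandler(RotatingFileHandler(
    "observability-raw.spool",       # node-local file; path is illustrative
    maxBytes=512 * 1024 * 1024,      # cap each spool file at 512 MB
    backupCount=4,                   # keep ~2.5 GB of history, then discard
))

def handle(record: dict) -> None:
    spool.info(json.dumps(record))   # everything is retained locally...
    if record.get("level") in ("ERROR", "FATAL"):
        forward(record)              # ...but only critical records leave the node

handle({"level": "INFO", "service": "checkout", "msg": "cart viewed"})
handle({"level": "ERROR", "service": "checkout", "msg": "payment timeout"})
```

The rotation cap is the design choice that matters: local retention stays bounded and essentially free, while external storage is reserved for the data that actually needs to travel.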
Add Efficiency With eBPF
I mentioned earlier the grand irony of organizations wasting infrastructure resources on the very tools meant to track resource efficiency. There’s a pretty simple solution for that: eBPF, an ultra-efficient way to collect observability data. Unlike conventional observability and APM software, eBPF data collectors run directly inside the operating system kernel instead of running as user space applications. That leads to much lower levels of CPU and memory consumption. With eBPF, you get all the observability you need without sacrificing resources in the process.
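To make this concrete, here’s a toy collector built with BCC, a real eBPF toolkit, though the specific tracing target is just an illustration. It counts execve calls per process from inside the kernel; no standalone agent polls the applications from user space. Running it requires Linux, the bcc package and root privileges:

```python
# Toy eBPF collector using BCC (https://github.com/iovisor/bcc).
# Requires Linux, the bcc package and root privileges to run.
from time import sleep
from bcc import BPF

b = BPF(text="""
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u32, u64);

int trace_execve(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *val = counts.lookup_or_try_init(&pid, &zero);
    if (val)
        (*val)++;
    return 0;
}
""")

# The counting happens inside the kernel; no user space agent polls the apps.
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")

sleep(10)
for pid, count in b["counts"].items():
    print(f"pid {pid.value} called execve {count.value} times")
```

The in-kernel program does the work; the user space script merely reads the aggregated map once in a while, which is where the resource savings come from.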
Affordable Observability
The cost of deploying software these days is going up overall, but observability costs can actually move in the opposite direction. Thanks to new observability tools and techniques, it’s possible to collect all of the data needed without allocating a tremendous portion of the overall IT budget to it.
So, say goodbye to bloated budgets, and say hello to affordable observability.