Log Analysis, or Log Clog

By: Chris Riley on January 20, 2015

Log analysis has the ability to transform teams. It centralizes communication, increases responsiveness, moves efforts from reactive to proactive, reduces time to resolution (TTR), and allows increasingly complex environments to grow without fear. But if log analysis becomes the bottleneck, it reverses all these benefits.

*** Author note: The credibility of this post has come into question due to my relationship with two of the four vendors mentioned in this post. My company, Fixate IO, has had a financial relationship with these vendors. However, the position and opinions are all my own.

As with all the new DevOps tooling, you will find that it sometimes creates more problems than it solves. Chef and Puppet encourage server sprawl through rapid, unmanaged provisioning. Selenium decreases functional oversight when script coverage is not the same as a human’s. And log analysis, when limited or poorly fed, can overlay issues across the entire environment. The good news is that with the right tool and implementation, these things can be avoided altogether.

First, the problem. Organizations that have implemented log analysis rely on it deeply, so if anything goes wrong with the platform it can hamper the entire delivery pipeline. The reason is that it is connected to nearly every component of your environment, or it should be; that is why logs are so valuable. It is also sticky: there is a high amount of lock-in when you choose a log analysis platform. Normally lock-in is bad, but in DevOps analytics is a must, and it needs a consistent format and “language”. Finally, it has two potential bottlenecks: throughput and storage limits.

I’m assuming that if you have implemented log analysis, you did it correctly. That means you were deliberate: before implementing, you asked questions and documented expected results; you designed or evaluated your logs beforehand to know how you would tag and query them; and you evangelized it rather than hoarding it for operations only. If you did not do these things and you already have a system up and running, you have far greater problems.

Even when you have a great log analysis platform implementation, a few things can still go wrong. They are:

1.) Not enough throughput: Not being able to get the logs in when they are created is a huge bottleneck. It also kills the value of response time. If you don’t get the logs stored in near real time, you can’t analyze them, and thus you can’t react fast enough to interesting insights. Because throughput is an additional expense, this can also lead companies to limit what they transfer, forfeiting the benefit of an integrated environment and forcing a selection bias on the logs they do store. You should always be motivated to store more, provided what you store has been architected to answer real questions.

2.) Storage limits: Similarly, if you are limited in how long you can store logs, you lose the value of historical analysis. Historical analysis is useful not only for reporting at your next Quarterly Business Review (QBR), where you will be questioned on the root cause and impact of a major outage. Historical data can also show you where the weak spots in your delivery pipeline are, so you can improve them as your developers improve their code.

3.) And finally, they can turn your team into zombies. If you do not properly design your queries and the logs you are querying, your team will not only waste time trying to get answers; they will become addicted to log deep dives. It’s kinda fun, just like Reddit and Facebook during the day. Oh, and they will start creating their own reports, so that at the end of the year you have so many saved reports you do not know what is good, what is bad, what to save, and what to get rid of. And eventually you blame the platform.
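The first two items come down to arithmetic that is worth doing before you look at pricing tiers. Here is a minimal sketch in Python, assuming made-up sources, event rates, and line sizes (none of these numbers come from a real environment or vendor). It estimates the sustained ingest rate you need and the storage a given retention window implies.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class LogSource:
    name: str
    events_per_second: int  # assumed average event rate
    avg_event_bytes: int    # assumed average size of one log line


def ingest_bytes_per_second(sources):
    """Sustained throughput needed to ship every source as it is written."""
    return sum(s.events_per_second * s.avg_event_bytes for s in sources)


def retention_bytes(sources, retention_days):
    """Raw (uncompressed) storage needed to keep everything for the window."""
    return ingest_bytes_per_second(sources) * SECONDS_PER_DAY * retention_days


# Hypothetical environment, for illustration only.
sources = [
    LogSource("web", events_per_second=200, avg_event_bytes=500),
    LogSource("app", events_per_second=50, avg_event_bytes=1_000),
    LogSource("db", events_per_second=10, avg_event_bytes=2_000),
]

rate = ingest_bytes_per_second(sources)
ninety_days = retention_bytes(sources, retention_days=90)

print(f"ingest: {rate / 1e3:.0f} KB/s")
print(f"90-day retention: {ninety_days / 1e12:.2f} TB")
```

The sketch uses averages, so treat the result as a floor. Real traffic is bursty, and an outage is exactly when log volume spikes.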

This list can double as criteria for choosing a log vendor if you do not yet have a log platform or are thinking of changing. Some vendors make the problem worse with pricing models that foster deep dives and encourage the log dumping ground. Some of the market leaders are the most guilty here, along with the open source tools. Both also encourage log analysis to be an operations-only tool, which is completely against DevOps.

But there is good news. One is that with a little effort, no matter who the vendor is, you can fix it yourself. Don’t expect the vendor to make DevOps or analytics successful for you. They can’t, and they won’t. Put in a little effort, damn it!

The other is that the more modern, purpose-built platforms have seen these issues and often built around them: solutions like Loggly, LogEntries, NewRelic Insights, and Sumo Logic.

In particular, LogEntries, a tool I am familiar with, has made a specific effort to start with analysis and insights first, to steer clear of the query-first mentality. And if you do get to querying, they offer a regular-expression-based language, a standard language, not one you have to go to special proprietary lock-in school for. They also allow for higher storage limits by letting you externalize logs to long-term AWS storage. And finally, they have a program that allows even free-tier users to get more throughput via Dropbox-style share-with-a-friend referrals, which shows their understanding of how important throughput is.
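To make the regular expression point concrete, here is a minimal sketch using Python’s standard `re` module. It is not LogEntries’ query syntax; the log line format and the field names (`ts`, `level`, `service`, `status`, `duration_ms`) are invented for illustration. The point is only that a plain regex with named groups is a skill that transfers across tools.

```python
import re

# Hypothetical line format: "<timestamp> <LEVEL> <service> status=<code> duration_ms=<n>"
LINE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<service>[\w-]+) "
    r"status=(?P<status>\d{3}) duration_ms=(?P<duration_ms>\d+)$"
)


def parse(line):
    """Return the named fields of a log line as a dict, or None if it does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    fields["duration_ms"] = int(fields["duration_ms"])
    return fields


sample = "2015-01-20T13:17:00Z ERROR checkout status=500 duration_ms=1840"
record = parse(sample)
print(record)
```

Lines that do not fit the pattern return `None` rather than raising, so unparsed lines can be counted instead of silently dropped.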

If you have already implemented an older log analysis tool that encourages collection, not analysis, you may find that it will serve you better in other ways, and you can consider a move to a more modern approach.

But forget the tool. The biggest problem is you, and your inability to think beyond the now. It is the future that is going to burn you. At that point you can cut bait, leave for a new company that is hopefully doing the right thing, and avoid the cleanup. Or you can start slipstreaming the better practices now. They include:

1.) Run a pre-mortem, or a post-mortem if you are already set up.
2.) Invest in bandwidth and storage, and do not limit yourself in either. And stop log favoritism.
3.) Start sharing reports. Yes, this can create its own set of problems too, but they are all solvable.
4.) Think about what you want from a set of logs before you start storing them.
5.) Do not trust the vendor to help your processes or culture, only the execution of their tool.
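One way to practice “think about what you want from a set of logs before you start storing them” is to write the questions down next to the log sources each one needs, then compare that against what you actually collect. The sketch below is a hypothetical illustration in Python; the questions and source names are made up.

```python
# Questions the team wants answered, mapped to the log sources each one needs.
# All names here are invented for illustration.
questions = {
    "Which deploy caused the error spike?": {"deploy", "app"},
    "Is checkout latency trending up?": {"app", "lb"},
}

# Log sources currently being shipped to the platform.
collected = {"app", "lb", "debug-dump"}


def audit(questions, collected):
    """Return (questions missing sources, collected sources no question uses)."""
    needed = set().union(*questions.values())
    unanswerable = {
        q: sorted(srcs - collected)
        for q, srcs in questions.items()
        if srcs - collected
    }
    unused = sorted(collected - needed)
    return unanswerable, unused


unanswerable, unused = audit(questions, collected)
print("Questions missing logs:", unanswerable)
print("Logs with no question:", unused)
```

Anything in the second list is a candidate for the log dumping ground. Anything in the first is a gap to close before the next outage, not after it.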

It is up to you to pick the right tool AND implement it the right way. You are capable of building out a great logging system that does not become the bottleneck and does not give your software delivery pipeline some strange degenerative disease.

Chris Riley


Comments

Jakk says

January 20, 2015 at 1:17 pm

really dude, your credibility just went out the window – company A pays you to speak favorably on their behalf while you dog the competition of company B while you collect cold cash from company A:

Founder & DevOps Analyst
Fixate IO

July 2014 – Present (7 months)|Livermore, CA

“We are Tech Evangelism as a Service. The Developer Tools market is fragmented and complex. Peer-to-Peer recommendations are king. We help ISVs navigate the market with a content strategy executed by an army of DevOps evangelist.”

Chris Riley says

January 20, 2015 at 7:17 pm

Jakk,

Thank you for your comment and very valid concern. I do have a vested interest in some of the companies in the DevOps space, and more than one of the companies listed above have paid my company at some point in time to produce content for their blogs. But more importantly, having seen common trends and tested a plethora of tools, I have personal preferences.

I would be happy to talk to you about it offline and even outline where and what I contribute to. Fixate (like all analyst firms) makes money from someone, generally from selling reports, getting sponsors for those reports, and producing content for ISVs. It is up to the discerning audience what value that content provides. I’ve been met with similar reactions to yours, and from many Gartner quadrants :D.

Nonetheless, what you are really saying to me is that the point was lost. The point I’m trying to make is that the customer’s approach to implementing the tool and not allowing throughput and storage is often the reason that log analysis fails. I know that it’s easy to get hung up on examples, but they only serve to illustrate a greater point. If I didn’t use real world examples then I would not be concrete about what I’m preaching, which is this: don’t be tool-focused and be deliberate.

I do take your comment seriously and hope that my central point won’t be lost amongst the examples I use going forward.

Jim says

January 20, 2015 at 9:23 pm

I’ll just leave this here:

http://en.wikipedia.org/wiki/Journalistic_objectivity

- Chris Riley says
  
  January 21, 2015 at 12:02 am
  
  Thanks Jim. And I stand by the fact that my review of poor log analysis implementations is objective. I have been an implementer and tester of nearly all the log analysis tools, a heavy user at my previous IaaS employer, and a consultant for companies that have utilized them. Leveraging examples is a very common way to illustrate your point. But as with Jakk, it seems that the examples became the sole focus for the reader.
  
Jim says

January 21, 2015 at 9:34 am

Jakk is only partly right. Readers expect staging-devopsy.kinsta.cloud and other sites that cover a general area to have impartiality or fairness in their content. Not only *was* your content definitely slanted against Splunk, since you called it “grandpa” (now edited out) and contrasted it with “modern platforms”, you actually link to content from the competitor you’ve been paid by. If you’re going to be paid to promote a vendor, don’t do it on an impartial site like staging-devopsy.kinsta.cloud. If you’re going to do things like “head to head” comparisons, as The Verge, Engadget, and CNET do, that’s understandable.

Because you did the same thing over here: https://staging-devopsy.kinsta.cloud/blogs/log-analysis-log-hoarding/

You have done a great service to Logentries, and a huge disservice to journalism and the readers of staging-devopsy.kinsta.cloud. When you’re on a site like this, you live and die by the objective journalism that lives far longer than the stuff you’re paid for by a vendor. If I were staging-devopsy.kinsta.cloud, I’d be pissed about some of your posts, as they cast a negative light on this website. Are you Jayson Blair? Not yet, but you’re heading in that direction.

Moreover, it appears you’ve edited this content (taken out phrases like “Splunk is grandpa”). A proper journalist would have used strikethrough styling so the reader is aware of what the previous edits were and why these comments are being written.

- Chris Riley says
  
  January 21, 2015 at 10:28 am
  
  Jim,
  
  I hope I’m not starting to sound like a broken record, but I will say it again. I use examples. When I use examples, they come from experience. This is a selection bias (http://en.wikipedia.org/wiki/Selection_bias), and while I’m sure there are people immune to selection bias, I am not one. Nor have I met one. You and Jakk are focusing on what I say about tool examples, not what I say about implementation.
  
  The post is not about the examples. It’s not about tools at all. It is about poor implementation of log analysis. My charter in DevOps is to turn focus on the right way to do things, and not expect the tool to do it. And I’m sure in the future I can do a better job turning the focus on my core point. You and Jakk might have a preference for one of the vendors in the space, and that is ok. And all vendors, friends of vendors, and their resellers have an opportunity to make a case for their product in an open forum. If that is what you are doing, I would hope you would do it more directly.
  
  Log analysis is not an enigma. And it should not be difficult to look beyond what the tool is, and pay attention to the ideas around better implementation. Which is what I talked about and I stand by.
  
  Splunk: There is no strikethrough option. And this change was the result of not properly giving the vendor the chance to respond before the blog was posted. If you have a suggestion for a followup post that is a round table between myself and them, that would be interesting.
  
  - Jim says
    
    January 21, 2015 at 2:37 pm
    
    Chris, I’m sorry you are sounding like a broken record. If readers are not focusing on what you want them to, pull the post; rewrite it; rethink it. I’m not endorsing or demoting a product, rather criticizing a lack of journalistic integrity and, as a result, calling into question the integrity of the entire website staging-devopsy.kinsta.cloud. Let me stress again: these were the words you wrote in your post.
    
    If this website has a content management system that allows you to put HTML or style code in your post, you might want to investigate the “strike” or “del” tags available in HTML<5 and 5+.
    
    http://www.w3schools.com/tags/tag_strike.asp
    
    Regarding your post, I disagree with #5. The vendor may have expertise and experience in log analysis/management that can be useful to me. Sometimes a vendor can be a catalyst or a guide to help organizations change things at times. Finally, proofread your posts. Sentences shouldn't begin with the word "And" or "Then". Try to make the written word different than the spoken word.
    
    - Chris Riley says
      
      January 21, 2015 at 3:29 pm
      
      Questioning my writing style could be a never ending conversation. I am no Shakespeare.
      
      My post in fact goes against the ideas that all the listed vendors have. Your point below. Which is that tools are not first, they are not DevOps, and vendors do not understand your operation. And I stand by my experience and knowledge there.
      
      Thank you for having something worth discussing. In my experience I have never seen a vendor with the proper expertise to assist in implementation beyond their feature set. This is not bad, how could they? They possess only a small scope of the entire tooling market. This is what my soon to be published O’Reilly book discusses.
      
      Also, it’s never just one tool; it is the interplay with all other people and processes. The integrated environment DevOps asks for. This held true when I was a part of the document imaging, ECM, and SharePoint markets. Any enterprise software deployment, essentially. And it is the primary reason why SIs are doing so well. Here is why I think this is. Often neither operations nor the vendor has full visibility into developer operations. And operations have too many production-related tasks on their plate as well. Thus they often do not consider all the moving parts. So it’s easy to miss that you might be creating a greater problem than you are solving. Is your experience different than mine? 30% of my time is spent consulting for enterprises moving to DevOps from waterfall, or very fast waterfall, which some call Agile. And I lead with metrics, which is system logging most of the time. We do a pre-mortem. And out of this you quickly realize that without planning your queries (questions) in advance, you can anticipate there will be a lot more time spent searching for information than using it. I spend most of my effort on people, not tools. People, then process.
      
John says

January 23, 2015 at 9:48 am

Chris,

You should probably exclude yourself from these types of blog posts – considering your blog posts, etc., have been paid for by Logentries. (https://blog.logentries.com/author/chris-riley/)

He is actually an on-going paid contract for the Marketing team at LE.

- Chris Riley says
  
  January 23, 2015 at 9:01 pm
  
  John,
  
  Thank you. But this post was not paid for by LogEntries. Nor have any of my previous posts been. And you are not aware of what my company’s working relationships are. So I’m not sure how you can make that assumption. I have public profile pages on 20+ blogs dating back to 2010. I have created content for the following companies: Applitools, MongoDB, RayGun, Loggly, LogEntries. I am also a regular contributor to O’Reilly and Gigaom. My co-workers have contributed to a number of others, including competing products, and there will be more. The content we create for customers is always published on their blogs. And any content published on the 5 public forums I contribute to is not paid for, and is my own opinion.
  
  With that said, I do have a bias, as do you for your favorite products and Schlumberger Limited’s products that they use. But this does not change the key point of this article, which is log analysis implementation. And someone’s opinion about my bias will not change my content, which is honest, from experience, and I stand by it.
  
  I think staging-devopsy.kinsta.cloud would be better served if you attacked my points, not how I make money or who my affiliations are with. Then we can have a conversation about what Log Analysis means for DevOps.
  

Trackbacks

6 Reasons why Splunk might be bad for you | Chris Riley says:

January 21, 2015 at 5:12 pm

[…] to the controversy around this recent post on staging-devopsy.kinsta.cloud where I originally mentioned Splunk. I thought it would be appropriate to write a […]
