While building our SaaS company, we have invested a great deal of time in researching and deciding which tools to include in our DevOps toolkit. We’ve based these decisions on our years of experience in the IT industry, dealing with infrastructure for the most part. Because we are building a petabyte-scale data analytics infrastructure, our architecture, tools, and processes have become key components of our technology and operations. We’ve taken great care in selecting, benchmarking, and constantly improving our tool selection.
As a company that is building a solution on top of an open-source stack (ELK), our team is heavily involved in the open source community, contributing to multiple projects such as Camel and Kafka while customizing tools to fit our needs. The vast majority of tools we use internally are open-source ones. By sharing the toolset that we’ve collected and honed over time, we hope to foster a discussion within the DevOps community on what further improvements can be made.
With that, we welcome you to enjoy browsing through the following list that we’ve created. Some tools you may have known about for years, while others may be new. Of course, we invite any and all feedback, especially from those of you who are using alternatives!
Must-Have DevOps Tools
1. Nagios (& Icinga)
Infrastructure monitoring is a field that has so many solutions… from Zabbix to Nagios to dozens of other open-source tools. Despite the fact that there are now much newer kids on the block, Nagios is a veteran monitoring solution that is highly effective because of the large community of contributors who create plugins for the tool. Nagios does not include all the abilities that we had wanted around the automatic discovery of new instances and services, so we had to work around these issues with the community’s plugins. Fortunately, it wasn’t too hard, and Nagios works great.
We also looked into Icinga, which was originally created as a fork of Nagios. Its creators aim to take Nagios to the next level with new features and a modern user experience. There is a debate within the open source community about the merits of Nagios and its stepchild, but for now we are continuing to use Nagios and are satisfied with its scale and performance. The switch to newer technology, such as Icinga, may be appropriate in the future as we progress.
2. Monit
Sometimes the simplest tools are the most useful, as proven by the simple watchdog Monit. Its role is to ensure that any given process on a machine is up and running appropriately. For example, if a failure occurs in Apache, Monit will restart the Apache process. It is very easy to set up and configure, and it is especially useful for a multi-service architecture with hundreds of microservices. If you are using Monit, make sure to monitor the restarts that it executes so that you surface problems and implement solutions (rather than just restarting and ignoring the failure). You can do this by monitoring Monit’s log files and ensuring that you are alerted to every restart.
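As a rough illustration of watching Monit’s log for restarts, here is a minimal Python sketch. It assumes that restart attempts show up as lines containing the quoted service name followed by "trying to restart"; the exact wording and timestamp format depend on your Monit version and configuration, so treat the pattern, the sample lines, and the function names as assumptions to adapt.

```python
import re
from collections import Counter

# Assumed line shape (adjust to what your Monit actually writes), e.g.:
#   [UTC Jan  5 10:15:00] info     : 'apache' trying to restart
RESTART_PATTERN = re.compile(r"'(?P<service>[^']+)' trying to restart")


def count_restarts(log_lines):
    """Return a Counter mapping service name -> number of restart attempts."""
    restarts = Counter()
    for line in log_lines:
        match = RESTART_PATTERN.search(line)
        if match:
            restarts[match.group("service")] += 1
    return restarts


def services_to_alert(restarts, threshold=1):
    """Return sorted service names whose restart count reached the threshold."""
    return sorted(name for name, count in restarts.items() if count >= threshold)


if __name__ == "__main__":
    # Hypothetical sample lines, not real Monit output.
    sample = [
        "[UTC Jan  5 10:15:00] error    : 'apache' process is not running",
        "[UTC Jan  5 10:15:00] info     : 'apache' trying to restart",
        "[UTC Jan  5 10:15:01] info     : 'apache' start: /etc/init.d/apache2",
    ]
    for service in services_to_alert(count_restarts(sample)):
        print(f"ALERT: Monit restarted {service}")
```

In practice you would feed the real log file into `count_restarts` on a schedule (or ship the log to your log analytics platform and alert there); the threshold of one matches the advice to be alerted to every restart.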
3. ELK – Elasticsearch, Logstash, Kibana – via Logz.io
The ELK Stack is the most common log analytics solution in the modern IT world. It collects logs from all services, applications, networks, tools, servers, and more in an environment into a single, centralized location for processing and analysis. We use it for analytical purposes (e.g., to troubleshoot problems, monitor services, and reduce the time it takes to solve operational issues). Another use for this tool is for security and auditing (e.g., to monitor changes in security groups and changes in permissions). After receiving alerts on these issues, it is easy to act on unauthorized users and activities. We also use ELK for business intelligence, such as monitoring our users and their behavior. You can set up your own ELK or buy it as-a-service. We’ve written a guide for the community on using ELK to monitor your application performance.
Disclaimer: Logz.io is our ELK-as-a-service that we use in our own environment. You can say that we eat our own dog food.
4. Consul.io
Consul is a great fit for service discovery and configuration in modern, elastic applications that are built from microservices. The open-source tool makes use of the latest technology in providing internal DNS names for services. It acts as a kind of broker to help you sign and register names, enabling you to access service names instead of specific machines. If, for example, you have a cluster of multiple machines, you can simply register them as a single entity under Consul and access the cluster easily. We praise this tool for its efficiency, although we still feel there is more that can be done with it. If you also use it, it would be great to hear about your own use case.
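To make the idea of registering several machines under one service name more concrete, here is a small Python sketch. It only builds the JSON bodies that would be sent with an HTTP PUT to a Consul agent’s `/v1/agent/service/register` endpoint, and the DNS name the service would then answer under; it does not contact an agent. The service name `indexer`, the addresses, the port, and the helper names are made up for illustration.

```python
import json


def registration_payload(name, address, port, service_id=None):
    """Build the JSON body for registering one service instance with a Consul agent."""
    return {
        "Name": name,  # shared by every instance of the service
        "ID": service_id or f"{name}-{address}-{port}",  # unique per instance
        "Address": address,
        "Port": port,
    }


def dns_name(name, domain="consul"):
    """Name under which Consul's DNS interface resolves the service."""
    return f"{name}.service.{domain}"


if __name__ == "__main__":
    # Hypothetical cluster of three machines exposing the same service.
    cluster = ["10.0.0.5", "10.0.0.6", "10.0.0.7"]
    for address in cluster:
        body = registration_payload("indexer", address, 9200)
        # Each body would be PUT to the local agent's /v1/agent/service/register.
        print(json.dumps(body))
    print("Clients then look up:", dns_name("indexer"))
```

All three instances share the `Name` field, which is why clients can address the cluster by one name instead of tracking individual machines.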
5. Jenkins
Everyone knows Jenkins, right? It’s not the fastest or the fanciest, but it’s really easy to start to use and it has a great ecosystem of plugins and add-ons. It is also optimized for easy customization. We have configured Jenkins to build code, create Docker containers (see the next item), run tons of tests, and push to staging/production. It’s a great tool, but there are some issues regarding scaling and performance (which isn’t so unusual). We’ve explored other cool solutions such as Travis and CircleCI, which are both hosted solutions that don’t require any maintenance on our side. For now, however, since we’ve already invested in Jenkins, we’ll continue with it.
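The build, containerize, test, and push flow described above boils down to running stages in order and stopping at the first failure. The Python sketch below shows only that control flow; it is not Jenkins configuration, and the stage names and placeholder lambdas are hypothetical stand-ins for real jobs.

```python
def run_pipeline(stages):
    """Run (name, action) stages in order; stop at the first action that returns False.

    Returns the list of stage names that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical stages; each lambda stands in for a real build step.
    stages = [
        ("build code", lambda: True),
        ("create Docker image", lambda: True),
        ("run tests", lambda: False),  # simulate a failing test run
        ("push to staging", lambda: True),
    ]
    print("Completed stages:", run_pipeline(stages))
```

Because the loop breaks on the first failure, a failing test stage prevents anything from being pushed to staging or production.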
6. Docker
Everything that can be said about how Docker is transforming IT environments has already been said. It’s great, life-changing even (although we’re still experiencing some challenges with it). We use Docker in production for most services. It eases configuration management, control issues, and scaling by allowing containers to be moved from one place to another.
We have developed our SaaS solution with a twelve-layer pipeline of data processing. Together with Jenkins and Docker, we have been able to run a full pipeline across all layers on a single Mac. It would be wrong to say that there aren’t any complications with Docker, as even small containers can take a significant amount of time to build. However, we want to ensure that our developers are as satisfied as possible and enable them to work rapidly. With all of the management involved in storage, security, networking — and everything surrounding containers — this can be a challenge.
We see Docker progressing and look forward to welcoming the company’s new management and orchestration solutions. For those who might be having issues with Docker, we’ve also compiled a list of challenges and solutions when migrating to Docker.
7. Ansible
Again, simplicity is key. Ansible is a configuration management tool that is similar to Puppet and Chef. Personally, we found those two to have more overhead and complexity than our use case needed, so we decided to go with Ansible instead. We know that Puppet and Chef probably have a richer feature set, but simplicity was our desired KPI here. We see some tradeoffs between configuration management using Ansible and the option to simply kill and spin up new application instances using a Docker container. With Docker, we almost never upgrade machines but opt to spin up new machines instead, which reduces the need to upgrade our EC2 cloud instances. Ansible is used mostly for deployment configuration. We use it to push changes and reconfigure newly deployed machines. In addition, its ecosystem is great, with an easy option to write custom applications.
8. Collectd/Collectl
Collectd/l are nifty little tools that gather and store statistics about the system on which they run and are much more flexible than other tools. They allow users to measure the values of multiple system metrics and unlike other log collection tools that are designed to measure specific system parameters, Collectd/l can monitor different parameters in parallel. We use these two tools to measure customer performance parameters and ship them to our ELK-as-a-Service platform. We’ve specifically wrapped a Collectl agent in a Docker container and push it with Ansible to all of our servers. It collects information every couple of seconds and then ships it to ELK to allow us to run reports and send alerts. If you’d like to see a specific example of how we do this process in our environment and how others can do the same, we’ve created a guide for everyone.
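The collect-every-few-seconds-and-ship loop can be pictured with a short Python sketch. This is a stand-in for the pattern, not Collectl or Collectd themselves: the `collect` helper, the metric names, and the use of the system load average are all assumptions chosen to keep the example self-contained.

```python
import json
import os
import time


def collect(read_metrics, count, interval_seconds=2.0, clock=time.time, sleep=time.sleep):
    """Call read_metrics() `count` times, pausing between calls.

    Returns one JSON document per sample, ready to ship to a log pipeline.
    """
    lines = []
    for i in range(count):
        doc = {"timestamp": clock(), "metrics": read_metrics()}
        lines.append(json.dumps(doc, sort_keys=True))
        if i < count - 1:
            sleep(interval_seconds)
    return lines


def read_load():
    """Sample the system load average (Unix-only); return {} where unavailable."""
    try:
        one, five, fifteen = os.getloadavg()
    except (AttributeError, OSError):
        return {}
    return {"load1": one, "load5": five, "load15": fifteen}


if __name__ == "__main__":
    # Short interval so the demo finishes quickly; printing stands in for shipping.
    for line in collect(read_load, count=2, interval_seconds=0.1):
        print(line)
```

Emitting each sample as a timestamped JSON document is one simple way to make metrics easy to index, report on, and alert on downstream.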
9. Git (GitHub)
Git was created 10 years ago following the Linux community’s need for SCM (Source Control Management) software that could support distributed systems. Git is probably the most common source management tool available today. After running Git internally for a short period of time, we realized that GitHub suited us better. In addition to its great forking and pull request features, GitHub also has plugins that can connect with Jenkins to facilitate integration and deployment. I assume that mentioning Git to modern IT teams is not breaking news, but I decided to add it to the list due to its great value to us.
Other picks?
The modern DevOps world is full of outstanding and unique open source tools—it’s a jungle out there. We found the tools listed here to be the best in breed and think they should be included in every DevOps engineer’s shortlist.
Or am I wrong? What open source DevOps tools are in your toolkit? I’d love to hear your own recommendations and experiences in the comments below.

Nagios? No, just no. We have got collective Stockholm syndrome with that steaming pile of crap
What do you guys use for alerts/monitoring?
Shinken, which is also rubbish. Sensu has the principles so far in my opinion
Nagios is a great alerting tool because of what it doesn’t do. If you are trying to combine your alerting and diagnostic tools, that’s when all solutions stink. Nagios + Graphite for the winning combo.
Finally, an alerting and diagnostic solution that does not stink has arrived: Prometheus + Grafana
The big question I have is are you using any of the openstack capabilities of collectl? For example you can monitor VMs and see cpu/net/disk usage by instance ID. I’ve also written a tool to allow collectl to read swift stats by grabbing statsd metrics and writing them to a file that looks like /proc and which collectl can then read. The beauty about this mechanism is anyone can access that file whereas normally one requires exclusive access to statsd.
And of course if you use colmux, you can monitor multiple machines from a single terminal window AND sort the output by any column.
-mark seger
We’re running on AWS (at least for now..) so we’re not using any openstack capabilities of collectl. Didn’t hear of Colmux before but we’ll definitely have a look.
Tomer.
So no real config management tool ? I mean, Ansible is great for deployment, but Devops is about Automation + Measurement + Sharing right ? You should have a look on what Normation is doing with Rudder. That’s Puppet for 2015 people.
Cool. Haven’t heard of it before. We’re looking for something that will sit on top of Ansible/kubernetes and do all the production/staging integration. It needs to be open-source and (preferably) also offered as SaaS. I’ll definitely check out Rudder.
Do check for Spring as well.
I would think that someone would mention IBM UrbanCode Deploy and Release, which I think would replace a good portion of the tools listed here.
What about Maven! It is an awesome complement to Jenkins. I think without Maven there would be a lack of functionality in Jenkins/Hudson/CloudBees
How about chef and puppet?
Great list…but too short.
Check the #100Days100DevOpsTools campaign’s curated list: http://theremotelab.io/blog/updates_about_devops_world/
Gradle
There are so many build tools, and if your platform has first-class support for one I think that’s usually your best option. I think Gradle is pretty specific to what platform your developers are using. So with that in mind I don’t think it belongs on this list.
Elixir has mix, node has npm, go has… go, etc
Check out containership.io: deploy, manage, and scale containerized apps on any cloud.
An interesting miss would be JFrog Artifactory universal repository manager (Free trial jfrog.com/artifactory/free-trial/) which also serves as a private docker registry (https://www.jfrog.com/confluence/display/RTF/Docker+Registry)
Open Source Tools are great and support Rapid Delivery and DevOps. One missing tool in the DevOps toolkit is a decent APM solution. There are no easy-to-use, intuitive open source APM tools that provide the system and business insights so much needed for DevOps. So AppDynamics, New Relic or DynaTrace is a MUST for any DevOps shop.
Great. DevOps brings development and operations teams together. DevOps tools help in managing servers, and they also help organisations that provide DevOps services.
Hello Tomer,
Nice list – most common tools.
I wanted to introduce you to ProductionMap (www.productionmap.com) which is an open source visual IDE for DevOps to develop automation
How about below list ?
Openstack
Splunk
Sensu
Redis
Docker
Nginx
Saltstack
Jenkins
splunk is open source?
Redis isn’t a devops tool. Openstack, sensu, and splunk are platforms. Docker and Jenkins are already mentioned.
Vagrant so you can stand up your lab/staging/production environments on your dev machine, test out what you’ve just changed, then push it.
Hookdoo: https://www.hookdoo.com
You should separate the tools that are agents offering a run-time service from the tools that the DevOps programmer uses. For example, Jenkins and Ansible are tools that the DevOps programmer uses, but Consul.io offers a service at run time.
Hi, are all these tools used in DevOps?
Regards,
Disha, DevOps Geek
Nagios is old, and there are much better solutions in modern day. Monit is limited, and isn’t nearly as useful as it makes itself out to be. Nagios can be powerful, but it’s clunky and difficult to manage, especially at scale. OOB Nagios doesn’t do much beyond alerting and tracking the alert state. You still have to set up a graphing interface. So not only do you have the DB to back it, but Nagios running in front of that, and then getting the proper output from Nagios into your graphing systems. It turns into an operations nightmare, because you end up having several decoupled components that you need to scale and manage. IMO you’re better off just cutting the middle man out; in this case Nagios is the middle man.
I’ve found that InfluxDB with Grafana works really well. You can pair that with collectd, telegraf, statsd, etc to get data into influx, and influxDB supports horizontal scaling in a very modern way with a replication factor so no need to worry about managing partitioning and replication.
As for monit, most init systems provide a way to watchdog a process, and if they don’t there are already a million and one ways this can be done with the wheels we got. Sure monit can launch a script to react to a failure based on metrics, but this is a sloppy way of doing things. You should already have things like log rotation or log aggregation (like ELK, which is mentioned) to prevent things like logs overrunning your disks. If your servers are OOM or high CPU I don’t know if I trust a script to just start killing shit, and even then the kernel OOM killer can be configured to know which services it should avoid killing. Trying to react to these issues with scripts is just a terribly sloppy way to handle things. If you’re running out of disk space for something other than logs you should fix the bug or get bigger disks. Any other issues you can configure away without the need for some kind of over the top watchdog. Having a script try to outsmart the OOM killer, just rimrafing all the logs, or doing some Hail Mary attempt to automatically “fix” your issue is just a long time spent setting up a device that shoots you in the foot.
As for these other tools, a lot of these are great. Jenkins is a great platform; consul as well as etcd are amazing tools for a multitude of reasons in a clustered environment; Ansible is a great tool since it relies on SSH but can be configured to pull if you have too many servers to handle that, and having the ability to initially use SSH to set up pulling on a schedule is very useful; ELK is incredibly useful for logs, as Elasticsearch has very powerful text analysis for digging through unstructured log data; collectd is a staple tool for all things systems related. Overall these are good tools, but I can’t help but feel that Nagios is past its prime and no longer suits modern needs as well, and Monit has always just been useless.
And here is the Top 10 Automation Testing Tools for 2018 : https://medium.com/@briananderson2209/best-automation-testing-tools-for-2018-top-10-reviews-8a4a19f664d2
I am biased (I work on the product itself at Microsoft) but I hope you would consider Visual Studio Team Services (VSTS), our DevOps cloud hosted services. We’ve got hosted Git, Build (cloud hosted, Windows, Linux and Mac), Release, CI/CD pipelines, Agile, Kanban, Test, Package Management and a ton of extensions in our marketplace for various 3rd party products and OSS. Just really wish more people would know about our service. The best part, a good amount of the capabilities are free and developers can use which parts they need. Thanks for your consideration folks!
Are there any good tutorials that connect all of them rather than just listing the names? I know it depends, but a good example would be really useful and helpful. so just google it.
Can you help me with application used to monitor and to see performance of apps? Thanks in advance.