I’m a bad programmer.
There, I got that out of the way.
The truth is, I’m a sysadmin who writes code infrequently, on an island, and outside of any disciplined requirements, release, or QA practices. I need to run an ETL that kicks a file transfer. I’ll write some code. I’ve got a workflow that needs a crappy but functional web UI. Now I’m a developer. But in between these kinds of projects are days, and sometimes weeks, of meetings and planning and dozens of things that are not code, which means by the time I come back to actual programming, I’m starting fresh every time.
I don’t think I’m alone. I think there are a lot of sysadmins in the world with functional Perl chops, but because they don’t code daily, they don’t approach problems with an inherently code-focused mindset. Systems are still running, the internet is still on, so this way of solving problems must be OK. It is, but there are drawbacks.
Rolling changes out to 4 dozen of your 8 dozen systems and verifying each received a 4 line change to that logship script with scp? Good luck with that. The old SA just left, and now you need to find all his stuff? GLHF. Some report hasn’t been sending for 4 days, what changed?! Yeah… yeah.
Where software development got it right
Great systems need three things: consistent configuration, maintainability, and auditability. Good dev & deploy practices get you all three. Version control systems like git provide a great mechanism to manage code, version it, and collaborate in reliable and predictable ways. Issue trackers like Jira or Rally that integrate with version control let you easily tie specific changes back to the original reason for the change. Deployment systems like Jenkins and Capistrano give you one-stop interfaces to track what got deployed where, when, and by whom.
Take it as given that few SA teams are using all of these approaches for their work today. Probably you’ve got a ticket system, and you definitely get yelled at if you don’t comment your scripts. Maybe those comments link back to the work order number? Migrating what you do today to a more DevOps focused approach isn’t going to happen overnight. So how do you start?
Version control or GTFO – Yeah. I know. It’s on the list! No. Stop what you are doing, and gather all of your nginx.conf’s, your limits.conf, pg_hba.conf, and commit them to a git repo. I can’t emphasize this enough – if you are not enforcing version control in your systems administration team you’ll reap the whirlwind.
Change, Commit, Test. Welcome to the new DevOps discipline. You’re going to hate it at first, but 7 months from now when the webservers are flopping because somebody changed something, you’ll be grateful for a quick and reliable rollback.
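As a starting point for the "gather and commit" step, here is a minimal Python sketch. It copies a list of config files into a git working tree, mirroring each file's absolute path, and then commits. The nginx.conf and limits.conf paths are typical defaults rather than anything stated in this post, and the function names `snapshot_configs` and `commit_all` are hypothetical; adjust all of them for your own hosts.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed locations: typical defaults for nginx and limits.conf, plus the
# pg_hba.conf path used later in this thread. Edit to match your systems.
CONFIG_FILES = [
    "/etc/nginx/nginx.conf",
    "/etc/security/limits.conf",
    "/data/pgdata/pg_hba.conf",
]


def snapshot_configs(paths, repo_dir):
    """Copy each existing file into repo_dir, mirroring its absolute path."""
    repo_dir = Path(repo_dir)
    copied = []
    for raw in paths:
        src = Path(raw).resolve()
        if not src.is_file():
            continue  # this host does not have that file; skip it
        # /etc/nginx/nginx.conf -> <repo_dir>/etc/nginx/nginx.conf
        dest = repo_dir / src.relative_to(src.anchor)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


def commit_all(repo_dir, message):
    """Stage and commit everything in an existing git working tree.

    `git commit` exits non-zero when there is nothing to commit, so
    check=True raises CalledProcessError in that case.
    """
    subprocess.run(["git", "add", "--all"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

Usage would look something like `snapshot_configs(CONFIG_FILES, "/srv/config-repo")` followed by `commit_all("/srv/config-repo", "Initial import of configs")`, where `/srv/config-repo` is a made-up path to a repo you have already created with `git init`.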
Puppet or Chef or Ansible – get a basic setup. One master, three nodes. My starting point here was user management: I set up a basic user-add class, and let it manage ssh keys. Let that soak for a couple weeks and slowly roll it out to the rest of your infrastructure. Ten bucks says you find a dozen hosts with inconsistent UIDs that you’ve been meaning to fix for years.
Once that feels comfortable, move some basic scripts into your setup, and roll those out slowly. Take the db configurations, commit them to the repo, and move ’em over to configuration management too. The point is, moving to configuration management systems is not an atomic event across your entire infrastructure. Iterate against it. Make small weekly changes that roll up to quarterly progress.
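If you want to find those inconsistent UIDs before you start, the check is small. This is a sketch only: it assumes you have already pulled the contents of /etc/passwd from each host into a dict keyed by hostname (how you collect them is up to you), and the function names are hypothetical.

```python
from collections import defaultdict


def parse_passwd(text):
    """Return {username: uid} from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed lines and NIS-style "+" entries
        users[fields[0]] = int(fields[2])
    return users


def inconsistent_uids(passwd_by_host):
    """Map each username to {uid: [hosts]} when it has more than one UID."""
    seen = defaultdict(lambda: defaultdict(list))
    for host, text in passwd_by_host.items():
        for user, uid in parse_passwd(text).items():
            seen[user][uid].append(host)
    return {
        user: {uid: sorted(hosts) for uid, hosts in uids.items()}
        for user, uids in seen.items()
        if len(uids) > 1
    }
```

Anything this returns is a user your user-management class will need to reconcile (and whose files may need a chown) before you let the class loose on those hosts.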
Here’s where the rubber meets the road:
“Git is going to suck because deploying is so much easier when you just vim a file on a server” – Every Sysadmin Ever
If you’ve got a repo, and you’ve got say, puppet… it’s a pretty easy step to have commits to that repo pulled and auto-deployed to your puppet managed nodes. Which means rollbacks can be too.
Pay off your tech debt – I don’t have to tell you about the 4 dozen scripts you’ve got out there, unmanaged, brittle, and needing constant tweaks to keep them lined up with system changes. These are the bread and butter of system administrators, and can provide a great opportunity to implement change. Next time you have to touch that script for a new SCP server, stop. Stop! Do this instead:
- check it in to your git repo, with some meaningful comments about what it does
- add it to your configuration management system (puppet/chef/ansible)
- if it’s bash or perl, rewrite it as python. It’s going to take a bit more time to rewrite all your scripts, but spending this time gets you more comfortable writing code regularly, and gets you more familiar with a modern scripting language.
Walk the path
Whatever your system or your goals, the hardest part of adopting DevOps approaches is going to come down to time. Reading through this myself, I’m struck by what an insensitive jerk I am. “Hey guys I know you totally have tons of free time, just start doing everything you regularly do but add a ton of extra work too. Kthxbai”. The reality is, I probably am a jerk, but you don’t have to think about moving to DevOps as a shining city on a hill that you can reach with enough late nights. It’s a journey, a path, a way, not a place to arrive.
Find a way to start making small changes to your existing processes that line up with the best of the development world. Decide to add 10% more time to a proposed project to allow for it to be handled with these methods. If you begin to turn your steps onto the DevOps path, you’ll quickly realize you’ve already arrived at your destination.

I am having a hard time believing, in both theory and admittedly semi-weak self-practice, that the First Steps can really be distinctly and linearly encapsulated as:
1. Version Control (fully, all the way, everything, pass no further until done)
2. Configuration Management (partially, iteratively)
I was hoping step #1 could be as loosely-defined as #2, but I see you were quite clear the opposite is so, with “gather all of your nginx.conf’s, your limits.conf, pg_hba.conf, and commit them to a git repo.”
Maybe I’m dense. I’ve heard the “Version everything!” mantra repeated a thousand times. But no one wants to talk about the design and layout of your versioning scheme. They sweep it under the rug of “It’s flexible! Do what’s best for your team and your company!” However, I immediately find this perilous. You have a dozen servers. They can run anywhere from one to thirty major services. Just because they run the same service, say PostgreSQL, that doesn’t mean they want to be configured the same way. And no, I don’t mean “configured differently” because of machine-specific details like OS version, amount of RAM, or IP address. I’m perfectly aware these can be abstracted away in your Configuration Management recipes. No, I’m talking about machines that are configured differently because they are tasked to do fundamentally different things. Yes, I’m aware Configuration Management has concepts like “flavors”, “roles”, “environments”, etc., to again abstract away commonality between machines and let you gracefully manage the differences. Yes, yes, yes. I am aware.
But this brings me back to my first point. How can you permit the luxury of iterative “recipe writing”, yet absolutely demand full version control right at the jump? If I don’t have “flavors” or “roles” designed yet, then in my VCS, how do I distinguish pg_hba.conf on Machine M1 from pg_hba.conf on Machine M2? Remember, I don’t have “recipes” for PostgreSQL — I’m not on that iteration yet. If I did a hierarchy, do I do it by machine and then service? By service and then machine? Or just machine and copy the file path into my VCS structure? What if a service installs files all over the filesystem like /etc/FOO/FOO.conf for configuration, /etc/rc.d/init.d/FOO-service for init scripts, /usr/bin/FOO-{this,that,metoo} for CLI commands, and /etc/profile.d/FOO.sh for environment setup? What if I can install two different versions of the same package, bound on different IPs/ports, yet on the same machine? How would I wrangle all that versioning then?
Why doesn’t anyone seem to ever talk about this? I don’t think it’s fair to immensely oversimplify the VCS-side of configuration management into a single sentence. It’s not something you can do once in a single afternoon and never look back. I assert it, too, needs iterative design, planning, and execution, probably close to lockstep with the Chef/Puppet iterations. Because you know there will be one forgetful (or maybe just vindictive) admin on one afternoon who will make a local change, not tell anyone, and be undetected for months. You’re going to need drift detection. Barring that, you can fall back on what I call Poor Man’s Drift Management, which is just snapshotting to VCS all your important files on a cron trigger, every day. At least you have crude auditability back to roughly when it happened. But hey, not all these cron triggers will look alike. They take some finesse. Don’t break your neck getting it all out the first day. Embrace the iterative way.
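A crude drift check along these lines can be sketched in a few lines of Python. This is a read-only variant of the idea: instead of committing snapshots, it reports which tracked files differ from the live copies, and it could run from the same daily cron trigger. It assumes the repo mirrors the filesystem layout (so `<repo>/etc/foo.conf` tracks `/etc/foo.conf`), which is only one possible scheme; `find_drift` and its parameters are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_drift(repo_dir, live_root="/"):
    """Compare every file under repo_dir with its counterpart under live_root.

    Returns a list of (relative_path, status) tuples, where status is
    "missing" (no live file) or "changed" (contents differ).
    """
    repo_dir = Path(repo_dir)
    live_root = Path(live_root)
    drift = []
    for tracked in sorted(repo_dir.rglob("*")):
        rel = tracked.relative_to(repo_dir)
        if not tracked.is_file() or ".git" in rel.parts:
            continue  # skip directories and git's own metadata
        live = live_root / rel
        if not live.is_file():
            drift.append((rel.as_posix(), "missing"))
        elif sha256_of(live) != sha256_of(tracked):
            drift.append((rel.as_posix(), "changed"))
    return drift
```

An empty list means the host matches the repo. Anything else is either an uncommitted local change or a repo change that never got deployed, and the check alone cannot tell you which.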
This is a great question, and one I struggled with mightily when I first tried to set up our systems at Craftsy. Your point gets to a real difficulty in this style of writing – I can’t provide every imaginable solution, but I should provide more than just vague ideas. Specifically with regard to version control, I went through two iterations, and structured the first like this:
> postgres configs
-> hostname
--> relevant configs from that host (no logs, no binaries, etc., just configs)
-> hostname2
--> relevant configs from that host
etc.
This got me to a hobo-but-functional way to get on a box, checkout, change, commit, push, test. This worked really well at small-ish scale, when we were 40 or so hosts. From here it was pretty easy to leverage the scheme with puppet like so:
hostname.pp
file { '/data/pgdata/pg_hba.conf':
  # yada yada (other attributes omitted)
  source => 'puppet:///files/pg_configs/hostname/pg_hba.conf',
}
Now, I can just go to my puppet file server, checkout, change, commit, push, and let puppet push the change out.
This was great, as I say, until we got above a couple dozen boxes. At that point, you don’t actually want to modify httpd.conf 16 times. So I moved to a structure like this:
git:
> httpd
-> stage-httpd
--> httpd.conf; default.conf; ssl.conf &c
-> dev-httpd
--> httpd.conf; default.conf; ssl.conf &c
-> prod-httpd
--> httpd.conf etc.
Now in puppet I create a class, stage-http, include the files referenced above in that class, and include the class from any stage webserver’s node.pp. It sounds like you totally get that idea; I’m not trying to be pedantic, just explaining how our VCS approach started and evolved over time.
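To make the two layouts concrete, here is a small Python sketch of how a file's location in the repo is derived under each scheme, and how that maps onto a Puppet `source` URL like the one above. The function names are hypothetical, and the `files` mount name is simply taken from the example URL earlier in this reply.

```python
from pathlib import PurePosixPath


def per_host_path(service_dir, hostname, filename):
    """First layout: <service_dir>/<hostname>/<filename>."""
    return PurePosixPath(service_dir) / hostname / filename


def per_role_path(service, environment, filename):
    """Second layout: <service>/<environment>-<service>/<filename>."""
    return PurePosixPath(service) / f"{environment}-{service}" / filename


def puppet_source(path, mount="files"):
    """Build a puppet:/// source URL for a path under the given mount."""
    return f"puppet:///{mount}/{path}"


# First iteration: one directory per host.
print(puppet_source(per_host_path("pg_configs", "hostname", "pg_hba.conf")))
# Second iteration: one directory per environment and service.
print(puppet_source(per_role_path("httpd", "stage", "httpd.conf")))
```

The trade-off is visible in the arguments: the first scheme needs a hostname for every file, while the second only needs to know which environment a box belongs to, so sixteen stage webservers share one httpd.conf.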
My point was really not to suggest moving to VCS is easy, it’s just that it’s important enough you have to start with _something_. I hope this clarifies what I was getting at, and, for double extra bonus points, maybe it’s helpful to you in thinking about how to carve up your own environment. If you think this topic deserves its own post, I’m happy to flesh it out more!
-Matthew
Hi Matthew,
I run a company that develops Pipeline Orchestration tools to support agile and continuous delivery strategies. Recently, I was asked: “What level of devops maturity do we need to achieve before we should start to think about automated delivery workflows for code and configuration?” While I didn’t have a good response, the answer seems to be somewhere beyond the practical first steps you outlined in this post (maybe part 2?).
My question is how far beyond version control (git) and automated server configuration (Puppet, Chef, Ansible) do you have to be before it makes sense to start thinking about automated configuration delivery and orchestration? Is this the logical next step after automated server configuration, or do you believe there are additional intermediate steps?
I’d love to get your thoughts!
– Dennis