This article discusses sane system administration for a group of similarly-configured machines. It's also the basis for a talk that I'd like to give at the Spokane Linux User's Group.
Due credit: this is essentially my digested version of the concepts presented in the paper Bootstrapping an Infrastructure, and the derived material at http://www.infrastructures.org/. Some bits, like unified user accounts and NFS home directories, aren't important to me, so I haven't included them here. And I've been a lot more specific about choices of tools than the infrastructures.org folks have—you'll find a general Debian bias. =)
Due disclosure: I'm only in the initial stages of implementing for myself what I describe here. I'm writing this while the goals and principles are fresh in my mind. Actual mileage may vary, especially around section 5.
Your current situation may look something like this: you manage
several machines—could be a few or it could be hundreds. These
machines could be servers, routers, or workstations. You have
certain preferences and practices for what you like to have in common
for all your machines.
- Security updates should run at 2:10am.
- All my web servers should have these bits of configuration.
- All my firewalls should have these particular rules.
- I really want curl and wget on all my machines, and pv is a neat
little utility that I want everywhere, too.
But there are problems:
- You find that your machines, in aggregate, easily diverge from
your ideal vision, gradually decaying into chaos.
- You find yourself doing the same thing over and over again. Shell
loops and SSH can only get you so far, and mistakes lead to even
greater divergence among machines.
You want the things your machines should have in common to actually
be consistent. You want to quit repeating yourself.
Here, we'll be discussing what to do about your situation.
1. Figure out how to install a consistent machine image, automatically
if you can.
I'm using Debian preseeding because I use Debian, and because FAI is
too complex and hard to learn. FAI can manage machine configuration,
and to some extent it has to, but that overlaps with what I
want Puppet to do (see below). Debian preseeding is "as simple as
possible, but no simpler."
2. Learn the hierarchy of data.
Your infrastructure is divided into a number of machines. The data
on each machine is divided into system and non-system data.
System data means installed programs, initialization scripts, and
other stuff that comes from your OS distribution. System data is
boilerplate, so it's easy to replace.
Non-system data is everything that makes a machine unique. It is further divided into configuration and local state.
Configuration is easy: parameters that define what a machine is
supposed to do, and how it's supposed to do it. This includes the
selection of installed software packages, and most of everything in
Local state is the non-configuration, non-distribution-supplied
data that your applications need in order to operate. This would include the
web root hierarchy of a web server, the mail spool and mailboxes on
a mail server, or the zone files of a DNS server. Note that this
includes both machine-generated state (mail spool directories) and
human-generated state (web roots). The local state includes most
of everything in
3. Understand key goals when it comes to machine data.
System data can be thrown away and replaced easily, because you can
reimage a new machine consistently.
Configuration data should largely be consistent across machines.
- Some things will always vary: hostname, IP address, and
- Some things you'll want to be standardized: NTP configuration,
smarthost configuration for non-mail servers, OS package
repository selection, a set of software packages you want to
have available on all machines, and so forth.
- Some things will be standardized, but only for certain classes
of machines: web servers will always have Apache with certain
configuration bits, mail servers will always run Postfix with
other configuration bits—or whatever you prefer, of course.
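The three kinds of configuration above (standardized everywhere, standardized per class, and unique per machine) can be thought of as layers, with the more specific layer winning. Here is a minimal Python sketch of that idea. It is not how any particular tool stores its data; every hostname, address, and setting in it is a made-up assumption for illustration.

```python
# Illustrative data only: every hostname, address, and setting below is
# a made-up assumption, not taken from a real infrastructure.
COMMON = {
    "ntp_server": "ntp.example.org",
    "smarthost": "mail1.example.org",
    "packages": ["curl", "wget", "pv"],
}

CLASSES = {
    "web": {"packages": ["apache2"]},
    # Mail servers deliver mail themselves, so they override the smarthost.
    "mail": {"packages": ["postfix"], "smarthost": None},
}

HOSTS = {
    "web1": {"class": "web", "ip": "192.0.2.10"},
    "mail1": {"class": "mail", "ip": "192.0.2.25"},
}


def resolve(hostname):
    """Merge the common, per-class, and per-host layers for one machine."""
    host = HOSTS[hostname]
    config = {"hostname": hostname}
    packages = []
    # Later layers win, so the most specific setting takes effect.
    for layer in (COMMON, CLASSES[host["class"]], host):
        for key, value in layer.items():
            if key == "packages":
                packages.extend(value)  # package lists accumulate
            else:
                config[key] = value
    config["packages"] = sorted(set(packages))
    return config
```

Calling `resolve("web1")` yields the common settings plus Apache and the machine's own address, while `resolve("mail1")` gets Postfix and no smarthost.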
A machine's configuration is important—of course you don't want to
lose it. But we'll be managing configuration from a central
location, so if it gets physically lost on a particular machine, we
can recreate it with the configuration management system (discussed
below).
Local state is the crux of a machine's data. Your web server just
won't work correctly unless you have the right HTML pages in place.
You cannot lose this data, so you'll back it up regularly. With
the methodology described here, it won't be hard to make this
3. Implement a system for centralized, managed configuration.
You have an ideal vision of what the common configuration of your
machines should be. You need the ability to express that
configuration, as well as the parts that differ on each machine.
The differences are conditional on certain variables: what's the
MAC address or the hostname of the machine? What's the role or
class (web, mail, etc.) that I've assigned to it? And so forth.
You need to be able to express this information in one place. When
you make a configuration change, your running machines will
synchronize to reflect that change.
A configuration management system should include:
- A procedure to express configuration in this fashion, and
- The tools for your machines to apply the configuration.
Puppet and Cfengine are contenders for this position.
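To make the "apply the configuration" half concrete, here is a toy Python sketch of the compare-and-converge loop that such tools perform in spirit. It does not reflect the internals of Puppet or Cfengine; the resource names such as `package:curl` and the state strings are hypothetical.

```python
def plan(desired, actual):
    """List the changes needed to bring `actual` in line with `desired`.

    Both arguments map a resource name (hypothetical, such as
    "package:curl") to its state. An empty list means the machine
    already conforms.
    """
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name)
        if have != want:
            actions.append((name, have, want))
    return actions


def converge(actions, actual):
    """Return a copy of `actual` with the planned changes applied."""
    new_state = dict(actual)
    for name, _have, want in actions:
        new_state[name] = want
    return new_state
```

Running `plan` again after `converge` returns an empty list, which is the property you want: applying the configuration twice is harmless. Note that resources missing from `desired` are never touched, so anything you change outside the managed configuration stays invisible to a loop like this.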
Add a client for Puppet or Cfengine to your standard machine
installation image. Your new machines will automatically configure
themselves:
- Installing a machine from your installation image results in a
plain, stripped-down configuration.
- The configuration management system updates the new machine's
configuration to whatever it needs to be, as you define in the
You'll also want to log, timestamp, and document your changes to
the managed configuration. You'll want to see which of your admins
made what change, and why. Which files were touched? What were
the lines that were altered? What did the file look like before
Applying a revision control system—such as Git, Darcs, or
Subversion—to your managed configuration makes this possible.
When you do this, you have a complete history of configuration
changes, and can roll back changes that turn out to be mistakes.
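A revision control system records all of this for you, so you would not write it yourself. Purely to illustrate the kind of line-level record you get, here is a small Python sketch using the standard `difflib` module. The file name and contents in the usage note are invented.

```python
import difflib


def describe_change(before, after, filename):
    """Return unified-diff lines showing how a file's text changed."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"{filename} (before)",
            tofile=f"{filename} (after)",
            lineterm="",
        )
    )
```

For example, `describe_change("server ntp1.example.org\n", "server ntp2.example.org\n", "ntp.conf")` produces a diff whose removed and added lines answer "what were the lines that were altered?" directly.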
4. The golden rule.
Now that you have a configuration management system in place,
follow our golden rule for infrastructures:
Never, ever change the configuration data or system data of a
machine in a way which is outside the control of your
configuration management system.
(...also known as "cowboy sysadmining.") It's very tempting to do,
because it's the quick and lazy thing. It's also quite common in
many (if not most) IT shops. However, this causes long-term
A violation of this rule could be considered a defect or an error
in the machine. When you break this rule, a machine diverges from
your ideal vision. One of two things can happen:
- The configuration management system detects the error and
corrects it, undoing your work.
- The error goes undetected, and the machine is permanently
different from all the other machines in your infrastructure.
The second possibility is the more dangerous of the two. In this case:
- The machine behaves unpredictably. It's different from all the
other machines of its type, so it's hard to reason about its
- Personnel trained to Do The Right Thing by using the
configuration management system will probably not be aware of
the variation. There's no automatic documentation or audit
trail, like there is with a revision-controlled configuration
- If the machine needs to be replaced, the new machine will not
carry this change, since the change isn't tracked. So, the
replacement of a machine is now a dangerous and risky
I would go so far as to say it's not worth doing configuration
management if you won't stick to this golden rule. After all, you
wanted predictability in your infrastructure, right? You wanted to
keep the important stuff in common, right?
If you find yourself with a compelling reason to compromise on
this, think carefully about how to do it
safely. Develop a procedure comparable to how you would update the
managed configuration, to make sure that your infrastructure
doesn't gradually descend into chaos once again.
5. Reap the rewards.
Need to add a new web server? It's as simple as:
- Imaging a machine,
- Updating your managed configuration to say "the machine with
MAC address <foo> has IP address <bar>, and it's a web server,"
- And letting the machine configure itself.
Now just drop in your web content.
Need to replace the failed hard disk of a critical server? This,
too, should be pretty easy:
- Image a machine.
- Restore the backup you made of the local server data onto the
- Update your managed configuration to say "this particular
critical server no longer has the MAC address <foo>. Now it's
- Watch the machine configure itself.
If you did everything right, you'll see your replacement machine
come up and take off running. Just like that.
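Both procedures above boil down to editing one record in the managed configuration. As a rough Python sketch, assuming machines are identified by MAC address as described, a node inventory could look like this. The class, its method names, and the example addresses are all hypothetical.

```python
class Inventory:
    """Toy node inventory keyed by MAC address (hypothetical design)."""

    def __init__(self):
        self._by_mac = {}

    def add(self, mac, name, ip, role):
        """Register a newly imaged machine."""
        mac = mac.lower()
        if mac in self._by_mac:
            raise ValueError(f"MAC address {mac} is already registered")
        self._by_mac[mac] = {"name": name, "ip": ip, "role": role}

    def replace_hardware(self, old_mac, new_mac):
        """Move an existing machine's identity to replacement hardware."""
        old_mac, new_mac = old_mac.lower(), new_mac.lower()
        if new_mac in self._by_mac:
            raise ValueError(f"MAC address {new_mac} is already registered")
        # Raises KeyError if old_mac was never registered.
        self._by_mac[new_mac] = self._by_mac.pop(old_mac)

    def lookup(self, mac):
        """Return the record for a MAC address, or None if unknown."""
        return self._by_mac.get(mac.lower())
```

Adding a web server is one `add` call, and swapping out failed hardware is one `replace_hardware` call. The replacement machine then looks itself up by its own MAC address and inherits the old name, address, and role.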
Welcome to the future.
A. "But that would be putting all my eggs in one basket. And
then someone pwns my basket."
It's true. Implementing a configuration management system on a
centralized server is a security risk. That's because each of your
managed machines will do whatever your configuration management
server tells them to do, automatically. If a bad guy cracks your
management server, they could change the configuration to include
"download r00tsh3ll-3.1.337.tar.gz from my web server, untar it in
/tmp, move this executable file to the system binaries
directory, and run it."
But this is a risk that can be managed. And if you manage it well,
it's probably less risky than what you're doing now.
You have to log into your servers for maintenance somehow. In
non-managed situations, that "somehow" is probably "with SSH from a
system administrator's workstation or laptop." If a cracker gets
access to the admin's computer, they can spread their compromise to
every other machine that computer has access to.
With a centralized, formal configuration management server, you'll
be able to lock that server down and make sure nobody's visiting
Flash-laden porn web sites on the critical server.
See also: http://www.infrastructures.org/