rsync,
and some scripting skills. It helps a lot if you know what a
configuration management signature for a host looks like.
In ksb's model of configuration management, the most important idea is that every level pushes relevant information to adjacent levels as feedback. This is how bugs get fixed, how site policy gets applied, and how better code gets deployed. In truth, the whole configuration management structure doesn't function without mechanisms to both send and receive feedback between structural elements.
Hostlint creates a report that reveals the
difference between an ideal host and the host it is frisking.
What hostlint does is rsync
a directory loaded with shell and perl scripts from
a locally trusted service run by Operations to a temporary directory
on the target host.
Then it cd's into the temporary directory to
run a check script called site.
Site reports issues with the
instance's configuration signature in an easy to parse format.
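The flow just described can be sketched in a few lines of shell. This is only an illustration, not the real hostlint: the function name run_hostlint and the rsync source you pass it (something like rsync://policy.example.com/hostlint/) are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a hostlint-style driver, not the real script.
# $1 is an assumed rsync source holding the check scripts.
run_hostlint() {
    policy_src=$1
    # Scratch directory on the target host; removed again before returning.
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/hostlint.XXXXXX") || return 1
    rc=0
    if rsync -a "$policy_src" "$workdir/"; then
        # Run the top-level check script from inside the scratch directory.
        (cd "$workdir" && ./site) || rc=$?
    else
        echo "hostlint: cannot fetch policy from $policy_src" >&2
        rc=1
    fi
    rm -rf "$workdir"
    return "$rc"
}
```

Whatever site prints goes to standard output, so a caller such as cron can mail it or save it for later comparison.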
The site application's working parts are just as simple.
It looks in the current directory for scripts that end in
.hlc (for "host lint check"), and runs each
one, capturing the output from each script. If the output from every
check script is empty, then site outputs an all clear
message, else it outputs the list of differences reported.
Thus the output is never empty for any host.
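A minimal sketch of that loop follows. It assumes every check is a Bourne shell script (the real checks may also be perl), and the function name run_site, the exact wording of the all clear message, and the "script: message" output format are all invented for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the site loop, not the real script.
# Assumes each *.hlc file is a Bourne shell script that prints
# one line per problem and prints nothing when the host is clean.
run_site() {
    dir=${1:-.}
    report=$(
        cd "$dir" || exit 1
        for check in ./*.hlc; do
            [ -f "$check" ] || continue
            # Prefix every line with the name of the check that produced it.
            sh "$check" 2>&1 | while IFS= read -r line; do
                printf '%s: %s\n' "${check#./}" "$line"
            done
        done
    )
    if [ -z "$report" ]; then
        # Never stay silent: an empty report could also mean "did not run".
        echo "all clear"
    else
        printf '%s\n' "$report"
    fi
}
```

Printing an explicit message on success is what lets the central report tell a clean host apart from one whose job never ran.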
The site script comes from a repository I call
hostlint-policy.
It is actually globally
visible inside the production network, as any site policy should be.
Knowing what is expected is how you meet expectations, right?
crontab runs
hostlint at least once a week on every host.
E-mailed output from that task is processed on a central reporting host to
collate and prioritize the messages. The Admins review the feedback report
every Monday to prevent minor errors from becoming bigger issues.
(The jobs are staggered across a 4-hour window, so the reports do
not all come in at the same time, and we even out the
rsync service's load.)
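One way to get that kind of stagger, shown here only as a sketch, is to derive a stable delay from the hostname. The helper name stagger_delay and the use of cksum are my assumptions; the real jobs may be spread out some other way.

```shell
#!/bin/sh
# Hypothetical helper: map a hostname to a stable delay, in seconds,
# inside a window (default 14400 seconds, which is 4 hours).
stagger_delay() {
    host=$1
    window=${2:-14400}
    # cksum prints "CRC length"; keep only the CRC.
    sum=$(printf '%s' "$host" | cksum | cut -d ' ' -f 1)
    echo $((sum % window))
}
```

A wrapper run from cron could then do something like sleep "$(stagger_delay "$(hostname)")" before it starts hostlint, so each host always reports at the same offset within the window.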
When a new instance is created, after the process finishes the
final reboot, it runs hostlint to report
any out-of-date items the installation process has installed. Something
is out of date quite often, usually in some minor way (the version of a
manual page or script). This offers the admin a chance to update the build
process, as well as to fix the instance just created.
Part of the triage list for a production issue is to run
hostlint. Some application could have been
mistakenly removed, back-revisioned or upgraded. This is a quick check
that can be compared to the last e-mail report to see what may have
changed.
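If the last report is saved in a file, the comparison takes only a few lines. This sketch assumes both reports are plain text with one finding per line; the function name new_findings is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print findings present in the new report
# but absent from the previous one.
new_findings() {
    old=$1
    new=$2
    tmp=$(mktemp -d "${TMPDIR:-/tmp}/lintdiff.XXXXXX") || return 1
    # comm needs sorted input.
    sort "$old" > "$tmp/old"
    sort "$new" > "$tmp/new"
    # -13 suppresses lines unique to the old report and lines common to both.
    comm -13 "$tmp/old" "$tmp/new"
    rm -rf "$tmp"
}
```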
-V option to local tools
Most local tools accept a -V switch. This makes
checking the versions of most local tools as easy as running them
with that switch and parsing the first line of output. In fact that
is exactly what versions.hlc does.
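A rough sketch of such a check is below. It is not the real versions.hlc: the input format (one "tool version" pair per line on standard input), the function name check_versions, and the loose substring match are assumptions made to keep the example short.

```shell
#!/bin/sh
# Hypothetical versions check, not the real versions.hlc.
# Reads "tool expected-version" pairs from standard input and prints
# one line for each tool that is missing or reports another version.
check_versions() {
    while read -r tool want; do
        # Skip blank lines and comments in the list.
        case $tool in ''|'#'*) continue ;; esac
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: missing (want $want)"
            continue
        fi
        # Only the first line of the -V output is inspected.
        first=$("$tool" -V 2>&1 </dev/null | sed -n '1p')
        case $first in
        *"$want"*) ;;    # loose match: expected version appears in the line
        *) echo "$tool: want $want, found: $first" ;;
        esac
    done
}
```

A check script built on this would feed it the site's list of expected versions; like any other check it prints nothing when everything matches.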
ident on local configurations and manual pages
Use ident
to pull the RCS
identification string out of the comments in each page. If your site
uses some other revision control system, you'd have to use the
appropriate tool to extract the correct token's value.
rpm -i
uniq's manual page.
Changes at that level really don't need human attention; you'll pick
up the new one when you build a replacement instance.
/etc/resolv.conf.
If this file is misconfigured your life gets harder really quickly.
Checking the options and
search lines for sanity has saved me many
hours of debugging.
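Here is a sketch of what such a sanity check might look like. The policy it enforces (a single search line that includes an expected domain, plus one required option) is an assumption for illustration, as is the function name check_resolv; substitute whatever your site policy actually says.

```shell
#!/bin/sh
# Hypothetical resolv.conf check: $1 is the file, $2 a domain that must
# be in the search list, $3 an option that must be set (e.g. rotate).
check_resolv() {
    file=$1 want_search=$2 want_opt=$3
    if [ ! -r "$file" ]; then
        echo "$file: missing or unreadable"
        return 0
    fi
    awk -v file="$file" -v dom="$want_search" -v opt="$want_opt" '
        $1 == "search" {
            # The resolver honors only the last search line, so reset here.
            nsearch++
            has_dom = 0
            for (i = 2; i <= NF; i++) if ($i == dom) has_dom = 1
        }
        $1 == "options" {
            for (i = 2; i <= NF; i++) if ($i == opt) has_opt = 1
        }
        END {
            if (nsearch == 0) print file ": no search line"
            else if (nsearch > 1) print file ": " nsearch " search lines, only the last is used"
            if (nsearch > 0 && !has_dom) print file ": search list lacks " dom
            if (!has_opt) print file ": options lack " opt
        }' "$file"
}
```

Called as check_resolv /etc/resolv.conf example.com rotate, it prints nothing on a conforming host and one line per problem otherwise.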
/var/log/security). In those
cases you might use op to escalate a check
command with an in-line script. See
op's HTML document
for details.
In effect hostlint checks level 4 (running hosts)
against level 5 (site policy) to assure that every host conforms.
If you have site policy statements that do not impact the contents of any file,
version of a product, release of a package, or existence of a login,
group, netgroup or network route -- then you can't check it with
hostlint.
hostlint helps the Admin
hostlint has saved me more effort
than any other script I've installed.
Since petef and I wrote hostlint to
check the versions of every local tool, it has found regressions and
missing tools for me, which has saved me a lot of debugging.
Regressions happen when you restore a filesystem from backup media
(with tapes, sbp, or even a copy from a host
you thought was identical), or when you replace a motherboard,
network card, or other component with firmware or a tracked
serial number.
Build processes get out-of-date when upstream packages change, or when peer groups update their site policies without telling you.
A list of local tools, packages, and expected configuration files makes
an excellent outline for teaching new staff what they need to know.
Adding new checks to the hostlint repository
is a great warm-up to getting superuser access.
I've put injunctions in the accounting checks to forbid accounts for people who have left the organization. That makes auditors very happy.
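An injunction of that kind can be as small as the sketch below. The file of departed logins (one per line), the passwd-format input, and the function name check_departed are assumptions for the example; the real accounting checks may look quite different.

```shell
#!/bin/sh
# Hypothetical accounting check: $1 is a passwd-format file,
# $2 a file listing logins that must not exist, one per line.
check_departed() {
    passwd=${1:-/etc/passwd}
    forbidden=$2
    while IFS= read -r login; do
        # Skip blank lines and comments in the list.
        case $login in ''|'#'*) continue ;; esac
        # Exact match on the first colon-separated field only.
        if awk -F: -v u="$login" '$1 == u { f = 1 } END { exit !f }' "$passwd"; then
            echo "login $login: account must be removed"
        fi
    done < "$forbidden"
}
```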
hostlint is the tool for you. Most
of the checks do not require superuser access; those that do might
be given an op rule.
If you've not read it, then you should read about
netlint in the
HTML document.
$Id: hostlint.html,v 1.8 2012/07/11 17:20:44 ksb Exp $