To understand this document

You need a working knowledge of rsync and some scripting skills. It helps a lot if you know what a configuration management signature for a host looks like.

In ksb's model of configuration management, the most important idea is that every level feeds relevant information back to the adjacent levels. This is how bugs get fixed, how site policy gets applied, and how better code gets deployed. In truth, the whole configuration management structure cannot function without mechanisms to both send and receive feedback between structural elements.

Hostlint creates a report that reveals the difference between an ideal host and the host it is frisking.

Close the loop with hostlint

All hostlint does is rsync a directory loaded with shell and perl scripts from a locally trusted service run by Operations to a temporary directory on the target host. Then it cd's into the temporary directory to run a check script called site. Site reports issues with the instance's configuration signature in an easy-to-parse format.
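The driver described above might look roughly like the following sketch. It is written in Python only for illustration: the policy source rsync://hostlint.example.com/hostlint-policy/ and the function names are hypothetical, and the real hostlint may be structured quite differently.

```python
import shutil
import subprocess
import tempfile

# Hypothetical policy source; the real service name and module are site-specific.
POLICY_SOURCE = "rsync://hostlint.example.com/hostlint-policy/"


def build_rsync_command(source, workdir):
    """Return the argv that copies the policy tree into workdir.

    -a preserves permissions, so the check scripts stay executable.
    """
    return ["rsync", "-a", source, workdir.rstrip("/") + "/"]


def hostlint(source=POLICY_SOURCE):
    """Copy the policy tree to a scratch directory, run ./site there, return its output."""
    workdir = tempfile.mkdtemp(prefix="hostlint.")
    try:
        subprocess.run(build_rsync_command(source, workdir), check=True)
        result = subprocess.run(["./site"], cwd=workdir,
                                capture_output=True, text=True, check=False)
        return result.stdout
    finally:
        # Always remove the scratch copy, even if rsync or site failed.
        shutil.rmtree(workdir, ignore_errors=True)
```

Calling hostlint() on a target host returns whatever site printed. Because the scratch copy is removed either way, a later run never sees a stale policy tree.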

The site application's working parts are just as simple. It looks in the current directory for scripts whose names end in .hlc (for "host lint check"), runs each one, and captures its output. If the output from every check script is empty, then site outputs an all clear message; otherwise it outputs the list of differences reported. Thus the output is never empty for any host.
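A minimal sketch of the site logic, assuming each .hlc check is an executable that prints one difference per line and prints nothing when the host conforms. The all-clear wording and the "script: difference" line format are hypothetical, and this version ignores each check's exit status.

```python
import glob
import os
import subprocess

ALL_CLEAR = "hostlint: no differences found"  # hypothetical wording


def run_checks(directory="."):
    """Run each *.hlc script in directory; return (script, output) for non-empty outputs."""
    findings = []
    for path in sorted(glob.glob(os.path.join(directory, "*.hlc"))):
        # Each check must be executable; its stdout is the list of differences.
        result = subprocess.run([os.path.abspath(path)], cwd=directory,
                                capture_output=True, text=True)
        output = result.stdout.strip()
        if output:
            findings.append((os.path.basename(path), output))
    return findings


def site(directory="."):
    """Never return an empty report: either the all-clear line or one line per difference."""
    findings = run_checks(directory)
    if not findings:
        return ALL_CLEAR
    return "\n".join(f"{name}: {line}"
                     for name, output in findings
                     for line in output.splitlines())
```

Run from inside the scratch directory, site() returns either the all-clear line or one prefixed line per difference, so the report is never empty.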

The site script comes from a repository I call hostlint-policy. It is actually globally visible inside the production network, as any site policy should be. Knowing what is expected is how you meet expectations, right?

How to use this application

A mortal (non-superuser) application login's crontab runs hostlint at least once a week on every host. The e-mailed output from that task is processed on a central reporting host to collate and prioritize the messages. The Admins review the feedback report every Monday to keep minor errors from becoming bigger issues. (The jobs are staggered across a 4 hour window, so the reports do not all arrive at the same time and the load on the rsync service evens out.)
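The text does not say how the jobs are staggered. One hypothetical approach is to derive each host's start minute from a stable hash of its hostname, so the weekly runs spread across the 4 hour window without any central bookkeeping. The start hour, day of week, and install path below are assumptions.

```python
import zlib

WINDOW_MINUTES = 4 * 60                 # the 4 hour window described above
START_HOUR = 1                          # hypothetical: window opens at 01:00
DAY_OF_WEEK = 0                         # hypothetical: Sunday, so reports are ready Monday
COMMAND = "/usr/local/sbin/hostlint"    # hypothetical install path


def cron_line(hostname):
    """Return a weekly crontab entry whose start time falls somewhere in the window.

    crc32 is stable across runs and machines (unlike Python's salted hash()),
    so a given host always lands in the same slot.
    """
    offset = zlib.crc32(hostname.encode("utf-8")) % WINDOW_MINUTES
    hour = START_HOUR + offset // 60
    minute = offset % 60
    return f"{minute} {hour} * * {DAY_OF_WEEK} {COMMAND}"
```

Most cron implementations mail a job's output to the crontab's owner (or to MAILTO), which is one way to produce the e-mailed reports described above.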

When a new instance is created, it runs hostlint after the build process finishes the final reboot, to report any out-of-date items the installation process has installed. This is actually quite often the case in some minor way (the version of a manual page or script). This gives the admin a chance to update the build process, as well as to fix the instance just created.

Running hostlint is part of the triage list for a production issue. Some application could have been mistakenly removed, back-revisioned, or upgraded. The output is a quick check that can be compared to the last e-mail report to see what may have changed.
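Comparing a fresh run with the last mailed report is just a line-oriented diff. A sketch using Python's difflib, assuming both reports are available as plain text; the report lines in the check below are hypothetical.

```python
import difflib


def report_changes(last_report, current_report):
    """Return a unified diff between the last mailed report and a fresh run."""
    return list(difflib.unified_diff(
        last_report.splitlines(), current_report.splitlines(),
        fromfile="last report", tofile="now", lineterm=""))
```

Lines prefixed with + are new complaints since the last report; lines prefixed with - have been fixed (or have stopped being checked).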

How that works

To understand how all that works, I need to explain some of the modules we coded and the support I've added to every program I've distributed.
The ubiquitous -V option to local tools
Every program that can accept a command-line option to output a version string has the -V switch. This makes checking the versions of most local tools as easy as running them with that switch and parsing the first line of output. In fact that is exactly what versions.hlc does.
Use ident on local configurations and manual pages
Since a manual page is not an executable, we use ident to pull the RCS identification string out of the comments in each page. If your site uses some other revision control system, you'd have to use the appropriate tool to extract the correct token's value.
Use rpm -q
For any package manager we query for the release string of key packages. There is no reason you couldn't look at every package, but that tends to inject a lot of noise into the signal. Hardly anyone wants to track the version of uniq's manual page. Changes at that level really don't need human attention; you'll pick up the new one when you build a replacement instance.
Check for missing configuration files (or missing lines in them)
One file that comes to mind is /etc/resolv.conf. If this file is misconfigured, your life gets harder really quickly. Checking the options and search lines for sanity has saved me many hours of debugging.
Report versions of firmware, hardware inventory numbers, etc.
Most vendors support a limited range of firmware releases. If your machine is out of that range, you might want to fix that. This is a site policy that is forced on you by the vendor, but you still need to manage it locally.
Privileged checks
Some files you may want to check are protected from mortal snooping (for example /var/log/security). In those cases you might use op to escalate a check command with an in-line script. See op's HTML document for details.
Check hashes or checksums of critical applications
If you are worried about unauthorized changes to production code you might even go that far.
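Several of the checks above reduce to fetching a version string and comparing it with the value site policy expects. The helpers below sketch that in Python under stated assumptions: the $Id$ pattern matches the RCS keyword lines ident prints, the rpm query format is one way to ask for a package's version-release string, firmware releases are assumed to be purely numeric dotted strings, and the one-line difference format and names like localtool are hypothetical.

```python
import re

# Matches an RCS keyword such as "$Id: hostlint.html,v 1.8 2012/07/11 17:20:44 ksb Exp $".
RCS_ID = re.compile(r"\$Id: ([^$]*)\$")


def first_line(output):
    """The version string a -V switch prints is taken from the first output line."""
    lines = output.splitlines()
    return lines[0].strip() if lines else ""


def rcs_revision(text):
    """Return the revision field of the first $Id$ keyword in text, or None."""
    match = RCS_ID.search(text)
    if match is None:
        return None
    fields = match.group(1).split()
    return fields[1] if len(fields) > 1 else None


def rpm_query_argv(package):
    """argv asking rpm for a package's version-release string."""
    return ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}", package]


def as_tuple(version):
    """Turn a purely numeric dotted release such as "2.7.13" into (2, 7, 13)."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lowest, highest):
    """True when version lies inside the vendor's supported [lowest, highest] range."""
    return as_tuple(lowest) <= as_tuple(version) <= as_tuple(highest)


def compare(name, found, expected):
    """Return a one-line difference, or "" when the host matches policy."""
    if found == expected:
        return ""
    return f"{name}: found {found or 'nothing'}, expected {expected}"
```

A versions.hlc-style check would run each tool with -V, pass first_line(output) and the expected string to compare(), and print any non-empty result.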
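Content checks look at what a file says rather than which release it is. The sketch below checks the resolv.conf search and options lines and compares SHA-256 digests against a manifest. The required search domain, the option, and the manifest format are hypothetical policy values; following resolv.conf(5), comment lines start with # or ;, and the last search or domain line wins.

```python
import hashlib

# Hypothetical policy values; a real site would keep these in hostlint-policy.
REQUIRED_SEARCH = ["example.com"]
REQUIRED_OPTIONS = ["timeout:2"]


def check_resolv_conf(text):
    """Return a list of differences between resolv.conf text and the policy above."""
    search, options, nameservers = [], [], []
    for line in text.splitlines():
        if not line or line[0] in "#;":
            continue
        words = line.split()
        if not words:
            continue
        keyword, args = words[0], words[1:]
        if keyword in ("search", "domain"):
            search = args  # the last search/domain line wins
        elif keyword == "options":
            options.extend(args)
        elif keyword == "nameserver":
            nameservers.extend(args)
    problems = []
    if not nameservers:
        problems.append("resolv.conf: no nameserver lines")
    for domain in REQUIRED_SEARCH:
        if domain not in search:
            problems.append(f"resolv.conf: search list lacks {domain}")
    for option in REQUIRED_OPTIONS:
        if option not in options:
            problems.append(f"resolv.conf: missing option {option}")
    return problems


def sha256_of(path):
    """Hash a file in 64 KiB blocks so large binaries do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def check_hashes(manifest):
    """manifest maps path -> expected SHA-256 hex digest; report missing or changed files."""
    problems = []
    for path, expected in sorted(manifest.items()):
        try:
            actual = sha256_of(path)
        except OSError:
            problems.append(f"{path}: missing or unreadable")
            continue
        if actual != expected:
            problems.append(f"{path}: checksum changed")
    return problems
```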

In effect hostlint checks level 4 (running hosts) against level 5 (site policy) to assure that every host conforms. If you have site policy statements that do not impact the contents of any file, version of a product, release of a package, or existence of a login, group, netgroup or network route -- then you can't check it with hostlint.

How hostlint helps the Admin

I'm pretty sure hostlint has saved me more effort than any other script I've installed. Since petef and I wrote hostlint to check the versions of every local tool, it has found regressions and missing tools for me, which has saved me a lot of debugging.

Regressions happen when you restore a filesystem from backup media (with tapes, sbp, or even a copy from a host you thought was identical), or when you replace a motherboard, network card, or other component that carries firmware or a tracked serial number.

Build processes get out-of-date when upstream packages change, or when peer groups update their site policies without telling you.

A list of local tools, packages, and expected configuration files makes an excellent outline for teaching new staff what they need to know. Adding new checks to the hostlint repository is a great warm-up to getting superuser access.

I've put injunctions in the accounting checks to forbid accounts for people who have left the organization. That makes auditors very happy.
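An accounts check of that kind can be sketched as a scan of passwd-format text for logins on a list of departed users. The departed list and the message wording are hypothetical; on hosts that use NIS or LDAP you would check the output of getent passwd rather than /etc/passwd alone.

```python
def departed_accounts(passwd_text, departed):
    """Return a complaint for each login in passwd-format text that belongs to a departed user.

    passwd lines are name:password:uid:gid:gecos:home:shell; only the name is used.
    """
    problems = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        login = line.split(":", 1)[0]
        if login in departed:
            problems.append(f"accounts: {login} belongs to a departed user")
    return problems
```

A check script would read /etc/passwd and the departed list from the policy tree, then print each returned line, so a clean host prints nothing.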

Summary

If you need a mechanism to check the signature of a running host, then hostlint is the tool for you. Most of the checks do not require superuser access; those that do can be given an op rule.

If you've not read it, then you should read about netlint in the HTML document.


$Id: hostlint.html,v 1.8 2012/07/11 17:20:44 ksb Exp $