To understand this document

This document assumes you are quite familiar with the standard UNIX™ macro processor m4(1), and the shell sh(1). In addition, you should have a working understanding of the xapply(1) wrapper. While hxmd is not really a wrapper, it depends strongly on that technology and knows about several wrappers.

It also helps a lot if you have a specific configuration management application in mind, for example updating a specific set of files across many hosts.

I'll be the first one to admit that it takes a little up-front effort to make hxmd work for you. So we are going to start simple and work up to a full site configuration. By doing this, we'll see a continuous return on investment that should keep you motivated to continue. If you feel that you're wasting your time after the first check-point, then you have too few hosts to make this worth your time, so you might put it down until your administrative domain gets a little larger. I think you'll come back to it eventually, but in the meantime you might use xapply alone.

What is hxmd

Hxmd is an advanced shell processor which runs a given control once for each host selected from a potentially large population. Some attributes are bound to each host to allow hxmd to select hosts, customize control, and filter data files or access caches of data. Here is the usage from the manual page:
hxmd [-Agnsz] [-preload] [-B names] [-c cmd] [-C config] [-E compares] [-F literal] [-G guard] [-j m4prep] [-k key] [-K filter] [-o attributes] [-Q ward] [-R req] [-r redo] [-X ex-config] [-Y top] [-Z zero-config] [xapply-opts] [xclate-opts] [m4-opts] control [files]
Rows in tabular configuration files define each host, as well as the host's key attributes. The text of the control is processed through m4 to customize the code for each target host, by using m4 macro definitions for each attribute. Optionally other related files or commands may be similarly processed for each host. The resultant custom shell script is executed once for each host selected via xapply.

The command line accepts xapply options to tune the parallel execution of the expanded control command. In addition to supporting either a wide or a sequential execution stream, xapply also provides some additional in-line text processing, and hooks into ptbw and xclate. See also the HTML document about the xapply stack.

The command line also accepts most xclate options to tune the collated output filter. It doesn't accept the -u option to bind xclate to a named local domain socket because hxmd is using xclate to recover the exit codes from the processes: any stray processes would interfere with that scheme.

How to use hxmd

The mechanics of hxmd may be decomposed into 4 parts:
The configuration data hxmd reads
The very first resource a site needs to use hxmd is a list of the target hosts. Without at least a list to process, the tool is of no use at all.
A reason to process those hosts
A list of hosts won't do anything without a reason to visit each one, or process a task for each, or report on each. Some criteria to select a set of hosts to process are required.
An update to process
With a reason to visit the host, we still need a payload to deliver the update, poll for status or generate part of a report.
A way to close-the-loop on failed updates
Sometimes the update fails; then we need a way to retry updates that failed.

After we have those parts in place, we have the foundation available to build other structures on top of hxmd. We also touch on the evolution of tools which encapsulate hxmd, and how others would be designed, like msrc.

Configure a list of hosts

I'm assuming that you've already installed the tool-chain up to mmsrc ("micro msrc"), and can build the tools we refer to here. If you have not done that yet, you need to see the quick-start build document first, or install the RPMs.

The first step after that is to build a host configuration file; it is always called site.cf in the examples below (but you can change that to any name you like better). If you have an older configuration file for distrib, you should start with that one, as hxmd reads those too. If you don't own one, you should create a list of a few test hosts in a file. Later you should add more elements to make it complete, but for our purposes fewer than 10 hosts would be fine.

In site.cf you can include comments with the common octothorp (`#', aka hash mark) to end-of-line style. We'll add more attributes about each host here later. Mine looks like:

# ksb's test list
w01.example.com
w02.example.com
adm1.example.com
adm2.example.com

To check that configuration, output the list of hosts with the command below:

hxmd -Csite.cf "echo HOST"

The hosts might not be output in exactly the order they appear in the site.cf file. I know we just implemented a complex version of cat, but that was not really the point. We can change echo to any program we like (ssh springs to mind) to run a shell command for each host in the file, just as xapply might. We should add xapply options to the command to leverage the features of the xapply-stack. As an example, let's collect the same uname data from each host:

hxmd -Csite.cf "ssh -n HOST uname -nmp"

Since ssh takes some time to make a network connection, the order of the outputs for the hosts may vary across multiple executions of the previous example.
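
To make the mechanics concrete, here is a minimal sequential sketch of the same idea in plain sh and m4. It is not how hxmd is built (hxmd hands the tasks to xapply and xclate, and may run them in parallel). The function name run_for_each_host is my own invention for this illustration, and it only understands a file that holds one hostname per line.

```shell
# Sketch only: run one m4-expanded command per host, one host at a time.
# run_for_each_host is a hypothetical helper, not part of hxmd.
run_for_each_host() {
    cf=$1        # a file with one hostname per line, "#" comments allowed
    control=$2   # shell text in which m4 replaces the word HOST
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$cf" |
    while read -r host; do
        # m4 -DHOST=... rewrites HOST, then sh runs the customized text
        printf '%s\n' "$control" | m4 -DHOST="$host" | sh
    done
}
```

With the site.cf above, run_for_each_host site.cf 'echo HOST' prints each hostname in file order, which is the part hxmd does not promise.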

How that works

In fact, hxmd is building a process tree with xclate and xapply wrappers around the echo processes. We can run another program to see that tree; if ptree or pstree is installed try one of these:
hxmd -Csite.cf "ptree $$" | more
or
hxmd -Csite.cf "pstree -cG $$" | more

The process tree output from either of the above commands might look like this:

  1111  ksh -i -o vi -o viraw
    51320 hxmd -Csite.cf ptree 1111
      51322 xclate -m -N >&6 -- xapply -fzmu -P6 %+ -
        51323 xapply -fzmu -P6 %+ -
          51428 /usr/local/bin/xclate 0
            51429 /bin/ksh -c ptree 1111  _
              51444 ptree 1111
          51430 /usr/local/bin/xclate 1
            51433 /bin/ksh -c ptree 1111  _
              51445 ptree 1111
          51431 /usr/local/bin/xclate 2
            51434 /bin/ksh -c ptree 1111  _
          51440 /usr/local/bin/xclate 3
            51441 /bin/ksh -c ptree 1111  _
      51324 hxmd -Csite.cf ptree 1111
        51450 (m4)
    51321 more

The process tree shows us a few aspects of hxmd that are pretty enlightening:

hxmd splits into 2 processes.
The first process (51320) runs the xclate sub-tree, the second process (51324) works with the first to start the tasks and process the m4 requests. You might see child m4 processes under the second hxmd process.
These 2 processes act as a frame around the processes that power the simple machine we've built with our xapply command template.
hxmd wraps an xclate -m around all sub-commands.
This xclate machine is like a "block and tackle" we are going to use to lift a load. The load is the problem we are going to solve, and the parallel tasks xapply manages are the "threads" that provide the tension. One axle is the parent xclate, while the leaf xclates form the other axle.
hxmd forces the -u option to xapply
This provides unique names for the "threads" used by the block and tackle to lift the load. That allows hxmd to identify which of the many tasks failed (after processing).

We could do most of that with xapply -mu already, so we need to press a little more before we stop reading. The configuration management applications add a lot more value.

Adding some spice

A key element of systems administration is the dark place called "configuration management". In short, knowing what gets put (or got put) on each server, switch, disk, tape, and workstation and why it is (or was) there. We should keep the formative data about each element in the ITIL CMS (which used to be called the CMDB), but where exactly is that?

I propose that no single repository of authoritative information exists, or can exist -- it is too much to keep up-to-date, and not every structure can be coerced into consulting a centralized configuration management repository. A federated CMS, which keeps different data sources under the control of different masters, might work where a single repository doesn't.

To make that model work, we need a tool which joins these disparate data sources, allows us to use the joined stream to select items to act upon, then fits the data and items into the format we want (to take the action we require). That is exactly what hxmd does for us.

Team m4

To support our plan, we need to format many disparate data sources in a common way. One might try XML, YAML, or any database format -- and I did try a lot of those. As I did, it became clear (at least to me) that each was more trouble than it was worth. For example, building a DTD for each data channel was a pain, and the DTD was always either out-of-date or effectively a meaningless place-holder for a list of name=value pairs.

I needed a format that included the mark-up definition inside the data stream (like column headers do on paper). I wanted to be able to add "columns" at will, and maybe even change columns or column order in mid-stream. Effectively, I wanted the same format mjs and I used 20 years ago when we wrote distrib.

Over the years, distrib has been used (abused) to solve many problems it was never intended to solve. So hxmd is designed to be a platform to replace all those ad hoc solutions. After replacing distrib I planned to replace the old "master source" makefile structure with msrc. Before you read about that you should really finish reading this.

Each authoritative data source linked into hxmd keeps controlled data in an arbitrary format, as long as it can output in the format below when asked to join the team. In that way, we get the best of everything: a clean format that is easy to produce and self contained, and no restrictions are placed on how that data is actually managed behind that public view.

The common data format

Configuration data for hxmd is kept as a white-space separated table. Any line that starts with a percent-sign (%) denotes the column headers for each subsequent row, until the next header line, or the end of the file. Any line of the form
macro=value
is taken as a virtual column that applies to all subsequent rows, until the next definition of that name.

There is a special value called "dot" which is an unquoted period (.) which undefines a column or macro. When needed as a value, the period is enclosed in quotes (as below).

Since the values are white-space separated, we need a way to quote values containing white-space: either the C double-quotes ("..."), or the m4 default quotes (`...') may be used to suppress the special meaning of white-space, dot, or the other kind of quotes.

Since someone might not put in a header line we need a default. The default header line is just the name of the key macro given under the command line option -k (which defaults to "HOST"). So a file with no header lines should contain a list of hostnames, and maybe some macro definitions.

Each file starts over with the default header; if you want to catenate files, use cat.

The meaning of a row

With all that said, we need to give meaning to those tables.

The tables represent macros used to process streams through m4. The names must be valid m4 macro names, and the values should make sense in the context of our expected m4 expanded stream.

Each row in a table represents some managed node (a server, switch, disk, tape, workstation, or other CI).

Identical spellings of a key represent multiple aspects of the same node, so rows are joined on the key macro. For example, some file might specify serial numbers for each node, while another would specify the location in the physical world:

%HOST    SNUMBER
adm1.npcguild.org	5271009A
adm2.npcguild.org	FR1729CAB
...
%HOST    LOCATION
adm1.npcguild.org	A2-13:3
adm2.npcguild.org	A5-19:7
...

These files could be given to hxmd as a single stream, or as 2 separate files (under -C). The difference that makes is subtle and explained under -B below.
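
The join on the key can be sketched with awk. This is only an illustration of the idea under simplifying assumptions (no quoted values, no macro=value lines); join_on_key is a hypothetical name of mine, and hxmd's own parser handles much more.

```shell
# Sketch only: merge rows from several hxmd-style tables on the key column.
# join_on_key is a hypothetical helper; quoting and macro=value lines are ignored.
join_on_key() {
    awk -v key=HOST '
        FNR == 1 { ncol = 1; col[1] = key; kcol = 1 }   # each file starts with the default header
        { sub(/[ \t]*#.*/, "") }                        # strip comments
        NF == 0 { next }
        /^%/ {                                          # header line: record the column names
            sub(/^%[ \t]*/, "")
            ncol = split($0, col)
            kcol = 0
            for (i = 1; i <= ncol; i++) {
                if (col[i] == "%") col[i] = key         # "%" stands for the key macro
                if (col[i] == key) kcol = i
            }
            next
        }
        kcol && NF >= kcol {
            k = $kcol
            if (!(k in attrs)) { order[++n] = k; attrs[k] = "" }
            for (i = 1; i <= ncol && i <= NF; i++)
                if (i != kcol && !((k, col[i]) in set)) {   # the first definition sticks
                    set[k, col[i]] = 1
                    attrs[k] = attrs[k] " " col[i] "=" $i
                }
        }
        END { for (i = 1; i <= n; i++) print order[i] attrs[order[i]] }
    ' "$@"
}
```

Given the two tables above as separate files, the sketch prints one line per host with both SNUMBER and LOCATION attached, in the order the hosts were first seen.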

The main concept here is that, to hxmd, the only important macro is the key; the others are just attributes of that element. Any row that doesn't define the key macro is meaningless, and causes hxmd to complain. When a header line doesn't define a HOST macro you might be in trouble.

It is possible to change the key macro name (under -k), but it is a much better idea to code a processor around hxmd that forces options such as -k, then calls hxmd indirectly. This tactic prevents a lot of command-line botches, and is far easier to document. See below.

Because it is possible to change the key macro on the command-line, you may use the percent-sign (%) in place of the key macro name in any header line. So another way to write the first example file above would be:

% %    SNUMBER
adm1.npcguild.org	5271009A
adm2.npcguild.org	FR1729CAB
...

The meanings of other macros

Any other macro has only the meaning you assign to it, or the meaning a processor coded on top of hxmd assigns to it. For example, the master source builder msrc uses many attribute macros; see the introduction to msrc for more details.

On a related note: sometimes a configuration file may contain attributes that you do not actually need or want defined. For example:

%SHORTHOST	%		OWNER		SNUMBER		COLOR
adm1	adm1.npcguild.org	`Bob Builder'	5271009A	silver
adm2	adm2.npcguild.org	`Tom T. Thumb'	FR1729CAB	black
...
If I do not want OWNER defined by this file, I need to find a way to direct that column to the bit-bucket. It is hard to parse the quotes in each line to replace the value with dot, but it is pretty easy to replace the word "OWNER" in the header line with dot. A sed spell or perl -pi -e fixes the header without dealing with the quoted values. By changing the header to:
%SHORTHOST	%		.		SNUMBER		COLOR...
We ask the configuration file processor to ignore the column with the owners in it. And it does. This is not often required when you control all the configuration files, but if you exchange data with other operational organizations you might have to filter their files.

In some cases you want a column to have a common value for most elements, but a unique value for a few elements. In that case set the macro to the common value, include it in the header, and use the token .. where you want the common value.

COLOR="black"
%SHORTHOST	%		OWNER		SNUMBER		COLOR
adm1	adm1.npcguild.org	`Bob Builder'	5271009A	silver
adm2	adm2.npcguild.org	`Tom T. Thumb'	FR1729CAB	..
adm3	adm3.npcguild.org	`E. Ripley'	LV426		..
...
This prevents the repetition of the common value, or allows the common value to be set with a -D from the command line or from a master source control file. The output from -o allows such global definitions by using double-dot to label any undefined macros requested for each host.

As with a dot, you can quote the double-dot token to remove the special meaning.
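
To see how the dot and double-dot tokens read in practice, here is a small awk sketch. It is an illustration under loud assumptions: expand_rows is a hypothetical helper, it only strips C-style double quotes from macro values, it splits rows on plain white-space (so no quoted values with blanks), and it does not model how a later header line drops earlier bindings.

```shell
# Sketch only: show how "." in a header and ".." in a row are read.
# expand_rows is a hypothetical helper; it ignores m4-style quotes and
# does not model how a new header line drops earlier bindings.
expand_rows() {
    awk -v key=HOST '
        BEGIN { ncol = 1; col[1] = key }          # default header is just the key
        { sub(/[ \t]*#.*/, "") }                  # strip comments
        NF == 0 { next }
        /^%/ {                                    # header line
            sub(/^%[ \t]*/, "")
            ncol = split($0, col)
            for (i = 1; i <= ncol; i++)
                if (col[i] == "%") col[i] = key
            next
        }
        /^[A-Za-z_][A-Za-z0-9_]*=/ {              # macro=value line
            name = $0;  sub(/=.*/, "", name)
            value = $0; sub(/^[^=]*=/, "", value)
            gsub(/^"|"$/, "", value)              # drop C-style double quotes
            dflt[name] = value
            next
        }
        {
            line = ""
            for (i = 1; i <= ncol && i <= NF; i++) {
                if (col[i] == ".") continue       # dot in the header: skip the column
                v = $i
                if (v == ".." && (col[i] in dflt)) v = dflt[col[i]]
                line = line (line == "" ? "" : " ") col[i] "=" v
            }
            print line
        }
    ' "$@"
}
```

Fed a header with a dot in the OWNER position and a row ending in the double-dot token, the sketch drops the owner column and fills COLOR from the COLOR="black" line.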

A common mistake is to change a common macro from a header to a fixed value in the wrong order:

%SHORTHOST	%		OWNER		SNUMBER		COLOR
...host definitions...

COLOR="white"
OWNER="CoreTech"
%SHORTHOST	%			SNUMBER
host10		host10.npcguild.org	4807526976
The error here is that the new header declaration removes the binding of both COLOR and OWNER after the fixed assignment. The header change must come before the fixed binding:
%SHORTHOST	%			SNUMBER
COLOR="white"
OWNER="CoreTech"
host10		host10.npcguild.org	4807526976

Attributes from more than a single source?

There are 5 ways to input an attribute macro for m4 processing.
-C config
All configuration files provisioned under -C create a definition for any undefined hosts, plus set any nonexistent macros for already defined hosts. The first definition for an attribute is the one that sticks.
-X ex-config
All configuration files provisioned under -X only set any nonexistent macros for already defined hosts. No hosts which are only mentioned in such a file are defined by their mention.
-Z zero-config
All configuration files provisioned under -Z set default values for macros. Normally all macro=value lines are discarded at the end of each file; under -Z the macros in effect at the end of the files are taken as default values. These are applied after every other file has been read.
the m4 option -D name=value
The -D and its parameter are passed on to m4 as given. Any macro defined under -D is visible from m4, but not visible to -B checks.

These are also recorded at the top of any -o configuration file generated by hxmd, unless the name is prefixed with an exclamation mark (!) or an octothorp (#, aka hash mark). When presented with the punctuation prefix the name is presented in the merged file as a comment line (without the value).

When visible and used as a configuration source for -C, -X, or -Z these macros are as visible as any other.

the command-line option -j's m4prep
Any macro defined in these files is visible from m4, but not visible to -B checks. There is no built-in way to convert an arbitrary m4 source file into a configuration file.

A special note about the default values provided under -Z: the values must not be bound to header attribute names. At the end of a configuration file, all the header attribute names are discarded (there is an example of this in the manual page for hxmd in section 5).

Explicitly returning to the default key before any default values are defined is a good idea, because of the restriction above. The common idiom in a file designed for -Z is:

...
# Default values reason
%%
MAX_SLIDE=`landslide'
MIN_SLIDE=`rule'

How do we elect hosts to process?

Now that we have a list of all the hosts and their attributes, we need to put them to work.

Some tasks may operate on a complete set of hosts; more often tasks want to work with a subset of the total population. There are 4 hxmd options that form an election process to pick a set of hosts for a task.

Yes, the best model to put here is an election. Each host can suggest a host that should be included in the task at hand. Most hosts suggest themselves, some hosts might suggest more than a single host, some might suggest none.

First a host must be defined in a configuration file. Usually that means under -C config, but in rare cases hosts might be defined under -Z zero-config. (The latter is considered poor form by the author.)

The easy way to elect a group of hosts is to use a configuration file that only includes the hosts you want. This works well because any host that is never defined cannot be elected, and by default each host suggests itself.

hxmd -C site.cf ...

If you can't find a configuration file that has only the hosts you want, you can build a Venn diagram from multiple configuration files, using -B number. For example, the intersection between two files (ying.cf and site.cf):

hxmd -C ying.cf:site.cf -B2 ...

While the complement of that intersection may be had with:

hxmd -C ying.cf:site.cf -B!2 ...

The first example selects hosts that are defined in 2 files, the second excludes those same hosts.

By using an attribute macro name rather than an integer, we elect each host that has that macro defined in a configuration file. The m4 option -D doesn't count as a definition of the attribute for this option (as it would apply equally to all hosts, making the specification useless).

The names parameter may specify a comma separated list of attribute macros that must all be defined for each elected host. Each may also be prefixed with an exclamation mark (!) to indicate that the host must not have the attribute macro defined. For example, to elect those hosts from both files with the attribute macro YANG defined:

hxmd -C ying.cf:site.cf -B 2,YANG ...
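
The counting behind a numeric -B can be pictured with a few lines of awk. This is a sketch of the idea only: hosts_defined_in is a hypothetical helper, each file is assumed to be a plain list of hostnames, and the sketch matches the count exactly, which is all the two-file examples above require.

```shell
# Sketch only: list the hosts that are defined in exactly N of the given files.
# hosts_defined_in is a hypothetical helper; each file is assumed to hold one
# hostname per line (the default header), with "#" comments allowed.
hosts_defined_in() {
    want=$1; shift
    awk -v want="$want" '
        { sub(/[ \t]*#.*/, "") }
        NF == 0 || /^%/ { next }
        !seen[FILENAME, $1]++ {               # count each host once per file
            if (!($1 in count)) order[++n] = $1
            count[$1]++
        }
        END {
            for (i = 1; i <= n; i++)
                if (count[order[i]] == want) print order[i]
        }
    ' "$@"
}
```

So hosts_defined_in 2 ying.cf site.cf lists the intersection, and hosts_defined_in 1 ying.cf site.cf lists the hosts found in only one of the two files.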

Some suggestions for additional forms of -B have been proposed and rejected.

-B THIS|THAT (or)
This is better coded under -E as
hxmd -C site.cf -E "ok=ifdef(\`THIS',ok,ifdef(\`THAT',ok))" ...
-B THIS^THAT (exclusive or)
This is better coded under -E as
hxmd -C site.cf -E "ok=ifelse(ifdef(\`THIS',yes),ifdef(\`THAT',,yes),ok)" ...
-B -1 (all but one)
The issue here is that -Z configuration files may define hosts, so we don't know what the expected total is for any given host. This is just too error-prone.

Just the existence of a macro might not be specific enough; we may need to check the value. For that task, the option -E compares an attribute macro to a value as either a string or an integer using m4.

In the string case the compares expression begins with the macro name (name), then either an equal sign (=) or an exclamation mark (!), then an m4 expression. For example:

hxmd -C site.cf -E COLOR=blue ...

For an inequality, we would use the exclamation mark:

hxmd -C site.cf -E COLOR!none ...

Hxmd differentiates between string and integer operators by the spelling of the relational operator. The integer operators are taken from C: equal (==), not equal (!=), less-than (<), less-than or equal (<=), greater-than (>), and greater-than-or-equal (>=). At least the last 4 of these must be quoted from the shell (some shells treat exclamation mark as special as well, which is sad).

Like the string comparison, the first word should be an attribute macro name, but may also be a signed number. It is followed by the operator and an expression that m4's eval macro can resolve. For example, to elect hosts with an AGE greater-than 7:

hxmd -C site.cf -E "AGE>7" ...

Note that the macro name can be a complete macro expression with parenthesized parameters:

hxmd -C site.cf -E "COMPUTONS(CPU,NPROC)>1000" ...
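
I have not shown how hxmd builds these tests internally, so take the following as a sketch of the two m4 idioms such comparisons can rest on: ifelse for strings and eval for integers. The helper names str_compare and int_compare are hypothetical, and the single NAME=value argument stands in for the attributes hxmd would bind for a host.

```shell
# Sketch only: the m4 idioms behind a string test and an integer test.
# str_compare and int_compare are hypothetical helpers, not hxmd options.

# str_compare NAME=value MACRO TEXT: succeeds when MACRO expands to TEXT
str_compare() {
    [ "$(printf 'ifelse(%s,%s,yes,no)\n' "$2" "$3" | m4 -D"$1")" = yes ]
}

# int_compare NAME=value EXPR: succeeds when eval(EXPR) is 1 (true)
int_compare() {
    [ "$(printf 'eval(%s)\n' "$2" | m4 -D"$1")" = 1 ]
}
```

For example, str_compare COLOR=blue COLOR blue succeeds, while int_compare AGE=3 'AGE>7' fails because eval yields 0.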

Usually those matching criteria are enough to elect the correct list of hosts. However, there is a larger hammer we can use to describe indirect relationships.

Say that we want to elect the NFS server that a host depends on, rather than the host itself. If we have an attribute macro NFS_EXPORT bound to each host we can select those that provide an NFS export service with

hxmd -C site.cf -B NFS_EXPORT ...

But that only self-elects each host. Assuming that the attribute NFS_SERVER contains a host from the site.cf that the target host depends on we can use a guard under -G to nominate that host, rather than ourself:

hxmd -C site.cf -G NFS_SERVER ...
We might also include -B NFS_EXPORT to prevent the election of a host without the export attribute definition in-scope.

When our NFS servers are not listed in site.cf, we must add some other file to the -C list, otherwise they'll not be eligible, since they are not defined at all. Note that the guard expression can be an arbitrary m4 expression, and may include more than a single host (separated by white-space).
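
The nomination rule can be pictured in awk. This is a sketch under my own assumptions: elect_nfs_servers is a hypothetical helper, the input is a simple two-column table of HOST and NFS_SERVER, and a host nominated more than once is still elected only once.

```shell
# Sketch only: each host nominates the host named in its NFS_SERVER column;
# a nominee is elected once, and only if it is itself defined in the table.
# elect_nfs_servers is a hypothetical helper reading "HOST NFS_SERVER" rows.
elect_nfs_servers() {
    awk '
        { sub(/[ \t]*#.*/, "") }
        NF == 0 || /^%/ { next }
        { defined[$1] = 1; nominee[++n] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                h = nominee[i]
                if ((h in defined) && !(h in elected)) { elected[h] = 1; print h }
            }
        }
    ' "$@"
}
```

A nominee that never appears in the first column is silently dropped, which is the sketch's version of "not defined, so not eligible".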

What order are hosts processed in?

As configuration files are read, the first definition of each host specifies the order of processing. Any zero-config files are read first, then all config files, and lastly each ex-config. Files to the left (on the command line) are read before files of the same class given to the right. All this makes the order of processing hard to predict if any file in the list is not one your process created, or some command options are passed in environment variables.

When a specific order is required, the m4 markup specified by a guard may do more than elect a host: it may wrap the elected host in markup to reorder it via m4's divert macro.

When we need to reorder a selection list, we divert each host to a selected diversion (read more about diversion in m4's manual page).

The command line options -Y and -j set up any macros needed for such logic. Each top specification is an m4 directive placed at the top of the election stream. This is usually used to include a file of macro definitions, which enclose the hosts elected by the guard specifications. It is really "poor form" to include a hostname in a top macro to manually elect it. (Which is not to say that it doesn't work; it just smells like it is there for the wrong reason, and maybe went bad due to poor hygiene.)

To express an example, we'll update my site.cf to include a COLOR:

# ksb's test list with colors
%HOST			COLOR
w01.example.com		blue
w02.example.com		red
adm1.example.com	blue
adm2.example.com	green

Then sort red hosts first, then blue, then anything else:

hxmd -P1 -C site.cf \
	-G "divert(ifelse(COLOR,red,1,COLOR,blue,2,3))HOST" 'echo HOST COLOR'

Which outputs:

w02.example.com red
w01.example.com blue
adm1.example.com blue
adm2.example.com green

We had to specify -P1, because the parallel factor tends to obviate the effects of such logic. On the other hand, the file creation processes are always done sequentially, in the order the hosts are elected. Note that such a sort is stable, which is handy under -o below.
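
The diversion trick is plain m4, so it can be tried without hxmd at all. The sketch below feeds m4 one pair of definitions and one copy of the guard text per host; m4 then flushes the diversions in numeric order at end of input. The helper order_by_color is hypothetical, and the way it binds HOST and COLOR is my stand-in for whatever hxmd does in its election stream.

```shell
# Sketch only: reorder hosts with m4 diversions, as the guard above does.
# order_by_color is a hypothetical helper reading "host color" pairs on stdin.
order_by_color() {
    while read -r host color; do
        # bind the attributes for this host, then emit the guard text
        printf "define(\`HOST',\`%s')define(\`COLOR',\`%s')dnl\n" "$host" "$color"
        printf '%s\n' 'divert(ifelse(COLOR,red,1,COLOR,blue,2,3))HOST'
    done | m4
}
```

Hosts that land in the same diversion keep their input order, which is why the sort is stable.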

The -j specification includes a common m4 markup file in each phase of the process. This file might contain macros to make the diversion process look simpler on the command line:

hxmd -P1 -j rainbow.m4 -C site.cf \
	-G "SpectrumOrder(COLOR)HOST" 'echo HOST COLOR'

How can I remember what was selected?

From time to time, a single instance of hxmd must execute another to update some other aspect of the target hosts in concert with the current task. In that case, it would be more than handy to remember the results of the current election. This would allow additional updates to target exactly the same population, or even a related set (subset, super-set, or complement) of hosts.

The command line option -o remembers a list of attributes for each of the elected hosts in the elected order. The filename containing the election results is available in the attribute macro HXMD_U_MERGED.

As an example I'll use the site.cf with the COLOR attribute (above):

hxmd -P1 -C site.cf -o "COLOR BLEACH" \
	-D BACKGROUND=cyan \
	-G "divert(ifelse(COLOR,red,1,COLOR,blue,2,3))HOST" \
	'[ %u -eq 0 ] && cat HXMD_U_MERGED'

The test expression [ %u -eq 0 ] && is a common idiom to limit the output to the first host elected, but not the only way to do that. See below. If you take that clause out, you should see multiple copies of the elected hosts. Here is the output with the clause in place:

BACKGROUND=`cyan'
%HOST COLOR BLEACH
w02.example.com red ..
w01.example.com blue ..
adm1.example.com blue ..
adm2.example.com green ..

Notice that no host defines the attribute BLEACH. We can collect attributes that are not set (yet) then merge in other configuration files (usually with -C or -X) to add their values down-stream. Because the output represents the undefined value with the double-dot token, the value may be specified later on the command line with a -D specification, or from a zero-configuration file given on the command line.

Note the -D option in the command: the value is represented above the header line. This is an attempt to preserve command line definitions, but it doesn't always do what you want. One reason is that such definitions are now defined for -B, another is that you might not want the command-line definition passed on; this is true for msrc's interface to hxmd. For both reasons one may suppress this feature with a prefix of ! (or #) on the specification as:

hxmd -P1 -C site.cf -o "COLOR BLEACH" \
	-D !BACKGROUND=cyan \
	-G "divert(ifelse(COLOR,red,1,COLOR,blue,2,3))HOST" \
	'[ %u -eq 0 ] && cat HXMD_U_MERGED'
which replaces the in-line definition with:
#BACKGROUND
%HOST COLOR BLEACH
w02.example.com red ..
...
The comment is there to help someone debug why the attribute was not preserved.

How do we process each node?

Both the control and files are run through m4 once for each host with the attributes in-scope for that host defined. The control text itself is processed, while the contents of each of the files are processed into temporary files.

The m4 processes mentioned above are given the m4-opts from the hxmd command line, then each of the m4prep files provided under -j, then the text to be expanded on stdin. The m4prep files serve the same purpose as -Q or -Y header lines: they allow the inclusion of some initialization code at the start of every m4 filter run. It would be poor form to produce any text at all from any of these files. (They are files because m4 doesn't have a command line markup specification parallel to our -Y.)

After that processing is complete, the set of parameters is passed on to xapply for execution. The output from control is sent as text, while the name of the temporary file is sent for each of files.

Normally the parameters after control are interpreted as filenames by hxmd. Under -F, more (or fewer) of those parameters may be taken as literal text. The default value for literal is 1. That expresses that a single parameter (the left-most) is to be processed as text, and all remaining strings are files.

The reason one might want to change this centers on using xapply's dicer. Using the dicer (or mixer) on the temporary filenames hxmd generates is less than useful. Using those tools on an attribute macro might be very useful. With appropriate use of -j, a substantial amount of data may be collected in a short expression on the command-line.

hxmd -I -- -j report.m4 -F2 "echo HOST:%[1,2]" "report(HOST)"
In this example, the dicer expression after the echo command uses the output from the report function. I'm assuming that the file report.m4 contains the definition of the report function, and that this function is useful enough to share with other team members. That way the second text parameter may use it to generate a well-formed report on the host, without hard coding it on the command-line.

Let's change site.cf to replace COLOR with the old distrib macro SHORTHOST. Here is a new site.cf, with an error in it:

%HOST SHORTHOST
w01.example.com w01
w02.example.com w02
adm1.example.com adm1
adm2.example.com adm1

This was a common error in older configuration files, where one builds SHORTHOST by hand. To compare the SHORTHOST of a host to the part before the first "." in HOST we might use:

hxmd -F2 -C site.cf \
	"[ _%[1.1] = _\`'SHORTHOST ] || echo HOST %[1.1] SHORTHOST" HOST

That code does output the expected results (adm2.example.com adm2 adm1).
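
For comparison, the same check can be written in plain sh, where the parameter expansion ${host%%.*} plays the role of the dicer's %[1.1]. This is a sketch for reading the logic, not a replacement for the hxmd command; check_shorthost is a hypothetical helper that reads the two columns from stdin.

```shell
# Sketch only: report rows whose SHORTHOST differs from the first label of HOST.
# check_shorthost is a hypothetical helper reading "HOST SHORTHOST" rows on stdin.
check_shorthost() {
    while read -r host short; do
        case $host in ''|'%'*|'#'*) continue ;; esac   # skip blanks, headers, comments
        first=${host%%.*}                               # text before the first "."
        [ "_$first" = "_$short" ] || echo "$host $first $short"
    done
}
```

Run as check_shorthost < site.cf against the file above, it reports only the adm2 row.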

There are 2 other cases where setting -F might help: setting it to zero and to a negative number.

Setting literal to zero forces the control parameter to be interpreted as a file, rather than literal text. This script is only executed after it is processed through m4. This allows a long control command to be kept in a file, under revision control.

Setting literal to a negative value takes arguments on the right as literal values. This allows a script (as above) to bind attribute values to positional parameters.

Adapter logic between hxmd and xapply

Xapply requires a fixed cmd parameter, but the control parameter hxmd has expanded through m4 processing might be unique for each host elected. Looking at the traditional xapply usage it looks like bridging that gap would be very hard, and it would be without the xapply expansion %+. The "shift and eval" function %+ provides is explained in xapply's section on %+.

The hxmd command line option -c allows the specification of a replacement for the default %+ that hxmd uses to point xapply at the filtered control parameter.

As an example we could force a date command before and after each control:

hxmd -c "date;%+;date" -C site.cf 'echo HOST'
or to wrap an op (or sudo) around each command:
hxmd -c "op %+" -C site.cf 'echo HOST'
This is usually not needed, even when you script a very complex hxmd process. A driver shell command built on hxmd might use this hook to set up and tear down structural elements.

When hxmd calls hxmd

Some m4 attribute macros are defined by hxmd to help one hxmd call another instance:

HXMD_OPT_C
The list of files read under -C. The paths are converted to absolute references so that a change of working directory will not (by default) change the file referenced. When stdin is referenced, it is copied to a temporary file, and that filename is used in place of -.
HXMD_OPT_X
The list of files read under -X. With the same path conversions as above.
HXMD_OPT_Z
The list of files read under -Z. With the same path conversions as above.
HXMD_U_MERGED
Under -o this is the file of merged attributes for the unique hosts elected.
HXMD_U_COUNT
The count of the total number of unique hosts defined.
HXMD_U_SELECTED
The count of the number of unique hosts elected.

Other than HXMD_U_COUNT, these are only defined when hxmd had valid data for them. For example, HXMD_OPT_Z is left undefined when no file was read as a zero-config.

The phases of m4 processing

As hxmd builds m4 streams to process the various files, it includes a synthetic macro HXMD_PHASE to help included files self-configure for the context in which they are included. This is actually mostly useful when you are debugging complex processes that are based on this stack.

The macro can have any of these values:

selection
The current m4 stream is part of the selection of target keys.
integer
As each of the files is processed this integer increments (starting at zero).
filter
The current m4 is constructing the exit code list. (See the next section for the retry process flow.)
redo
The current m4 is constructing the redo command.

As a simple example, we'll output the phases for 2 positional "files" (actually literal strings to make it fit on a single line):

$ hxmd -Cauto.cf -E HOST=localhost -F2 'echo HXMD_PHASE' 'HXMD_PHASE'
0 1

How might I retry failed processing?

Updates processed by hxmd usually depend on the target hosts being available. When a host fails an update, it would be great if an agent process could ask for a redo.

To detect failed commands we need to examine the exit codes from each of the control processes and take an action on that list. To do this, hxmd processes two m4 streams: a list and a command. Hxmd doesn't specify the format of the list being constructed: a shell script, or a configuration file in the hxmd style, or many other text data file formats are all possible. No limits are placed on the command either; it may have a loader line to run any command processor.

In the context of these m4 streams, several synthetic attribute macros are defined by hxmd:

HXMD_U
The value that xapply gives for %u for each host.
HXMD_STATUS
Usually the exit code for this host's process. If the m4 processing for a command failed, then the code will be 1000 plus that exit code. If any m4 process (of the command itself) exits with a signal, then the status value is 2000 plus the signal number. When xapply receives a USR1 signal it short circuits any remaining tasks (use kill -USR1 %p to trigger this in your command). Any short circuited commands are assigned the synthetic status 3000.
HXMD_0
The name of the list file we are building. By default this is a 3 column list of "HOST HXMD_STATUS HXMD_U".
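
The status ranges described for HXMD_STATUS can be decoded with a little Bourne shell arithmetic. This is only a sketch of that decoding, not code taken from hxmd: the function name decode_status and the words it prints are my own inventions, and I assume the status is always a non-negative integer.

```shell
# decode_status: hypothetical helper (not part of hxmd) that names the
# range an HXMD_STATUS value falls in, per the description above.
decode_status() {
	status=$1
	if [ "$status" -ge 3000 ]; then
		# 3000 marks a task xapply short circuited; treating larger
		# values the same way is an assumption of this sketch
		echo "skipped"
	elif [ "$status" -ge 2000 ]; then
		# 2000 plus the signal number
		echo "signal $((status - 2000))"
	elif [ "$status" -ge 1000 ]; then
		# 1000 plus the exit code of the failed m4 processing
		echo "m4-failed $((status - 1000))"
	else
		# the plain exit code of the host's process
		echo "exit $status"
	fi
}
```

For example, decode_status 2015 reports a signal 15, while decode_status 1 reports a plain exit code of 1.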

The list is generated in much the same way the original election process generated the list of elected hosts. In this case, we start with the list we elected and select for the hosts that we need to retry, so not every host needs to be passed on to the command. Normally, this file is used as input data to the command. This lets the retry command act as a filter for the list, if needed.

To frame the list, each ward specified on the command-line (under -Q) is output at the top of the m4 input stream, as -Y's top directives were in the election process.

A block of m4 markup is output for each elected host. It starts with the attribute definitions defined for the host, then the attributes listed above, and all -r options to form the selection criteria. Also included are all the macros defined in the recursive call above. This script is run under HXMD_SHELL, SHELL, or /bin/sh, whichever is first available of that list.

The filter command is sent through an m4 process with the same attributes defined, except for the host specific attributes. The results of that stream are the retry command.

If the command starts with a pipe (|) then HXMD_0 is used as stdin and the pipe symbol is removed. Otherwise the command should read (incorporate) HXMD_0 to process the list of selected hosts.

The default value for filter is a shell command to page the generated list. The default list contains (in order) the values of HOST, HXMD_STATUS, and HXMD_U. To see those defaults specify -Q as a no-op, like dnl:

hxmd -Q "dnl" -C site.cf  ... :
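
Since the default list is just three whitespace-separated columns, a retry command that reads it on stdin can be very small. Here is a sketch of such a filter; the name failed_hosts is hypothetical, and I am assuming the list arrives exactly in the default "HOST HXMD_STATUS HXMD_U" layout with one host per line.

```shell
# failed_hosts: hypothetical filter, reads the default 3 column list
# (HOST HXMD_STATUS HXMD_U) on stdin and prints each host whose
# status is not zero.
failed_hosts() {
	while read -r host status u; do
		# u (the %u value) is read only to keep the columns aligned
		[ "$status" -eq 0 ] || echo "$host"
	done
}
```

A command that starts with a pipe gets HXMD_0 as stdin, so a filter of this shape could sit at the head of that pipeline. How you wire it into your own -K specification is up to you.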

No hosts elected

Another situation that one might view as a retry case is when no hosts were ever elected.

The xapply option -N triggers when no hosts are elected, but the option is not run through any m4 processing. In this case, it is more typical to code a script with a name like fallback:

hxmd -N "./fallback %0" -C site.cf  ...
Such a script should exist before the hxmd process begins. The exit code from this process is passed back from hxmd.

Another common fallback is to use a make target, to trigger the build process only if no hosts are elected:

hxmd -N "make fallback %0" -C site.cf  ...

Other tricks and abstractions

I wouldn't be doing hxmd justice if I didn't put some more context here: we don't use it in isolation, it is part of a larger tool-set. So here are some of the linkages.

Fetch an element by number

Once in a while, it is handy to be able to iterate over the hosts in a configuration file outside of hxmd. I almost never do this, but in the next section we'll look at a case where I just want the first element from the configuration to take an action, which amounts to the same thing.

Let's code a guard to extract a host from the current configuration file by number. Let's start with the number of the key to match on the command-line as a define specification to m4:

$ hxmd -DPick=1 -C ...
Then we should add the logic to make that into a loop counter:
hxmd -DPick=1 -G "define(\`Pick',expr(Pick+0-1))ifelse(Pick,0,HOST)" \
	-C ... 'echo HOST'

The best part of that logic is that it works with -E and -B. If you add a limit on the COLOR of the host, it will pick the N-th one of the color you selected. This allows an external program to index a configuration file by a counting number, which is a clean abstraction. I almost never use it myself. The end of the loop is the empty string, or you can fetch the total number of hosts selected with the first key request, that's up to you.
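
If the countdown in that guard is hard to read through the m4 quoting, the same logic looks like this in plain shell. This is only an analogy I wrote for illustration (pick_nth is a hypothetical name); it reads host names from stdin rather than from a configuration file.

```shell
# pick_nth: hypothetical shell analogy of the Pick guard above.
# Decrement the counter for each host seen, and output the host
# on which the counter reaches zero.
pick_nth() {
	pick=$1
	while read -r host; do
		pick=$((pick - 1))
		if [ "$pick" -eq 0 ]; then
			echo "$host"
		fi
	done
}
```

Asking for an index past the end of the list outputs nothing, which matches the empty string that marks the end of the loop.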

You could write a guard to pick the odd or even hosts from the list even more easily. Note that you can't use HXMD_U because it is only defined in phases after selection.

Forming a posse

Currently hosts are pretty cheap, and virtualization allows us to construct new images almost at will. Once we have hosts running we need to be able to group them into abstract clusters, clouds, gangs, teams, or whatever you want to call it: I call them posses.

As an example, we'll base posse membership on the services configured for each host. This is an abstraction I use a lot to manage both applications and support facilities on my hosts. You can create posses from other data sources, but since we are using the tools at-hand we'll stick with this method in the examples here and in msrc's documents.

The service feature allows each host to specify a list of "services" attached to the attribute macro SERVICE. The list is space-separated and most of the services are lowercase words (by convention). Also, the underscore (_), colon (:), comma (,) and the commercial at sign (@) characters are never part of a service name.

The support macro SERVICES answers the question, "Does the current host support the service named $1?"

For example, the host "sulaco" supports the "http" and "msrc" services in this configuration file:

SERVICES=`ifelse(-1,index(` 'SERVICE` ',` '$1` '),0,yes)'
%HOST
SERVICE="http msrc"
sulaco.example.com
SERVICE="msrc"
nostromo.example.com
SERVICE="terraform lunch"
lv426.example.com
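
The SERVICES macro pads both the list and the requested name with spaces, so that a name only matches as a whole word. The same trick in shell might look like the sketch below; has_service is a hypothetical name, and this is an illustration of the idea rather than anything hxmd runs.

```shell
# has_service: hypothetical shell version of the SERVICES test.
# $1 is the space-separated SERVICE list, $2 is the service name.
# Prints "yes" on a whole-word match and "0" otherwise, the same
# two values the m4 macro yields.
has_service() {
	case " $1 " in
	*" $2 "*) echo yes ;;
	*) echo 0 ;;
	esac
}
```

With the list "http msrc" a request for msrc answers yes, while a request for htt answers 0, because the padding keeps a partial word from matching.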

Because each host only knows the list of services it supports, it is hard to check from within the processing of any host to ascertain if any other host supports a given service. We'll talk about how to overcome that in msrc's description of posse indirection.

We need some script leverage to help us build a posse list at run-time. I'm going to make the script general enough that you could use any m4 and macro attributes at hand to make your posses. We are going to call mk with the marker Posse, and a submarker of the name of the posse we want to extract. The command selected needs to output the list of member hosts for the given posse to stdout. By adding the code to the configuration itself, it will always (we hope) know how to extract posse lists from itself (the idea being that a single file should agree with itself, while the script and any given configuration file might easily drift).

In the example above, we insert this spell near the top of the configuration file:

# $Posse(*): ${hxmd-hxmd} -C%f -P1 -E "yes=SERVICES(%s)" 'echo HOST'

We may later add specific marked lines before the default glob match for services that are not covered by that default spell, but we'll leave that for another time. See the mk HTML docs for more on why mk is so useful.

There is example code for this in the dmz.sh script, in the if block with the only export command in it. I'll build a shorter example here:

#!/bin/ksh
# Posse script: usage service-list configs
sList=${1?'missing service-list'}
shift
set -e
MK=
for cFile
do
	cPath=$(efmd -C $cFile -T HXMD_OPT_C dnl)
	xapply "mk -smPosse -d%1 $cPath" $(echo $sList |tr ':,' '  ')
done |oue

In the example above, we use efmd to change the specification of a configuration file into a path in the filesystem. We've not yet talked about efmd, but it is a report generator that takes most of the same options as hxmd but doesn't launch as many shell processes. In the real code, we use efmd to extract the path to each configuration file. Then we call mk on the configuration files to extract the posse membership lists. That process might use any tool it needs, and any attribute in the elements to extract the list.

In Real Life ™, we would also use efmd to extract the posse membership list, because it is way faster:

# $Posse(*): ${efmd-efmd} -C%f -E "yes=SERVICES(%s)" -L
By filtering all those lists through oue, we get a list of only the unique hosts (see the HTML document on oue for details). This outputs all the hosts in the "apache" posse from dmz.cf once when run as:
$ posse apache dmz.cf
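
If you don't have oue installed and just want to see the shape of the data flow, the two text transformations in the posse script can be tried alone. This is a sketch with stand-ins: split_services and unique_lines are hypothetical names, the awk one-liner only imitates the "first occurrence wins" filtering described above (it is not oue), and I split the list onto separate lines rather than spaces to make the result easier to read.

```shell
# split_services: turn a list such as "apache:msrc,ntp" into one
# service name per line (the posse script maps the same two
# separators to spaces for xapply).
split_services() {
	echo "$1" | tr ':,' '\n\n'
}

# unique_lines: awk stand-in for the oue filter, keeps only the
# first copy of each line and preserves the input order.
unique_lines() {
	awk '!seen[$0]++'
}
```

Two posse lists that share a member, sent through unique_lines, come out with that member listed once.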

Listing all the unique values of any attribute

By changing the key macro, we can form a unique list of any attribute's values.

For example:

hxmd -Csite.cf -k HOSTTYPE 'echo HOSTTYPE'

This also has the feature that elements that do not define a HOSTTYPE output an error on stderr.

Avoiding resource impact on the source host

Hxmd takes care not to "fork bomb" the local host. This is a real issue for programs like ssh which consult (and update) a common file, as the contention for that file may reduce performance dramatically. The logic behind the slow-start code is explained in the manual page, as is the command line option -s which disables it.

Like most of my tools, the option -n prevents execution of the processed control directive. Unlike other tools, this output ends up on stderr. This is because xclate is translating the stdout from xapply to the widow stream, because it is not part of the output from any managed task.

Passing the -W option to xclate on the command line will direct those messages to another file:

hxmd -Csite.cf -W /tmp/todo.sh -n 'ping HOST'
cat /tmp/todo.sh

The file should be a clean script, because, with -n set, there is no way to get other widow output from xapply. Be aware that the script doesn't have the same effect as the original source: for one thing, any cd commands are cumulative, and any logic that exits the script aborts all update actions.

More importantly, the invariants established by ptbw are not enforced. This makes use of this output as a shell script quite unsafe.

Changing the key attribute name with -k is normally only done by a structure built on top of hxmd. See below.

Complex boolean logic is not hxmd's strong point, but with -o and a pipe we may build any conjunction or disjunction needed. Use the HXMD_U_MERGED file as output from the first hxmd:

hxmd ... -o "" '[ %u -eq 0 ] && cat HXMD_U_MERGED'

Another way to do this is with efmd, which is an extraction filter that processes the same configuration files as hxmd. See the HTML document for a few examples. This tool might not be installed on your system, as it is an add-on to the msrc tool-chain. It also can produce a merged configuration file on stdout, under the correct options, which is useful in recipe files that merge multiple realms' configurations for a common task.

This does have the bug that hxmd still forks a process for each host, all but 1 of which simply exit.

Another very clever trick is to use the -K option to save the HXMD_U_MERGED file produced by -o. For example:

hxmd -o "" -K "cp HXMD_U_MERGED $TFILE" ...

This limits the use of -K for any actual redo logic.

The command-line configuration file options (-C, -X, and -Z) all accept the directory name -- (double-dash) as a synonym for the first absolute path in the HXMD_LIB environment variable, or the built-in default path.

With that in mind we note that when a specification for a configuration file is given as a directory it is taken as a request for the name of the program suffixed with .cf (under -C), .xf (under -X), or .zf (under -Z). With those 2 rules we can use:

hxmd -Z --:my.zf ...
to ask for both the default zero configuration file, and my.zf. Sometimes you just want to add a feature to the site configuration, not override the whole thing.

Forward looking options -- gtfw

There is a forward-looking option (-g) to ask hxmd to wrap itself in a gtfw, if there is not presently any gtfw diversion open in the environment. For the purposes of this document that feature is not important; a later document explains why it is here.

Using ptbw options to allocate resources to each task

Ptbw allows tasks to access the limited resources they require; see the HTML document for ptbw. The key options hxmd passes down to xapply are -A, -J tasks, -R req, and -t tags. There are two ways to access the resources ptbw allocates for each task. One is xapply markup (viz. %t), the other is the append mode under -A. I prefer the append mode, because that is more useful under msrc.

Under an explicit -c cmd specification hxmd doesn't tinker with the command template provided. Otherwise it tries to build a template that does what you meant, in light of the SHELL set (sh, csh, and perl are all recognized) and the presence of -A.

Under -A tokens are passed as unevaluated parameters

The cmd template built when append mode is set adds an expression on the end of the command to catenate the ptbw tokens as additional parameters on the end of the expanded command. Under sh this would be the string " $@", under csh this would be the string " $argv[*]", under perl this would be the string ", @ARGV"; the default is to be Bourne shell compatible. That way, when xapply appends the tokens as positional parameters, they get appended to the command as parameters. This works well for most applications.

I'll use auto.cf to do some examples:

$ SHELL=/bin/csh hxmd -dX -AR2 -Cauto.cf -EHOST=localhost 'echo HOST'
hxmd: xclate -ms -N >&6 -- xapply -fzmu -sAP6 -R 2 %+ $argv[*] -
localhost 0 1

$ SHELL=/bin/sh hxmd -dX -AR2 -Cauto.cf -EHOST=localhost 'echo HOST'
hxmd: xclate -ms -N >&6 -- xapply -fzmu -sAP6 -R 2 %+ $@ -
localhost 0 1

$ SHELL=/usr/bin/perl hxmd -dX -AR2 -Cauto.cf -EHOST=localhost 'print "HOST"'
hxmd: xclate -ms -N >&6 -- xapply -fzmu -sAP6 -R 2 %+ , @ARGV -
localhost01

Note that the last example doesn't output quite the same string; to make it a work-alike we should define a subroutine to make it pretty:

$ SHELL=/usr/bin/perl hxmd -dX -AR2 -Cauto.cf -EHOST=localhost 'sub pretty {
	print join(" ", @_), "\n";
}
pretty "HOST"'
hxmd: xclate -ms -N >&6 -- xapply -fzmu -sAP6 -R 2 %+ , @ARGV -
localhost 0 1
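
The three debug traces above show a different suffix after %+ for each interpreter. As a memory aid, here is that mapping written as a shell function. It is a sketch based only on those traces: append_suffix is a hypothetical name, and matching on the last path component of SHELL is my assumption about how the interpreter is recognized, not a statement about hxmd's code.

```shell
# append_suffix: hypothetical lookup of the template suffix seen in
# the traces above, keyed on the basename of the SHELL path.
append_suffix() {
	case ${1##*/} in
	csh) printf '%s\n' ' $argv[*]' ;;
	perl) printf '%s\n' ', @ARGV' ;;
	*) printf '%s\n' ' $@' ;;	# Bourne shell compatible default
	esac
}
```

Anything the function doesn't recognize falls through to the Bourne shell form, in line with the stated default.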

When you provide literal references (as $1, $argv[1], or $ARGV[0]) in your control specification, you may not want hxmd to add suffixes to your cmd template. In that case, specify -c%+ to suppress the automatic generation code triggered by -A. Note that if you changed c (under -a) then you'll have to adjust the % to compensate.

Tokens are passed as active parameters (under -J, -R, or -t) and without -A

Without a -A specification hxmd appends xapply markup similar to " %t*" (adjusted for c) to the end of the cmd template it builds for xapply. This also copies the positional parameters to the command, but they are also evaluated by the shell (where they are not evaluated under -A).

That means that a token containing backquotes would be evaluated by the shell (but not m4). Also any environment variables will be expanded. I use a set -x to help trace such evaluations by the shell:

$ echo '`hostname`' >/tmp/my.cl
$ echo '$TERM' >>/tmp/my.cl
$ SHELL=/bin/sh hxmd -dX -t/tmp/my.cl -R2 -Cauto.cf -EHOST=localhost 'echo HOST'
hxmd: xclate -ms -N >&6 -- xapply -fzmu -sP6 -R 2 -t /tmp/my.cl %+ %t* -
localhost nostromo.example.com xterm

$ SHELL=/bin/sh hxmd -dX -t/tmp/my.cl -R2 -Cauto.cf -EHOST=localhost 'set -x;echo HOST'
hxmd: xclate -ms -N >&6 -- xapply -fzmu -sP6 -R 2 -t /tmp/my.cl %+ %t* -
+ hostname
+ echo localhost nostromo.example.com xterm
localhost nostromo.example.com xterm
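
The difference between the two modes is the ordinary shell difference between a value that arrives as a positional parameter and a value that is pasted into the command text. You can see it without hxmd at all. In this sketch run_appended and run_pasted are hypothetical names, and each inner sh -c stands in for the command xapply would launch.

```shell
# run_appended: the tokens arrive as positional parameters and the
# command refers to them as $@, as under -A with sh. They are split
# into words, but never evaluated.
run_appended() {
	sh -c 'echo localhost $@' sh "$@"
}

# run_pasted: the tokens become part of the command text itself, as
# markup like %t* would do, so the inner shell evaluates them.
run_pasted() {
	sh -c "echo localhost $*"
}
```

Called with the tokens '`echo hi`' and '$TERM', the first function echoes them back untouched, while the second runs the backquoted command and expands TERM from the environment.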

Replacing m4 markup with a make recipe (the shell)

If you can't build the file you want with m4, you can fall back to a make recipe by the specification of a directory in place of a file on the command-line. The directory must contain a marked-up recipe file named Cache.m4.

Using this feature might drive you to change the preload specification on the command-line. Usually the 2 slot preload is enough to keep process creation running smoothly. If some cache updates take longer than the control task, then a specification of a larger preload might help a little. Usually it doesn't help, since task file creation is defined as a sequential operation, but it does keep the process creation tasks flowing while any control is blocked.

Cache control markup

The Cache.m4 recipe file is marked-up with m4 so it can be customized for each target. All of the common markup available in the context of a file specified on the command-line is available. In the context of that file, there are 3 additional macros defined:
HXMD_CACHE_RECIPE

This is the absolute path to the temporary file built by running m4 over Cache.m4 in the context of the current element. This allows the recipe to refer to itself for recursive calls to make, should any be needed. It also allows for the use of mk markup in the processed recipe file.

This is also really useful for debugging, as you might cat the processed recipe file to stderr.

HXMD_CACHE_TARGET

This is the target provided to make to produce the requested content. It is made from some string manipulation on the directory path specified.

We trim the path to the basename (only characters to the right of the last slash /). Then we remove any leading dots (.), and any extender after the right-most dot. If that string is empty, we replace it with the value of the key macro. For example, here are some common idioms:

Path            Resulting target   Description
/tmp/ksb        ksb                The last component of the directory path is the target recipe
/tmp/ksb/.      key                The key macro for the host is the target recipe
/tmp/ksb/.all   all                Use the generic all, usually a symbolic link alias to a directory
/tmp/ksb.host   ksb                Use the .host suffix from msrc to force the directory into MAP rather than SUBDIR
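
The string manipulation behind that table is easy to mimic with shell parameter expansion. The function below is my own sketch of the rules as described here, not hxmd's code; cache_target is a hypothetical name, and the second argument stands in for the value of the key macro.

```shell
# cache_target: hypothetical re-statement of the target rules above.
# $1 is the directory path given on the command line,
# $2 is the value of the key macro for the current host.
cache_target() {
	path=$1 key=$2
	base=${path##*/}		# keep only what follows the last slash
	while [ "${base#.}" != "$base" ]; do
		base=${base#.}		# remove each leading dot
	done
	base=${base%.*}			# drop the extender after the right-most dot
	[ -n "$base" ] || base=$key	# nothing left: fall back to the key
	echo "$base"
}
```

Each row of the table can be checked by hand, for example cache_target /tmp/ksb/.all sulaco.example.com outputs all.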

In the .all case above, the node .all is usually a symbolic link to . -- which accesses Cache.m4 from the current directory. In other cases, it is a symbolic link to a common cache that multiple spells share and reuse. These are both a bit hackish, but it does exactly what you want at very little cost. It is not common to build an actual cache directory with a leading dot in the name.

HXMD_CACHE_DIR

The path to the cache directory. This allows reference to files in that directory via m4 include or paste directives from Cache.m4.

This hides more of the details of the cache inside the "object" itself. Changes to the local policy for a given cache should be local to the cache directory and as opaque as needed to the client spells. Since the build process chdirs into this directory, it is largely not needed for anything else. We try not to build "come from code" into any cache logic, as that would be poor form.

Per-target cache updates and generation

Under the various paths from the table above, each of the listed targets is updated. Additional targets might be enabled with symbolic links, or by creating other directories that chain back to the original directory.

The utility of the structure comes largely from the arbitrary shell pipeline that make launches. To keep our examples easy to understand, I use echo commands below. We could use, for example, a call to efmd with -C HXMD_U_MERGED, a remote access to the target host to fetch data, or consolidation of previously generated HXMD_1 and HXMD_2 files.

Merging multiple data sources with a cache recipe is really powerful. There is almost nothing you cannot build with this tactic.

Given the example above the /tmp/ksb/Cache.m4 file might contain:

dnl $Id: Cache.m4,v ...
`# Updated in the context of this directory, m4 processed for each target
all:
	echo "Everybody needs a common cause"

ksb:
	echo "called remotely for 'HOST`"

nathan.stack.org:
	echo "This is for Mark Twain"

generic_fetch:
	make -f Control $@
'dnl

The generic_fetch target represents any file that is not custom for each host, but might be managed by the common logic below (usually listed in FETCH).

Per directory cache control logic

There is a cultural notion that the per-target marked-up recipe should be partnered with a plain-text recipe file Control, or Control.mk. This file provides targets to control the cache itself: common data gathering, purge operations, reset operations, and sanity checks. That way, no instance of m4 is required to perform these common tasks. The cache recipe might know how to chain to the control recipe for some operations, or even paste the whole file into the processed control file.
# $Id: Control,v ...
# cache control recipe file for ntp.conf, msrc 2009
FETCH=	interface.cl

quit: FRC .SILENT
	echo 1>&2 "Use msrc (or dmz) to build from the parent directory."
	false

init: ${FETCH}

# All the FETCH target logic...
interface.cl:
	rsync -arH ... interface.cl

purge: FRC
	rm -f ${FETCH} temp.cl a.out core *.core errs lint.out

FRC:

Notice that the only silent markup (.SILENT or leading @ on individual commands) used in this file is for the quit target. When hxmd executes make, it specifies -s on the make command line. When you explicitly mark a target as silent, it is much harder to debug it as you test the recipes.

The default recipe file in a cache directory

It is not typical to use the default recipe name (viz. Makefile) for the tasks above. The default recipe file is usually used to recover files from the local configuration management structure and keep msrc from acting on the cache directory:
# $Id: Makefile...
# Keep msrc from acting on this cache directory.
INTO=	_Not _an _msrc _directoty, _rather _a _cache _directory

GEN=
SOURCE=	Makefile Cache.m4 Control ...

quit: FRC
	@echo 1>&2 "Use msrc (or dmz) to build this."
	@false

source: ${SOURCE}

${SOURCE}:
	co -q $@

FRC:

# hook for msrc to build stuff before a push
__msrc: source

The __msrc target might be used from the parent directory to setup for a push, but any command-line use of msrc in the directory outputs:

msrc: Makefile: Not an msrc directoty, rather a cache directory

Cache cleanup triggers

Usually, the parent recipe file has a cleanup trigger that cleans all the subdirectories. This is often a hook in Msrc.hxmd under the msrc structure, but could be done in a shell script that calls hxmd in at least 3 ways.

These usually hook into the control recipe, rather than the cache interface:

Use an explicit command at the end of the script.

cd ${CACHE_DIR} && make -sf Control clean

Use the redo command to clean the cache, as needed:

hxmd -K "cd ${CACHE_DIR} && make -sf Control clean" ... ${CACHE_DIR} ...

But it is possible to use -K to process the marked-up cache recipe in the context of the redo command. For example:

hxmd -r "define(\`END_GAME',1)include(${CACHE_DIR}/Cache.m4)divert(-1)" \
	-K "make -sf HXMD_0 clean" ... ${CACHE_DIR} ...

In that example, the cache recipe may have markup to catch END_GAME, or not. This spell doesn't scale out when there are multiple caches in a single instance. In that case, you'll have to apply some other structure. Under modern hxmd versions "END_GAME" may be replaced with a check for HXMD_PHASE having the value redo.

The cache could also generate a cleanup script as it collects data, then a hook in the structure could execute the script to purge the cache. In other cases, efmd might create a passable recipe file from Cache.m4, or cleanup commands from a list of targets processed by the cache, (viz. from a copy of HXMD_U_MERGED).

The last idea is to have the init target reset the cache at the top of each run. This wastes some space, as the cache is never clean for very long, but disk-space is usually cheap. Leaving the bread-crumb trail has the upside that it is easier to debug the cache logic after any faulty update.

Read and update logic

The other use for the cache directory is "read and replace" or "fetch and update" logic. In the context of Cache.m4 we know that we are accessing each host in turn. We might reach out to the host to fetch the current contents of a file we want to update. In the example below, I use /etc/motd to show how to fetch a file.

To fetch the message of the day: build a directory motd.host and a cache recipe in motd.host/Cache.m4 like:

dnl $Id...
`motd:
	ssh -o "BatchMode yes" 'ifdef(`ENTRY_LOGIN',`ENTRY_LOGIN@')HOST` -nx cat /etc/motd
'dnl

Test that with a request to cat the directory via hxmd:

hxmd -d CD -Csite.cf -E SHORTHOST=sulaco cat motd.host

By adding -d CD to the hxmd command-line (above), we see the intervening process called make with this invocation:

... chdir motd.host
make -f /tmp/hxtfkXIhHF/makeJ7yVh7 motd >/tmp/hxtfkXIhHF/My/Cache.m4

That make ran:

ssh -o "BatchMode yes" sulaco.example.com -nx cat /etc/motd
to get the message of the day file from the selected host into a temporary file.

It always looks strange to cat a directory on the command-line, but because hxmd replaces that directory with the output from the logic above it all works fine. Moreover, we can process that file with any filter we like to update it, then send it back to the host if we like. That's more of what msrc does, which is why we named the directory motd.host (so msrc would send it back to the target host as motd), but that's for later.

A working example -- /etc/hosts

A good example of a cache configuration task would be building the /etc/hosts file for each host in a large population. This is a good example because we have to run dig or host to map the HOST to its IP address, and a make recipe is just better than m4 for that task.

Once a machine is up on the network it really doesn't need much from the hosts file; after all it has a nameserver to chat with to get name-to-IP and IP-to-name mappings, right? But while the host is booting, it doesn't have a network to look up IP addresses to configure the network interfaces and routes, and that goes double for the hosts that run the nameservers.

To prevent a deadlock when all the machines in the data center need to restart after the lights come back on, we need to have a minimal /etc/hosts installed on every host. By "minimal" we mean it needs to have the host's interfaces mapped back to the FQDN of the host, and "localhost" mapped for the loopback address.

Let's build a prototype hosts file with a cache directory (I would call this file hosts.host/Cache.m4):

`# $Id...
# Output a minimal /etc/hosts to install to get the network going.

'HXMD_CACHE_TARGET`:
	echo "# hxmd generated proto hosts file for 'HOST`"
	echo "127.0.0.1	localhost 'HOST ifdef(`SHORTHOST',` SHORTHOST')`"
	dig +short A 'HOST` |sed -n -e "s/^[0-9.:]*$$/&	'HOST ifdef(`SHORTHOST',` SHORTHOST')`/p"
'dnl

That has 4 levels of markup: for m4 (the `quotes'), for make (the $$), for the shell (the "quotes" and the pipe |), then the regular expression for sed. You can give yourself a gold star, if you picked all that up the first time you read the spell. Example output for my workstation:

$ hxmd -Csite.cf -E view=SHORTHOST cat hosts.host
# hxmd generated proto hosts file for view.example.com
127.0.0.1	localhost view.example.com  view
10.6.26.7	view.example.com  view
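
To see what the sed expression in that recipe does without a nameserver at hand, the shell part can be run against canned input. This is a sketch under stated assumptions: proto_hosts is a hypothetical name, the addresses are read from stdin in place of the output of dig +short, and the m4 and make layers are already peeled away (so the $$ becomes a single $).

```shell
# proto_hosts: hypothetical stand-alone version of the shell layer
# of the recipe above. $1 is the FQDN, $2 is the optional short
# name; candidate address lines are read from stdin.
proto_hosts() {
	host=$1 short=$2
	tab=$(printf '\t')
	# the recipe leaves one space after HOST and one before SHORTHOST
	names="$host${short:+  $short}"
	echo "# hxmd generated proto hosts file for $host"
	printf '127.0.0.1\tlocalhost %s\n' "$names"
	# keep only the lines that look like a bare address, and add
	# the host names after each one
	sed -n -e "s/^[0-9.:]*\$/&${tab}${names}/p"
}
```

A line that holds a name (such as a CNAME answer) doesn't match the address pattern, so it is dropped from the output.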

Local site policy might add other lines to the recipe: our default router, our NFS server, our time-base, or any hosts on an RFC1918 address space directly connected to a network we manage. All that logic would be added to the recipe file maintained here. This spell is a necessary part of getting the data center going after a total shutdown, which is easier to build before you need it (rather than by-hand on each machine's console). It really helps at my home, where the power fails about once a month.

Of course the output file gets stale as IP addresses change, but it might be refreshed (or compared to a refreshed copy) once a week or so. That way we can keep stale data from making the cold-start problem even worse. (Surely I know about DHCP, but both your DHCP and DNS servers need some static addresses to get started, and while they are coming on-line your other servers should be able to get past the first ifconfig or route command in their startup scripts.)

Configuration files with static addresses in them are a Bad Idea, but the few that we have to keep should be built with automation, not with an editor on each host.

Build on the stack hxmd provides

Hxmd is designed to be enclosed in a script, or other program. The number of command-line options one must type for a complete specification is daunting, and the chances of a spectacularly bad result from even a small typing error are high.

By putting a more polished interface (even a GUI) between the Customer and hxmd, one might be able to craft a harness for the power, without getting the moving parts too close to anyone's fingers.

Hxmd reads either of 2 environment variables for additional option specifications to provide a unique form of encapsulation assistance. The standard HXMD specification is handled as any other of ksb's tools, that is the options specified in that variable are processed before the explicit command line options, and the variable remains in effect.

The HXMD_PASS environment variable is read in preference to the HXMD variable when it is set to a non-empty string. This has 2 effects: first it suppresses the Customer's normal environment specification, second it allows encapsulating programs to specify options of their own design.

The command line option -z removes the HXMD_PASS variable from the environment of every inferior process hxmd starts. It is typically set in the HXMD_PASS specification. For an example, see msrc's facility that reads hxmd options from files that end in ".hxmd" (viz. HXINCLUDE).

Since -z removes the pass variable, the Customer's use of $HXMD is restored for any descendant processes.
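
The choice between the two variables comes down to one test for a non-empty string. Here is that rule as a shell sketch; effective_opts is a hypothetical name, and the function only reports which option string would be consulted, it doesn't parse anything.

```shell
# effective_opts: hypothetical illustration of the rule above.
# A non-empty HXMD_PASS wins; otherwise the Customer's HXMD value
# (possibly empty or unset) is the one consulted.
effective_opts() {
	if [ -n "${HXMD_PASS-}" ]; then
		printf '%s\n' "$HXMD_PASS"
	else
		printf '%s\n' "${HXMD-}"
	fi
}
```

An empty HXMD_PASS falls through to HXMD in this sketch. Removing the pass variable from the environment of the inferior processes (as -z does) has the same effect for them, which is why the Customer's settings come back.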

For more compatibility with distrib

Build a file in /usr/local/lib/hxmd named site.zf which defines the local host's type (MYTYPE, MYOS) then use -Z site.zf to get the backwards compatible distrib definitions. Keep that under a local master source policy, and/or build it from your local master copy of hxmd/lib.

Here is an example site.zf file:

MYOS=`80100'
MYTYPE=`FREEBSD'
And an example usage to produce a list of hosts with our OS type:
hxmd -Z site.zf -C site.cf -E "MYTYPE=HOSTTYPE" 'echo HOST'

Any script built on top of hxmd might set the zero-config policy file, and config file by default, like the dmz.sh script that is built on top of msrc.

Bugs

Processing any host named dnl or dumpdef is going to hurt. The use of m4 limits the possible values of every attribute.
$Id: hxmd.html,v 1.60 2012/08/12 17:04:04 ksb Exp $