To understand this document

This document assumes you are quite familiar with the configuration and command-line of hxmd(8).

What is `efmd`

Efmd is a macro processor based on m4. Unlike hxmd, which runs a shell command for each configuration element, efmd filters a stream of macro-expanded text once for all of the selected hosts. This tactic reduces the CPU time spent formatting simple reports by avoiding the overhead of forking all the echo shell processes that hxmd requires for similar tasks. It also creates the output lines in a stable order: because hxmd runs its commands in parallel, its output lines tend to come out slightly permuted.

The attributes bound to each host are interpreted identically in both hxmd and efmd to select hosts and customize macro expanded text. Other tools based on hxmd also reuse the same logic (msrc, mmsrc, and newer versions of distrib).

Shell interface

The command-line usage of efmd resembles hxmd, but doesn't include all the options from the xapply wrapper stack. Here is the usage from the manual page:

efmd [-n] [-B macros] [-C configs] [-d flags] [-D m4-option] [-I m4-option] [-U m4-option] [-E compares] [-F literal] [-G guard] [-j m4prep] [-k key] [-M prefix] [-o attributes] [-T header] [-X ex-configs] [-Y top] [-Z zero-config] [arguments]

Or for just a list of the key macros selected:

efmd -L [-B macros] [-C configs] [-d flags] [-D m4-option] [-I m4-option] [-U m4-option] [-E compares] [-G guard] [-j m4prep] [-k key] [-M prefix] [-o attributes] [-X ex-configs] [-Y top] [-Z zero-config]

Or for on-line help:

efmd -h

For common ksb-style version information:

efmd -V

In that usage, 3 options (-L, -n and -T) are not common to hxmd.

The configuration file format is exactly the same as mmsrc, hxmd and distrib (which is the whole point).

See the hxmd HTML document for more details about host selection and configuration. I'm going to assume that you have used hxmd to generate simple reports with echo and want to make those faster (or more efficient).

How to use `efmd`

Assuming you know how to use hxmd you should use efmd to optimize any code that just reports values from the configuration file and fixed text. This includes merges or extracts of the configuration files themselves.

For example, this spell is pretty common:

hxmd -C my.cf -B SPECIAL 'echo HOST'

That spell works fine, but it also starts a sh for each selected item. With hundreds of selected items, the xapply/xclate and shell processing is all overhead for little work.

We might rephrase the spell as an efmd command:

efmd -C my.cf -B SPECIAL -L

For about 1,000 selected items, the results of those 2 spells are about the same (given -P1 they would be exactly the same). But the expense of the 2 runs is very different:

Command	Wall clock	User CPU	System CPU
hxmd	8.46s real	4.49s user	17.98s system
efmd	0.42s real	0.36s user	0.06s system

The four-tenths of a second that is common to both is the time spent selecting the hosts. The rest of the hxmd cost is the time spent setting up the xapply machine and running all those shells and m4 filters (it is still quite speedy given the number of processes executed).

In the comparison above, I cheated and used the "show me the keys" option to efmd: I should have asked for the HOST macro to be more fair; that wording of the same spell would be:

efmd -C my.cf -B SPECIAL HOST

Which added another line to the table:

Command	Wall clock	User CPU	System CPU
hxmd	8.46s real	4.49s user	17.98s system
efmd -L	0.42s real	0.36s user	0.06s system
efmd HOST	0.80s real	0.78s user	0.05s system

The additional four-tenths second is the time it takes to process a second m4 filter to expand HOST for each selected definition.
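Because hxmd output may come out permuted, a plain diff of the two spells' output can show differences that are only a matter of order. The sketch below compares two saved outputs as unordered sets of lines. The function name same_lines and the file names in the usage note are hypothetical; only sort and the shell's string test are used.

```shell
# Hypothetical helper (not part of efmd or hxmd): succeed when two files
# hold the same lines, ignoring the order in which the lines appear.
same_lines() {
	left=$(sort "$1") || return 2	# cannot read the first file
	right=$(sort "$2") || return 2	# cannot read the second file
	[ "$left" = "$right" ]
}
```

For example, after saving the output of the hxmd spell in hxmd.out and the output of the efmd spell in efmd.out, the command "same_lines hxmd.out efmd.out && echo same" reports whether both spells selected the same items.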

Another similar use for `efmd`

When someone leaves a cryptic macro in a configuration file that you can't understand, it may be handy to see the m4 output without running any dangerous shell commands:

$ efmd -Cother.cf "HOST IS_ZMD(DEFNET)"
imp .
nostromo yes
lv426 yes
sulaco yes
...

That output may tell you which part of site policy the macro belongs to, or it might leave you wondering why you even tried to figure it out. It is largely up to your local site policy how that plays out. No volume of comments will explain bad code, and most good code needs the why more than the how. If the output from the macro plus the name of the macro doesn't help you, then you don't know why it was coded, so what it does (how it does it) isn't going to help anyway.

How is this processed?

Efmd selects hosts just as hxmd does: generating an m4 guard markup stream from the command line options -Y, -B, -E, and -G for the items selected from all the configurations specified under -Z, -C, and -X. It reads the output from that macro processor as the list of keys to select.

At this point under -L, we have the output, which is the list of selected keys. So we output those and exit.

Otherwise, we build a stream from the -T specifications and the command-line arguments. Under -n this stream is sent to stdout directly. Otherwise, an m4 output filter is pushed on to stdout to process the markup into the requested report.

The header specifications are output to stdout followed by a stanza for each selected entry. No footer is provided as the m4 macro m4wrap allows for that.

Each stanza starts with a list of pushdef macro calls to define the attributes of the current selection; the command-line parameters are catenated to this. Then, popdef macro calls withdraw the definitions of the attributes. The next stanza follows directly.
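The stanza layout described above can be mimicked by hand to see what the output m4 filter is given. This is only a sketch under assumptions: the helper name emit_stanza is hypothetical, it binds a single attribute (HOST), and the exact markup efmd writes may differ in detail.

```shell
# Hypothetical helper: print one stanza in the shape described above.
#   $1  value to bind to the HOST attribute
#   $2  report text, standing in for the command-line argument
emit_stanza() {
	printf 'pushdef(`HOST'\'',`%s'\'')dnl\n' "$1"	# define the attribute
	printf '%s\n' "$2"				# the text to expand
	printf 'popdef(`HOST'\'')dnl\n'			# withdraw the definition
}
```

Piping the output of "emit_stanza nostromo 'HOST is up'" through m4 should expand HOST and leave the single line "nostromo is up", since each dnl discards the rest of its markup line.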

We handle the command-line arguments for each stanza as hxmd would (using -F to differentiate literals from filenames), using open(2) to get to the contents of each file. We never cache the contents of a file: we re-read it for each selected item. If the file is a FIFO, we'll get a new connection for each instance.

However, stdin, when specified as a single dash (-), is only read once, then cached in a temporary file.

How can I get to `HXMD_U_MERGED`?

From time to time, the task efmd is working towards requires access to a merged configuration file. But efmd's temporary file, under -o, is usually deleted before any process reading stdout has a chance to open it.

If we want to let efmd execute commands to update some other aspect of each selected host while keeping track of the whole list of hosts selected, then we need to find a way to get to that file before efmd deletes it. I know that's odd, but it turns out that it is also useful, which is why we include -o option support at all.

Here is the hook that allows the access we need: when the environment variable M4_PATH is set, the value is used as the path to the m4 filter command for the output stream (but not as the one used to process selection and guard processing). That means you can substitute a program of your design in to take the place of the output m4 filter.

That only gets you part of the way. You still need to know where efmd put the merged file. Normally, the m4 macro HXMD_U_MERGED is expanded in hxmd's retry command, or in a host-specific file to point the way to the file, but in this case our script is the macro processor.

There are 3 ways to get access to the value we need: 2 take advantage of the process model, the other an invariant of the hook's command-line options.

Use -T to put a cp command at the top of the m4 markup

This catches the merged configuration file, but the only place to put it is to copy it to a known location to pick up later:

efmd -T "syscmd(\`cp 'HXMD_U_MERGED\` fixed.cf')dnl" -C ... -F0 report.m4 >output.1
hxmd -C fixed.cf -X other.cf ... >output.2
# use output.1 and output.2 as needed
rm fixed.cf output.*

This tactic works well as part of a make recipe (where fixed.cf is a prerequisite for output.2). Another case of this structure eating its own dog food:

...
all: output.1 output.2
	...

output.1: fixed.cf
	[ -f output.1 ]

fixed.cf: report.m4
	efmd -T "syscmd(\`cp 'HXMD_U_MERGED\` $@')dnl" -C ... -F0 report.m4 >output.1

output.2: fixed.cf output.1 other.cf
	hxmd -C fixed.cf -X other.cf ... >$@
...

Use the default m4 program and syscmd

Force m4 to act as a shell by wrapping every shell statement in syscmd. This is really gross and error-prone, but it works. Be warned that quoting the m4 markup for this on the command-line is super tricky (use a file). You end up with a raw macro stream that looks like:

pushdef(...)...dnl
syscmd(`hxmd -C 'HXMD_U_MERGED`...')dnl
popdef(...)...dnl
pushdef(...)....dnl
syscmd(`hxmd -C 'HXMD_U_MERGED`...')dnl
popdef(...)...dnl
pushdef(...)...dnl
syscmd(`hxmd -C 'HXMD_U_MERGED`...')dnl
...
popdef(...)...dnl

And you just re-implemented hxmd without the parallel processing, congratulations.

Specify a script in $M4_PATH that runs m4 on stdin

If we run an instance of m4, we can build the processed version of stdin, then use it (as a shell script, perl program, or input to another command). The key is that our parent efmd will not remove the merged file until we exit.

#!/bin/sh
m4 "$@" | exec sh

Similarly, we could catch the output from m4 in a file, chmod it +x and run it (assuming a #! loader line was included at the top of the stream). This is marginally better than all the syscmd calls, in that the shell is better at process control than m4, and we could use perl or some other processor. That processor could even be selected by the input markup, but that would be on that fine line between clever and stupid, wouldn't it?

Fixed command-line specification given to the script specified in $M4_PATH

Efmd always puts 2 fixed-place parameters on our command-line: the first is the word "-D", the second is a macro definition of HXMD_U_MERGED set to the temporary filename where efmd stashed the merged configuration file.

#!/bin/sh
# efmd gives us -D HXMD_U_MERGED=$tmp, otherwise exit SOFTWARE
[ _-D = _"$1" ] || exit 70
HuM=`expr "$2" : 'HXMD_U_MERGED=\(.*\)'` || exit 70
hxmd -C "$HuM" ...
...
m4 "$@"
exit 0

The other parameters are the -D, -I, and -U options specified on our command-line, followed by any m4prep files, then a dash (-) to specify stdin, which is exactly what m4 should be provided.
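The check on the 2 fixed-place parameters can also be written without expr, using only POSIX parameter expansion. This is a sketch of the same idea, not code from efmd: the function name merged_path is hypothetical, and it relies on nothing beyond the "-D" and "HXMD_U_MERGED=" convention described above.

```shell
# Hypothetical helper for a $M4_PATH script: print the merged configuration
# path when the first 2 arguments are "-D" and "HXMD_U_MERGED=<path>",
# otherwise return 70 (EX_SOFTWARE).
merged_path() {
	[ "_$1" = "_-D" ] || return 70
	case "$2" in
	HXMD_U_MERGED=?*)
		printf '%s\n' "${2#HXMD_U_MERGED=}" ;;	# strip the macro name
	*)
		return 70 ;;
	esac
}
```

Inside the wrapper script this might be used as HuM=$(merged_path "$@") || exit 70, before the script goes on to run m4 "$@" as in the example above.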

Just get the merged configuration file

See Filter below.

As a configuration file filter

Filters read stdin and/or a file, then process that data set to output results to stdout. To make efmd a configuration file filter, we'd have to process configuration file(s) into an output stream.

We may specify input configuration files under -C, -X or -Z. Any one of those could be stdin. We may divert the merged configuration file under -o to stdout by ending the command line specification with:

efmd ... -T "paste(HXMD_U_MERGED)dnl" dnl

The "trick" is that we specify a null string for each processed host via the m4 markup dnl as the only argument. That just outputs the merged configuration as the only text on stdout. If your m4 does not have a paste directive, use include; however, that can provoke some unwanted quote processing in values that include spaces. I've found no way around a bad version of m4.

This allows for some complex boolean disjunctions in selection logic, but oue might be a better engine for that logic if you can keep the list solely in terms of the hostnames (or some other unique key). See the HTML document for oue.

Some versions of m4 emit #line markup before the first line of the included contents, which is taken as a comment by any program that uses the "hostdb.m" module to read the resulting file. Sadly, the name of the file is a mkstemp name, so diff almost always shows a difference in the output from multiple runs of the same filter command.

Common uses

Other than reports, efmd is commonly used to limit the effects of a spell to a very refined subset of target hosts, which are selected from multiple configuration files (a.k.a. sources). Configuration files from disparate realms may need different selection processing to build a subset list of the desired hosts, which will all be given to a final msrc or hxmd to update the whole super-set.

For example, to select all the hosts that provide a command and control service from many span-of-control areas, we may have to change the name of the service we are selecting; each realm might call it something different internally, but the encompassing organization may need to update them all en masse. After we have the complete list, we should apply the same update to all of them, but we can remember the realm (or other attributes) to compensate for details of each specific implementation.

# Example consolidation of 4 realms' data (usually part of a
# master recipe or a cache control recipe).
set -e
# (our local) realm1 calls it SERVICE "apache"
efmd -C realm1.cf -DREALM=earth -G "HAS_SERVICE(apache)" \
	-o "HOST HOSTTYPE" -T "paste(HXMD_U_MERGED)dnl" dnl
# realm2 calls it "httpd"
efmd -C realm2.cf -DREALM=air -G "HAS_SERVICE(httpd)" \
	-o "HOST HOSTTYPE" -T "paste(HXMD_U_MERGED)dnl" dnl
# the next calls it "http"
efmd -C realm3.cf -DREALM=fire -G "HAS_SERVICE(http)" \
	-o "HOST HOSTTYPE" -T "paste(HXMD_U_MERGED)dnl" dnl
# the last only uses hosts of class "www"
efmd -C realm4.cf -DREALM=water -I --/water -j class.m4 -E "www=CLASSOF(HOST)" \
	-o "HOST HOSTTYPE" -T "paste(HXMD_U_MERGED)dnl" dnl
exit 0

This example creates a stream on stdout that looks about like:

REALM="earth"
%HOST HOSTTYPE
mud.npcguild.org	SUN5
dirt.npcguild.org	SUN5
...
REALM="air"
%HOST HOSTTYPE
vapor.npcguild.org	FREEBSD
whirlwind.npcguild.org	DARWIN
...
REALM="fire"
%HOST HOSTTYPE
flame.npcguild.org	NETBSD
...
REALM="water"
%HOST HOSTTYPE
waterspout.npcguild.org	...

This adds another level of consolidation that large sites need. The output list is a complexity insulator for the update task because we removed the rules that generated the list, leaving just the list of hosts that run a web server. Any additional attributes needed to update a host are only conducted to the next step with intent, never by mistake.

It also implies that the aggregator recipe is maintained by the encompassing organization with cooperation from the federated realms. Without maintenance, the script that creates the list quickly loses its freshness, or bleeds complexity into the rest of the structure.
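For a quick look at which realm each host in the consolidated stream came from, the stream can be flattened into realm and host pairs. This is only an inspection aid, not how hxmd or msrc read the file. The function name realm_hosts is hypothetical, and the sketch assumes the layout shown in the example output: a REALM= line opens each section, HOST is the first column, and no value contains white-space.

```shell
# Hypothetical helper: read a consolidated stream on stdin and print
# "realm<TAB>host" for every host line.
realm_hosts() {
	awk '
	NF == 0 || /^#/ { next }		# skip blank lines and comments
	/^REALM=/ {				# remember the current realm
		realm = substr($0, 7)
		gsub(/"/, "", realm)
		next
	}
	/^[A-Za-z_][A-Za-z0-9_]*=/ { next }	# skip other default assignments
	/^%/ { next }				# skip the column header line
	{ print realm "\t" $1 }
	'
}
```

If the consolidation script were saved as a file (say consolidate.sh, a hypothetical name), then "sh consolidate.sh | realm_hosts" would list lines such as "earth", a tab, then "mud.npcguild.org".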

Pre-populate a cache

When it makes sense to pre-populate an hxmd cache directory before an msrc or hxmd run, one might use efmd with the output directed to the bit-bucket (aka /dev/null).

This moves the work up-front, but that's not better than letting hxmd run the cache operations itself most of the time. This might help when the cache operations should be compressed into a shorter window than msrc can drive with the extra per-task delay. In both cases, the cache population is sequential, as that is how it is defined.

If you need parallel pre-caching you'll have to code it in the Control recipe with hxmd or xapply -P. The init target in the Control recipe would be a good place to put this logic: never keep it outside of Cache.m4 or Control, because the default invariant is that each cache operation should be done sequentially.

Bugs

The common one to all these tools: hosts named for any m4 markup command (viz. "dnl.example.com", "unix.include.org") are almost impossible to manage with these tools.

An option to just output the merged configuration file would make the pipe usage shorter to type, but I'm pretty much out of options in this stack. In any case, the idiom with the dnl is actually not that hard to type or understand. If you had to quote the dnl from an enclosing m4, all the better.

$Id: efmd.html,v 1.25 2012/10/01 21:01:41 ksb Exp $