To understand this document

This document assumes you are quite familiar with the standard UNIX tools from section 1 (date, sed, cat, and the like), and are very familiar with the shell itself, that is sh(1).

In addition you should have a working understanding of the shell's use of file descriptors and how they are manipulated (stdin, stdout, and stderr are 0, 1, and 2 respectively). You also need a fair understanding of shell globs, parameter lists, command quoting, and the use of double-quotes versus single-quotes (that is, when to use each).

Lastly you must have a good working knowledge of the environment: how variables are set, common usage, and the common variables. See environ(7). A fine point here that some people forget: variables may be set for a single command; see sh(1) under Simple Commands.

What is mk?

In short mk extracts code and data from any file to give it to the shell for execution. A marker names each code fragment in each file; the marker is just a name for the code, and it doesn't really mean anything to mk itself.

It might not strike you, at first, that marking commands in a text file is useful enough to warrant a shell program. After all we have scripts, make, ant, and any number of other processors that all run shell commands -- so why would I need mk? Why would I try to get other people to use mk?

The best answer is that you can't lose a related command if you have it in the file it processes. Contrast that to a make recipe file that might get separated from the program it builds, or a script that installs your favorite application from a CD but is not itself included on that CD: these might not be where you need them, when you need them. The other way to say it is that it is a configuration management invariant: when I use mk to operate on a file the information about how I did that operation travels with the file.

Looked at another way, this makes the markers into messages which, when sent to files, take actions like a `method'. In that model the file becomes an `object' that has methods and a concept of self, which matches the definition of object oriented.

We end up with an abstraction here that allows a file to manage itself. But none of that says anything about the contents of the file, other than the marked commands. That means we can put the marked commands in the comments of the file, not in the `payload' part. That allows the file to contain any language that allows comments, like a program in C, perl, Java, or LISP -- or a text file marked up in HTML, *roff, XML. The file could even be a plain text file, if the marked lines are not too distracting.

What does a marker look like?

A marker is an alphanumeric string, and is always prefixed by a dollar sign ($) to help mk find it in the text. To avoid mistaking shell variables for markers, each instance must be followed by a colon (:). In the example lines below "Mark1" is the only marker on each line:
$Mark1:
this is text with $Other a marker in it called $Mark1: echo $USER
Note that "$Other" is not a marker because it lacks a colon for before the white-space.

Later on we'll talk about other things that can come between the marker and the colon, but for now we're going to assume we don't need those parameters.

After the colon comes the "marked line": the part after the colon until the end of the line (or the token $$). The marked line is expanded, in much the same way printf does, to create a shell command. That command mk gives to the shell for execution, unless -n is specified on the command-line. In the example above the first marked line has an empty command, while the second runs echo to print $USER.
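
For instance, if the second example line above were saved in a file (call it demo.txt, a name made up for this illustration), then

$ mk -m Mark1 demo.txt

would hand the command echo $USER to the shell and print your login name.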

Examples showing why and how I use mk

I want to remember how to use my synergy configuration file:
# comments about the file
#  $w01: synergyc w02.example.com
#  $w02: synergys --config %f --daemon

section: screens
	w02.example.com:
	w01.example.com:
end
...

On the "w01" host I need to run the client (synergyc) to connect to the main X server, on the main display I need to run the server part (synergys) in daemon mode with this file (called %f by mk) as the configuration file.

The reason I use the markup "%f" is that the file might be in a remote directory, or have a different name than it had when I created it. If I were to use the name "ksb.syn" in the server command, then mk would only run the correct command when invoked from the file's directory and when the file's name was exactly the same as when I created it: that's not quite as useful as substituting the current filename into the command.

To coax mk into running one of the commands we need to tell it the marker we want to index, and which file to search:

$ mk -m w01 ksb.syn

To take this a step farther let's build another marked line that works on either host:

# comments about the file
#  $Start: mk -m$(hostname | sed -e 's/\\..*//') %f
#  $w01: synergyc...

That shell command runs mk with a marker built from the name of the current host (with the domain name removed). So on either host I can use the common X startup command:

$ mk -m Start $HOME/lib/ksb.syn

On any host that doesn't match a marked line mk will exit non-zero, which is exactly what I wanted. If I wanted to trap that case I could add a * marker to catch the default case -- we'll explain that matching later.
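
As a sketch of that idea (the matching rules come later), a catch-all comment at the end of the same file might read:

#  $*: ${echo-echo} 'no synergy command for this host'

where the echo is just a placeholder for whatever default action makes sense locally.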

The point of all of this is that mk allows us to put a memory of how to use a file into the file, one that automation can extract and use -- which is way more useful than a note to the reader that must be cut+pasted+edited every time it is used. It also allows failure as an option: you don't always have to match a command -- which is even more useful later.

Mk's theory of operation

To abstract this just a little I need to explain my configuration management model (just humor me for a few paragraphs, then we'll get back to making life better for you).

I break the configuration of a computer's software into 5 levels. Each level builds on the level listed above it. Together they form a structure which joins the layers into a workable, repeatable, and audit-able process -- without mandating any particular tool or procedure.

Source files with revisions
As we refine each program, document, or recipe we "check point" it so we can fall back to a known state, or show someone else the changes we made. Every revision control system -- SCCS, RCS, CVS, SVN, or git -- does about the same thing at this level.
Work products with versions built from revision controlled files
Created from controlled files, built with make, ant, or a shell script. Sometimes we even use autoconf to spice the recipe for us. But the point is that many files that were separately controlled become a work-product with a distinct role to play.
Facilities built from multiple products form releases
An RPM, or other package file, built from a specification of the products and how they need to be installed. This is also the level that keeps track of bug reports and feature requests. Each package is built from multiple products and files to become a new release.
Computer systems built from a signature of releases
A running instance of a list of the packages installed on a machine to create a unique host. Most of the files used to create the instance are from the release packages built above.
Site policy based on rules and procedures, documented in files
All the files created on a host that were built by that installation process that were not extracted from a package. The network configuration, the hostname (usually) and all the documentation used to manage the structure itself.

I believe one can fit any software artifact into one of those 5 buckets, and only one. There is some overlap in the implementation of site policy and controlled files: they are both kept in a file with a revision history (aka an audit trail). In both cases that file stands "alone", in that it is not processed or controlled by a recipe file; it must carry any "meta information" within itself -- which is what mk provides, encoded as shell commands.

A classic example of this is the RCS (or CVS) keyword expansion. This allows tokens in the file like $Id$ to expand to the current value of the identification string for each file -- in that file. That lets other programs examine a file (with ident, or the like) to ascertain which revision is at hand. The same facility is often promoted to the version information for a compiled product; one of the key files used to build the product displays its revision as the version string for the whole program (document, configuration, etc.).

In much the same way we use mk to manage meta information about a file, but in this case we manage it in terms of shell commands, not identification markup.

Example meta information

Let's assume that a group has many automated scripts driven by cron that might be installed on each host under their control. Some of these clean log files, some stop and start applications, some run backups, others are bound to processes an outsider would have trouble believing -- in other words it looks just like your setup.

If we assemble all these scripts into a source repository and revision control them we've solved part of the problem: we have a clean source for them (rather than having each instance be a typing adventure for the admin that installs it).

But we still need a way to know where to install it, when to run it, and on which hosts it should be installed. If we comment the files with 4 mk marked commands, we can solve this pretty cleanly.

# $Where: ${echo-echo} /usr/local/etc/rollLogs
Tells the installation recipe where to install us; we could output different paths on different hosts.
# $Allow: [ -f /opt/www/etc/httpd.conf ]
Only exits successfully when a certain file exists; the installation script won't install us unless that file is installed. Other checks are clearly possible here. The output from this script might be included as a comment in the crontab, in either case.
# $Clock: ${echo-echo} '4 2 * * *'
Suggest to the installation script when to run this task. Some advanced version of the script might adjust times when there are conflicts: that would be a site policy issue. Other simple tactics might include using a random number to put a minute within a range, so that not every host runs at exactly the same minute, but does report at the same offset for each interval.
# $Run: %f -S50 -I75 /usr/spool/uucppublic
If we are a script we run ourself with the options we want (data files, or tunes for the local host).
# $Run: /usr/local/sbin/hxmd -C%f -K 'replay ...' -Q...
If we are a configuration file we call the correct application with a specification that includes our file name and related options.
Given the above the crontab line for this task might be:
4 2 * * * /usr/local/bin/mk -smRun /usr/local/etc/rollLogs

So this indirection allows us to pass options (like "-S50") to the script, and use it on multiple hosts under different paths, logins, options, and times. If we allow Clock to output more than a single time, the task could be triggered at offsets that are not easy to do with a single crontab entry. This is actually quite easy, by the way (with xapply).
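
To make that concrete, here is a minimal sketch of how an installation recipe might consume those markers for the rollLogs example; the control flow and install options are my assumptions, only the marker names come from the comments above:

file=rollLogs                              # run from the directory that holds the script
where=$(mk -smWhere $file)   || exit 1     # where this host wants the task installed
mk -smAllow $file            || exit 0     # host doesn't qualify, quietly skip it
clock=$(mk -smClock $file)                 # when cron should fire the task
install -c -m0755 $file "$where"
echo "$clock /usr/local/bin/mk -smRun $where" >>crontab.new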

To finish this structure you need msrc or at least hxmd, but that's another story. I'll not present the code to automate it here, since it would break the flow of this document.

Why is that a good idea?

The best reason I can give you is that it keeps the meta information with the job it manages. The revision log for that one file holds the audit trail for the task's code and for the task's deployment. In some respects that makes this structure more certain (alt. less complex) than a (lack of) structure that keeps the crontab on each host in a separate file -- more than likely just on that one host, with no revision control and no log of why anything was done.

Remember all those little task scripts you wrote and lost? Or the ones the admin before you wrote and you can't figure out how to use? Let's avoid that next time. A few comments and some marked lines and you are set. The most important comment being the one that points you back to the structure that installed the file and has all the revision history. Say it with me, "Dump tapes are not my revision control system."

In fact this puts a structure on how such things are done, but it is a structure that you can grow. When you need another marked command to handle an exception you can add it later. But you don't need to plan for every case up-front: you are never going to need the ones you thought you did. But you will need some you never thought you would, I promise.

So I always execute cron tasks via mk. That way a configuration file that lists all the switches to poll can be in hxmd format or any other format -- the only limitation being that the file format must allow comment markup. A comment line calls the poller script with the file as a parameter. So

cron -(via the clock)-> configuration -(via mk)-> poll script with options
It just plain works for the right reasons. When you add another configuration file, you'll add a parameter to the existing cron update command, or add another task line to the table. When you add an element to poll, just update the configuration file. When you update the poller you may add a command-line option for the new feature, which might cause you to update the configuration file's marked lines, or not.
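
For example, a switch list marked up this way might look like the lines below; the poller's name and options are invented for the illustration:

# $Poll: /usr/local/libexec/pollswitch -C %f --interval 300
sw-core-1.example.com
sw-edge-2.example.com

and the crontab entry simply becomes:

*/5 * * * * /usr/local/bin/mk -smPoll /usr/local/etc/switches.cf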

If you need to poll the same hosts for a different reason, then you can use that same configuration file with a different marker. That alone is worth the price of admission. There is no way to deduce which scripts might use the element list, or the revision of the element list, unless you allow comments, so you're going to need comments anyway. When you allow comments you may as well embed the poll-command in them, because it is the best reference you can leave.

Contrast that to coding a script with the elements listed in a shell here document with a while loop processing them. There is no good way to update the poll-code without updating the list of elements, and no (good) way to reuse the list of elements. The script will fall into disrepair and become a liability, rather than a shared resource that is kept up-to-date.

Easy access to generic actions

I am very capable of typing:
$ groff -tbl -Tascii -man mk.man |less
when I want to view a manual page. I am also fine with typing
$ lynx -dump mk.html | less
to view this document. Remembering the command-line options to groff and lynx is not a lot to ask from a systems programmer, but it might be harder for a Customer to remember that the mk manual page includes tables (and how to display them).

I can mark the formatting command in the file, for example in an old-school nroff manual page:

.\" other comments like version markup
.\" $Display: ${groff:-groff} -tbl -Tascii -man %f |${PAGER:-less}
or in an HTML page I would use:
<!-- other comments like version markup
 -- $Display: ${lynx:-lynx} -dump %f | ${PAGER:-less}
 -->

Then I could tell my Customers to use the program Display to display either page. That program could be a short script:

#!/bin/sh
exec /usr/local/bin/mk -mDisplay -s "$@"
or even a symbolic link to mk. When mk is called with a name that is not "mk" it assumes that name is the marker it should find. That doesn't add the -s option, but it is close enough for most applications.
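
For example (assuming mk lives in /usr/local/bin):

$ ln -s /usr/local/bin/mk /usr/local/bin/Display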

To view two pages a Customer might type:

$ Display example.html fritter.man
which has the feature that it uses their $PAGER variable in preference to my choice of less, and works on any file I've marked-up with $Display:.

That would be really cool, but most products don't put marked lines in their manual pages, web pages and other documents. Which is why mk knows how to look for the "marked line" in another file -- if the file either can't contain a comment line (viz. a compiled binary program) or if the file is not one you should edit (viz. a system provided manual page). Keep reading to see how we solve this problem.

Percent Expanders

We need to introduce a few of the many markups that mk replaces in the selected text. The first group are all introduced with a percent-sign (%). The one we already have seen is %f which is expanded to the name of the target file.

The option -E is a nifty command-line switch that previews what an expander might output. We'll use that to explore some in this section. For example we may use -E to ask mk to echo the name of the file it is processing:

mk -n -E "%f" /etc/motd
That asks "without passing the command to the shell, expand "%f" for /etc/motd?" Which outputs a tab followed by the string "/etc/motd" and a newline.

There is a whole list of percent markups that you should read later, but for the time being here are the few we'll use in the next few examples:

%b - the name this instance of mk was called by on the command-line
When a marked command needs to recursively call mk it should use %b knowing that that path worked for the invoking process, unless it has changed the current directory. To see a benign example of this use -E to expand a string and -n to see the results without running that as a shell command:
/usr/local/bin/./mk -n -E "%b" /dev/null
Run that command to see that the extra ./ is preserved in the output. No matter what convoluted path is specified the %b expander provides the correct spelling. This is important in that some sites install the program under the name dwim (do what I mean), or under some other name, since the make replacement "muck" is often installed as "mk".
%F - the target file's basename
This expands to the last path component of the target file, with the last dot-extender removed. For example touch a file called /tmp/foo.tar.gz and run
mk -n -E "%F" /tmp/foo.tar.gz
And you should see the expansion foo.tar.
%P - the target file's path as given
Replace the capital-F with capital-P and you should get the name of the target file as you typed it. Neither extra slashes nor a longer path through a symbolic link will be compressed out.
%p - the target file's path minus the last dot extender
Like %P but remove the .gz from the end (in our example file).
%d - the directory part of the target's path
If the target contains any slashes this expander removes all the characters after the right-most slash (a combined example follows this list).
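
Continuing with the /tmp/foo.tar.gz example, several expanders can be previewed in a single string (this is just a preview under -n, nothing is run):

$ mk -n -E "%F %p %P %d" /tmp/foo.tar.gz

which should output, after the leading tab, foo.tar, then /tmp/foo.tar, then /tmp/foo.tar.gz, then the directory part of the path.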

This is not even close to a full list of the percent expanders, but it is enough for you to read all the examples below, and most of the markup for simple tasks.

Rejection is not always a bad thing

Mk has the concept that some marked lines are not appropriate for the task at-hand. The simplest example is the lack of a target directory. If a rule only works if the target is specified with a leading directory path then %D might be your friend. It expands to the same string as %d, but rejects the proposed command when the directory path would be the empty string (that is to say that no explicit directory was provided).

Another example of this is the submarker, which is provided under the -d option. Like a marker, the submarker has no "special" meaning to mk itself. It is just a way to pick (or reject) a marked line from the list of (already) matching lines. The submarker specified must be matched when requested. Another way to say that is "a marked line without any submarker is rejected when any submarker is requested."

In a marked line the submarker is specified in parenthesis after the marker:

/* $Compile: ${cc:-gcc} -o %F %f -lgdbm
 * $Compile(debug): ${cc:-gcc} -DDEBUG -g -Wall -o %F %f -lgdbm
 */
the second marked line is selected when the specification -d debug is presented on the command line. When no submarker is provided the first marked line matches before the second line is examined. In this way the presence of the submarker rejects the first marked line, and the specification of a different submarker (e.g. optimize) would reject both lines. We can't say that the lack of a submarker rejected the second line, because mk didn't look at it if it picked the first one. (See the command-line options -a, -i and -V.)
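
From the command line the selection looks like this (assuming the source file is named hash.c, a name picked only for the illustration):

$ mk -m Compile hash.c             # picks the first line, the plain build
$ mk -m Compile -d debug hash.c    # picks the second line, the debug build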

Another way to "match" the submarker is match any submarker, then use it in the marked line:

# $Info(*): echo "marker %m, submarker %s"
The %m is replaced with the marker, while %s is replaced with the submarker.

The "wild card" notation above allows any submarker or even no submarker to match the marker part. Otherwise a marked line without a submarker specification will never match a command that specifically requests a submarker. The marked line rejects the command in this case when no submarker is specified (so under -V you would see the notice "no submarker for file").

There are many other ways the expander might reject a marked line. Under the command-line option -V mk outputs a reason why each marked line is rejected, which can be very helpful when debugging rules. Here are a few reasons mk rejects marked lines:

%{ENV} reject the command unless $ENV is set in the environment
A useful example of this is the $DISPLAY environment variable that X11 uses. If you don't have one we should reject any X11 client command, like xpdf.
%{pos} reject the command unless -pos value is presented on the command-line
This allows a clear 1-level API between the mk invocation and the marked lines or any templates. This is much tighter because it doesn't pollute the environment with (otherwise useless) variables.
%^ always reject this marked line, try another
This sounds like a self-defeating idea, but later we'll see how to make it conditional, and it has less obvious uses beyond that.
%; end file search here
This is the last possible marked command, so if it doesn't work look no further.
%. end the current target search here
The marker you are looking for cannot match under the current conditions, we reject not only the current marked line, but the whole task specified.
%Yt reject by file type (t is one of: b block special, c character special, d directory, f plain file, l link, p FIFO, or s socket; under Solaris D for a door, and under FreeBSD w for a white-out)
When a command would be dangerous (or silly) to run on a special device we can reject that command to search for another.
%Y~t reject file not of this type (t is one of the list above)
When a command only applies to a certain type of file we can exclude all others. For example newfs and fsck only work on a disk's special device: using a negated check would be a good idea for any such commands.

Let's look at an example using the type restriction:

# $Page: %Yf ${PAGER:-less} %f
# $Page: %Y~f ls -las %f | ${PAGER:-less}
the first line traps "plain files" and runs a pager on them; the second line traps "non-plain files" and runs ls on them with some options to show the type and mode.

The spaces between the %Y~f markup and the command are not required, I include the extra space to make the rejection-markup easier to separate from the command.

Marking files from the outside

As I said at the end of the previous section: some files you just can't mark-up. Mk has three features to help you detect that there is no marked line, and to find the correct command anyway. They work together to do that without looking endlessly through files that don't have any text in them.
The -l lines option limits the number of lines mk searches from any given file.
This is what keeps mk from reading an entire binary file looking for marked lines. After about 99 newline characters it stops. You can make it search a whole file (specify 0 lines as -l0), but that's usually fruitless (unless local policy is to put all the markers at the end of each file, which I've seen done).
The -t templates option provides a list of (marked up) filenames to search after the target file.
This is a good one, in that you can trap a marker in two ways: by the name of the file you search and by the marker names in that file. We'll talk about the filenames a bit later, but you can imagine that if mk always searched a file from /usr/local/lib/mk, say "defaults", then you could put a "$Display:" marker in that file. The problem would be that that command would then have to figure out how to display every type of file that might be requested.
The -e templates option provides a list of (marked up) filenames to search before the target file.
This is even better: it depends on the markup in the filename list to pick an apropos file to search. That file has special marked lines that know the contextual reason why we should (or should not) search the file itself. This will become clearer below.
Under -e or -t the colon (:) may be quoted with a backslash (\) to remove the special meaning. Since colon is quite often used as a separator for configuration files this is hardly ever useful.

Examples of external markup

Suppose I have many configuration files for rsync that I could install on my servers. Each file has markup that tells which host(s) it might be installed on, and I'm going to use a real example and use efmd to pick the hosts. (If you've never used hxmd or efmd to manage hosts don't worry, this will still be a fine example.)

I'm going to break the spelling of the rule into 3 parts:

$List(hxmd.cf)
List the hosts in hxmd.cf that should install this file. This might use something like:
${efmd-efmd} -L -C%s -I -- -Y "include(class.m4)" -E www=CLASS
or
${efmd-efmd} -L -C%s -E "-1!=index(SERVICE,radiation)"
$Install(host)
Install this configuration file on host. Assuming your version of install can read stdin for the payload, this should run something like:
${ssh-ssh} -x root@%s /usr/local/bin/install -c -m0644 - /etc/rsyncd.conf <%f
$Compile(hxmd.cf)
Update all the hosts from hxmd.cf that should get this file now, by using the List and Install markers. We pick this one because it is the default marker; you might have good reason not to do this based on local site policy.

This might be implemented as:

xapply -i/dev/tty -P1 '%b -mInstall -d%%1 %f' $(%b -mList -d%s %f)
or when no configuration file was offered:
xapply -i/dev/tty -P1 '%b -mInstall -d%%1 %f' $(%b -mList %f)

Of these the List marker is the one most apt to change. Each file should pick a non-overlapping set of hosts for deployment, or some site policy must choose between the overlaps. But almost all the files could have exactly the same Install markup.

In that case we can put the default markup in a common file, for example the README file in the directory (which explains the markup and why it is done this way). Then the -t option might be set in $MK to specify that mk should consult that file for default rules:

MK="-t $PWD/README"

So what default rules might we put in? I would put in:

$List: ${false:-false} 'no default host list available'%;
$List(*): ${false:-false} 'no default criteria for "%s"'%;
$Install: ${false:-false} 'no host specified'%;
$Install(*): ${ssh-ssh} -x root@%s /usr/local/bin/install -c -m0644 - /etc/rsyncd.conf <%f
$Compile: xapply -i/dev/tty -P1 '%b -mInstall -d%%1 %f' $(%b -mList %f)
$Compile(*): xapply -i/dev/tty -P1 '%b -mInstall -d%%1 %f' $(%b -mList -d%s %f)

The two List commands give reasonable failure messages (especially under -V) when asked for data we don't know how to synthesize. Any request to Install without a submarker to name the host is an error, but with a submarker we install the file.

The two Compile commands should always work if the other markup is in-place, otherwise they should be harmless.
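
With the README in place (and Compile being the default marker), deploying any one of the files is as simple as:

$ MK="-t $PWD/README"; export MK
$ mk rsyncd-www.conf

where rsyncd-www.conf stands in for whichever marked-up configuration file you want pushed out.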

The default action

Sometimes there is only one action that a file really supports. For example a configuration file for a special-purpose tool really only wants to "run" that tool with the file specified on the command-line. In that case the markup in the file can change the marker from the default one to a marker that makes sense in the context of that file.

In mk terms this is referred to as a "hotwire": we shift to a different marker for the remainder of this file, returning to the original for any subsequent files. There are 2 expanders that map the expanded command to the new marker, which is a bit unexpected, I'm sure.

For example the default marker "Compile" makes little sense in the context of /etc/syslog.conf. But the marker "Restart" might be the most common (and harmless) action one could imply from a request to process that file.

So we have two options: bind a restart operation to the "Compile" marker, or make the "Compile" marker rewrite the request to ask for the "Restart" marker. I would never bind the former; I would rather fail the request than wholesale distort the common meaning of Compile. In addition I might want other marked lines in the file (viz. "Stop", "Start"), and "Compile" would be a non sequitur in that list.

I would hotwire a match for "Compile" like this:

# $Compile: Restart%h
# $*: /etc/rc.d/syslogd %M
When -m specifies a marker we pass that marker (mapped to lowercase) to the /etc/rc.d/syslogd startup script. When the default marker is presented we map it to Restart and use the default rule to pass that to the startup script.
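
With those two lines in the comments near the top of /etc/syslog.conf, day-to-day use looks like:

$ mk /etc/syslog.conf          # default marker, hotwired to Restart: runs /etc/rc.d/syslogd restart
$ mk -m Stop /etc/syslog.conf  # the $* rule runs /etc/rc.d/syslogd stop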

Other markup

Mk allows for 2 other markups: C-like backslash expansions to quote characters from the enclosing processor, and an `end-of-line' markup to ignore extra characters required by the enclosing processor.

Backslash markup

See the backslash markup in the expander document for a list of those. Mostly this is useful to put a literal newline into the shell command with \n.

The $$ end markup

There are three reasons why I included the $$ end marker. First, the original author included it in the first draft of the program he posted. Second, some comments might be enclosed on a single line, and we don't want the end of the comment to be included in the expanded command. Here is an example from C:
/* $Cleanup: rm -i *.o core
 */
Now imagine that another committer removes the "extra" newline to save space on the screen. The position of the newline doesn't matter to the C compiler, but the marked line now reads:
/* $Cleanup: rm -i *.o core */
I put the -i in, in case someone actually tried to run this: you don't want to, as the */ matches any directory directly below the present one and might really ruin your whole day.

Better to write it as:

/* $Cleanup: ${rm:-rm} -i *.o core $$
 */
so your friends don't ruin the feng shui of your code, or worse, by joining the two lines. Also the indirection through the $rm environment variable allows the caller to replace rm with any command they'd rather apply.
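
That hook also gives a cautious caller a dry run, by setting $rm for just this one command (the single-command variable assignment mentioned at the top of this document; hello.c is just a stand-in name):

$ rm='echo rm' mk -m Cleanup hello.c

which prints the rm command instead of running it.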

The final reason is far less obvious. Consider the marked line below:

# $Hide: %^$$ $Hidden: some-marked-line

Since mk is line oriented it looks at that line and sees the Hide marker; if it is looking for that marker it tries to expand the marked line "%^", which rejects the command and skips to the next marked line. In that case the Hidden marker is never examined.

If some other processor (say sed) removes the pattern s/ *$Hide:[ %^]*$$ / / from the file then the hidden marker would become the first marker on the line, and hence visible to subsequent mk searches.

This nesting of marked lines is not so much a feature, more a proof that mk's markup can be quoted in cases where the payload of a marker is a marker to any number of levels. Really it just makes me happy, and I use it in shell here documents once in a while.

The hidden marker text is also available as %$ to the enclosing marker, which could be useful for a quine.

Shell command idioms

Shell programmers with 20 years of coding might grok the command hooks I employ in mk markup without any explanation; for the vast majority of readers we provide some blow-by-blow explanation of the mechanics.

How to fail gracefully

Sometimes there is no reasonable way to apply a verb (viz. Compile) to a given file. In that case we'd like automation to see just a non-zero exit code, but we'd like a human to be able to see why.

In that case we call the shell program false, but we give it a positional parameter that explains why we failed. Since false ignores any command-line positional parameters no output to stdout or stderr should be generated, but under -v a trace shows the reason as part of the command. For example matching the marker Compile against an HTML file makes little sense:

<!-- $Compile: ${false:-false} 'HTML is not compiled'
 -->

Let local policy decide which is better: to hotwire the nonsense marker (via %h) or to accept the marker then fail the request.

How to succeed by doing nothing

One of the most overlooked commands in the shell is : (colon). This beauty does nothing; as a built-in to the shell it doesn't fork any processes, so it costs almost nothing.

Just like false you can give it a parameter to ignore so that a customer (under -v) can see why no action was required. For example a byte-code file might include a comment that explains that Compile is already done for the file.

( $Compile: : 'already compiled' $$ )

Trap the key shell program

Reading the manual page for environ one can see that there are some shell environment variables that are used by many programs for common features. For example the $PAGER variable is used by most command-line driven programs that need to display text to a customer on a teletype device.

To make mk driven commands use the same convention we code shell expressions like:

# $Display: ${PAGER:-more} %f
The shell expands $PAGER when it is set to a non-empty string, otherwise it expands to the word "more". That lets the customer's value of $PAGER hold sway over the interface, with a sane default. And it is not so much to type that you shouldn't do it.

For programs that have no specification in the environ(7) manual page we use the convention that the name of the program (in lowercase) might be used as the hook. For example:

/* $Compile: ${cc:-cc} -O -o %F %f -lgdbm $$
 */
This allows gcc as a replacement for cc without editing the markup, which is the point of such markup (you should never have to edit it as a customer).
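
So a customer with a different compiler just sets the hook for one run (again the file name is illustrative):

$ cc=clang mk hello.c

mk finds the $Compile line in hello.c's comments and the shell substitutes clang for cc.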

Nested markup

When there are two passes required to compile a file we may need to hide the inner marked line from the first instance of mk. For example a yacc, lex, or mkcmd source file also may be run through the C compiler. Here is an example from a mkcmd source file:
# $Compile: ${mkcmd-mkcmd} -n %F %f && %b %o -m%m %F.c
comment "* %kCompile: %k%{cc-cc%} -o %%F %%f %k%k"
The first line is marked with $Compile to run mkcmd over the file. The second line becomes a marked line in the comments of the resulting C source file, because mkcmd replaces the %k with mk's marker character. It is mildly surprising that the two tools have mutual knowledge of each other, until you think about how often they actually interact. The other name for %k, in mkcmd, is %<mkdelim>.

Here documents in the markup

A shell here-document allows in-line streams for command input, extracted from the script. Similarly mk allows in-line streams, extracted from the target file (or template). These here-documents allow data files or shell, expect, perl or other scripts to be included in the comments of a file. Unlike sh(1), multiple here-documents may be active for the same command.

There are 3 expander markups that are primarily used to manage here-documents:

%J
Start a new here-document. This scans the current template from the line after the present marked line until we find another marked line that matches the current marker+submarker pair and contains the end markup (%j below). The text between the two marked lines is the contents of the new here-document. The name of the file is bound to %j and also bound to the next available integer (%0, %1, %2, ...).
%j
End a here-document. Any scan for the end of a here-document must find a %j someplace on the candidate line. It may either be after the $$ terminator, or as part of the command template.
%|dprefixd
This markup removes the string prefix from each line in the here-document. This allows the document to appear as comments to processors that require a comment markup on each new line. The character d may be any character; traditionally I tend to use slash (/) or double quotes (").

Given those rules the example below diffs two here-documents: the first contains the lines "a", "b", "c" and is named %0, the second contains the lines "a", "c", "c" and is named %1. (The second may also be called %j if you like.)

$Compile(*): %J%Jdiff -U 1 %0 %1
a
b
c
$Compile(*): %. $$ %j
a
c
c
$Compile(*): echo %%0=%0 %%1=%1 %%j=%j

When run without any special options this outputs a unidiff showing a deletion of the "b" line and an addition of a duplicate "c" line. When run under -c the echo command may be selected to show the filenames created:

%0=/tmp/mkhereWywb06 %1=/tmp/mkhere2EWJsX %j=/tmp/mkhere2EWJsX

Note that the here-documents from the first command are still available in the chained echo command. Only the %. markup in the second marked line prevents it from including an active alternate command.

In some cases an empty here-document may be created just for the temporary filename (since mk will remove the file after the command completes for us).

The next example removes the double-slash comment markup from a named(8) configuration file to reveal a template for a configuration stanza. Then it uses that stanza, and the first few lines of the current file, to build a more up-to-date version of the configuration file. The second marked command (under Update) uses an empty here-document to create a temporary file we can put the updated text in, then rcs-locks the current file, updates it, and commits the change with ci.

(Note that the line marked with the ... is a continuation of the line above it, wrapped to allow the page to print correctly.)

// $Id... $
// $Rebuild: %|'//'%Jsed -n -e "1,/^..[ ]END_HEADER/p" %f; (cd /etc/namedb/parked &&
...	/usr/local/bin/glob -r 'db.*') | sort | xapply -nf "\n`cat %j`" -
//zone "%[1.-1]" {
//      type master;
//      file "parked/%1";
//};
// $Rebuild: %. $$ %j
// $Update: %J%b -smRebuild %f >%j && co -q -l %f && cp %j %f && ci -q -u -m"auto update" %f
// $Update: %. $$ %j
// END_HEADER

Here-documents give mk a new level of power: they allow much more meta-data in the mix. We used the here-document to build the template zone stanza and the xapply markup to trim the leading "db." off the filename (which is assumed to be named for the zone).

Summary

Mk provides a way to embed commands related to a file in that file. These commands are expanded as they are extracted from the file with run-time parameters and file location to build the exact command needed in the current context. The command selected is only bound to the marker by the fact that they are on the same line in the file -- no binding between any marker and command exists within mk.

There is a lot of markup here you could use, most of it you'll not need unless you write lots of external (template) files. Access to xapply's dicer was added late in mk's evolution, so some "extra" markup is provided which could be replaced with dicer notation.


$Id: mk.html,v 5.26 2012/07/23 23:37:42 ksb Exp $