Usage

Without any special characters mk is pretty useful. With the addition of some printf-like expansions mk is a much more powerful processor. By snipping bits of text from the target filename, target file, a lookup table, the environment and the command line options passed to mk itself we construct a more specific and powerful shell command.

The compact notation presented here is used to express the same string operations that are commonly done with sed, expr, basename, or dirname in a shell script. The intent is to remove clutter from the command in the source file, and to make the customized command more portable and precise.

The other purpose of the expander is to enable some "meta" operations, like changing the marker or the submarker mk is processing. In some cases a file might not meet a criterion for the marker presented; mk offers expanders to prevent execution of any command in that case.

Purpose

By locating shell commands in the comments for the enclosing processor we keep those commands with the file (they can't get lost). The other common practice of using a Makefile (see make(1)) to keep recipe information is great, but it tells us how to combine the file with others into a product: the commands the author embeds with mk are often used to process the file as a stand-alone module.

The commands are selected and extracted from the target file, expanded for the present context, then passed to a shell for execution. The usefulness of this tool, like make, is driven by the use to which it is put. There are few limits placed on the tool by the implementation.

Common conventions

It is common convention to "comment" marked lines, even in a plain text file. This helps the reader ignore them while reading the document. The shell's comment style is used as a default by the author.

It is also common to keep the marked lines at the top of the file: mk only reads about 99 lines of the file by default while searching for marked lines. This prevents mk from reading a whole binary file (only to find gibberish at the end).

It is also common convention to use a shell variable to reference the "main" program we are about to run. For example ${cc-cc} to mean run "$cc if set, else find cc in $PATH" (see sh(1)). This allows the User's environment to replace "cc" with "gcc", "acc", or "bsdcc" as needed.
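
The ${cc-cc} idiom is ordinary sh(1) parameter expansion, so it can be tried outside of mk. A minimal sketch follows; the function name show_compiler is made up for illustration and is not part of mk:

```shell
# show_compiler is a hypothetical helper, not part of mk.
# ${cc-cc} expands to $cc when cc is set (even to the empty string),
# else to the literal word "cc", which the shell then finds in $PATH.
show_compiler() {
    printf '%s\n' "${cc-cc}"
}

unset cc
show_compiler            # cc is unset, so the default word is used
cc=gcc
show_compiler            # the choice from the environment wins
unset cc
```

Run as written this prints "cc" and then "gcc". Note the form without a colon: a variable that is set but empty expands to the empty string rather than to the default.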

Markers and submarkers

A line mk is reading from the input file is called marked if it contains the marker specified, in the correct context.

A marker can be thought of as a verb, a message, a recipe, or just a name for a command hidden in the target file. A submarker can be thought of as a direct object, detail, ingredient, or a specific destination for the marker to act on (or against) the target file. The difference is in how you view mk's relation to the target, or the target's relation to the context in which it exists.

The marker mk searches for is specified with the -m command line option, or defaults to the name mk was called by (in argv[0]), or the word "Compile" if the program is called "mk".

The submarker mk searches for is specified with the -d command line option. The default is none.

When mk is processing a file it is searching either a template or the file itself for a line formatted like either:

... $marker: template $$ ...
... $marker: template
Looking for a marker with no submarker given.
... $marker(submarker): template $$ ...
... $marker(submarker): template
Looking for a marker with a specific submarker.

In each case the leading "..." is usually a comment delimiter, or white space (which is ignored). The trailing "$$ ..." is a trick to delete the closing comment delimiter for any enclosing processor that requires one. The template is expanded to form the command.

Examples:

/* $Compile: ${cc-gcc} -Wall -g -DDEBUG main.c -o main $$ */
A C program (a common case) with a line to compile a debug version of the program.
<!-- $Print: lynx -dump myself.html $$ -->
An SGML comment to convert an HTML file to text, via lynx.
In both cases the closing comment delimiter could be moved to the next line and the double-dollar token removed. In some other processors this is harder to do.

Other examples:

# $Compile: Plunder%h
Map the marker "Compile" to "Plunder" for this file
/* $Compile(debug): ${cc-gcc} -Wall -g -DDEBUG %f -o %F
 * $Compile: ${cc-gcc} -Wall %f -o %F
 */
A common C comment to allow a debug version and a production version of the compile directive for the module.
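
To make the marked-line format concrete, here is a rough sketch of the selection step only. It assumes a marker with no submarker and does no expansion at all; mk's real matching (templates, submarkers, "the correct context") is stricter. The helper name extract_template is hypothetical:

```shell
# extract_template is a hypothetical helper, not code from mk.
# usage: extract_template marker file
# Prints the raw (unexpanded) template from the first line containing
# "$marker:" within the first 99 lines of the file.
extract_template() {
    awk -v m="\$$1:" '
        NR > 99 { exit }                        # mimic the default search limit
        {
            i = index($0, m)
            if (i == 0) next                    # not a marked line
            t = substr($0, i + length(m))       # text after the marker
            j = index(t, "$$")
            if (j > 0) t = substr(t, 1, j - 1)  # drop the end token and the rest
            gsub(/^[ \t]+|[ \t]+$/, "", t)      # trim surrounding white space
            print t
            exit
        }
    ' "$2"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
/* $Compile: ${cc-gcc} -Wall -g -DDEBUG main.c -o main $$ */
int main(void) { return 0; }
EOF
extract_template Compile "$demo"
rm -f "$demo"
```

The demo prints the template with the comment delimiters and the "$$" token stripped, which is the text mk would go on to expand.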

Control expansions

We'll start with the control expansions. These are used to control the flow of execution inside mk, or the sequence of data sources selected and searched.
%.
Fail the target. The file is incompatible with the marker, submarker, or whole idea of this application.
%;
Fail the target, if this command is rejected (last eligible command). The target could only ever match this command under this marker. This prevents any subsequent commands, which might match the marker, from taking an action that would be really destructive (like removing a device file).
%^
Reject the template, but not the file. The command this marked line would execute is not correct, skip to another marked line. This can be used with the here document (%J) to build a temporary file for use in other templates marked commands (as %j is not reset per template).
$Test(*): %J: build %j%^
%m	Bar	no
%m	Test	yes
.	.	default
$Test(*): echo %<%0> $$%j
In that code %0 is the first here document in the file, so "mk -mTest" on that file outputs "yes". To make this even more useful use %| to remove leading comment symbols.
%g
Use the expanded command as a new file to search, do not continue reading the present file. The example below tells mk to search the Makefile for a command marked with "Compile", rather than this file:
// $Compile: Makefile%g
In the context of the result to a mapfile %g replaces the current mapfile with the result string, and starts the search over again.
%h
Look for the expanded command as the new "marker(submarker)". If the parenthesized part is missing only change the marker. This example redirects mk to change the default "Compile" marker to the "Display" marker for the nroff format file:
.\" $Compile: Display%h
.\" $Display: groff -man -tbl %f
%H
Look for the expanded command as the new submarker. For example we might map all submarkers to "debug" with:
# $*(*): debug%H
%=/exp1/exp2/
Fail unless the expansion of exp1 is the same string as the expansion of exp2.
%!/exp1/exp2/
Fail if the expansion of exp1 is the same string as the expansion of exp2.
%<mapexp>
Expand the mapexp, use that as a mapfile (see below)
%0, %1, %2 through %9
Replaced with the subexpression matched by the RE in a mapfile (see below). This is context sensitive, and only works in the context of the result part of the matching line.
In the context of an inline here-document (%J to %j) this may be used to access more than one document. In that case %1 is the first %J document, %2 is the second and so on (even beyond %9).

Access mk's options

Since mk is going to run a shell command it stands to reason that it might want to call itself, or a program that looks a lot like itself.

In that case being able to pass our command line options on would be clever. Access to mk's command line options, and state:

%a
"-a" if mk has -a in effect, else the empty string.
%A
"a" if mk has -a in effect, else the empty string. Using the mixer, one might also write this as %(a,$).
%+
Proceed with this file (only) as if -a were specified on the command line. This is called step mode, see below.
%-
Proceed with this file (only) as if -a were removed from the command line. This also turns off step mode.
%b
The path to mk, that is, as much of the path as mk was provided in argv[0].
%B
%[b/$]
The basename mk was called with. The name mk was called at execution time. Some subsystems call mk by several symbolic links: "Compile", "Run", "Clean", "Test", thus using mk as an "object oriented" message passing agent. This escape allows passing the "message" on to another program.
%c
"-c", if mk has -c in effect, else the empty string.
%C
"c", if mk has -c in effect, else the empty string.
%e
"-e" and the templates list provided if -e is in effect, else the empty string.
%E
the templates list provided if -e is in effect, else the empty string.
%i
"-i", if mk has -i in effect, else the empty string.
%I
"i", if mk has -i in effect, else the empty string.
%k
"$", which is mk's marker prefix.
%K
%k%k
"$$", which is mk's end marker token.
%l
"-l" lines, or the default number of lines mk searches from each target file.
%L
lines, mk's active notion of how many lines to search from each target file.
%m
marker, mk's active marker.
%M
lowercase(marker), mk's active marker in all lowercase.
%n
"-n" if mk's -n option is active, else the empty string
%N
"n" if mk's -n option is active, else the empty string
%o
-%A%C%I%N%V
all the single letter options to mk in a bundle
%O
%A%C%I%N%V
same as %o without the leading dash.
%s
mk's active submarker
%S
lowercase(submarker), mk's active submarker in all lowercase
%t
"-t templates", the active templates option as a command-line specification
%T
the active templates option without "-t" prepended.
%v
"-v", if mk has verbose set, else "-s"
%V
"v", if mk has verbose set, else "s"
%w
directory, the template directory the active template came from. As mk searches for a template under -t or -e this expansion records which directory we are presently checking.
%W
Same as %w, but if we are not searching the template options reject the command. Thus one could tell the difference between a template file as a template, or as a target itself. I doubt anyone would use that distinction.
%z
The template we are searching.
%Z
%[z/$]
The basename of the template we are searching.
%~
The default template directory mk searches, like mk's home directory.

Access to the target file

The name and contents of the target file are often useful to derive an apropos command. Most clues come from the name of the file, for example the extension after the last dot (e.g. "c" in "/tmp/main.c").

Functions of the target file:

%d
The directory the target file is in, or the empty string.
%D
Same as %d above, but reject the command rather than present the empty string.
%f
Full path to target.
%F
Basename of target (without leading path and dot extender).
%G
Tail of target (last component, starting after the last slash).
%p
The target name, up to the last '.' including any directory part (also spelled %q.).
%P
The target basename, up to the last '.', reject the command if no '.' in target (also spelled %Q.).
%qx
The target name, up to the last x, reject the command if no x in target.
%Qx
The target basename, up to the last x, reject the command if no x in target.
%r
The name RCS would call the cache file for the target.
%R
The name RCS would call the cache file for tail of the target.
%ux
The target name after the last x, reject the command if no x in target.
%Ux
The target basename after the last x, reject the command if no x in target.
%X
%x
The extension on the target (also spelled %u. or %U.).
%y
One letter from "f", "c", "b", "d", "p", "s", or "l" depending on type of the target (file, character special, block special, directory, FIFO, socket, or symbolic link).
%Yx
Fail if the type of the target is x.
%Y~x
Fail if the type of the target is not x.
%#!
Full load path, if file is type 'f' and loaded with "#!".
%#/
Tail of load path, if file is type 'f' and loaded with "#!", (also spelled %[#!/$]).
%# [nbytes] [@seek] [%] fmt [size]
Insert data from the target file. Read nbytes of data (taken as a decimal integer) at position seek (default 0). Then use the printf-formatted fmt to output them in units of size (b for byte, w for word, or l for long). Note that fmt cannot contain a '*', since there is no way to pass the width parameter to sprintf(3).
That deserves an example. If we know the target name ends in ".C" it might be compact'd data, or some other type. To confirm it is compact output we look for compact's magic number (0x1fff) as the first two bytes in the file:
	$Info: %=/1fff/%#%04xw/echo %f is in compact format
%{ENV}
%`ENV`
%"ENV"
expands to "${ENV}", reject the command when $ENV is not set.

This is largely used to see if an X11 DISPLAY environment variable is available, to avoid forking an X client application that would only fail. Since the command is rejected when the variable is not set we can prefer the X version of an application (e.g. browser or spreadsheet), then fall back to the text-only version.

%[expression separator field...]
Expand the expression as normal, then apply the xapply dicer rules starting with separator and field, as in xapply the separator and field specification may be repeated as needed.

The expansion is broken into fields at each occurrence of the character separator, then field number field is selected. A negative field inverts the selection to mean "all but field". In the case of a literal blank the separator is any non-zero number of white space characters. A backslash may be used to remove special meaning from space, backslash, digits, and close bracket.

%(expression mixer)
As in xapply, process the expression then select characters from it with the mixer. The rules for the mixer are too complex to fully explain here (see explode's dicer.html for details). In brief, ranges of characters indexed from 1 to the end of the string are selected by index from the left (integer), index from the right (~integer), last character ($), the whole range (*), or augmented with a quoted string (`text', or "text"). Results may be filtered again with repeated application of these expressions in parentheses.
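
For readers who think in shell, the name-based expanders in this section (%d, %f, %F, %G, %x) correspond roughly to the parameter expansions below. This is only an approximation based on the descriptions above: the helper target_parts is hypothetical, and the cases where mk would reject the command are not modelled.

```shell
# target_parts is a hypothetical helper that prints shell approximations
# of a few name-based expanders for one target path.
target_parts() {
    f=$1                         # %f: full path to the target
    case $f in
    */*) d=${f%/*} ;;            # %d: the directory the target is in
    *)   d= ;;                   #     (empty when the name has no slash)
    esac
    G=${f##*/}                   # %G: tail, starting after the last slash
    case $G in
    *.*) F=${G%.*} x=${G##*.} ;; # %F: tail without the extender, %x: extender
    *)   F=$G x= ;;              # no dot: mk may reject here, we do not
    esac
    printf 'd=%s f=%s F=%s G=%s x=%s\n' "$d" "$f" "$F" "$G" "$x"
}

target_parts /tmp/main.c
```

For "/tmp/main.c" this prints "d=/tmp f=/tmp/main.c F=main G=main.c x=c". The point of the mk notation is that a marked line can say %F where a script would need the case statement above.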

Mapfiles

These little "lens" files are used to map strings via regular expressions to other strings, and are accessed via %<mapexp>. For example, one might map a file's magic number to the name of the program that builds that type, or unpacks that type of file. See the default templates for many clever uses.

Blank lines and lines that start with a hash (#) are ignored. All the other lines in the file should be matching lines.

Matching lines have three columns separated by white space, all of which are expanded before being used:

  1. a "test string", a missing value is taken to mean the last one specified
  2. the regular expression to match against, which always has case insensitive set
  3. the result, if the RE matched the test string

If any of the expansions rejects the expansion the next matching line in the file is tested. If no line matches the expansion rejects the command. To put a literal hash or white space in the test string or the RE use the backslash escapes below.

As a bonus the strings matched by \(..\) pairs are available as %1, %2, %3... to %9. And %0 is the whole matched string.

As a corner-case the empty string may be specified as \e.
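
The lookup described above can be sketched in a few lines of shell. This is a simplification under stated assumptions: the helper map_lookup is hypothetical, the only expansion modelled is %m in the first column, a missing first column and the %0..%9 subexpressions are not handled, and the first matching line is taken to win.

```shell
# map_lookup is a hypothetical helper, not the implementation in mk.
# usage: map_lookup marker mapfile
map_lookup() {
    while read -r str re result; do
        case $str in
        ''|'#'*) continue ;;              # skip blank lines and comments
        esac
        [ "$str" = '%m' ] && str=$1       # stand-in for real expansion
        # the RE is always matched without regard to case
        if printf '%s\n' "$str" | grep -i -q -e "$re"; then
            printf '%s\n' "$result"
            return 0
        fi
    done < "$2"
    return 1                              # no line matched: reject
}

map=$(mktemp)
printf '%s\t%s\t%s\n' '%m' Bar no  '%m' Test yes  . . default > "$map"
map_lookup Test "$map"
rm -f "$map"
```

The demo builds the three-line map from the %^ example earlier in this document and prints "yes" for the marker "Test"; any marker other than "Bar" or "Test" falls through to the "default" line.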

Backslash processing

Since it is hard to embed a newline in a line oriented command processor mk supports the standard C backslash (\) escapes. This might also allow the comment character(s), from the file's native processor, to be included in the embedded command. For example if double-dash (--) is the comment ending token, and the command needs a double-dash option (e.g. --help) in it one could use any of these expander forms:
	\055-help
	-\055help
	-\e-help
To break up multi-character tokens I prefer \e, viz. "*\e/" to avoid a C comment termination.

In some files the comment character is all but impossible to include in a comment. For example a hash (#) character might have to be expressed as a \043 to hide it from the native processor, viz. make.
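
The octal codes used above can be checked with printf(1), whose format string accepts the same three-digit octal escapes. This only shows which characters the codes stand for; it says nothing about how mk itself processes them.

```shell
# 055 is the octal ASCII code for "-" and 043 is the octal code for "#".
printf '\055-help\n'     # prints --help
printf '\043\n'          # prints #
```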

The list of backslash escapes

spelling        expands to
\\              a literal backslash
\n              newline
\t              tab
\s              space
\b              bell
\r              carriage return
\f              form-feed
\v              vertical tab
\e              the empty string
\$              a literal dollar, often used to defeat $$
\000 .. \777    literal octal ASCII codes
\else           any other character is taken as a literal

Extended data access

%*
The actual marker and submarker found to match the ones specified. These are given as marker or marker(submarker). Note that this expander will always fail in the context of a -E option, as there is no actual marker.
%j
The filename of the current here document. A line with a %j and no preceding %J is always rejected. This prevents the null command from being selected at the end of a here document.
%J
A marked line with a %J in it builds a temporary file much as a shell "here document". Lines from the current file are copied to the temporary file until the next marked line (matching the same marker and submarker) with a %j anywhere in it is read. By anywhere we mean before the marker, in the command, or after the end token $$. These lines (not including the last) are presented in the filename reported by %j.

Consumed lines are not re-inspected for marked lines. The example below outputs lines numbered from 1 to 4.

$Here: %J tail -r %j
4
3
2
1
$Here(*): %.%j thing

The line which ended the here document is inspected for a command if the controlling line is rejected, even when it ended the previous document. Mapfiles get their here document data from the marked file, not the mapfile (%J is not allowed in a mapfile, but %j is).

%|/expr/
Remove a leading string from each line in every subsequent here document (see above). The slash delimiter may actually be any character. The expression specified is expanded to produce the text to remove. This is often used to remove leading comment characters from the here document for other processors.
# $Here: %|/#/%J tail -r %j
#4
#3
#2
#1
# $Here(*): %.%j thing
%?
Under the influence of %J, %? expands to the text which follows the %j which terminates the here document. In the example above it would be "thing". Note that leading white space is consumed. A good use of this is to fetch data from the end of a block that was computed as the block was produced, e.g. the standard deviation and mean of a list of numbers.

This is also a good use of $$, we can end the current here document, and use the $$ feature to prevent the interpretation of the %j if that command is ever expanded.

...
$*(*): ${false-false}%; $$ %j end of the world
%$
If the marked line ends with a $$ token the text after that token is available as %$. When there was no $$ token this expansion rejects the command. In the example above the result would be " %j end of the world".

A strange side-effect of %J is that this expander sees the text on the end of the marked line that ends the here document, not the text on the end of the current template. This is thought to be a feature rather than a bug.
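
The effect of the "%|/#/%J tail -r %j" example above can be imitated in plain shell, which may help when debugging a here document. This is a sketch of the visible behaviour only: the helper make_here_doc is hypothetical, and sed is used as a portable stand-in for "tail -r", which not every system provides.

```shell
# make_here_doc is a hypothetical helper, not part of mk.
# usage: make_here_doc prefix < lines
# Copies stdin to a temporary file, removing the leading prefix from each
# line (the job %| does), then prints the file name (the role %j plays).
make_here_doc() {
    tmp=$(mktemp) || return 1
    while IFS= read -r line; do
        printf '%s\n' "${line#"$1"}"
    done > "$tmp"
    printf '%s\n' "$tmp"
}

j=$(printf '%s\n' '#4' '#3' '#2' '#1' | make_here_doc '#')
sed '1!G;h;$!d' "$j"     # print the file in reverse line order
rm -f "$j"
```

As with the mk example, the output is the lines 1 through 4 with the leading hash removed.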

Step mode

Under the -a switch mk looks for all the matching marked commands. This is culturally used to follow a step-by-step recipe, much as a make(1) file, but the steps might also be stand-alone targets themselves.

There are two expanders that are used strictly for their side-effects to set and end "step mode": %+ to turn it on, and %- to turn it off.

Here is an example from the regression tests:

$Compile: %+Step(*)%h
# This file checks to be sure mk honors %+ to do multistep tasks
$Step(1): true
$Step(2)=~0: false
$Step(3): exit 0
$Step(4): true%-
$Step(fail): false
When we are asked to "Compile" the file we shift the marker to "Step" and the submarker to "*", so we'll match all the "Step(n)" lines below. We also set -a (via %+) so we do not stop at the first one. On step four (Step(4)) we execute "true" and end the step mode so we don't fail on the next marked step (Step(fail)).

Other applications might just search for a specific Step for another purpose, since they would all work as "stand alone" commands (even Step(4)).

Taking advantage of here documents

There are some subtle uses of mk, and then there are some really subtle uses. The here documents are an if-then-elif-else-endif type construction. In this outline we see the three alternatives (submarkers 1, 2, 3), each of which has a here document block. The last alternative doesn't start a here document, but does end the third one.
$Test(1): %J something %j
	first block
$Test(2): %J something other %j
	second block
$Test(3): %J another way %j
	third block
$Test(*): default case $$ %j
Use "mk -VmTest -i" on a file with those lines to see it go. If you quit from the command prompt mk leaves the file in /tmp for you; that might be a feature or a bug, see untmp(1).

The other way to view a here document set is like a shell archive. The sections could be installed into other filenames (then processed with mk or even executed). It is not an error to remove or rename the here document files (%j) in your command.

Under -a (all matched commands) we can unpack all the here document data in the file, which makes mk into a pretty smart archive unpacker. I would use uudecode or perl to unpack the data to be safer.

The 2 additional forms (%$ and %?) are largely used for automation. Assume that some processor only knows the total of the numbers it is producing after it has written them out. It can put the total on the end of the here block:

$Numbers: %Jecho Total %$ ; cat %j
100
200
300
$Numbers: %j %^ $$ 600
Then "mk -mNumbers $file" outputs the header line and the list.

Unused expanders

As if mk didn't have enough expanders, we have some ideas for using the symbols left on the 101 key I/O device.
%_
The filename and line number of the current marked line, as "file:line". Never really needed this, but in templates it makes more sense.
%, or %&
Under -A continue with the next marked line when this one succeeded. Group marked lines into a script, kind of the opposite of %..
%\, %>, %), %], %}
More than likely I'll never use these, as the backslash itself is already special, and the close punctuation would confuse everyone.

More examples

While mk has all of the expander magic above, we still fall-back on the shell variable expansion to trap the main program. This allows the calling application to replace xterm with echo to debug, or with a script to trace actions, then execute the xterm.

Here are some examples to clear-up some of the expansions:

$Page: ${xterm-xterm} -display %{DISPLAY} -cr green -e ${PAGER-less} %f
Under the marker "Page" run an xterm for the file with a pager in it, default to "less" if $PAGER is not set. Don't try this unless $DISPLAY is set in the environment.
$Page(*): ${xterm-xterm} -display %{DISPLAY} -cr %s -e ${PAGER-less} %f
A re-play of the first example with the submarker used as a color selector for the cursor color. If this is put above the first example it will take effect when mk is run with "-d color".
$Page: ${PAGER-less} %f
Under the marker "Page" fall-back to a local pager ($PAGER) or less if that is not set.
$Compile(*): %;${echo-echo} "%G: cannot %m myself" 2>&1 && ${false-false}
This file can't be compiled. Warn stderr and fail. If this command is not picked then fail anyway.
$Compile: ${lex-lex} %f && %b %O -d" -ll" lex.yy.c && %b %O -mUpdate %f
$Update: %=/%C/c/${rm-rm} lex.yy.c && ${mv-mv} a.out %F
$Update: %!/%C/c/${rm-rm} lex.yy.c && ${mv-mv} -i a.out %F
Run lex over this file, recursively call mk by the name provided with the single letter options given on the output C file, then remove the output C file, and move the a.out build to our name without an extender.

Use mv -i if we don't confirm the Update marker.
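
The fall-back between the two plain "Page" lines above behaves much like the shell conditional below. This is a sketch, not what mk executes: the helper page_command is hypothetical, and it prints the command rather than running it, in the same spirit as replacing xterm with echo to debug.

```shell
# page_command is a hypothetical helper that prints, rather than runs,
# the command the two "Page" lines would select for a file.
page_command() {
    if [ -n "${DISPLAY+set}" ]; then
        # first marked line: %{DISPLAY} expands because DISPLAY is set
        printf '%s\n' "${xterm-xterm} -display $DISPLAY -cr green -e ${PAGER-less} $1"
    else
        # %{DISPLAY} rejects the first line, so the plain pager line is used
        printf '%s\n' "${PAGER-less} $1"
    fi
}

(unset DISPLAY PAGER xterm; page_command notes.txt)
(DISPLAY=:0; unset PAGER xterm; page_command notes.txt)
```

The first call prints "less notes.txt" and the second prints "xterm -display :0 -cr green -e less notes.txt". The file name notes.txt is only an example.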

A more complete example

Say we want to run a cron based task every hour to poll a list of hosts. Using mk we might break this problem down like this:
Write a script to poll 1 host
Which we can test for each host and re-use for other similar tasks. There might be a list of test cases (using mk) in the file to debug it.
Build a list of the hosts to poll
Call it "poll.list" and include a comment line like:
# $Poll: grep -v '\043' %f |xapply -P3 -f '$HOME/libexec/pollme' -
host1
host2
...
The crontab line calls mk on the list of hosts
27 * * * *  /usr/local/bin/mk -smPoll $HOME/lib/poll.list

This has several advantages over combining the loop with the list of hosts.

Reuse
We can use the same pollme script with a different list of hosts for other tasks or to diagnose failed polls. With another marker in the poll.list file we can reuse the host list for a different task. We can mark the poll.list with RCS/CVS/SVN keywords, as long as we comment them.
Speed
We poll 3 hosts in parallel with xapply's -P option, but we don't have to change the crontab to tune that factor. And we know what the list of hosts is used for by reading the file.
Clarity
Basically this factors the parts of the work in clear blocks. The code to poll in a script, the list of hosts in a file, and the trigger in the crontab.

If we don't do these we risk losing track of what the list of hosts does, or which program uses it.
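
To try the shape of the $Poll command without xapply or a real poll script, something like the sketch below may do. It rests on assumptions: xargs with -P (a common extension, not in every xargs) stands in for xapply -P3, the helper poll_hosts is hypothetical, and $pollme defaults to echo so each host name is simply printed.

```shell
# poll_hosts is a hypothetical helper. xargs stands in for xapply, and
# $pollme defaults to echo so the sketch runs without a real poll script.
# usage: poll_hosts listfile
poll_hosts() {
    # drop the comment lines (the '\043' in the marked line is a hash),
    # then run the poll program on up to 3 hosts at a time
    grep -v '#' "$1" | xargs -n 1 -P 3 "${pollme-echo}"
}

list=$(mktemp)
printf '%s\n' '# host list for the hourly poll' host1 host2 > "$list"
poll_hosts "$list" | sort
rm -f "$list"
```

Because the hosts are handled in parallel the output order is not fixed, which is why the demo sorts it.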

Conclusion

Mk treats the comments in a file as compacted shell commands. These commands are extracted by matching a "marker" name to a token prefix, then expanding a lot of percent expressions into a shell command. The command is passed to the shell to do whatever the marker means. Any "meaning" assigned to a marked command is from the perspective of the person that wrote the markup: mk doesn't pretend to assign intent to such names, much as make doesn't.

This puts the details about a file's use in the file, rather than someplace harder to locate (like a crontab, Makefile, or script in another directory).

Mk has a strong templating structure which allows files to only include the marker commands that are different than the customary versions.

The culture around mk includes heavy use of shell code as well as recursive calls to mk.

Uses in the ksb tool chain

Mk is used as a back-end for:
man - to format and install manual pages
system install scripts - to select many optional parts
sbp - to select commands to build empty filesystems
sudop - to emulate sudo under op
valid - to run regression test code
Level 1 CM policy - to build, install, and test source files
Level 5 CM policy - to pick hosts and options for software configuration
shell scripts - to provide example usage hints, for common instances
README file - to suggest commands to run configure and/or make

Version identifier: $Id: expand.html,v 5.28 2010/08/13 19:26:04 ksb Exp $