To understand this document

This document describes one of the two master source pull services; the other is msrcmux. If you want the server to do the configuration of the shadow copy, then you want to read about that service. If you need to use local meta-information to configure the shadow copy, then you are in the right place.

You need to have a working knowledge of rsync, and either msrc or mmsrc. Having used the master source structure to build at least one of your own products would really help.

If you have never read the msrc primer, please read it before you continue with this document.

The examples in this document do not assume that you are an administrator on either the master source host, or on the client machine. If you are, then you can do quite a bit more to make your life easier.

Four data-flow models of configuration management

There are 4 models to explain here. Traditionally most people think of either pushing data to a target host, or pulling data from a central service to the target host. Other variants include peer-to-peer distribution of updates, which is beyond the scope of this document.

What mpull does

Mpull downloads a single master source directory from a repository via rsync. From the new local copy it configures a shadow copy for the client instance's platform, based on the locally available configuration management information. That is to say, the site policy information from the localhost (in auto.cf) is used to build any required shadow copy; the policy on the master server is not used at all.

It does the configuration via mmsrc. Options to mmsrc specify a utility action to update the host from the recently configured shadow directory, and may replace auto.cf with any other hxmd configuration file specification.

This implements a client-all version of the master source structure. That allows client instances to test new software, configuration files, and other spells before they are pushed (or pulled) more globally.
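
In other words, mpull wraps one rsync copy and one mmsrc run. A minimal sketch of those two steps done by hand (the working directories here are only illustrative; the -dX trace in the example below shows the commands mpull really constructs):
client$ rsync -arSH msrc::msrc/local/bin/glob/ /tmp/work/local/bin/glob	# /tmp/work is a made-up scratch path
client$ cd /tmp/work/local/bin/glob
client$ mmsrc -y INTO=/tmp/build/local/bin/glob -Cauto.cf make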

Command-line usage

Because it is a ksb-tool, it takes the common -h and -V options:
mpull -V
mpull -h

Because the script is basically a front-end for rsync and mmsrc, it takes all the options for those programs, separated by the location of the target master source directory:

mpull [rsync-opts] msrc-dir [mmsrc-opts] [utility]
rsync-opts
Any option that rsync takes must be expressed as a word that starts with a dash (-). For an option that takes an argument, quote any separating white-space or use the --option=value assignment form; that makes this work for every rsync option (see the short example after this list).
msrc-dir
The target master source directory. This is expressed as a relative path from the rsync module's top-level directory. When using rsync over ssh it is expressed as a path on the remote machine.
mmsrc-opts
Any options to mmsrc which are needed to configure the target directory. Usually database options (like -D, -C, -X, and -Z) to hxmd. If you provide none, then -Cauto.cf is assumed.
utility
The local update shell command needed to complete the task at hand. The default is make, because that's the default from mmsrc. That default does little to actually update a host with anything I've coded; you usually want to specify make install clean, or mk -mInstall \*.man.
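
For instance (the filter rule and the configuration file path below are hypothetical), an rsync option that carries white-space and an explicit hxmd configuration file might look like:
mpull --filter='- *.o' --exclude=RCS local/bin/glob \
	-C/usr/local/lib/deploy.cf make install clean	# deploy.cf stands in for your own hxmd file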

Example usage

As an example, let's trace downloading an up-to-date copy of the glob program and installing it as the superuser:
$ mpull --exclude=RCS local/bin/glob -dX op make install
Because I added a trace option (-dX), the command outputs something like:
rsync -arSH --port=873  --exclude=RCS \
	msrc::msrc/local/bin/glob/ /tmp/mpull2v1od1/local/bin/glob
mmsrc -y INTO=/tmp/ksb/local/bin/glob -Cauto.cf -dX op make install
mmsrc: make -s -f /tmp/mtfcZgsJEu/makefile INTO=/tmp/ksb/local/bin/glob __msrc
Then the output from the make run looks like:
mkcmd glob.m
(cmp -s prog.c glob.c || (cp prog.c glob.c && echo glob.c updated))
glob.c updated
(cmp -s prog.h glob.h || (cp prog.h glob.h && echo glob.h updated))
glob.h updated
rm -f prog.[ch]
/usr/bin/gcc  -Dhosttype  -c glob.c
/usr/bin/gcc -o glob  -Dhosttype  glob.o
install -c -s glob /usr/local/bin

The download and configuration all happen as me; the op escalation allows the build and install to happen as the superuser. That means the owner of a workstation may install updated products without any help from the site Admins. It might be safer to use pre-compiled packages, but not every update is the same for every workstation: some files are completely unique, yet still built by automated processes.

For more details about the options see the manual pages for rsync, mmsrc, and mpull.

Why a pull-source structure?

The base version of the master source tool chain stresses pushing configured shadow copies of the source to target machines (or instances) for updates. This model handles the initial setup of machines, on-going provisioning, and mass data recovery pretty well. So what is missing from that model?

Really there is nothing missing, if you have a completely uniform schedule for changes. When all changes happen to each population of targets in a uniform pattern, you can push via the clock and probably use a few build-hosts to make packages for large populations. But that's never really the case: we need to test disruptive changes on test instances, we need to test failure cases, and we need to try upgrades before we deploy them to very large populations.

Mpull lets you test new configuration combinations without committing them to the meta-data repository. By using local meta-data against a known-good master directory we test a limited change (varying just a parameter or two is where this really shines).
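
For example (the file name test.cf is hypothetical), copy the local auto.cf somewhere handy, change one macro in the copy, and point mmsrc at it for a single pull; the meta-data in the repository never changes:
client$ mpull --exclude=RCS local/bin/glob -C/tmp/test.cf make	# test.cf is a local, edited copy of auto.cf
When the combination works, the same change can be committed to the real configuration files and pushed (or pulled) everywhere else.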

Next reason: having only a push model leads to some possible privilege, resource, and synchronization issues. Test groups may require write access to the "production copy" of the master source to push test changes to their test-bed instances. Or production support groups must provide the staff to run test plans, which might be even more of an issue. Unavailable machines may miss a push, which requires some close-the-loop operations to keep the posse synchronized. For a stronger view see the list at infrastructures.org.

The down-side of a pull-only model is that changes may be taken from the master repository while it is not stable (in the midst of a commit). Clients may over-load the master repository by all demanding updates at exactly the same time (via cron, or some other triggering event). Also, recovering from any mistake that stops the update process requires a push process, or a lot of fingers on keyboards and expect scripts.

But the best of both worlds is to push when needed, pull when needed, and keep close-the-loop in mind all the time.

Close the loop

In my model of configuration management the most important idea is that every level pushes information to adjacent levels as feedback. This is how bugs get fixed, how site policy gets applied, and how better code gets deployed. In truth the whole configuration management structure doesn't function without mechanisms to both send and receive feedback between structural elements.

Hostlint example

The best example I can think of is a program petef and I wrote to check the version of every local tool installed on each instance under our control. We called the program hostlint. If you have never read hostlint's HTML document, please read it, then return here.

Hostlint checks level 4 (running hosts) against level 5 (site policy) to assure that every host conforms. If you have site policy statements that do not impact the contents of any file, the version of a product, the release of a package, or the existence of a login, group, netgroup, or network route -- then you can't check them with hostlint.

But without a way to remediate the issues hostlint finds, we have not solved any problems. Mpull is one way to fix what hostlint finds lacking. The other is msrcmux.
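
For example (using the same product as the trace above), when hostlint reports that the installed copy of glob is behind the site policy version, the owner of the instance can close that loop with a single pull:
client$ mpull --exclude=RCS local/bin/glob op make install clean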

How I work in Real Life™

There are a few cases where I use mpull, but mostly I push with the dmz script interface to msrc. I spend a lot of my time building products, or developing new features in the tools I support. I push code to test instances a few times an hour as part of my development cycle. I could have one window open on the source host, and another on the test host where I'd run mpull, but that takes longer as it must rsync the whole source directory down (not just the changed files).

But on other people's machines (different political domains) I use mpull almost exclusively. I pull source to my home directory (or /tmp) to build, and install the results in my home directory. This lets me build a complete CM environment without being an Admin on the instance. I only need the mpull script, a copy of mmsrc built for the machine, and a local copy of rsync to get started.
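
As a sketch of that unprivileged setup (the directory layout below is only an illustration, and I'm assuming MPULL_SRC sets the base of the configured shadow copy, as the -dX trace above suggests):
client$ export MPULL_SRC=$HOME/msrc
client$ mpull local/bin/glob make
client$ cp $HOME/msrc/local/bin/glob/glob $HOME/bin/	# copy the result by hand, no op or sudo needed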

One of the largest contributors to irony in the universe

Mpull depends on 2 master source tools, hxmd and mmsrc; the irony is that mpull can't upgrade mmsrc.

This is because the master source directory that builds mmsrc reaches into the source for hxmd and msrc to create the C version of the boot-strap code. This meta-requirement, that the source to those tools be present on the "master source side" of the build process, means that when we rsync down just the lone directory we don't have enough information to complete the shadow copy for our platform. But the copy of mmsrc.c is usually up-to-date, so the failed check is really not required. There are 4 work-arounds.

The first is to use muxcat(1) to fetch a pre-configured shadow copy of the source from msrcmux. This assumes that the client in question has a valid meta-configuration on the local master source repository.

client$ mkdir /tmp/msrc$$
client$ cd /tmp/msrc$$
client$ muxcat -x msrc.example.com msrcmux local/sbin/mmsrc . | tar xvf -
+directory
+configuration
+/usr/local/sbin/mmsrc -yINTO=/tmp/mMxT9jdNRm -lDHOST=client \
	-Cauto.cf -- tar cf - .
x ./
x ./boot.html
x ./README
x ./TODO
...
x ./Makefile
client$ op make install
...
client$ cd .. ; rm -rf msrc$$

The second is to download the tar archive of the msrc_base package and install the whole set. That one is sure to work every time.

client$ fetch ...
client$ tar xzf msrc_base-...
client$ cd msrc_base*/
client$ op make boot
client$ cd ../; rm -rf msrc_base*

If you've made local changes to the tools, you don't want to back out to my version. So the third is to roll a new "msrc_base" package. You must use the package recipe to make a new stage directory to download:

msrc$ cd /usr/msrc/Pkgs/msrc_base
msrc$ make stage
... lots of output ...
op -u source level2s-chown /tmp/msrc_base-number
msrc$ make check
cd /tmp/msrc_base-number &&  find local css Pkgs -type f \
	|grep -v level2s.cf.host | xapply -f 'diff %1 /usr/msrc/%1' -
msrc$
Then on the target host we need to pull that whole directory to build the one little program we must have:
client$ export MPULL_SRC=/tmp/ksb MPULL_FROM=msrc:/tmp
client$ mpull msrc_base-number cd local/sbin/mmsrc \&\& op make install
Or install all of them:
client$ export MPULL_SRC=/tmp/ksb MPULL_FROM=msrc:/tmp
client$ mpull msrc_base-number op make boot

Back on the master host run "make clean" to remove the stage directory:

msrc$ make clean

The last is to rsync the master source for mmsrc ("local/sbin/mmsrc") into a temporary directory and build it with hxmd:

client$ cd /tmp
client$ mkdir msrc.$$
client$ cd msrc.$$
client$ rsync -arSH msrc::msrc/local/sbin/mmsrc/ .
client$ hxmd -Cauto.cf -Glocalhost 'make -f %1 clean mmsrc' Makefile.host
rm -f mmsrc mmsrc.o machine.o *core errs lint.out tags
/usr/bin/gcc  -Dhosttype  -c mmsrc.c
/usr/bin/gcc  -Dhosttype  -c machine.c
/usr/bin/gcc -o mmsrc  -Dhosttype  mmsrc.o machine.o -llibs
client$ op install -c mmsrc /usr/local/sbin/mmsrc
client$ cd ..
client$ rm -rf msrc.$$
That assumes that the name of the current host in auto.cf is "localhost". Replace "op" with "sudo" if you live in the dark ages.

Any of those gives you what you need without too many hassles.

Summary

Mpull fills in the pull-source aspect of the master source structure. It depends on the availability of an rsync module that contains a stable version of the master source repository. It also requires hxmd and mmsrc, which it can only upgrade with some help.
$Id: mpull.html,v 1.10 2012/03/29 19:39:12 ksb Exp $