Specific means every instance may be unique

Pushing an identical configuration to every one of your site's hosts might be fine for /etc/pam.d or /usr/share/dict/words, but it is not okay for the configuration of privilege escalation rules.

To understand this document

This document assumes that you've been a UNIX™ systems administrator for long enough to know that you shouldn't just log in as root for your daily work. After you got burned by a superuser mistake, you took to typing sudo or super in front of the commands you wanted to run privileged.

But giving non-administrators access to sudo is too much: you know they are going to destroy something. So you've tried to limit the damage with tighter and more detailed sudo rules, and eventually had to give up. Either you've ended up doing more work yourself, or you've given pretty much unlimited access to people who really don't need it.

If that fits you, then you are ready to move up to op. But to move up you need to learn how to get a better policy, not just a sharper tool. In this document I'm going to present the implementation details of getting the correct configuration for each rule to every host for every login and service you deploy.

See the op manual page and the HTML documents for details about op itself; I'm not going to repeat much of that material here. Later in this document I'm going to spend some time showing how to do this under the master source structure I use for all such tasks. If you've never heard of msrc or xapply, you'll have to read about them before you get down to Making it so.

The op rule-base is treated like a level 3 package more than a level 2 product. That is to say that the scripts and rules are always pushed together.

Trusted individuals and trusted applications

Login credentials are assigned to an individual at login, based on fields from some files in the /etc directory: passwd, group, and maybe project or login.conf. Without a setuid (or setgid) bit set on an executable, it is unlikely that a process can modify its credentials (that is, gain additional groups, change its real or effective user identifier, or change its project identifier).

That doesn't mean a process can't get services: lots of UNIX services don't require any special access, largely because the UNIX file permission structure is quite complete for mundane tasks. Programs like ls, and even ps, need no access beyond that of the requesting user.

Individuals get accounts

Systems administrators grant some logins more privilege than others. The basic separation is "those with accounts on my hosts" and "the others without accounts". For the purpose of this document we'll be addressing the individuals with accounts on the hosts run by the local administrators.

It is usually a violation of local policy for a person to give their credentials to another: we don't want an unclear audit trail of who made changes, or who can read Customer records.

Trusted applications get `setuid` bits

Just as individuals are trusted with accounts, some applications are trusted with special access. When a program with the setuid (or setgid) bit in its permission bits is run, execve sets the effective user (or group) identifier of the process to the owner (or group) of the file (see the execve manual page). Good examples of such programs are atq and netstat:

$ ls -als /usr/bin/atq /usr/bin/netstat
 22 -r-sr-xr-x  4 root  wheel  21228  Feb 15 17:05 /usr/bin/atq
132 -r-xr-sr-x  1 root  kmem   133528 Feb 15 17:04 /usr/bin/netstat

A listing via ls displays the letter 's' rather than an 'x' in the execute position to show that the program assumes the privileges of the owner (or group) of the file as it executes. But, as with our login credentials, we don't want escalated programs to give away their special powers for other uses.
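To find which of your own programs carry these bits, a script doesn't need to parse ls output: test(1) has -u and -g operators for setuid and setgid. Here is a minimal POSIX sh sketch; the function names escalation_kind and survey are hypothetical, and the directory you survey is up to you.

```sh
# escalation_kind FILE: report which escalation bits FILE carries,
# using only the POSIX test(1) operators -u (setuid) and -g (setgid).
escalation_kind() {
	if [ -u "$1" ] && [ -g "$1" ]; then
		echo both
	elif [ -u "$1" ]; then
		echo setuid
	elif [ -g "$1" ]; then
		echo setgid
	else
		echo none
	fi
}

# survey DIR: print "KIND PATH" for every escalated regular file in DIR.
survey() {
	for f in "$1"/*; do
		[ -f "$f" ] || continue		# skips subdirectories and an unmatched glob
		kind=$(escalation_kind "$f")
		[ "$kind" = none ] || printf '%s %s\n' "$kind" "$f"
	done
}
```

On the host in the listing above, survey /usr/bin would report atq as setuid and netstat as setgid, among others.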

While the kernel doesn't know the purpose of the escalation, we draw a distinction between two main classes of escalated programs: gate keepers and shell services.

Gate keepers offer structured access to some system resource: for example the line printer services, job scheduling, trusted network services, or kernel statistics are all protected by setuid and/or setgid programs. To achieve the gate keeper function, a programmed task is run on behalf of the individual, then the process exits with a status code to report success or failure. This structured approach keeps gate keeper functions relatively simple and usually quite secure. These programs are gate keepers because they don't offer any interaction beyond the single request made on the command line.
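The gate keeper shape fits in a few lines of sh. This sketch is hypothetical (the verbs and the echoed actions are placeholders for a real service), but it shows the three properties above: exactly one request on the command line, a fixed list of accepted verbs, and an exit status as the only reply. Exit code 64 is EX_USAGE from sysexits.h.

```sh
# gatekeeper VERB: accept exactly one request from a fixed list,
# do the task (placeholder echoes here), and answer with a status.
gatekeeper() {
	if [ $# -ne 1 ]; then
		echo "usage: gatekeeper start|stop|status" >&2
		return 64	# EX_USAGE
	fi
	case "$1" in
	start)	echo "would start the service" ;;
	stop)	echo "would stop the service" ;;
	status)	echo "would report status" ;;
	*)	echo "gatekeeper: $1: request refused" >&2
		return 64 ;;
	esac
	# No interactive shell is ever offered: the request ran, or it didn't.
	return 0
}
```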

A shell service like su (or ssh) offers an interactive shell with escalated (or different) credentials from those of the invoking process. Usually a shell service requires some additional factor (a password, key-pair, or token) to grant escalated access, because shell access allows such a wide-ranging set of actions. Shell services are hard to secure, even with the restricted option (-r) of ksh; see the section on restricted shells in the ksh manual page.

Which type of escalation do you want to give most of your Customers?

I would try to make every escalated action a single request made on the command line. I don't want my Customers to type long commands that might be spelled badly, and I don't want them to use an escalated interactive shell thinking it was just a plain shell.

This advice comes from 25 years of my own experience: if you have read this far and think your current structure works, you can stop here. Thanks for your time.

If you want to simplify the Customer interface, make it more secure, and make it much easier for you to manage, keep reading.

The structure of `op`'s escalation rules

We are going to avoid all the snake pits I've already fallen into by using some general rules for the installed rule-base. Then we are going to break the source of the rule-base into sections that help us keep it up to date. Then we'll talk about closing the loop on the rule-base to make sure it is always sane.

General rules

These might be obvious to you, but read them anyway. It can't hurt to review why we do things the way we do.

Don't share the rule-base between hosts

Never export the op rule-base via NFS mounts. We don't want everyone to read the rule-base (to know what rules to try to break) and we don't really want hosts to use a rule-base from another host.
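With exporting ruled out, the remaining leak on each host is file permissions. Here is a minimal sketch of a check you might run around an install, assuming POSIX find(1). It doesn't try to detect an NFS mount (that test is not portable), and it takes the rule-base directory as an argument rather than assuming op's installed path. The function names are hypothetical.

```sh
# world_readable_rules DIR: list files under DIR that any login could read.
world_readable_rules() {
	find "$1" -type f -perm -004 -print
}

# check_rule_base DIR: fail (and name the offenders) if anything leaks.
check_rule_base() {
	leaks=$(world_readable_rules "$1")
	if [ -n "$leaks" ]; then
		printf 'readable by others:\n%s\n' "$leaks" >&2
		return 1
	fi
	return 0
}
```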

Split groups of escalation rules out of `access.cf`

There is a balance between putting each escalation rule in its own file and putting them all in one or two files: but if you put all your rules in access.cf you are going to be sad. In fact you'll be sad until you find the right balance, and we aim to make that balance easier for you to find.

Putting too many unrelated rules in the base configuration file leads to a never-ending deployment cycle, which might include a re-review of every rule in the updated file.

Keep escalation rules for a single topic in the same file

Location of the rule implies some ownership and audit trail. Rules from the same file are (more than likely) from the same revision of the rule-base, so the stop rule likely knows how to end all the tasks the start rule set in motion.

Similarly, all the rules required to run a given service are present if any of them are: that's a great invariant, and I use it all the time.

Name rules with the most significant word first

To the Customer there is little difference in typing:

$ op apache start

and

$ op start apache

but there is a world of difference if I want to encode them in the same rule. Op keys on the first word, which it calls the mnemonic. If we get Customers used to using the verb as the mnemonic, the rule-base gets crazy: the rules for every service end up mixed together under "start", "stop", and so on.

If we get Customers used to using the project, facility, or target login as the mnemonic then we can factor common rules out with help from op. And the on-line rule listing looks a lot more structured to everyone.

Later we can add a port number on the end and our Customers might be able to remember the order: I want to access "apache", I want to "start" it, on port "8443". If the rules don't always go from the topic to the verb to the details, Customers might have to try every combination to figure it out. So they won't even try: they will just page you to start it for them.
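One way to see the difference is to group some requests by their first word, the way op does. This awk sketch (the function name group_by_mnemonic and the sample requests are hypothetical) prints each mnemonic followed by the rest of every request filed under it.

```sh
# group_by_mnemonic: read one op request per line (without the leading
# "op") and print "MNEMONIC: rest; rest; ..." in first-seen order.
group_by_mnemonic() {
	awk '{
		m = $1
		rest = $0
		sub(/^[^ ]+ */, "", rest)	# drop the mnemonic itself
		if (!(m in list)) { order[++n] = m; list[m] = rest }
		else list[m] = list[m] "; " rest
	}
	END { for (i = 1; i <= n; i++) print order[i] ": " list[order[i]] }'
}
```

Feeding it the topic-first requests "apache start", "apache stop", "apache start 8443", and "radius start" gives one line per topic ("apache: start; stop; start 8443" and "radius: start"), and each line can live in that topic's file. The verb-first spellings of the same requests give "start: apache; apache 8443; radius" and "stop: apache", so a single rule would have to know about every service.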

This is one of the main issues with sudo: you just add it to the beginning of a long command, so the word order is whatever that command happens to use.

Types of topics we might use to factor rules out of the pool

Here are five main divisions we shall use to break the local configuration into topical files. These divisions allow fine-grained control of which rules end up on each target host. If a set of rules always moves together, then those rules should be in the same file.

A given configuration might have more than one file of most of these types. The split is based on the need to send some rules in isolation. That is to say that we are not looking for a reason to put every rule in its own file, but we are also not afraid to do that when warranted.

OS or host-type based rules: These rules are sent to the host based on its base operating system. They contain rules that give administrators and other special groups access to features of the OS.
Common examples of this are Solaris zone rules, FreeBSD jails, and Linux firewall access. Since a given host only runs one OS at a time, there is usually only one file of this type on each target host.
class or cluster based rules: These rules are sent to a host based on the build class of the node. They provide services for the application groups that support that class of server.
Common examples of this are rules to start/stop/test applications we know must be installed on the cluster.
service or role based rules: These are sent to a host based on the active (or installed) applications listed in a SERVICE attribute (in some hxmd configuration file).
Commonly these are controlled by op's msrc configuration management control directory, which consults an authoritative control file.
remote policy rules: These are sent to every remote host, but only installed if a guard marked command allows it.
Commonly these are based on the existence of a login, an application directory, or both. After an application is installed, a follow-up msrc spell updates the op rule-base. The two steps are separate because one happens as a mortal login, while the other needs to be run as the superuser. See the remote HTML document, and the HTML document for mk, which processes the guarding marked lines.
support scripts for complex rules: These shell scripts are sent to all machines, but might only be installed if some active op rule uses them (even in a comment line).
These are largely run by automation to clean up after the production system. The crontab line might indirect through the script with mk to extract the correct escalation spell and invocation.

Making it so

Since I use msrc to do this, I'm going to assume that you can read control recipe files and have some idea of how they work. If you can't, you need to read (at least) the master source quick start HTML document, and maybe more on that whole tool-chain. But you can muddle through this without that frame of reference, as you like.

I'm going to construct a master source configuration directory to manage each type of configuration file, based on the topics above. I'll include a hypertext link to the actual recipe file in each section, but abstract it a little in the examples to make them easier to read.

Pushing the top-level (package)

At the top level we need to apply the recursion control recipe pattern to push the five topic directories to the target host, then let msrc run the control command on each target instance. On the target host, the spell we've constructed completes the update the driver requested.

In Msrc.hxmd (the hxmd options file) we record the HOSTTYPE and HOSTOS attributes of each requested target, so that the recursive applications of msrc have access to them. We force a special configuration file, op.cf, into the configuration data; it contains macros used in the platform recipe file. By including class.m4 (under -j) we allow the use of the class facility in any markup under us. Then we force PRE_CMD to hook in the recursion spell in the control recipe.

In Makefile (the control recipe file) we offer a remote_descend target that visits the other directories in turn. It is hooked into PRE_CMD so that we call back to the control recipe to do the recursion step just before we run the requested utility. By spelling it in the hook as ${3}_descend we'll hit either "remote_descend" or "local_descend" (which has not been coded yet, so it fails).
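Stripped of the make and msrc machinery, the recursion step amounts to a loop over the topic directories. This sketch is only an illustration: hosttype, class, service, and libexec are directory names from the text, remote is a hypothetical name for the run-time rules, stopping at the first failure is an assumption, and descend and descend_target are made-up function names.

```sh
# Topic directories, visited in this order (remote is a hypothetical name).
TOPICS="hosttype class service remote libexec"

# descend CMD [ARGS...]: run CMD inside each topic directory that exists,
# stopping at the first failure and returning its status.
descend() {
	for d in $TOPICS; do
		[ -d "$d" ] || continue
		(cd "$d" && "$@") || return
	done
}

# descend_target MODE: the name a ${3}_descend spelling would expand to,
# e.g. "remote" gives remote_descend and "local" gives local_descend.
descend_target() {
	printf '%s_descend\n' "$1"
}
```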

In Makefile.host (the platform recipe file) we just install the files we find in libexec, and the services listed in the host's SERVICE attribute. Each service is mapped to a file under the service directory. There is also a spell to build access.cf with sed (which we'll talk about below).

All the standard payload make targets are supported in Makefile.host, plus an extra sane target that checks the proposed configuration with op -Sn.

If the requested utility triggers the install recipe, then any older versions of the configuration files are moved into OLD (via install), then each collected configuration file and support script is installed. This assures that no out-dated rule files are left in place.
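The retire-then-install order matters more than it looks: if a service is dropped from a host's SERVICE list, its rule file must go too. Here is a sketch in plain sh, using mv and cp rather than the install program the recipe actually uses; the function name install_rules and the 0600 mode are assumptions.

```sh
# install_rules DEST FILE...: move every *.cf already in DEST into
# DEST/OLD, then copy in the files selected for this host. A rule file
# that was not selected this time cannot linger in place.
install_rules() {
	dest=$1
	shift
	mkdir -p "$dest/OLD" || return
	for old in "$dest"/*.cf; do
		[ -f "$old" ] || continue	# the glob matched nothing
		mv -f "$old" "$dest/OLD/" || return
	done
	for new in "$@"; do
		cp "$new" "$dest/" || return
		chmod 0600 "$dest/$(basename "$new")" || return
	done
}
```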

How we implement more than one grouping of configuration files

For each grouping we have some structure around processing and installing the parts of the rule-base that are destined for a given host.

Once we collect the parts (sent by the msrc structure in this directory), we install the ones selected for the host: that is to say we might send some files that are never installed, but those don't hurt anything.

In the paragraphs below I describe how we send the files that make up each topic.

From host's platform type

In this case there can be only one platform type for each host, so we use it as a lever to pick an access.cf. That file encodes the DEFAULT section that every other file uses, and it may or may not have some platform-dependent parts in it. If it does, this is a good way to pick the right one.

We stage that file as hosttype/access.cf, then let the top-level platform recipe create the final one. That hook allows some last-minute customization on each host, given markup in Makefile.host to pick the correct recipe.
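The last-minute hook can be as small as one sed filter. In this sketch the @HOST@ placeholder and the function name finish_access_cf are hypothetical; substitute whatever markers your staged hosttype/access.cf carries.

```sh
# finish_access_cf HOST: read the staged access.cf on stdin and write the
# host's final copy on stdout, filling in the (hypothetical) @HOST@ marker.
finish_access_cf() {
	sed -e "s/@HOST@/$1/g"
}
```

On the target host it would run as finish_access_cf "$(hostname)" < hosttype/access.cf > access.cf.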

From the build class of the host

Very much like the logic to pick access.cf, but in this case we build class/class.cf. The top-level platform recipe doesn't (presently) filter this file, but it could with some extra markup.

From a list of desired services

Here is where we get more power. Each host has an attribute named "SERVICE", which is a space-separated list of the services we need to support on the host. This attribute is sometimes set for whole groups of hosts, and sometimes set per-host in the configuration file.

In either case we extract the list of services from that macro with this m4 markup:

ifdef(`SERVICE',`SERVTOCF(translit(SERVICE,` ',`,'))')

The SERVTOCF function maps a list of services to the files under the service directory:

define(SERVTOCF,`ifelse($1,`',`',`service/$1.cf' `SERVTOCF(shift($@))')')dnl

For a list of services "apache tcpmux radius" that produces "service/apache.cf service/tcpmux.cf service/radius.cf". We use translit to turn the space-separated value into a comma-separated argument list we can shift through: the empty list produces nothing, and a list with at least one element produces that element as a path followed by the mapping of the rest of the elements. That makes m4 a lot like LISP.
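For readers more at home in sh than m4, here is the same mapping as a shell loop. It is only an equivalent sketch (the function name servtocf is hypothetical): word splitting on the unquoted SERVICE value plays the part of translit, and the loop plays the part of the recursion on shift($@).

```sh
# servtocf "LIST": map a space-separated SERVICE value to the rule files
# under service/, as one space-separated list of paths.
servtocf() {
	out=
	for s in $1; do		# unquoted: split on white space, like translit
		out="$out${out:+ }service/$s.cf"
	done
	printf '%s\n' "$out"
}
```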

With that mapping taken care of, we are good to go as long as every service we need has a file that describes its rules. If one doesn't, make will tell us it doesn't know how to build it, or install will choke on the missing file.
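Rather than wait for make or install to choke, you could check the SERVICE list up front. A minimal sketch, assuming the service/NAME.cf layout above; missing_services is a hypothetical name. It prints each service with no rule file and returns non-zero if any are missing.

```sh
# missing_services DIR "LIST": print each service in LIST that has no
# DIR/service/NAME.cf; return 1 if anything is missing, else 0.
missing_services() {
	rc=0
	for s in $2; do
		if [ ! -f "$1/service/$s.cf" ]; then
			printf '%s\n' "$s"
			rc=1
		fi
	done
	return $rc
}
```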

From run-time decisions made on the host

In the last section we installed the rules for a service with no evidence that the service was installed or running on the target host. Sometimes we need the rule to start the service, or worse, to install the service itself: that's why we must believe the attribute macro in the configuration files.

In this case we don't just believe the configuration file: we check for some condition at run-time. There is a whole HTML document on just that topic, so I won't repeat it all here.
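That document covers the real mechanism (guard lines processed by mk). As a rough illustration only, the kind of run-time evidence such a guard looks for can be tested in sh like this; guard_app and its arguments are hypothetical placeholders for your application's login and directory.

```sh
# guard_app LOGIN DIR: succeed only when LOGIN exists on this host
# and DIR is present, i.e. the application really was installed here.
guard_app() {
	id "$1" >/dev/null 2>&1 && [ -d "$2" ]
}
```

For example, guard_app apache /opt/apache would succeed only on hosts where both the login and the directory exist.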

Follow-up scripts to support the rules above

While op allows in-line scripts, it is not great form to embed lots of logic in the rule-base. It is much better to install a script with the right permissions in a standard place where we can keep track of them, audit them separately, and reuse them. See some notes on that.

In this version of the install script we install every support script on every machine, and we really don't need to do that. We should look for the full path to each script in the recently installed rule-base; if the path doesn't appear there, no rule can call the script, so we shouldn't install it. See the HTML notes about coding and installation of these scripts.

This would force some rules to include a comment with the path to any libexec script they call through a variable or via $PATH. Then again, they mustn't trust the Customer's path, and shouldn't be doing anything too clever in the rule-base. We'd also have to install any script called from another, which is why I didn't write a spell to try it. (I believe an oue instance to filter those installed and some clever sed would get it all in a few passes, even.)
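Here is a sketch of that idea with grep standing in for oue and sed. It collects every libexec script whose full path appears in the rule files (comments count), then rescans only the scripts just chosen until no new ones turn up, so a script called only from another script is kept. The function name needed_scripts is hypothetical, paths are assumed to contain no white space, and a path that is a prefix of another (helper vs. helper2) can give a false positive.

```sh
# needed_scripts LIBEXEC RULEFILE...: print the full path of each script
# in LIBEXEC reachable from the rule files, directly or via other scripts.
needed_scripts() {
	lib=$1
	shift
	[ $# -gt 0 ] || return 0	# no rule files: nothing is needed
	chosen=
	search="$*"
	while :; do
		added=
		for path in "$lib"/*; do
			[ -f "$path" ] || continue
			case " $chosen " in *" $path "*) continue ;; esac
			if grep -F -q -e "$path" $search 2>/dev/null; then
				chosen="$chosen $path"
				added="$added $path"
			fi
		done
		[ -n "$added" ] || break	# fixed point: no new scripts
		search=$added		# next pass scans only the new scripts
	done
	for path in $chosen; do printf '%s\n' "$path"; done
}
```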

This will be upgraded to a better structure in the next release.