To understand this document

This document assumes you are familiar with UNIX shell commands and have run some system-level utilities. It also assumes that you have access to a UNIX system as the superuser. The document also uses my "code.css" style sheet to denote the difference between markup terms, parameter designations, command-line options, environment variables, and a path /seen/in/the filesystem.

If you are looking for the "How to configure document", then you are looking for this HTML document.

Privilege escalation in general

UNIX ™ and Linux services use the least privilege required to perform each task, which makes the whole system more secure. Special groups (viz. "operator", "lp", "mail") give some applications access to protected resources (data, devices, directories), rather than running all local services as the superuser. Everyone takes great care to use secure network protocols (viz. ssh and https) for private data, and to avoid injection attacks or releasing private data to third parties.

Any tools that escalate privilege must also be capable of very fine-grained control, and be as secure as possible by default. Part of that control would include rejection of simple typographical errors and overt acts of subornation, and a clear audit trail. This document describes how to use op (with my modifications) to get exactly what you want, and nothing else.

As a first example allow anyone to change their own shell (under Solaris):

chsh	/bin/passwd -e $l ;
	users=^.*$
	uid=root

A slightly more complex example: allow any Customer in group "web" to restart Apache:

apachectl /usr/local/sbin/apachectl $1 ;
	groups=^web$
	$1=^(restart)$
	uid=root

With that in place anyone in group "web" may run:

op apachectl restart

to restart the running Apache instance. They can't pass any other verb (stop, configtest, etc.) unless it is added to the $1 regular expression list in that rule definition, or another rule is created.
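The effect of an anchored list like "$1=^(restart)$" can be tried outside of op. This is only a sketch: it assumes the REs behave like POSIX extended REs as grep -E reads them, and matches_any is a name I made up for the illustration, not part of op.

```shell
#!/bin/sh
# Hypothetical helper (not part of op): succeed when WORD matches any of
# the listed extended REs, the way a "$1=RE,RE" list accepts a parameter.
matches_any() {
	word=$1
	shift
	for re in "$@"; do
		if printf '%s\n' "$word" | grep -Eq -- "$re"; then
			return 0
		fi
	done
	return 1
}

# Only the verb named in the rule gets through.
for verb in restart stop; do
	if matches_any "$verb" '^(restart)$'; then
		echo "$verb: allowed"
	else
		echo "$verb: rejected"
	fi
done
```

Because the expression is anchored at both ends, a word such as "restart-now" is rejected as well.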

Tip: I almost always use group membership as the key to escalated access. It is easy to maintain: as my Customers change political groups I don't have to change the op configuration, because their login names never appear in the rules, just their groups. And group access is key to other features of UNIX -- so use it.

Op allows much more complex checking and control, some of which may be out-sourced to an arbitrary helper application. Before we dig into all that we need to explain the basis of design.

Think of op as a firewall. It keeps Bad Guys away from sensitive commands while allowing Good Customers access to make their tasks easier. This is what an IP firewall does, but op does it with shell commands rather than network resources. We follow the same paradigms as any firewall:

limit access to only those we expect
inspect and pass only valid payloads
log access attempts we do or don't allow

UNIX models

The general-purpose privilege escalation on a UNIX system comes in 2 flavors: the setuid bits, and proxy access through a system daemon.

Using setuid (or setgid)

Any program with setuid (setgid) on in its permissions bits runs with effective-uid (effective-gid) set to the uid (gid) of the file owner (see chmod(1)). This allows a mortal login to run an application with the privilege of the owner of the application. The application can "drop" back to the original login at any time (for example after opening a protected file).

In general, the program running the escalated program has no knowledge that the task is running with escalated privilege. It runs just like any other application. The only exception is that it may not respond to signals as a normal process would.
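The setuid and setgid bits are the 04000 and 02000 bits of the octal mode that chmod takes. As a small sketch (mode_flags is a made-up helper name), this decodes a mode string and reports which of the two bits it carries:

```shell
#!/bin/sh
# Hypothetical helper: report the setuid/setgid bits in an octal mode.
mode_flags() {
	mode=$(( 0$1 ))		# the leading 0 makes the shell read octal
	out=""
	if [ $(( mode & 04000 )) -ne 0 ]; then
		out="setuid"
	fi
	if [ $(( mode & 02000 )) -ne 0 ]; then
		out="${out:+$out,}setgid"
	fi
	echo "${out:-none}"
}

mode_flags 4755		# prints: setuid
mode_flags 6555		# prints: setuid,setgid
mode_flags 0755		# prints: none
```

On a live file the shell's own "test -u file" and "test -g file" answer the same two questions.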

Using a proxy

Other privilege escalation is often done by making a connection to a running service (with any of the interprocess communication facilities: sockets, FIFOs, shared memory, semaphores, message queues). Usually the service has a client application that formats the request for the service, makes the transaction, and returns the requested results to the shell. For example the line printer service allows connections via a FIFO or a TCP/IP socket to send files to a printer via lpr or lp.

In this case the client cannot start the process, so when it doesn't exist no service escalation is possible.

Some of the system daemons which accept these connections may appear to "start on demand" from inetd, tcpmux, sshd, launchd or the like. But these IPC connections are made to an existing end-point that was present shortly after the host booted, or the user logged in. There must be an existing process holding the phone for the incoming call to be answered. And such services are always started with escalated privilege to allow their access to private resources. The sendmail MTA is an example of this type of escalation, which runs at least as group "mail".

How `op` is used

Op may be used to start daemons with escalated privilege on-demand, or to run a program that is not normally setuid as if it were setuid. As long as a rule is allowed by local site policy, and can be phrased in English, op can be configured to follow that rule. Also, such a service may have almost any process attribute changed, for example the current working directory, umask, or nice value.

More to the point the escalation does not have to be to the superuser: most op rules run as a different mortal account, not as "root".

How `op` does it

For op we want to focus on the first tactic: op runs with a setuid bit and an owner of the superuser ("root"). But op is designed to "drop" to a particular login and group set specifically for each configured application.

For example, a program that needs to remove a mailbox from the local e-mail spool might only have to run with the "mail" group. Coding a whole new application just to run rm would be a waste of time: we can tell op (in part):

rm-mail	/bin/rm -f /var/mail/$1 ;
	uid=.
	gid=mail

That specification tells op to treat anyone running the command:

$ op rm-mail lark

as if the login running this command were in the group "mail" and ran:

$ rm -f /var/mail/lark

The advantages to this:

There is no special path to find "rm-mail": Either you fill "/usr/local/bin" with lots of adapter scripts, or you add new directories to each Customer's $PATH variables. By putting the adapter logic in the op rule, we obviated the need to add most scripts or extend $PATH.
Setuid shell scripts are a security issue: Don't ever make a setgid (or setuid) script: there is a race condition in the indirect execution of shell scripts via a symbolic link that allows Bad Guys to break them. Op provides a secure path to any script it runs, and doesn't need setgid (or setuid) bits on the script; it drops to the correct credentials before it executes the target application.
We can revision control and audit the op rule-base: So we remember who built "rm-mail", why she did it, and who needed it. If we do have to put some adapter logic into a script we use the op rule-base as an index to keep track of where they are, not the Customer's $PATH.
We keep the filesystem cleaner: Over the years we've found that little adapter scripts get lost, out-of-date, or otherwise mismanaged. Either they never get deleted, or get deleted while still (rarely) needed. With a well-known structure and policy to grant access and index them we have the elements we need to manage the ones we do need.

The disadvantages to the example presented:

Insecure in that we might be able to remove unexpected files: As coded (in the example) that would allow the removal of any file on the filesystem that the group "mail" could remove. We can tighten that up later -- it is just an example.
Think about giving that rule "/var/mail/../*/*": since we didn't forbid the "/../" string that might be able to remove some other files not under /var/mail.
Anyone can access the rule: We didn't limit the rule to a list of groups or users. We should always be explicit when we make a rule as to who we expect to access it.

Later we'll see how to remedy these issues, and how to code more complex rules.
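The "/../" problem noted above can be seen without op at all. This sketch uses a made-up helper, safe_mailbox, that accepts only a single path component; it is one possible check, not how op itself decides.

```shell
#!/bin/sh
# Hypothetical check (not op's code): accept a mailbox name only when it
# is a single path component, so it cannot climb out of /var/mail.
safe_mailbox() {
	case $1 in
	''|.|..|*/*)
		return 1 ;;
	esac
	return 0
}

for name in lark ../../etc/motd; do
	if safe_mailbox "$name"; then
		echo "ok:     /var/mail/$name"
	else
		echo "reject: /var/mail/$name"
	fi
done
```

The first name is accepted and the second is rejected, because any slash in the parameter lets the built path leave the spool directory.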

Mechanics of escalation: building rules

"Who can execute this mnemonic?" is the first question we need to answer for each rule. That question is answered in op's configuration file in three parts: parameter matching, basic authentication, and detailed authorization.

To describe these we are going to jump right into op's configuration file, because it is the best way to get you started. Read the page How to configure op to get started.

After you've got some rules installed you can poke at the three help options: -l, -r, and -w. Read about them in the manual page, and under -h. Note that they produce different output for different logins.

In fact op makes a lot of effort to skate on a fine line between telling the Bad Guy too much and telling the Good Customer too little. Sometimes a local admin might change op to be a little less verbose if there might be more Bad Guys about. In fact the "help" script may not be installed at your site, or -l might show an error message about such listings being "forbidden by site policy".

With more examples like the one above we will poke at the other features of op. This only works if you have access to a machine where you can edit access.cf (as root). If you don't have a host to do that on you can just read the manual page, as this tutorial won't help you very much.

See also the more technical review of the configuration file format.

Which rule should we select?

The configuration file may define more than one rule for a given mnemonic: but the first one that matches the input arguments is the only one that the Customer may access. If they do not have credentials to run that one op rejects the attempt.

Beyond the mnemonic name, additional attributes may be added to the definition of a mnemonic to specify expressions that must match the argument list to select the proper rule.

mnemonic: The first thing that has to match is the mnemonic name itself. That is a literal string match, because RE matches proved to produce unexpected results. By convention mnemonic names should be short strings with no shell-special characters in them.
$#=number: Force the count of the number of words allowed on the command-line to be exactly number. When two mnemonics have the same name, forcing a different number of allowed arguments disambiguates them.
$N
$N=REs: The named positional parameter (viz. $1, $2, $3, and so on) must match one of these REs, otherwise another mnemonic may be selected. The default RE is a single dot (.).
$*=REs: Every other positional parameter must match one of these REs.

Each of the positive matches with a leading $ has a negative version.

!#=number -- forbid some number of parameters
!N=REs -- forbid a string match to a single parameter
!*=REs -- forbid a string match to all parameters
!_.attr=REs -- forbid a string match to an attribute of the target program: In each case a match of any of the listed REs causes a match of the rule to fail. This is often used to prevent leading dashes in parameters, or the string "/../" (which might be used to climb out of a directory).
!N: This is another way to limit the number of parameters allowed, but $# is preferred, since "$#=3" is clearer than either "!4" or "!4=.". And the sanity checker doesn't check negative limits as well as it should.

Examples of parameter matching

My customers want to run rndc with several keyword options: start, stop, reload, status, reconfig, and querylog.

Most of these the native rndc will do, but both "start" and "restart" need to call the /etc/rc.d/named script:

rndc	/etc/rc.d/named $1 ;
	$1=^(start|restart)$
	...

rndc	/usr/sbin/rndc $* ;
	$1=^(stop|reload|status|reconfig|querylog|help)$
	...

In the example above I anchored the "stop,reload,..." list because that helps op build the correct usage message under -l. I also could build another rule with "freeze" and "thaw" if I needed one. By keying on $1 we make it look like the command namespace is not as flat as it really is.
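To see how keying on $1 splits one mnemonic across two targets, here is a sketch of first-match selection for the two rndc rules. The function select_rndc is made up for the illustration, and grep -E stands in for op's RE matching, which is an assumption.

```shell
#!/bin/sh
# Sketch only: try the two rndc rules in file order and report the
# command the first matching rule would build.
select_rndc() {
	if printf '%s\n' "$1" | grep -Eq '^(start|restart)$'; then
		echo "/etc/rc.d/named $1"
	elif printf '%s\n' "$1" | grep -Eq '^(stop|reload|status|reconfig|querylog|help)$'; then
		echo "/usr/sbin/rndc $*"
	else
		echo "no rule matches: $1"
		return 1
	fi
}

for verb in restart status freeze; do
	select_rndc "$verb" || true
done
```

The Customer types "op rndc restart" or "op rndc status" and never needs to know that two different programs sit behind the one mnemonic.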

It is worth mentioning here that op will process more complex REs, but will not show the best usage message (under -l). For example, $1=^(res|s)tart$ and $1=^(re)?start$ both match two fixed words, but the help builder doesn't grok those. This fact may be used to hide the allowed words, but that would be poor form.

The help code does lists as well as disjunctions (e.g. $1=^restart$,^start$). There is code already available in op's engine to expand REs better, but this produces usage messages of nearly unlimited options in some common cases, so we don't do that. (E.g. $1=^[a-z][a-z][0-9][0-9]$ would output a line with about 67,600 alternations.)
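The figure for that last RE is easy to check: two letter positions and two digit positions multiply out as follows.

```shell
#!/bin/sh
# Count the words matched by ^[a-z][a-z][0-9][0-9]$
letters=26
digits=10
count=$(( letters * letters * digits * digits ))
echo "$count alternatives"	# prints: 67600 alternatives
```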

Why we match positional parameters

In older versions of op we would have named the rules as:

rndc-start	/etc/rc.d/named start ;
	...
rndc-restart	/etc/rc.d/named restart ;
	...
rndc-stop	/usr/sbin/rndc stop ;
	...

Which really wastes space in the configuration file and causes the Customers to wonder if it was a hyphen or an under-bar they need, and in which order the two words go ("start_rndc" sounds more English). It is almost as bad as the little adapter scripts. The better solution is to match on $1 and the like. And to declare in policy that we always put the "facility" or "application" keyword first in related escalation rules.

In this example I need to pass a helper script some values:

apachectl /opt/web/bin/apachectl $1 $2 ;
	$1=^unsecure$,^secure$
	$2=^(start|stop|restart|graceful|configtest)$

This uses a little different matching tactic for $1: a list of REs that each match a single word. Op knows how to output either under -l:

op apachectl unsecure|secure start|stop|restart|graceful|configtest
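A crude way to picture how such a usage line could be derived from the two RE lists is to strip the anchors and grouping and join list members with a bar. This is my own approximation using sed (usage_words is a made-up name), not op's help builder, and it would mangle anything more complex than fixed words.

```shell
#!/bin/sh
# Hypothetical approximation: reduce '^a$,^b$' or '^(a|b)$' to 'a|b'.
usage_words() {
	printf '%s\n' "$1" | sed -e 's/,/|/g' -e 's/[$^()]//g'
}

one=$(usage_words '^unsecure$,^secure$')
two=$(usage_words '^(start|stop|restart|graceful|configtest)$')
echo "op apachectl $one $two"
```

Both spellings of a word list collapse to the same bar-separated form, which is why either may be used in a rule.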

Who can run an escalated rule?

Now that we've filtered down to a single rule we need to check to see if the Customer is allowed access to it. Op controls access to each mnemonic based on seven attributes. A client must match at least one of this group of three:

groups=REs: Allow execution only for clients that have a group matching one of these REs. If any RE is prefixed with an octothorp (hash, "#") then the match is against the numeric gid, not the group name.
users=REs: Allow execution only for clients that have a login matching one of these REs. If any RE is prefixed with an octothorp (hash, "#") then the match is against the numeric uid, not the login name.
netgroups=words: Allow execution only for clients that are a member of one of the listed netgroups. See innetgr(3).

If they don't pass any of those they get a nice error message and op exits with a non-zero status (usually 1).

As I said above: use groups in preference to users to make your life easier. I also prefer netgroups to users, but I don't use them much as group membership works almost every time. I do use "users=.*" to mean "anyone", which op even outputs under -w.
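The name-versus-number choice made by the leading "#" can be sketched like this. The helper group_allowed and the "name:gid" pair format are made up for the illustration, and grep -E again stands in for op's matcher.

```shell
#!/bin/sh
# Hypothetical helper: does any "name:gid" pair satisfy the RE?
# A leading '#' on the RE selects the numeric gid instead of the name.
group_allowed() {
	re=$1
	shift
	for pair in "$@"; do
		case $re in
		'#'*)	subject=${pair#*:}; pattern=${re#?} ;;
		*)	subject=${pair%%:*}; pattern=$re ;;
		esac
		if printf '%s\n' "$subject" | grep -Eq -- "$pattern"; then
			return 0
		fi
	done
	return 1
}

group_allowed '^web$' staff:20 web:80 && echo "matched by name"
group_allowed '#^0$' staff:20 wheel:0 && echo "matched by gid"
group_allowed '#^0$' staff:20 web:80 || echo "no match"
```

The anchors matter here too: without them "#0" would also match gid 20 or 80.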

If any of the above match, then these four optional attributes may check deeper:

password

Ask for the user's password to credential the execution. Before being asked they must have been allowed by one of the first three attributes. If this is set as a DEFAULT (see below) then it cannot be reset per-rule.

password=logins

As above, but check against the password of each of the specified logins. If a PAM authentication fails a password specification may still allow the escalation. When a list of logins is configured and matched, four additional specifications are allowed, besides a literal name:

.
%l: The client's password.
%u: The password for the login specified under -u (see below).
%f: The password for the owner of the file specified under -f (see below).
%d: The password for the owner of the directory containing the file specified under -f (see below).

pam

Unlike some other attributes, the empty value turns off PAM authentication. This allows a rule with a common set of DEFAULT attributes to skip PAM authentication.

pam=.

Dot is taken as the default application, which is listed in the version output under -V, usually "op".

pam=application

The specified PAM application must authenticate the requesting user before any escalation is allowed. The requesting user and the remote user are both set to the requesting login, the remote host is "localhost".

Commonly specified applications: "su", "login" or "system". Using "sudo" would tie op and sudo to the same policy, which could be clever.

This option fulfills (skips) any password check when satisfied. Other specifications:

helmet=path

The program specified by path is run setuid/setgid; if it exits zero the access is allowed. Such a program is only consulted if one of the first three rules above allowed the access, and any password specification was met.

jacket=path

The program specified by path is run setuid/setgid to monitor the progress (and completion) of the new process. Such a program is only executed after all other authorization checks, including any helmet provided.

This is not really intended to deny access, but it can and should in some cases, see "jacket", below.

Helmets and jackets have other uses, see "Helmet and jacket programs", below. But for now just take it on faith that these provide some additional checks that you might want someday, and move ahead.

Examples for matching for access

To explicitly match all customers (to prevent the sanity check from complaining):

users=^.*$

To match members of group 0 as a client, by gid:

groups=#^0$

To allow everyone in group staff that knows the "operator" password:

groups=^staff$
password=operator

To allow any member of group "wheel" that can also su:

groups=^wheel$
pam=su

To allow the owner of a workstation access to install a set of mnemonics put them in a netgroup named "owner" and set:

netgroups=owner

This is how we specify host-based access in op's configuration. There is no other way to directly limit the scope of a mnemonic to a given host: that is a job for msrc, hxmd, a helmet, or a netgroup.

This is broken, as the netgroups code cannot get a list of netgroups to match REs against:

netgroups=.*

Always list netgroups explicitly without RE markup:

netgroups=localadmin,netadmin,operator

This allows anyone to try the rule, but the helmet rejects clients without LDAP authentication:

users=.*
helmet=/usr/local/libexec/jacket/ldapcred
$LDAP_CRED=$l:$r
$LDAP_LEVEL=admin

The "ldapcred" helmet exits 0 for success and may remove the two parameter environment variables via the API below. Note that LDAP_CRED and LDAP_LEVEL are arbitrary names I picked, while $l and $r are op markup described below.

Required options may be added

In addition to those access limits above, a rule may require any combination of four op command-line options (specified before the mnemonic):

-f file: The specified file is matched against (possibly many) attributes before the name of the file or an open file descriptor on that file is passed to the escalated command. A failure to match any attribute denies access to the mnemonic.
-g group: The specified group is matched against several attribute checks before allowing access to the mnemonic.
-u login: The specified login is matched against several attribute checks before allowing access to the mnemonic.
-m mac: If Mandatory Access Control support is compiled in, then this option may specify (part of or all of) a process label for the escalated process.

These options are mandatory whenever each of them is called upon for a value. If the option's value is never required then specifying the option denies access with an error message, for example:

$ op -f /dev/null help
op: Command line -f /dev/null not allowed

There are two ways op calls for a value from the command-line: by a reference in the context of an attribute as a percent macro (%f), or as a parameter specification when building the actual command as a dollar expansion ($f). In the configuration file a third form (!f) represents a negation or a disallowed value for the option.

The first type allows op options in the rule definition to reference any %f, %g, or %u as the target value for the option. The second expands the appropriate value from the command-line option of the same letter when building a command (or environment variable). For example, using %f when a login name is expected substitutes the owner of the file. Using $f in the args section of the rule definition substitutes the path as part of the executed command. We don't use the percent form there because we don't want to make percents special in that context, and we don't use dollar in the other place as that is a legitimate value for some options (for example part of an RE).

In the password description above we've already seen that in that context %u expands to the user's login.

Permission configuration for option values

These attributes specify who may access the mnemonic. Once the selection of a mnemonic is made these checks may reject the access (on the mnemonic), which logs a failed escalation attempt.

%g=REs: The command-line -g's group must match one of these REs.
!g=REs: The group specified on the command-line must not match any of the listed REs.
%u=REs: The login specified on the command-line must match one of the listed REs.
!u=REs: The login specified on the command-line must not match any of the listed REs.
%u@g=REs: The login specified on the command-line must be a member of a group that matches one of the listed REs.
!u@g=REs: The login specified on the command-line must not be a member of any group that matches one of the listed REs.
%f.attr=REs: The file specified on the command-line has its stat(2) attribute checked against the listed REs, one of which must match.
!f.attr=REs: Same as above, but none of the REs are allowed to match the attribute.
%d.attr=REs
!d.attr=REs: Same as above, but compare to the directory containing file.
%_.attr=REs
!_.attr=REs: Same as above, but compare to the proposed target executable. For example we could limit the owner of the file to a particular login.

For the case of %f the attr must come from this list, most of which are taken from struct stat members with the leading "st_" removed.

dev

The file's device number in decimal.

ino

The file's inode number in decimal.

nlink

The file's link count in decimal.

atime

The file's access time in decimal.

mtime

The file's modification time in decimal.

ctime

The file's change time in decimal.

btime or birthtime

The file's birth time in decimal (not available on platforms other than FreeBSD).

size

The file's size in decimal bytes.

blksize

The file's block size in decimal.

blocks

The file's size in 512 byte blocks.

uid

The file's owner as a decimal uid.

login

The file's owner converted to a login name.

gid

The file's group as a decimal gid.

group

The file's group, converted to a group name.

login@g

The file's owner is treated as %u@g: the owner must be a member of a group matching one of the given REs for the file to pass. Inverted under !f, of course.

mode

The file's mode as a four-digit octal number.

users

The owner and group of the file are compared to the RE list as owner:group. The owner must map to a login name, the gid must map to a group name.

perms

The file's permissions as ls might display it.

path

The file's absolute path.

access

A four character string representing the return values from four calls to access(2) against the file: "rwxf" would indicate all access, while "----" would indicate no access at all.

type

The file's type letter as ls would display it in the first column of the symbolic permissions.

When the file is a symbolic link, then (after the 'l') the type of the file the symbolic link points to will be added.

If the (possibly indirect) file is a directory and is an active mount-point, then the letter 'm' is suffixed. If the directory is empty the letter 'e' will be suffixed. So a match for de matches an empty directory that is not a mount point, while dme matches an empty mounted filesystem -- either of which might be referenced via a symbolic link, unless the expression is left-anchored.
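To get a feel for the access attribute described above, the four-letter string can be approximated from the shell. This is a sketch under an assumption: the shell's test operators may consult effective rather than real ids, so the result can differ from true access(2) calls inside a setuid program. The name access_string is made up.

```shell
#!/bin/sh
# Hypothetical approximation of the "access" attribute using test(1):
# one letter each for read, write, execute, and existence.
access_string() {
	r=-; w=-; x=-; f=-
	if [ -r "$1" ]; then r=r; fi
	if [ -w "$1" ]; then w=w; fi
	if [ -x "$1" ]; then x=x; fi
	if [ -e "$1" ]; then f=f; fi
	echo "$r$w$x$f"
}

tmp=$(mktemp)
chmod 600 "$tmp"
access_string "$tmp"		# prints: rw-f
access_string "$tmp.missing"	# prints: ----
rm -f "$tmp"
```

A rule could then use an RE such as ^r..f$ against that string to demand a readable, existing file.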

Examples of argument specifications

To fix the first example (rm-mail) we can make sure the name of the mailbox is a valid login name:

rm-mail	/bin/rm -f /var/mail/$u ;
	uid=.
	gid=mail
	%u=^.*$

To allow any file under /tmp, not owned by login sshd:

	%f.path=^/tmp/.*$
	!f.login=^sshd$

To allow anyone in group source to specify another member of group source (even themselves):

	groups=^source$
	%u@g=^source$

To allow any group that contains "web" in the name we could use an unanchored RE, but then sanity will carp at us; better to be more explicit:

	%g=^.*web.*$

And lastly the ever popular "anyone but the superuser":

	!u=#^0$

What can `op` change about a process?

There are about 20 attributes of a process one might escalate or remove to make escalation safer. That is to say any tool like op should be able to change any of these attributes in a predictable way to assure that the new privileged command is as safe as it can be. Under op most of those attributes may be forced to specific values.

By default op modifies the environment for a mnemonic command by changing the effective uid to 0 (the superuser), and removing any supplementary groups, then cleaning the environment. This default is modified by putting attribute settings on the mnemonic "DEFAULT" (which is never used as a Customer driven mnemonic).

For a specific rule, any default should be replaced with an explicit value that gives the minimal privilege required to meet the intent of the rule. If your policy has a lot of rules that need the same privilege (uid, gid) you might look into using compile option SENTINEL to enable superuser managed sentinel configurations. These are directories under the top-level configuration directory that are named for and owned-by a group, which contain a stand-alone configuration for the owner and group of the directory. If you can't do that you might compile separate sentinel copies of the op binary to assure least privilege rules, but do that only if you must.

Letting the system administrator link out-sourced sentinel configurations to the common rule-base actually has 2 benefits. It allows a site policy that audits all setuid programs strictly, otherwise mortal users just install an insecure perl script mode 6555 to meet their needs. It also gives the administrator visibility (via the symbolic link) to all the policy directories to help auditors (and themselves) find them. If your site policy doesn't mandate audit of all setuid executables you should think about why you have a site policy. Or, more to the point, why you don't have a site policy.

Below we list the process attributes that op might change, and some idea of why that is something op might do.

$VAR

Pass the given environment variable as-is: don't remove it from the original environment. This might be used to pass $TERM for example.

$VAR=value

Set the given environment variable to the exact value. This might be used to set a $PATH, or $TZ.

Because op's configuration parser breaks words at white-space, values with embedded white-space use a special markup ($\s) to code a literal space. Later we'll see that this can also be done with a helmet, or jacket.

basename=word

Force a different argv[0] for the new process. Some programs (like sbp) look at the name of the program to force command-line options. Also most shells look for a leading dash (`-') in the name to start a login shell. Not often used.

chroot=directory

Change root for the process. Used to start network programs that use a restricted environment. On some systems this is way harder to set up than others.

daemon

Double-fork the process into the background, redirect I/O to /dev/null. Used to start daemon processes: these also don't stay connected to the controlling terminal device, see setsid(2).

dir=directory

Change directory here first. Has the obvious use. Usually changes to the root directory, or a directory that the Customer normally could not access.

environment

Allow all existing environment variables to pass. This is only used when the effective uid and gid are left as they were; then we can pass the environment as-is because no possibly compromising escalation was done.

environment=REs

Allow any existing environment variables that match any of the listed REs through to the new process. This is often used to allow access to one of ksb's wrapper applications.

gid=list

When there is exactly 1 group name in the comma separated list, this forces the real gid to that group. For example, the real group identifier might be used by the escalated process to restore the user's original group (via %l).

Otherwise this may take the place of an initgroups to set an explicit group list (one which might not be possible for any existing login). In addition to an explicit group name, gid (by decimal number), these markups assume the related group: %g, %u (primary login group), %f (group owner), %d (group owner), %l (client's current real group), or . (use the invoker's gid).

When the rule has both a gid list and an initgroups list the results should be the unique elements from both the list and the groups from the specified login, but that is limited by setgroups's limit of NGROUPS_MAX+1 available slots.

egid=word

The effective group may be different from the real only if these are both set. Takes the same specification as gid. This is also always the first group in the group list, because that's the convention on a lot of UNIX systems.

uid=word

The real user identifier is forced to the specified value. The word may be a login, uid, %u, %f, %d or %l (use the invoker's real uid). The empty string is an alias for the invoker's real uid. The default uid is the effective uid given by the setuid bit on the op binary, usually the superuser. Since that might give away too much privilege the sanity check asks that you specify an explicit uid for each command. You can suppress that check by setting one for the DEFAULT stanza. A great value for that is "nobody", in my humble opinion.

euid=word

The effective user identifier is forced to the given value. The default is the value of uid.

initgroups

When no word is provided op calls initgroups(3) on the same login as the uid set (either effective or real).

initgroups=word

Force an initgroups call on a specific login with this specification: any valid login name, %u, %f, or %d for the related login, or %l (aka .) for the current list.

fib=number

Use the setfib(2) system call to set the routing table for the new process. This is only available on FreeBSD systems, see -H output for availability.

mac=markup

If the operating system supports Mandatory Access Control process labels, then the markup string is expanded like a parameter or environment variable, then applied as the new process label for the escalated process. The command-line option -m mac is a conduit to allow (parts of) the label to be specified at run-time (see $m).

nice=number

This allows tasks to run with greater priority than the default. The nice value ranges from -20 to 20 on most UNIX systems.

stdin=redir

stderr=redir

stdout=redir

Force the input (output, error) channel of the new process to redir. Redir may be prefixed with the standard shell input/output redirection markups (<, <>, >, >>) to modify the open(2) flags. The file may be specified as %f, or a path to an existing file. Op won't create a file with such redirection.

session

Turn off any PAM session default.

session=login

Set up a PAM session for the given login. The application requesting the session is always the default one listed under -V, usually "op". The client user is the requesting session, the remote host is "localhost".

The login may also be specified in one of the common ways to get a login: by login name, %l, %u, %f, or %d.

Or the initgroups login may be specified as %i. When no initgroups is set, the value is either of uid or euid in that order.

Note that %i is only available under session and cleanup, since it usually makes sense to provide a session for the same login as the initgroups.

cleanup

Turn off any cleanup default.

cleanup=login

Taking the same specification as session, fork(2) a process to call pam_close_session(3) after the escalated process exits. The special dot (.) specification is interpreted as a request to exactly copy the session specification (even if empty).

This specification is normally not required unless the session started a co-process (for example an instance of ssh-agent in the case of pam_ssh).

umask=octal

Set the process's umask (default 022). Mostly used to ensure that the client doesn't create an insecure file with escalated privileges.
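
As a worked example of what a umask does (standard UNIX arithmetic, not op's own code): the bits set in the mask are cleared from whatever mode the program asks for when it creates a file. The function name below is made up for the illustration.

```python
def mode_after_umask(requested: int, umask: int) -> int:
    """Return the permission bits a newly created file really gets."""
    # Every bit set in the umask is removed from the requested mode.
    return requested & ~umask & 0o777


for mask in (0o022, 0o026):
    mode = mode_after_umask(0o666, mask)
    print(f"umask {mask:03o}: requested 0666 -> created {mode:04o}")
```

So a rule with umask=026 leaves a file created with mode 0666 readable by the group and closed to everyone else.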

Examples of limits

To start the real-time Large Hadron Collider process with elevated scheduler priority:

cruncher /opt/atomic/bin/smasher $* ;
	groups=^lhc$,^eotw$,^admin$ ...
	uid=mighty gid=mouse
	stderr=>>/var/atomic/errors
	nice=-4  umask=0026
	$PATH=/opt/atomic/bin:${PATH}

To allow users in group "operator" to cat any single plain file on the filesystem:

cat	/bin/cat ;
	groups=^operator$
	uid=. gid=.
	%f.type=^-$
	stdin=<%f

The only part of the escalation that runs as the superuser is the open of the file. The cat process runs as the mortal that ran op. That is so cool.

Seeing the impact of each one

Most of a process's attributes may be displayed by running the showme.sh script. I often use this script to test op's environment logic in new rules I crafted. For more advanced checks you might need a perl or C program to produce special output.

Building the command to run

If the first word after the mnemonic is a command path, then the args after that are all positional parameters to that utility. For example the target program is rsync for this rule:

snap	rsync -arSH $* ;
	dir=...

When the first word after the mnemonic is a lone open curly brace ({ followed by white-space) then the lexical part of the configuration file parser builds an in-line script out of all the characters until it finds a close curly brace (}) as the first non-white-space character on a line. The script is effectively replaced with 3 tokens ($S -c $s). The args after that are positional parameters to that script. (Recall that most shells treat the first parameter after the -c specification as $0 in the script, not $1.)

snip	{ TEMP=`mktemp /tmp/some$$XXXXXX`
	...
	} $- ;
	users=...
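
The $0 detail is easy to check outside of op. The sketch below is only a model of the "$S -c $s" replacement: it assumes /bin/sh stands in for $S, and the helper name run_inline is invented for the illustration.

```python
import subprocess


def run_inline(script: str, *params: str) -> str:
    """Run script as "/bin/sh -c script params..." and return its stdout."""
    done = subprocess.run(
        ["/bin/sh", "-c", script, *params],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


# The first word after the script lands in $0, so $1 is the second word
# and $# does not count the $0 slot.
print(run_inline('echo "0=$0 1=$1 count=$#"', "snip", "first", "second"), end="")
```

This is why rules with an in-line script often pad the parameter list with a leading word (such as $0) before the parameters the script really uses.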

If the first word is MAGIC_SHELL then something totally different happens. The MAGIC_SHELL token is discarded. If that leaves no args then a default argument list will be constructed later. Otherwise the argument list is expanded as given, but the meaning of $* and $@ changes:

shop	MAGIC_SHELL ;
	uid=...

The "echo" command is an exception to the first rule. Op, like the shell, has a built-in echo command. This avoids a complaint from the sanity checker about the disposition of $PATH for every escalation that just updates a flag file:

puma	echo $1 ;
	$1=^stop$,^go$,^debug$
	stdout=>/var/run/puma ...

Expander markup for building commands

The arguments to the new process are expanded from the list of words after the mnemonic and before the delimiting semicolon (";") or ampersand ("&"). These words are expanded via a shell-like substitution. The dollar-sign ("$") is the only special character. No backslashes, no quotes for white-space. This is an attempt to make it clear to an auditor what the expansion will output, while still allowing useful replacement operations.

Expanded from the rule definition

$0: The mnemonic specified on the command-line.
$_: The path to the program we are going to execute. This cannot be used to create itself, of course.
$s: The in-line script provided in place of a command path, without the delimiting curly braces. This is an exception to the rule about case (below), this is just the best letter to represent the script text, and it is often used with $S.
$w: The name of the configuration file that defined the access rule.
$W: The line number in $w that started the rule stanza.

Expanded from the UNIX credentials

In general each lower-case letter is a string, while the upper-case version is a number. For example $l is a login name with $L as that login's uid.

These are expanded from the credentials that the UNIX process holds (uids real and effective and the like), and the provided environment, and the command-line:

$a: The group list of the client process. For example "staff,wiz" when the process was in two groups. Which makes $A a list of the gids.
$i: The target login for any initgroups(3) call from the rule. Which makes $I the uid of that login.
$h: The home directory of the client login.
$H: The home directory of the target login. This doesn't follow the case convention, but the numeric rule doesn't really apply to a home directory.
$k: The shell listed in the password file for the client login.
$K: The shell listed in the password file for the target login.
$l: The client login name. Which makes $L their uid.
$n: The new group list given to setgroups(2), which makes $N a gid list.
$o: The target real group, which makes $O the group's gid.
$r: The client's real group, which makes $R that group's gid.
$t: The target login, which makes $T be the target uid.
${ENV}: The value of the environment variable ENV as it was in the original environment. Note that it is unlikely that ${10} will find an environment variable named 10 in the process environment, as the shell doesn't allow assignments to variables named with all digits. The tenth command line parameter is spelled $10 when op needs to reference it.
$S: The value of $SHELL, if allowed from the original environment, or /bin/sh when no value for SHELL is present (or allowed). This is used largely for MAGIC_SHELL and in-line script support.

Expansion based on the command-line presented

There are several expansions used to construct the utility command and parameters executed by op:

$1, $2, ...: Each positional parameter after the mnemonic is available by the same name a shell script would use. Mentioning the name as a word, or substring of a word expands the actual value in place of the markup.
$N: The N-th command-line parameter (like the shell) $1, $2, $3 and so on.
$*: The arguments specified on the command line that were matched by the $* attribute below. $* squeezes out empty parameter words.
$@: The same words as $* (above), but the original word-breaks are honored.
$#: The number of words presented on the command-line. This doesn't always match the number of words in $@ as some of them might have matched fixed $N's.
$+, $-: Much like $*, $+ expands to all the words given on the op command-line, pushed into a single word. As a parallel to $@, $- expands to those same words, with the original word-breaks preserved.
$f: The absolute path to the file given on the command-line.
$F: Substitutes an open file descriptor (small integer in decimal) open to the file specified on the command-line. This descriptor will be read/write if possible, else read-only. It is safer to ask for read or write with the stdin, or stdout attributes below when possible, viz. stdin=%f.
$g: The group name provided to -g, or the part of the login specification after the colon. Any mention of this in a rule forces -g on the command-line (unless a -u specified as login:group is included).
$G: The gid of the group name provided to -g (also honors the group name after the colon under -u).
$u: The login provided to -u. Mention of this anywhere in the parameter list forces the need for a -u on the command-line.
$U: The uid of the login provided to -u (which also forces the need for that option).
$d: The directory part of the -f's file value. Substitutes the absolute path to the directory containing the file specified on the command-line.
$D: Substitutes an open read-only file descriptor (small integer in decimal) on the directory part of file. Caution: this can break the chroot attribute (below).
$m: The value specified under -m. This is usually a complete process label, or the part after the colon.
$M: The value of the current process label. While not actually a command-line specification, it is related to $m (above).
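
The difference between the joined forms ($*, $+) and the word-preserving forms ($@, $-) can be modeled in a few lines. This is an illustration of the descriptions above, not op's code: the function names are invented, and the model ignores which words were claimed by fixed $N positions.

```python
def expand_joined(words):
    """Model of $* and $+: one word, single spaces, empty words squeezed out."""
    return [" ".join(w for w in words if w)]


def expand_separate(words):
    """Model of $@ and $-: the same words with their original word-breaks."""
    # Assumption: nothing is dropped here; the text only promises that
    # the word-breaks are honored.
    return list(words)


argv = ["My Report.txt", "-v"]
print(expand_joined(argv))    # the target program sees a single argument
print(expand_separate(argv))  # the target program sees two arguments
```

The joined form is what a shell's -c option wants; the separate form is what almost every other program wants.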

Fixed strings

These are fixed strings that are useful markup, largely to overcome limitations in op's configuration parser.

$|

The empty string. This is useful to deny a semicolon its special meaning, see the examples below.

$\c

Allow any of the backslash escapes tr(1) allows for special characters:

a alert character
b backspace
f form-feed
n newline
r carriage return
t tab
v vertical tab: As in tr(1).
o an m4 open quote (`)
q an m4 close quote ('): These are provided for sites that markup the rule-base with m4 and don't know how to use changequote effectively.
s a space (not from tr): This was added to allow spaces in command arguments. Actually it is more often in environment variable values.

$$

A literal dollar sign.

$|

The empty string. This allows "$1" to be followed directly by digit "3", as "$1$|3".

Examples of expander markup

There is a handy alphabetical list of expanders in the reference document.

To remember the original login name in $ORIG_LOGIN:

rule	command ... ;
	$ORIG_LOGIN=$l

To get a rule to echo a semicolon use:

echo	echo $|;$| ;
	uid=. gid=. users=.*

To get a parameter with an embedded space:

date	/bin/date +%a$\s%b$\s%e ;
	uid=. gid=. groups=^tiger$,^operator$

Or you could use an in-line script, though I don't use them often. Here is an example where I might, because I need both some spaces and I/O redirection:

date	{
		/bin/date +"%a %b %e"
	} ;
	uid=. gid=. groups=^tiger$,^operator$
	stdout=/opt/tiger/config/shutdown

Note that to do the redirection in the script you have to run as the tiger application login, which is less secure.

date	{
		/bin/date +"%a %b %e" >/opt/tiger/config/shutdown
	} ;
	uid=tiger gid=. groups=^tiger$,^operator$

A different use of the in-line script is a dynamic DNS update with nsupdate. Since we need to keep the authentication key secret, and the op rule-base is already protected, we can stash the whole update script here.

dnsSet	{
		# key from stdin
		( cat -; cat - <<-! ) | nsupdate -v
		server 10.10.10.254
		zone dynamic.example.com.
		update delete sulaco.dynamic.example.com. A
		update add sulaco.dynamic.example.com. 86400 A ${1}
		# show
		send
		!
	} $0 $1 ;
	$1=^[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*$
	uid=. gid=. users=^root$,#^0$ netgroups=owner
	stdin=</some/path/to/key.cmd

This rule is usually only run by dhclient from /etc/dhclient-exit-hooks as the superuser, or as the owner of each workstation.
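
The $1 pattern in that rule checks the shape of the word, not that it is a valid address. The sketch below tries the same expression against a few words; it assumes Python's re module treats this simple pattern the same way op's regular expression library does.

```python
import re

# Copied from the $1 specification in the dnsSet rule.
RULE_RE = re.compile(r"^[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*$")


def accepted(word: str) -> bool:
    """True when the word would pass the $1 pattern."""
    return RULE_RE.search(word) is not None


for word in ("10.10.10.7", "999.1.1.1", "...", "10.10.10", "10.10.10.7;id"):
    print(f"{word!r}: {accepted(word)}")
```

Only digits and exactly three dots get through, so shell meta-characters are rejected, but an out-of-range or empty octet is not. In this rule that is left to the program that consumes the address.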

To get the rule above to the target host without giving away the crypto-key is a whole topic all by itself. I'll just say here that msrc can merge the dynamic name of the host and the appropriate key from a file that mortals cannot normally read, and op works for that step as well. The key.cmd file would have a command to set the cryptographic key, e.g.:

key dynamicKey aabbCCddEE00FF==

Two more notes: padding and help output

First, the padding with $0 before the IP address is there because the One True Shell treats the first positional parameter after a -c option as $0, and it is more confusing to most programmers to do the inverse (viz. count from 0 in the in-line script). Second, the help output under -l gives the usage:

op dnsSet IP

which looks like magic, because we didn't tell it the first parameter was an IP address. This is because op recognizes some REs for what a human would call them. See a paragraph about that.

To set some of the same environment variables sudo might:

rule	...
	environment=^(COLORS|DISPLAY|HOSTNAME|KRB5CCNAME|LS_COLORS|MAIL|PATH|PS1|PS2|TZ|XAUTHORITY|XAUTHORIZATION)$
	$SUDO_COMMAND=$+
	$SUDO_USER=$l
	$SUDO_UID=$L
	$SUDO_GID=$G
	$LOGNAME=$t $USER=$t $USERNAME=$t
	$HOME=$H
	$TERM=unknown

Note that SUDO_COMMAND will not be exactly the same, but it is close to what you really want to know (if there are forced parameter placements in the command those are not included in $+). If I really wanted to emulate sudo I think I would code a helmet to do it: I believe the rules structure in sudo is far too complex to really audit.

To set some of the same environment variables super might:

rule	...
	$ORIG_USER=$l $ORIG_LOGNAME=$l $ORIG_HOME=$h
	$USER=$t $LOGNAME=$t $HOME=$H
	$IFS=$\s$\t$\n
	$PATH=/bin:/usr/bin
	$SUPERCMD=$0

I'm not really sure about HOME, since the wording in the manual page is unclear (to me).

Helmet and jacket programs

Other privilege escalation tools try to think of every aspect one might wish to use to limit access, or to log about an access: op has a simple, but complete API to allow the administrator to plug in any check they can code in a program.

The basic protection is provided by a helmet (that is a play on the idea that it protects your head). A helmet is a program that is called to approve the access just before op is ready to run the program. It receives a long list of options and parameters that explain the context that op is in, and op expects the program to exit 0 only if the access should be granted.

The helmet may add or delete environment variables from the target process's environment with an overly simple protocol. And it may recommend a failure exit-code back to op. Usually a helmet is passed any additional specifications via environment variables set in the rule's configuration. These are removed by the helmet if they should not be leaked to others (although they may appear in the process table for a very short time as the helmet executes).

For example below I made up a rule that called the helmet "time-box" to check a time-box of 22:00 to 06:00 for the start of the "backup" mnemonic:

backup	/usr/local/libexec/doBackups ... ;
	groups=^operator$
	helmet=/usr/local/libexec/helmet/time-box
	$TIME_BOX=2200-2400,0000-0600
	$TZ=...

That helmet should remove $TIME_BOX from the environment for two reasons: (1) other programs might use it in conjunction for some unrelated filter when set, and we don't want to trigger that by accident, and (2) we don't need to radiate information about the allowed access. Also note that we need to set $TZ if we are going to look at the clock, right?

There is an example perl script jacket.pl in the source to op, or you may view an HTML version of the same code.

Checking to see if the current hour and minute is in a range of integer values is left as an exercise to the reader. But if we had that check we could use the jacket.pl code above and add to our "CHECKS AND REPARATIONS":

	use POSIX qw(strftime);
	my($hhmm);
	$hhmm = strftime "%H%M", localtime;	# current local time as HHMM
	print "-TIME_BOX\n";
	if (your code here) {
		print "-TIME_BOX\n";
		exit 0;
	}
	print STDERR "It is not your time\n69\n";
	exit 69;
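
For readers who want a starting point for that exercise, here is one way the range check might look. It is a sketch in Python rather than the Perl of jacket.pl, the function names are invented, and it assumes each range in $TIME_BOX includes its start time and excludes its end time.

```python
def parse_time_box(spec: str):
    """Turn "2200-2400,0000-0600" into [(2200, 2400), (0, 600)]."""
    ranges = []
    for part in spec.split(","):
        start, end = part.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def in_time_box(hhmm: int, spec: str) -> bool:
    """True when hhmm (e.g. 2315 for 23:15) falls inside any listed range."""
    # Assumption: start is inclusive, end is exclusive.
    return any(start <= hhmm < end for start, end in parse_time_box(spec))


box = "2200-2400,0000-0600"
for clock in (2315, 30, 600, 1200):
    print(f"{clock:04d}: {in_time_box(clock, box)}")
```

A real helmet would compute hhmm from the clock (in the zone given by $TZ) and exit 0 only when the check passes.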

There is a more complex timebox implementation in the libexec/jackets package. It allows both inside, and forbid specifications. It is even possible to make that a jacket and kill the escalated process if it leaves the allowed window.

A jacket

A jacket wraps the escalated process so it can wait for it to exit. Along the way it could time the process, kill the process, restart failed processes, or block other people from using the same resources as the process. It might even run a wrapper diversion for the task, like ptbw.

It can do anything any co-process might. It is up to you to justify the use of the extra process. It may cleanup after the escalated program or run the parts of the application that need elevated privileges.

The original op process becomes the jacket program; a child process (already fork'd) is about to execve the escalated program. Before it does, it blocks reading from the jacket's stdout.

The jacket sets up any timers, file locks, or what-not it needs, then closes stdout to free the blocked child. If it can't gain the locks it needs or smells something bad, it can write a non-zero exit code to the child (say 75) and op will abort the execution and exit without starting the target program.
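
The release-or-veto handshake can be modeled with a pipe. This is a single-process simulation and not op's implementation: the function names are invented, and the wire format (the code as decimal digits and a newline) is an assumption made for the sketch.

```python
import os


def jacket_releases(write_fd: int, abort_code: int = 0) -> None:
    """Jacket side: optionally write a veto code, then close the channel."""
    if abort_code:
        os.write(write_fd, f"{abort_code}\n".encode())
    os.close(write_fd)  # end-of-file is what frees the waiting child


def child_waits(read_fd: int) -> int:
    """Child side: block until end-of-file, return 0 to proceed or the veto code."""
    with os.fdopen(read_fd, "rb") as channel:
        text = channel.read().decode().strip()
    return int(text) if text else 0


def handshake(abort_code: int = 0) -> int:
    """Run both halves in order; in op they are two separate processes."""
    read_fd, write_fd = os.pipe()
    jacket_releases(write_fd, abort_code)
    return child_waits(read_fd)


print(handshake())    # jacket just closed: the child may exec the program
print(handshake(75))  # jacket vetoed: the child exits with this code instead
```

The point of the design is that the child needs no timeout or signal: a plain close is the "go" message, and anything written before it is the "stop" message.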

The same program might be both a helmet and a jacket: it can tell which context it is in by the -P option, only jackets get that one.

The `MAGIC_SHELL` form

This form usually allows arbitrary command execution, so I don't favor it. In fact the number of times I've installed a production magic shell rule is exactly two. I'd prefer ssh with a well controlled authorized_keys file, with a forced command, as is the local policy at NPC Guild.org. At the very least you should be really picky about who can use this access: I'd limit it to a single (special purpose) group.

Use of $* to build the `string` specification

Remember that the $* markup represents all of the command-line parameters merged into a single parameter. Any sense of word separation is lost (replaced with a single space). So, for example one could invoke a command-line script as:

test1	/bin/sh -c $* ;
	groups=^anyapp$
	uid=. gid=.

The -c's single argument is all the words on the original command-line. To test this I ran:

op test1 date \; hostname

which produced the output from date and hostname. Note that I quoted the semicolon with a backslash to get it passed as a literal word to op.

The MAGIC_SHELL markup does the same thing, but it is sensitive to the case where no arguments are provided. When no args are specified op removes the "-c $*", so there are two possible commands. With arguments we get:

$S -c $*

And when presented with no arguments it builds:

$S

This provides an interactive shell with no command-line parameters, which is what some people want. Note that the shell is always indirected through the $S markup, to provide for the case where no SHELL is set in the client's environment. The markup $k ($K) may be used to set a $SHELL in the escalation options, if you know who to trust. Otherwise force an absolute path to the shell you need.

There is one other magic element: if the shell you set contains the string "perl" then the "-c" is changed to "-e" because that's what perl wants.
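
Put together, the argument vector that MAGIC_SHELL produces can be modeled as below. This is a sketch of the behavior described in this section, not code from op; the function name is invented, and the shell argument stands in for the expansion of $S.

```python
def magic_shell_argv(shell: str, words) -> list:
    """Model the command MAGIC_SHELL builds from $S and the words given to op."""
    if not words:
        return [shell]  # no arguments: an interactive shell
    flag = "-e" if "perl" in shell else "-c"  # perl takes its script under -e
    # $* joins the words with single spaces and squeezes out empty ones.
    return [shell, flag, " ".join(w for w in words if w)]


print(magic_shell_argv("/bin/sh", []))
print(magic_shell_argv("/bin/sh", ["date", ";", "hostname"]))
print(magic_shell_argv("/usr/bin/perl", ["print 1"]))
```

Seen this way it is clear why the form is dangerous: every word the Customer types ends up inside the one string the shell interprets.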

Alternatives

By putting $* in the context of an environment specification the same transformation (joining the words together with spaces) may be rendered:

	$OP_ARGV=$*
	$OP_ARGC=$#

which does give the new process a different way to get the command-line parameters joined into a single word. This might also be useful to pass the arguments to a helmet or a jacket.

Examples of MAGIC_SHELL specifications

To allow anyone in group "wheel" to become the login "operator" on demand:

operator MAGIC_SHELL ;
	groups=^wheel$
	uid=operator initgroups=operator

To force that access through the restricted shell and limit the first word a little:

operator $k -r -c $* ;
	groups=^wheel$
	uid=operator initgroups=operator
	$1=^/sbin/dump$,^/sbin/restore$

Note that the $1 limitation doesn't really add any security, as $2 might be an option that makes the command exit to allow a command after a literal semicolon to run anything. By giving access to a shell you are removing almost any limit from the Customer.

This is a feature that people like, but I don't really think you need to use it in production. The operator example above could be better thought-out and expressed as the 3 things you really need to do as the operator login, not a general shell access. Maybe a command to remove a dump, create a dump, and one to get started with restore. I doubt there are so many commands one would run as operator that they cannot be matched by op's RE logic.

My favorite exploit was passing the text below to a magic shell to get an interactive session as the target login:

$(DISPLAY=localhost:11 /usr/local/bin/xterm -ls)

You can depend on your Customers to be more creative than you thought they could be. If they can load commands that are then run by op, then they will load a command with an option to get a shell, count on it.

Compatibility with version 1

There are a few options from version 1 that are emulated in version 2, and some that just don't exist anymore.

fowners: This should be replaced with %_.owners. Presently it is accepted as an alias.
fperms: This should be replaced with %_.perms. Presently it is accepted as an alias.
nolog: This option reduces the syslog priority (see syslog(3)) from notice to info. In version 1 it removed the notification altogether.
help: There is no option to replace this. If you need to hide the output of -l or -r you should use an in-line script and pass the arguments in environment variables.
securid: Replace this with a local pam policy.
xauth: The xdisplay jacket provides this service.

In-line scripts are spelled with curly braces and a newline ({...\n}), rather than single quotes. See the configuration document for more information.

Lastly the delimiting semicolon (;) must stand alone to end the command specification. Some older versions of op would accept this:

name	su - root;
	users=...

These were both changed to allow a rule to include a semicolon, single quote, or double quote in the command specification.

Sanity checks

Under the -S option op tests each rule in the rule-base against my own ideas of what is "sane". When it finds something it should flag as "insane", it produces an error message on stderr. If the error is serious it exits with a non-zero exit-code from <sysexits.h>, see sysexits(3).

Over 40% of the code in op is totally dedicated to the sanity checker. If you use op for 1 reason it would be the sanity checks that keep the system administrator from installing a really bad rule-base.

The sanity checker drops to the real uid of the invoker when frisking any files given on the command-line. But an op rule to allow a mortal administrator to run "op -S" is not out of the question.

Remember that the rule-base checks are against the filesystem and password and group files (and maybe netgroups, and PAM configuration files) -- so you can't just run them on any single host. To get the most out of these checks you'll have to run the sanity check on at least 1 host of every `class' you make, and maybe all hosts once every audit cycle.

Example sanity checks

Allowing unanchored regular expressions to match users or groups might be silly (as noted in the examples in that section). Giving a regular expression to netgroups or any other option that can only accept a literal string would be bad.

Giving non-numeric values to nice, umask, or $#, or N in $N is really frowned upon.

Giving specifications for -u, -g, or -f then not using them in the rule (or the opposite). Forcing a single explicit match of an option -- which makes it not an option, rather it becomes mandatory specification.

Requiring a password from a login that doesn't presently exist on the host.

Setting a DEFAULT rule that is never used (other than the one in access.cf). Putting a mnemonic matching option in DEFAULT.

Adding rules that match the same patterns as one above it. Or putting the same mnemonic in more than one file (as you can't be sure which is consulted first).

There are a lot of path checks as well, and some others that I hope you'll never see. We even try to predict the name of the program to be executed, actually we go to a lot of trouble to find it.

Crossing the fine line between sane and not-so-much

Once in a while you do want to allow the Customer to set a PATH. I'm not at all sure you can justify that, but here is a work-around:

trusted	{	# in-line code
		...
	} ;
	groups=^wheel$,^root$
	$PATH=${PATH}

I would actually set $PATH to a known value and pass the Customer's $PATH in as a parameter:

trusted	{	# in-line code
		...
	} $0 ${PATH} ;
	groups=^wheel$,^root$
	$PATH=/usr/bin:/usr/local/bin:/sbin:/usr/local/sbin

The sanity checker exits with a code from <sysexits.h>. Some of the issues detected could be safely ignored, if you were sure that some later step in your build process was going to install missing files, or add netgroups, users, or groups. These are usually forced to EX_NOINPUT, EX_NOUSER, or EX_NOHOST. Other non-zero exit codes almost always indicate an issue that should be addressed in the rule-base.

I'll concede that EX_PROTOCOL is a strange overload for a questionable path specification or an out-of-bounds number.
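
A build script that runs the sanity checker might sort the exit codes along those lines. The sketch below is one possible triage, not part of op: the function name and the three labels are invented, while the numeric values are the standard ones from <sysexits.h>.

```python
# Values from <sysexits.h>.
EX_NOINPUT, EX_NOUSER, EX_NOHOST, EX_PROTOCOL = 66, 67, 68, 76

# Codes that may clear up once a later build step installs files,
# logins, groups, or netgroups.
DEFERRABLE = {EX_NOINPUT, EX_NOUSER, EX_NOHOST}


def triage(exit_code: int) -> str:
    """Classify the exit code of a sanity check run."""
    if exit_code == 0:
        return "clean"
    if exit_code in DEFERRABLE:
        return "recheck after install"
    return "fix the rule-base"


for code in (0, EX_NOUSER, EX_PROTOCOL):
    print(code, triage(code))
```

Anything in the deferrable set still deserves a second sanity check once the build has finished, since the code alone cannot say whether the missing piece ever arrived.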

The message below is always a Bad Sign:

op: rule: a missing semicolon may have consumed all options

(Which could also be a missing ampersand, but you get the idea.) When the DEFAULT stanza really contains all the options your rule needs, then add a %_=., (saying, "the command is at least 1 character long"), which is effectively a no-op, to suppress this message.

The command line

The command-line for op has more modes than most tools:

op [-f file] [-g group] [-u login] [-m mac] mnemonic [args]: In this mode op looks up the mnemonic that matches the args given, then uses the context of the current process to authenticate the escalation. When all goes well the op process becomes (or jackets) the requested process.
Any command-line login and group must match any rules for %u, !u, %u@g, !u@g, and %g, !g. Any command-line file must match all the %f, !f qualifiers.
op -l [login]: This mode requests the list of rules the current user might request. The superuser may specify a login to request another's list.
op -r [login], op -w [login]: List what op might run as the result of the corresponding -l command. This sometimes helps Customers see which rule they want. Under the -w option also list why each rule is allowed. This is great for audits.
op -S [files]: This mode is only used to sanity check the addition of new files to the existing op rules, or when no files are provided to sanity check the existing policy.
op -Sn [files]: Under this switch op does not read the existing rule-base to check for integration with the current rules. If the file you are checking is a new version of an existing file you'll need this to remove errors about duplicate rules.
op -h: The standard on-line help all my tools provide.
op -H: This is an extended help text to allow rule writers a "quick reference" view of the configuration attributes. It is built from the same list the sanity checker uses to list unknown attribute settings.
op -V: The standard on-line version information, plus any compile-time option settings.

Strange addition to -f

One may specify a nonexistent file under -f, as long as the attributes only match:

path: To force the nonexistent file's location.
perms (always n---------), type (always n): To check for the file's nonexistence.
access (always ----): A nonexistent file has no access available.
anything else: A match against any other %f or !f attribute rejects the attempt.

Authorization is out-sourced to helmets

Op out-sources authorization to helmets. If it is not enough to know who a person is to make an escalation safe, then you need to know who authorized the access, or maybe which policy allows the access (e.g. time of day, phase of the moon).

Such checks are done in a helmet, see the jacket document for much more on that topic. I'd wait to read that link until you need more power than authentication grants you.

What I didn't add

I didn't add a bunch of checks that you'll never need. If op gets through the basic allow checks and the checks for the -f, -g, and -u options then any additional checks you build into a helmet are usually just for the over-cautious, or for a local policy no other site would use or understand.

Features that are not missing, just less obvious

Op eats its own dog food to allow administrators to access -l for other logins. Use this rule to allow anyone in groups "wheel" or "staff" to see anyone but root's allowed rule list:

op	/usr/local/bin/op $1 $2 ;
	groups=^wheel$,^staff$
	$1=^-[lrw]$
	!2=^root$,^0$
	uid=0 gid=.

Run this rule as:

op op -l ksb

More dog food: use op to let anyone in group zero run a sanity check as the superuser:

op	/usr/local/bin/op $1 $* ;
	groups=^0$
	$1=^-S$
	uid=0 gid=. initgroups=root

We don't match $* because op does a fine job of that for me.

Summary

Op offers all the features you want in a privilege escalation structure with an easy to audit configuration. Its simple rule-structure specifies clear and isolated definitions, while still allowing complex rules. Support for less often required services (such as timeboxing access to rules) is out-sourced to helmet or jacket co-processes.

The op rule-base is broken into separate files to allow deployment of subsets of the rule-base to different hosts. Host-specific access may be granted by selective deployment of rules, or by the netgroup(5) facility.

Customer access may be granted by login name, uid, group membership, lack of group membership or any rule coded in a helmet. Resources are accessed by name, ownership, group ownership or any other stat(2) attribute, as well as by existence (or nonexistence).

In other cases op may be installed setuid (setgid) to manage escalations to a single login (group). This may allow group projects to manage their own build-spaces and test-harnesses with advice, but no intervention, from the administrator.

These features together offer a complete privilege escalation structure without forcing the administrator to give away arbitrary superuser access. And a built-in sanity checker assures that the rule-base is not completely insane before any Customer complains.

$Id: op.html,v 2.85 2012/10/06 19:39:56 ksb Exp $