Family: Dicer
Authors: KSB Braunsdorf
Version: 6.1
Bugs: None known.

Introduction

This is the inverse (the "arc-function") of printf(3). The input is a well-formatted line of text, the expression describes where in that text the desired result can be found, and the output is that string.

Configuration

#define DICER_START '['
Change the character the application uses to start a Dicer expression.
#define DICER_END ']'
Define DICER_END as a character constant to replace the right square bracket as the closing marker. If I had a vote, every Dicer expression would use square brackets.
#define MIXER_RECURSE '('
Change the default recursive expression token for the Mixer. Redefine in machine.h to override.
#define MIXER_END ')'
Change the default end character for the Mixer.
require "dicer-dicer.c" "dicer.h"
The mkcmd spell to include just the dicer directly from the explode repository.

Synopsis

#include "dicer.h"

Description

See xapply's description of the square bracket expander. Briefly, we split a string on a character (like ":" or "/"), then select a field from that list to either keep or remove. The result of that selection might be returned as is, or processed again.

For example, the string "/home/662/ksb" is my home directory. To pull out "662" I can write "[/3]": split on "/" and take the third item (the fields are "", "home", "662", and "ksb"). To remove "/ksb" I can write "[/-$]": split on "/" and remove the last item.

This is more useful than it sounds for picking out parts of an input record (like /etc/passwd lines or login@host pairs).

Adding the Mixer lets us reformat fields character by character. For example, when %p would expand to "8005551212", the expression %(p,"("1-3")"4-6"-"7-$) expands to "(800)555-1212".

Provides

extern char *Dicer(char *pcDest, unsigned *puMax, char *pcTemplate, char *pcData);
Because the output can never be longer than the input, make sure that pcDest points to at least strlen(pcData)+1 characters. The pcTemplate parameter should point just past the leading bracket and past any text you used to select the string to process. Dicer returns a pointer to the character after the closing bracket.

In no case should the Dicer write more than *puMax characters, or more than strlen(pcDest) characters if puMax is a NULL pointer. On return, *puMax holds the length of the data copied into pcDest (its strlen).

extern char *Slicer(char *pcDest, unsigned *puMax, char *pcTemplate, char **ppcList);
This is a common (although largely unsafe) use of the Dicer. It splices together a string from a vector of substrings drawn from ppcList (%1 is the first, %2 the next, and so on up to the first (char *)0). The template may contain %N, %{NN}, or %[NN...], where N is a single-digit number, NN may be a multiple-digit number, and "..." represents a valid Dicer expression.

This is very much like printf: a poorly chosen combination of template and list can overflow the destination buffer. Most errors return (char *)0; a successful call returns an empty string (""); other errors return the part of the template that remains to be applied.

The puMax parameter follows the same convention as Dicer's. When the Dicer expression is surrounded by %( ... mixer), the Mixer is then applied to the selected text.

extern char *Mixer(char *pcInplace, unsigned *puMax, char *pcExpr, int cExit);
This function implements the optional character selection feature. Not every program that supports the Dicer needs to allow access to the Mixer. The input string pcInplace is rewritten to include only the characters selected by pcExpr. The selection is done in the style of the Dicer. Integer positions, numbered from 1 to the length of the string, are given as decimal numbers, dollar ($) for the last character, or star (*) for the whole range. A range (start-end) selects a substring. Any index introduced with a tilde (~) counts from the end of the string. This makes dollar a shorthand for ~1 (or, more verbosely, ~1-~1).

Ranges are separated with a comma (,) or a blank; leading or extra separators are silently ignored. Ranges may also be separated by literal strings in either double ("...") or m4 (`...') quotes. The characters in the string are appended to the current result (as space allows). Thus %(1,1`-'$) expands to the first character of the first word, followed by a dash (-), then that word's last character.

A Mixer expression can be placed after a term to further process the selected value. For example, in (17-$)(1,$-4,1-3`,'1) all three references to "1" in the second term select what was character 17 before the first term was applied, and the quoted comma is a literal.

The expression ends at the first unquoted occurrence of the Exit character. This allows the caller to change the outer expression boundary at run-time.

The output ends after *puMax characters when puMax is not a NULL pointer; otherwise the length of the input string (including the terminating '\000') is assumed.

Reversed ranges output the selected characters from right to left; the expression $-1 reverses the pcInplace string.

Normally the expression is enclosed in parentheses ('(' and ')'), and recursive expressions are allowed. The suggested syntax is (dicer,mixer): the Dicer selects a larger string, then the Mixer limits that to the desired substring. For example, %({10},$) is the last character of the tenth input word. The compositional form %(4,($-2)($-2)) removes the first and last characters from the fourth parameter through a bit of chicanery. (I would prefer %(4,2-~2).) Some expanders use angle brackets in place of parentheses, because parentheses already have a special meaning there.

The return value points to the rest of the expression, just past the Exit character that ended it. A (char *)0 return value indicates a syntax error; range errors mostly result in the empty string.

Example

See the test driver embedded in the module (it requires gcc's -fwritable-strings to run). Extract the source with:
explode -s dicer.h
explode -s dicer.c

Build it with:

mk dicer.c

Run with:

./dicer
No output is good.

Diagnostics

None.

See Also

strcpy(3), strcat(3), scanf(3)

To Do List

Document the apply form, if we decide to keep it.
After a Dicer separator, a commercial at (@) may select all the fields that separator creates; the expression then continues with a new separator and field selector. The result is the catenation of the resulting strings, separated by the original separator.

$Id: dicer.html,v 6.18 2012/03/29 20:41:49 ksb Exp $