STATES(1) manual page
states - awk-alike text processing tool
states [-hvV] [-D var=val] [-f file] [-o outputfile] [-p path] [-s startstate] [-W level] [filename ...]
States is an awk-alike text processing tool with some state machine extensions. It is designed for program source code highlighting and for similar tasks where state information helps input processing.
At any given time, states is in exactly one state. Each state is quite similar to awk’s work environment: it has regular expressions which are matched against the input and actions which are executed when a match is found. From the action blocks, states can perform state transitions; that is, it can move to another state from which the processing is continued. State transitions are recorded so states can return to the calling state once the current state has finished.
The biggest difference between states and awk, besides state machine extensions,
is that states is not line-oriented. It matches regular expression tokens
from the input and once a match is processed, it continues processing from
the current position, not from the beginning of the next input line.
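The token-oriented matching described above can be sketched in Python. This is only an illustration of the idea, not states syntax or its actual implementation: the function name scan, the choice of the earliest-starting match, and the pass-through of unmatched text are all assumptions made for the example.

```python
import re

def scan(text, rules):
    """Process text token by token rather than line by line.

    rules is a list of (compiled_pattern, action) pairs; action receives
    the match object and returns the text to emit for that token.
    """
    out = []
    pos = 0
    while pos < len(text):
        # Assumption: the rule whose match starts earliest wins,
        # and earlier rules win ties.
        best = None
        for pattern, action in rules:
            m = pattern.search(text, pos)
            if m and (best is None or m.start() < best[0].start()):
                best = (m, action)
        if best is None:
            break
        m, action = best
        out.append(text[pos:m.start()])  # assumption: unmatched text passes through
        out.append(action(m))
        if m.end() == m.start():
            # Empty match: copy one character so the loop always advances.
            out.append(text[m.end():m.end() + 1])
            pos = m.end() + 1
        else:
            # Continue right after the match, not at the next input line.
            pos = m.end()
    out.append(text[pos:])
    return "".join(out)

# Two hypothetical rules: bracket numbers, upper-case the word "if".
RULES = [
    (re.compile(r"[0-9]+"), lambda m: "<" + m.group(0) + ">"),
    (re.compile(r"\bif\b"), lambda m: m.group(0).upper()),
]
```

With these rules, scan("if x 10 if y 20\n", RULES) handles four tokens on one line, which a purely line-oriented tool would see as a single record.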
- -D var=val, --define=var=val
- Define variable var to have string value val. Command line definitions override variable definitions found in the config file.
- -f file, --file=file
- Read state definitions from file file. By default, states tries to read state definitions from the file states.st in the current working directory.
- -h, --help
- Print short help message and exit.
- -o file, --output=file
- Save output to file file instead of printing it to stdout.
- -p path, --path=path
- Set the load path to path. The load path defaults to the directory from which the state definitions file is loaded.
- -s state, --state=state
- Start execution from state state. This definition overrides the start state resolved from the start block.
- -v, --verbose
- Increase the program verbosity.
- -V, --version
- Print states version and exit.
- -W level, --warning=level
- Set the warning level to level. Possible values for level are:
- light
- light warnings (default)
- all
- all warnings
States program files can contain one start block, startrules and namerules blocks to specify the initial state, state definitions, and expressions.
The start block is the main() of the states program. It is executed on script startup for each input file and it can perform any initialization the script needs. It normally also calls the check_startrules() and check_namerules() primitives, which resolve the initial state from the input file name or from the data found at the beginning of the input file. Here is a sample start block which initializes two variables and does the standard start state resolving:
start
{
  a = 1;
  msg = "Hello, world!";
  check_startrules ();
  check_namerules ();
}
Once the start block is processed, the input processing is continued from
the initial state.
The initial state is resolved using the information found in the startrules and namerules blocks. Both blocks contain pairs of a regular expression and a symbol. When the regular expression matches the name of the input file (namerules) or the beginning of the input file (startrules), the initial state is named by the corresponding symbol. For example, the following start and name rules can distinguish C and Fortran files:
namerules
{
  /\.(c|h)$/ c;
  /\.[fF]$/ fortran;
}
startrules
{
  /-\*- [cC] -\*-/ c;
  /-\*- fortran -\*-/ fortran;
}
If these rules are used with the previously shown start block, states first checks the beginning of the input file. If it contains the string -*- c -*-, the file is assumed to contain C code and the processing is started from the state called c. If the beginning of the input file contains the string -*- fortran -*-, the initial state is fortran. If none of the start rules match, the name of the input file is matched against the namerules. If the name ends with the suffix .c or .h, the initial state is c. If the suffix is .f or .F, the initial state is fortran. If both start and name rules fail to resolve the start state, states just copies its input to output unmodified.
The start state can also be
specified from the command line with option -s, --state.
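The resolution order described above can be modelled with a short Python sketch. It is an illustration under assumptions, not the states implementation: the function resolve_start_state and its parameters are hypothetical names, and how much of the file counts as its "beginning" is left to the caller, who passes it as head.

```python
import re

# Patterns copied from the sample startrules and namerules blocks.
STARTRULES = [
    (re.compile(r"-\*- [cC] -\*-"), "c"),
    (re.compile(r"-\*- fortran -\*-"), "fortran"),
]
NAMERULES = [
    (re.compile(r"\.(c|h)$"), "c"),
    (re.compile(r"\.[fF]$"), "fortran"),
]

def resolve_start_state(filename, head, forced=None):
    """Return the name of the start state, or None if nothing matched.

    head is the beginning of the input file (assumption: the caller
    decides how many characters that is); forced models option -s.
    """
    if forced is not None:
        return forced                     # -s overrides both rule blocks
    for pattern, state in STARTRULES:     # file contents are checked first
        if pattern.search(head):
            return state
    for pattern, state in NAMERULES:      # then the file name
        if pattern.search(filename):
            return state
    return None                           # input would be copied unmodified
```

A return value of None corresponds to the case where states copies its input to the output unchanged.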
State definitions have the following syntax:
  state name { expr {statements} ... }
where name is the symbol naming the state, expr is a regular expression, special expression or symbol, and statements is a list of statements. When the expression expr is matched from the input, the statement block is executed. The statement block can call states’ primitives, user-defined subroutines, other states, etc. Once the block is executed, the input processing is continued from the current input position (which might have been changed if the statement block called other states).
Special expressions BEGIN and END can be used in the place of expr. Expression BEGIN matches the beginning of the state; its block is executed when the state is entered. Expression END matches the end of the state; its block is executed when states leaves the state.
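The interplay of rules, state calls and the BEGIN and END blocks can be sketched in Python, with recursion standing in for the recorded state transitions. Everything here is an assumption made for illustration: the names run_state, RETURN, open_comment and close_comment, the dictionary layout, and the decision to run END when the input is exhausted. The sketch also assumes that no pattern matches the empty string.

```python
import re

RETURN = object()  # sentinel an action returns to leave the current state

def run_state(states, name, text, pos, out):
    """Run state `name` from `pos`; return the position where it stopped."""
    state = states[name]
    if "BEGIN" in state:
        state["BEGIN"](out)               # executed when the state is entered
    while pos < len(text):
        best = None
        for pattern, action in state["rules"]:
            m = pattern.search(text, pos)
            if m and (best is None or m.start() < best[0].start()):
                best = (m, action)
        if best is None:
            out.append(text[pos:])        # nothing left to match
            pos = len(text)
            break
        m, action = best
        out.append(text[pos:m.start()])
        pos = m.end()
        result = action(m, out)
        if result is RETURN:
            break                         # go back to the calling state
        if result is not None:            # action asked to call another state
            pos = run_state(states, result, text, pos, out)
    if "END" in state:
        state["END"](out)                 # executed when the state is left
    return pos

def open_comment(m, out):
    out.append(m.group(0))
    return "comment"

def close_comment(m, out):
    out.append(m.group(0))
    return RETURN

STATES = {
    "c": {"rules": [(re.compile(r"/\*"), open_comment)]},
    "comment": {
        "BEGIN": lambda out: out.append("<i>"),
        "END": lambda out: out.append("</i>"),
        "rules": [(re.compile(r"\*/"), close_comment)],
    },
}
```

Running state c over the text "a /* b */ c" enters comment after the opening delimiter, emits the BEGIN and END markup around the comment body, and resumes in c at the position where comment returned.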
If expr is a symbol, its value is looked up from the global environment. If the value is a regular expression, it is matched against the input; otherwise that rule is ignored.
The states program file can also have top-level expressions. They are evaluated after the program file is parsed but before any input files are processed or the start block is evaluated.
- call (symbol)
- Move to state symbol and continue input file processing from that state. The function returns whatever the symbol state’s terminating return statement returned.
- calln (name)
- Like call, but the argument name is evaluated and its value must be a string. For example, this function can be used to call a state whose name is stored in a variable.
- check_namerules ()
- Try to resolve the start state from the namerules rules. The function returns 1 if the start state was resolved or 0 otherwise.
- check_startrules ()
- Try to resolve the start state from the startrules rules. The function returns 1 if the start state was resolved or 0 otherwise.
- concat (str, ...)
- Concatenate the argument strings and return the result as a new string.
- float (any)
- Convert argument to a floating point number.
- getenv (str)
- Get the value of environment variable str. Returns an empty string if variable str is undefined.
- int (any)
- Convert argument to an integer number.
- length (item, ...)
- Count the length of argument strings or lists.
- list (any, ...)
- Create a new list which contains items any, ...
- panic (any, ...)
- Report a non-recoverable error and exit with status 1. The function never returns.
- print (any, ...)
- Convert arguments to strings and print them to the output.
- range (source, start, end)
- Return a sub-range of source starting from position start (inclusive) to end (exclusive). Argument source can be a string or a list.
- regexp (string)
- Convert string string to a new regular expression.
- regexp_syntax (char, syntax)
- Modify regular expression character syntaxes by assigning new syntax syntax for character char. Possible values for syntax are:
- 'w'
- character is a word constituent
- ' '
- character isn’t a word constituent
- regmatch (string, regexp)
- Check if string string matches regular expression regexp. The function returns a boolean success status and sets sub-expression registers $n.
- regsub (string, regexp, subst)
- Search for regular expression regexp in string string and replace the matching substring with string subst. Returns the resulting string. The substitution string subst can contain $n references to the nth parenthesized sub-expression.
- regsuball (string, regexp, subst)
- Like regsub, but replace all matches of regular expression regexp in string string with string subst.
- require_state (symbol)
- Check that the state symbol is defined. If the required state is undefined, the function tries to autoload it. If the loading fails, the program terminates with an error message.
- split (regexp, string)
- Split string string into a list, treating matches of regular expression regexp as item separators.
- sprintf (fmt, ...)
- Format arguments according to fmt and return the result as a string.
- strcmp (str1, str2)
- Perform a case-sensitive comparison of strings str1 and str2. The function returns a value that is:
- -1
- string str1 is less than str2
- 0
- strings are equal
- 1
- string str1 is greater than str2
- string (any)
- Convert argument to string.
- strncmp (str1, str2, num)
- Perform a case-sensitive comparison of strings str1 and str2, comparing at most num characters.
- substring (str, start, end)
- Return a substring of string str starting from position start (inclusive) to end (exclusive).
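The half-open ranges and three-way comparisons used by several of the primitives above can be modelled in Python. This is a sketch of the documented behaviour only, assuming the usual -1/0/1 convention for comparison results; the trailing underscore in range_ merely avoids shadowing a Python builtin. Python compares strings by code point, which agrees with a byte-wise comparison for ASCII text, and negative positions behave differently in Python than they may in states.

```python
def range_(source, start, end):
    """Sub-range of a string or list: start is inclusive, end is exclusive."""
    return source[start:end]

def substring(s, start, end):
    """Same half-open convention, restricted to strings."""
    return s[start:end]

def strcmp(str1, str2):
    """Case-sensitive three-way comparison returning -1, 0 or 1."""
    return (str1 > str2) - (str1 < str2)

def strncmp(str1, str2, num):
    """Like strcmp, but compare at most num leading characters."""
    return strcmp(str1[:num], str2[:num])
```

For example, range_("highlight", 0, 4) yields the first four characters, and strncmp("state", "status", 4) reports equality because only the shared prefix is compared.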
- $.
- current input line number
- $n
- the nth parenthesized regular expression sub-expression from the latest state regular expression or from the regmatch primitive
- $`
- everything before the matched regular expression. This is useful with the regmatch primitive; the contents of this variable are undefined when it is used in action blocks to refer to the data before the block’s regular expression.
- $B
- an alias for $`
- argv
- list of input file names
- filename
- name of the current input
file
- program
- name of the program (usually states)
- version
- program version string
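The relationship between regmatch, the $n registers and $` can be illustrated with Python’s re module. This is a sketch under assumptions, not the states primitive: it assumes regmatch searches anywhere in the string rather than requiring a full match, it returns the registers in a dictionary instead of setting variables, and Python’s regular expression syntax may differ from the syntax states accepts.

```python
import re

def regmatch(string, regexp):
    """Return (matched, registers) for the first match of regexp in string."""
    m = re.search(regexp, string)
    if m is None:
        return False, {}
    # $` holds everything before the matched text.
    registers = {"$`": string[:m.start()]}
    # $1, $2, ... hold the parenthesized sub-expressions.
    for n, group in enumerate(m.groups(), start=1):
        registers["$%d" % n] = group
    return True, registers
```

Matching "set version = 1.6" against ([a-z]+) = ([0-9.]+) leaves "version" in $1, "1.6" in $2, and "set " in $`.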
/usr/share/enscript/hl/*.st enscript’s states definitions
awk(1), enscript(1)
Markku Rossi <mtr@iki.fi> <http://www.iki.fi/~mtr/>
GNU Enscript WWW home page: <http://www.iki.fi/~mtr/genscript/>