Next: 4. The Director Up: CFHT Director/Agent Model Revision Previous: 2. Users' Guide Contents


3. The Agent (Programmers' Guide)

The following section is intended for anyone writing an agent, or wanting to adapt a command driven program to run under Director.

3.1 Functions for Agents in `libcli`

The Director package includes a C library which makes writing Agents easier. This library can manage a command table inside the Agent, handle help menus, and display the right prompt strings for Director. As such, it implements most of the protocol between the Agent and Director.

Programs which use libcli as part of Pegasus can include the header file like this:


   #include "cli/cli.h"

If you are not building with Make.Common, create a symbolic link in your program's source directory called cli which points to the libcli source code and the above should work too. Linking with the library is accomplished by adding -lcli to the $(EXECNAME) line of a Make.Common Makefile:


   $(EXECNAME): $(OBJS) -lcli

The following sections describe the functions provided to Agent programs by libcli.

3.1.1 Macros and Type Definitions

If not already provided by libcfht, cli.h will define:

PASSFAIL

should be used as the return type for all command functions (this is a type, which can either be PASS or FAIL.)

BOOLEAN

a type which can be either TRUE (1) or FALSE (0).

TEMP_FAILURE_RETRY

a macro found in the GNU C library which can be placed around all system calls to make sure they restart if interrupted by a signal.

RCSID

a macro which is useful to include once near the top of each .c file for version tracking purposes. For example:


   #include "cli/cli.h"

   RCSID("$Id$");

   . . .

The RCS system will insert a version number here which will find its way into the compiled binary. If RCS is not used, this macro does no harm.

Command

a structure containing a name, a function pointer, and a one line help string:


   typedef struct {
         char* name;
         Function* func;
         char* help;
   } Command;

This structure is from the GNU readline code. Agents should contain a table of these structures, where name is the command name the user must type to cause the function func to execute. The table will look like this:


   static Command comlist[] = {
      { "cd",   cli_cd,   "Change working directories" },
      { "exit", com_exit, "Exit the program" },
      { "help", cli_help, "Displays help for directest" },
      { "?",    cli_help, "Synonym for help" },
      { 0, 0, 0 } /* This terminates the command list */
   };

Almost all agents will want to implement the commands above, along with their application specific commands. As a convention, it is suggested that internally defined command functions begin with com_. Functions defined in libcli begin with cli_, and for certain basic command functions, the Agent may be able to call those directly, as with cli_cd and cli_help above.

The following sections describe the functions provided by libcli. See also the C header file for libcli, libcli/cli.h, for the complete list of functions in this library.

3.1.2 cli_init()


   PASSFAIL cli_init(const char* name, Command* comlist, int director);

`name' should be the name of your program (for example, argv[0] or the basename), but it is currently not used for anything in Director.

`comlist' is the command-function list, described in the previous section.

`director' should be set to 3 for this version of Director. The Agent should check for a FAIL from cli_init(), as this indicates that the required version of Director was not found.

The cli_init function must be the first thing inside main() of the Agent program, because it has to set up line buffering mode on standard output (using a setvbuf call) before any other code tries to print anything. Here's a typical example:


#include <stdlib.h>
#include "cli/cli.h"

. . .

int
main(int argc, const char* argv[])
{
   int i;

   if (cli_init(argv[0], comlist, 3)==FAIL) return EXIT_FAILURE;

   . . .
}

The library does not define a default minimum director version and leaves it entirely up to the agent, because some simpler agents do not use any features of Director 3. It is up to the programmer of the agent to indicate compatibility with Director 2 by calling cli_init(..., ..., 2). For most cases, requiring Director 3 is not a problem though.

The cli_init() function may be called multiple times by the agent in order to change the available set of commands. This is done by defining multiple comlist tables, each of them containing a ``mode'' function. The ``mode'' function (com_mode()) must call cli_init() with the appropriate comlist, based on its parameters. For observing sessions, we have used this to implement ``mode observing'' and ``mode engineering'' commands to unmask/mask certain commands which are useful to engineering staff but dangerous if accidentally used by observers.
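The mode switch described above might be sketched as follows. This is only an illustration: the typedefs and cli_init_stub() are stand-ins so that the fragment compiles without libcli (a real agent would include cli/cli.h and call cli_init()), and the table names, the ``park'' command, and the PASS/FAIL values are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for definitions that normally come from "cli/cli.h". */
typedef enum { FAIL = 0, PASS = 1 } PASSFAIL;
typedef PASSFAIL Function(const char*);
typedef struct { const char* name; Function* func; const char* help; } Command;

Command* active_comlist = NULL;   /* table currently in effect */

/* Stand-in for cli_init(): only remembers which table was installed. */
static PASSFAIL cli_init_stub(const char* name, Command* comlist, int director)
{
   (void)name; (void)director;
   active_comlist = comlist;
   return PASS;
}

PASSFAIL com_mode(const char* args);

/* Hypothetical command that only engineering staff should see. */
static PASSFAIL com_park(const char* args)
{
   (void)args;
   printf("status: Device parked.\n");
   return PASS;
}

Command observing_comlist[] = {
   { "mode", com_mode, "Select observing or engineering mode" },
   { 0, 0, 0 }
};

Command engineering_comlist[] = {
   { "mode", com_mode, "Select observing or engineering mode" },
   { "park", com_park, "Park the device (engineering only)" },
   { 0, 0, 0 }
};

/* Installs the command table that matches the requested mode. */
PASSFAIL com_mode(const char* args)
{
   if (args == NULL) args = "";
   if (strcmp(args, "observing") == 0)
      return cli_init_stub("agent", observing_comlist, 3);
   if (strcmp(args, "engineering") == 0)
      return cli_init_stub("agent", engineering_comlist, 3);
   printf("error: `%s' is not a valid mode.  Choose from `observing' or `engineering'.\n", args);
   return FAIL;
}
```

Because every table contains the ``mode'' command itself, the user can always switch back.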

Functions below are only valid after cli_init() has been called at least once.

3.1.3 cli_help()


   PASSFAIL cli_help(const char* arg);

This is another function borrowed directly from GNU readline.

If arg is not NULL or empty, then the one-line help strings of all matching commands are printed.

With no arguments, the full command table with help strings is displayed.

3.1.4 cli_getline()


   char* cli_getline(void);

Displays a prompt string on stdout ("ok> " or "failed> ", depending on result of the last command) and reads a line from stdin.

Possible returns:

A pointer to a null terminated character string read from the user.
NULL and errno==0 indicates that the end of file was reached.
NULL and errno==EINTR indicates a non-blocked signal was trapped.

If a string is returned, it should be passed to cli_execute() (see below) and eventually free()'d.

3.1.5 cli_execute()


PASSFAIL cli_execute(const char* line);

Given a string obtained by cli_getline(), this does the work of separating the command from its arguments, looking up the command in your table and running the corresponding com_...() function from the table. The arguments found in line (if any exist, after the command name) are passed to the com_...() function, and the return of the com_...() function becomes the return of cli_execute().
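Taken together, cli_getline() and cli_execute() suggest a main command loop along the following lines. This is a sketch, not code from directest.c: the loop takes the two functions as pointers so that it can be tried without libcli, and fake_getline() and fake_execute() are canned stand-ins invented for the example. Clearing errno before each call is a defensive assumption.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef enum { FAIL = 0, PASS = 1 } PASSFAIL;   /* stand-in for cli.h */
typedef char* (*GetLineFn)(void);               /* shape of cli_getline() */
typedef PASSFAIL (*ExecuteFn)(const char*);     /* shape of cli_execute() */

/* Reads and executes commands until end of file.
 * Returns the number of commands that returned FAIL. */
int command_loop(GetLineFn get_line, ExecuteFn execute)
{
   int failures = 0;

   for (;;) {
      char* line;

      errno = 0;
      line = get_line();
      if (line == NULL) {
         if (errno == EINTR) continue;   /* a signal arrived: prompt again */
         break;                          /* errno == 0: end of file */
      }
      if (execute(line) == FAIL) failures++;
      free(line);                        /* the line was allocated for us */
   }
   return failures;
}

/* Canned stand-ins, only for trying the loop without libcli. */
static const char* canned[] = { "help", "bogus", "cd /tmp" };
static int next_line = 0;
static int interrupted_once = 0;

char* fake_getline(void)
{
   char* copy;

   if (!interrupted_once) {              /* simulate one interrupted read */
      interrupted_once = 1;
      errno = EINTR;
      return NULL;
   }
   if (next_line >= 3) return NULL;      /* errno still 0: end of file */
   copy = malloc(strlen(canned[next_line]) + 1);
   if (copy != NULL) strcpy(copy, canned[next_line]);
   next_line++;
   return copy;
}

PASSFAIL fake_execute(const char* line)
{
   return strcmp(line, "bogus") == 0 ? FAIL : PASS;
}
```

In a real agent the call would presumably be command_loop(cli_getline, cli_execute), placed after cli_init() has succeeded.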

3.1.6 cli_arg1()

Returns a copy of the string in args, possibly with an enclosing pair of "" or '' removed from the whole string. At the very minimum, this function strdup's the string for you, so you should always free the memory it returns when done.

3.1.7 cli_argv()


   char** cli_argv(int* argc, const char* args);

argc, if not NULL, will be used to store the number of arguments found in args.

args must be a string of characters or a zero length string (but cannot be NULL).

The function returns a vector of arguments with the last element set to NULL. All of the arguments are allocated in a single string and the returned array just contains pointers into that string. Even when there are no arguments, you should always free the memory allocated to argv itself, and also to argv[0] whenever there were arguments. Example:


PASSFAIL
com_something(const char* args)
{
   static char** cargv = NULL;
   static int cargc = 0;
   . . .

   if (cargv) { if (cargv[0]) free(cargv[0]); free(cargv); }
   cargv = cli_argv(&cargc, args);
   . . .
}

3.1.8 cli_bool()


   BOOLEAN cli_bool(const char* arg);

This call is derived from libcfht's cfht_tob() function. It takes a string and returns the string's boolean value. In addition to the usual TRUE/FALSE and YES/NO pairs, many other possibilities are handled.

The following strings return TRUE:


   "True"        "Yes"        "Up"        "Automatic"
   "Enable"      "In"         "1".."9"    "ON"
   "OPen"        "COlumn"

Case is not significant in the tests. In the above strings, uppercase is used to denote the characters used in determining if a string is TRUE. Any string whose first characters do not match any of the above is considered FALSE.

cli_bool() is usually called on an argument in the array returned by cli_argv() or the return of cli_arg1().
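The matching rule can be illustrated with a small stand-in. This is not the libcli or libcfht source; it is one reading of the table above, in which the uppercase letters of each keyword form the shortest prefix that counts as TRUE. The name bool_sketch() is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Shortest prefixes that count as TRUE, in lowercase, taken from the
 * uppercase letters of the keywords listed above. */
static const char* const true_prefixes[] = {
   "t", "y", "u", "a", "e", "i", "on", "op", "co", NULL
};

/* Returns 1 if s begins with prefix, ignoring case. */
static int starts_with_nocase(const char* s, const char* prefix)
{
   for (; *prefix != '\0'; s++, prefix++) {
      if (tolower((unsigned char)*s) != *prefix) return 0;
   }
   return 1;
}

/* Returns 1 (TRUE) or 0 (FALSE) for the given argument string. */
int bool_sketch(const char* arg)
{
   size_t i;

   if (arg == NULL) return 0;
   if (arg[0] >= '1' && arg[0] <= '9') return 1;   /* "1".."9" */
   for (i = 0; true_prefixes[i] != NULL; i++) {
      if (starts_with_nocase(arg, true_prefixes[i])) return 1;
   }
   return 0;
}
```

Under this reading, the two-letter prefixes are what keep ``off'', ``out'' and ``close'' from being mistaken for ``on'', ``open'' and ``column''.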

3.1.9 cli_strdup()


   char* cli_strdup(const char* orig);

See man page for strdup. VxWorks 5.2 doesn't have a strdup, so use this call instead for code that needs to be portable.

3.1.10 cli_system()


   PASSFAIL cli_system(const char* command, const char* args);

Exec's a command with /bin/sh and passes it `args' as arguments. NOTE: This is not very secure at the moment. A ';' in the args causes the shell to execute the rest of the line as a new command. The two strings `command' and `args' are just concatenated with a space in between, so you can leave either of them empty or NULL if you don't need both.
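Until that is addressed, an agent that passes user-supplied text to cli_system() may want to screen it first. The helper below is a hypothetical, deliberately conservative sketch and not part of libcli: it rejects any argument string containing characters the shell treats specially, and should not be taken as a complete defense.

```c
#include <string.h>

/* Returns 1 if args contains none of the listed shell metacharacters,
 * 0 otherwise.  NULL is treated as "no arguments" and is accepted. */
int args_are_shell_safe(const char* args)
{
   static const char unsafe[] = ";|&`$<>(){}\\\"'\n";

   if (args == NULL) return 1;
   return strpbrk(args, unsafe) == NULL;
}
```

A command function could return FAIL with an error: message when this check fails, instead of calling cli_system().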

3.1.11 cli_getcwd()


   const char* cli_getcwd(void);

Returns a string with the current working directory.

3.1.12 cli_gethome()


   const char* cli_gethome(void);

Returns the current effective user's home directory.

3.1.13 cli_cd()


   PASSFAIL cli_cd(const char* args);

Use this to implement your "cd" command to ensure synchronization with director's concept of the current working directory. Tilde expansion will already have been done on args when running with director.

3.1.14 cli_signal()


   void    cli_signal(int signo, PFV handler);
   void    cli_signal_block(int signo);
   void    cli_signal_unblock(int signo);
   BOOLEAN cli_signal_isblocked(int signo);
   BOOLEAN cli_signal_event(int signo);
   BOOLEAN cli_signal_peek(int signo);

These calls are identical to cfht_signal_...() when libcfht is not present. Refer to the on-line man page for cfht_signal() for more information on these functions.

3.2 Stand-alone Testing

An important requirement of the design is that the agent program MUST be able to function without the Director. Not only is this important for a fall-back mode, but it also makes for easy testing of components. When an agent is executed by hand on a terminal, and not inside Director, the output will not appear exactly as the user was meant to see it (no colors and statusbar messages scroll instead of going to the right place, etc.) Similarly the keyboard input may not be handled as gracefully (line editing functions and history will be lacking) but the program should still operate. Be sure to test this!

3.3 Restrictions on Agent

You MUST follow these requirements in order for your Agent program to function correctly when running under Director. To help comply with these points, a library has been assembled called libcli which is distributed with the Director source code. See the file libcli/cli.h for interface documentation. You do not have to use this library in order to be compliant.

Also included with the Director source code is a skeleton Agent program called directest.c. It demonstrates how to use most of the functions provided by libcli, including a method for installing a timeout while at the command prompt (lines of code relating to this are commented with /* LPS */). When this timeout expires, your agent could go off and do some ``Low Priority Stuff''. If you don't need this, it's better just to leave out all the LPS lines!

Runs on machine with most direct access to hardware that it controls.
If it needs it, must be able to access the filesystem with the data (probably through NFS). This is handled as a completely separate issue.
Must be able to run on a "dumb" terminal (do not print any terminal-specific escape codes for bold, underline, etc.) Legal control characters are listed in the next section.
Must not change working directories unless the user issues a ``cd'' command. It's best to use the cli_cd() call in libcli. (The current working directory issue may become important if Director is expanded to handle multiple agents at once.)
Must operate stdout in line-buffered mode. This is taken care of by calling libcli's cli_init() call, or steal code that calls setvbuf() from there. Unbuffered modes (typically used on stderr) will work also, but will be less efficient.
Must echo "> " ("ok> " or "failed> " when it has status of a previous command to report) as the prompt string and flush this to stdout before reading a new line from stdin. (Taken care of by just using libcli's cli_getline()/cli_execute() calls.) Avoid having regular output of the program echo these prompt strings because it will confuse Director into thinking that the command has completed. Note: If it is possible for your program to be interrupted by a signal while waiting for a command, and if in that situation a new prompt gets displayed, be sure to strip the ``ok'' or ``failed'' from the prompt when it is re-displayed. Again, libcli's cli_getline() takes care of this for you.
No subprompts. There cannot be any submenus, including (Y/N) confirmations or prompts asking for a filename or missing parameter. Director assumes that the agent is always waiting ready at the top level command prompt (or busy executing a command.) So, for example, if you had a command called download that required a filename argument, and someone just typed download, the following would NOT work:
```
   ok> download
Enter filename> _
```
Instead, print a usage message and force the user to type the whole command again (or press the up-arrow and edit their old command):
```
   ok> download
   Syntax: download <filename.lod>
   failed> download full
   . . .
   ok>
```
No toggled arguments. Try to avoid commands that toggle a current setting, especially if it is desirable to later have that command used in a script or by a GUI. The script does not know the previous state and should just be able to say things like ``bit on'' and ``bit off''.

3.3.1 Note about Unix Signals

Director builtin commands like stop, break, and quit will send signals to the Agent (so it need not be at a command prompt to receive these commands). If your program has forked any child processes, they will receive the signal also, because Director sends the signal to the entire process group. It will often be undesirable to have child processes receiving these signals (for example, if the child has exec()'d another program, it will dump core on a SIGQUIT unless it has a special signal handler installed.) The recommended way to keep child processes from seeing these signals is to place the child in its own process group immediately after the fork(). The POSIX call setsid() will do this for you.

The cli_system() call in libcli uses this method to prevent commands like ``ls'' that your program may want to run from seeing these signals. See libcli/cli.c for an example.
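For agents that fork their own children, the same idea might look like the sketch below. It is not the code from libcli/cli.c; run_in_own_group() is a hypothetical helper that starts a program in a child process, moves the child out of the agent's process group with setsid(), and waits for it.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program argv[0] with arguments argv in a child process that has
 * its own session and process group.  Returns the child's exit status, or
 * -1 if it could not be started or did not exit normally. */
int run_in_own_group(char* const argv[])
{
   int status;
   pid_t pid = fork();

   if (pid < 0) return -1;
   if (pid == 0) {
      setsid();               /* leave the agent's process group */
      execv(argv[0], argv);
      _exit(127);             /* only reached if execv() failed */
   }
   while (waitpid(pid, &status, 0) < 0) {
      if (errno != EINTR) return -1;   /* restart if a signal interrupted us */
   }
   return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this arrangement, signals sent by Director to the agent's process group reach the agent but not the child.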

3.3.2 Note about child processes

There may be other incompatibilities if you mix various ways of fork()'ing and exec()'ing processes with other methods like popen() or system(). These are not really related to Director, but can occur in any Unix application that mixes these types of calls. The best advice to avoid these kinds of problems might be to see if there is a way to keep your Agent program single-threaded and as simple as possible.

3.3.3 Note about stderr and stdout

Director reads both stderr and stdout from the agent. If none of the special strings (see appendix) are found at the beginning of the line, then lines read from stdout will be displayed in the normal color, and lines read from stderr will be displayed and logged as warnings (in yellow). Generally you should print everything to stdout, but it is ok to use C library calls like perror() which automatically print on stderr.

3.3.4 Note about line buffered mode

Director reads from the agent in line buffered mode. This means nothing will be displayed until a newline is received, so it is important to remember to put \n's at the end of all your printf()'s. Even if you are printing on stderr, you must assume line buffered behavior.
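An agent that does not use cli_init() could set up its own output along these lines. The helper names are hypothetical, and the exact setvbuf() arguments used inside libcli are not shown in this guide, so treat the call below as an assumption.

```c
#include <stdio.h>

/* Puts stdout into line-buffered mode.  Call this before anything has been
 * printed.  Returns 0 on success, nonzero on failure. */
int use_line_buffering(void)
{
   return setvbuf(stdout, NULL, _IOLBF, 0);
}

/* Prints one complete line; the trailing newline is what makes the text
 * visible to a reader that works line by line.  Returns the printf() result. */
int report_line(const char* text)
{
   return printf("%s\n", text);
}
```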

3.4 Legal Control Characters


\t   Moves to the next tab stop on 8 column boundaries
\a   Beep. (Bell also rings when an error message is displayed.)
\r\n Treated as a "\n" (i.e. Director handles DOS newlines properly)
\r   Treated as a "\n" if after text, ignored if after empty line

All other control characters are changed to the `*' character by Director, to avoid interfering with the ncurses library, which has control of the screen (and therefore should be the only one sending escape sequences to the terminal.)

3.5 Requests versus Status

Since agents are persistent (like servers, in the client-and-server model used in CFHT's Pegasus), it is possible to keep state in internal variables. State variables describing things like the current exposure type or filename are straightforward. A variable is created in the program which is changed when the corresponding command is received. Other states involving things like positions of mechanical devices can be more tricky.

There are many cases where the requested state and the actual state (sensed from encoders, for example) are not exactly the same. Normal reasons for this could be that the device is still on its way to a newly requested position (this only happens when movements are allowed in parallel; see the next section) or the device has reached a slightly different value than the target, within an acceptable tolerance. Abnormal causes for a discrepancy could include things like hardware failures. In summary, if the answer to any of the following questions is yes:

Does the device or setting take time to move/change, and is there a separate command from the one which accepts the value which waits on motion to complete?
Does the device not always end up exactly at the value commanded (for example, because of encoder precision)?
Can there be a sensed value which could disagree with the requested value for any other reason (hardware failure? moved by other means?)

Then it is best to keep track of the last requested value VAL_REQUEST and the sensed value VAL_SENSE separately. Additionally, there are cases where motion should not begin until another operation has completed. In this case, it may also help to keep track of whether VAL_REQUEST has been sent to the hardware or not, by copying it to VAL_SENT (or setting and clearing a flag). This is the case of separated intent/go commands, described in more detail below.
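One way to hold these values is a small per-device structure. The sketch below is illustrative only: the structure and helper names are hypothetical, and a real agent may well store positions as something other than a double.

```c
/* Bookkeeping for one device or setting. */
typedef struct {
   const char* name;
   double val_request;   /* what the user last asked for */
   double val_sent;      /* what was last transmitted to the hardware */
   double val_sense;     /* what the hardware last reported */
   double tolerance;     /* acceptable |val_sense - val_request| */
} DeviceState;

/* Returns 1 if the request has not yet been passed on to the hardware. */
int needs_send(const DeviceState* d)
{
   return d->val_request != d->val_sent;
}

/* Returns 1 if the sensed value is within tolerance of the request. */
int in_position(const DeviceState* d)
{
   double diff = d->val_sense - d->val_request;

   if (diff < 0) diff = -diff;
   return diff <= d->tolerance;
}
```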

3.6 Simple Move Commands

There are cases where a single straightforward command to move a device is appropriate. In this case, there is no need to keep track of VAL_REQUEST, VAL_SENT, and VAL_SENSE separately. However, the following conditions should be met:

The device moves relatively quickly, or it is acceptable for the entire system (Director and all agents) to wait while the device moves.
The only chance for failure is during the move.
The only thing that can move the device is a command to the agent (or there is no way to sense the current position anyway.)

For an example, let's consider a mirror which can be moved in and out of the beam. A dialog with the agent would look like this for the case where the mirror was currently in the beam, and requested to go out of the beam:


(... mirror is currently in beam)
ok> mirror out
progress: Please wait ... moving mirror out of beam.
status: Mirror is out of the beam.
ok> _

The final OK signifies both that the requested position was valid, and that it has been reached. If a second request for ``mirror out'' followed this one, the agent should return ok (PASS) as quickly as possible.


ok> mirror out
logonly: Mirror is out of the beam.
ok> _

Note the choice of ``logonly:'' for redundant messages, versus ``status:'' when a true new position has been reached.

Error conditions which could occur for a simple move command include invalid syntax and hardware failure. The return of the command is FAIL for both, so the user will rely on error messages printed by the agent to tell them apart. Here are a couple of example dialogs with the agent with an error condition:


ok> mirror otu
error: `otu' is not a valid mirror position.  Choose from `in' or `out'.
failed> _

Or if the hardware fails:


ok> mirror out
(... gears grinding ... timeout ... etc.)
error: Servo motion timeout/limit switch hit/etc.
failed> _
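A command function producing the dialogs above might be sketched like this. The PASSFAIL stand-in, the mirror_is_in variable, and hw_move_mirror() are assumptions made so that the fragment compiles by itself; a real agent would take PASSFAIL from cli/cli.h and talk to actual hardware.

```c
#include <stdio.h>
#include <string.h>

typedef enum { FAIL = 0, PASS = 1 } PASSFAIL;   /* stand-in for cli.h */

int mirror_is_in = 1;   /* last position reached by a command */

/* Hypothetical hardware call; returns 0 on success.  This stub always
 * succeeds so that the sketch can run without a mirror attached. */
static int hw_move_mirror(int in)
{
   (void)in;
   return 0;
}

PASSFAIL com_mirror(const char* args)
{
   int want_in;

   if (args == NULL) args = "";
   if (strcmp(args, "in") == 0) want_in = 1;
   else if (strcmp(args, "out") == 0) want_in = 0;
   else {
      printf("error: `%s' is not a valid mirror position.  Choose from `in' or `out'.\n", args);
      return FAIL;
   }

   if (want_in == mirror_is_in) {   /* redundant request: answer quickly */
      printf("logonly: Mirror is %s the beam.\n", want_in ? "in" : "out of");
      return PASS;
   }

   printf("progress: Please wait ... moving mirror %s beam.\n", want_in ? "into" : "out of");
   if (hw_move_mirror(want_in) != 0) {
      printf("error: Servo motion timeout.\n");
      return FAIL;
   }
   mirror_is_in = want_in;
   printf("status: Mirror is %s the beam.\n", want_in ? "in" : "out of");
   return PASS;
}
```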

3.7 Intent/Go/Wait Move Commands

For many cases, the model above is not sufficient. One reason is a limitation/feature of Director: while one agent is processing a command, no other agent can be busy at the same time. (``Busy'' only means busy from Director's point of view. If motion is in progress, but the agent is displaying a prompt ready for a command, then Director will begin the queued next command.) A single agent which has control of hardware that can do multiple things in parallel also benefits from a different model. One way to split things up is to provide commands for three different types of events, which may occur at three different points in time:

A series of ``intent'' commands configures the user's setup for the upcoming observation or experiment. If no new intent command is issued, a previous value is assumed to be ``sticky'' and remains in effect until a new intent command changes it.
At a possibly later point in time, a single ``go'' command (per agent) is issued when it is safe to begin motion(s).
A final ``wait'' command (per agent) is issued at the point in time when motions must be complete. (For example, before opening a detector shutter to take a picture.) The ``wait'' is the only command that is ``blocking''. ``Intent'' and ``go'' commands return immediately.

Figure 5 shows how a sequencing script (visible as a Director command to the user) can allow actions of various agents to proceed in parallel. The example shows an observation taken with an instrument with a grism and filter, selection of telescope coordinates, and the integration and readout phases. A readout from a previous integration is still completing as the current integration is being configured.

**Figure 5:** Timing Diagram of Overlapping Go/Wait Commands
[Figure: director-timing.eps]

The following sections describe in more detail what an agent must do when it receives each type of command, intent, go, and wait.

3.7.1 Intent Commands

There should typically be a separate intent command for each possible setting or axis of movement. Combining parameters into one command is acceptable when they are always given together, from the user's point of view. Coordinate pairs are a common example. If there is any case where a user might want to issue a value separately, don't require them to issue it as part of a command that changes other values.

The first thing the agent should do when it receives an intent command is compare the ``new'' value with the previous VAL_REQUEST. If they match, simply return PASS.
In case there is already a previous motion in progress (a go has been issued, but no corresponding wait) the agent must either wait on the previous move first, or adjust the new target value to the one it just received, depending on what the hardware allows. NOTE: Having an extra ``intent'' command between a ``go'' and ``wait'' is not the normal flow of events, but the condition should be detected and handled, since it may sometimes occur when sequences are aborted in the middle and restarted. If the ``wait'' command is written as described below, this entire step could be taken care of by simply calling ``wait'' inside the intent command.
Next, if the value exceeds hardware limits or contains syntax errors, this should cause immediate failure of the intent command, if possible. It is also acceptable to trap limit errors, and even syntax errors during the ``go'' or the ``wait'', but it can make debugging more difficult.
Finally, the intent command must store the new value in VAL_REQUEST and return PASS.

Note that ``intent'' commands only manipulate program variables and do not communicate with the device.
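The sequence above could be sketched as follows for a single numeric setting. Everything named here is hypothetical: the ``focus'' command, its limits, the Axis structure, and wait_for_motion(), which merely stands in for the agent's real ``wait''. The argument is parsed first only because a number is needed before it can be compared with VAL_REQUEST.

```c
#include <stdio.h>
#include <stdlib.h>

typedef enum { FAIL = 0, PASS = 1 } PASSFAIL;   /* stand-in for cli.h */

typedef struct {
   double val_request;   /* last value asked for by the user */
   double val_sent;      /* last value given to the hardware by "go" */
   double min, max;      /* hardware limits */
   int moving;           /* a "go" was issued with no "wait" since */
} Axis;

Axis focus = { 0.0, 0.0, -5.0, 5.0, 0 };

/* Placeholder for the agent's "wait"; here it just clears the flag. */
static PASSFAIL wait_for_motion(Axis* a)
{
   a->moving = 0;
   return PASS;
}

/* Intent command: only updates program variables, never the hardware. */
PASSFAIL com_focus(const char* args)
{
   char* end;
   double value;

   if (args == NULL) args = "";
   value = strtod(args, &end);
   if (end == args || *end != '\0') {
      printf("error: `%s' is not a valid focus value.\n", args);
      return FAIL;
   }
   if (value == focus.val_request) return PASS;    /* nothing new */
   if (focus.moving && wait_for_motion(&focus) == FAIL) return FAIL;
   if (value < focus.min || value > focus.max) {
      printf("error: Focus `%s' is outside the limits %g to %g.\n",
             args, focus.min, focus.max);
      return FAIL;
   }
   focus.val_request = value;                      /* remembered for "go" */
   return PASS;
}
```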

3.7.2 Go Command

After a series of ``intent'' commands, an agent should expect a single ``go'' which puts all of them into effect. Generally, the ``go'' command compares each last known VAL_REQUEST with the last known VAL_SENT. For any that do not match, it does the following:

Transmit VAL_REQUEST to the hardware.
Enable servo motion, if applicable.
If the above two steps are successful, copy VAL_REQUEST into VAL_SENT, otherwise return FAIL.

If the agent sees two ``go'' commands in a row, the second one has no effect, because all the VAL_REQUESTs == VAL_SENTs.
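A ``go'' built on this comparison might look like the sketch below. The Axis structure and hw_send_target() are hypothetical; the stub only counts transmissions, and in a real agent it would also enable servo motion where applicable.

```c
typedef enum { FAIL = 0, PASS = 1 } PASSFAIL;   /* stand-in for cli.h */

typedef struct {
   const char* name;
   double val_request;
   double val_sent;
} Axis;

int sends = 0;   /* number of transmissions, kept only for demonstration */

/* Hypothetical hardware call: transmit the target and start the motion.
 * Returns 0 on success. */
static int hw_send_target(const Axis* a)
{
   (void)a;
   sends++;
   return 0;
}

/* Sends every changed request to the hardware, then records it as sent. */
PASSFAIL go_all(Axis* axes, int count)
{
   int i;

   for (i = 0; i < count; i++) {
      if (axes[i].val_request == axes[i].val_sent) continue;   /* up to date */
      if (hw_send_target(&axes[i]) != 0) return FAIL;
      axes[i].val_sent = axes[i].val_request;
   }
   return PASS;
}
```

Calling go_all() a second time without a new intent transmits nothing, which matches the behavior described above.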

3.7.3 Wait Command

The purpose of the ``wait'' command is to ensure that the user's requested values (VAL_REQUEST) match those of the actual hardware (VAL_SENSE).

Inside ``wait'', make a call to ``go''. Even if a ``go'' had already been issued by a script, if it is implemented as described above, it only has to go through its table of items to quickly determine that all VAL_SENT are already equal to VAL_REQUEST.
Next, the ``wait'' should do whatever is needed to ensure that the table of VAL_SENSE values is up to date. In the case of a callback system, there is nothing to do here. In a polling model, a request will need to be made to the hardware to read current status and store it in VAL_SENSE.
For each device in the agent's list, do the following:
If the device is still busy, set a ``busy'' flag and continue through the list.
If the device is not busy, check if VAL_SENSE and VAL_REQUEST are equivalent (or within tolerance.) If it is within tolerance, check if VAL_SENSE is a new value by comparing it to the last value reported to the user (VAL_REPORTED). If they differ (by a tolerance also, if you don't want to report every slight drift in position), set VAL_REPORTED to VAL_SENSE and print a status: message something like this:
```
status: Device has reached VAL_SENSE
```
If there is a discrepancy, print an error message and possibly suggest a new intent command which could get around the condition:
```
error: Device requested to go to VAL_REQUEST is reporting VAL_SENSE.
error: Use the command ``device FAKE'' to ignore this problem.
```
And then return FAIL.
Next, if the ``busy'' flag is not set, it means all devices have reached their requested positions. Wait should now return PASS.
If the ``busy'' flag is set and wait was called as ``wait -poll'', a FAIL should now be returned. This gives a non-blocking way to test if things are stable. An example of how this is useful is given in the main command loop of the example agent below.
A normal ``wait'' should now delay (wait for an event or sleep for some polling interval) and then repeat from the second step, in which VAL_SENSE is brought up to date.
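The procedure above might be sketched as follows. The hardware is simulated so that the fragment can run by itself: hw_go() and hw_poll() are hypothetical, and a simulated move completes on the second poll. A real agent would sleep or wait for an event where the final comment indicates.

```c
#include <stdio.h>

typedef enum { FAIL = 0, PASS = 1 } PASSFAIL;   /* stand-in for cli.h */

typedef struct {
   const char* name;
   double val_request, val_sent, val_sense, val_reported, tolerance;
   int busy;   /* simulated hardware: polls remaining until motion ends */
} Device;

static int within(double a, double b, double tol)
{
   double diff = a - b;
   if (diff < 0) diff = -diff;
   return diff <= tol;
}

/* Simulated "go": start a move only if the request has changed. */
static void hw_go(Device* d)
{
   if (d->val_sent != d->val_request) { d->val_sent = d->val_request; d->busy = 2; }
}

/* Simulated status read: the move finishes when the counter reaches 0. */
static void hw_poll(Device* d)
{
   if (d->busy > 0 && --d->busy == 0) d->val_sense = d->val_sent;
}

/* poll_only != 0 corresponds to "wait -poll". */
PASSFAIL wait_all(Device* devs, int count, int poll_only)
{
   int i, any_busy;

   for (i = 0; i < count; i++) hw_go(&devs[i]);    /* call "go" first */
   for (;;) {
      any_busy = 0;
      for (i = 0; i < count; i++) {
         Device* d = &devs[i];
         hw_poll(d);                               /* refresh VAL_SENSE */
         if (d->busy) { any_busy = 1; continue; }
         if (!within(d->val_sense, d->val_request, d->tolerance)) {
            printf("error: %s requested to go to %g is reporting %g.\n",
                   d->name, d->val_request, d->val_sense);
            return FAIL;
         }
         if (!within(d->val_sense, d->val_reported, d->tolerance)) {
            d->val_reported = d->val_sense;
            printf("status: %s has reached %g\n", d->name, d->val_sense);
         }
      }
      if (!any_busy) return PASS;    /* everything is in position */
      if (poll_only) return FAIL;    /* non-blocking test: not stable yet */
      /* A real agent would delay here before checking again. */
   }
}
```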

3.8 Relative Moves

Scripts which call ``go'' and ``wait'' multiple times can cause problems for relative offsets, because a relative offset that is applied more than once moves the device further each time. To avoid this, always convert relative values to absolute ones in the intent step inside the Agent. Scripts should also be careful not to re-send relative offsets. An agent that converts the offset right away does not need to be concerned about repeated ``go'' or ``wait'' calls.
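A minimal sketch of that conversion is shown below. The names are hypothetical, and it assumes that an offset is taken relative to the last requested value; an agent could equally define it relative to the sensed position.

```c
typedef struct {
   double val_request;   /* always an absolute target */
   double val_sent;
} Axis;

/* Intent with a relative offset: converted to absolute immediately. */
void intent_relative(Axis* a, double delta)
{
   a->val_request = a->val_request + delta;
}

/* "go" only ever sees absolute targets, so repeating it is harmless. */
void go(Axis* a)
{
   if (a->val_request != a->val_sent) a->val_sent = a->val_request;
}
```

Because the stored request is absolute, calling go() any number of times after a single offset leaves the target unchanged.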

3.9 CLI Style Guide (under construction)

3.9.1 Log Message Types

The use of error/warning/logonly/debug/status messages should follow the guidelines set forth in the cfht_log man page. Error messages are used slightly differently, however. In handlers they are typically used when the handler is about to give up and exit. In a cli, an error message should instead be the final message reporting that an individual command failed. A typical failure sequence would then have a series of warnings (and logonlys) followed by a single error message.

debug:: Not usually generated except in ``debug on'' mode. Useful to track problems with the software itself.
logonly:: Not usually displayed except in ``verbose on'' mode. Mostly useful in tracking down problems with the instrument or elsewhere.
warning:: Displayed in yellow. Something is not 100% normal and may cause bad results down the line. (But request itself hasn't failed yet.)
error:: Displayed in red (and terminal beeps/de-iconifies). Top level command request has failed. Read warnings above for details.
status:: Displayed in green. Usually a confirmation that a command succeeded.

For a complete list of message types, see the appendix or type `director -hs'.

3.9.2 String formatting

When echoing strings or values of parameters back to the user, opposing single quotes are recommended. For example:


   error: The file `foobar' does not exist.


Sidik Isani
2001-10-18
