Build intelligent, unattended scripts Martin Brown
Published on July 03, 2007

This content is part of the series: System Administration Toolkit

About this series

The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.

The unattended script problem

There are many issues around executing unattended scripts, that is, scripts that you run automatically through a service such as cron or through at commands.

The default mode of the cron and at commands, for example, is for the output of the script to be captured and then emailed to the user that ran the script. You don't always want the user to get the email that cron sends by default (especially if everything ran fine); sometimes the user who ran the script and the person actually responsible for monitoring the output are different people.

Therefore, you need better methods for trapping and identifying errors within the script, and better methods for communicating problems, and optionally successes, to the appropriate person.

Getting the scripts set up correctly is vital; you need to ensure that the script is configured in such a way that it's easy to maintain and that the script runs effectively. You also need to be able to trap errors and output from programs and ensure the security and validity of the environment in which the script executes. Read along to find out how to do all of this.

Setting up the environment

Before getting into the uses of unattended scripts, you need to make sure that you have set up your environment properly. There are various elements that need to be explicitly configured as part of your script, and taking the time to do this not only ensures that your script runs properly, but it also makes the script easier to maintain.

Some things you might need to think about include:

  • Search path for applications
  • Search path for libraries
  • Directory locations
  • Creating directories or paths
  • Common files

Some of these elements are straightforward enough to organize. For example, you can set the path using the following in most Bourne-compatible shells (sh, Bash, ksh, and zsh):

PATH=/usr/bin:/bin:/usr/sbin
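
The bullet list above also mentions a search path for libraries. If the programs your script calls depend on shared libraries in non-standard locations, you can set the library path in the same explicit way. A minimal sketch, assuming a system that uses the LD_LIBRARY_PATH variable (the equivalent on Mac OS X is DYLD_LIBRARY_PATH); the directory names are illustrative:

# Set the application search path explicitly
PATH=/usr/bin:/bin:/usr/sbin:/usr/local/bin

# Set the library search path if the tools you call need it
LD_LIBRARY_PATH=/usr/local/lib:/usr/lib

export PATH LD_LIBRARY_PATH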

For directory and file locations, just set a variable at the header of the script. You can then use the variable in each place where you would have used the filename. For example, when writing to a log file, you might use Listing 1.

Listing 1. Writing a log file
LOGFILE=/tmp/output.log

do_something >>$LOGFILE
do_another >>$LOGFILE

By setting the name once and then using the variable, you ensure that you don't get the filename wrong, and if you need to change the filename, you only need to change it in one place.

Using a single filename and variable also makes it very easy to create a complex filename. For example, adding a date to your log filename is made easier by using the date command with a format specification:

DATE=`date +%Y%m%d.%H%M`

The above command creates a string containing the date in the format YYYYMMDD.HHMM, for example, 20070524.2359. You can insert that date variable into a filename so that your log file is tagged according to the date it was created.
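
For example, a date-tagged log filename might be built like this (the path and name are purely illustrative):

DATE=`date +%Y%m%d.%H%M`
LOGFILE=/tmp/backup.$DATE.log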

If you are not using a date/time unique identifier in the log filename, it's a good idea to insert some other unique identifier in case two scripts are run simultaneously. If your script is writing to the same file from two different processes, you will end up either with corrupted information or missing information.

All shells provide a unique ID based on the shell process ID, which is accessible through the special $$ variable. By using a global log variable, you can easily create a unique file to be used for logging:

LOGFILE=/tmp/$$.err

You can also apply the same global variable principles to directories:

LOGDIR=/var/log/my_app

To ensure that the directories are created, use the -p option for mkdir to create the entire path of the directory you want to use:

mkdir -p $LOGDIR

Fortunately, mkdir -p won't complain if the directories already exist, which makes it ideal for use in an unattended script.

Finally, it is generally a good idea to use full path names rather than localized paths in your unattended scripts so that you can use the previous principles together.

Listing 2. Using full path names in unattended scripts
DATE=`date +%Y%m%d.%H%M`
LOGDIR=/usr/local/mcslp/logs/rsynclog

mkdir -p $LOGDIR

LOGNAME=$LOGDIR/$DATE.log

Now that you've set up the environment, let's look at how you can use these principles to help with your unattended scripts.

Writing a log file

Probably the simplest improvement you can make to your scripts is to write the output from your script to a log file. You might not think this is necessary, but the default operation of cron is to save the output from the script or command that was executed, and then email it to the user who owned the crontab or at job.

This is less than perfect for a number of reasons. First of all, the user configured to run the script might not be the same as the person who actually needs to handle the output. You might be running the script as root, even though the output of the script or command needs to go to somebody else. Setting up a general filter or redirection won't work if you want to send the output of different commands to different users.

The second reason is more fundamental. Unless something goes wrong, it's not necessary to receive the output from a script at all. The cron daemon sends you everything written to stdout and stderr, which means that you get a copy of the output even if the script executed successfully.

The final reason is about the management and organization of the information and output generated. Email is not always an efficient way of recording and tracking the output from scripts that are run automatically. Maybe you just want to keep an archive of the log files from successful runs, or email a copy of the error log only when there is a problem.

Writing out to a log file can be handled in a number of different ways. The most straightforward way is to redirect output to a file for each command (see Listing 3).

Listing 3. Redirecting output to a file
cd /shared
rsync --delete --recursive . /backups/shared >$LOGFILE

If you want to combine error and standard output into a single file, use numbered redirection (see Listing 4).

Listing 4. Combining error and standard output into a single file
cd /shared
rsync --delete --recursive . /backups/shared >$LOGFILE 2>&1

Listing 4 writes out the information to the same log file.

You might also want to write out the information to separate files (see Listing 5).

Listing 5. Writing out information to separate files
cd /shared
rsync --delete --recursive . /backups/shared >$LOGFILE 2>$ERRFILE

For multiple commands, the redirections can get complex and repetitive. You must ensure, for example, that you are appending, not overwriting, information to the log file (see Listing 6).

Listing 6. Appending information to the log file
cd /etc
rsync --delete --recursive . /backups/etc >>$LOGFILE 2>>$ERRFILE

A simpler solution, if your shell supports it, is to use an inline block for a group of commands, and then to redirect the output from the block as a whole. The result is that you can rewrite the lines in Listing 7 using the structure in Listing 8.

Listing 7. Logging in long form
cd /shared
rsync --delete --recursive . /backups/shared >$LOGFILE 2>$ERRFILE

cd /etc
rsync --delete --recursive . /backups/etc >>$LOGFILE 2>>$ERRFILE

Listing 8 shows an inline block for grouping commands.

Listing 8. Logging using a block
{
    cd /shared
    rsync --delete --recursive . /backups/shared

    cd /etc
    rsync --delete --recursive . /backups/etc
} >$LOGFILE 2>$ERRFILE

The enclosing braces group the commands into a single block. No secondary shell is created (unlike parentheses, which do run the commands in a separate subshell); the block is simply treated as one logical unit. Using the block, you can collectively redirect the standard and error output for the entire group instead of for each individual command.
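
If it helps to see the distinction, here is a minimal sketch contrasting parentheses (a true subshell) with braces (a block in the current shell):

# Parentheses run the commands in a subshell; the directory change
# does not persist after the block
( cd /etc )
pwd                # still the original directory

# Braces group the commands in the current shell; the directory
# change does persist
{ cd /etc; }
pwd                # now /etc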

Trapping errors and reporting them

One of the main advantages of this block structure is that you can place a wrapper around the main content of the script, redirect the errors, and then send a formatted email with the status of the script execution.

For example, Listing 9 shows a more complete script that sets up the environment, executes the actual commands and bulk of the process, traps the output, and then sends an email with the output and error information.

Listing 9. Using a subshell for emailing a more useful log
LOGFILE=/tmp/$$.log
ERRFILE=/tmp/$$.err
ERRORFMT=/tmp/$$.fmt

{
    set -e
    cd /shared
    rsync --delete --recursive . /backups/shared
    cd /etc
    rsync --delete --recursive . /backups/etc
} >$LOGFILE 2>$ERRFILE

{
    echo "Reported output"
    echo
    cat /tmp/$$.log
    echo "Error output"
    echo
    cat /tmp/$$.err
} >$ERRORFMT 2>&1

mailx -s 'Log output for backup' root <$ERRORFMT

rm -f $LOGFILE $ERRFILE $ERRORFMT

If you use the block trick and your shell supports shell options (Bash, ksh, and zsh do), then you might want to set some shell options to ensure that the block is terminated correctly on an error. For example, the -e (errexit) option in Bash causes the shell to exit immediately when a simple command (for example, any external command called through the script) fails with a non-zero status.

In Listing 9, for example, if the first rsync failed, the block would otherwise just continue and run the next command. However, there are times when you want to stop the moment a command fails, because continuing could be more damaging. By setting errexit, the block terminates immediately when the first command fails.
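
Here is a minimal sketch of the errexit behavior on its own; the source path is deliberately bogus so that the copy fails:

#!/bin/sh
set -e

cp /this/file/does/not/exist /tmp/    # this command fails, so the script
                                      # stops here with a non-zero status
echo "never reached"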

Setting options and ensuring security

Another issue with automated scripts is ensuring the security of the script and, in particular, ensuring that the script does not fail because of bad configuration. You can use shell options for this process.

There are other options that you might want to set; these are shell-dependent (and, as a rule, the richer the shell, the better it is at trapping these situations). In the Bash shell, for example, -u ensures that any use of an unset variable is treated as an error. This can be useful to ensure that an unattended script does not try to execute when a required variable has not been configured correctly.
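
A minimal sketch, assuming a required variable named BACKUPDIR (the name is hypothetical):

set -u

# If BACKUPDIR was never set, the shell reports an error here and the
# unattended script stops, rather than silently using an empty path
echo "Backing up to $BACKUPDIR"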

The -C option (noclobber) ensures that existing files are not overwritten by redirection, and it can prevent the script from overwriting files it shouldn't touch (for example, system files), unless the script explicitly deletes the original file first.

Each of these options can be set using the set command (see Listing 10).

Listing 10. Using the set command to set options
set -e
set -C

You can use a plus sign before the option to disable it.
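
For example, the following sketch shows noclobber being enabled, disabled again with the plus sign, and overridden for a single redirection with the >| operator (supported in Bash, ksh, and zsh); the filename is illustrative:

set -C
date > /tmp/status.log     # fails if /tmp/status.log already exists

set +C                     # the plus sign turns noclobber off again
date > /tmp/status.log     # overwrites as normal

set -C
date >| /tmp/status.log    # >| overrides noclobber for this one redirection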

Another area where you might want to improve the security and environment of your script is resource limits. Resource limits can be set using the ulimit command, which is generally specific to the shell; they enable you to limit the size of files, core dumps, and memory use, and even the duration of the script, to ensure that the script does not run away with itself.

For example, you can set CPU time in seconds using the following command:

ulimit -t 600

Although ulimit does not offer complete protection, it helps in scripts where there is a risk that the script could run away with itself, or that a program could suddenly use a large amount of memory.
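
A few other limits can be set in the same way. The following is a sketch only; the exact options and units vary slightly by shell and platform, so check your shell's documentation:

ulimit -t 600        # CPU time, in seconds
ulimit -f 204800     # maximum size of files the script can create (512-byte blocks)
ulimit -c 0          # maximum core dump size; 0 disables core files altogether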

Capturing faults

You have already seen how to trap errors and output and how to create logs that can be emailed to the appropriate person when problems occur, but what if you want to be more specific about the errors and responses?

Two tools are useful here. The first is the return status from a command, and the second is the trap command within your shell.

The return status from a command can be used to identify whether a particular command ran correctly or whether it generated some sort of error. The exact meaning of a specific return status code is unique to a particular command (check the man pages), but the generally accepted convention is that a status of zero means the command executed correctly, and any nonzero status indicates an error of some kind.

For example, imagine that you want to trap an error when trying to create a directory. You can check the $? variable after mkdir and then email the output, as shown in Listing 11.

Listing 11. Trapping return status
ERRLOG=/tmp/$$.err

mkdir /tmp 2>>$ERRLOG
if [ $? -ne 0 ]
then
    mailx -s "Script failed when making directory" admin <$ERRLOG
    exit 1
fi
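
Note that $? is reset by every subsequent command, so if you need the status later, capture it into a variable straight away. A minimal variation on Listing 11 (the variable name STATUS is just illustrative):

mkdir /tmp/out 2>>$ERRLOG
STATUS=$?                  # capture immediately; any later command resets $?

if [ $STATUS -ne 0 ]
then
    mailx -s "Script failed when making directory" admin <$ERRLOG
    exit 1
fi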

Incidentally, you can use the return status information inline by chaining commands with the && or || operators, which act as 'and' and 'or' statements. For example, say you want to ensure that a directory gets created and then a command gets executed but, if the directory is not created, the command does not get executed. You could do that using an if statement (see Listing 12).

Listing 12. Ensuring that a directory is created before executing a command
mkdir /tmp/out
if [ $? -eq 0 ]
then
    do_something
fi

You can modify Listing 12 into a single line:

mkdir /tmp/out && do_something

The above statement basically reads, "Make a directory and, if it completes successfully, also run the command." In essence, only do the second command if the first completes correctly.

The || symbol works in the opposite way; if the first command does not complete successfully, then the second is executed. This can be useful for situations where a command would otherwise raise an error but an alternative action can be taken instead. For example, when changing to a directory, you might use the line:

cd /tmp/out || mkdir /tmp/out

This line of code tries to change to the directory and, if that fails (probably because the directory does not exist), creates it. Furthermore, you can combine these statements. In the previous example, of course, what you really want is to change to the directory, or to create it and then change to it if it doesn't already exist. You can write that in one line as:

cd /tmp/out || mkdir /tmp/out && cd /tmp/out

The trap command is a more generalized solution for catching more serious errors, based on the signals raised when a command fails, such as a core dump or a memory error, or when a command has been forcibly terminated by the kill command.

To use trap, you specify the command or function to be executed when the signal is trapped, and the signal number or numbers that you want to trap, as shown in Listing 13.

Listing 13. Trapping signals
function catch_trap
{
    echo "killed"
    mailx -s "Signal trapped" admin
}

trap catch_trap 1 2 3 4 5 6 7 8 9 10 11
sleep 9000

You can trap almost any signal in this way (SIGKILL, signal 9, cannot actually be caught), and it is a good method of ensuring that a program that crashes out is caught, handled effectively, and reported.
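
The same mechanism is also a convenient way to clean up temporary files however the script stops. A minimal sketch, reusing the $$-based temporary files from earlier (signal 0 means the shell is exiting):

LOGFILE=/tmp/$$.log
ERRFILE=/tmp/$$.err

# Remove the temporary files when the script exits normally (0) or is
# interrupted or terminated by one of the listed signals
trap 'rm -f $LOGFILE $ERRFILE' 0 1 2 3 15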

Identifying reportable errors

Throughout this article, you've looked at ways of trapping errors, saving the output, and recording issues so that they can be dealt with and reported. However, what if the scripts or commands that you are using naturally generate error information that you want to be able to record and report on, but that you don't always need to be told about?

There is no easy solution to this problem, but you can use a combination of the techniques shown in this article to log errors and information, read or filter the information, and mail and report or display it accordingly.

A simple way to do this is to choose which parts of the command output you write and report to the logs. Alternatively, you can post-process the logs to select or filter out the output that you need.

For example, say you have a script that builds a document in the background, using the Formatting Objects Processor (FOP) system from Apache to generate a PDF version of the document. Unfortunately, in the process, a number of errors are generated about hyphenation. These are errors that you know about, but they don't affect the output quality. In the script that generates the file, just filter out these lines from the error log:

sed -e '/hyphenation/d' <error.log >mailerror.log

If there were no other errors, the mailerror.log file will be empty, and no email containing error information needs to be sent.
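
To send the email only when something is left after filtering, you can test whether the file is empty first. A minimal sketch; the recipient and subject are illustrative:

sed -e '/hyphenation/d' <error.log >mailerror.log

# Only send an email if the filtered error log actually contains something
if [ -s mailerror.log ]
then
    mailx -s "Errors during document build" admin <mailerror.log
fi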

Summary

In this article, you've looked at how to run commands in an unattended script, capture their output, and monitor the execution of the different commands in the script. You can log the information in many ways, for example, on a command-by-command or global basis, and you can check and report on the progress.

For error trapping, you can monitor output and result codes, and you can even set up global traps that identify problems during execution and report them. The result is a range of options that handle and report problems for scripts that run on their own, where the ability to recover from errors and problems is critical.