Error Handling in Shell Scripts

One thing new shell programmers struggle with is error handling. The shell does not use exceptions like the programming languages most developers are familiar with; the shell largely pre-dates the notion of exceptions in higher-level programming languages.

Developers often conceptualise shell commands as something similar to procedures: you call one with some parameters, and it does something. This overlooks an important aspect: commands have an exit status which indicates whether the command was successful or not.

By convention, an exit status of zero indicates success, and a non-zero exit status indicates a failure. Commands may use different non-zero values to indicate different sorts of failures (the documentation for the individual command will usually describe this).
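To make the convention concrete, here is a minimal sketch that inspects the exit status of a few commands through the special parameter $?. The variable names are made up for this illustration, and the subshell (exit 3) merely stands in for a hypothetical command that reports a specific kind of failure.

```shell
#!/bin/sh

# `true` always succeeds; `false` always fails.
true
status_of_true=$?

false
status_of_false=$?

echo "true exited with $status_of_true, false exited with $status_of_false"

# A subshell standing in for a hypothetical command that uses
# exit status 3 to signal one particular kind of failure.
(exit 3)
status_of_subshell=$?
echo "the stand-in command exited with $status_of_subshell"
```

Note that $? only holds the status of the most recent command, so it has to be saved right away if it is needed later.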

Imagine this simple script:

#!/bin/sh

healthcheck --dilithium  > /tmp/dilithium-status
set-heading --relative --horizontal=180 --vertical=0
set-warp-factor 9
brew-beverage --temperature=95 tea
engage-engines
rm /tmp/dilithium-status

(yes: This will direct the Star Ship Enterprise to reverse course at maximum speed. Something you'd want to do in an emergency - and it is slow to do manually. So obviously Gene Roddenberry should have scripted it.)

As it stands, if healthcheck fails, the script will continue anyway!

Most of the time, this is not really what we want.

There are several ways of detecting and dealing with a failing command in a script.

Stopping on Errors

95% of the time, it is perfectly acceptable for the script to merely stop (with a non-zero exit status) if a command fails, and rely on the failing command to explain (to stderr) what went wrong. This can be done by modifying the shell behavior with set -e:

#!/bin/sh

set -e
healthcheck --dilithium  > /tmp/dilithium-status
set-heading --relative --horizontal=180 --vertical=0
set-warp-factor 9
brew-beverage --temperature=95 tea
engage-engines
rm /tmp/dilithium-status

Normally, the shell would simply execute commands in sequence. This behavior is subtly changed by set -e: It makes the shell exit immediately if a command (or pipeline) returns a non-zero exit status.
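The difference can be observed without a starship. The following sketch runs the same two commands in a child shell, once without and once with set -e, and records what happened. The variable names are assumptions for this illustration only.

```shell
#!/bin/sh

# Without set -e: the failing `false` is ignored and the echo still runs.
without=$(sh -c 'false; echo "still running"')
echo "without set -e: ${without:-<no output>}"

# With set -e: the child shell exits at `false`, so the echo never runs.
# The "|| with_status=$?" records the child's exit status.
with=$(sh -c 'set -e; false; echo "still running"') || with_status=$?
echo "with set -e: ${with:-<no output>} (exit status ${with_status:-0})"
```

In the second case the child shell produces no output and exits with a non-zero status, which is exactly what lets a calling script (or a human) notice the failure.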

But... If you have a command that is allowed to fail, this gives you the opposite problem! The script will exit with a non-zero exit status!

There's an easy way out of this by turning the command into a list of commands - where the last part is guaranteed to succeed. This is usually done by adding || true to the command:

#!/bin/sh

set -e
healthcheck --dilithium  > /tmp/dilithium-status
set-heading --relative --horizontal=180 --vertical=0
set-warp-factor 9
brew-beverage --temperature=95 tea || true
engage-engines
rm /tmp/dilithium-status

Notes:

  • The || operator basically means "only execute the right-hand side if the left-hand side fails". This is different from ; which just means "execute the left-hand side and then execute the right-hand side" (regardless of exit status).

  • The built-in command true is a simple no-op which is guaranteed to give an exit status of zero (indicating success).

  • The exit status of a list such as a || b is the exit status of the last command which was actually executed.
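The notes above can be tried out directly. This is a small sketch using only true, false and echo; the variable names are invented for the example.

```shell
#!/bin/sh

# || runs the right-hand side only when the left-hand side fails.
false || echo "left side failed, so this runs"
true  || echo "left side succeeded, so this is skipped"

# ; runs both sides regardless of exit status.
false ; echo "this runs either way"

# The status of the whole list is that of the last command executed.
false || true
status_rescued=$?   # `true` ran last, so the list succeeded

true || false
status_short=$?     # `false` was never executed, so the list succeeded
echo "rescued: $status_rescued, short-circuited: $status_short"
```

This is why appending || true is enough to keep set -e from stopping the script: the list as a whole always ends in a success.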

Catching Errors

Sometimes merely using set -e is insufficient: You may want to emit a warning to the user before continuing:

#!/bin/sh

set -e
healthcheck --dilithium  > /tmp/dilithium-status
set-heading --relative --horizontal=180 --vertical=0
set-warp-factor 9
if ! brew-beverage --temperature=95 tea ; then
  echo "$0: Warning: Civilized travel is not possible. Sorry." 1>&2
fi
engage-engines
rm /tmp/dilithium-status

A couple of points to note:

  • The ! operator reverses the exit status of the command. It is the logical equivalent of a boolean "not".

  • The warning is written to standard error courtesy of 1>&2 - novice users often forget that warnings and errors should go to standard error rather than standard output.

  • The warning message identifies the command emitting the warning by including $0 (the name of the script itself) in the message. This is useful for adding the right context to the warning and allows for easier debugging.
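If a script emits several warnings, the points above can be bundled into a small helper. The sketch below is one possible way to do it: warn is a hypothetical function name, and brew_beverage is a stand-in that always fails, so that the example runs without the real command.

```shell
#!/bin/sh

# Hypothetical helper: prefix the message with the script name ($0)
# and send it to standard error rather than standard output.
warn() {
  echo "$0: Warning: $*" 1>&2
}

# Stand-in for a failing command (an assumption for this example).
brew_beverage() {
  return 1
}

if ! brew_beverage --temperature=95 tea ; then
  warn "Civilized travel is not possible. Sorry."
fi
```

Because the message goes to standard error, redirecting the script's standard output to a file or a pipe will not swallow the warning.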

Error Cleanup

Sometimes it is necessary to do some cleanup in case of failures - which means that simply using set -e is insufficient.

If a command fails, our example script will leave /tmp/dilithium-status behind as a junk file in /tmp. We can avoid that by doing some cleanup before bailing out completely:

#!/bin/sh

set -e
trap "rm -f /tmp/dilithium-status" EXIT
healthcheck --dilithium  > /tmp/dilithium-status
set-heading --relative --horizontal=180 --vertical=0
set-warp-factor 9
if ! brew-beverage --temperature=95 tea ; then
  echo "$0: Warning: Civilized travel is not possible. Sorry." 1>&2
fi
engage-engines

This makes use of the shell "trap" feature: It directs the shell to execute commands when the script finishes, even if the script fails. Conceptually, this is a simplistic equivalent to Python's try ... finally construct. Unlike try ... finally, traps do not nest: setting a new EXIT trap replaces the previous one.
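To see the trap doing its job, the following sketch runs a deliberately failing script in a child shell and then checks whether the temporary file is still there. It assumes the mktemp utility is available (it is widely installed, but not required by POSIX); the file it creates stands in for /tmp/dilithium-status, and the variable names are made up for the example.

```shell
#!/bin/sh

# Create a scratch file standing in for /tmp/dilithium-status.
status_file=$(mktemp)

# Child shell: set the trap, write the file, then fail on purpose.
# "$1" inside the child is the file name passed after the script text.
sh -c '
  set -e
  trap "rm -f \"$1\"" EXIT
  echo "dilithium ok" > "$1"
  false                  # simulated failure: set -e stops the script here
  echo "never reached"
' demo "$status_file" || child_status=$?

if [ -e "$status_file" ]; then
  echo "temporary file was left behind"
else
  echo "temporary file was cleaned up (child exited with ${child_status:-0})"
fi
```

The child still exits with a non-zero status, so the failure is reported to the caller, but the EXIT trap has removed the file on the way out.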