Sunday, September 06, 2009

Shell Tricks

DBAs from time to time must write shell scripts. If your environment is strictly Windows based, this article may hold little interest for you.

Many DBAs however rely on shell scripting to manage databases. Even if you use OEM for many tasks, you likely use shell scripts to manage some aspects of DBA work.

Lately I have been writing a number of scripts to manage database statistics - gathering, deleting, and importing exporting both to and from statistics tables exp files.

Years ago I started using the shell builtin getopts to gather arguments from the command line. A typical use might look like the following:

while getopts d:u:s:T:t:n: arg
do
case $arg in
d) DATABASE=$OPTARG
echo DATABASE: $DATABASE;;
u) USERNAME=$OPTARG
echo USERNAME: $USERNAME;;
s) SCHEMA=$OPTARG
echo SCHEMA: $SCHEMA;;
T) TYPE=$OPTARG
echo TYPE: $TYPE;;
t) TABLE_NAME=$OPTARG;;
#echo TABLE_NAME: $TABLE_NAME
n) OWNER=$OPTARG
echo OWNER: $OWNER;;
*) echo "invalid argument specified"; usage;exit 1;
esac

done


In this example, the valid arguments are -d, -u, -s, -T, -t and -n. All of these arguments require a value.

The command line arguments might look like this:
somescript.sh -d orcl -u system -s scott

If an invalid argument such as -z is passed, the script will exit with the exit code set to 1.

For the script to work correctly, some checking of the arguments passed to the script must be done.

For this script, the rules are as follows:
  • -d and -u must always be set
  • -s must be set if -T is 'SCHEMA'
  • -t and -n must both have a value or be blank
  • -s must be used with -T
In this example, values for -T other than 'SCHEMA' are not being checked.

The usual method (at least for me) to test the validity of command line arguments has always been to use the test, or [] operator with combinations of arguments.

For the command line arguments just discussed, the tests might look like the following:

[ -z "$DATABASE" -o -z "$USERNAME" ] && {
echo Database or Username is blank
exit 2
}

# include schema name if necessary
[ "$TYPE" == 'SCHEMA' -a -z "$SCHEMA" ] && {
echo Please include schema name
exit 3
}

# both owner and tablename must have a value, or both be blank
[ \( -z "$TABLE_NAME" -a -n "$OWNER" \) -o \( -n "$TABLE_NAME" -a -z "$OWNER" \) ] && {
echo Please specify both owner and tablename
echo or leave both blank
exit 4
}

# if -s is set, so must -T
[ -n "$SCHEMA" -a -z "$TYPE" ] && {
echo Please include a type with -T
exit 5
}


As you can see, there are a fair number of tests involved to determine the validity of the command line arguments. You may have guessed why I skipped one for this demo - I just did not want to write any more tests.

Validating command line arguments really gets difficult with a larger number of possible arguments. Worse yet, any later modifications to the script that require a new command line argument become dreaded tasks that are put off as long as possible due the complexity of testing the validity of command line arguments.

While writing a script that had 11 possible arguments, I was dreading writing the command line argument validation section, I thought there must be a better way.

It seemed that there must be a simple method of using regular expressions to validate combinations of command line arguments. I had never seen this done, and after spending a fair bit of time googling the topic it became apparent that there was not any code available for a cut and paste solution, so it seemed a nice opportunity to be innovative.

After experimenting a bit, I found what I think is a better way.

The method I use is to concatenate all possible command line arguments into a ':' delimited string, and then use a set of pre-prepared regexes to determine whether or not the command line arguments are valid.

One immediately obvious drawback to this method is that arguments containing the ':' character cannot be used. However the delimiting character can easily be changed if needed.

Using the same example as previous, the command line arguments are all concatenated into a string and converted to upper case:

ALLARGS=":$USERNAME:$DATABASE:$OWNER:$TABLE_NAME:$SCHEMA:$TYPE:"
# upper case arges
ALLARGS=$(echo $ALLARGS | tr "[a-z]" "[A-Z]")


Next a series of regular expressions are created. The first two are generic, and may or may not be used as building blocks for other regular expressions. The others all correspond to a specific command line argument


# alphanumeric only, at least 1 character
export ALNUM1="[[:alnum:]]+"
# alphanumeric only, at least 3 characters
export ALNUM3="[[:alnum:]]{3,}"
# username - alphanumeric only at least 3 characters
export USER_RE=$ALNUM3
# database - alphanumeric only at least 3 characters
export DATABASE_RE=$ALNUM3
# owner - alphanumeric and _ and $ characters
export OWNER_RE='[[:alnum:]_$]+'
# table_name - alphanumeric and _, # and $ characters
export TABLE_RE='[[:alnum:]_#$]+'
# schema - alphanumeric and _ and $ characters
export SCHEMA_RE='[[:alnum:]_$]+'


These regular expressions could use further refinement (such as username must start with alpha only ) but are sufficient for this demonstration.

Next, the regular expressions are concatenated together into ':' delimited strings, with each possible command line argument represented either by its corresponding regex, or by null.

The regexes are stuffed into a bash array. For our example, it looks like this:
#   :   user        :  db           :  owner        :  table     : schema        : type
VALID_ARGS=(
":$USER_RE:$DATABASE_RE:$OWNER_RE:$TABLE_RE::(DICTIONARY_STATS|SYSTEM_STATS|FIXED_OBJECTS_STATS):" \
":$USER_RE:$DATABASE_RE::::(DICTIONARY_STATS|SYSTEM_STATS|FIXED_OBJECTS_STATS):" \
":$USER_RE:$DATABASE_RE:$OWNER_RE:$TABLE_RE:$SCHEMA_RE:(SCHEMA):" \
":$USER_RE:$DATABASE_RE:::$SCHEMA_RE:SCHEMA:")

Notice that there are four different combitations of command line arguments represented.

In all cases the USERNAME and DATABASE are required and must correspond to the regex provided.

In the first combination of arguments, the owner and table must also be specified, and type (-T) must be either one of DICTIONARY_STATS, SYSTEM_STATS or FIXED_OBJECTS_STATS.

In the second possible combination, the only argument allowed in addition to DATABASE and USERNAME is the type (-T) argument.

The third combination requires the OWNER, TABLE_NAME and SCHEMA argument to have a valid value, and the TYPE argument must be set to SCHEMA.

The final combination of arguments requires just the SCHEMA argument and the TYPE argument must be set to SCHEMA, in addition to the USERNAME and DATABASE arguments.

By now you likely want to know just how these regular expressions are tested. The following function is used to test the command line arguments against each regular expression:
function validate_args {
typeset arglist
arglist=$1

while shift
do
[ -z "$1" ] && break
if [ $(echo $arglist | grep -E $1 ) ]; then
return 0
fi

done
return 1

}


Here's how it is used in the script:

# VALID_ARGS must NOT be quoted or it will appear as a single arg in the function
validate_args $ALLARGS ${VALID_ARGS[*]}


While this method may appear somewhat confusing at first, it becomes less so after using it a few times. It greatly simplifies the use of many command line arguments that may appear in differing combinations.

As far as I know, this method only works properly with the bash shell. I have done testing on only two shells, bash and ksh. It does not work properly on ksh.

Here's a demonstration of the ksh problem. The following script is run from both ksh and bash:

function va {

echo ARG1: $1
}


R1="[[:alnum]]+"
R2="[[:alnum]]{3,}"

va $R1
va $R2
And here are the results:
18:9-jkstill-18 > ksh t3
ARG1: [[:alnum]]+
ARG1: [[:alnum]]3
[ /home/jkstill/bin ]

jkstill-18 > bash t3
ARG1: [[:alnum]]+
ARG1: [[:alnum]]{3,}
[ /home/jkstill/bin ]



Notice that when the script is run with ksh, the '{', '}' and ',' are removed from the regular expression. I could find no combination of quoting and escape characters that could prevent that from happening. This method of command line argument validation could be made to work using ksh if those characters are not used in the regexes. That would be rather limiting though.

One other drawback you may have noticed with this method of validating command line arguments is that when an error condition is encountered, the exit code is always 1. With the [] method it was easy to exit with different codes to indicate the nature of the error. Something similar could likely be done by embedding a code into each set of regexes, but I will leave that as an exercise for the reader.

The complete prototype script, as well as a test script can be downloaded:


The next article will include a set of functions used along with the validate_args() function to make shell scripts a bit more robust.

12 comments:

Surachart Opun said...

Good Shell Example.
Thank You for good idea for me.

Noons said...

Hi Jared.
Really weird, that difference between bash and ksh. In AIX it seems to run fine, or am I missing something obvious?

$cat t3
function va {
echo ARG1: $1
}
R1="[[:alnum]]+"
R2="[[:alnum]]{3,}"
va $R1
va $R2
$ksh t3
ARG1: [[:alnum]]+
ARG1: [[:alnum]]{3,}
$bash t3
ARG1: [[:alnum]]+
ARG1: [[:alnum]]{3,}
What am I doing wrong?

Jared said...

Good to hear from you Noons.

Maybe the difference is the ksh version?

I am using PD KSH v5.2.14 99/07/13.2 on Linux.

Could be that the version on AIX has had this particular anomaly corrected.

Try it out on a Linux box if you get a chance. I've tried it on RH ES 4 and Oracle Linux (all the same I know) with the same result.

Pete Finnigan said...

Very nice article Jared; I am also a user of getopt and shell code. At the beginning of the article you imply it not much use for Windows guys; well I use cygwin on Windows often which supports bash. I have also used bash outside of cygwin on Windows and also I have the unixutils download ready for use so I can use grep, awk etc on Windows.

cheers

Pete

Chris said...

Your problem with the "{" and "}" characters seems to be brace expansion (just search for the term, e.g. here).

I didn't try it, but the troubles should go away if you enclose the regexp patterns in single quotes, i.e.

R2='[[:alnum]]{3,}'

instead of

R2="[[:alnum]]{3,}"

Jared said...

Thanks for the comment Chris, I hadn't previously located the reason for the braces being stripped.

Using single quotes did not help. I had already tried this.

Chris said...

Hi again. After playing the wise guy, I decided to try out if your pdksh problem goes away if the regexp is enclosed in single quotes. Well, it doesn't... However, your little code fragment behaves as expected after

set -o posix

Martin Berger said...

Jared,
thank you!
the right post at the right time (for me).
It will cost me a day, but save many! ;-)
Martin

Jared said...

Pete, I'm glad to hear someone uses cygwin and MS Unix Services for production work.

I probably should have mentioned them, thanks for drawing attention to them.

Noons said...

Jared: make that two confirmed cygwin users, myself and Pete!
;)
Couldn't live without it, Windows is not that fun to work with...

Martin Berger said...

add me also.
Cygwin is one of the first things on every Windows-Box I get in touch. Even those I have to setup for family and friends.
And of course bring it into %PATH% ;-)
Martin

Joel Garry said...

hp-ux 11.23 ksh also seems to run fine.

Don't use cygwin because whenever I work on windows, I have to not upset windows geeks. So I just move whatever I can to unix, and do anything else native. As little as possible.

word: unkyper