Thursday, November 12, 2009

Data Modeling


Most readers of this blog are probably DBAs, or do DBA work along with development or other duties.

Though my title is DBA, Data Modeling is something I really like to do.

When first learning Oracle, I cut my teeth on data modeling, and used CASE 5.1 on Unix to model a database system. True, CASE 5.0 used an Oracle Forms 3.x based interface, and the GUI modeling was Unix only.

That was alright with me, as the Forms interface allowed manual changes to be made quite quickly.

And the graphic modeling tool was fairly decent, even on a PC running Hummingbird X Server.

When Designer 2000 came out, it was clearly a more capable tool; it did everything CASE 5.1 could do, and more. I won't make any silly claim that I was ever able to fully exploit D2K, as it was an end-to-end tool that could do much more than model data and databases.

What it could do with just databases, however, was quite good: data models could be created, and a physical database could then be generated from the model.

Changes in the physical database could be reverse engineered back into the model, and changes in the model could be forward engineered into the physical database. D2K could truly separate logical and physical models, and allow changes to be migrated back and forth between the two.

There are other high end tools such as Erwin which can no doubt accomplish the same thing, but I have not used them.  

One important differentiator for me between D2K and other tools was that D2K worked with Barker notation, which is the notation I first learned, and the one I still prefer.

I should not speak of Designer 2000 in the past tense, I guess, as it is still available from Oracle as part of the Oracle Development Suite, but is now called Oracle Designer. It just hasn't received much attention in the past few years, as I think many people have come to see data modeling as too much overhead.

I've tried several low end tools in the past few years, and while some claim to separate logical and physical models, those that I have tried actually do a rather poor job of it.

All this leads to some new (at least, new to me) developments from, of all places, Microsoft.

Maybe you have heard of Oslo, Microsoft's data modeling toolset that has been in development for the past couple of years.

If you're just now hearing about it, you will likely be hearing much more. The bit I have read has made me think this will be a very impressive tool.

If you have done data modeling, you have likely used traditional tools that allow you to define entities, drop them on a graphical model, and define relationships.

The tool you used may even have allowed you to create domains that could be used to provide data consistency among the entities.

Oslo is different.  

Oslo incorporates a data definition language called M. M definitions can be translated to T-SQL, which in turn can be used to create the physical aspects of the model. M also allows easy creation of strongly typed data types, which are carried over into the model.
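To make that a little more concrete, here is the sort of T-SQL one might imagine being generated from an M type that restricts a column to a small set of values. This is purely illustrative on my part, not actual Oslo output, and the table and column names are invented.

-- illustrative only: roughly the kind of DDL a constrained type
-- definition might translate to; names are made up
CREATE TABLE Project (
    ProjectId  int IDENTITY(1,1) PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    Status     nvarchar(20)  NOT NULL
        CONSTRAINT ck_project_status
        CHECK (Status IN ('PROPOSED', 'ACTIVE', 'CLOSED'))
);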

Whether Oslo will allow round-trip engineering a la D2K, I don't yet know.

I do however think this is a very innovative approach to modeling data. 

Here are a few Oslo related links to peruse:


You may be thinking that I have given SQL Developer Data Modeler short shrift.

Along with a lot of other folks, I eagerly anticipated the arrival of SQL Developer Data Modeler.

And along with many others, I was disappointed to learn that this add-on to SQL Developer would set us back a cool $3000 US per seat. That seems a pretty steep price for a tool that is nowhere near as capable as Oracle Designer, which is included as part of the Oracle Internet Developer Suite. True, at $5800 the Suite costs nearly double what SQL Developer Data Modeler does, but you get quite a bit more than just Designer with the Suite.

As for the cost of Oslo, it's probably too early to tell.

Some reading suggests that it will be included as part of SQL Server 2008, but that isn't certain yet either.

Why all the talk about a SQL Server specific tool?

Because data modeling has been in a rut for quite some time, and Microsoft seems to have broken out of that rut.  It's time for Oracle to take notice and provide better tools for modeling, rather than upholding the status quo.







Monday, November 09, 2009

MetaLink, we barely knew ye

But, we wish we had more time to get better acquainted.

If you work with Oracle, you probably know that MetaLink went the way of the Dodo as part of an upgrade to My Oracle Support during the weekend of November 6th, 2009.

And so far it hasn't gone too well, as evidenced by these threads on Oracle-L:

Issues with My Oracle Support
Metalink Fiasco

Many people were lamenting the loss of MetaLink well before its demise, but I don't think any were quite expecting the issues that are currently appearing.

A few have reported that it is working fine for them, but personally, I have found  it unusable all morning.

At least one issue with MetaLink appears to have been cleared up in MOS; at least it had been while I was still able to log in to it last week.

During a routine audit of who had access to our CSI numbers, I came across a group of consultants that were no longer working for our company, and froze their accounts.  The next day I received a frantic voice mail  from a member of the consulting firm, and he informed me that they had no access to MetaLink because I had frozen their accounts.


I returned the call just a few minutes later, but they had already been able to resolve the issue, as one of their consultants with admin rights had been unaffected, and was able to unfreeze their accounts.


Removing them from the CSI would be the better procedure, but in the past when I attempted to do so, I found there were still open issues owned by the accounts and so could not remove them. The application owners had been very clear that this access should be removed, so freezing the accounts became my workaround, and that is what I did on this occasion as well.


This all seemed quite bizarre to me. There must be a very strange schema in the ML user database, and some strange logic to go along with it. By granting a user access to a CSI, MetaLink was giving me carte blanche to effectively remove them from MetaLink entirely.

How has My Oracle Support fixed this? Try as I might, I could not find a 'freeze' button in user administration in MOS. So the fix seems to have been "remove the button".

Thursday, October 15, 2009

Oracle Open World 2009 Report - Part Two


Tuesday October 13th

Unconference on Indexes
Richard Foote
10/13/2009 10:00 AM

I started off the day attending the indexing presentation of fellow Oak Table member Richard Foote.  Foote has become quite well known for his expertise on index internals since the publication of Oracle B-Tree Index Internals: Rebuilding the Truth

This was basically a Q&A session, and I will include just a couple of the questions.

Q: Have you ever seen an index Skip Scan used correctly?
A: The short answer was 'No'

Foote mentioned that he had only rarely seen an index skip scan used, and then inappropriately.  For more information on skip scan, see Foote's blog entry on Index Skip Scans
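For readers who haven't run across the feature, a skip scan can show up when the leading column of a composite index is left out of the predicate. A contrived example of my own, not from the session, and with made-up table and column names:

-- composite index led by a low-cardinality column
create index cust_gender_id_ix on customers (gender, customer_id);

-- with no predicate on gender, the optimizer may choose an
-- INDEX SKIP SCAN, probing the index once per distinct gender value
select * from customers where customer_id = 42;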

Q: When can you safely drop an index that doesn't seem to be used?
A: That is very difficult to determine

The explanation for this answer is that it is very difficult to determine if an index is never used. It requires some patience, as the code that uses the index may be run only rarely, making it difficult to determine whether it is actually used.

Oracle Closed World

OCW actually started on Monday, though due to the wonders of technology I missed it on that day.  The event was invitation only, either by being present when it was mentioned, or by receiving an SMS text on your phone.

This is where the wonders of technology come in. The invitation arrived as a series of very short, rather garbled SMS messages that seemed to be inviting me to stroll into a dark alley somewhere in downtown San Francisco. It was later cleared up, and I attended on Tuesday.

Oracle Closed World is the brainchild of Mogens Norgaard, another Oak Table member and co-founder of Miracle AS, an Oracle consulting firm.

On Tuesday Jonathan Lewis reprised his "How to be an Expert" presentation, the difference being that this audience was comprised of folks with a wide breadth of Oracle knowledge.

Lewis took advantage of this by making the questions harder, and chiding the audience for not knowing the answers.  All was in good fun. Undoubtedly the presence of beer didn't make the questions any easier to answer.

Wednesday was a presentation by Jeremiah Wilton, Oak Table member and formerly a DBA at Amazon.com.

Wilton presented a live demo on using Amazon's Elastic Compute Cloud (EC2) to provision a Linux server, Elastic Block Storage (EBS) to provide persistent storage, and a preconfigured Amazon Machine Image (AMI) to provision the server with Oracle already installed.

The fact that Wilton was able to complete all of this during a one-hour live demo, inevitable live-demo mishaps included, was quite impressive.

This appears to be a great way to set up test instances of Oracle for experimentation. There are companies using it in production as well.

 Amazon Web Services

Perl - A DBA's and Developers Best (forgotten) Friend
Arjen Visser - Avisit Solutions
10/13/2009 

Perl is a topic near and dear to my heart.

I have been using it since version 4 in the early 1990s, and have advocated its use ever since. It is a robust and powerful language with a huge collection of libraries developed by the user community and archived in the Comprehensive Perl Archive Network (http://cpan.org/).

When I spotted the Perl session on the schedule I immediately signed up for it.

What I had not noticed was the subtitle indicating it was a session for beginners.

No matter, I had to go.

The session began with a concise but clear introduction to Perl basics.

So far, so good.

When the time came to discuss Perl connectivity to Oracle, it was a bit surprising to be confronted with a slide showing how to use Perl as a wrapper for sqlplus.

"Surely" I thought, "this is going to be a slide showing how not to do it"

If you have used Perl with Oracle, you are no doubt familiar with DBI and DBD::Oracle.

DBI is the Perl Database Interface module, developed and maintained by Tim Bunce.

DBD::Oracle is the Oracle driver for DBI, also originally developed and maintained by Tim Bunce, and now maintained by The Pythian Group.

DBI and DBD::Oracle are very mature and robust packages for using Perl with Oracle.

You would also likely know that using Perl as a wrapper for sqlplus is cumbersome and inelegant. Rather than write a whole treatise on why you shouldn't do it, I will simply say that doing so is rarely necessary, and never optimal.

Which brings us to the next slide in the presentation, which had a diagram showing how DBI and DBD::Oracle fit into the Perl architecture.

The speaker then told the audience that these were hard to install and difficult to use, and didn't recommend using them.

After picking my jaw back up off the floor, I lost all interest in the rest of the presentation. I don't remember what the rest of the slides were. Maybe I blacked out from the shock. What I remember is walking away from the presentation rather incredulous.

Just last week, a friend that had not used Perl asked me how to install it on a Solaris server. With only a few lines of email that I typed from memory he was able to successfully install DBI and DBD::Oracle.

Hard to install indeed.
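For anyone curious, a minimal install really is about this involved. This is a sketch rather than the exact email; adjust ORACLE_HOME for your own environment, and a C compiler needs to be available on the box:

# adjust ORACLE_HOME for your own install
export ORACLE_HOME=/u01/app/oracle/product/10.2.0
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
perl -MCPAN -e 'install DBI'
perl -MCPAN -e 'install DBD::Oracle'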

11 Things about 11gR2
Tom Kyte

Really it was Tom's top 10 list for 11gR2 - he liked his favorite feature so much he counted it twice.

And that is the one I will mention.

It is Edition Based Redefinition.

In a nutshell this feature allows you to create a new namespace for PL/SQL objects, creating new versions in a production database.

This will allow upgrading applications with little or no downtime, something that has always been one of the DBA holy grails.

Rather than try to explain it (OK, I don't yet know how it works), I will just tell you to take a look at Chapter 19 in the 11gR2 Advanced Application Developer's Guide.
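That said, the moving parts look roughly like this, going by the documentation (the edition and schema names here are invented):

-- a minimal sketch based on the 11gR2 docs; names are made up
create edition release_2 as child of ora$base;

alter user app_owner enable editions;

alter session set edition = release_2;
-- recompile the changed PL/SQL here; sessions still on ora$base
-- continue to see the old versions until they switch editions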

Wednesday Keynote
Larry Ellison

Ellison promised to discuss four topics; I will cover three of them.

I left before the Fusion Middleware discussion.

Oracle Enterprise Linux update

One interesting fact presented was a survey performed by HP detailing Linux usage in corporate data centers.  The numbers are rather surprising.

* Oracle Enterprise Linux 65%
* Redhat 37%
* Suse 15%
* Other 2%


Next was the second generation of the Exadata Database Machine.

Essentially it is faster than gen 1.

It was used to set a new TPC-C benchmark record - I believe it was 1,000,000 transactions per second.

Ellison was proud of the record being 16 times faster than the record previously set by IBM, and rightfully so if those numbers are correct.

It seems IBM has challenged the results however, claiming the Exadata 2 is 'only 6 times faster'. As you might imagine, Ellison had some fun with that, even offering a $10 million prize to anyone who can show that a Sun Exadata machine cannot run their application at least twice as fast as any other system. IBM is invited to participate.

At this point Ellison welcomed a special guest to the stage: California Governor Arnold Schwarzenegger.

On being in a room with so many IT folks, Schwarzenegger quipped, "As I came out on stage I felt my IQ shoot up 10 pts."

Schwarzenegger talked for a few minutes on the impact of technology on people's lives: "Technology's impact is flesh and blood," in reference to how tech is used to aid the response of public services such as firefighting.

Arnold called for a round of applause for Larry Ellison and Scott McNealy for being technology leaders.

The camera cut to Ellison, looking uncharacteristically humble as he mouthed 'Thank you'.

After Schwarzenegger left the stage, Ellison continued, this time discussing My Oracle Support.

My Oracle Support has been a hot topic lately, as the majority of DBAs are less than thrilled with the new Flash interface being imposed. It is my understanding that an HTML version of the interface will be maintained, so we won't have to deal with Flash if we don't want to.

Here's where it gets interesting - the unification of Oracle Enterprise Manager and My Oracle Support.

There is now a 'My Oracle Support' tab in OEM.

DBAs will be allowed to opt in to OCM, Oracle Configuration Manager, allowing Oracle to perform automated discovery of bugs and needed patches, either in Oracle software or in other vendors' software on the server (OS bugs).

Oracle will then have a global database to mine for proactive response to possible problems.

When a configuration is found to have issues, all users with that configuration can be proactively notified.

The real news IMO though is the impact on patching.

Oracle recently started offering a new patch package, the PSU (Patch Set Update).

This is different than the CPU patch system, as it may require merge patches to resolve patch conflicts.

If OEM's My Oracle Support integration determines that a merge patch is needed, it will automatically file an SR requesting the patch and notify you when it is available.

Even if you don't like OEM, this may be a good use of it.

Ok, that's enough for now, time for lunch.



Wednesday, October 14, 2009

Oracle Open World Report for October 11th and 12th

As I am attending Open World 2009 on blogger credentials, it seems proper I should  actually blog about it.

So, here it is.  I won't be blogging about keynotes or other things that will appear in the news the following day, but rather on some of the sessions I attend.

As I got back to my room too late and too tired to do this properly on Monday, I am putting Sunday and Monday in the same post.

Here goes:


Open World - Sunday 10/11/2009

While attending Oracle Open World 2009, I thought it a good idea to make some report of sessions attended, and any interesting developments at OOW.

Some of the sessions I attended may not be considered DBA topics. I thought it would be interesting to break out of the DBA mold for a bit and attend some sessions that might be a bit outside the DBA realm.

Sue Harper - Everyday Tasks with Oracle SQL Developer

Sue Harper is the product manager for SQL Developer, and was presenting some of the useful new features of the SQL Developer 2.1 Early Adopter release.

While I have used SQL Developer from the time it was first released as Raptor, until recently I used it simply as a database browsing tool. After exploring some of the features that allow writing reports with master/detail sections, I converted some SQL*Plus scripts for use with SQL Developer.

SQL Developer is a very capable tool, so I attended this session to see what else I might be missing out on.

There was only one hour allocated for the session, and given the first 15 minutes were consumed convincing the audience why they should be using SQL Developer, there was just that much less time available to see the new features.

Taking a few minutes to market it is probably just part of the product manager's DNA.

Some of the features demonstrated were actually available in 1.5, but maybe not widely known.  As I have not used 2.1, I won't always differentiate between versions here. Some of these features may not be new to 2.1, maybe just improved.

Though not a new feature in 2.1, a few minutes were used to demonstrate the built-in version control integration. This is a very useful feature, and can be set up for seamless integration with CVS, Subversion, Perforce, and one other I can't recall now. It's definitely worth a look.

Some features that are new to 2.1 looked very useful:

Persistent column organizing and hiding.  When viewing data in SQL Developer, the columns may be easily rearranged and selected or de-selected for viewing.  While previous versions allowed dragging columns around, 2.1 has a nice dialog that makes this much easier.

New to 2.1 is column filtering.  By right clicking on a cell in the data pane, a dialog can be brought up to filter the data based on values found.  This allows filtering the data without requerying the table.

Also new to 2.1 is an XML DB Repository Navigator. It was mentioned, but alas there was not time to demonstrate it.

http://www.oracle.com/technology/products/database/sql_developer/index.html

http://sqldeveloper.oracle.com/


Christopher Jones - Python/Django Framework

This was a hands-on developer session centered on using the Python scripting language with the Django web application framework. This was a fun session. The lab was already set up, running Oracle Linux VMs with access via individual laptops set up in the training room.

The lab was a go-at-your-own-pace session, with instructions both printed and available via browser. The browser-based instructions were the way to go, as the examples could be cut and pasted, saving a lot of typing.

I wasn't able to quite complete the lab, as I kept encountering an error when running the web app. It was probably due to an error in one of the scripts I modified during the session, but enough was accomplished to see that the Django framework looks very interesting. Perhaps even simple enough for a DBA to use. Yes, I did search the schedule for a similar Perl session, perhaps using Mason or some such.

The training materials are going to be placed on OTN in the Oracle By Example section after Open World concludes.

http://www.oracle.com/technology/obe/start/index.html

Ray Smith - Linux Shell Scripting Craftsmanship

The last session I attended on Sunday was all about shell script craftsmanship. Ray Smith was sharing some common sense methods that can be used to greatly enhance your shell scripts.

If you have done any software development, the information presented would be similar to what you already know.

  • Use white space and format your code for readability.
  • Don't be overly clever - other people will have to read the shell script.
  • Format your scripts with a header explaining the purpose of the script, and separate sections for independent variables, dependent variables, and the code (a minimal header sketch follows this list).
  • Use getopts to control command line arguments.
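A minimal sketch of that layout; the particulars (names and paths) are my own invention, not Smith's:

#!/usr/bin/env bash
# script  : gather_schema_stats.sh   -- illustrative name only
# purpose : gather optimizer statistics for a single schema
# usage   : gather_schema_stats.sh -d <db> -u <user> -s <schema>

# --- independent variables: edit these per site ---
LOGDIR=/var/log/dba

# --- dependent variables: derived, do not edit ---
LOGFILE=$LOGDIR/$(basename $0 .sh).log

# --- code ---
echo "started $(date)" >> $LOGFILE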


Smith strongly advocated that everyone in the audience obtain a copy of the book "The Art of Unix Programming" by Eric S. Raymond.  This is not a new book by any means, but Smith drew upon it for many of the principles he advocated in scripting.



A couple of tools new to me were mentioned:

Zenity and Dialog - both of these can be used to display dialog boxes from shell scripts on Linux. Zenity draws GTK dialogs, while dialog draws text-mode dialogs in the terminal.

http://freshmeat.net/projects/zenity

Dialog is typically installed with Linux, so just do man dialog to check it out.
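Quick illustrations of both (the flags are from the standard man pages; the backup script name is just a placeholder):

# GUI prompt via zenity; the command returns 0 if the user clicks Yes
zenity --question --text="Run the nightly backup now?" && ./backup.sh

# text-mode equivalent using dialog (dialog box height 7, width 40)
dialog --yesno "Run the nightly backup now?" 7 40 && ./backup.sh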

It was an interesting presentation.  Though a lot of it was not new to me, the two dialog tools mentioned were, showing that no matter how well you think you may know a subject, you can always learn something from someone else.


Open World - Monday 10/12/2009


Jonathan Lewis Unconference - How to be an Expert
http://www.jlcomp.demon.co.uk/
http://jonathanlewis.wordpress.com/

Jonathan Lewis had an interesting unconference presentation.

http://wiki.oracle.com/page/Oracle+OpenWorld+Unconference

In a nutshell, it comes down to this:

You must practice, and practice quite a lot.

To make the point, he used the joke about the American tourist asking the groundskeeper how the lawns of the Royal Estates are maintained to be so lush, have such an even texture, and, in short, to be so perfect.

The groundskeeper explained while the tourist took notes.

First you must dig down 4 inches.

Then you must put down a layer of charcoal.

Then another 1 inch layer of fine, sharp sand.

Finally a layer of fine loam goes on top.

You then must seed the lawn, and water it very well for 6 weeks.

After 6 weeks, you must cut the grass, being very careful to remove only a small amount as you mow. This must be done three times a week.

And then you continue doing this for 200 years.

Ok, everyone had a good laugh at that, but the point was made.

Reading some books and being able to run some scripts does not make you an expert.  Lots and lots of practice may make you an expert, if you apply yourself well.

During the presentation he asked a number of questions of the audience, which was made up mostly of DBAs. I will reprise a couple of them here.

Q1:  Assuming you have a simple heap table, with no indexes, you update a single column in 1 row of the table.  How many bytes of redo will that generate?

Q2: How many of you in the audience, when you insert data into a table, deliberately insert duplicate data into the database?

I will leave you to speculate on the answers a bit.

Of those 2 questions, only 1 was answered correctly by the audience.

Leng Tan and Tom Kyte - DBA 2.0: Battle of the DBAs

What is the difference between a DBA 1.0 (the old days) and a DBA 2.0 (the modern DBA)?

DBA 2.0 has modern tools: a self-managing database enabled by AWR and the Diagnostics and Tuning packs.

DBA 1.0 uses scripts and works from the command line.

On the stage, in addition to Kyte and Tan, were two DBAs, each with a laptop and an Oracle server to work on.

Two scenarios were presented for a timed hands on problem that each DBA must work through.

First scenario - Security Audit

Each DBA is given 6 minutes to do a database audit and report on possible vulnerabilities

DBA 1.0 ran scripts to check for open accounts, default passwords, publicly granted packages and umask settings.

After doing so he ran a script to remove privileges granted to PUBLIC, and locked a couple of accounts.
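The checks themselves are nothing exotic; a couple of the queries probably looked something like this (my guess at the sort of thing, not the actual demo scripts):

-- accounts that are still open, and so worth a second look
select username, account_status, profile
from dba_users
where account_status like 'OPEN%';

-- a few packages that are commonly revoked from PUBLIC
select table_name, privilege
from dba_tab_privs
where grantee = 'PUBLIC'
and table_name in ('UTL_FILE','UTL_TCP','UTL_SMTP','UTL_HTTP','DBMS_JAVA');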

DBA 2.0

DBA 2.0 worked from the Oracle Enterprise Manager console, using the Secure Configuration for Oracle Database.

He was able to observe the database security score, navigate through several screens and correct the same security problems that DBA 1.0 did.  Following that he was able to see that the security score for the database had improved.

So the conclusion made by the presenter is that OEM is clearly superior because OEM will automatically generate the needed data every night.

By contrast DBA 1.0 can only do one db at a time.

I do not believe this demonstration to be a valid comparison - it's quite simple to run the same checks against any number of databases from a script, and report on anomalies.

At this point it should be mentioned that DBA 1.0 spent 4 minutes explaining what he was going to do, another minute explaining what the scripts were doing, with less than 1 minute spent actually running the scripts.

By comparison, DBA 2.0 was navigating through screens through nearly the entire 6 minutes.

The statement was made by the presenter that doing this with scripts at the command line was far too tedious a task, and that DBA 1.0 would never be able to accomplish it for 200 databases.

I won't belabor the point (well, not too much), but automating these kinds of tasks is relatively simple with command line tools. Which is easier and more productive: automating a set of scripts to poll all of your databases, or navigating through OEM for 200 databases?

The presenter referred to using OEM as "really convenient". Sorry, but I have never found OEM to be at all convenient. Whenever I run into problems with it, it requires an SR to fix it.

There was a round 2 as well, regarding testing execution plans both before and after setting optimizer_features_enable to a newer version. OEM fared well here compared to the scripting method, as the scripts used 'explain plan' while OEM actually executed the queries to gather execution plan information.

That isn't to say however that the scripts could not be modified to do the same.  No, I am not completely against GUI environments.  I am just against making more work for DBA tasks.
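For instance, something along these lines shows the difference; display_cursor pulls the plan of a statement that actually ran rather than a prediction (a sketch on my part; big_table is a placeholder, and the hint needs 10g or later):

-- explain plan records the optimizer's prediction; nothing is executed
explain plan for select count(*) from big_table;
select * from table(dbms_xplan.display);

-- running the statement and then pulling the plan from the cursor cache
-- shows what actually happened, including actual row counts
select /*+ gather_plan_statistics */ count(*) from big_table;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));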

Enough for now, I will report on Tuesday's sessions later this week.

Sunday, September 06, 2009

Shell Tricks

DBAs from time to time must write shell scripts. If your environment is strictly Windows based, this article may hold little interest for you.

Many DBAs however rely on shell scripting to manage databases. Even if you use OEM for many tasks, you likely use shell scripts to manage some aspects of DBA work.

Lately I have been writing a number of scripts to manage database statistics - gathering, deleting, and importing and exporting, both to and from statistics tables and exp files.

Years ago I started using the shell builtin getopts to gather arguments from the command line. A typical use might look like the following:

while getopts d:u:s:T:t:n: arg
do
   case $arg in
      d) DATABASE=$OPTARG
         echo DATABASE: $DATABASE;;
      u) USERNAME=$OPTARG
         echo USERNAME: $USERNAME;;
      s) SCHEMA=$OPTARG
         echo SCHEMA: $SCHEMA;;
      T) TYPE=$OPTARG
         echo TYPE: $TYPE;;
      t) TABLE_NAME=$OPTARG;;
         #echo TABLE_NAME: $TABLE_NAME
      n) OWNER=$OPTARG
         echo OWNER: $OWNER;;
      *) echo "invalid argument specified"; usage;exit 1;
   esac
done


In this example, the valid arguments are -d, -u, -s, -T, -t and -n. All of these arguments require a value.

The command line arguments might look like this:
somescript.sh -d orcl -u system -s scott

If an invalid argument such as -z is passed, the script will exit with the exit code set to 1.

For the script to work correctly, some checking of the arguments passed to the script must be done.

For this script, the rules are as follows:
  • -d and -u must always be set
  • -s must be set if -T is 'SCHEMA'
  • -t and -n must both have a value or be blank
  • -s must be used with -T
In this example, values for -T other than 'SCHEMA' are not being checked.

The usual method (at least for me) to test the validity of command line arguments has always been to use the test, or [] operator with combinations of arguments.

For the command line arguments just discussed, the tests might look like the following:

[ -z "$DATABASE" -o -z "$USERNAME" ] && {
echo Database or Username is blank
exit 2
}

# include schema name if necessary
[ "$TYPE" == 'SCHEMA' -a -z "$SCHEMA" ] && {
echo Please include schema name
exit 3
}

# both owner and tablename must have a value, or both be blank
[ \( -z "$TABLE_NAME" -a -n "$OWNER" \) -o \( -n "$TABLE_NAME" -a -z "$OWNER" \) ] && {
echo Please specify both owner and tablename
echo or leave both blank
exit 4
}

# if -s is set, so must -T
[ -n "$SCHEMA" -a -z "$TYPE" ] && {
echo Please include a type with -T
exit 5
}


As you can see, there are a fair number of tests involved to determine the validity of the command line arguments. You may have guessed why I skipped one for this demo - I just did not want to write any more tests.

Validating command line arguments really gets difficult with a larger number of possible arguments. Worse yet, any later modifications to the script that require a new command line argument become dreaded tasks that are put off as long as possible due to the complexity of testing the validity of command line arguments.

While writing a script that had 11 possible arguments, I was dreading writing the command line argument validation section, and I thought there must be a better way.

It seemed that there must be a simple method of using regular expressions to validate combinations of command line arguments. I had never seen this done, and after spending a fair bit of time googling the topic it became apparent that there was not any code available for a cut and paste solution, so it seemed a nice opportunity to be innovative.

After experimenting a bit, I found what I think is a better way.

The method I use is to concatenate all possible command line arguments into a ':' delimited string, and then use a set of pre-prepared regexes to determine whether or not the command line arguments are valid.

One immediately obvious drawback to this method is that arguments containing the ':' character cannot be used. However the delimiting character can easily be changed if needed.

Using the same example as previous, the command line arguments are all concatenated into a string and converted to upper case:

ALLARGS=":$USERNAME:$DATABASE:$OWNER:$TABLE_NAME:$SCHEMA:$TYPE:"
# upper case args
ALLARGS=$(echo $ALLARGS | tr "[a-z]" "[A-Z]")


Next a series of regular expressions are created. The first two are generic, and may or may not be used as building blocks for other regular expressions. The others all correspond to a specific command line argument


# alphanumeric only, at least 1 character
export ALNUM1="[[:alnum:]]+"
# alphanumeric only, at least 3 characters
export ALNUM3="[[:alnum:]]{3,}"
# username - alphanumeric only at least 3 characters
export USER_RE=$ALNUM3
# database - alphanumeric only at least 3 characters
export DATABASE_RE=$ALNUM3
# owner - alphanumeric and _ and $ characters
export OWNER_RE='[[:alnum:]_$]+'
# table_name - alphanumeric and _, # and $ characters
export TABLE_RE='[[:alnum:]_#$]+'
# schema - alphanumeric and _ and $ characters
export SCHEMA_RE='[[:alnum:]_$]+'


These regular expressions could use further refinement (such as requiring the username to start with a letter) but are sufficient for this demonstration.
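For example, a username pattern that must begin with a letter might look like this:

# require a leading letter, followed by at least two more alphanumerics
export USER_RE='[[:alpha:]][[:alnum:]]{2,}'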

Next, the regular expressions are concatenated together into ':' delimited strings, with each possible command line argument represented either by its corresponding regex, or by null.

The regexes are stuffed into a bash array. For our example, it looks like this:
#   :   user        :  db           :  owner        :  table     : schema        : type
VALID_ARGS=(
":$USER_RE:$DATABASE_RE:$OWNER_RE:$TABLE_RE::(DICTIONARY_STATS|SYSTEM_STATS|FIXED_OBJECTS_STATS):" \
":$USER_RE:$DATABASE_RE::::(DICTIONARY_STATS|SYSTEM_STATS|FIXED_OBJECTS_STATS):" \
":$USER_RE:$DATABASE_RE:$OWNER_RE:$TABLE_RE:$SCHEMA_RE:(SCHEMA):" \
":$USER_RE:$DATABASE_RE:::$SCHEMA_RE:SCHEMA:")

Notice that there are four different combinations of command line arguments represented.

In all cases the USERNAME and DATABASE are required and must correspond to the regex provided.

In the first combination of arguments, the owner and table must also be specified, and type (-T) must be either one of DICTIONARY_STATS, SYSTEM_STATS or FIXED_OBJECTS_STATS.

In the second possible combination, the only argument allowed in addition to DATABASE and USERNAME is the type (-T) argument.

The third combination requires the OWNER, TABLE_NAME and SCHEMA argument to have a valid value, and the TYPE argument must be set to SCHEMA.

The final combination of arguments requires just the SCHEMA argument and the TYPE argument must be set to SCHEMA, in addition to the USERNAME and DATABASE arguments.

By now you likely want to know just how these regular expressions are tested. The following function is used to test the command line arguments against each regular expression:
function validate_args {
   typeset arglist
   arglist=$1

   while shift
   do
      [ -z "$1" ] && break
      if [ $(echo $arglist | grep -E $1 ) ]; then
         return 0
      fi
   done

   return 1
}


Here's how it is used in the script:

# VALID_ARGS must NOT be quoted or it will appear as a single arg in the function
validate_args $ALLARGS ${VALID_ARGS[*]}


While this method may appear somewhat confusing at first, it becomes less so after using it a few times. It greatly simplifies the use of many command line arguments that may appear in differing combinations.

As far as I know, this method only works properly with the bash shell. I have done testing on only two shells, bash and ksh. It does not work properly on ksh.

Here's a demonstration of the ksh problem. The following script is run from both ksh and bash:

function va {

echo ARG1: $1
}


R1="[[:alnum]]+"
R2="[[:alnum]]{3,}"

va $R1
va $R2
And here are the results:
18:9-jkstill-18 > ksh t3
ARG1: [[:alnum]]+
ARG1: [[:alnum]]3
[ /home/jkstill/bin ]

jkstill-18 > bash t3
ARG1: [[:alnum]]+
ARG1: [[:alnum]]{3,}
[ /home/jkstill/bin ]



Notice that when the script is run with ksh, the '{', '}' and ',' are removed from the regular expression. I could find no combination of quoting and escape characters that could prevent that from happening. This method of command line argument validation could be made to work using ksh if those characters are not used in the regexes. That would be rather limiting though.

One other drawback you may have noticed with this method of validating command line arguments is that when an error condition is encountered, the exit code is always 1. With the [] method it was easy to exit with different codes to indicate the nature of the error. Something similar could likely be done by embedding a code into each set of regexes, but I will leave that as an exercise for the reader.

The complete prototype script, as well as a test script can be downloaded:


The next article will include a set of functions used along with the validate_args() function to make shell scripts a bit more robust.

Thursday, August 06, 2009

Detecting Corrupt Data Blocks

Or more accurately, how not to detect corrupt data blocks.

This thread on Oracle-L is regarding lost writes on a database.

One suggestion was made to use the exp utility to export the database, thereby determining if there are corrupt blocks in the database due to disk failure. I didn't give it much thought at first, but fellow Oak Table member Mark Farnham got me thinking about it.

Using exp to detect corrupt blocks, or rather, the absence of corrupt blocks may work, but then again, it may not. It is entirely possible to do a  full table scan on a table successfully, as would happen during an export, even though the blocks on disk have been corrupted.

This can be demonstrated by building a table, ensuring the contents are cached, then destroying the data in the data file, followed by a successful export of the table.

Granted, there are a lot of mitigating factors that could be taken into consideration as to whether or not this would happen in a production database. That's not the point: the point is that it could happen, so exp is not a reliable indicator of the state of the data files on disk.

This test was performed on Oracle 10.2.0.4 EE on RH Linux ES 4. Both are 32 bit.

First create a test tablespace:

create tablespace lost_write datafile '/u01/oradata/dv11/lost_write.dbf' size 1m
extent management local
uniform size 64k
/



Next the table LOST_WRITE is created in the tablespace of the same name. This will be used to test the assertion that a successful export of the table can be done even though the data on disk is corrupt.

create table lost_write
cache
tablespace lost_write
as
select * from dba_objects
where rownum <= 1000
/

begin
dbms_stats.gather_table_stats(user,'LOST_WRITE');
end;
/

select tablespace_name, blocks, bytes
from user_segments
where segment_name = 'LOST_WRITE'
/


TABLESPACE_NAME                    BLOCKS      BYTES
------------------------------ ---------- ----------
LOST_WRITE                             16     131072

1 row selected.



Next, do a full table scan and verify that the blocks are cached:

select * from lost_write;

Verify in cache:
select file#,block#,class#, status
from v$bh where ts# = (select ts# from sys.ts$ where name = 'LOST_WRITE')
order by block#
/

FILE#     BLOCK#     CLASS# STATUS
---------- ---------- ---------- -------
40          2         13 xcur
40          3         12 xcur
40          9          8 xcur
40         10          9 xcur
40         11          4 xcur
40         12          1 xcur
40         13          1 xcur
40         14          1 xcur
40         15          1 xcur
40         16          1 xcur
40         17          1 xcur
40         18          1 xcur
40         19          1 xcur
40         20          1 xcur
40         21          1 xcur
40         22          1 xcur
40         23          1 xcur




Now swap the bytes in the file, skipping the first 2 Oracle blocks.
Caveat: I don't know if that was the correct number of blocks to skip, and I didn't spend any time trying to find out.
Also, I belatedly saw that count probably should have been 22 rather than 16, but the results still served the purpose of corrupting the datafile, as we shall see in a bit.

What this dd command is doing is using the same file for both input and output, and rewriting blocks 3-18, swapping each pair of bytes.

dd if=/u01/oradata/dv11/lost_write.dbf of=/u01/oradata/dv11/lost_write.dbf bs=8129 skip=2 count=16 conv=swab,notrunc



The effect is demonstrated by this simple test:

jkstill-19 > echo hello | dd
hello
0+1 records in
0+1 records out
[ /home/jkstill ]

jkstill-19 > echo hello | dd conv=swab
ehll
o0+1 records in
0+1 records out


Now we can attempt the export:

exp tables=\(jkstill.lost_write\) ...

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, Data Mining and Real Application Testing options
Export done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
. . exporting table LOST_WRITE 1000 rows exported
Export terminated successfully without warnings.


So, even though the data on disk has been corrupted, the export succeeded. That is due to the table being created with the CACHE option, and all the blocks being cached at the time of export. It may not be necessary to use the CACHE option, but I used it to ensure the test would succeed.

Now let's see what happens when trying to scan the table again. First the NOCACHE option will be set on the table, then a checkpoint.

10:42:45 dv11 SQL> alter table lost_write nocache;

10:43:02 dv11 SQL> alter system checkpoint;

Now try to scan the table again:

10:43:14 ordevdb01.radisys.com - js001292@dv11 SQL> /
select * from lost_write
*
ERROR at line 1:
ORA-00376: file 40 cannot be read at this time
ORA-01110: data file 40: '/u01/oradata/dv11/lost_write.dbf'



A corollary conclusion can be drawn from this example.

If you do discover bad data blocks, you just might be able to do an export of the tables that are in the affected region before doing any recovery. This might be a belt-and-suspenders approach, but DBAs are not generally known for taking unnecessary chances when possible data loss is on the line.

Thursday, July 09, 2009

Classic MetaLink vs. My Oracle Support

If you are in any way involved with supporting Oracle products, then you know that the death knell for Classic MetaLink has sounded. MetaLink will be unplugged at the end of July 2009.

The new support site, My Oracle Support, seems to be causing some pain for quite a few people in the Oracle user community.

Some of the complaints regard limited platform support due to the Adobe Flash 9 requirements, navigation and response times.

On the other hand there are some cool new features such as Power View, Configuration Manager and the new Advanced Search options.

How do you feel about it?

Here's a chance to let your opinion be known, as a poll has been created where you can vote on it.


At this time 637 voters have voiced their opinion about MetaLink and My Oracle Support.

Current Poll results can be found in this Excel File: MetaLink vs My Oracle Support Results

Wednesday, April 29, 2009

Querying v$lock

There have been a number of scripts made available for querying v$lock to diagnose locking issues.

One example is a script I got long ago from tsawmiller on Oracle-L. The original script, showlock.sql, or something close to it, is still available at OraFaq.com: showlock.sql

showlock.sql has morphed over the years to keep up with changing versions of Oracle.

At one time showlock.sql resembled the OH/rdbms/admin/utllockt.sql script, in that it created a temporary table to speed up the results, as the join on v$lock, v$session and dba_waiters was so slow.

That was remedied at one point by the use of the ordered hint. That hint may no longer be necessary, but the script is still fast on all versions of Oracle that I need it on (9i-11g), and I am too lazy to test something that isn't broken.

This script could still be further updated by the use of the v$lock_type view, eliminating the large decode statements in the script. As v$lock_type is not available in 9i though, I leave the decodes in. When the last 9i database is gone from our environment however, the script can be shortened considerably.
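For reference, on 10g and up the lookup would be along these lines:

-- sketch: v$lock_type can replace the big decode on 10g and later
select l.sid, l.type, lt.name, lt.description
from v$lock l
join v$lock_type lt on lt.type = l.type;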

The decode statements were mostly lifted from a script provided by Oracle. MetaLink document (or My Oracle Support now, I guess) # 1020008.6 has a 'fully decoded' locking script that is current through 11g, I believe.

The problem with that script however is that it does not correctly look up the name of the object that is locked.

The reason I have even brought this up is that a bit of my workday yesterday was spent updating the script, and making sure it worked as expected. The COMMAND column was also added. In addition, the outer joins were converted to the much neater ANSI join syntax, and one outer join was eliminated.

Here's the output from a test. It may be easier to read if you cut and paste it into a text editor, as the formatting here doesn't work well for wide output. Better yet, test the script and look at the output for yourself.


       Oracle              Database                                  Lock                  Mode            Mode       OS                 OS
   SID Usernam  WATR  BLKR Object                    COMMAND         Type Lock Description Held            Requested  Program            Process
------ ------- ----- ----- ------------------------- --------------- ---- ---------------- --------------- ---------- ------------------ -------
    73 JKSTILL    83       JKSTILL.A                 SELECT          TM   DML enqueue lock Exclusive       None       sqlplus@poirot (TN 21430
    83 JKSTILL          73 JKSTILL.A                 LOCK TABLE      TM   DML enqueue lock None            Exclusive  sqlplus@poirot (TN 21455

2 rows selected.




Though utllockt.sql may work well enough, it does have a couple of drawbacks:

1. it does not provide enough information
2. it creates a temporary table.

That second item means that you had better be sure to run the script from a session separate from any holding locks. In production that probably does not matter, as that is what would normally be done anyway. During testing however it can be a bit frustrating until you realize that the DDL in the script is causing your locks to be released.

What I like about this script is that it shows me what I need to know, and it is very fast.
Of course, now that I have stated that, someone will run it on a system where it performs poorly...

For showlock.sql to work, the dba_waiters view must be created.
If this has not already been done, it can be created by logging in as SYSDBA and running the OH/rdbms/admin/catblock.sql script.
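That is, something like this:

sqlplus / as sysdba

SQL> @?/rdbms/admin/catblock.sql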

Here's how you can easily test showlock.sql:

Session A -
create table a (a integer);
lock table a in exclusive mode;

Session B
lock table a in exclusive mode;

Now either from session A or a new session, run the showlock.sql script.

Here's the script.


-- showlock.sql - show all user locks
--
-- see ML Note 1020008.6 for fully decoded locking script
-- parts of that script do not work correctly, but the
-- lock types are current
-- (script doesn't find object that is locked )
--
-- speeded up greatly by changing order of where clause,
-- jks 04/09/1997 - show lock addresses and lockwait

-- jks 04/09/1997 - outer join on all_objects
-- encountered situation on 7.2
-- where there was a lock with no
-- matching object_id
-- jks 02/24/1999 - join to dba_waiters to show waiters and blockers
-- jkstill 05/22/2006 - revert back to previous version without tmp tables
-- update lock info
-- add lock_description and rearrange output
-- jkstill 04/28/2008 - added command column
-- updated lock types
-- removed one outer join by using inline view on sys.user$
-- jkstill 04/28/2008 - added subquery factoring
-- converted to ANSI joins
-- changed alias for v$lock to l and v$session to s

set trimspool on
ttitle off
set linesize 150
set pagesize 60
column command format a15
column osuser heading 'OS|Username' format a7 truncate
column process heading 'OS|Process' format a7 truncate
column machine heading 'OS|Machine' format a10 truncate
column program heading 'OS|Program' format a18 truncate
column object heading 'Database|Object' format a25 truncate
column lock_type heading 'Lock|Type' format a4 truncate
column lock_description heading 'Lock Description' format a16 truncate
column mode_held heading 'Mode|Held' format a15 truncate
column mode_requested heading 'Mode|Requested' format a10 truncate
column sid heading 'SID' format 999
column username heading 'Oracle|Username' format a7 truncate
column image heading 'Active Image' format a20 truncate
column sid format 99999
col waiting_session head 'WATR' format 9999
col holding_session head 'BLKR' format 9999

with dblocks as (
select /*+ ordered */
l.kaddr,
s.sid,
s.username,
lock_waiter.waiting_session,
lock_blocker.holding_session,
(
select name
from sys.user$
where user# = o.owner#
) ||'.'||o.name
object,
decode(command,
0,'BACKGROUND',
1,'Create Table',
2,'INSERT',
3,'SELECT',
4,'CREATE CLUSTER',
5,'ALTER CLUSTER',
6,'UPDATE',
7,'DELETE',
8,'DROP',
9,'CREATE INDEX',
10,'DROP INDEX',
11,'ALTER INDEX',
12,'DROP TABLE',
13,'CREATE SEQUENCE',
14,'ALTER SEQUENCE',
15,'ALTER TABLE',
16,'DROP SEQUENCE',
17,'GRANT',
18,'REVOKE',
19,'CREATE SYNONYM',
20,'DROP SYNONYM',
21,'CREATE VIEW',
22,'DROP VIEW',
23,'VALIDATE INDEX',
24,'CREATE PROCEDURE',
25,'ALTER PROCEDURE',
26,'LOCK TABLE',
27,'NO OPERATION',
28,'RENAME',
29,'COMMENT',
30,'AUDIT',
31,'NOAUDIT',
32,'CREATE EXTERNAL DATABASE',
33,'DROP EXTERNAL DATABASE',
34,'CREATE DATABASE',
35,'ALTER DATABASE',
36,'CREATE ROLLBACK SEGMENT',
37,'ALTER ROLLBACK SEGMENT',
38,'DROP ROLLBACK SEGMENT',
39,'CREATE TABLESPACE',
40,'ALTER TABLESPACE',
41,'DROP TABLESPACE',
42,'ALTER SESSION',
43,'ALTER USER',
44,'COMMIT',
45,'ROLLBACK',
46,'SAVEPOINT',
47,'PL/SQL EXECUTE',
48,'SET TRANSACTION',
49,'ALTER SYSTEM SWITCH LOG',
50,'EXPLAIN',
51,'CREATE USER',
52,'CREATE ROLE',
53,'DROP USER',
54,'DROP ROLE',
55,'SET ROLE',
56,'CREATE SCHEMA',
57,'CREATE CONTROL FILE',
58,'ALTER TRACING',
59,'CREATE TRIGGER',
60,'ALTER TRIGGER',
61,'DROP TRIGGER',
62,'ANALYZE TABLE',
63,'ANALYZE INDEX',
64,'ANALYZE CLUSTER',
65,'CREATE PROFILE',
66,'DROP PROFILE',
67,'ALTER PROFILE',
68,'DROP PROCEDURE',
69,'DROP PROCEDURE',
70,'ALTER RESOURCE COST',
71,'CREATE SNAPSHOT LOG',
72,'ALTER SNAPSHOT LOG',
73,'DROP SNAPSHOT LOG',
74,'CREATE SNAPSHOT',
75,'ALTER SNAPSHOT',
76,'DROP SNAPSHOT',
79,'ALTER ROLE',
85,'TRUNCATE TABLE',
86,'TRUNCATE CLUSTER',
87,'-',
88,'ALTER VIEW',
89,'-',
90,'-',
91,'CREATE FUNCTION',
92,'ALTER FUNCTION',
93,'DROP FUNCTION',
94,'CREATE PACKAGE',
95,'ALTER PACKAGE',
96,'DROP PACKAGE',
97,'CREATE PACKAGE BODY',
98,'ALTER PACKAGE BODY',
99,'DROP PACKAGE BODY',
command||'-UNKNOWN'
) COMMAND,
-- lock type
-- will always be TM, TX or possible UL (user supplied) for user locks
l.type lock_type,
decode
(
l.type,
'BL','Buffer hash table instance lock',
'CF',' Control file schema global enqueue lock',
'CI','Cross-instance function invocation instance lock',
'CS','Control file schema global enqueue lock',
'CU','Cursor bind lock',
'DF','Data file instance lock',
'DL','Direct loader parallel index create',
'DM','Mount/startup db primary/secondary instance lock',
'DR','Distributed recovery process lock',
'DX','Distributed transaction entry lock',
'FI','SGA open-file information lock',
'FS','File set lock',
'HW','Space management operations on a specific segment lock',
'IN','Instance number lock',
'IR','Instance recovery serialization global enqueue lock',
'IS','Instance state lock',
'IV','Library cache invalidation instance lock',
'JQ','Job queue lock',
'KK','Thread kick lock',
'LA','Library cache lock instance lock (A=namespace)',
'LB','Library cache lock instance lock (B=namespace)',
'LC','Library cache lock instance lock (C=namespace)',
'LD','Library cache lock instance lock (D=namespace)',
'LE','Library cache lock instance lock (E=namespace)',
'LF','Library cache lock instance lock (F=namespace)',
'LG','Library cache lock instance lock (G=namespace)',
'LH','Library cache lock instance lock (H=namespace)',
'LI','Library cache lock instance lock (I=namespace)',
'LJ','Library cache lock instance lock (J=namespace)',
'LK','Library cache lock instance lock (K=namespace)',
'LL','Library cache lock instance lock (L=namespace)',
'LM','Library cache lock instance lock (M=namespace)',
'LN','Library cache lock instance lock (N=namespace)',
'LO','Library cache lock instance lock (O=namespace)',
'LP','Library cache lock instance lock (P=namespace)',
'LS','Log start/log switch enqueue lock',
'MB','Master buffer hash table instance lock',
'MM','Mount definition global enqueue lock',
'MR','Media recovery lock',
'PA','Library cache pin instance lock (A=namespace)',
'PB','Library cache pin instance lock (B=namespace)',
'PC','Library cache pin instance lock (C=namespace)',
'PD','Library cache pin instance lock (D=namespace)',
'PE','Library cache pin instance lock (E=namespace)',
'PF','Library cache pin instance lock (F=namespace)',
'PF','Password file lock',
'PG','Library cache pin instance lock (G=namespace)',
'PH','Library cache pin instance lock (H=namespace)',
'PI','Library cache pin instance lock (I=namespace)',
'PI','Parallel operation lock',
'PJ','Library cache pin instance lock (J=namespace)',
'PK','Library cache pin instance lock (L=namespace)',
'PL','Library cache pin instance lock (K=namespace)',
'PM','Library cache pin instance lock (M=namespace)',
'PN','Library cache pin instance lock (N=namespace)',
'PO','Library cache pin instance lock (O=namespace)',
'PP','Library cache pin instance lock (P=namespace)',
'PQ','Library cache pin instance lock (Q=namespace)',
'PR','Library cache pin instance lock (R=namespace)',
'PR','Process startup lock',
'PS','Library cache pin instance lock (S=namespace)',
'PS','Parallel operation lock',
'PT','Library cache pin instance lock (T=namespace)',
'PU','Library cache pin instance lock (U=namespace)',
'PV','Library cache pin instance lock (V=namespace)',
'PW','Library cache pin instance lock (W=namespace)',
'PX','Library cache pin instance lock (X=namespace)',
'PY','Library cache pin instance lock (Y=namespace)',
'PZ','Library cache pin instance lock (Z=namespace)',
'QA','Row cache instance lock (A=cache)',
'QB','Row cache instance lock (B=cache)',
'QC','Row cache instance lock (C=cache)',
'QD','Row cache instance lock (D=cache)',
'QE','Row cache instance lock (E=cache)',
'QF','Row cache instance lock (F=cache)',
'QG','Row cache instance lock (G=cache)',
'QH','Row cache instance lock (H=cache)',
'QI','Row cache instance lock (I=cache)',
'QJ','Row cache instance lock (J=cache)',
'QK','Row cache instance lock (L=cache)',
'QL','Row cache instance lock (K=cache)',
'QM','Row cache instance lock (M=cache)',
'QN','Row cache instance lock (N=cache)',
'QO','Row cache instance lock (O=cache)',
'QP','Row cache instance lock (P=cache)',
'QQ','Row cache instance lock (Q=cache)',
'QR','Row cache instance lock (R=cache)',
'QS','Row cache instance lock (S=cache)',
'QT','Row cache instance lock (T=cache)',
'QU','Row cache instance lock (U=cache)',
'QV','Row cache instance lock (V=cache)',
'QW','Row cache instance lock (W=cache)',
'QX','Row cache instance lock (X=cache)',
'QY','Row cache instance lock (Y=cache)',
'QZ','Row cache instance lock (Z=cache)',
'RE','USE_ROW_ENQUEUE enforcement lock',
'RT','Redo thread global enqueue lock',
'RW','Row wait enqueue lock',
'SC','System commit number instance lock',
'SH','System commit number high water mark enqueue lock',
'SM','SMON lock',
'SN','Sequence number instance lock',
'SQ','Sequence number enqueue lock',
'SS','Sort segment lock',
'ST','Space transaction enqueue lock',
'SV','Sequence number value lock',
'TA','Generic enqueue lock',
'TD','DDL enqueue lock',
'TE','Extend-segment enqueue lock',
'TM','DML enqueue lock',
'TO','Temporary Table Object Enqueue',
'TS',decode(l.id2,
0,'Temporary segment enqueue lock (ID2=0)',
1,'New block allocation enqueue lock (ID2=1)',
'UNKNOWN!'
),
'TT','Temporary table enqueue lock',
'TX','Transaction enqueue lock',
'UL','User supplied lock',
'UN','User name lock',
'US','Undo segment DDL lock',
'WL','Being-written redo log instance lock',
'WS','Write-atomic-log-switch global enqueue lock',
'UNKNOWN'
) lock_description,
decode
(
l.lmode,
0, 'None', /* Mon Lock equivalent */
1, 'No Lock', /* N */
2, 'Row-S (SS)', /* L */
3, 'Row-X (SX)', /* R */
4, 'Share', /* S */
5, 'S/Row-X (SRX)', /* C */
6, 'Exclusive', /* X */
to_char(l.lmode)
) mode_held,
decode
(
l.request,
0, 'None', /* Mon Lock equivalent */
1, 'No Lock', /* N */
2, 'Row-S (SS)', /* L */
3, 'Row-X (SX)', /* R */
4, 'Share', /* S */
5, 'S/Row-X (SSX)', /* C */
6, 'Exclusive', /* X */
to_char(l.request)
) mode_requested,
s.osuser,
s.machine,
s.program,
s.process
from
v$lock l
join v$session s on s.sid = l.sid
left outer join sys.dba_waiters lock_blocker on lock_blocker.waiting_session = s.sid
left outer join sys.dba_waiters lock_waiter on lock_waiter.holding_session = s.sid
left outer join sys.obj$ o on o.obj# = l.id1
where s.type != 'BACKGROUND'
)
select
--kaddr,
sid,
username,
waiting_session,
holding_session,
object,
command,
lock_type,
lock_description,
mode_held,
mode_requested,
--osuser,
--machine,
program,
process
from dblocks
order by sid, object
/

Thursday, March 05, 2009

The Evils of Encoding Meaning Into Data

About a year ago I worked on collating and transforming data from an application so that it could be imported into another app. I've performed this exercise a number of times in the past 20 or so years, and while it is never boring, it is sometimes quite challenging.

Oft times when trying to make the data suitable for inclusion in the new application, I ask myself "What were they thinking?"

I will leave the answer to that up to your imagination, as my answers to that particular question are not always complimentary.

One of the problems I run into is when the data modeler and database designer chose to allow data with encoded special meanings.

In other words, there is key data in the database, the meaning of which can only be ascertained by asking someone that knows, or finding it in the documentation (Is there documentation?)

The example I will use is a real one I ran into.

The example involves a table containing projects, with revisions of those projects indicated by a revision "number".

A sensible series of revision numbers might be a numeric range beginning with 1, an alpha range beginning with 'A', or even a combination of the two.

Personally, I prefer the unambiguous nature of an ascending numeric range. 2 is always greater than 1. There is no case involved as there would be with an alpha range.

Which is greater, "a1" or "A1"? You know how this will sort in the database, but will it be what the users expect?

While a numeric range would have been preferable, the data I was given to work with used a range of revision "numbers" that mixed numeric and alpha values, with a "numeric" value being regarded as greater than any alpha value. The "numeric" is in quotes, as these values obviously must be stored as strings, in this case with a leading zero.

Given this rule, a project with revisions of A,B,C,D,01,02 would have a most recent revision of "02". This is not the way it works in the standard collating order in the database.

11:34:03 SQL> create table rev (version varchar2(2));

Table created.

11:34:03 SQL>
11:34:03 SQL> insert into rev values('A');
11:34:03 SQL> insert into rev values('B');
11:34:03 SQL> insert into rev values('C');
11:34:03 SQL> insert into rev values('01');
11:34:03 SQL> insert into rev values('02');
11:34:03 SQL>
11:34:03 SQL> select * from rev order by version;

VE
--
01
02
A
B
C

5 rows selected.

In a perfect world, the data modeler or data architect would work with the users to create a reasonable versioning method. In this case however there is no choice but to work with what I was given.

From the data provided, only the most recent version was to be included in the imported data.
With a versioning system that doesn't follow the normal collating order, this requires some code to determine what is really the greatest version.

If you know a method to accomplish this in straight SQL, please feel free to post it. I could not think of a pure SQL solution.

The following generic function was created to determine the greatest revision value given a table name and the key columns.

-- maxrev.sql
-- return the maximum numeric revision
-- return an alpha revision if no numeric revision exists
-- the maximum rev is always considered to be numeric
-- revs are mixed numeric and alpha character strings

create or replace function maxrev
(
   table_name_in   varchar2,
   key_column_in   varchar2,
   value_column_in varchar2,
   key_value_in    varchar2
)
return varchar2
as
   v_max_rev    integer := 0;
   v_tmp_rev    integer;
   v_col_rev    varchar2(10);
   v_return_rev varchar2(30);
   type curType is ref cursor;
   l_cursor     curType;
   v_sql        varchar2(1000);

   function is_number( chk_data_in varchar2 )
   return boolean
   is
      dummy number(38,4);
   begin
      dummy := to_number(chk_data_in);
      return true;
   exception
      when value_error then
         return false;
      when others then
         raise;
   end;

begin
   -- get the maximum rev, whether alpha or numeric
   -- there may not be any numerics
   v_sql := 'select max(' || value_column_in || ') from ' || table_name_in || ' where ' || key_column_in || ' = :1';

   execute immediate v_sql into v_return_rev using key_value_in;

   --return v_return_rev;

   v_sql := 'select ' || value_column_in || ' from ' || table_name_in || ' where ' || key_column_in || ' = :1';

   open l_cursor for v_sql using key_value_in;
   loop
      fetch l_cursor into v_col_rev;
      exit when l_cursor%notfound;

      if (is_number(v_col_rev)) then
         v_tmp_rev := to_number(v_col_rev);
      end if;
      if (v_tmp_rev > v_max_rev) then
         v_max_rev := v_tmp_rev;
         v_return_rev := v_col_rev;
      end if;
   end loop;

   return v_return_rev;
end;
/

( Reviewing this function just now, I see what could be considered a programming error.
Let me know if you spot it. )

Here's a test case to prove that the function works as expected.

-- maxrev_test.sql
-- should always return numeric if it exists, otherwise alpha

drop table maxrev_test;

create table maxrev_test ( id varchar2(4), rev varchar2(2));

insert into maxrev_test values('T1', 'A');
insert into maxrev_test values('T1', 'B');
insert into maxrev_test values('T1', '01');
insert into maxrev_test values('T1', '02');
insert into maxrev_test values('T2', '01');
insert into maxrev_test values('T2', '02');
insert into maxrev_test values('T2', '03');
insert into maxrev_test values('T3', 'X');
insert into maxrev_test values('T3', 'Y');
insert into maxrev_test values('T3', 'Z');

commit;

select * from maxrev_test order by id,rev;

col rev format a10

prompt
prompt Incorrect results
prompt

select id, max(rev) rev
from maxrev_test
group by id
order by id
/

prompt
prompt Correct results
prompt

select id, maxrev('maxrev_test','id','rev',id) rev
from maxrev_test
group by id
order by id
/



And the results:

Incorrect results

ID REV
---- ----------
T1 B
T2 03
T3 Z

3 rows selected.

Correct results

ID REV
---- ----------
T1 02
T2 03
T3 Z

3 rows selected.