This thread on Oracle-L is regarding lost writes on a database.
One suggestion was made to use the exp utility to export the database, thereby determining if there are corrupt blocks in the database due to disk failure. I didn't give it much thought at first, but fellow Oak Table member Mark Farnham got me thinking about it.
Using exp to detect corrupt blocks, or rather to verify their absence, may work, but then again, it may not. It is entirely possible to do a full table scan on a table successfully, as would happen during an export, even though the blocks on disk have been corrupted.
This can be demonstrated by building a table, ensuring the contents are cached, then destroying the data in the data file, followed by a successful export of the table.
Granted, there are a lot of mitigating factors that could be taken into consideration as to whether or not this would happen in a production database. That's not the point: the point is that it could happen, so exp is not a reliable indicator of the state of the data files on disk.
This test was performed on Oracle 10.2.0.4 EE on RH Linux ES 4. Both are 32 bit.
First create a test tablespace:
create tablespace lost_write
datafile '/u01/oradata/dv11/lost_write.dbf' size 1m
extent management local
uniform size 64k
/
Next the table LOST_WRITE is created in the tablespace of the same name. This will be used to test the assertion that a successful export of the table can be done even though the data on disk is corrupt.
create table lost_write
cache
tablespace lost_write
as
select * from dba_objects
where rownum <= 1000
/

begin
   dbms_stats.gather_table_stats(user,'LOST_WRITE');
end;
/

select tablespace_name, blocks, bytes
from user_segments
where segment_name = 'LOST_WRITE'
/

TABLESPACE_NAME                    BLOCKS      BYTES
------------------------------ ---------- ----------
LOST_WRITE                             16     131072

1 row selected.
Next, do a full table scan and verify that the blocks are cached:
select * from lost_write;

Verify the blocks are in the cache:

select file#, block#, class#, status
from v$bh
where ts# = (select ts# from sys.ts$ where name = 'LOST_WRITE')
order by block#
/

     FILE#     BLOCK#     CLASS# STATUS
---------- ---------- ---------- -------
        40          2         13 xcur
        40          3         12 xcur
        40          9          8 xcur
        40         10          9 xcur
        40         11          4 xcur
        40         12          1 xcur
        40         13          1 xcur
        40         14          1 xcur
        40         15          1 xcur
        40         16          1 xcur
        40         17          1 xcur
        40         18          1 xcur
        40         19          1 xcur
        40         20          1 xcur
        40         21          1 xcur
        40         22          1 xcur
        40         23          1 xcur
Now swap the bytes in the file, skipping the first two Oracle blocks.

Caveat: I don't know whether two is the correct number of blocks to skip, and I didn't spend any time trying to find out. I also belatedly noticed that the count probably should have been 22 rather than 16, but the results still served the purpose of corrupting the datafile, as we shall see in a bit.

This dd command uses the same file for both input and output, rewriting blocks 3 through 18 and swapping each pair of bytes:
dd if=/u01/oradata/dv11/lost_write.dbf of=/u01/oradata/dv11/lost_write.dbf bs=8129 skip=2 count=16 conv=swab,notrunc
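As a side note, the byte range rewritten by that dd command can be worked out from its arguments (this arithmetic is mine, not from the original test). Note also that bs=8129 is not the 8192-byte Oracle block size, so the dd "blocks" do not line up exactly with Oracle blocks, which is consistent with the caveats above.

```shell
# Compute the byte range rewritten by:
#   dd ... bs=8129 skip=2 count=16 conv=swab,notrunc
bs=8129; skip=2; count=16
start=$((bs * skip))               # first byte rewritten
end=$((start + bs * count - 1))    # last byte rewritten
echo "rewrites bytes $start through $end"
```

With these values, bytes 16258 through 146321 of the 1 MB datafile are byte-swapped in place.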
The effect is demonstrated by this simple test:
jkstill-19 > echo hello | dd
hello
0+1 records in
0+1 records out

jkstill-19 > echo hello | dd conv=swab
ehll o
0+1 records in
0+1 records out
Now we can attempt the export:
exp tables=\(jkstill.lost_write\) ...

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, Data Mining and Real Application Testing options
Export done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
. . exporting table                     LOST_WRITE       1000 rows exported
Export terminated successfully without warnings.
So, even though the data on disk has been corrupted, the export succeeded. That is because the table was created with the CACHE option and all of its blocks were cached at the time of the export. The CACHE option may not be strictly necessary, but I used it to ensure the test would succeed.
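The mechanism can be illustrated with a toy model in plain shell (this is not Oracle internals, and the file names are made up): a read served from a cached copy never touches the corrupted file, so the damage stays invisible until a physical read is forced.

```shell
# Toy model of a buffer cache hiding on-disk corruption.
printf 'good data' > /tmp/lw_datafile    # the "datafile" on disk
cp /tmp/lw_datafile /tmp/lw_cache        # blocks cached by the earlier full scan
printf 'CORRUPTED' > /tmp/lw_datafile    # lost write / corruption underneath

cat /tmp/lw_cache                        # "scan" served from cache: still good data
rm /tmp/lw_cache                         # NOCACHE + checkpoint ages the blocks out
cat /tmp/lw_datafile                     # the next physical read sees the damage
```

This is exactly the sequence the next step forces in the database: flush the cached blocks, then make Oracle read from disk.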
Now let's see what happens when trying to scan the table again. First the NOCACHE option will be set on the table, then a checkpoint.
10:42:45 dv11 SQL> alter table lost_write nocache;

10:43:02 dv11 SQL> alter system checkpoint;

Now try to scan the table again:

10:43:14 ordevdb01.radisys.com - js001292@dv11 SQL> /
select * from lost_write
*
ERROR at line 1:
ORA-00376: file 40 cannot be read at this time
ORA-01110: data file 40: '/u01/oradata/dv11/lost_write.dbf'
A corollary conclusion can be drawn from this example.
If you do discover bad data blocks, you just might be able to export the tables in the affected region before doing any recovery. This may be a belt and suspenders approach, but DBAs are not generally known for taking unnecessary chances when possible data loss is on the line.