This thread on Oracle-L is regarding lost writes on a database.
One suggestion was made to use the exp utility to export the database, thereby determining if there are corrupt blocks in the database due to disk failure. I didn't give it much thought at first, but fellow Oak Table member Mark Farnham got me thinking about it.
Using exp to detect corrupt blocks, or rather, the absence of corrupt blocks may work, but then again, it may not. It is entirely possible to do a full table scan on a table successfully, as would happen during an export, even though the blocks on disk have been corrupted.
This can be demonstrated by building a table, ensuring the contents are cached, then destroying the data in the data file, followed by a successful export of the table.
Granted, there are a lot of mitigating factors that could be taken into consideration as to whether or not this would happen in a production database. That's not the point: the point is that it could happen, so exp is not a reliable indicator of the state of the data files on disk.
This test was performed on Oracle 10.2.0.4 EE on RH Linux ES 4. Both are 32 bit.
First create a test tablespace:
create tablespace lost_write datafile '/u01/oradata/dv11/lost_write.dbf' size 1m
extent management local
uniform size 64k
/
Next the table LOST_WRITE is created in the tablespace of the same name. This will be used to test the assertion that a successful export of the table can be done even though the data on disk is corrupt.
create table lost_write
cache
tablespace lost_write
as
select * from dba_objects
where rownum <= 1000
/
begin
dbms_stats.gather_table_stats(user,'LOST_WRITE');
end;
/
select tablespace_name, blocks, bytes
from user_segments
where segment_name = 'LOST_WRITE'
/
TABLESPACE_NAME BLOCKS BYTES
------------------------------ ---------- ----------
LOST_WRITE 16 131072
1 row selected.
Next, do a full table scan and verify that the blocks are cached:
select * from lost_write;
Verify in cache:
select file#,block#,class#, status
from v$bh where ts# = (select ts# from sys.ts$ where name = 'LOST_WRITE')
order by block#
/
FILE# BLOCK# CLASS# STATUS
---------- ---------- ---------- -------
40 2 13 xcur
40 3 12 xcur
40 9 8 xcur
40 10 9 xcur
40 11 4 xcur
40 12 1 xcur
40 13 1 xcur
40 14 1 xcur
40 15 1 xcur
40 16 1 xcur
40 17 1 xcur
40 18 1 xcur
40 19 1 xcur
40 20 1 xcur
40 21 1 xcur
40 22 1 xcur
40 23 1 xcur
Now swap the bytes in the file, skipping the first 2 oracle blocks
Caveat: I don't know if that was the correct # of blocks, and I didn't spend any time trying to find out
Also, I belatedly saw that count probably should have been 22 rather than 16, but the results still served the purpose of corrupting the datafile, as we shall see in a bit.
What this dd command is doing is using the same file for both input and output, and rewriting blocks 3-18, swapping each pair of bytes.
dd if=/u01/oradata/dv11/lost_write.dbf of=/u01/oradata/dv11/lost_write.dbf bs=8129 skip=2 count=16 conv=swab,notrunc
The effect is demonstrated by this simple test:
jkstill-19 > echo hello | dd
hello
0+1 records in
0+1 records out
[ /home/jkstill ]
jkstill-19 > echo hello | dd conv=swab
ehll
o0+1 records in
0+1 records out
Now we can attempt the export:
exp tables=\(jkstill.lost_write\) ...
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, Data Mining and Real Application Testing options
Export done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set
About to export specified tables via Conventional Path ...
. . exporting table LOST_WRITE 1000 rows exported
Export terminated successfully without warnings.
19 > echo hello | dd
hello
0+1 records in
0+1 records out
[ /home/jkstill ]
jkstill-19 > echo hello | dd conv=swab
ehll
o0+1 records in
0+1 records out
So, even though the data on disk has been corrupted, the export succeeded. That is due to the table being created with the CACHE option, and all the blocks being cached at the time of export. It may not be necessary to use the CACHE option, but I used it to ensure the test would succeed.
Now let's see what happens when trying to scan the table again. First the NOCACHE option will be set on the table, then a checkpoint.
10:42:45 dv11 SQL> alter table lost_write nocache;
10:43:02 dv11 SQL> alter system checkpoint;
Now try to scan the table again:
10:43:14 ordevdb01.radisys.com - js001292@dv11 SQL> /
select * from lost_write
*
ERROR at line 1:
ORA-00376: file 40 cannot be read at this time
ORA-01110: data file 40: '/u01/oradata/dv11/lost_write.dbf'
A corollary conclusion can drawn from this example.
If you do discover bad data blocks, you just might be able to do an export of the tables that are in the affected region before doing any recovery. This might be a belt and suspenders approach, but DBA's are not generally being known for taking unnecessary chances when possible data loss is on the line.
3 comments:
Hi Jared, you raise a valid point, but assuming that RMAN can perform a valid backup (i.e. verifying that data blocks can be physically read) and then you attempt an export (i.e. verifying that the data can be logically read) I think you're in pretty good shape. dbverfiy could also be used, but it doesn't have the best reputation. I think what you really want is a direct read of all of your data blocks without the worry that corruption will occur during your check. Export can perform direct reads, but there have been problems with it.
Hi again, I ran your test on 11.1.0.7 and interestingly enough an export direct exported 593 rows, in my case, and reported no errors for the missing rows. An RMAN backup could not read the file header. The rest of your tests behaved the same. So, it looks like export fails all the way around. I'm wondering if the dd had only touched one block in the middle of the file if the results would have been the same. Unfortunately I don't have the time now to try it out.
Thanks for running more tests Andy. There were several others I could have run as well, but ran out of time also.
For that test, I was just trying to prove that the results found were indeed possible.
Post a Comment