When faced with error messages like ORA-1578 Oracle database users just got the task to handle a database block corruption. I’ve already written about this topic before and today I’d like to share some tips and tricks to get this task done quicker.
Hint #1 – Be prepared
I cannot restate this more often – but the in and out of DB administration is to actually know what you’re doing before you do it.
Have a plan, think it through and test it in a non-production environment.
Once you are sure it works – try to break it (ask any untrained co-worker, usually this helps much…).
Once your procedures are “fool-proof”: employ it on the QA-system and try them out there.
Finally, implement your procedures in the production environment and check if they work there too.
As soon as you get a new system with some new functions – try out if your procedures still hold.
Hint #2 – RMAN verification
The very first step in getting corruptions out of the system is the complete database verification.
As all available standard tools to check an Oracle database (analyze, dbverify/rman, full export) check for slightly different errors it’s highly advisable to perform all of them. Anyhow, except usually dbverify/rman the checks take quite some time and can even lead to problems in the production system.
Therefore I suggest starting with the least disturbing check: verification via RMAN.
This feature is available now for some years and had been integrated into the brtools last year.
The big advantage of using RMAN over using DBV is that RMAN uses the same I/O access (DirectIO, AsyncIO etc.) as the Oracle Server processes do and that the output comes in a much more convenient way – via an Oracle view.
The verification via RMAN can either be started via the menus of BRTOOLS (I highly recommend that!) or via command line. The command checks the tablespace EXAMPLE with RMAN via brbackup it could look like this:
brbackup -u / -t online -w only_rmv -m example
Let’s compare the output of DBV and RMAN for a data file that has corruptions in it:
DBV via BRTOOLS:
DBVERIFY: Release 10.2.0.2.0 - Production on Mon Apr 14 15:39:24 2008
Copyright (c) 1982, 2005, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = C:\\ORACLE\\TDB\\EXAMPLE.DATA1
DBV-00201: Block, DBA 16777741, marked corrupt for invalid redo application
DBV-00201: Block, DBA 16777742, marked corrupt for invalid redo application
DBV-00201: Block, DBA 16777743, marked corrupt for invalid redo application
[... omitting 50 lines ...]
DBV-00201: Block, DBA 16777996, marked corrupt for invalid redo application
DBV-00201: Block, DBA 16777997, marked corrupt for invalid redo application
DBV-00201: Block, DBA 16777998, marked corrupt for invalid redo application
DBV-00201: Block, DBA 16777999, marked corrupt for invalid redo application
DBV-00201: Block, DBA 16778000, marked corrupt for invalid redo application
DBV-00201: Block, DBA 16778001, marked corrupt for invalid redo application
DBVERIFY - Verification complete
Total Pages Examined : 6400 Total Pages Processed (Data) : 415 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 49 Total Pages Failing (Index): 0 Total Pages Processed (Other): 264 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 5672 Total Pages Marked Corrupt : 123 Total Pages Influx : 0 Highest block SCN : 20622347 (0.20622347)
RMAN via BRTOOLS:
BR0568I Verifying C:\\ORACLE\\TDB\\EXAMPLE.DATA1 using RMAN...
BR0398E RMAN detected corrupted blocks in C:\\ORACLE\\TDB\\EXAMPLE.DATA1
BR0548I Please check Oracle alert log C:\\ORACLE\\TDB\\saptrace\\background\\alert_TDB.log for further information about this error
Let’s ignore the Alert.log for now but rather check the database view V$COPY_CORRUPTION:
SQL> set linesize 130 SQL> select * from v$backup_corruption;
RECID STAMP SET_STAMP SET_COUNT PIECE# FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# MAR CORRUPTIO ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------------------ --- --------- 1 652022089 652022088 42 1 4 525 7 20596264 NO LOGICAL 2 652027964 652027963 53 1 4 525 7 20619321 NO LOGICAL 3 652030881 652030880 54 1 4 525 4 20619321 NO LOGICAL 4 652030881 652030880 54 1 4 549 4 20620431 NO LOGICAL 5 652030881 652030880 54 1 4 589 3 20622009 NO LOGICAL 6 652030881 652030880 54 1 4 597 3 20622013 NO LOGICAL 7 652030881 652030880 54 1 4 605 3 20622017 NO LOGICAL 8 652030881 652030880 54 1 4 613 3 20622021 NO LOGICAL 9 652030881 652030880 54 1 4 621 3 20622025 NO LOGICAL 10 652030881 652030880 54 1 4 637 12 20622285 NO LOGICAL 11 652030881 652030880 54 1 4 650 8 20622289 NO LOGICAL 12 652030881 652030880 54 1 4 669 12 20622293 NO LOGICAL 13 652030881 652030880 54 1 4 682 8 20622297 NO LOGICAL 14 652030881 652030880 54 1 4 701 12 20622309 NO LOGICAL 15 652030881 652030880 54 1 4 714 8 20622311 NO LOGICAL 16 652030881 652030880 54 1 4 733 12 20622317 NO LOGICAL 17 652030881 652030880 54 1 4 746 8 20622321 NO LOGICAL 18 652030881 652030880 54 1 4 765 12 20622331 NO LOGICAL 19 652030881 652030880 54 1 4 778 8 20622333 NO LOGICAL
So, instead of having one error message line per corrupt block with a reference to a dba (database block address, used Oracle internal) or rdba (relative database block address) we now have one single line for all corrupt blocks that are ‘neighbours’ to each other.
We also see what kind of corruption was found – in this case, ‘LOGICAL’ which means, that the data has been marked corrupt for the sake of data consistency. In this example, it was due to a recovery of NOLOGGING blocks.
Furthermore, we see at which SCN the block was marked corrupt, which may help to find out up to which point-in-time changes could have happened to that block:
SQL> select scn_to_timestamp (corruption_change#) from v$copy_corruption where recid=1;
SCN_TO_TIMESTAMP(CORRUPTION_CHANGE#) --------------------------------------------------------------------------- 14-APR-08 12.52.05.000000000 PM
Hint #3 – Materialize DBA_EXTENTS
Once we know all corrupted blocks it is necessary to figure out, which objects are stored in them.
Usually, this is done with a query against the DBA_EXTENTS view. This is fine as long as you’ve only a small number of corrupt blocks.
But as soon as you’re faced with several hundred blocks running the queries one by one, this can be very time wasting.
The time is usually spent a) in setting up the statements from the DBV output and b) waiting for the query to finish as the DBA_EXTENTS view is known to be not the fastest one on earth.
We will try to speed up both a) and b).
Point b) is a bit easier so let’s start with that. The DBA_EXTENTS view is notoriously slow for different reasons.
Since we don’t need the most current version of the view for each select we’re going to run, we can materialize the table and index it:
SQL> create table TMP_EXTENTS tablespace SAPDATA as 2 (select owner, segment_name, partition_name, segment_type, file_id, block_id, blocks 3 from dba_extents where rownum<1);
Table created. Elapsed: 00:00:00.39
SQL> insert /*+append*/ into tmp_extents 2 (select owner, segment_name, partition_name, segment_type, file_id, block_id, blocks 3 from dba_extents);
4695 rows created.
SQL> create index i_tmp_ext on tmp_extents ( file_id, block_id, blocks) tablespace sapdata; Index created.
Elapsed: 00:00:00.34
SQL> analyze table tmp_extents compute statistics; Table analyzed.
Elapsed: 00:00:00.53
As you see this takes some time, but it’s still quicker than running hundreds of queries against DBA_EXTENTS. Let’s check that…
Original Query (from note #365481):
SQL> select owner, segment_name, partition_name, segment_type, block_id, blocks 2 from dba_extents 3 where (525 between 4 block_id and (block_id + blocks - 1)) 5 and file_id = 4 6 and rownum < 2;
no rows selected Elapsed: 00:00:09.99
Query against the TMP_EXTENTS:
SQL> select owner, segment_name, partition_name, segment_type, block_id, blocks 2 from tmp_extents 3 where (525 between 4 block_id and (block_id + blocks - 1)) 5 and file_id = 4 6 and rownum < 2;
no rows selected Elapsed: 00:00:00.02
In this case, no segment was found and the block is not a tablespace header block, therefore the block in question is free space. Freespace corruptions is usually nothing to worry about since the blocks are reformatted the next time they’re used.
ATTENTION: Be aware that the TMP_EXTENTS won’t get updated … so make sure you rebuild it before using it. And afterwards: don’t forget to drop it again!
To make point a) quicker see the next hint …
Hint #4 – Use SQL to get a list of corrupt objects
After we have materialized the DBA_EXTENTS and used RMAN to fill the V$BACKUP_CORRUPTION it would not be too clever to start running the statement (s.a.) for each block one for one.
If you use DBV to verify your database the following won’t work!
Instead, we can have all the blocks looked up at once.
Just join both V$BACKUP_CORRUPTION and TMP_EXTENTS like this:
SQL> set linesize 130 SQL> set pagesize 100 SQL> col owner for a8 SQL> col segment_name for a20 SQL> col partition_name for a20
SQL> select cc.recid, te.owner, te.segment_name, te.partition_name, te.segment_type, te.block_id, te.blocks 2 from tmp_extents te, 3 v$backup_corruption cc 4 where (cc.block# between te.block_id and (te.block_id + te.blocks - 1) ) 5* and cc.file# = te.file_id
RECID OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE BLOCK_ID BLOCKS ---------- -------- -------------------- -------------------- ------------------ ---------- ---------- 3 SAPR3 F_DATA_BM_TIME P6 INDEX PARTITION 521 8 2 SAPR3 F_DATA_BM_TIME P6 INDEX PARTITION 521 8 1 SAPR3 F_DATA_BM_TIME P6 INDEX PARTITION 521 8 4 SAPR3 F_DATA_BM_CW P3 INDEX PARTITION 545 8 5 SAPR3 F_DATA_BM_PACKET P2 INDEX PARTITION 585 8 6 SAPR3 F_DATA_BM_PACKET P3 INDEX PARTITION 593 8 7 SAPR3 F_DATA_BM_PACKET P4 INDEX PARTITION 601 8 8 SAPR3 F_DATA_BM_PACKET P5 INDEX PARTITION 609 8 9 SAPR3 F_DATA_BM_PACKET P6 INDEX PARTITION 617 8 10 SAPR3 F_DATA_INDEX1 P2 INDEX PARTITION 633 8 11 SAPR3 F_DATA_INDEX1 P2 INDEX PARTITION 649 8 12 SAPR3 F_DATA_INDEX1 P3 INDEX PARTITION 665 8 13 SAPR3 F_DATA_INDEX1 P3 INDEX PARTITION 681 8 14 SAPR3 F_DATA_INDEX1 P4 INDEX PARTITION 697 8 15 SAPR3 F_DATA_INDEX1 P4 INDEX PARTITION 713 8 16 SAPR3 F_DATA_INDEX1 P5 INDEX PARTITION 729 8 17 SAPR3 F_DATA_INDEX1 P5 INDEX PARTITION 745 8 18 SAPR3 F_DATA_INDEX1 P6 INDEX PARTITION 761 8 19 SAPR3 F_DATA_INDEX1 P6 INDEX PARTITION 777 8
19 rows selected.
If we just want to see one line per segment – no problem:
SQL> select distinct te.owner, te.segment_name, te.partition_name, te.segment_type 2 from tmp_extents te, 3 v$backup_corruption cc 4 where (cc.block# between te.block_id and (te.block_id + te.blocks - 1) ) 5 and cc.file# = te.file_id 6 order by te.owner, te.segment_name, te.partition_name, te.segment_type;
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE -------- -------------------- -------------------- ------------------ SAPR3 F_DATA_BM_CW P3 INDEX PARTITION SAPR3 F_DATA_BM_PACKET P2 INDEX PARTITION SAPR3 F_DATA_BM_PACKET P3 INDEX PARTITION SAPR3 F_DATA_BM_PACKET P4 INDEX PARTITION SAPR3 F_DATA_BM_PACKET P5 INDEX PARTITION SAPR3 F_DATA_BM_PACKET P6 INDEX PARTITION SAPR3 F_DATA_BM_TIME P6 INDEX PARTITION SAPR3 F_DATA_INDEX1 P2 INDEX PARTITION SAPR3 F_DATA_INDEX1 P3 INDEX PARTITION SAPR3 F_DATA_INDEX1 P4 INDEX PARTITION SAPR3 F_DATA_INDEX1 P5 INDEX PARTITION SAPR3 F_DATA_INDEX1 P6 INDEX PARTITION
12 rows selected.
That’s a lot more convenient – isn’t it?
Hint #5 – Rebuild NOLOGGING Indexes half-automatically
Many BW users encounter corruption errors like DBV-200 or DBV-201 after they recover index-tablespaces. The reason for that is the fact that the NOLOGGING feature is usually used by BI to speed up things a bit. The downside is of course that the indexes need to be recreated if they had been build after the last data backup was taken.
There are several ways to accomplish that, but the most simple one is: use the BRTools to restore and recover your database.´
In the current versions, the BRTools are aware of the NOLOGGING feature and what it implicates. So BRRECOVER tries to figure out which BW-indexes need to be rebuilt and starts just that.
Another way is to create the rebuild statements for those Indexes that have been created after the last data backup and that are either NOLOGGING and/or have local partitions that are flagged NOLOGGING.
That could be done with a statement similar to this:
!! ATTENTION: STATEMENT CORRECTED !! (17.06.2008)
select distinct 'alter index "'|| o.owner || '"."' || o.object_name || '" rebuild ' || decode(i.partition_name, NULL, '', ' PARTITION "'|| i.partition_name ||'"' ) ||' online nologging;' as rebuild_list from (select owner, object_name, subobject_name from dba_objects where (( object_type = 'INDEX PARTITION' and subobject_name is not null) or (object_type ='INDEX' and subobject_name is null)) and created > to_date('2008-04-14-11:58:00', 'YYYY-MM-DD-HH24:MI:SS') ) o, (select id.owner, id.index_name, ip.partition_name from dba_indexes id, dba_ind_partitions ip where 'NO' in (ip.logging, id.logging) and id.owner = ip.index_owner (+) and id.index_name = ip.index_name (+)) i where o.owner = i.owner and o.object_name = i.index_name and ( o.subobject_name = i.partition_name or o.subobject_name is null);
REBUILD_LIST ---------------------------------------------------------------------------------- alter index "SAPR3"."F_DATA_BM_TIME" rebuild PARTITION "P5" online nologging; alter index "SAPR3"."F_DATA_BM_CW" rebuild PARTITION "P5" online nologging; alter index "SAPR3"."F_DATA_BM_CW" rebuild PARTITION "P6" online nologging; alter index "SAPR3"."F_DATA_BM_PACKET" rebuild PARTITION "P4" online nologging; alter index "SAPR3"."F_DATA_INDEX1" rebuild PARTITION "P2" online nologging; alter index "SAPR3"."F_DATA_INDEX1" rebuild PARTITION "P3" online nologging; alter index "SAPR3"."F_DATA_BM_PACKET" rebuild PARTITION "P1" online nologging; alter index "SAPR3"."F_DATA_INDEX1" rebuild PARTITION "P6" online nologging; alter index "SAPR3"."F_DATA_INDEX1" rebuild PARTITION "P1" online nologging; alter index "SAPR3"."F_DATA_BM_CW" rebuild PARTITION "P2" online nologging; alter index "SAPR3"."F_DATA_BM_CW" rebuild PARTITION "P1" online nologging; alter index "SAPR3"."F_DATA_BM_PACKET" rebuild PARTITION "P6" online nologging; alter index "SAPR3"."F_DATA_BM_TIME" rebuild PARTITION "P6" online nologging; alter index "SAPR3"."F_DATA_BM_PACKET" rebuild PARTITION "P2" online nologging; alter index "SAPR3"."F_DATA_BM_PACKET" rebuild PARTITION "P3" online nologging; alter index "SAPR3"."F_DATA_BM_TIME" rebuild PARTITION "P2" online nologging; alter index "SAPR3"."F_DATA_BM_TIME" rebuild PARTITION "P4" online nologging; alter index "SAPR3"."F_DATA_BM_CW" rebuild PARTITION "P4" online nologging; alter index "SAPR3"."F_DATA_BM_CW" rebuild PARTITION "P3" online nologging; alter index "SAPR3"."F_DATA_INDEX1" rebuild PARTITION "P4" online nologging; alter index "SAPR3"."F_DATA_INDEX1" rebuild PARTITION "P5" online nologging; alter index "SAPR3"."F_DATA_BM_TIME" rebuild PARTITION "P3" online nologging; alter index "SAPR3"."F_DATA_BM_TIME" rebuild PARTITION "P1" online nologging; alter index "SAPR3"."F_DATA_BM_PACKET" rebuild PARTITION "P5" online nologging;
24 rows selected.
Note: ‘2008-04-14-11:58:00’ is the timestamp of the last data backup taken.
You can just copy and paste the commands or have them written to a spool file and start this as a script.
Hint #6 – get prepared (see hint #1)
I’ve shown you a few things that worked out for me when solving customer messages.
So go ahead and try out, if they work for you as well.
Try to create some corruptions on your test database e.g. restore the backup of the BW instance from two weeks ago. Try to recover it, take timings (set timing on works fine in sqlplus!), try to make it work.
DISCLAIMER:
The things I showed you here are for educational purposes only. Should you encounter corruptions in your production databases please do open a support message.´
The supported procedures to handle corruptions are described in these notes:
#540463 – FAQ: Consistency Checks + Block Corruptions
#365481 – Block corruptions
#023345 – Consistency check of ORACLE database
Once you want to do something different (like the techniques described above), please do check with SAP support before implementing it.
Best regards and happy corruption handling,
Lars