Wednesday, 11 May 2011

How to find Master Node in Oracle RAC

 

I have seen many users asking how to find “Master node” in Oracle RAC, let me clear some of their doubts.

There are two types of Masters in Oracle RAC, one is Mater node at Oracle Clusterware level and other is Master node for specific resource or block or object.

The node which gets the active state during startup is authorized to be a master node by Cluster Synchronization Service.

Run the below command to find which node is master at Clusterware level
$cat $ORA_CRS_HOME/log/`hostname`/cssd/ocssd* |grep master
or
$ for x in `ls -tr $ORA_CRS_HOME/log/`hostname`/cssd/ocssd* `; do grep -i "master node" $x ; done | tail -1

The OCR Automatic backups are taken only by master node. If the Master fails, the OCR backups will be created on the new Master. The Master node which has OCR backups goes down due to failure then we cannot be recover the OCR that’s why Oracle recommends taking backups using “ocrconfig” and also integrating OCR backups with backup strategy.

Run the below command to find which node is OCR Master and taking automatic backups.
$ocrconfig –showbackup
testrac02 2010/08/30 16:29:52 /oracle/crs/cdata/crs
testrac02 2010/08/30 16:29:52 /oracle/crs/cdata/crs
testrac02 2010/08/30 12:29:49 /oracle/crs/cdata/crs
testrac02 2010/08/30 08:29:46 /oracle/crs/cdata/crs
testrac02 2010/08/29 00:29:23 /oracle/crs/cdata/crs

The block level masters are used by Cache fusion while transferring the block. Any node can become the master node of a particular block and you can also see which node acting as master in V$GES_RESOURCE table (MASTER_NODE column)

You can manually remaster an object with oradebug command:
SQL> oradebug lkdebug -m pkey "object_id"

Monday, 9 May 2011

Why is excessive redo generated during an Online/Hot Backup ......

There is not excessive redo generated, there is additional information logged into
the online redo log during a hot backup the first time a block is modified in a
tablespace that is in hot backup mode. 

in hot backup mode only 2 things are different:

o the first time a block is changed in a datafile that is in hot backup mode, the ENTIRE
BLOCK is written to the redo log files, not just the changed bytes.  Normally only the
changed bytes (a redo vector) is written. In hot backup mode, the entire block is logged
the FIRST TIME.  This is because you can get into a situation where the process copying
the datafile and DBWR are working on the same block simultaneously.  Lets say they are
and the OS blocking read factor is 512bytes (the OS reads 512 bytes from disk at a time).
The backup program goes to read an 8k Oracle block.  The OS gives it 4k.  Meanwhile --
DBWR has asked to rewrite this block.  the OS schedules the DBWR write to occur right
now.  The entire 8k block is rewritten.  The backup program starts running again
(multi-tasking OS here) and reads the last 4k of the block.  The backup program has now
gotten an impossible block -- the head and tail are from two points in time.  We cannot
deal with that during recovery.  Hence, we log the entire block image so that during
recovery, this block is totally rewritten from redo and is consistent with itself at
least.  We can recover it from there.

o the datafile headers which contain the SCN of the last completed checkpoint are NOT
updated while a file is in hot backup mode.  This lets the recovery process understand
what archive redo log files might be needed to fully recover this file.

To limit the effect of this additional logging, you should ensure you only place one
tablepspace at a time in backup mode and bring the tablespace out of backup mode as soon
as you have backed it up.  This will reduce the number of blocks that may have to be
logged to the minimum possible.
---by tom

Thursday, 5 May 2011

How much Recovery is enough Recovery

Long ago in one of the non-technical group discussions we had to decide ‘How much money is enough money?’ Obviously no answer is enough answer. Fortunately in Oracle, we know how much recovery is just sufficient to OPEN the DB.

During the recovery of the DB, when the archives are being applied one after other and another, is it possible to definitively confirm that the most recently applied archive was the last one required to be applied and it is safe thereafter to open the DB in RESETLOGS?
There should be several ways of identifying where to stop, and here are three that I can think of:
  1. Alert log – If the backup was hot, finding out the last occurrence of END BACKUP and its corresponding SEQ# should tell us where to stop. Usually we stop after applying the above identified SEQ# and the DB should then be good to OPEN RESETLOGS.
  2. RMAN log – If the backup was taken using RMAN, its log also shows the SEQ# where the backup ends. We can stop applying archives once we reach the max SEQ#
  3. Query the Data Dictionary – Query the FHSTA column in X$KCVFH table during recovery to know if the recovery was sufficient or does it need more archives.
As I said, there must be other ways too.
I prefer option (3) because the information is right there in the DB to be queried for, anytime.

Consider a typical scenario where a user, say Lucy, is recovering the DB from the hot backup of the source database. She has already applied 30 archives and swears that she hasn’t seen another hungrier database. She has no access to the alert log of the source database to check how many more archives are to be applied. After each archive log has been applied, she CANCELs the recovery and gets the below error:
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
CANCEL
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: ‘/u11/data/PRD/PRD_SYSTEM_01.DBF’
The better way to confirm if the recovery was sufficient or not is to query the data dictionary. The SQL is:
SELECT hxfil FILENUMBER, fhsta STATUS, fhscn SCN, fhrba_seq SEQUENCE FROM x$kcvfh;
SQL> select hxfil FILENUMBER, fhsta STATUS, fhscn SCN, fhrba_seq SEQUENCE from x$kcvfh


The output will look like above.
The STATUS (or fhsta) column has the value of 1 which means the datafile with FILENUMBER needs more recovery. This is the time to apply more archives.
The STATUS is 0 when the recovery is sufficient and it is safe to OPEN the database.
The fhsta can have one of the below values:
0     – DB is consistent. No more recovery required.
1     – DB needs more recovery. It’s time to apply more archives.
4     – DB is in a FUZZY state. Was the backup good?
8192  -
8196  -
I have come across 8192 and 8196 statuses too, but I don’t know their significance yet.
When the archives have been sufficiently applied, the fhsta column will be zero and looks like below:

At this stage, it is possible to cancel the recovery and bring up the DB in RESETLOGS.

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
CANCEL
Media recovery cancelled.
SQL>
SQL> alter database open resetlogs;
Database altered.
The table X$KCVFH is a fixed table in Oracle. It stores the file headers along with their statuses. The name is derived from:
K     – Kernel layer
C     – Cache layer
V     – RecoVery component
FH    - File Header
Hope this post was useful.