ext3 – beware of periodic file system checks

Users running Oracle on Red Hat or Oracle Enterprise Linux will most probably use ext3 as the default file system. It is robust, well maintained, a journaling file system and supports file system sizes of up to 32 TB.

Database servers must be “highly available” – so they are rebooted infrequently, if ever.

Even though ext3 is a journaling file system, there are periodic full file system checks which scan the entire file system just like ext2 did after a crash. On today’s large file systems this can take hours. That is especially bad if you just “quickly” rebooted the server after applying a patch and are now facing a long-running file system check.

Many users complained about this behavior, arguing that ext3 is a journaling file system and should run without periodic checks. On the other hand this is a safety feature: even with journaling, corruptions may be introduced and remain undetected for a long time.

So let’s take a look at the defaults chosen on Oracle Enterprise Linux 5 Update 3 when creating an ext3 file system:

[root@rac1 ~]# mkfs.ext3 /dev/asm/testvol1-132
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
 32768, 98304, 163840, 229376

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

The last two lines are the ones of interest: “This filesystem will be automatically checked every 22 mounts or 180 days, whichever comes first”. On database servers reaching 22 mounts takes a very long time, but 180 days pass rather quickly. So administrators should adjust these values accordingly. Turning the checks off completely is possible, but it means giving up a layer of data protection.

You can check the mount count, maximum mount count and check interval with the following command:

tune2fs -l /dev/asm/testvol1-132
[..]
Mount count:              0
Maximum mount count:      22
[..]
Last checked:             Thu Oct  1 10:44:53 2009
Check interval:           15552000 (6 months)
Next check after:         Tue Mar 30 10:44:53 2010

You are free to change these values with tune2fs:

[root@rac1 ~]# tune2fs -c 100 /dev/asm/testvol1-132
tune2fs 1.39 (29-May-2006)
Setting maximal mount count to 100

[root@rac1 ~]# tune2fs -i 360 /dev/asm/testvol1-132
tune2fs 1.39 (29-May-2006)
Setting interval between checks to 31104000 seconds

…verifying:

tune2fs -l /dev/asm/testvol1-132
Filesystem created:       Thu Oct  1 10:44:53 2009
Last mount time:          n/a
Last write time:          Thu Oct  1 11:02:09 2009
Mount count:              0
Maximum mount count:      100
Last checked:             Thu Oct  1 10:44:53 2009
Check interval:           31104000 (12 months)
Next check after:         Sun Sep 26 10:44:53 2010

So based on this information:

To which values should the maximum mount count and the check interval be set?

The actual values depend entirely on your requirements, but they should be set to different values for different volumes. This prevents a full check of all your volumes at once – a small sketch of how to stagger the values follows below.
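
The following is only a sketch with made-up device names – adjust the device list and the value ranges to your environment:

i=0
for dev in /dev/vg_data/lv_u02 /dev/vg_data/lv_u03 /dev/vg_data/lv_u04; do
    i=$((i + 1))
    # stagger the limits: 100/110/120 mounts and 360/390/420 days
    tune2fs -c $((90 + i * 10)) -i $((330 + i * 30))d "$dev"
done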

Storing Oracle database connection strings in OpenLDAP instead of OID

Storing Oracle database connection strings in OpenLDAP – or – “Oracle Names for the masses”

Typically Oracle stores its database connection strings in a file called TNSNAMES.ORA. Besides storing every database connection string in this file there are other naming methods such as EZCONNECT, Oracle Names (“ONAMES”, deprecated with 10g and later) or LDAP.

I’ve seen a lot of users with a centralized tnsnames.ora file placed on a CIFS share and accessed by every Oracle client. Using such a technique with Unix hosts is not straightforward (I am no friend of having CIFS shares mounted on database servers) and often not possible due to security restrictions. Sometimes I have also seen the tnsnames.ora distributed by using rsync or just copied on demand.

From my point of view there is a superior solution for this problem: LDAP naming – in this case with OpenLDAP instead of OID.


Oracle 11g Release 2 – SSH equivalence gone after reboot

Today I came across a strange phenomenon:

I set up SSH equivalence during database installation, but after rebooting the system some tools complained about misconfigured SSH equivalence. So I dug a little deeper and found out that sshd was complaining about wrong directory permissions:

Sep 28 13:57:03 rac1 sshd[31620]: Authentication refused: bad ownership or modes for directory /u01/app/oracle/product/11.2.0/ora11p

The directory mentioned here is a shared Oracle home on ACFS and at the same time the home directory of the oracle user. After changing the directory mode to 755, SSH equivalence was restored. But after stopping and starting the ACFS resource the equivalence was gone again.

So I investigated a little further.

Let’s first check the directory permissions with the ACFS resource not started and the volume not mounted:

[root@rac1 ~]# ll /u01/app/oracle/product/11.2.0/
total 4
drwxr-xr-x 2 ora11p dba 4096 Sep 13 00:02 ora11p
[root@rac1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              26G   13G   12G  52% /
/dev/sda1              99M   12M   83M  13% /boot
tmpfs                 1.9G  164M  1.7G   9% /dev/shm

The directory mode is 755.

Now let’s mount the ACFS volume manually and check the permissions again:

[root@rac1 ~]# mount /dev/asm/ora11p_home-132 /u01/app/oracle/product/11.2.0/ora11p/ -t acfs
[root@rac1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              26G   13G   12G  52% /
/dev/sda1              99M   12M   83M  13% /boot
tmpfs                 1.9G  164M  1.7G   9% /dev/shm
/dev/asm/ora11p_home-132
 10G  4.7G  5.4G  47% /u01/app/oracle/product/11.2.0/ora11p
[root@rac1 ~]# ll /u01/app/oracle/product/11.2.0/
total 16
drwxrwx--- 83 ora11p dba 12288 Sep 25 10:37 ora11p

770 – still no surprise. So we change it to 755 as required by SSH:

[root@rac1 ~]# chmod 755 /u01/app/oracle/product/11.2.0/ora11p
[root@rac1 ~]# ll /u01/app/oracle/product/11.2.0/
total 16
drwxr-xr-x 83 ora11p dba 12288 Sep 25 10:37 ora11p

The directory now shows the required permissions. We will unmount and mount the file system again:

[root@rac1 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda3             26306612  12828996  12119724  52% /
/dev/sda1               101086     11683     84184  13% /boot
tmpfs                  1933304    167016   1766288   9% /dev/shm
/dev/asm/ora11p_home-132
 10485760   4864328   5621432  47% /u01/app/oracle/product/11.2.0/ora11p
[root@rac1 ~]# umount /u01/app/oracle/product/11.2.0/ora11p
[root@rac1 ~]#
[root@rac1 ~]# mount /dev/asm/ora11p_home-132 /u01/app/oracle/product/11.2.0/ora11p/ -t acfs           
[root@rac1 ~]# ll /u01/app/oracle/product/11.2.0/
total 16
drwxr-xr-x 83 ora11p dba 12288 Sep 25 10:37 ora11p

There we are: permission mode 755 is consistent across mounts. So let’s unmount everything and start the ACFS resource again:

[root@rac1 ~]# umount /u01/app/oracle/product/11.2.0/ora11p
[root@rac1 ~]#
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl start resource  ora.data2.ora11p_home.acfs
CRS-2672: Attempting to start 'ora.data2.ora11p_home.acfs' on 'rac1'
CRS-2672: Attempting to start 'ora.data2.ora11p_home.acfs' on 'rac2'
CRS-2676: Start of 'ora.data2.ora11p_home.acfs' on 'rac2' succeeded
CRS-2676: Start of 'ora.data2.ora11p_home.acfs' on 'rac1' succeeded
[root@rac1 ~]# ll /u01/app/oracle/product/11.2.0/
total 16
drwxrwx--- 83 ora11p dba 12288 Sep 25 10:37 ora11p

The directory mode was changed back from 755 to 770… but by which component and configuration?

So far I have not found a solution for this behavior. I will open an SR and see what Oracle thinks about this “feature”.

In the meantime, to get rid of this behavior, you can set “StrictModes no” in sshd_config to prevent sshd from checking directory ownership and modes.
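
For example – a minimal sketch assuming the stock configuration path on OEL/RHEL 5 (check the file first; if no StrictModes line exists at all the sed does nothing and you have to add the directive manually):

sed -i 's/^#\{0,1\}StrictModes.*/StrictModes no/' /etc/ssh/sshd_config   # adjust the directive
grep StrictModes /etc/ssh/sshd_config                                    # verify the change
service sshd reload                                                      # reload sshd to apply it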

Partition layout on Linux – Recommendations

Partition recommendations for Oracle on Linux

Every system engineer and even every DBA comes across the following question from time to time: “How do I arrange my partitions for running Oracle on Linux?”

In the following I am blogging my thoughts and recommendations on partition/storage design on the Linux operating system for Oracle. My opinion is not the only valid one and you are free to develop your own standards, but maybe the following lines help you.

Which file system shall I use?

Metalink has several notes on which file systems can or should be used for Oracle on Linux. Metalink note 236826.1 states:

------- Begin Quote ---------

Supported File Systems

One of the most interesting features of the Linux OS is its support for multiple and various file systems. File systems can be defined and built on a partition basis. VFAT, ext2, ext3 and Reiser file systems can co-exist on the same Linux system, along with several other file systems and raw partitions (Note 224302.1 – Raw Devices on RedHat Advanced Server – FAQ).

Your choice of which one to use then becomes based on supportability, reliability, security and performance. Oracle generally does not certify file systems and does certify operating systems, but Linux is a specific case. On different Linux distributions, Oracle might choose to have certifications on different filesystems. When a problem is encountered, if a non-supported filesystem is being used for Oracle data, control and redo log files, Oracle will try to reproduce the problem on a supported and certified filesystem. If the problem does not reproduce on a certified filesystem, the specific filesystem / O/S vendor should be engaged for resolution.

The current support includes ext2, ext3 and NFS-based storage systems (e.g. NetApp). To be acceptable for database usage, the file system must support certain minimum functions such as write acknowledgement, write-through cache and file permissions. Counter examples are simple NFS, which lacks acknowledge and VFAT, which lacks file permissions.

Recommended File Systems

There are various file systems available for Linux OS:

– The ext2 and ext3 file systems are robust and fully supported. ext2 was the default file system under the 2.2 kernel. ext3 is simply the enhanced ext2 filesystem having journaling feature. ext3 is the default filesystem for RHEL3 and 4.

– Oracle Cluster File System (OCFS) is a shared file system designed specifically for Oracle Real Application Cluster (RAC). OCFS eliminates the requirement for Oracle database files to be linked to logical drivers. OCFS volumes can span one shared disk or multiple shared disks for redundancy and performance enhancements (Note 224586.1). OCFS2 is the next generation of the Oracle Cluster File System for Linux. It is an extent based, POSIX compliant file system.

– Unlike the previous release (OCFS), OCFS2 is a general-purpose file system that can be used for shared Oracle home installations making management of Oracle Real Application Cluster (RAC) installations even easier.

In summary, the recommended filesystems are:

– Single node: Any filesystem that is supported by the Linux vendor. Note that any filesystem issues need to be resolved by the Linux vendor.

– Multi-node (RAC): OCFS, raw, NFS-based storage systems (e.g. NetApp). (See Note 329530.1, Note 423207.1 and Note 279069.1 for GFS support)

------- End Quote ---------

From the author’s point of view the following file systems / technologies should be used for Oracle on Linux:

  • single node oracle
    • ext3
    • ASM
  • multi-node (RAC) oracle
    • ASM
    • NFS

Today’s hard disk sizes always call for a journaling file system. Using a non-journaling file system (like ext2) will create problems after a system crash, when checking the file system on larger hard disks can take several hours. That is completely avoidable. Using ext4 is not recommended for stability reasons and its reliability has yet to be proven; in addition, ext4 is currently not supported anyway. XFS is only supported on SuSE Linux Enterprise Server and according to Note 414673.1 it cannot use asynchronous and direct I/O simultaneously. In addition, the same note states that users should not use XFS due to an undocumented bug which can lead to redo log corruptions. At the time of writing the future of ReiserFS is uncertain since Hans Reiser (the developer of the file system) was sent to jail for murder.

So ext3 is currently the only viable choice for Oracle on Linux: it is quite stable and – most important for use with LVM – supports online file system growth. For shrinking, the file system still needs to be unmounted. Please note that ext3 might force a full file system check every X months or every Y mounts; you can tune this behavior with “tune2fs”.

For clusters I do not recommend OCFS because from my point of view ASM is the future and the more advanced piece of technology. With 11g Release 2 Oracle extended ASM to a complete volume manager and added a cluster file system (ACFS) on top of it. You can read more about that topic here.

LVM – the logical volume manager

What is a logical volume manager?

Using a so-called “logical volume manager” – LVM for short – enables you to abstract disk devices from file systems or partitions. An LVM takes one or more disks and forms a storage unit out of them. This unit is named differently depending on the LVM being used; the Linux LVM calls it a “volume group”.

The following picture illustrates how the LVM works:

[Figure: physical volumes are grouped into a volume group, from which logical volumes are exported]

As you can see, the LVM takes one or more disks (“physical devices” or “physical volumes”) and forms a so-called “volume group”. There can be one or more volume groups, and a disk always belongs to exactly one volume group. From the volume group, storage is exported as “logical volumes”. On top of a logical volume a file system can be created and mounted, or the logical volume can be used directly as a raw device.

More information can be found in the Linux LVM HOWTO.
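
To make this concrete, here is a minimal sketch using made-up device and volume names – two partitions become a volume group, out of which a logical volume with an ext3 file system is carved:

pvcreate /dev/sdb1 /dev/sdc1             # register the partitions as physical volumes
vgcreate vg_data /dev/sdb1 /dev/sdc1     # form a volume group out of them
lvcreate -n lv_data -L 50G vg_data       # export a 50 GB logical volume
mkfs.ext3 /dev/vg_data/lv_data           # put an ext3 file system on top of it
mount /dev/vg_data/lv_data /data         # ... and mount it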

Features

  • Snapshots
  • Online-Resize (growing and shrinking) of logical volumes (not the file systems!)
  • supports multipathing
  • supports several RAID levels (concat, raid-0, raid-1, raid-5)

Restrictions / Disadvantages

Every recent Linux distribution (Red Hat ES/AS 4, Red Hat ES/AS 5, OEL 5, SuSE Linux Enterprise Server 10, SuSE Linux Enterprise Server 11) offers the possibility to use LVM out-of-the-box. However there are a few restrictions:

  • Due to limitations in the GRUB bootloader, /boot (i.e. the kernel and the initial ram disk) cannot be placed inside LVM; /boot needs to be a separate partition. There is a re-implementation of GRUB which will offer the possibility to boot from a /boot encapsulated in LVM.
  • Fixing boot problems is a little bit more complex because accessing the volumes requires the LVM to be started and supported by the rescue system. Most distributions offering LVM support this in their rescue systems as well. If not, you can use SystemRescueCD (see the “Tools” section at the end of this post).
  • Although it is possible to use LVM in multipathing scenarios, most storage vendors require dedicated kernel drivers provided by the vendor directly.
  • Although an LVM can resize a logical volume, the file system stored on that logical volume cannot be resized by the LVM directly. For doing so you have to use the file-system-specific tools (for ext2/ext3 the tool is named “resize2fs”).
  • Resizing a LUN/disk already used by LVM requires a reboot after the partition used by the LVM has been edited with fdisk to reflect the new size, so that the kernel recognizes the new partition size (this is also true for non-LVM configurations).

Shall I use an LVM?

From my point of view an LVM should be used for:

  • easier and more flexible management of operating system storage
  • easier and more flexible management of the Oracle database binary installation location (according to the OFA this is /u01)

An LVM should not be used for:

  • Multipathing if it is not supported/certified by your storage vendor
  • Storing any type of Oracle data files (this includes: data files, archive logs, redo logs, flashback logs, control files, ….): LVM introduces an additional layer of software, complexity and a potential source of problems, especially in conjunction with Oracle data files.
  • SWAP: There might be scenarios where the kernel runs out of memory and needs some pages to be swapped out; if swap is LVM-encapsulated the kernel might not be able to complete the request because the LVM itself requires some memory which is not available – voilà: a race condition! In addition, swapping is already slow… encapsulating it in LVM makes it even slower. Furthermore, swap is a quite static device.

Putting it all together

Partitions

Boot disks

Due to the restrictions mentioned above a typical hard disk layout for the boot disk (here: “/dev/sda”) looks like this:

Device          Size            Part.-Type       File system     Mountpoint
/dev/sda1       150 MB          linux            ext3            /boot
/dev/sda2       depends         linux swap       -               -
/dev/sda3       remain. space   linux lvm (8e)   -               -

The size of the swap space depends directly on the amount of memory you have. According to the Oracle documentation you will need up to 16 GB of swap (if you have more than 16 GB of memory, the swap space size stays at 16 GB regardless of the actual memory size).

Non-Boot disks (for use with LVM)

For any additional disk used by the LVM the partition layout might look like this:

Device          Size            Part.-Type       File system     Mountpoint
/dev/sdb1       whole disk      linux lvm (8e)   -               -

Non-Boot disks (NOT used by the LVM)

Device          Size            Part.-Type       File system     Mountpoint
/dev/sdc1       whole disk      linux            ext3            /u02

Mount Points / Logical Volume / Physical device separation

The following listing is my standard listing for oracle database servers on linux:

Device                           Mount Point      File system      Size
/dev/sda1                        /boot            ext3             150 MB
/dev/sda2                        swap             swap             depends on memory
/dev/mapper/vg_root-lv_root      /                ext3             15 GB
/dev/mapper/vg_root-lv_var       /var             ext3             10 GB
/dev/mapper/vg_root-lv_tmp       /tmp             ext3             20 GB
/dev/mapper/vg_root-lv_u01       /u01             ext3             50 GB
/dev/sdc1                        /u02             ext3             complete disk
/dev/sdd1                        /u03             ext3             complete disk
...
/dev/sdN                         /u0N             ext3             complete disk

Note that /boot and swap are not LVM-encapsulated due to the restrictions outlined above. /, /var, /tmp and /u01 are separated into logical volumes with appropriate sizes. You might think the volume sizes are way too large: you’re right. I tend to make the logical volumes a little bit larger so I do not need to resize them too soon; every resize operation carries the risk of total data loss, so it is advisable to resize file systems or logical volumes as seldom as possible. The first-level directories /var, /tmp and /u01 are separated into logical volumes because data will be placed under these top-level directories.
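
If a volume does fill up anyway, growing it online is straightforward. A minimal sketch (volume names follow the table above, /dev/vg_root/lv_u01 is the LVM symlink for /dev/mapper/vg_root-lv_u01, and the +10G is just an example):

lvextend -L +10G /dev/vg_root/lv_u01     # grow the logical volume by 10 GB
resize2fs /dev/vg_root/lv_u01            # grow the mounted ext3 file system to match
df -h /u01                               # verify the new size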

If you like, you can even have separate volumes for /home, the Oracle admin directory or the diagnostic directory.

I do not have a separate volume for /home because there are no users besides “root” and the oracle user on my systems which are allowed to store files. If you need space for user directories I recommend creating a dedicated /home volume as well.

With this volume layout we are using 50+20+10+15 GB = 95 GB plus 150 MB for /boot plus swap. Today’s smallest server disks have at least 147 GB capacity (formatted, something like 138 GB…), so this layout fits easily on these disks. If you have a server with 73 GB disks you can shrink /u01 first and after that the other volumes.

Tools

Increasing data checks performed by RMAN with "check logical" switch

By default RMAN checks for physical corruption in data files when performing backups. For a logical check you can add “check logical” to your RMAN backup command. With this clause RMAN also checks block integrity logically.

A few examples of how to use it:

backup validate check logical database;

backup check logical database;

backup check logical incremental level 0 filesperset 4 database;

Checking for both logical and physical corruption requires the init.ora parameter “db_block_checksum” to be set to “true”, which is also the default. This parameter forces Oracle to calculate a checksum for each block before writing it to disk. Doing logical checks with RMAN adds a slight performance overhead to backups – typically 1 to 10 percent.

It is not recommended to turn this off for performance reasons because the performance overhead is minimal (< 2%) and you would be unable to detect corruptions!
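
To quickly verify the current setting, something like the following can be used (a minimal sketch; run it as an OS user that can connect as SYSDBA):

# show the current block checksum setting
sqlplus -s / as sysdba <<'EOF'
show parameter db_block_checksum
EOF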

RMAN in Data Guard configurations – problems restoring RMAN disk backups taken on the standby site at the primary site and vice versa

… or: the problem of backup association

From time to time I come across the same problem in Data Guard configurations using RMAN disk backups: backups taken on the standby database are not taken into account when trying to perform an RMAN restore on the primary site and vice versa.

This is expected behavior according to the Oracle documentation:

The accessibility of a backup is different from its association. In a Data Guard environment, the recovery catalog considers disk backups as accessible only to the database with which it is associated, whereas tape backups created on one database are accessible to all databases. If a backup file is not associated with any database, then the row describing it in the recovery catalog view shows null for the SITE_KEY column. By default, RMAN associates files whose SITE_KEY is null with the database to which it is connected as TARGET.

RMAN commands such as BACKUP, RESTORE, and CROSSCHECK work on any accessible backup. For example, for a RECOVER COPY operation, RMAN considers only image copies that are associated with the database as eligible to be recovered. RMAN considers the incremental backups on disk and tape as eligible to recover the image copies. In a database recovery, RMAN considers only the disk backups associated with the database and all files on tape as eligible to be restored.

To illustrate the differences in backup accessibility, assume that databases prod and standby1 reside on different hosts. RMAN backs up datafile 1 on prod to /prmhost/disk1/df1.dbf on the production host and also to tape. RMAN backs up datafile 1 on standby1 to /sbyhost/disk2/df1.dbf on the standby host and also to tape. If RMAN is connected to database prod, then you cannot use RMAN commands to perform operations with the /sbyhost/disk2/df1.dbf backup located on the standby host. However, RMAN does consider the tape backup made on standby1 as eligible to be restored.

Note: This is a quote from the 11g R1 documentation. The same text appears in the 10g R2 documentation as well.

Managing the AWR (automatic workload repository)

Overview on the AWR

The automatic workload repository (“AWR”) was introduced with Oracle 10g. It collects performance data in intervals (default: 60 minutes) and keeps it by default for one week. The data is stored in the SYSAUX tablespace.

The AWR collects:

  • Wait events
  • time model statistics (views: V$SESS_TIME_MODEL and V$SYS_TIME_MODEL)
  • active session history information (view: V$ACTIVE_SESSION_HISTORY)
  • system statistics (views: V$SYSSTAT, V$SESSTAT)
  • resource intensive SQL statements
  • information on object usage

For AWR to collect this data the database parameter STATISTICS_LEVEL has to be set to TYPICAL (the default) or ALL. Setting it to BASIC disables data collection!
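
To check the parameter and the currently active AWR settings, something like this can be used (a sketch; run as SYSDBA):

sqlplus -s / as sysdba <<'EOF'
show parameter statistics_level
select snap_interval, retention from dba_hist_wr_control;
EOF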

Changing the AWR settings

Changing the snapshot interval and retention time

For changing the snapshot interval and/or the retention time use the following syntax:

exec dbms_workload_repository.modify_snapshot_settings( -
  interval => 60, -
  retention => 525600);

The example printed above changes the retention time to one year, specified in minutes (60 minutes per hour * 24 hours a day * 365 days per year = 525600). It does not alter the snapshot interval (60 minutes is the default). If you want to, you can alter this as well, but keep in mind that you need additional storage to do so. Setting the interval value to zero completely disables data collection.

According to the Oracle documentation, depending on the size of the database, the SYSAUX tablespace which contains the AWR objects can be as big as 5 GB in the default configuration (60 minutes snapshot interval, 7 days retention time). So if you want to keep your data for one year you might end up with a SYSAUX tablespace of 260 GB (assuming linear growth with the retention time).
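
To see how much space AWR actually occupies in SYSAUX, the occupants view can be queried (a sketch; run as SYSDBA):

sqlplus -s / as sysdba <<'EOF'
select occupant_name, space_usage_kbytes
from   v$sysaux_occupants
where  occupant_name like 'SM/%';
EOF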

Manually creating a snapshot

You can of course create a snapshot manually by executing:

exec dbms_workload_repository.create_snapshot;

Query available snapshots

To query the available snapshots you can use this query:

select snap_id, begin_interval_time, end_interval_time
from dba_hist_snapshot
order by snap_id;

Removing a snapshot

To remove a range of snapshots just do:

exec dbms_workload_repository.drop_snapshot_range -
  (low_snap_id=>1, high_snap_id=>10);

Oracle Dataguard – automatic archive log file management on primary site

System Environment

Suppose you have an oracle data guard environment with one primary database and one physical standby database running.

You perform online backups with RMAN from the PRIMARY database. Your users complain about slowness during the night. You notice the slowness correlates with the time the online backup runs. Thanks to the physical standby database you can offload your backups to the standby database. This short guide shows how to do that.

First of all the ability to offload backups from primary to standby is a major plus in physical standby configurations. Note that this is not possible on logical standby configurations.

Furthermore, offloading backups to the standby database requires more or less separate storage (separate storage boxes, separate disks, separate ports, ….). Generally speaking: the more you separate primary from standby storage, the better it is from a performance point of view.

Implementation on primary site

Database requirements

In order to use automatic deletion of archive logs on the primary site after they have been applied on the standby database, you need to enable the FRA (flash recovery area) on the primary database by setting the following two parameters:

db_recovery_file_dest='<path>'
db_recovery_file_dest_size=xyG

The value for db_recovery_file_dest must be a valid path and db_recovery_file_dest_size specifies the total size of the FRA in GB (suffix “G”), MB (suffix “M”) and so on.

Note that the FRA will fill up to this value before Oracle starts to delete archive logs already applied on the standby. So if you configure the FRA with a size of 200 GB you will end up with 200 GB of archive logs stored in the FRA.

You also need to unset the parameter log_archive_dest_n if it points to a local directory. Note that Data Guard uses a log_archive_dest_n parameter for specifying the remote archive log destination.
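
A minimal sketch for setting the two FRA parameters (path and size are examples only – the directory must exist, and the size limit has to be in place before the destination can be set):

sqlplus -s / as sysdba <<'EOF'
alter system set db_recovery_file_dest_size=200G scope=both;
alter system set db_recovery_file_dest='/u02/fra' scope=both;
EOF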

Once you have set everything you can check it with “archive log list” from SQL*Plus. The output should look like this:

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     62686
Next log sequence to archive   62695
Current log sequence           62695

If you have configured the environment correctly the archive destination should show up as “USE_DB_RECOVERY_FILE_DEST”.

Backup on primary site

On the primary site all you need to back up is the current control file. For use with EMC Networker I am using the following script. If you back up with different backup software you just need to alter the SBT_TAPE configuration according to its documentation, or back up the control file to disk rather than to tape.

connect target sys/password@primary;
connect catalog rman_catalog/catalog_password@catalogd;
run {
 backup
 (current controlfile format 'ctrl_PRIMARY_s%s_p%p');
 sql 'alter database backup controlfile to trace';
 }
run {
copy current controlfile to '/u01/sicherung/controlfiles/ctl_PRIMARY.ctl';
}

The script shown above is just an example; you need to check and alter it to suit your needs. As you may assume, the database in this example is named “PRIMARY”; the standby is named “STANDBY”.

You can see I am quite paranoid about control file backups: I back them up to tape, to trace in a readable format, and directly to disk as a copy.

In restore scenarios this enables:

  • restore of the control file with RMAN
  • re-creation of the control file from the dumped control file
  • copying the control file from /u01/sicherung/controlfiles/ctl_PRIMARY.ctl to the destinations named in the database parameter “control_files”
    (Note: The file created with “copy controlfile” is not wrapped in an RMAN stream… therefore you can just copy it with cp, for instance)

Implementation on standby site

Required INIT.ORA parameters

If you use Data Guard in Maximum Performance or Maximum Availability mode you need to set

_log_deletion_policy='ALL'

in your standby database init.ora. This is a known problem documented in Metalink Note 331924.1.

Required RMAN configuration

In order to enable automatic deletion on the primary site you need to configure RMAN on the standby site as follows:

CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;

This tells Oracle to make archive logs on the primary site eligible for deletion after they have been applied on the standby site.

Backup scripts

Full backup

The following script shows how to perform a full backup with RMAN on the STANDBY database. Primary and standby databases share the same recovery catalog (because they are basically the same database)!

My recommendation is to back up your database fully every day. With large databases a full backup might take longer than 24 hours; in these cases you can do daily incremental and weekly full backups.

connect target sys/password@standby;
connect catalog rman_catalog/catalog_password@catalogd;
run {
 CONFIGURE DEVICE TYPE SBT_TAPE PARALLELISM 4;
 backup check logical INCREMENTAL LEVEL 0 filesperset 8
 format 'FULL0_%d_%u'
 (database include current controlfile);
 backup not backed up 1 times filesperset 10
 (archivelog all delete input format  'al_STANDBY_%s_%p');
 backup
 (current controlfile format 'ctrl_STANDBY_s%s_p%p'); }
run {
copy current controlfile to '/u01/sicherung/controlfiles/ctl_STANDBY.ctl';
}

Archive log only backup

The following script shows how to back up the archive logs on the STANDBY site. The script can be started at any time.

I recommend backing up the archive logs every hour. Thanks to “backup not backed up 1 times” each archive log is backed up only once, even if it is still on disk the next time the backup runs.

connect target sys/password@standby;
connect catalog rman_catalog/catalog_password@catalogd;
run {
 CONFIGURE DEVICE TYPE SBT_TAPE PARALLELISM 4;
 backup not backed up 1 times as compressed backupset filesperset 10
 (archivelog all delete input format  'al_STANDBY_%s_%p');
 backup
 (current controlfile format 'ctrl_STANDBY_s%s_p%p');
 }
run {
copy current controlfile to '/u01/sicherung/controlfiles/ctl_STANDBY.ctl';
}

You also do not need to worry about RMAN deleting archive logs which have not yet been applied on the standby (in MAXIMUM PERFORMANCE mode). If RMAN tries to delete an archive log which is not yet applied on the standby you will see the following message:

archive log filename=/u02/oradata/STANDBY/arch/1_62487_646478938.dbf
  thread=1 sequence=62487
RMAN-08120: WARNING: archive log not deleted, not yet applied by standby

Test it

After you have configured everything you need to TEST your configuration. This involves:

  • What happens to the primary database if the standby database is down and does not apply archive logs?
  • Do the archive logs expire once they are applied on the standby (a quick check is sketched below)?
  • and so on…
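
One quick way to verify that logs actually get applied is to query V$ARCHIVED_LOG on the standby (a sketch; run as SYSDBA on the standby host):

sqlplus -s / as sysdba <<'EOF'
select max(sequence#) last_applied
from   v$archived_log
where  applied = 'YES';
EOF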

Query the FRA

The following query shows the current status of the FRA and how much of its space is occupied:

select * from v$flash_recovery_area_usage;

[Screenshot: output of the query against v$flash_recovery_area_usage]

In the screenshot above you can see the FRA is filled to 99.83%, with 58.29% reclaimable (this means: 58.29% of the space occupied by archive logs in the FRA is eligible for deletion because the logs are already applied on the standby database). This query was performed on a Data Guard environment running in MAXIMUM AVAILABILITY mode.

Changing ADR Destination for 11g Listener

With 11g Release 1 Oracle introduced the Automatic Diagnostic Repository – “ADR”. The ADR is designed as a central repository for storing all Oracle-related log files such as the alert.log, the listener.log and so on.

The log files are stored in XML format and, in the pre-11g plain text format, in a subdirectory called “trace”.

By default the ADR home is located in $ORACLE_BASE. For the database, the path can be changed with the parameter DIAGNOSTIC_DEST.
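
For example, for the database instance (a sketch; the path is only an example and must exist on the server):

sqlplus -s / as sysdba <<'EOF'
alter system set diagnostic_dest='/u01/app/oracle' scope=both;
EOF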

 

For the listener this parameter does not work. If you want to change the location where the listener places its log files, add the following parameter to the LISTENER.ORA file:

ADR_BASE_listener = <path>

“listener” is the name of the listener to be changed. If your listener is named differently, the parameter name must be adjusted accordingly!

Building and using the kfed utility

When using ASM it is sometimes extremely helpful to get more information from the ASM disk header. The “kfed” utility from Oracle makes it possible to dump the ASM disk header and much more.

With this tool, corruptions of the ASM disk header can easily be checked (and repaired).

Building kfed

This method works from 10g onwards up to and including 11g R2:

-bash-3.2$ cd $ORACLE_HOME/rdbms/lib
-bash-3.2$ make -f ins_rdbms.mk ikfed
Linking KFED utility (kfed)
rm -f /u01/app/oracle/product/ora11r2p/rdbms/lib/kfed
gcc -o /u01/app/oracle/product/ora11r2p/rdbms/lib/kfed -m64 -L/u01/app/oracle/product/ora11r2p/rdbms/lib/
-L/u01/app/oracle/product/ora11r2p/lib/ -L/u01/app/oracle/product/ora11r2p/lib/stubs/ 
/u01/app/oracle/product/ora11r2p/lib/s0main.o /u01/app/oracle/product/ora11r2p/rdbms/lib/sskfeded.o
/u01/app/oracle/product/ora11r2p/rdbms/lib/skfedpt.o -ldbtools11 -lasmclnt11 -lcommon11 -lcell11 -lskgxp11
-lhasgen11 -lskgxn2 -lnnz11 -lzt11 -lxml11 -locr11 -locrb11 -locrutl11 -lhasgen11 -lskgxn2 -lnnz11 -lzt11
-lxml11 -lasmclnt11 -lcommon11 -lcell11 -lskgxp11 -lgeneric11  -lcommon11 -lgeneric11  -lclntsh 
`cat /u01/app/oracle/product/ora11r2p/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lnro11 `cat /u01/app/oracle/product/ora11r2p/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lnnz11 -lzt11 -lztkg11 -lztkg11 -lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11 -lmm -lsnls11
-lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11
-lcore11 -lnls11 `cat /u01/app/oracle/product/ora11r2p/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11
-lnl11 -lnro11 `cat /u01/app/oracle/product/ora11r2p/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11   -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11
-lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11  -lvsn11
-lcommon11 -lgeneric11 -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11
-lunls11 -lsnls11 -lnls11 -lcore11 -lnls11   `cat /u01/app/oracle/product/ora11r2p/lib/sysliblist`
-Wl,-rpath,/u01/app/oracle/product/ora11r2p/lib -lm    `cat /u01/app/oracle/product/ora11r2p/lib/sysliblist`
-ldl -lm   -L/u01/app/oracle/product/ora11r2p/lib
test ! -f /u01/app/oracle/product/ora11r2p/bin/kfed ||\
           mv -f /u01/app/oracle/product/ora11r2p/bin/kfed /u01/app/oracle/product/ora11r2p/bin/kfedO
mv /u01/app/oracle/product/ora11r2p/rdbms/lib/kfed /u01/app/oracle/product/ora11r2p/bin/kfed
chmod 751 /u01/app/oracle/product/ora11r2p/bin/kfed

 

Using kfed to dump disk header

In the following example we use kfed to dump the ASM disk header of an ASM disk represented by the device “/dev/sdg1”:

/u01/app/oracle/product/11.2.0/ora11p/bin/kfed read /dev/sdg1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  2733723458 ; 0x00c: 0xa2f14f42
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKDISK003A ; 0x000: length=16
kfdhdb.driver.reserved[0]:   1263749444 ; 0x008: 0x4b534944
kfdhdb.driver.reserved[1]:   1093873712 ; 0x00c: 0x41333030
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:                DISK003A ; 0x028: length=8
kfdhdb.grpname:                   DATA2 ; 0x048: length=5
kfdhdb.fgname:              CONTROLLER1 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32925079 ; 0x0a8: HOUR=0x17 DAYS=0xc MNTH=0x9 YEAR=0x7d9
kfdhdb.crestmp.lo:           2888263680 ; 0x0ac: USEC=0x0 MSEC=0x1da SECS=0x2 MINS=0x2b
kfdhdb.mntstmp.hi:             32925206 ; 0x0b0: HOUR=0x16 DAYS=0x10 MNTH=0x9 YEAR=0x7d9
kfdhdb.mntstmp.lo:           1862809600 ; 0x0b4: USEC=0x0 MSEC=0x20e SECS=0x30 MINS=0x1b
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  4194304 ; 0x0bc: 0x00400000
kfdhdb.mfact:                    454272 ; 0x0c0: 0x0006ee80
kfdhdb.dsksize:                    5119 ; 0x0c4: 0x000013ff
kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              186646528 ; 0x0e0: 0x0b200000
kfdhdb.grpstmp.hi:             32925079 ; 0x0e4: HOUR=0x17 DAYS=0xc MNTH=0x9 YEAR=0x7d9
kfdhdb.grpstmp.lo:           2887388160 ; 0x0e8: USEC=0x0 MSEC=0x283 SECS=0x1 MINS=0x2b
kfdhdb.vfstart:                       0 ; 0x0ec: 0x00000000
kfdhdb.vfend:                         0 ; 0x0f0: 0x00000000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[0]:                   0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[1]:                   0 ; 0x100: 0x00000000
kfdhdb.ub4spare[2]:                   0 ; 0x104: 0x00000000
kfdhdb.ub4spare[3]:                   0 ; 0x108: 0x00000000
kfdhdb.ub4spare[4]:                   0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[5]:                   0 ; 0x110: 0x00000000
kfdhdb.ub4spare[6]:                   0 ; 0x114: 0x00000000
kfdhdb.ub4spare[7]:                   0 ; 0x118: 0x00000000
kfdhdb.ub4spare[8]:                   0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[9]:                   0 ; 0x120: 0x00000000
kfdhdb.ub4spare[10]:                  0 ; 0x124: 0x00000000
kfdhdb.ub4spare[11]:                  0 ; 0x128: 0x00000000
kfdhdb.ub4spare[12]:                  0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[13]:                  0 ; 0x130: 0x00000000
kfdhdb.ub4spare[14]:                  0 ; 0x134: 0x00000000
kfdhdb.ub4spare[15]:                  0 ; 0x138: 0x00000000
kfdhdb.ub4spare[16]:                  0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[17]:                  0 ; 0x140: 0x00000000
kfdhdb.ub4spare[18]:                  0 ; 0x144: 0x00000000
kfdhdb.ub4spare[19]:                  0 ; 0x148: 0x00000000
kfdhdb.ub4spare[20]:                  0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[21]:                  0 ; 0x150: 0x00000000
kfdhdb.ub4spare[22]:                  0 ; 0x154: 0x00000000
kfdhdb.ub4spare[23]:                  0 ; 0x158: 0x00000000
kfdhdb.ub4spare[24]:                  0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[25]:                  0 ; 0x160: 0x00000000
kfdhdb.ub4spare[26]:                  0 ; 0x164: 0x00000000
kfdhdb.ub4spare[27]:                  0 ; 0x168: 0x00000000
kfdhdb.ub4spare[28]:                  0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[29]:                  0 ; 0x170: 0x00000000
kfdhdb.ub4spare[30]:                  0 ; 0x174: 0x00000000
kfdhdb.ub4spare[31]:                  0 ; 0x178: 0x00000000
kfdhdb.ub4spare[32]:                  0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[33]:                  0 ; 0x180: 0x00000000
kfdhdb.ub4spare[34]:                  0 ; 0x184: 0x00000000
kfdhdb.ub4spare[35]:                  0 ; 0x188: 0x00000000
kfdhdb.ub4spare[36]:                  0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[37]:                  0 ; 0x190: 0x00000000
kfdhdb.ub4spare[38]:                  0 ; 0x194: 0x00000000
kfdhdb.ub4spare[39]:                  0 ; 0x198: 0x00000000
kfdhdb.ub4spare[40]:                  0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[41]:                  0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[42]:                  0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[43]:                  0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[44]:                  0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[45]:                  0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[46]:                  0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[47]:                  0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[48]:                  0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[49]:                  0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[50]:                  0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[51]:                  0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[52]:                  0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[53]:                  0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq:                  0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk:                  0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents:                     0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare:                 0 ; 0x1de: 0x0000

 

What does it all mean?

For most uses the following fields are the most important ones – a one-liner to pull them all out at once is sketched after the list:

Status of disk:

kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

Name of the disk:

kfdhdb.dskname:                DISK003A ; 0x028: length=8

Name of the disk group the disk belongs to:

kfdhdb.grpname:                   DATA2 ; 0x048: length=5

Name of the failure group the disk belongs to:

kfdhdb.fgname:              CONTROLLER1 ; 0x068: length=11

Sector size of disk:

kfdhdb.secsize:                     512 ; 0x0b8: 0x0200

Blocksize of disk:

kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000

Provision string for use with asm:

kfdhdb.driver.provstr: ORCLDISKDISK003A

–> which finally means: “ORCL:DISK003A”

AU size:

kfdhdb.ausize:                  4194304
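
If you only care about these fields, the kfed output can be filtered directly – a quick sketch (device path as used in the example above):

$ORACLE_HOME/bin/kfed read /dev/sdg1 | egrep 'provstr|hdrsts|dskname|grpname|fgname|secsize|blksize|ausize'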

 

In a follow-up I will dig a little bit deeper into ASM and provide samples of different headers in different combinations (external / normal / high redundancy, ….).