ASM resilvering – or – how to recover your crashed cluster

In this and the following posts I will run through some crash-and-recover scenarios and show how to recover the cluster successfully.

At the moment the following tests are planned and will be published over the next few days:

The environment used for the posts is explained in detail here.

Useful scripts can be found here.

ASM resilvering – or – how to recover your crashed cluster – Test no 4

Test #4: Corrupting the ASM disk with ASM disk group being online and active

After overwriting the ASM disk header while the disk group was offline, we will now put some load on the fully running cluster and corrupt the ASM disk slightly.


Oracle 11 Release 2 Install Guide – ACFS and ADVM

ACFS and ADVM

System configuration

We will use the system configured and installed in part 2.

General information

  • ADVM = ASM dynamic volume manager
  • ACFS = ASM cluster file system
  • Basic layout

step3_010

(Source: Oracle Database Storage Administrator’s Guide 11g Release 2)

ADVM – Advantages

  • Integrated in ASM; this can be a disadvantage as well :-)
  • Inherits storage from ASM and hence enables host-based mirroring (either 2- or 3-way mirroring)
  • Multiple volumes can be created within a disk group, each with a file system such as ext3, ext4, reiserfs, … on top of it; they support storage of any file type the file system would normally support, EXCEPT files which can be placed in ASM directly
  • ADVM volumes are dynamically resizable

ADVM – Disadvantages

  • ADVM volumes may be resized online, but the file system used must support it as well (ext3 on OEL 5 supports online growing but not online shrinking)
  • Storing files in ADVM + file system that can be stored in ASM directly is not supported
  • NFS on top of ADVM is also not supported
  • ASM configuration assistant (asmca) only supports creation of volumes / file systems; deleting a volume / file system requires the command line

ACFS – Advantages

  • cluster file system on top of ASM and ADVM
  • as available as ASM is (inherits storage from ASM disk group and ADVM volume)
  • Supports storage of files which cannot be stored directly in ASM, e.g.
    • executables
    • trace files
    • log files
    • even Oracle database binary installations
  • Read-only snapshots can be created on ACFS
  • dynamically resizable
  • Usable across platforms
  • Thoughts
    • Do I need licenses for Grid Infrastructure?
    • If not: what if Grid Infrastructure + ASM are used to provide host-based mirroring and a cluster file system for non-Oracle applications, for instance web servers?
  • ACFS mount registry: used for mounting ACFS and ADVM file systems across reboots

ACFS – Disadvantages

  • Storing database files, for example, in ACFS is not supported; according to the documentation:
    „Oracle Support Services will not take calls and development will not fix bugs associated with storing unsupported file types in Oracle ACFS“
  • Only available with RedHat Server 5 or Oracle Enterprise Linux 5!
  • Disk group compatibility attributes COMPATIBLE.ASM and COMPATIBLE.ADVM must be set to 11.2
  • ASM configuration assistant (asmca) only supports creation of volumes / file systems; deleting a volume / file system requires the command line

First steps with ADVM and ACFS

Create a disk group for use with ADVM and ACFS

Let's first create an additional disk group called „DATA2“ which consists of two iSCSI LUNs of 30 GB each.

Preparation:

  • LUNs visible with „fdisk -l“
  • Partition created (one on each LUN)
  • disk labeled with „oracleasm createdisk <name> <devpath>“
  • Create disk group in ASM (remember to connect as „sys as sysASM“!)

step3_011

  • Notes on disk groups
    • AUSIZE of 4 MB recommended by Oracle documentation due to:
      • Increased I/O through the I/O subsystem if the I/O size is increased to the AU size.
      • Reduced SGA size to manage the extent maps in the database instance.
      • Faster datafile initialization if the I/O size is increased to the AU size.
      • Increased file size limits.
      • Reduced database open time.
    • A large AUSIZE also requires
      • Increasing size of maximum IO request to at least 4 MB in operating system, drive, HBA, storage system
      • Larger stripe size in storage system (pre 11g R2: 1 MB stripe size, with 11g R2: 4 MB? → to be tested)

Read the documentation on the COMPATIBLE parameters; most „cool“ features are only available with the COMPATIBLE parameters set to 11.2 and hence require an 11g R2 database.
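Since the screenshot of the disk group creation is not reproduced here, the following is a minimal sketch of what such a statement might look like with a 4 MB AU size and the COMPATIBLE attributes set to 11.2. The disk names and failure group names are hypothetical placeholders, and the function only prints the SQL instead of executing it.

```shell
#!/bin/bash
# Sketch only: prints a CREATE DISKGROUP statement for a normal-redundancy
# disk group with 4 MB allocation units and 11.2 compatibility attributes.
# Disk names and failure group names are hypothetical placeholders.
create_diskgroup_sql() {
    local dg="$1" disk_a="$2" disk_b="$3"
    cat <<EOF
CREATE DISKGROUP ${dg} NORMAL REDUNDANCY
  FAILGROUP controller1 DISK '${disk_a}'
  FAILGROUP controller2 DISK '${disk_b}'
  ATTRIBUTE 'au_size' = '4M',
            'compatible.asm' = '11.2',
            'compatible.advm' = '11.2';
EOF
}

create_diskgroup_sql data2 'ORCL:DISK003A' 'ORCL:DISK003B'

# To run it for real on a node with a running ASM instance, the output
# could be piped into sqlplus, for example:
#   create_diskgroup_sql data2 'ORCL:DISK003A' 'ORCL:DISK003B' | sqlplus -S / as sysasm
```

Printing the statement first makes it easy to review the attributes before anything touches the disks; the AU size in particular cannot be changed after the disk group has been created.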

Creating an ADVM and afterwards an ACFS

Create an ADVM

  • ACFS requires ADVM in which ACFS can be created
  • volcreate creates ADVM volume

step3_012

  • The command above shows the minimal command for creating an ADVM volume; redundancy is derived from the disk group (our disk group was created with „normal“ redundancy, so the volume inherits this as well)
  • Creation with SQL also possible: „ALTER DISKGROUP data2 ADD VOLUME volume1 SIZE 10G;“

Create ACFS on top of ADVM

  • Requires ADVM in which ACFS can be created
  • volinfo shows detailed information
  • Especially device path is important for creating the file system

step3_013

  • create ACFS from operating system (only on one node)

step3_014

  • register the ACFS in the registry with „acfsutil“ so that it is mounted across reboots
  • ATTENTION: DO NOT register shared Oracle home directories with acfsutil; this will be done later by the clusterware itself!
  • test if everything works by issuing „mount.acfs -o all“ on all nodes; the file system should be mounted and accessible

step3_015
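The whole sequence from volume to mounted file system can be sketched as a small script. This is only an illustration under assumptions: the volume device path and the mount point are hypothetical (the real device path has to be taken from the volinfo output), and by default the script only prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch only: volume -> file system -> registry -> mount.
# Every command is only printed unless DRY_RUN=0 is set.
# VOLUME_DEVICE and MOUNT_POINT are assumptions for illustration;
# take the real device path from the "volinfo" output on your system.
DRY_RUN="${DRY_RUN:-1}"
DISKGROUP="data2"
VOLUME="volume1"
VOLUME_DEVICE="/dev/asm/volume1-123"
MOUNT_POINT="/mnt/acfs_volume1"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_acfs() {
    # 1. create the ADVM volume (as the grid infrastructure owner)
    run asmcmd volcreate -G "$DISKGROUP" -s 10G "$VOLUME"
    # 2. show volume details, including the volume device path
    run asmcmd volinfo -G "$DISKGROUP" "$VOLUME"
    # 3. create the file system (as root, on one node only)
    run /sbin/mkfs.acfs "$VOLUME_DEVICE"
    # 4. the mount point has to exist on every node
    run mkdir -p "$MOUNT_POINT"
    # 5. register the file system so it is mounted across reboots
    run /sbin/acfsutil registry -a "$VOLUME_DEVICE" "$MOUNT_POINT"
    # 6. mount all registered file systems (repeat on every node)
    run /sbin/mount.acfs -o all
}

setup_acfs
```

Steps 1 to 3 and 5 are needed on one node only, while steps 4 and 6 have to be repeated on every node of the cluster.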

Simple Performance tests

dd if=/dev/zero bs=1024k of=<path> count=1000 oflag=direct

→ direct I/O used; no caching, performed 10 times

  • write to ACFS
    • ACFS 2-way mirror: ~ 6 MB/s average
    • ACFS unprotected: ~ 12 MB/s average
    • → expected… one disk, double I/O, halved throughput
  • direct write to iSCSI LUN: ~ 14.3 MB/s average

Attention: Tests were performed within VMware, so results are most likely not accurate, but we get an impression. We will check this on real hardware later!
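To repeat such a test series, a small helper along the following lines might be used. It is only a sketch: run_dd_tests repeats the dd call from above and prints dd's summary line for each run, and average_mbps averages a list of throughput figures. The figures fed into it below are made-up sample values, not the measurements from this post.

```shell
#!/bin/bash
# Sketch only: helpers for repeating the dd test and averaging the results.

# Repeat the dd write test and print the summary line of each run.
# oflag=direct bypasses the page cache, as in the test above.
# $1 = target file, $2 = number of runs (default 10)
run_dd_tests() {
    local target="$1" runs="${2:-10}" i
    for i in $(seq 1 "$runs"); do
        dd if=/dev/zero bs=1024k of="$target" count=1000 oflag=direct 2>&1 | tail -n 1
    done
}

# Average throughput figures (MB/s), one value per line on stdin.
average_mbps() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

# Hypothetical per-run figures, NOT the measurements from this post:
printf '%s\n' 5.8 6.1 6.0 6.2 5.9 | average_mbps
```

The figures have to be copied from the dd summary lines by hand here, because the unit dd reports (kB/s, MB/s, GB/s) depends on the achieved rate. With a 2-way mirror every write goes to disk twice, so on a single physical disk roughly half of the unprotected throughput (12 / 2 = 6 MB/s) is what one would expect.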

Building and using the kfed utility

When using ASM it is sometimes extremely helpful to get more information on the ASM disk header. The “kfed” utility from Oracle can dump the ASM disk header and much more.

With this tool, corruption of ASM disks can easily be checked (and repaired).

Building kfed

This method works from 10g up to and including 11g R2:

-bash-3.2$ cd $ORACLE_HOME/rdbms/lib
-bash-3.2$ make -f ins_rdbms.mk ikfed
Linking KFED utility (kfed)
rm -f /u01/app/oracle/product/ora11r2p/rdbms/lib/kfed
gcc -o /u01/app/oracle/product/ora11r2p/rdbms/lib/kfed -m64 -L/u01/app/oracle/product/ora11r2p/rdbms/lib/
-L/u01/app/oracle/product/ora11r2p/lib/ -L/u01/app/oracle/product/ora11r2p/lib/stubs/ 
/u01/app/oracle/product/ora11r2p/lib/s0main.o /u01/app/oracle/product/ora11r2p/rdbms/lib/sskfeded.o
/u01/app/oracle/product/ora11r2p/rdbms/lib/skfedpt.o -ldbtools11 -lasmclnt11 -lcommon11 -lcell11 -lskgxp11
-lhasgen11 -lskgxn2 -lnnz11 -lzt11 -lxml11 -locr11 -locrb11 -locrutl11 -lhasgen11 -lskgxn2 -lnnz11 -lzt11
-lxml11 -lasmclnt11 -lcommon11 -lcell11 -lskgxp11 -lgeneric11  -lcommon11 -lgeneric11  -lclntsh 
`cat /u01/app/oracle/product/ora11r2p/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lnro11 `cat /u01/app/oracle/product/ora11r2p/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lnnz11 -lzt11 -lztkg11 -lztkg11 -lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11 -lmm -lsnls11
-lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11
-lcore11 -lnls11 `cat /u01/app/oracle/product/ora11r2p/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11
-lnl11 -lnro11 `cat /u01/app/oracle/product/ora11r2p/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11   -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11
-lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11  -lvsn11
-lcommon11 -lgeneric11 -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11
-lunls11 -lsnls11 -lnls11 -lcore11 -lnls11   `cat /u01/app/oracle/product/ora11r2p/lib/sysliblist`
-Wl,-rpath,/u01/app/oracle/product/ora11r2p/lib -lm    `cat /u01/app/oracle/product/ora11r2p/lib/sysliblist`
-ldl -lm   -L/u01/app/oracle/product/ora11r2p/lib
test ! -f /u01/app/oracle/product/ora11r2p/bin/kfed ||\
           mv -f /u01/app/oracle/product/ora11r2p/bin/kfed /u01/app/oracle/product/ora11r2p/bin/kfedO
mv /u01/app/oracle/product/ora11r2p/rdbms/lib/kfed /u01/app/oracle/product/ora11r2p/bin/kfed
chmod 751 /u01/app/oracle/product/ora11r2p/bin/kfed

Using kfed to dump disk header

In the following example we use kfed to dump the ASM disk header of an ASM disk represented by the device “/dev/sdg1”:

/u01/app/oracle/product/11.2.0/ora11p/bin/kfed read /dev/sdg1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  2733723458 ; 0x00c: 0xa2f14f42
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKDISK003A ; 0x000: length=16
kfdhdb.driver.reserved[0]:   1263749444 ; 0x008: 0x4b534944
kfdhdb.driver.reserved[1]:   1093873712 ; 0x00c: 0x41333030
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:                DISK003A ; 0x028: length=8
kfdhdb.grpname:                   DATA2 ; 0x048: length=5
kfdhdb.fgname:              CONTROLLER1 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32925079 ; 0x0a8: HOUR=0x17 DAYS=0xc MNTH=0x9 YEAR=0x7d9
kfdhdb.crestmp.lo:           2888263680 ; 0x0ac: USEC=0x0 MSEC=0x1da SECS=0x2 MINS=0x2b
kfdhdb.mntstmp.hi:             32925206 ; 0x0b0: HOUR=0x16 DAYS=0x10 MNTH=0x9 YEAR=0x7d9
kfdhdb.mntstmp.lo:           1862809600 ; 0x0b4: USEC=0x0 MSEC=0x20e SECS=0x30 MINS=0x1b
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  4194304 ; 0x0bc: 0x00400000
kfdhdb.mfact:                    454272 ; 0x0c0: 0x0006ee80
kfdhdb.dsksize:                    5119 ; 0x0c4: 0x000013ff
kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              186646528 ; 0x0e0: 0x0b200000
kfdhdb.grpstmp.hi:             32925079 ; 0x0e4: HOUR=0x17 DAYS=0xc MNTH=0x9 YEAR=0x7d9
kfdhdb.grpstmp.lo:           2887388160 ; 0x0e8: USEC=0x0 MSEC=0x283 SECS=0x1 MINS=0x2b
kfdhdb.vfstart:                       0 ; 0x0ec: 0x00000000
kfdhdb.vfend:                         0 ; 0x0f0: 0x00000000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[0]:                   0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[1]:                   0 ; 0x100: 0x00000000
kfdhdb.ub4spare[2]:                   0 ; 0x104: 0x00000000
kfdhdb.ub4spare[3]:                   0 ; 0x108: 0x00000000
kfdhdb.ub4spare[4]:                   0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[5]:                   0 ; 0x110: 0x00000000
kfdhdb.ub4spare[6]:                   0 ; 0x114: 0x00000000
kfdhdb.ub4spare[7]:                   0 ; 0x118: 0x00000000
kfdhdb.ub4spare[8]:                   0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[9]:                   0 ; 0x120: 0x00000000
kfdhdb.ub4spare[10]:                  0 ; 0x124: 0x00000000
kfdhdb.ub4spare[11]:                  0 ; 0x128: 0x00000000
kfdhdb.ub4spare[12]:                  0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[13]:                  0 ; 0x130: 0x00000000
kfdhdb.ub4spare[14]:                  0 ; 0x134: 0x00000000
kfdhdb.ub4spare[15]:                  0 ; 0x138: 0x00000000
kfdhdb.ub4spare[16]:                  0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[17]:                  0 ; 0x140: 0x00000000
kfdhdb.ub4spare[18]:                  0 ; 0x144: 0x00000000
kfdhdb.ub4spare[19]:                  0 ; 0x148: 0x00000000
kfdhdb.ub4spare[20]:                  0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[21]:                  0 ; 0x150: 0x00000000
kfdhdb.ub4spare[22]:                  0 ; 0x154: 0x00000000
kfdhdb.ub4spare[23]:                  0 ; 0x158: 0x00000000
kfdhdb.ub4spare[24]:                  0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[25]:                  0 ; 0x160: 0x00000000
kfdhdb.ub4spare[26]:                  0 ; 0x164: 0x00000000
kfdhdb.ub4spare[27]:                  0 ; 0x168: 0x00000000
kfdhdb.ub4spare[28]:                  0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[29]:                  0 ; 0x170: 0x00000000
kfdhdb.ub4spare[30]:                  0 ; 0x174: 0x00000000
kfdhdb.ub4spare[31]:                  0 ; 0x178: 0x00000000
kfdhdb.ub4spare[32]:                  0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[33]:                  0 ; 0x180: 0x00000000
kfdhdb.ub4spare[34]:                  0 ; 0x184: 0x00000000
kfdhdb.ub4spare[35]:                  0 ; 0x188: 0x00000000
kfdhdb.ub4spare[36]:                  0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[37]:                  0 ; 0x190: 0x00000000
kfdhdb.ub4spare[38]:                  0 ; 0x194: 0x00000000
kfdhdb.ub4spare[39]:                  0 ; 0x198: 0x00000000
kfdhdb.ub4spare[40]:                  0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[41]:                  0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[42]:                  0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[43]:                  0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[44]:                  0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[45]:                  0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[46]:                  0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[47]:                  0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[48]:                  0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[49]:                  0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[50]:                  0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[51]:                  0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[52]:                  0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[53]:                  0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq:                  0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk:                  0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents:                     0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare:                 0 ; 0x1de: 0x0000

What does it all mean?

For most purposes the following rows are the most important:

Status of disk:

kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

Name of the disk:

kfdhdb.dskname:                DISK003A ; 0x028: length=8

Name of the disk group the disk belongs to:

kfdhdb.grpname:                   DATA2 ; 0x048: length=5

Name of the failure group the disk belongs to:

kfdhdb.fgname:              CONTROLLER1 ; 0x068: length=11

Sector size of disk:

kfdhdb.secsize:                     512 ; 0x0b8: 0x0200

Blocksize of disk:

kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000

Provision string for use with asm:

kfdhdb.driver.provstr: ORCLDISKDISK003A

–> which finally means: “ORCL:DISK003A”

AU size:

kfdhdb.ausize:                  4194304
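
When only a few of these rows are of interest, they can be filtered out of the kfed output. The following is a minimal sketch; the helper name kfed_field is made up for this example, and it assumes the output format shown above ("field: value ; offset: details").

```shell
#!/bin/bash
# Sketch only: pull a single field out of "kfed read" output.
# kfed_field is a hypothetical helper, not part of kfed itself.
# $1 = field name, e.g. kfdhdb.grpname; kfed output is read from stdin.
kfed_field() {
    # an empty value (as for kfdhdb.capname) leaves ";" in column 2
    awk -v f="$1:" '$1 == f { print ($2 == ";" ? "" : $2) }'
}

# Sample lines copied from the dump above
sample='kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:                DISK003A ; 0x028: length=8
kfdhdb.grpname:                   DATA2 ; 0x048: length=5
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.ausize:                  4194304 ; 0x0bc: 0x00400000'

echo "disk name : $(echo "$sample" | kfed_field kfdhdb.dskname)"
echo "disk group: $(echo "$sample" | kfed_field kfdhdb.grpname)"
# AU size is given in bytes: 4194304 / 1048576 = 4 MB
echo "AU size   : $(( $(echo "$sample" | kfed_field kfdhdb.ausize) / 1048576 )) MB"

# Against a real disk (device path as in the example above):
#   kfed read /dev/sdg1 | kfed_field kfdhdb.hdrsts
```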

 

In a follow-up I will dig a little bit deeper into ASM and provide samples of different headers in different combinations (external / normal / high redundancy, …).

Using ASM with files (either locally or via NFS)

From time to time I need to configure additional, temporary storage for data migration or just for testing purposes. Raw devices are not recommended from 11g onwards.

So why not use the loopback device for that? With this scenario you can even place these disks on NFS and use them across nodes in a cluster:

1. Create the file

dd if=/dev/zero bs=1024k count=1000 of=/disk1

2. Setup loopback driver

losetup /dev/loop1 /disk1

3. Label device

oracleasm createdisk LOOP1 /dev/loop1

4. Use it with ASM
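
The steps above can be sketched as one script. It is only an illustration: the commands are printed rather than executed unless DRY_RUN=0 is set (running them for real requires root), and the statement for step 4 is an assumption about one possible use, with a hypothetical disk group name.

```shell
#!/bin/bash
# Sketch only: the steps above as one script.
# Commands are only printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = backing file, $2 = loop device, $3 = ASMLib disk label
setup_loop_disk() {
    local file="$1" loopdev="$2" label="$3"
    # 1. create the backing file (1000 x 1 MB)
    run dd if=/dev/zero bs=1024k count=1000 of="$file"
    # 2. attach the file to a loop device
    run losetup "$loopdev" "$file"
    # 3. label the loop device for use with ASMLib
    run oracleasm createdisk "$label" "$loopdev"
}

setup_loop_disk /disk1 /dev/loop1 LOOP1

# 4. One possible way to use the disk (an assumption, not from the post):
#    a throwaway disk group with external redundancy, created from the
#    ASM instance. The disk group name "testdg" is hypothetical.
cat <<'EOF'
CREATE DISKGROUP testdg EXTERNAL REDUNDANCY DISK 'ORCL:LOOP1';
EOF
```

Keep in mind that a loop device association set up with losetup does not survive a reboot, so step 2 has to be repeated before ASM can see the disk again.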