In this post we will reproduce a more common scenario: we will forcefully remove the ASM disk device from the operating system. This simulates common errors such as a crashed storage system or some kind of interrupted connectivity (cable unplugged, power failed, …).
All tests are available here.
Testcase
ATTENTION: Forcefully removing a device from the operating system is dangerous and can destroy your data. Only do this if you are certain about what you are doing and you have verified that you have a valid and restorable backup.
Steps for removing the device from operating system (Method 1)
To remove the disk from the operating system we will use the following command:
echo "scsi remove-single-device <host> <channel> <ID> <LUN>" > /proc/scsi/scsi
In order to find out which arguments to use, we will first query the physical device path belonging to each ASM disk:
ASM disk DISK001A based on /dev/sde1 [8, 65]
ASM disk DISK001B based on /dev/sdb1 [8, 17]
ASM disk DISK002A based on /dev/sdf1 [8, 81]
ASM disk DISK002B based on /dev/sdc1 [8, 33]
ASM disk DISK003A based on /dev/sdd1 [8, 49]
ASM disk DISK003B based on /dev/sdg1 [8, 97]
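The mapping above comes from the author's chk_asm_mapping.sh script, which is not listed in the post. A minimal sketch of how such a mapping could be derived (the helper name and the /dev/oracleasm/disks path are assumptions based on ASMLib defaults):

```shell
# Hypothetical sketch -- the real chk_asm_mapping.sh is not shown in the post.
# ASMLib exposes its disks under /dev/oracleasm/disks; comparing their device
# numbers with those of /dev/sd* partitions yields the underlying block device.

# Convert the hex major/minor numbers printed by `stat -c '%t %T'` into the
# "[major, minor]" notation used in the output above.
hex_devno_to_pair() {
  printf '[%d, %d]' "0x$1" "0x$2"
}

# Example usage (requires a system with ASMLib configured):
# for d in /dev/oracleasm/disks/*; do
#   set -- $(stat -c '%t %T' "$d")
#   echo "ASM disk $(basename "$d") has device number $(hex_devno_to_pair "$1" "$2")"
# done
```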
We are focusing on disk DISK003B, which is /dev/sdg1. To find out the SCSI parameters we switch to the directory /sys/block, which contains an entry for every block device. For device “sdg”, which we want to remove, we switch to the directory “/sys/block/sdg”. The contents of the directory look like this:
-r--r--r-- 1 root root 4096 Oct 5 11:49 dev
lrwxrwxrwx 1 root root 0 Oct 5 11:49 device -> ../../devices/platform/host2/session2/target2:0:0/2:0:0:2
drwxr-xr-x 2 root root 0 Oct 5 11:49 holders
drwxr-xr-x 3 root root 0 Oct 5 11:49 queue
-r--r--r-- 1 root root 4096 Oct 5 11:49 range
-r--r--r-- 1 root root 4096 Oct 5 11:49 removable
drwxr-xr-x 3 root root 0 Oct 5 11:49 sdg1
-r--r--r-- 1 root root 4096 Oct 5 12:43 size
drwxr-xr-x 2 root root 0 Oct 5 11:49 slaves
-r--r--r-- 1 root root 4096 Oct 5 12:43 stat
lrwxrwxrwx 1 root root 0 Oct 5 11:49 subsystem -> ../../block
--w------- 1 root root 4096 Oct 5 12:43 uevent
The link named “device” contains the details we need. The string ends with “2:0:0:2” which contains the values required to remove the device in the right order (<host> <channel> <ID> <LUN>).
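Putting the two steps together, the four parameters can be extracted from the symlink target in one go. This is a sketch; the destructive write to /proc/scsi/scsi is left commented out:

```shell
# Split a SCSI address such as "2:0:0:2" (the last component of the
# /sys/block/<dev>/device symlink target) into "<host> <channel> <ID> <LUN>".
scsi_addr_to_args() {
  echo "$1" | tr ':' ' '
}

# Example for /dev/sdg -- commented out because the final echo is destructive:
# addr=$(basename "$(readlink /sys/block/sdg/device)")   # -> "2:0:0:2"
# echo "scsi remove-single-device $(scsi_addr_to_args "$addr")" > /proc/scsi/scsi
```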
Steps for removing the device from operating system (Method 2)
Another way of removing a named device is to write to its “delete” attribute, which resides in the “device” subdirectory below “/sys/block” (for instance: “/sys/block/sdg/device”):
echo 1 > /sys/block/sdg/device/delete
This will also remove the device from the operating system.
Putting load on the database
We will basically copy one table into another by issuing:
insert into test2 select * from test;
commit;
While this statement runs, we will remove the device from the Linux operating system, which will disable access to the disk completely.
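The test tables themselves are not shown in the post; a minimal setup consistent with the statement above might look like this (the table names come from the statement, the column layout is an assumption):

```sql
-- Hypothetical definitions -- the real tables are not listed in the post.
create table test (id number, payload varchar2(100));

-- Fill "test" with enough rows to keep the copy running for a while:
insert into test
  select level, rpad('x', 100, 'x')
    from dual
 connect by level <= 1000000;
commit;

-- Empty copy with the same structure, used as the insert target:
create table test2 as select * from test where 1 = 0;
```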
Removing the disk on node “rac1” only
In this test we will remove the disk on node “rac1” only while the disk keeps being accessible on node “rac2”.
/var/log/messages file on node “rac1”
Oct 5 13:14:34 rac1 kernel: scsi 2:0:0:2: rejecting I/O to dead device
Database Alert.log on node “rac1”
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xe9689725 disk_path:ORCL:DISK003B AU:2625 disk_offset(bytes):11012649984 io_size:4096 operation:Write type:asynchronous result:I/O error process_id:5259
Errors in file /u01/app/oracle/diag/rdbms/ora11p/ora11p1/trace/ora11p1_dbw0_5257.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 2 of virtual extent 105 logical extent 1 of file 321 in group 1 on disk 2 allocation unit 3007
NOTE: process 5257 initiating offline of disk 2.3915945765 (DISK003B) with mask 0x7e in group 1
Errors in file /u01/app/oracle/diag/rdbms/ora11p/ora11p1/trace/ora11p1_lgwr_5259.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 2 of virtual extent 5 logical extent 1 of file 260 in group 1 on disk 2 allocation unit 2625
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xe9689725 disk_path:ORCL:DISK003B AU:3664 disk_offset(bytes):15371264000 io_size:262144 operation:Read type:synchronous result:I/O error process_id:20103
WARNING: failed to read mirror side 1 of virtual extent 762 logical extent 0 of file 321 in group DATA2 from disk DISK003B allocation unit 3664 reason error; if possible,will try another mirror side
NOTE: successfully read mirror side 2 of virtual extent 762 logical extent 1 of file 321 in group DATA2 from disk DISK003A allocation unit 3664
Mon Oct 05 13:14:37 2009
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xe9689725 disk_path:ORCL:DISK003B AU:39 disk_offset(bytes):163627008 io_size:16384 operation:Write type:asynchronous result:I/O error process_id:5261
ASM Alert.log on node “rac1”
Mon Oct 05 13:14:35 2009
NOTE: repairing group 1 file 321 extent 762
Mon Oct 05 13:14:35 2009
NOTE: process 20160 initiating offline of disk 2.3915945765 (DISK003B) with mask 0x7e in group 1
WARNING: Disk DISK003B in mode 0x7f is now being taken offline
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xe9689725 disk_path:ORCL:DISK003B AU:0 disk_offset(bytes):0 io_size:4096 operation:Read type:synchronous result:I/O error process_id:20382
WARNING: block repair initiating disk offline
NOTE: initiating PST update: grp = 1, dsk = 2/0xe9689725, mode = 0x15
kfdp_updateDsk(): 38
Mon Oct 05 13:14:35 2009
kfdp_updateDskBg(): 38
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp = 1, dsk = 2/0xe9689725, mode = 0x1
kfdp_updateDsk(): 39
kfdp_updateDskBg(): 39
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 1 completed successfully
Mon Oct 05 13:14:41 2009
NOTE: cache closing disk 2 of grp 1: DISK003B
SUCCESS: extent 762 of file 321 group 1 repaired - all online mirror sides found readable, no repair required
NOTE: repairing group 1 file 321 extent 762
SUCCESS: extent 762 of file 321 group 1 repaired - all online mirror sides found readable, no repair required
Mon Oct 05 13:15:19 2009
GMON SlaveB: Deferred DG Ops completed.
ASM Alert.log on node “rac2”
Mon Oct 05 13:14:35 2009
WARNING: Disk DISK003B in mode 0x7f is now being offlined
Mon Oct 05 13:14:38 2009
kfdp_queryTimeout(DATA2)
kfdp_queryTimeout(DATA2)
NOTE: cache closing disk 2 of grp 1: DISK003B
Mon Oct 05 13:17:27 2009
WARNING: Disk (DISK003B) will be dropped in: (12960) secs on ASM inst: (2)
GMON SlaveB: Deferred DG Ops completed.
SQL message
SQL> insert into test2 select * from test;

67108864 rows created.
The instance kept running and the statement continued to work. This was expected because our disk group runs with normal redundancy (i.e. a 2-way-mirror).
Adding the disk
Regardless of whether your SAN connection was interrupted, your network went down, or your SCSI cables were unplugged – you should first of all identify the error, fix it, and then add the disk back to the ASM instance.
Adding the disk back to ASM requires you to choose between one of the following scenarios:
- Resync
- Full Rebuild
Resynchronizing the disk requires the error to be fixed before ASM automatically drops the disk from the disk group. By default this timeout is set to 3.6 hours (12960 seconds, as seen in the alert log above: “will be dropped in: (12960) secs”); you can adjust this value as needed. A disk added back within this time frame is resynchronized by copying only those extents which changed while it was offline. This results in a much shorter resilvering time.
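This drop timeout is controlled per disk group by the 11g attribute disk_repair_time. A sketch of querying and raising it (the group number and the new value are examples):

```sql
-- Show the current repair window for disk group number 1:
select name, value
  from v$asm_attribute
 where group_number = 1
   and name = 'disk_repair_time';

-- Raise the window, e.g. to 8 hours, if fixing the fault will take longer:
alter diskgroup data2 set attribute 'disk_repair_time' = '8h';
```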
Full Rebuild: Adding a blank disk (i.e. one which no longer contains any data)
If the data on the disk was destroyed by the failure, you need to drop the failed disk from the disk group anyway and add a disk (with the same or a different name) to the disk group again. This forces ASM to do a full rebalance, which takes some time depending on the disk size. The procedure was described in previous posts.
Resynchronizing the disk: Bringing the disk back online
In our testcase we removed the device from the operating system and thereby simulated a hardware error. The data on the disk is still intact but consistent only to a previous point in time. While the disk was offline, data changed; these changes need to be resynchronized.
Adding the disk back to the operating system can be done by rescanning the iSCSI sessions:
iscsiadm -m session -R
Afterwards the former device /dev/sdg appears as /dev/sdi (or any other device name):
Disk /dev/sdi: 21.4 GB, 21474836480 bytes
64 heads, 32 sectors/track, 20480 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Device Boot Start End Blocks Id System
/dev/sdi1 1 20480 20971504 83 Linux
We need the ASM library to re-scan all disks in order to recognize that a disk was added:
[root@rac1 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "DISK003B"
Scanning system for ASM disks...
Instantiating disk "DISK003B"
In the example output ASM recognized that a disk labeled “DISK003B” was removed and that another disk with the same label was added to the system. Our ASM mapping script reflects this change as well:
[root@rac1 ~]# ./chk_asm_mapping.sh
ASM disk DISK001A based on /dev/sde1 [8, 65]
ASM disk DISK001B based on /dev/sdb1 [8, 17]
ASM disk DISK002A based on /dev/sdf1 [8, 81]
ASM disk DISK002B based on /dev/sdc1 [8, 33]
ASM disk DISK003A based on /dev/sdd1 [8, 49]
ASM disk DISK003B based on /dev/sdi1 [8, 129]
Now we bring disk “DISK003B” back online:
SQL> alter diskgroup data2 online disk DISK003B;

Diskgroup altered.
The log file from the ASM instance on node “rac1”:
SQL> alter diskgroup data2 online disk DISK003B
Mon Oct 05 14:12:14 2009
NOTE: initiating online disk group 1 disks
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
NOTE: F1X0 copy 2 relocating from 2:2 to 2:4294967294 for diskgroup 1 (DATA2)
NOTE: initiating PST update: grp = 1, dsk = 2/0x0, mode = 0x19
kfdp_updateDsk(): 40
Mon Oct 05 14:12:15 2009
kfdp_updateDskBg(): 40
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 1 completed successfully
NOTE: Found ORCL:DISK003B for disk DISK003B
WARNING: ignoring disk in deep discovery
NOTE: requesting all-instance disk validation for group=1
Mon Oct 05 14:12:17 2009
NOTE: disk validation pending for group 1/0x89a867e5 (DATA2)
NOTE: cache opening disk 2 of grp 1: DISK003B label:DISK003B
SUCCESS: validated disks for 1/0x89a867e5 (DATA2)
kfdp_query(DATA2): 41
kfdp_queryBg(): 41
NOTE: membership refresh pending for group 1/0x89a867e5 (DATA2)
kfdp_query(DATA2): 42
kfdp_queryBg(): 42
SUCCESS: refreshed membership for 1/0x89a867e5 (DATA2)
NOTE: initiating PST update: grp = 1, dsk = 2/0x0, mode = 0x5d
kfdp_updateDsk(): 43
SUCCESS: alter diskgroup data2 online disk DISK003B
kfdp_updateDskBg(): 43
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp = 1, dsk = 2/0x0, mode = 0x7d
kfdp_updateDsk(): 44
kfdp_updateDskBg(): 44
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
Mon Oct 05 14:12:25 2009
NOTE: PST update grp = 1 completed successfully
NOTE: Voting File refresh pending for group 1/0x89a867e5 (DATA2)
NOTE: F1X0 copy 2 relocating from 2:4294967294 to 2:2 for diskgroup 1 (DATA2)
Mon Oct 05 14:25:45 2009
NOTE: initiating PST update: grp = 1, dsk = 2/0x0, mode = 0x7f
kfdp_updateDsk(): 45
Mon Oct 05 14:25:45 2009
kfdp_updateDskBg(): 45
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: reset timers for disk: 2
NOTE: completed online of disk group 1 disks
And the corresponding ASM alert log file on node “rac2”:
Mon Oct 05 14:12:16 2009
NOTE: disk validation pending for group 1/0x89a86821 (DATA2)
NOTE: Found ORCL:DISK003B for disk DISK003B
WARNING: ignoring disk in deep discovery
NOTE: cache opening disk 2 of grp 1: DISK003B label:DISK003B
SUCCESS: validated disks for 1/0x89a86821 (DATA2)
NOTE: membership refresh pending for group 1/0x89a86821 (DATA2)
kfdp_query(DATA2): 39
kfdp_queryBg(): 39
SUCCESS: refreshed membership for 1/0x89a86821 (DATA2)
kfdp_queryTimeout(DATA2)
kfdp_queryTimeout(DATA2)
NOTE: Voting File refresh pending for group 1/0x89a86821 (DATA2)
Mon Oct 05 14:25:47 2009
kfdp_queryTimeout(DATA2)
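Once the disk is online again, the resynchronization can be watched from SQL*Plus. These queries are a sketch using the standard v$ views (the disk name is the one from this test):

```sql
-- MODE_STATUS should move from SYNCING back to ONLINE once resync completes:
select name, mode_status, state
  from v$asm_disk
 where name = 'DISK003B';

-- A running rebalance/resync operation is visible here while it is active:
select group_number, operation, state, est_minutes
  from v$asm_operation;
```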
Conclusion
ASM does not mind if one or more disks belonging to the same failure group fail, as long as you are running a normal or high redundancy configuration. Your system continues to run normally, and you can fix the error and re-add the disk without problems.
If you are running with external redundancy, you are in trouble – your system will go down, crash, or become unavailable. So for a highly available system you should use at least normal, or even high, redundancy.
So let me give you the following recommendations for configuring highly available systems:
- in ASM:
- use normal or even high redundancy for data files and FRA
- use high redundancy for OCR and Voting disks
- in a SAN configuration:
- have at least TWO SAN adapters (HBAs) with TWO ports each
- have at least TWO fabrics (that means two physically separated networks)
- have at least TWO connections to each fabric, using one port on HBA A and one port on HBA B
- run a multipathing software
- in NAS configurations:
- have at least TWO, better FOUR, network connections
- do not rely on NIC ports on the mainboard alone… use a port on the mainboard and a port on an external NIC card
- have at least TWO physically separated switches
- use some kind of bonding software to configure an active/passive or active/active connection