In this post we will reproduce a more common scenario: we will forcefully remove the ASM disk device from the operating system. This simulates common errors such as a crashed storage system or some kind of interrupted connectivity (cable unplugged, power failed, …).
All tests are available here.
Testcase
ATTENTION: Forcefully removing a device from the operating system is dangerous and can destroy your data. Only do this if you are certain about what you are doing and you have verified that you have a valid and restorable backup.
Steps for removing the device from operating system (Method 1)
To remove the disk from the operating system we will use the following command:
echo "scsi remove-single-device <host> <channel> <ID> <LUN>" > /proc/scsi/scsi
In order to find out which arguments to use, we will first query the physical device path belonging to each ASM disk:
ASM disk DISK001A based on /dev/sde1 [8, 65]
ASM disk DISK001B based on /dev/sdb1 [8, 17]
ASM disk DISK002A based on /dev/sdf1 [8, 81]
ASM disk DISK002B based on /dev/sdc1 [8, 33]
ASM disk DISK003A based on /dev/sdd1 [8, 49]
ASM disk DISK003B based on /dev/sdg1 [8, 97]
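The mapping above comes from the author's chk_asm_mapping.sh script, which is not listed in the post. A minimal sketch of how such a mapping could be derived (the helper name and the /dev/oracleasm/disks path are assumptions based on ASMLib defaults):

```shell
# Hypothetical sketch -- the real chk_asm_mapping.sh is not shown in the post.
# ASMLib exposes its disks under /dev/oracleasm/disks; comparing their device
# numbers with those of /dev/sd* partitions yields the underlying block device.

# Convert the hex major/minor numbers printed by `stat -c '%t %T'` into the
# "[major, minor]" notation used in the output above.
hex_devno_to_pair() {
  printf '[%d, %d]' "0x$1" "0x$2"
}

# Example usage (requires a system with ASMLib configured):
# for d in /dev/oracleasm/disks/*; do
#   set -- $(stat -c '%t %T' "$d")
#   echo "ASM disk $(basename "$d") has device number $(hex_devno_to_pair "$1" "$2")"
# done
```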
We are focusing on disk DISK003B, which is /dev/sdg1. To find out the SCSI parameters we switch to the directory /sys/block, which contains an entry for every block device. For device “sdg”, which we want to remove, we switch to the directory “/sys/block/sdg”. The contents of the directory look like this:
-r--r--r-- 1 root root 4096 Oct 5 11:49 dev
lrwxrwxrwx 1 root root 0 Oct 5 11:49 device -> ../../devices/platform/host2/session2/target2:0:0/2:0:0:2
drwxr-xr-x 2 root root 0 Oct 5 11:49 holders
drwxr-xr-x 3 root root 0 Oct 5 11:49 queue
-r--r--r-- 1 root root 4096 Oct 5 11:49 range
-r--r--r-- 1 root root 4096 Oct 5 11:49 removable
drwxr-xr-x 3 root root 0 Oct 5 11:49 sdg1
-r--r--r-- 1 root root 4096 Oct 5 12:43 size
drwxr-xr-x 2 root root 0 Oct 5 11:49 slaves
-r--r--r-- 1 root root 4096 Oct 5 12:43 stat
lrwxrwxrwx 1 root root 0 Oct 5 11:49 subsystem -> ../../block
--w------- 1 root root 4096 Oct 5 12:43 uevent
The link named “device” contains the details we need. The string ends with “2:0:0:2” which contains the values required to remove the device in the right order (<host> <channel> <ID> <LUN>).
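Putting the two steps together, the four parameters can be extracted from the symlink target in one go. This is a sketch; the destructive write to /proc/scsi/scsi is left commented out:

```shell
# Split a SCSI address such as "2:0:0:2" (the last component of the
# /sys/block/<dev>/device symlink target) into "<host> <channel> <ID> <LUN>".
scsi_addr_to_args() {
  echo "$1" | tr ':' ' '
}

# Example for /dev/sdg -- commented out because the final echo is destructive:
# addr=$(basename "$(readlink /sys/block/sdg/device)")   # -> "2:0:0:2"
# echo "scsi remove-single-device $(scsi_addr_to_args "$addr")" > /proc/scsi/scsi
```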
Steps for removing the device from operating system (Method 2)
Another way of removing a named device is to write to its “delete” attribute, which resides in the “device” subdirectory below “/sys/block” (for instance: “/sys/block/sdg/device”):
echo 1 > /sys/block/sdg/device/delete
This will also remove the device from the operating system.
Putting load on the database
We will basically copy one table into another by issuing:
insert into test2 select * from test;
commit;
While this statement runs, we will remove the device from the Linux operating system, which will disable access to the disk completely.
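The test tables themselves are not shown in the post; a minimal setup consistent with the statement above might look like this (the table names come from the statement, the column layout is an assumption):

```sql
-- Hypothetical definitions -- the real tables are not listed in the post.
create table test (id number, payload varchar2(100));

-- Fill "test" with enough rows to keep the copy running for a while:
insert into test
  select level, rpad('x', 100, 'x')
    from dual
 connect by level <= 1000000;
commit;

-- Empty copy with the same structure, used as the insert target:
create table test2 as select * from test where 1 = 0;
```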
Removing the disk on node “rac1” only
In this test we will remove the disk on node “rac1” only while the disk keeps being accessible on node “rac2”.
/var/log/messages file on node “rac1”
Oct 5 13:14:34 rac1 kernel: scsi 2:0:0:2: rejecting I/O to dead device
Database Alert.log on node “rac1”
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xe9689725 disk_path:ORCL:DISK003B AU:2625 disk_offset(bytes):11012649984 io_size:4096 operation:Write type:asynchronous result:I/O error process_id:5259
Errors in file /u01/app/oracle/diag/rdbms/ora11p/ora11p1/trace/ora11p1_dbw0_5257.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 2 of virtual extent 105 logical extent 1 of file 321 in group 1 on disk 2 allocation unit 3007
NOTE: process 5257 initiating offline of disk 2.3915945765 (DISK003B) with mask 0x7e in group 1
Errors in file /u01/app/oracle/diag/rdbms/ora11p/ora11p1/trace/ora11p1_lgwr_5259.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 2 of virtual extent 5 logical extent 1 of file 260 in group 1 on disk 2 allocation unit 2625
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xe9689725 disk_path:ORCL:DISK003B AU:3664 disk_offset(bytes):15371264000 io_size:262144 operation:Read type:synchronous result:I/O error process_id:20103
WARNING: failed to read mirror side 1 of virtual extent 762 logical extent 0 of file 321 in group DATA2 from disk DISK003B allocation unit 3664 reason error; if possible,will try another mirror side
NOTE: successfully read mirror side 2 of virtual extent 762 logical extent 1 of file 321 in group DATA2 from disk DISK003A allocation unit 3664
Mon Oct 05 13:14:37 2009
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xe9689725 disk_path:ORCL:DISK003B AU:39 disk_offset(bytes):163627008 io_size:16384 operation:Write type:asynchronous result:I/O error process_id:5261
ASM Alert.log on node “rac1”
Mon Oct 05 13:14:35 2009
NOTE: repairing group 1 file 321 extent 762
Mon Oct 05 13:14:35 2009
NOTE: process 20160 initiating offline of disk 2.3915945765 (DISK003B) with mask 0x7e in group 1
WARNING: Disk DISK003B in mode 0x7f is now being taken offline
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xe9689725 disk_path:ORCL:DISK003B AU:0 disk_offset(bytes):0 io_size:4096 operation:Read type:synchronous result:I/O error process_id:20382
WARNING: block repair initiating disk offline
NOTE: initiating PST update: grp = 1, dsk = 2/0xe9689725, mode = 0x15
kfdp_updateDsk(): 38
Mon Oct 05 13:14:35 2009
kfdp_updateDskBg(): 38
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp = 1, dsk = 2/0xe9689725, mode = 0x1
kfdp_updateDsk(): 39
kfdp_updateDskBg(): 39
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 1 completed successfully
Mon Oct 05 13:14:41 2009
NOTE: cache closing disk 2 of grp 1: DISK003B
SUCCESS: extent 762 of file 321 group 1 repaired - all online mirror sides found readable, no repair required
NOTE: repairing group 1 file 321 extent 762
SUCCESS: extent 762 of file 321 group 1 repaired - all online mirror sides found readable, no repair required
Mon Oct 05 13:15:19 2009
GMON SlaveB: Deferred DG Ops completed.
ASM Alert.log on node “rac2”
Mon Oct 05 13:14:35 2009
WARNING: Disk DISK003B in mode 0x7f is now being offlined
Mon Oct 05 13:14:38 2009
kfdp_queryTimeout(DATA2)
kfdp_queryTimeout(DATA2)
NOTE: cache closing disk 2 of grp 1: DISK003B
Mon Oct 05 13:17:27 2009
WARNING: Disk (DISK003B) will be dropped in: (12960) secs on ASM inst: (2)
GMON SlaveB: Deferred DG Ops completed.
SQL message
SQL> insert into test2 select * from test;

67108864 rows created.
The instance kept running and the statement continued to work. This was expected because our disk group runs with normal redundancy (i.e. a 2-way-mirror).
Adding the disk
Regardless of whether your SAN connection was interrupted, your network went down, or your SCSI cables were unplugged – you should first of all identify the error, fix it, and then add the disk back to the ASM instance.
Adding the disk back to ASM requires you to choose between one of the following scenarios:
- Resync
- Full Rebuild
Resynchronizing the disk requires the error to be fixed before ASM automatically drops the disk from the disk group. By default this timeout is set to 3.6 hours (12960 seconds, as seen in the alert log above: “will be dropped in: (12960) secs”); you can adjust this value as needed. A disk added back within this time frame is resynchronized by copying only those extents which changed while it was offline. This results in a much shorter resilvering time.
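This drop timeout is controlled per disk group by the 11g attribute disk_repair_time. A sketch of querying and raising it (the group number and the new value are examples):

```sql
-- Show the current repair window for disk group number 1:
select name, value
  from v$asm_attribute
 where group_number = 1
   and name = 'disk_repair_time';

-- Raise the window, e.g. to 8 hours, if fixing the fault will take longer:
alter diskgroup data2 set attribute 'disk_repair_time' = '8h';
```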
Full Rebuild: Adding a blank disk (i.e. one which no longer contains any data)
If the data on the disk was destroyed by the failure, you need to drop the failed disk from the disk group anyway and add a disk (with the same or a different name) to the disk group again. This forces ASM to do a full rebalance, which takes some time depending on the disk size. The procedure was described in previous posts.
Resynchronizing the disk: Bringing the disk back online
In our testcase we removed the device from the operating system and thereby simulated a hardware error. The data on the disk is still intact but consistent only to a previous point in time. While the disk was offline, data changed; these changes need to be resynchronized.
Adding the disk back to the operating system can be done by rescanning the iSCSI sessions:
iscsiadm -m session -R
Afterwards the former device /dev/sdg appears as /dev/sdi (or any other device name):
Disk /dev/sdi: 21.4 GB, 21474836480 bytes
64 heads, 32 sectors/track, 20480 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Device Boot Start End Blocks Id System
/dev/sdi1 1 20480 20971504 83 Linux
We need the ASM library to re-scan all disks in order to recognize that a disk was added:
[root@rac1 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "DISK003B"
Scanning system for ASM disks...
Instantiating disk "DISK003B"
In the example output ASM recognized that a disk labeled “DISK003B” was removed and that another disk with the same label was added to the system. Our ASM mapping script reflects this change as well:
[root@rac1 ~]# ./chk_asm_mapping.sh
ASM disk DISK001A based on /dev/sde1 [8, 65]
ASM disk DISK001B based on /dev/sdb1 [8, 17]
ASM disk DISK002A based on /dev/sdf1 [8, 81]
ASM disk DISK002B based on /dev/sdc1 [8, 33]
ASM disk DISK003A based on /dev/sdd1 [8, 49]
ASM disk DISK003B based on /dev/sdi1 [8, 129]
Now we bring disk “DISK003B” back online:
SQL> alter diskgroup data2 online disk DISK003B;

Diskgroup altered.
The log file from the ASM instance on node “rac1”:
SQL> alter diskgroup data2 online disk DISK003B
Mon Oct 05 14:12:14 2009
NOTE: initiating online disk group 1 disks
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
NOTE: F1X0 copy 2 relocating from 2:2 to 2:4294967294 for diskgroup 1 (DATA2)
NOTE: initiating PST update: grp = 1, dsk = 2/0x0, mode = 0x19
kfdp_updateDsk(): 40
Mon Oct 05 14:12:15 2009
kfdp_updateDskBg(): 40
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 1 completed successfully
NOTE: Found ORCL:DISK003B for disk DISK003B
WARNING: ignoring disk in deep discovery
NOTE: requesting all-instance disk validation for group=1
Mon Oct 05 14:12:17 2009
NOTE: disk validation pending for group 1/0x89a867e5 (DATA2)
NOTE: cache opening disk 2 of grp 1: DISK003B label:DISK003B
SUCCESS: validated disks for 1/0x89a867e5 (DATA2)
kfdp_query(DATA2): 41
kfdp_queryBg(): 41
NOTE: membership refresh pending for group 1/0x89a867e5 (DATA2)
kfdp_query(DATA2): 42
kfdp_queryBg(): 42
SUCCESS: refreshed membership for 1/0x89a867e5 (DATA2)
NOTE: initiating PST update: grp = 1, dsk = 2/0x0, mode = 0x5d
kfdp_updateDsk(): 43
SUCCESS: alter diskgroup data2 online disk DISK003B
kfdp_updateDskBg(): 43
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp = 1, dsk = 2/0x0, mode = 0x7d
kfdp_updateDsk(): 44
kfdp_updateDskBg(): 44
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
Mon Oct 05 14:12:25 2009
NOTE: PST update grp = 1 completed successfully
NOTE: Voting File refresh pending for group 1/0x89a867e5 (DATA2)
NOTE: F1X0 copy 2 relocating from 2:4294967294 to 2:2 for diskgroup 1 (DATA2)
Mon Oct 05 14:25:45 2009
NOTE: initiating PST update: grp = 1, dsk = 2/0x0, mode = 0x7f
kfdp_updateDsk(): 45
Mon Oct 05 14:25:45 2009
kfdp_updateDskBg(): 45
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: reset timers for disk: 2
NOTE: completed online of disk group 1 disks
And the corresponding ASM alert log file on node “rac2”:
Mon Oct 05 14:12:16 2009
NOTE: disk validation pending for group 1/0x89a86821 (DATA2)
NOTE: Found ORCL:DISK003B for disk DISK003B
WARNING: ignoring disk in deep discovery
NOTE: cache opening disk 2 of grp 1: DISK003B label:DISK003B
SUCCESS: validated disks for 1/0x89a86821 (DATA2)
NOTE: membership refresh pending for group 1/0x89a86821 (DATA2)
kfdp_query(DATA2): 39
kfdp_queryBg(): 39
SUCCESS: refreshed membership for 1/0x89a86821 (DATA2)
kfdp_queryTimeout(DATA2)
kfdp_queryTimeout(DATA2)
NOTE: Voting File refresh pending for group 1/0x89a86821 (DATA2)
Mon Oct 05 14:25:47 2009
kfdp_queryTimeout(DATA2)
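Once the disk is online again, the resynchronization can be watched from SQL*Plus. These queries are a sketch using the standard v$ views (the disk name is the one from this test):

```sql
-- MODE_STATUS should move from SYNCING back to ONLINE once resync completes:
select name, mode_status, state
  from v$asm_disk
 where name = 'DISK003B';

-- A running rebalance/resync operation is visible here while it is active:
select group_number, operation, state, est_minutes
  from v$asm_operation;
```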
Conclusion
ASM does not mind if one or more disks belonging to the same failure group fail, as long as you are running a normal or high redundancy configuration. Your system continues to run normally, and you can fix the error and re-add the disk without problems.
If you are running with external redundancy, you are in trouble – your system will go down, crash, or become unavailable. So for a highly available system you should use at least normal, or even high, redundancy.
So let me give you the following recommendations for configuring highly available systems:
- in ASM:
- use normal or even high redundancy for data files and FRA
- use high redundancy for OCR and Voting disks
- in a SAN configuration:
- have at least TWO SAN adapters (HBAs) with TWO ports each
- have at least TWO fabrics (that means two physically separated networks)
- have at least TWO connections to each fabric, using one port on HBA A and one port on HBA B
- run a multipathing software
- in NAS configurations:
- have at least TWO, better FOUR, network connections
- do not rely on NIC ports on the mainboard alone… use a port on the mainboard and a port on an external NIC card
- have at least TWO physically separated switches
- use some kind of bonding software to configure an active/passive or active/active connection