Test #4: Corrupting the ASM disk with ASM disk group being online and active
After overwriting the ASM disk header while the disk group was offline we will now put some load on the full running cluster and corrupt the asm disk slightly.
After overwriting the ASM disk header while the disk group was offline we will now put some load on the full running cluster and corrupt the asm disk slightly.
The following post is a summary of my tests with DBFS on a 11g Release 2 RAC.
Since the introduction of LOBs the database was used to store all kinds of files, for instance images, audio files, video files or even text files. Starting with Oracle 11g Release 1 introduced SecureFiles which introduced compression and de-duplication.
Oracle Database Filesystem (DBFS) creates a mountable file system which can be used to access files stored in the database as SecureFile LOBs. DBFS is a shared file system like NFS and consists of a server (the database) and a client. Just like a traditional database connection server and client can be on different systems which communicate over SQLNet. The same applies to DBFS as well.
The following figure illustrates DBFS:
(Source: http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10645/adlob_fs.htm)
Recently i came across a problem i thought was fixed since 10g:
After a fresh installation of oracle 10g when trying to connect to the instance (for instance to create the database for the first time) you get the following error:
oracle@host#:> sqlplus /nolog SQL*Plus: Release 10.2.0.4.0 - Production on Do Oct 8 08:23:18 2009 Copyright (c) 1982, 2007, Oracle. All Rights Reserved. SQL> connect / as sysdba ERROR: ORA-12157: TNS: internal network communication error.
This error is related to the linking phase when installing oracle. In your environment used during installation the variable TMPDIR pointed to a non existing directory. When linking oracle some adapters (like tcp) were not linked correctly. You can verify this with:
oracle@host#:> cd $ORACLE_HOME oracle@host#:~/bin> adapters oracle /u01/app/oracle/product/oracle/bin/adapters: line 70: /u01/app/oracle/product/oracle/temp/naetab18456: no such file or directory /u01/app/oracle/product/oracle/bin/adapters: line 71: /u01/app/oracle/product/oracle/temp/nautab18456: no such file or directory /u01/app/oracle/product/oracle/bin/adapters: line 296: /u01/app/oracle/product/oracle/temp/exec18456: no such file or directory Oracle Net transport protocols linked with oracle are: Oracle Net naming methods linked with oracle are: Oracle Advanced Security options linked with oracle are:
The solution for this is to relink oracle again. Make sure you create the missing $TMPDIR first or change $TMPDIR to point to a existing and writeable directory:
oracle@host#:> cd $ORACLE_HOME/network/lib oracle@host#:> make -f ins_net_client.mk ntcontab.o oracle@host#:> cd $ORACLE_HOME/bin oracle@host#:> genclntsh oracle@host#:> cd $ORACLE_HOME/rdbms/lib oracle@host#:> make -f ins_rdbms.mk install oracle@host#:> cd $ORACLE_HOME/sqlplus/lib oracle@host#:> make -f ins_sqlplus.mk install oracle@host#:> cd $ORACLE_HOME/network/lib oracle@host#:> make -f ins_net_client.mk install oracle@host#:> cd $ORACLE_HOME/network/lib oracle@host#:> make -f ins_net_server.mk install
You can verify it by running “adapters oracle” again:
oracle@host#:> cd $ORACLE_HOME oracle@host#:~/bin> adapters oracle Oracle Net transport protocols linked with oracle are: IPC BEQ TCP/IP SSL SDP/IB RAW Oracle Net naming methods linked with oracle are: Local Naming (tnsnames.ora) Oracle Directory Naming Oracle Host Naming Oracle Advanced Security options linked with oracle are: RC4 40-bit encryption RC4 56-bit encryption RC4 128-bit encryption RC4 256-bit encryption DES40 40-bit encryption DES 56-bit encryption 3DES 112-bit encryption 3DES 168-bit encryption AES 128-bit encryption AES 192-bit encryption AES 256-bit encryption MD5 crypto-checksumming SHA-1 crypto-checksumming Kerberos v5 authentication RADIUS authentication
oracle@host#:> cd $ORACLE_HOME oracle@host#:~/bin> adapters oracle/u01/app/oracle/product/oracle/bin/adapters: line 70:
/u01/app/oracle/product/oracle/temp/naetab18456: no such file or directory/u01/app/oracle/product/oracle/bin/adapters: line 71:
/u01/app/oracle/product/oracle/temp/nautab18456: no such file or directory/u01/app/oracle/product/oracle/bin/adapters: line 296:
/u01/app/oracle/product/oracle/temp/exec18456: no such file or directoryOracle Net transport protocols linked with oracle are:
Oracle Net naming methods linked with oracle are:
Oracle Advanced Security options linked with oracle are:
In this post we will dig a little bit deeper with ASM. We will overwrite the ASM disk header on one disk while the disk group is offline, check the results and repair the LUN again.
dd if=/dev/random bs=8k count=1 of=/dev/sdg1
SQL> alter diskgroup DATA2 mount; alter diskgroup data2 mount * ERROR at line 1: ORA-15032: not all alterations performed ORA-15040: diskgroup is incomplete ORA-15042: ASM disk "0" is missing from group number "2"
This short post covers some useful ASM queries and scripts.
The following is a script i found somewhere on the web. Unfortunately it was not totally correct for newer asm libraries. The script itself:
/etc/init.d/oracleasm querydisk -d `/etc/init.d/oracleasm listdisks -d` |
cut -f2,10,11 -d" " | perl -pe 's/"(.*)".*\[(.*), *(.*)\]/$1 $2 $3/g;' |
while read v_asmdisk v_minor v_major
do
v_device=`ls -la /dev | grep " $v_minor, *$v_major " | awk '{print $10}'`
echo "ASM disk $v_asmdisk based on /dev/$v_device [$v_minor, $v_major]"
done
When started it produces the following output:
ASM disk DISK001A based on /dev/sdc1 [8, 33] ASM disk DISK001B based on /dev/sdb1 [8, 17] ASM disk DISK002A based on /dev/sdd1 [8, 49] ASM disk DISK002B based on /dev/sde1 [8, 65] ASM disk DISK003A based on /dev/sdg1 [8, 97] ASM disk DISK003B based on /dev/sdf1 [8, 81]
set pages 40000 lines 120 col NAME for a15 select GROUP_NUMBER DG#, name, ALLOCATION_UNIT_SIZE AU_SZ, STATE, TYPE, TOTAL_MB, FREE_MB, OFFLINE_DISKS from v$asm_diskgroup;
DG# NAME AU_SZ STATE TYPE TOTAL_MB FREE_MB OFFLINE_DISKS --- ----- ------- -------- ------ ------ ------- ------------- 1 DATA2 4194304 MOUNTED NORMAL 40952 4168 0 2 DATA 1048576 MOUNTED NORMAL 30717 25667 0
set pages 40000 lines 120 col PATH for a30 select DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE, PATH FROM V$ASM_DISK; DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE PATH ----------- ------- ------------ ------- -------- ------------- 0 CLOSED MEMBER ONLINE NORMAL ORCL:DISK001A 3 CLOSED MEMBER ONLINE NORMAL ORCL:DISK003B 1 CACHED MEMBER ONLINE NORMAL ORCL:DISK001B 2 CACHED MEMBER ONLINE NORMAL ORCL:DISK002A 3 CACHED MEMBER ONLINE NORMAL ORCL:DISK002B
The following query combines ASM disk and diskgroup information. You can edit this query to suit your needs easily. If you use ASM files instead of disks you have to use v$asm_file instead of v$asm_disk. If you use ASM files you have to add v$asm_file to the query.
col PATH for a15 col DG_NAME for a15 col DG_STATE for a10 col FAILGROUP for a10 select dg.name dg_name, dg.state dg_state, dg.type, d.disk_number dsk_no, d.path, d.mount_status, d.FAILGROUP, d.state from v$asm_diskgroup dg, v$asm_disk d where dg.group_number=d.group_number order by dg_name, dsk_no; DG_NAME DG_STATE TYPE DSK_NO PATH MOUNT_S FAILGROUP STATE -------- -------- ------ ------ ------------- ------- ---------- -------- DATA MOUNTED NORMAL 1 ORCL:DISK001B CACHED DISK001B NORMAL DATA MOUNTED NORMAL 2 ORCL:DISK002A CACHED DISK002A NORMAL DATA MOUNTED NORMAL 3 ORCL:DISK002B CACHED DISK002B NORMAL DATA2 MOUNTED NORMAL 1 ORCL:DISK003A CACHED DISK003A NORMAL DATA2 MOUNTED NORMAL 2 ORCL:DISK003B CACHED DISK003B NORMAL
select GROUP_NUMBER, OPERATION, STATE, ACTUAL, SOFAR, EST_MINUTES from v$asm_operation;
If there is a operating going on (like rebalancing) the query will return some rows. For instance for our just added disk the query yields:
GROUP_NUMBER OPERA STAT ACTUAL SOFAR EST_MINUTES ------------ ----- ---- ---------- ---------- ----------- 2 REBAL RUN 1 49 16
In this post we will terminating the private network interconnect between the two nodes by moving node “rac1” lan interfaces to a different vlan. This will result in still connected network ports but interrupted connectivity.
During the night my SYSAUX tablespace filled up to the maximum allowed size so AWR could not save it´s data. In the database alert.log i found the following:
Tue Oct 06 02:02:38 2009 ORA-1688: unable to extend table SYS.WRH$_ACTIVE_SESSION_HISTORY partition WRH$_ACTIVE_270645195_176 by 8 in tablespace SYSAUX MMON Flush encountered SYSAUX out of space error(1688). MMON (emergency) purge of WR snapshots (105) and older
It seems if there is not enough space available for saving AWR data AWR will automatically delete the oldest snapshots to free up space. Nice feature :-)
In the first test we will just disable one node under normal load. Lets see what the different components (clusterware, asm instance, advm/acfs) report:
We use the same environment we installed here which consists of two virtual machines running Oracle Enterprise Linux 5 Update 3 64-bit with Oracle 11g Release 2 (Grid Infrastructure and Oracle binaries) installed. All data is placed in ASM with normal redundancy. Storage is exported from two Netapp Filers via iSCSI.
Writing posts for this blog i wondered: Am i allowed to post Metalink Notes entirely or partially in my blog posts?
Looking at the web i did not find any information. I might have read the Oracle Support License but due to lengthiness – and because i am not a lawyer i refused from this :-)
So i opened a SR and asked my question. This is the reply i got:
The customer can refer to Document IDs in My Oracle Support, but is NOT allowed to reproduce any Support content which currently requires a login to Oracle Support, doing so would violate the terms of their Oracle Support License.
I guess that answers the question entirely.