Archive

Archive for October, 2011

PRVF-5300: Failed to retrieve active version for CRS on this node when installing 11.2.0.2 DB on 11.2.0.3.0 Grid Infrastructure

October 13th, 2011 9 comments

I just played with 11.2.0.3.0 patchset on Linux x86_64 (in my testcase Oracle Enterprise Linux 5.6) and tried to install a 11.2.0.2.0 database on it. It fails with:

PRVF-5300: Failed to retrieve active version for CRS on this node


The error stack in the installation log is:

ID: oracle.install.commons.util.exception.DefaultErrorAdvisor:745
oracle.cluster.verification.VerificationException: An internal error occurred within cluster
verification framework

ERRORMSG(linux): PRVF-5300 : Failed to retrieve active version for CRS on this node
        at oracle.cluster.verification.ClusterVerification.getPreReqTasksForSIDBInst(ClusterVerification.java:615)
        at oracle.install.ivw.db.action.PrereqAction.getProductVerificationTasks(PrereqAction.java:111)
        at oracle.install.commons.base.interview.common.action.AbstractPrereqAction.execute
        (AbstractPrereqAction.java:86)
        at oracle.install.commons.flow.AbstractFlowExecutor.startAction(AbstractFlowExecutor.java:358)
        at oracle.install.commons.flow.AbstractFlowExecutor.enterVertex(AbstractFlowExecutor.java:571)
        at oracle.install.commons.flow.AbstractFlowExecutor.transition(AbstractFlowExecutor.java:333)
        at oracle.install.commons.flow.AbstractFlowExecutor.nextState(AbstractFlowExecutor.java:268)
        at oracle.install.commons.flow.AbstractFlowExecutor.nextViewState(AbstractFlowExecutor.java:227)
        at oracle.install.commons.flow.DefaultFlowNavigator.goForward(DefaultFlowNavigator.java:58)
        at oracle.install.commons.flow.jewt.FlowWizard$1.run(FlowWizard.java:125)
        at oracle.install.commons.flow.jewt.FlowWizard$TransitionManager$1.run(FlowWizard.java:101)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
        at java.util.concurrent.FutureTask.run(FutureTask.java:123)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
        at java.lang.Thread.run(Thread.java:595)

The problem

I started the installer with debug enabled just add “-debug -logLevel finest >inst1.out 2>inst2.out”). The log files gave some insight:

[Version.getVersion:497]  version String is 11.2.0.3.0
[Version.getVersion:498]  new Version().toString is 11.2.0.2.0
[VerificationUtil.getSIHAReleaseVersionObj:4986]  Configuration Exception:
PRKC-1137 : Unable to find Version object with string value 11.2.0.3.0
[VerificationUtil.getCRSUser:1362]  Active Version = null

The related query command is

"GI_HOME/bin/crsctl query has releaseversion"

Obviously 11.2.0.2.0 installer has problems with the string “11.2.0.3.0”.

Solution #1

The most simple approach is to start the installer like this:

./runInstaller -ignorePrereq

With that the installer skips al pre-installation tests.

Solution #2

One simple approach was to created a wrapper around crsctl to report a version of 11.2.0.2.0 when querying releaseversion:

cd $GRID_HOME /bin
mv crsctl crsctl.orig

Now create a script “crsctl” with the following contents:

EXEC=/u01/app/oragrid/product/11.2.0.3.0/bin/crsctl.orig
case $1 in
query)
 echo "Oracle High Availability Services release version on the local node is [11.2.0.2.0]"
;;
*)
        $EXEC $*
;;
esac

 

You can start the database installation. During the verification steps the installer might report the Oracle Restart Registry as invalid. Just ignore it. The installation should now run fine.

Note that this bug is NOT related to OEL 5.6. It is the installer which cannot deal with the version string of the newer grid infrastructure. So you will face this error on OEL 6, RedHat and SuSE as well.

Dont forget to revert the changes after the installation!

After installation finished i was able to create a database using ASM without any problems. Registering the database into Oracle Restart also worked fine.

Categories: Oracle in general Tags:

INFO: task blocked for more than 120 seconds.

October 13th, 2011 18 comments

When running some high workloads on UEK kernels on systems with a lot of memory you might see the following errors in /var/log/messages:

 

INFO: task bonnie++:31785 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bonnie++      D ffff810009004420     0 31785  11051               11096 (NOTLB)
ffff81021c771aa8 0000000000000082 ffff81103e62ccc0 ffffffff88031cb3
ffff810ac94cd6c0 0000000000000007 ffff810220347820 ffffffff80310b60
00016803dfd77991 00000000001312ee ffff810220347a08 0000000000000001
Call Trace:
[<ffffffff88031cb3>] :jbd:do_get_write_access+0x4f9/0x530
[<ffffffff800ce675>] zone_statistics+0x3e/0x6d
[<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
[<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
[<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
[<ffffffff88050362>] :ext3:ext3_write_begin+0x9a/0x1cc
[<ffffffff8000fda3>] generic_file_buffered_write+0x14b/0x675
[<ffffffff80016679>] __generic_file_aio_write_nolock+0x369/0x3b6
[<ffffffff80021850>] generic_file_aio_write+0x65/0xc1
[<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
[<ffffffff800182df>] do_sync_write+0xc7/0x104
[<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
[<ffffffff80062ff0>] thread_return+0x62/0xfe
[<ffffffff80016a81>] vfs_write+0xce/0x174
[<ffffffff80017339>] sys_write+0x45/0x6e
[<ffffffff8005d28d>] tracesys+0xd5/0xe0

This is a know bug. By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data withing 120 seconds. This especially happens on systems with a lof of memory.

The problem is solved in later kernels and there is not “fix” from Oracle. I fixed this by lowering the mark for flushing the cache from 40% to 10% by setting “vm.dirty_ratio=10” in /etc/sysctl.conf. This setting does not influence overall database performance since you hopefully use Direct IO and bypass the file system cache completely.

Categories: Oracle in general Tags:

11.2.0.3.0 is out

October 13th, 2011 No comments

Since almost two week Oracle released Oracle database patchset 11.2.0.3.0.

 

I am a little bit late with this news. Martin Bach already  posted a lot of stuff about this so let me link to his guides:

At the moment 11.2.0.3.0 is available for Linux (32/64 bit) and Solaris SPARC.

Categories: Oracle in general Tags: