INFO: task blocked for more than 120 seconds.
When running heavy workloads on UEK kernels on systems with a lot of memory, you might see the following errors in /var/log/messages:
INFO: task bonnie++:31785 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bonnie++ D ffff810009004420 0 31785 11051 11096 (NOTLB)
 ffff81021c771aa8 0000000000000082 ffff81103e62ccc0 ffffffff88031cb3
 ffff810ac94cd6c0 0000000000000007 ffff810220347820 ffffffff80310b60
 00016803dfd77991 00000000001312ee ffff810220347a08 0000000000000001
Call Trace:
 [<ffffffff88031cb3>] :jbd:do_get_write_access+0x4f9/0x530
 [<ffffffff800ce675>] zone_statistics+0x3e/0x6d
 [<ffffffff88032002>] :jbd:start_this_handle+0x2e5/0x36c
 [<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
 [<ffffffff88032152>] :jbd:journal_start+0xc9/0x100
 [<ffffffff88050362>] :ext3:ext3_write_begin+0x9a/0x1cc
 [<ffffffff8000fda3>] generic_file_buffered_write+0x14b/0x675
 [<ffffffff80016679>] __generic_file_aio_write_nolock+0x369/0x3b6
 [<ffffffff80021850>] generic_file_aio_write+0x65/0xc1
 [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
 [<ffffffff800182df>] do_sync_write+0xc7/0x104
 [<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80062ff0>] thread_return+0x62/0xfe
 [<ffffffff80016a81>] vfs_write+0xce/0x174
 [<ffffffff80017339>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
This is a known bug. By default, Linux uses up to 40% of the available memory for file system caching. Once this mark has been reached, the file system flushes all outstanding data to disk, and all subsequent I/O becomes synchronous. A task that stays blocked for more than 120 seconds (the default value of hung_task_timeout_secs) is reported with the message above. In this case the I/O subsystem is not fast enough to flush the data within 120 seconds. This happens especially on systems with a lot of memory.
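To get a feeling for the numbers, here is a rough back-of-the-envelope sketch. The 64 GB of RAM and the 100 MB/s write throughput are assumed values for illustration, not measurements from the system above, and the helper names are made up for this example. The estimate is deliberately pessimistic: the ratio applies to available rather than total memory, and the kernel already starts background writeback earlier (vm.dirty_background_ratio).

```shell
#!/bin/sh
# Rough estimate only; all inputs below are assumptions for illustration.

# dirty_limit_mb MEM_MB RATIO: upper bound on dirty cache in MB
dirty_limit_mb() {
  echo $(( $1 * $2 / 100 ))
}

# flush_seconds DIRTY_MB DISK_MB_PER_S: time to write that much data out
flush_seconds() {
  echo $(( $1 / $2 ))
}

mem_mb=65536    # assumed: 64 GB of RAM
disk_mb_s=100   # assumed: 100 MB/s sustained write throughput

for ratio in 40 10; do
  limit=$(dirty_limit_mb "$mem_mb" "$ratio")
  secs=$(flush_seconds "$limit" "$disk_mb_s")
  echo "dirty_ratio=$ratio: up to $limit MB dirty, about $secs s to flush"
done
```

With these assumed values the 40% mark lands well above the 120 second limit, while 10% stays below it.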
The problem is solved in later kernels, and there is no “fix” from Oracle. I worked around it by lowering the mark for flushing the cache from 40% to 10% by setting “vm.dirty_ratio=10” in /etc/sysctl.conf. This setting does not influence overall database performance, since you hopefully use direct I/O and bypass the file system cache completely.
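One possible way to make that change without ending up with duplicate entries is sketched below. The function name set_sysctl_conf is hypothetical, not a standard tool; it only edits the file you pass in, so you can try it on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: set KEY = VALUE in a sysctl-style config file.
# Replaces an existing entry for KEY, otherwise appends a new line.
set_sysctl_conf() {
  conf="$1"; key="$2"; value="$3"
  touch "$conf"
  if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
    # Rewrite the existing line via a temporary file (portable, no sed -i)
    sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$conf.tmp" &&
      mv "$conf.tmp" "$conf"
  else
    printf '%s = %s\n' "$key" "$value" >> "$conf"
  fi
}
```

As root you would then run “set_sysctl_conf /etc/sysctl.conf vm.dirty_ratio 10” followed by “sysctl -p” to load the file. “sysctl -w vm.dirty_ratio=10” applies the value immediately, but only until the next reboot.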
Thank you very much. It really helped me.
Doing “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” only disables the message, doesn’t it?
I would argue that it may make things worse.
What do you suggest about that?
Yes, it does, but the message is there for a reason, so disabling it is not a good idea.
Thank you, this solved my problem on a busy Enterprise Manager Cloud Control 12c2+Oracle Database 11.2.0.3 system, you are a life saver!