Oracle 10g introduced the SGA_TARGET and SGA_MAX_SIZE parameters, which dynamically resize many SGA components on demand. With 11g Oracle developed this feature further to include the PGA as well. The feature is now called “Automatic Memory Management” (AMM) and is enabled by setting the parameter MEMORY_TARGET.
Automatic Memory Management makes use of shmfs – a pseudo file system just like /proc. Every file created in shmfs lives in memory.
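With AMM enabled you can actually see these in-memory files on a running instance; a quick check (the exact file names vary by Oracle version and SID):

ls -l /dev/shm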
Unfortunately, using MEMORY_TARGET or MEMORY_MAX_TARGET together with Huge Pages is not supported; you have to choose either Automatic Memory Management or HugePages. In this post I’d like to discuss AMM and Huge Pages.
Automatic Memory Management (AMM)
AMM – what it is
Automatic Memory Management was introduced with Oracle 11g Release 1 and automates the sizing and resizing of SGA and PGA. With AMM activated there are two parameters of interest:
- MEMORY_TARGET
- MEMORY_MAX_TARGET
MEMORY_TARGET specifies the total amount of memory the Oracle instance may use, while MEMORY_MAX_TARGET specifies the upper bound to which the DBA can set MEMORY_TARGET. If MEMORY_MAX_TARGET is not specified, it defaults to MEMORY_TARGET.
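As a minimal sketch, enabling AMM could look like this; the 6G/8G sizes are illustrative assumptions only, and the instance is assumed to use an spfile. A restart is required because MEMORY_MAX_TARGET is a static parameter:

# minimal sketch – sizes are assumptions, instance uses an spfile
sqlplus / as sysdba <<EOF
ALTER SYSTEM SET memory_max_target=8G SCOPE=SPFILE;
ALTER SYSTEM SET memory_target=6G SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
STARTUP
EOF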
For AMM to work there is one important requirement: your system needs to support memory mapped files (on Linux typically mounted on /dev/shm).
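On Linux you can quickly check the size of this file system. MEMORY_TARGET must not exceed it, otherwise the instance will refuse to start (typically with ORA-00845):

df -h /dev/shm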
According to the documentation the following platforms support AMM:
- Linux
- Solaris
- Windows
- HP-UX
- AIX
More information on AMM can be found in the Oracle documentation.
Advantages
- SGA and PGA automatically adjusted
- dynamically resizeable
- Not swappable
Disadvantages
- Only available on a limited number of platforms
Bug or feature?
- Does not work together with HugePages – so it is either AMM or HugePages
Before deciding, let’s see what HugePages are:
HugePages
HugePages – what are they?
Here is one description:
Hugepages is a mechanism that allows the Linux kernel to utilise the multiple page size capabilities of modern hardware architectures. Linux uses pages as the basic unit of memory, where physical memory is partitioned and accessed using the basic page unit. The default page size is 4096 Bytes in the x86 [and x86_64 as well; note by Ronny Egner] architecture. Hugepages allows large amounts of memory to be utilized with a reduced overhead. Linux uses “Translation Lookaside Buffers” (TLB) in the CPU architecture. These buffers contain mappings of virtual memory to actual physical memory addresses. So utilising a huge amount of physical memory with the default page size consumes the TLB and adds processing overhead. The Linux kernel is able to set aside a portion of physical memory to be addressed using a larger page size. Since the page size is higher, there will be less overhead managing the pages with the TLB.
(Source: http://unixfoo.blogspot.com/2007/10/hugepages.html)
Advantages
- Huge Pages are not swappable; thus keeping your SGA locked in memory
- Overall memory performance is increased: with far fewer pages to manage, the per-page management overhead shrinks
- kswapd needs far less resources: kswapd regularly scans the page table for infrequently accessed pages which are candidates for paging to disk. If the page table is large, kswapd will use a lot of CPU resources. With Huge Pages enabled the page table is much smaller, and Huge Pages are not subject to swapping, so kswapd will use far less resources.
- Improves the TLB hit ratio due to fewer entries, increasing memory performance further. The TLB is a small cache on the CPU which stores virtual-to-physical memory mappings.
Disadvantages
- Should be allocated at startup (allocating huge pages at runtime is possible but will probably fail due to memory fragmentation, so it is advisable to allocate them at boot)
- dynamically allocating huge pages is buggy
Linux memory management with and without Huge Pages
The following two figures try to illustrate how memory access works with and without huge pages. As you can see, every process in a virtual memory operating system has its own process page table which points to a system page table. For Oracle processes running on Linux it is not uncommon to use the same physical memory regions due to accessing the SGA or the block cache. This is illustrated in the figures below for pages 2 and 3; they are accessed by both processes.
Without huge pages, memory is divided into chunks (called “pages”) of 4 KB on Intel x86 and x86_64 (the actual size depends on the hardware platform). Operating systems offering virtual memory (as most modern operating systems do, for instance Linux, Solaris, HP-UX, AIX and even Windows) present each process with a contiguous addressable memory space (“virtual memory”) consisting of memory pages which reside in physical memory or on disk (“swap”). See the Wikipedia article on virtual memory for more information.
The following figure, taken from the Wikipedia article mentioned above, illustrates the concept of virtual memory:
From the process’ point of view it looks like it is running alone on the operating system. But it is not; in fact there are a lot of other processes running.
As mentioned above, memory is presented to processes in chunks of 4 KB – a so-called “page”. The operating system manages a list of pages – the “page table” – for each process, and for the operating system as well, which maps virtual memory to physical memory.
The page table can be seen as “memory for managing memory”. Each page table entry (PTE) takes:
- 4 bytes of memory per page (4 KB) per process on 32-bit Intel and
- 8 bytes of memory per page (4 KB) per process on 64-bit Intel
For more information see my post and the answer on the Linux Kernel Mailing List (LKML).
So for a process to touch every page on a 64-bit system with 16 GB of memory, the following is required:
- for the memory referenced by the process: 4.2 million PTEs (covering ~16 GB) at 8 bytes each = 32 MB
- PLUS for the system page table: 4.2 million PTEs (covering ~16 GB) at 8 bytes each = 32 MB
- equals 64 MB for the whole page table, as counted in /proc/meminfo [PageTables]
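A quick back-of-the-envelope check of these numbers with shell arithmetic (4 KB pages, 8-byte PTEs):

echo $(( 16 * 1024 * 1024 / 4 ))                      # pages needed for 16 GB: 4194304 (~4.2 million)
echo $(( 16 * 1024 * 1024 / 4 * 8 / 1024 / 1024 ))    # PTE memory per page table: 32 (MB)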
On systems running Oracle databases with a huge buffer cache and highly active processes (= sessions) it is not uncommon for the processes to reference the whole SGA and parts of the PGA after a while. Taking the example above with a buffer cache of 16 GB, this adds up to 32 MB per process for the page table. 100 processes will consume 3.2 GB! That’s a lot of memory which is not available to applications and is used solely to manage memory.
The size of the page table can be queried on Linux as follows:
cat /proc/meminfo | grep PageT
PageTables: 25096 kB
This command shows the size of the page table (the sum of the system and all process page tables). This amount of memory is unusable by processes and serves solely to manage memory. On systems with a lot of memory, a huge SGA/PGA and many dedicated server connections, the page table can be several GB in size!
The solution to this memory wastage is to implement huge pages. Huge pages increase the memory chunks from 4 KB to 2 MB, so a page table entry still takes 8 bytes on 64-bit Intel but references 2 MB – more efficient by a factor of 512!
So taking our example from above with a buffer cache of 16 GB (16384 MB) referenced completely by a process, the page table for the process will be:
- 16384 MB referenced / 2 MB per page = 8192 PTEs at 8 bytes each = 65536 bytes or 64 KB
I guess the advantage is obvious: 32 MB without huge pages vs. 64 KB with huge pages!
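The same shell arithmetic as before, now with 2 MB huge pages:

echo $(( 16 * 1024 / 2 ))        # pages needed for 16 GB: 8192
echo $(( 16 * 1024 / 2 * 8 ))    # PTE memory in bytes: 65536 (= 64 KB)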
Is my system already using huge pages?
You can check this by doing:
cat /proc/meminfo | grep Huge
There are three possibilities:
No huge pages configured at all
cat /proc/meminfo | grep Huge
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Huge pages configured but not used
cat /proc/meminfo | grep Huge
HugePages_Total: 3000
HugePages_Free: 3000
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Huge pages configured and used
[root@rac1 ~]# cat /proc/meminfo | grep Huge
HugePages_Total: 3000
HugePages_Free: 2601
HugePages_Rsvd: 2290
Hugepagesize: 2048 kB
How to configure huge pages?
1. Edit /etc/sysctl.conf and add the following line:
vm.nr_hugepages = <number>
Note: This parameter specifies the number of huge pages. To get the total size you have to multiply “nr_hugepages” by the “Hugepagesize” reported in /proc/meminfo. For 64-bit Linux on x86_64 the size of one Huge Page is 2 MB, so for a total amount of 2 GB, or roughly 2000 MB, you need 1000 pages (a small sizing sketch follows after step 3).
2. Edit /etc/security/limits.conf and add the following lines:
<oracle user> soft memlock unlimited
<oracle user> hard memlock unlimited
3. Reboot the server and verify the configuration.
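Here is the sizing and verification sketch mentioned above; the 10240 MB SGA value is an assumption for illustration:

# compute vm.nr_hugepages for a desired SGA size (assumed: 10240 MB)
SGA_MB=10240
HPSIZE_KB=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)
echo $(( SGA_MB * 1024 / HPSIZE_KB ))    # number of huge pages to configure

# after the reboot, verify the allocation and the oracle user's memlock limit:
grep Huge /proc/meminfo
su - oracle -c 'ulimit -l'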
Some real world examples
The following are two database systems not using Huge Pages. Let’s see how much memory is spent just on managing memory:
System A
The following is an example of a Linux based database server running two database instances with approx. 8 GB SGA in total. At the time of sampling there were 444 dedicated server sessions connected.
MemTotal: 16387608 kB
MemFree: 105176 kB
Buffers: 21032 kB
Cached: 9575340 kB
SwapCached: 1036 kB
Active: 11977268 kB
Inactive: 2378928 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16387608 kB
LowFree: 105176 kB
SwapTotal: 8393952 kB
SwapFree: 8247912 kB
Dirty: 9584 kB
Writeback: 0 kB
AnonPages: 4754720 kB
Mapped: 7130088 kB
Slab: 256088 kB
CommitLimit: 16587756 kB
Committed_AS: 22134904 kB
PageTables: 1591860 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9680 kB
VmallocChunk: 34359728499 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
As you can see, approx. 10% of all available memory (PageTables: ~1.5 GB out of 16 GB) is used for the page tables.
System B
System B is a Linux based system with 128 GB of memory running one single database instance with a 34 GB SGA and approx. 400 sessions:
MemTotal: 132102884 kB
MemFree: 596308 kB
Buffers: 472620 kB
Cached: 111858096 kB
SwapCached: 138652 kB
Active: 65182984 kB
Inactive: 53195396 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 132102884 kB
LowFree: 596308 kB
SwapTotal: 8393952 kB
SwapFree: 8112828 kB
Dirty: 568 kB
Writeback: 0 kB
AnonPages: 5901940 kB
Mapped: 33971664 kB
Slab: 915092 kB
CommitLimit: 74445392 kB
Committed_AS: 48640652 kB
PageTables: 12023792 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 279912 kB
VmallocChunk: 34359456747 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
In this example the page tables allocate approx. 12 GB of memory. That’s a lot of memory.
Some laboratory examples
Test case no. 1
The first test case is quite simple. I started 100 dedicated database connections with a delay of 1 second between each connection. Each session logs on, sleeps for 200 seconds and logs off. On the operating system I monitored the page table size while the number of database sessions increased and decreased.
foo.sql script
exec dbms_lock.sleep(200);
exit
doit.sh script
for i in {1..100}
do
  echo $i
  sqlplus system/manager@ora11p @foo.sql &
  sleep 1
done
Results without Huge Pages
The following output was observed without huge pages:
while true; do cat /proc/meminfo | grep PageTable && sleep 3; done
PageTables: 58836 kB
PageTables: 60792 kB
PageTables: 62808 kB
PageTables: 64808 kB
PageTables: 66560 kB
PageTables: 68780 kB
PageTables: 70084 kB
PageTables: 72044 kB
PageTables: 72296 kB
PageTables: 74184 kB
PageTables: 76804 kB
PageTables: 79000 kB
PageTables: 80928 kB
PageTables: 82932 kB
PageTables: 84652 kB
PageTables: 86576 kB
PageTables: 88936 kB
PageTables: 90896 kB
PageTables: 94120 kB
PageTables: 96424 kB
PageTables: 98212 kB
PageTables: 100304 kB
PageTables: 101868 kB
PageTables: 103960 kB
PageTables: 105996 kB
PageTables: 108108 kB
PageTables: 109992 kB
PageTables: 111404 kB
PageTables: 113584 kB
PageTables: 114860 kB
PageTables: 116856 kB
PageTables: 118276 kB
PageTables: 120256 kB
PageTables: 120316 kB
PageTables: 120240 kB
PageTables: 120616 kB
PageTables: 120316 kB
PageTables: 121456 kB
PageTables: 121480 kB
PageTables: 121484 kB
PageTables: 121480 kB
PageTables: 121408 kB
PageTables: 121404 kB
PageTables: 121484 kB
PageTables: 121632 kB
PageTables: 121484 kB
PageTables: 121480 kB
PageTables: 120316 kB
PageTables: 120320 kB
PageTables: 120316 kB
PageTables: 120320 kB
PageTables: 121460 kB
PageTables: 121652 kB <==== PEAK AROUND HERE
PageTables: 121500 kB
PageTables: 121540 kB
PageTables: 120096 kB
PageTables: 118136 kB
PageTables: 116188 kB
PageTables: 114192 kB
PageTables: 112236 kB
PageTables: 110240 kB
PageTables: 106556 kB
PageTables: 103792 kB
PageTables: 101820 kB
PageTables: 97916 kB
PageTables: 95900 kB
PageTables: 95120 kB
PageTables: 93104 kB
PageTables: 91848 kB
PageTables: 89852 kB
PageTables: 87860 kB
PageTables: 85896 kB
PageTables: 83868 kB
PageTables: 81940 kB
PageTables: 79944 kB
[...]
Results with Huge Pages
while true; do cat /proc/meminfo | grep PageTable && sleep 3; done
PageTables: 27112 kB
PageTables: 27236 kB
PageTables: 27280 kB
PageTables: 27320 kB
PageTables: 27344 kB
PageTables: 27368 kB
PageTables: 27396 kB
PageTables: 27416 kB
PageTables: 31028 kB
PageTables: 31412 kB
PageTables: 37668 kB
PageTables: 37912 kB
PageTables: 39964 kB
PageTables: 39756 kB
PageTables: 39740 kB
PageTables: 41312 kB
PageTables: 41436 kB
PageTables: 41508 kB
PageTables: 42192 kB
PageTables: 42196 kB
PageTables: 42528 kB
PageTables: 43036 kB
PageTables: 43232 kB
PageTables: 45616 kB
PageTables: 44852 kB
PageTables: 44540 kB
PageTables: 44552 kB
PageTables: 44728 kB
PageTables: 44748 kB
PageTables: 44764 kB
PageTables: 45936 kB
PageTables: 46992 kB
PageTables: 48128 kB
PageTables: 49264 kB
PageTables: 50312 kB
PageTables: 51056 kB
PageTables: 52244 kB
PageTables: 53496 kB
PageTables: 54256 kB
PageTables: 55296 kB
PageTables: 56440 kB
PageTables: 57712 kB
PageTables: 58240 kB
PageTables: 58824 kB
PageTables: 59612 kB
PageTables: 60656 kB
PageTables: 62468 kB
PageTables: 63592 kB
PageTables: 64700 kB
PageTables: 65820 kB
PageTables: 66916 kB
PageTables: 68344 kB
PageTables: 69144 kB
PageTables: 70260 kB
PageTables: 71044 kB
PageTables: 72172 kB
PageTables: 73224 kB
PageTables: 73684 kB
PageTables: 74736 kB
PageTables: 75828 kB
PageTables: 76952 kB
PageTables: 78068 kB
PageTables: 79180 kB
PageTables: 78604 kB
PageTables: 79384 kB
PageTables: 79384 kB
PageTables: 80064 kB
PageTables: 80092 kB
PageTables: 80096 kB
PageTables: 80096 kB
PageTables: 80096 kB
PageTables: 80084 kB
PageTables: 80096 kB
PageTables: 80092 kB
PageTables: 80096 kB <=== PEAK AROUND HERE
PageTables: 80096 kB
PageTables: 79408 kB
PageTables: 79400 kB
PageTables: 79400 kB
PageTables: 79396 kB
PageTables: 79392 kB
PageTables: 79392 kB
PageTables: 79392 kB
PageTables: 79392 kB
PageTables: 79392 kB
PageTables: 79396 kB
PageTables: 79396 kB
PageTables: 70260 kB
[...]
Observations
- without huge pages the page table size peaked at approx. 120 MB
- with huge pages the page table size peaked at approx. 80 MB
In this simple test case the page table with huge pages used only about 66% of the memory used without them.
Because of the chosen test case the memory saving is not that big: our processes did not reference that much of the SGA. The results would be much clearer with larger parts of the SGA (e.g. the buffer cache) referenced.
Conclusion
HugePages offer some important advantages over AMM, for instance:
- minimizing the CPU cycles used for scanning memory pages that are candidates for swapping, thus freeing CPU cycles for your database,
- minimizing the memory spent on managing memory references
The latter point is the most important one. Especially systems with large amounts of memory dedicated to SGA and PGA and many database sessions (> 100) will benefit from using Huge Pages. The more memory dedicated to SGA and PGA, and the more sessions connected to the database, the larger the memory savings from using Huge Pages will be.
From my point of view, even if AMM simplifies memory management by including both PGA and SGA, the memory (and CPU) savings from using Huge Pages are more important than simplified memory management.
So if you have an SGA larger than 16 GB and more than 100 sessions, using Huge Pages is definitely worth trying. On systems with only a few sessions, using Huge Pages will give some benefit as well, but only by reducing the CPU cycles needed for scanning the memory pages.
Hi Ronny,
I tried to implement HugePages on my RedHat 5.4 host with Oracle 11.2 and saw that when I start the instance with sqlplus, HugePages are used; when it is started via srvctl or crsctl, they aren’t.
Do you have any idea about this?
Thank you in advance
Alexei
Hi Alexei,
Strange – I have not observed this behavior. I will check it in my environment and come back to you.
Hi Ronny,
I liked the in-depth explanation of the Huge Pages behavior. I’m a frequent visitor of your blog.
Thank you
RA
Pingback: Blogroll Report 26/03 /2010 – 02/04/2010 « Coskan’s Approach to Oracle
Hi Ronny, I have the same behavior on OEL 5.3 as Alexei. I have 11g Clusterware and a 10g database. The DB started with sqlplus uses HugePages and ULIMIT is set correctly. If CRS or srvctl starts the DB, ULIMIT defaults to 4 GB and HugePages are not used.
Hi!
I have read this article. On my 64-bit RHEL 5.3 EE, how can you prove to me that I can’t use AMM AND Huge Pages together?
Frankly, I do not understand all of this, so could you post a few commands that prove your main point?
Rg,
Damir Vadas
> I have read this article. On my 64-bit RHEL 5.3 EE, how can you prove to me that I can’t use AMM AND Huge Pages together?
First of all Oracle itself says it is not supported :-)
Secondly, all you need to do is reserve some huge pages, enable AMM (set MEMORY_TARGET), start your instance and check whether any huge page is being used (see /proc/meminfo).
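A short sketch of that check (assuming the instance has been restarted with MEMORY_TARGET set):

# with AMM enabled, HugePages_Free should not drop after instance startup –
# the reserved huge pages simply stay unused:
grep -E 'HugePages_(Total|Free)' /proc/meminfo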
Excellent post…
Can I configure huge pages on other operating systems like AIX and Solaris? If yes, do the advantages of huge pages mentioned in your blog apply to these platforms as well?
Regards
Vivek
The problem with HugePages not being used if the instance is started with srvctl or crsctl is Bug #9251136.
The workaround is to add the appropriate ‘ulimit -l’ and ‘ulimit -n’ commands directly into the ohasd script. See the bug write-up on MOS for full details.
Pingback: Ronny Egners Blog » Tuning Linux for Oracle
After I tracked down and resolved the problem on my systems, I discovered this article. I thought it would be helpful to post my write-up on the issue.
I’ve tracked this down to the memlock parameter not being set properly within the Oracle Clusterware suite. When the Clusterware suite executes system calls, like starting Oracle, the resource limits are inherited from the initial shell. In this case the default system settings apply, and huge pages are not utilized because the memlock parameter is sized too small for the Oracle database SGA size. To resolve the problem, set the ulimit settings in the Clusterware init scripts to ensure they are sized properly at boot.
You may notice this was working on the system until a reboot was issued. If services were started manually, the limits would have been inherited from the startup user’s shell.
Instructions to correct the problem:
Modify /etc/init.d/ohasd and add the following to the start function in the script:
ulimit -n 65536
ulimit -l ##your memlock size here##
As mentioned above, there is a Metalink article on this problem: Metalink ID: 9251136.1
This is a wonderful article. I must say I now truly understand why the sys admin suggested changing to Huge Pages rather than just plain memory, and also why we now use the LOCK_SGA parameter in Oracle 10g. For our proposed 11g upgrade I will be making the recommendation not to use AMM, as that would take us back to where we were before the introduction of huge pages on our system.
Pingback: 犯错了 (vm_nr_hugepages) « 弹冠相庆
I am having trouble getting huge pages to work with ASM instances. I am on RH5 64-bit.
Oracle 11.2.0.2.
Any advice would be great.
Never tried it. From my point of view that’s micro-optimization: your ASM instance will have approx. 1 GB – 1.5 GB even for really big installations, which is approx. 1–2 MB per session, and normally there are only a few connected sessions. So from my point of view Huge Pages are not really required there.
I implemented huge pages on Linux and started getting errors:
1) I am getting ORA-00445: background process “J000” did not start after 120 seconds
kkjcre1p: unable to spawn jobq slave process
I’m running two Oracle 11.2 databases on SUSE Linux Enterprise Server (SLES) with 16 GB physical memory in total:
1) sga_target=6g, sga_max_size=8g
2) sga_target=sga_max_size=1536m
I configured huge pages and ran hugepages_settings.sh as per Document 401749.1, and it gives:
Recommended setting: vm.nr_hugepages = 4868
which looks wrong to me.
I am also getting the following message via OEM:
Significant virtual memory paging was detected on the host operating system.
oracle@oracle1:/tmp> cat /proc/meminfo | grep Huge
HugePages_Total: 4868
HugePages_Free: 28
HugePages_Rsvd: 26
HugePages_Surp: 0
Hugepagesize: 2048 kB
oracle@oracle1:/tmp> free -m
total used free shared buffers cached
Mem: 15955 15830 125 0 102 3981
-/+ buffers/cache: 11745 4210
Swap: 49151 8 49143
kernel.shmmax = 12884901888
kernel.shmall = 3145728
fs.file-max = 6815744
vm.nr_hugepages = 4868
## added by orarun ##
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
# End of file
* soft memlock 12582912
* hard memlock 12582912
Looks good from what I can tell. Maybe you have some kind of runaway user process using a lot of memory – huge pages are by default not swappable.
Regarding the process which did not start: this might be a side effect of a heavily loaded system.
My recommendation is to monitor the system using OSWatcher and look at the memory indicators.
One further question: did you reboot the system to allocate the huge pages? Allocating them on a system which has been running for some time might also cause heavy swapping.
Hello.
Thanks for help.
I tried this, but my server was configured incorrectly and I had to change the ulimit -l value.
Then logout, login, bounce the databases – and it worked!
Thanks,
Alex.
Pingback: Linux Huge Pages: Schnellstart | Oraculix