Partition layout on Linux – Recommendations
Partition recommendations for Oracle on Linux
Every system engineer or even every DBA comes along the following question from time to time: “How do i arrange my partitions for using oracle on linux?”
In the following i am blogging my thoughts and recommendations on partition/storage design on the linux operating sytsem for oracle. My opinion might not be solely true and you are free to develop your own standards. But maybe the following lines help you.
Which file system shall i use?
Metalink has several notes on what file systems can/shall be used for oracle on linux. Metalink note 236826.1 states:
------- Begin Quote ---------
Supported File Systems
One of the most interesting features of the Linux OS is its support for multiple and various file systems. File systems can be defined and built on a partition basis. VFAT, ext2, ext3 and Reiser file systems can co-exist on the same Linux system, along with several other file systems and raw partitions (Note 224302.1 – Raw Devices on RedHat Advanced Server – FAQ).
Your choice of which one to use then becomes based on supportability, reliability, security and performance. Oracle generally does not certify file systems and does certify operating systems, but Linux is a specific case. On different Linux distributions, Oracle might choose to have certifications on different filesystems. When a problem is encontered, if a non-supported filesystem is being used for Oracle data, control and redo log files, Oracle will try to reproduce the problem on a supported and certified filesystem. If the problem does not reproduce on a certified filesystem, the specific filesystem / O/S vendor should be engaged for resolution.
The current support includes ext2, ext3 and NFS-based storage systems (e.g. NetApp). To be acceptable for database usage, the file system must support certain minimum functions such as write acknowledgement, write-through cache and file permissions. Counter examples are simple NFS, which lacks acknowledge and VFAT, which lacks file permissions.
Recommended File Systems
There are various file systems available for Linux OS:
- The ext2 and ext3 file systems are robust and fully supported. ext2 was the default file system under the 2.2 kernel. ext3 is simply the enhanced ext2 filesystem having journaling feature. ext3 is the detault filesystem for RHEL3 and 4.
- Oracle Cluster File System (OCFS) is a shared file system designed specifically for Oracle Real Application Cluster (RAC). OCFS eliminates the requirement for Oracle database files to be linked to logical drivers. OCFS volumes can span one shared disk or multiple shared disks for redundancy and performance enhancements (Note 224586.1).OCFS2 is the next generation of the Oracle Cluster File System for Linux. It is an extent based, POSIX compliant file system. Unlike the previous release (OCFS),
- OCFS2 is a general-purpose file system that can be used for shared Oracle home installations making management of Oracle Real Application Cluster (RAC) installations even easier.
In summary, the recommended filesystems are:
- Single node: Any filesystem that is supported by the Linux vendor. Note that any filesystem issues are need to be resolved by the Linux vendor.
- Multi-node (RAC): OCFS, raw, NFS-based storage systems (e.g. NetApp). (See Note 329530.1, Note 423207.1 and Note 279069.1 for GFS support)
------- End Quote ---------
From the authors point of view the following file systems / technologies shall be used for oracle on linux:
- single node oracle
- multi-node (RAC) oracle
Todays hard disk sizes always require a journaling file system. Using a non-journalig file system (like ext2) will create problems after a system crash when checking the file system on larger hard disks will take serveral hours. Thats completely avoidable. Using ext4 is not recommended for stability reasons and it´s reliability has yet to be prooven. In addition to that ext4 is currently not supported anyway. XFS is only supported on SuSE Enterprise Server and according to Note 414673.1 it cannot use asynch and direct io simultaneously. In addition to that the same notes states users shall not use XFS due to an undoucumented bug which can lead to redo log corruptions. At the time of writing the future of reiserfs is uncertain since Hans Reiser (developer of the file system) was sent to jail for murder.
So ext3 is currently the only available choice for oracle on linux because it is quite stable and – for use with LVM most important – supports online file system growth. For shrinking the file system still need to be unmounted. Please note you ext3 might force a full file system check every X months or every Y mounts. You can tune this behavior with “tune2fs”.
For clusters i do not recommend OCFS because from my point of view ASM is the future and further advanced piece of technology. With 11g Release 2 Oracle extended ASM to be a complete volume manager and added a cluster file system for ASM. You can read here more about that topic.
LVM – the logical volume manager
What is an logical volume manager?
Using a so called “logical volume manager” – short: LVM – enables to abstract disk devices from file systems or partitions. A LVM takes one ore more disks and forms a storage unit. This unit is called differently depending on the LVM being used. The linux lvm name it: “volume group”.
The following picture illustrates how the LVM works:
As you can see the LVM takes one or more disks “physical device” or “physical volume” and forms a so called “volume group”. There can be one or more volume group. One disk always belongs to one volume group. From the volume group storage is exported as “logical volume”. On top of a logical volume a file system can be created and mounted or the logical volume can be used directly as raw device.
More information can be found in the linux lvm howto.
- Online-Resize (growing and shrinking) of logical volumes (not the file systems!)
- supports multipathing
- supports several RAID levels (concat, raid-0, raid-1, raid-5)
Restrictions / Disadvantages
Every recent Linux distribution (Red Hat ES/AS 4, Red Hat ES/AS 5, OEL 5, SuSE Linux Enterprise Server 10, SuSE Linux Enterprise Server 11) offers the possibility to use LVM out-of-the-box. However there are a few restrictions:
- Due to limitations in grub bootloader /boot (i.e. the kernel and initial ram disk) cannot be placed inside a LVM; /boot needs to be a separate partition. There is a re-implementation of grub which will offer the possibility to boot from a /boot encapsulated in LVM.
- Fixing boot problems is a little bit more complex because accessing the volumes require the LVM to be started and supported by the rescue system. Most distributions offering LVM support offer this as well in their recue systems. If not you can use SystemRescueCD (see “Tools” section at the end of this post)
- Although it is possible to use LVM in multipathing scenarios most storage vendors require dedicated kernel drivers coming from the storage vendors directly.
- Although a LVM can resize a logical volume the file system stored on this logical volume cannot be resized by the LVM directly. For doing so you have to use the file system specific tools (for ext2/ext3 the tool is named “resize2fs”)
- Resizing a LUN/Disk already used by LVM requires a reboot after the partition used by the LVM was edited with fdisk to reflect the new partition size so the kernel recognizes the new size of the partition (this is also true for non-lvm configurations).
Shall i use an LVM?
From my point of view an LVM shall be used for:
- easier and more flexible management of operating system storage
- easier and more flexible management for oracle database binary installation location (according to the OFS this is /u01)
An LVM shall not be used for:
- Multipathing if it is not supported/certified by your storage vendor
- Storing any type of oracle data files (this includes: data files, archive logs, redo logs, flashback logs, control files, ….): LVM introduces a additional layer of software, complexity and source of problems especially in conjunction with oracle data files.
- SWAP: There might be scenarios where the kernel runs out of memory and needs some pages from memory to be swapped out; if swap is lvm encapsulated the kernel might not be able to complete the request because the LVM requires some memory to be available which is not available – voila: a race condition! In addition to that swapping is already slow… encapsulating it in LVM makes it more slower. Furthermore SWAP is a quite static device.
Putting it all toghether
Due to the restrictions mentioned above a typical hard disk layout for the boot disk (here: “/dev/sda”) looks like this:
Device Size Part.-Type File system Mountpoint /dev/sda1 150 MB linux ext3 /boot /dev/sda2 depends linux swap - - /dev/sda3 remain. space linux lvm (8e) - -
The size of the swap space depends directly on the amount of memory you have. According to the oracle documentation you will need up to 16 GB swap (if you have more than 16 gb memory the swap space size will be static at 16 gb regardless of the actual memory size).
Non-Boot disks (for use with LVM)
For any additional disk used by the LVM the partition layout migh look like this:
Device Size Part.-Type File system Mountpoint /dev/sdb1 whole disk linux lvm (8e) - -
Non-Boot disks (NOT used by the LVM)
Device Size Part.-Type File system Mountpoint /dev/sdc1 whole disk linux ext3 /u02
Mount Points / Logical Volume / Physical device separation
The following listing is my standard listing for oracle database servers on linux:
Device Mount Point File system Size /dev/sda1 /boot ext3 150 MB /dev/sda2 swap swap depends on memory /dev/mapper/vg_root/lv_root / ext3 15 GB /dev/mapper/vg_root/lv_var /var ext3 10 GB /dev/mapper/vg_root/lv_tmp /tmp ext3 20 GB /dev/mapper/vg_root/lv_u01 /u01 ext3 50 GB /dev/sdc1 /u02 ext3 complete disk /dev/sdd1 /u03 ext3 complete disk ... /dev/sdN /u0N ext3 complete disk
You note /boot and swap is not LVM encapsulated due to some resrictions outlined above. /, /var, /tmp and /u01 are separated by logical volumes with appropriate sizes. You might think the device sizes are way too large: You´re right. I tent to make the logical volumes a little bit larger so i do not need to resize them too soon. Every resize operatin might cause total data loss. So it is advisable to resize file systems or logical volumes as less often as possible. The first level directories /var, /tmp and /u01 are separated in logical volumes because data will be placed under this top level directories.
If you like to you can even have separate volumes for /home or the oracle admin or the diagnostic directory.
I do not have a separate volume for /home because there are no users beside “root” and the oracle user on my systems which are allowed to store files. If you need space for user directories i recommend creating a dedicated /home volume as well.
With this volume layout we are using 50+20+10+15 GB = 95 GB plus 150 MB for /boot plus swap. Todays smallest server disks have at least 147 gb capacity (formatted something like 138 gb …). So this layout fits easily on these disks. If you have a server with 73 gb disks you can shrink primary /u01 and after that all the other volumes.