The following post summarizes my tests with DBFS on an 11g Release 2 RAC.
DBFS – An Overview
Since the introduction of LOBs, the database has been used to store all kinds of files, for instance images, audio files, video files or even text files. Oracle 11g Release 1 introduced SecureFiles, which added compression and de-duplication.
Oracle Database Filesystem (DBFS) creates a mountable file system which can be used to access files stored in the database as SecureFile LOBs. DBFS is a shared file system like NFS and consists of a server (the database) and a client. Just like with a traditional database connection, server and client can reside on different systems and communicate over SQL*Net. The same applies to DBFS.
The following figure illustrates DBFS:
(Source: http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10645/adlob_fs.htm)
Install DBFS
Requirements
The following requirements must be met to use DBFS:
- only available for 32-bit and 64-bit Linux
- installed FUSE package
- installed kernel development package
- installed oracle client libraries
Install FUSE
- Download FUSE in Version 2.7.3 from http://fuse.sourceforge.net
(Note: I initially tested with FUSE 2.8.1 but was not able to get it working; so stick with 2.7.3 or 2.7.4)
- Install kernel development packages matching your kernel (to query your kernel type "uname -a")
- Compile and install FUSE:
- tar -xzvf fuse-2.7.3.tar.gz
- cd fuse-2.7.3
- ./configure --prefix=/usr --with-kernel=/usr/src/kernels/`uname -r`-`uname -p`
- make
- make install
- depmod
- modprobe fuse
- chmod 666 /dev/fuse
- echo "/sbin/modprobe fuse" >> /etc/rc.modules
- ldconfig
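After these steps it is worth checking that the fuse kernel module actually loaded and the device node is in place. A quick sanity check might look like this (a sketch, not part of the original install steps; paths can differ between distributions):

```shell
#!/bin/sh
# Report whether the fuse kernel module is loaded.
if lsmod 2>/dev/null | grep -q '^fuse'; then
    echo "fuse module: loaded"
else
    echo "fuse module: not loaded"
fi
# Verify the device node exists and show its permissions.
if [ -e /dev/fuse ]; then
    ls -l /dev/fuse
else
    echo "/dev/fuse: missing"
fi
```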
Create DBFS File System
Creating a user to store the file system in
Prior to creating a DBFS file system you must create a user to store the file system in. DBFS file systems can be created for every user. The role DBFS_ROLE must be granted to every user owning a dbfs file system, for instance:
create user dbfs identified by dbfs default tablespace users quota unlimited on users;
grant dbfs_role to dbfs;
grant create session to dbfs;
grant create procedure to dbfs;
grant create table to dbfs;
alter user dbfs default role all;
Create a simple file system
To create the DBFS file system execute the script "dbfs_create_filesystem.sql" while connected as the user that will store the file system. The syntax is:
sqlplus <username>/<password>@<database> @$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem.sql <tablespace name> <file system name>
To create a file system named TESTDBFS for the user dbfs created earlier, storing the content in tablespace USERS:
sqlplus dbfs/dbfs@ora11p @$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem.sql USERS TESTDBFS
dbfs_create_filesystem.sql creates a partitioned file system by creating multiple physical segments in the database and distributing files randomly among them. When files are extremely big and make up a large percentage of the total file system, this can cause an "ENOSPC" error to appear even if the file system is not full. In such cases the script "dbfs_create_filesystem_advanced.sql" can be used to create a non-partitioned file system.
Create an advanced file system
Creating a standard file system does not use de-duplication or compression. To use these features you have to create the file system using the script "dbfs_create_filesystem_advanced.sql". The calling syntax is:
sqlplus <username>/<password>@<database> @$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem_advanced.sql <tablespace name> <file system name> <compress-high | compress-medium | compress-low | nocompress> <deduplicate | nodeduplicate> <encrypt | noencrypt> <partition | non-partition>
To create a de-duplicated, highly compressed, unencrypted and partitioned file system for user dbfs2 stored in tablespace USERS:
sqlplus dbfs2/dbfs2@ora11p @$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem_advanced.sql USERS TESTDBFS2 compress-high deduplicate noencrypt partition
Dropping a File System
To drop a file system, use the script "dbfs_drop_filesystem.sql". The calling syntax is:
sqlplus <username>/<password>@<database> @$ORACLE_HOME/rdbms/admin/dbfs_drop_filesystem.sql <file system name>
Mounting a DBFS file system
Mounting a DBFS file system uses FUSE (Filesystem in Userspace), so every user (even non-privileged users) can mount dbfs file systems, and file systems are kept separate between users. With FUSE it is even possible to mount two completely different file systems on the same mountpoint, separated only by the user looking at them (see below for an example).
To mount a dbfs file system use the userspace tool "dbfs_client", which is located in $ORACLE_HOME/bin. It is important that LD_LIBRARY_PATH includes the correct path to the oracle libraries. In addition, remember that dbfs_client will NOT return until the file system is unmounted.
The calling syntax is:
Usage: dbfs_client <db_user>@<db_server> [options] <mount point>
  db_user:     Name of Database user that owns DBFS content store filesystem(s)
  db_server:   A valid connect string to the Oracle database server
               (for example, hrdb_host:1521/hrservice)
  mount point: Path to mount Database File System(s). All the file systems owned
               by the database user will be seen at the mount point.
DBFS options:
  -o direct_io      Bypasses the Linux page cache. Gives much better performance
                    for large files. Programs in the file system cannot be executed
                    with this option. This option is recommended when DBFS is used
                    as an ETL staging area.
  -o wallet         Run dbfs_client in background. Wallet must be configured to
                    get credentials.
  -o failover       DBFS client fails over to a surviving database instance with
                    no data loss. Some performance cost on writes, especially for
                    small files.
  -o allow_root     Allows root access to the filesystem. This option requires
                    setting the 'user_allow_other' parameter in /etc/fuse.conf.
  -o allow_other    Allows other users access to the filesystem. This option
                    requires setting the 'user_allow_other' parameter in /etc/fuse.conf.
  -o rw             Mount the filesystem read-write [Default]
  -o ro             Mount the filesystem read-only. Files cannot be modified.
  -o trace_level=N  Trace level: 1->DEBUG, 2->INFO, 3->WARNING, 4->ERROR,
                    5->CRITICAL [Default: 4]
  -o trace_file <file> | 'syslog'
  -h                help
  -V                version
(Source: http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10645/adlob_client.htm)
Unmounting a DBFS
Unmounting a dbfs file system can be achieved with:
fusermount -u <mount point>
Mounting DBFS Through fstab
Mounting a DBFS through fstab is a little bit complicated and requires an Oracle wallet to store the passwords. See the DBFS documentation for more information.
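For reference, a wallet-based setup roughly follows the pattern below. The wallet directory, the TNS alias "DBFS" and the fstab line are illustrative assumptions on my part, so check the DBFS documentation for the exact syntax of your version:

```shell
# Create a wallet and store the DBFS user's credentials in it
# (wallet path and alias "DBFS" are examples).
mkstore -wrl /home/oracle/wallet -create
mkstore -wrl /home/oracle/wallet -createCredential DBFS dbfs2 dbfs2

# sqlnet.ora must point to the wallet:
#   WALLET_LOCATION = (SOURCE = (METHOD = FILE)
#                      (METHOD_DATA = (DIRECTORY = /home/oracle/wallet)))
#   SQLNET.WALLET_OVERRIDE = TRUE

# Mount using the wallet; with -o wallet dbfs_client runs in the background:
dbfs_client -o wallet /@DBFS /mnt/dbfs

# Corresponding /etc/fstab entry (mount.dbfs is a wrapper around dbfs_client):
# /sbin/mount.dbfs#/@DBFS /mnt/dbfs fuse rw,user,noauto 0 0
```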
DBFS restrictions
DBFS imposes the following restrictions:
- no ioctl
- no locking
- no memory-mapped files
- no Async IOs
- no O_DIRECT file open flags
- no hard links
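The hard-link restriction is easy to observe with a small probe script that simply attempts to create a hard link on a given mountpoint; on a DBFS mount the ln call fails, while on ext3 it succeeds. This is a sketch of mine, not from the original post:

```shell
#!/bin/sh
# Probe whether the filesystem at $1 (default: current directory)
# supports hard links by attempting to create one.
dir=${1:-.}
src="$dir/.linkprobe.$$"
dst="$dir/.linkprobe.$$.ln"
echo probe > "$src"
if ln "$src" "$dst" 2>/dev/null; then
    echo "hard links: supported"
else
    echo "hard links: not supported"
fi
rm -f "$src" "$dst"
```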
DBFS Client Failover
According to the oracle documentation failover can be achieved by modifying the service as shown below:
exec DBMS_SERVICE.MODIFY_SERVICE(service_name => 'service_name', -
        aq_ha_notifications => true, -
        failover_method => 'BASIC', -
        failover_type => 'SELECT', -
        failover_retries => 180, -
        failover_delay => 1);
Instead of modifying the default service a new service named “DBFS” was created and configured accordingly to the example given above.
For POLICY MANAGED Clusters issue:
srvctl add service -d ORA11P -s DBFS -g Default -y AUTOMATIC -e SELECT -m BASIC -w 1 -z 180 -q TRUE
("-g Default" specifies the name of the server pool)
For ADMIN MANAGED clusters issue:
srvctl add service -d ORA11P -s DBFS -r ora11p1,ora11p2 -y AUTOMATIC -e SELECT -m BASIC -w 1 -z 180 -q TRUE
("-r ora11p1,ora11p2" specifies the instances the service runs on)
Mounting the file system requires the parameter “-o failover” to be added:
ora11p@rac1#>: dbfs_client <db_user>@<db_server> -o failover /mnt/dbfs
Sharing and Caching
A DBFS file system can be accessed by multiple clients at the same time. Sharing and caching behavior is like NFS: the client caches writes and flushes them after a timeout or when a file handle is closed. Writes to a file become visible to other clients after the file handle is closed. Client-side write caching can be bypassed by specifying the O_SYNC flag when opening the file.
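On the shell level, dd can demonstrate this: oflag=sync opens the output file with O_SYNC, so each block is flushed before dd continues. The /test path below reuses the mountpoint from the examples in this post:

```shell
# Copy a file onto the DBFS mount with O_SYNC so each write bypasses
# the client-side write cache instead of being flushed lazily.
dd if=products.xml of=/test/products.xml bs=1M oflag=sync
```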
Backup of DBFS
- backing up the database also backs up the dbfs in a consistent way
- backing up the dbfs as a normal file system enables single file restores
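The second bullet can be sketched with plain tar: back up the mounted file system into an archive and later restore a single file from it (the /test mountpoint and file name are taken from the examples in this post; the backup path is an assumption):

```shell
# Back up the mounted DBFS file system into a compressed archive ...
tar -czf /backup/dbfs_test.tar.gz -C /test .
# ... and later restore a single file from it back into the mount.
tar -xzf /backup/dbfs_test.tar.gz -C /test ./products.xml
```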
Small File Performance of DBFS
Note that the performance of DBFS on file systems with tens of thousands of files is not fully tuned in the current revision.
Some examples
File system separation between users
The following three listings show the output of “df -h” from three different users: grid, ora11p and root. Note that “ora11p” mounted the dbfs file system to /test4 and is the only user which is able to use the file system:
-bash-3.2$ id
uid=500(grid) gid=500(dba) groups=500(dba)
-bash-3.2$ df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/sda3                  26G   15G  9.6G  60% /
/dev/sda1                  99M   12M   83M  13% /boot
tmpfs                     3.0G  1.4G  1.7G  45% /dev/shm
/dev/asm/ora11p_home-132   10G  4.7G  5.4G  47% /u01/app/oracle/product/11.2.0/ora11p
-bash-3.2$ id
uid=501(ora11p) gid=500(dba) groups=500(dba)
-bash-3.2$ df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/sda3                  26G   15G  9.6G  60% /
/dev/sda1                  99M   12M   83M  13% /boot
tmpfs                     3.0G  1.4G  1.7G  45% /dev/shm
/dev/asm/ora11p_home-132   10G  4.7G  5.4G  47% /u01/app/oracle/product/11.2.0/ora11p
dbfs                       14M  3.3M   11M  25% /test4
[root@rac1 ~]# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
[root@rac1 ~]# df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/sda3                  26G   15G  9.6G  60% /
/dev/sda1                  99M   12M   83M  13% /boot
tmpfs                     3.0G  1.4G  1.7G  45% /dev/shm
/dev/asm/ora11p_home-132   10G  4.7G  5.4G  47% /u01/app/oracle/product/11.2.0/ora11p
Even root is by default not allowed to see the contents of /test4. This setting can be influenced by a mount parameter (see above):
[root@rac1 ~]# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
[root@rac1 ~]# cd /test4
-bash: cd: /test4: Permission denied
Mounting a file system
The command syntax to mount a file system is described above. In the following we will try to mount the highly compressed and de-duplicated database file system of user "dbfs2" created above. Trying to mount this file system results in the following error message:
-bash-3.2$ dbfs_client dbfs2/dbfs2@rac1:1521/ora11p /test
dbfs_client: error while loading shared libraries: libclntsh.so.11.1: cannot open shared object file: No such file or directory
Obviously an oracle library can't be found. To see a list of all dependent libraries we check with "ldd":
ora11p@rac1#>: ldd $ORACLE_HOME/bin/dbfs_client
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000303f400000)
        libfuse.so.2 => /usr/lib/libfuse.so.2 (0x00002b5e455cb000)
        libclntsh.so.11.1 => not found
        libnnz11.so => not found
        libdl.so.2 => /lib64/libdl.so.2 (0x000000302ca00000)
        libm.so.6 => /lib64/libm.so.6 (0x000000302c600000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000000302ce00000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003030200000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000303c400000)
        libc.so.6 => /lib64/libc.so.6 (0x000000302c200000)
        librt.so.1 => /lib64/librt.so.1 (0x000000302d600000)
        /lib64/ld-linux-x86-64.so.2 (0x000000302be00000)
It turns out there are two libraries missing: libclntsh.so.11.1 and libnnz11.so. They are located in $ORACLE_HOME/lib. There are several ways to adjust the library search path:
- Modifying /etc/ld.so.conf to include $ORACLE_HOME/lib
- exporting “LD_LIBRARY_PATH=$ORACLE_HOME/lib”
ora11p@host#>: export LD_LIBRARY_PATH=$ORACLE_HOME/lib
ora11p@host#>: ldd dbfs_client
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000303f400000)
        libfuse.so.2 => /usr/lib/libfuse.so.2 (0x00002b84ba9d9000)
        libclntsh.so.11.1 => /u01/app/oracle/product/11.2.0/ora11p/lib/libclntsh.so.11.1 (0x00002b84babf8000)
        libnnz11.so => /u01/app/oracle/product/11.2.0/ora11p/lib/libnnz11.so (0x00002b84bd223000)
        libdl.so.2 => /lib64/libdl.so.2 (0x000000302ca00000)
        libm.so.6 => /lib64/libm.so.6 (0x000000302c600000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000000302ce00000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003030200000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000303c400000)
        libc.so.6 => /lib64/libc.so.6 (0x000000302c200000)
        librt.so.1 => /lib64/librt.so.1 (0x000000302d600000)
        /lib64/ld-linux-x86-64.so.2 (0x000000302be00000)
        libaio.so.1 => /usr/lib64/libaio.so.1 (0x00002b84bd5ed000)
-bash-3.2$ dbfs_client dbfs2@rac1:1521/ora11p /test
Password:
<WAIT>
Since dbfs_client prompts for a password and does not return until unmounted, we store the password in a file, feed it via stdin and send the client to the background:

echo "dbfs2" > dbfs2password
ora11p@host#>: nohup dbfs_client dbfs2@rac1:1521/ora11p /test < dbfs2password &
[1] 24237
ora11p@host#>: nohup: appending output to `nohup.out'
ora11p@host#>: df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/sda3                  26G   15G  9.6G  60% /
/dev/sda1                  99M   12M   83M  13% /boot
tmpfs                     3.0G  1.4G  1.7G  45% /dev/shm
/dev/asm/ora11p_home-132   10G  4.7G  5.4G  47% /u01/app/oracle/product/11.2.0/ora11p
dbfs                      6.4M  192K  6.2M   3% /test
Using the file system
Readers might ask: "Does the file system behave like a normal file system such as ext3 does?" The answer is: YES.
A few screenshots to illustrate it:
Testing de-duplication and compression
When creating the file system we specified high compression and de-duplication. Let's see the results:
Compression
For this test we took the file "products.xml" from the 11g Release 2 database install source. The file is approx. 930 KB in size, but due to its XML format it compresses well.
ora11p@host#>: df -h .
Filesystem      Size  Used Avail Use% Mounted on
dbfs            7.4M  272K  7.1M   4% /test
ora11p@host#>: cp /raw_software/database/stage/products.xml .
ora11p@host#>: ll
total 910
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:19 products.xml
ora11p@host#>: df -h .
Filesystem      Size  Used Avail Use% Mounted on
dbfs            7.4M  352K  7.0M   5% /test
Although the file size is 930 KB, the used space increased only from 272 KB to 352 KB, an increase of 80 KB. So the space consumed by the file has been reduced from 930 KB down to approx. 80 KB.
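A quick back-of-the-envelope check turns this into a compression ratio (931,475 bytes original vs. the observed 80 KB increase):

```shell
# 931475 bytes on ext3 shrank to roughly 80 KB (81920 bytes) in DBFS.
awk 'BEGIN { printf "ratio: %.1f:1\n", 931475 / (80 * 1024) }'
# prints: ratio: 11.4:1
```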
Deduplication
For testing de-duplication we copied the file (products.xml) from the previous example 16 times and checked the allocated space:
ora11p@host#>: ll
total 15470
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 1
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 10
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 11
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 12
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 13
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 14
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 15
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 16
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 2
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 3
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 4
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 5
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 6
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 7
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 8
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 9
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:19 products.xml
ora11p@host#>: df -h .
Filesystem      Size  Used Avail Use% Mounted on
dbfs             12M  704K   11M   7% /test
Based on the directory listing above we copied products.xml 16 times. With compression alone, each copy should add approx. 80 KB to the file system, so sixteen copies would add up to approx. 1.3 MB. But the file system's current usage is only 704 KB. So there must be some sort of de-duplication (or an extremely intelligent compression algorithm) at work. But remember: this is only a first, short test. I will investigate this further in a future post.
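The same back-of-the-envelope arithmetic makes the de-duplication visible: usage grew from 352 KB to 704 KB while 16 identical copies were added, i.e. far less than the ~80 KB a compressed but non-deduplicated copy would need:

```shell
# (704 KB - 352 KB) of growth spread over 16 identical copies.
awk 'BEGIN { printf "per-copy overhead: %d KB\n", (704 - 352) / 16 }'
# prints: per-copy overhead: 22 KB
```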
Implementing Failover
Before testing failover we need to create an extra service which will be configured according to the guidelines from oracle:
srvctl add service -d ORA11P -s DBFS -r ora11p1,ora11p2 -y AUTOMATIC -e SELECT -m BASIC -w 1 -z 180 -q TRUE
srvctl start service -d ORA11P -s DBFS
To use this service add a connection descriptor to TNSNAMES.ORA:
DBFS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = DBFS)
    )
  )
Now mount the file system normally. In our case we mount the file system on host "rac2" while being connected to host "rac1" (inst_id=1):
-bash-3.2$ date
Thu Oct 8 16:03:54 CEST 2009
-bash-3.2$ ll
total 2730
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 1
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 9
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:19 products.xml
SQL> select inst_id, username, to_char(logon_time,'dd.mm.yyyy hh24:mi:ss') as TIME
  2  from gv$session where username='DBFS2';

   INST_ID USERNAME  TIME
---------- --------- -------------------
         1 DBFS2     08.10.2009 15:54:17
We now do a “shutdown abort” on instance “ora11p1” (i.e. the instance running on host “rac1” and the instance the dbfs is connected to) and check the file system and sessions:
Shutdown the instance “ora11p1”:
SQL@ORA11P!> shutdown abort;
Alert.log information from instance “ora11p2” showing instance “ora11p1” was shut down at 16:05:26:
Thu Oct 08 16:05:26 2009
Reconfiguration started (old inc 28, new inc 30)
List of instances:
 2 (myinst: 2)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Thu Oct 08 16:05:26 2009
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Thu Oct 08 16:05:26 2009
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Started redo scan
Completed redo scan
 read 1725 KB redo, 701 data blocks need recovery
Started redo application at
 Thread 1: logseq 454, block 1674
Recovery of Online Redo Log: Thread 1 Group 2 Seq 454 Reading mem 0
  Mem# 0: +DATA/ora11p/onlinelog/group_2.258.698489181
  Mem# 1: +DATA2/ora11p/onlinelog/group_2.260.698489185
Completed redo application of 1.01MB
Completed instance recovery at
 Thread 1: logseq 454, block 5124, scn 5701524
 561 data blocks read, 723 data blocks written, 1725 redo k-bytes read
Thread 1 advanced to log sequence 455 (thread recovery)
Session Information on instance “ora11p2”:
SQL> select inst_id, username, to_char(logon_time,'dd.mm.yyyy hh24:mi:ss')
  2  from gv$session where username='DBFS2';

   INST_ID USERNAME   TO_CHAR(LOGON_TIME,
---------- ---------- -------------------
         2 DBFS2      08.10.2009 16:05:29
The session was reconnected to the surviving instance three seconds after the instance “ora11p1” was shut down.
Listing the file system was briefly interrupted, as shown below:
-bash-3.2$ date
Thu Oct 8 16:05:28 CEST 2009
-bash-3.2$ ll
<< WAITED a few seconds >>
total 2730
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 1
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 9
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:19 products.xml
-bash-3.2$ date
Thu Oct 8 16:05:36 CEST 2009
The actual dbfs prerequisites on the oracle website are:
DBFS Prerequisites
The dbfs_client can be used as a direct RDBMS client using the DBFS Command Interface only on Linux, Linux.X64, Solaris, Solaris64, AIX and HPUX platforms.
The dbfs_client host must have the Oracle client libraries installed.
The dbfs_client can be used as a mount client only on Linux and Linux.X64 platforms, and the following are also required:
The dbfs_client host must have the kernel-devel package installed to configure and build FUSE.
The dbfs_client host must have the FUSE Linux package installed.
A group named fuse must be created and the user name that is running the dbfs_client must be a member of the fuse group.
Full Disclosure: I am the Product Manager for DBFS, SecureFiles, Total Recall, and other database performance-related features and products at Oracle.
1. In the “Small File Performance of DBFS” section, you say “Note that the performance of DBFS on file systems with tens of thousands of files is not fully tuned in the current revision.” The opposite is true: DBFS is designed to meet or exceed the performance of traditional file systems, and to scale – in terms of the number and size of files – well beyond the limits of file systems. DBFS is not optimized for small files, or for block-style access to files.
2. DBFS is a feature of Oracle Database 11 Release 2, available on all platforms / OS ports. The DBFS client can be used in command-line mode on all platforms, and can be used to mount DBFS file systems only on Linux. For example, you could set up an Oracle 11g Release 2 database on Solaris, create DBFS file systems in that database, and then use Linux clients to mount those DBFS file systems.
3. DBFS in combination with Oracle Total Recall gives the ability to recover deleted or incorrectly modified files without having to restore from a backup. The user simply uses a query with the “AS OF” or “VERSIONS BETWEEN” syntax to find older versions of files in a DBFS file system.
As always, more details are available at http://www.oracle.com.
Is dbfs_client available for Mac OS, since you have said command-line mode is available on all platforms? If so, how do we get dbfs_client for Mac?
I'd like to confirm that dbfs is not optimized for any type of file access. No way is the performance anywhere close to a traditional filesystem. What I'm saying is Kevin J. is full of shit. I used it in real work and it is beyond slow.
Hey guys, I have a RAC system and DBFS is successfully mounted on both nodes, however, on one of them, I cannot list my disk with df -h. But they are both accessible via location /mnt/dbfs/
I checked /etc/mtab in both nodes, there was my mount point listed.
Any ideas?