
The Oracle Database File System (DBFS)

The following post is a summary of my tests with DBFS on a 11g Release 2 RAC.

DBFS – An Overview

Since the introduction of LOBs, the database has been used to store all kinds of files, for instance images, audio files, video files or even plain text files. Oracle 11g Release 1 introduced SecureFiles, which added compression and de-duplication.

Oracle Database Filesystem (DBFS) creates a mountable file system which can be used to access files stored in the database as SecureFile LOBs. DBFS is a shared file system like NFS and consists of a server (the database) and a client. Just like a traditional database connection, server and client can reside on different systems and communicate over SQL*Net; the same applies to DBFS.

The following figure illustrates DBFS:

[Figure: DBFS architecture]

(Source: http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10645/adlob_fs.htm)

Install DBFS

Requirements

The following requirements must be met to use DBFS (a quick check of these prerequisites is sketched after the list):

  • only available for 32-bit and 64-bit Linux
  • installed FUSE package
  • installed kernel development package
  • installed oracle client libraries
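
A quick way to check these prerequisites (a sketch assuming an RPM-based distribution such as OEL/RHEL; package names may differ):

uname -r                                  # running kernel; the kernel-devel package must match it
rpm -q kernel-devel                       # kernel development package installed?
lsmod | grep fuse                         # FUSE kernel module loaded (after installing FUSE)
ls $ORACLE_HOME/lib/libclntsh.so.11.1     # Oracle client libraries available?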

Install FUSE

  • Download FUSE in Version 2.7.3 from http://fuse.sourceforge.net
    (Note: I initially tested with FUSE 2.8.1 but was not able to get it working; so stick with 2.7.3 or 2.7.4)
  • Install kernel development packages matching your kernel (to query your kernel type “uname -a”)
  • Compile and install FUSE:
    • tar -xzvf fuse-2.7.3.tar.gz
    • cd fuse-2.7.3
    • ./configure --prefix=/usr --with-kernel=/usr/src/kernels/`uname -r`-`uname -p`
    • make
    • make install
    • depmod
    • modprobe fuse
    • chmod 666 /dev/fuse
    • echo "/sbin/modprobe fuse" >> /etc/rc.modules
    • ldconfig

Create DBFS File System

Creating a user to store the file system in

Prior to creating a DBFS file system you must create a user to own it. DBFS file systems can be created for every user. The role DBFS_ROLE must be granted to users owning a DBFS file system, for instance:

create user dbfs identified by dbfs default tablespace
users quota unlimited on users;

grant dbfs_role to dbfs;
grant create session to dbfs;
grant create procedure to dbfs;
grant create table to dbfs;
alter user dbfs default role all;

Create a simple file system

To create the DBFS file system, execute the script “dbfs_create_filesystem.sql” while connected as the user that will own the file system. The syntax is:

sqlplus <username>/<password>@<database>
@$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem.sql <tablespace name>
<file system name>

To create a file system named TESTDBFS for the user dbfs created earlier, storing the content in tablespace USERS:

sqlplus dbfs/dbfs@ora11p @$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem.sql
USERS TESTDBFS

dbfs_create_filesystem.sql creates a partitioned file system by creating multiple physical segments in the database and distributing files randomly among them. When individual files are extremely large and make up a big percentage of the total file system, this can cause an “ENOSPC” error even though the file system is not full. In such cases the script “dbfs_create_filesystem_advanced.sql” can be used to create a non-partitioned file system.

Create an advanced file system

Creating a standard file system does not use de-duplication or compression. To use these features you have to create the file system using the script “dbfs_create_filesystem_advanced.sql”. The calling syntax is:

sqlplus <username>/<password>@<database>
@$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem_advanced.sql
<tablespace name> <file system name>
<compress-high | compress-medium | compress-low | nocompress>
<deduplicate | nodeduplicate>
<encrypt | noencrypt>
<partition | non-partition>

To create a de-duplicated, highly compressed, unencrypted and partitioned file system named TESTDBFS2 for user dbfs2, stored in tablespace USERS:

sqlplus dbfs2/dbfs2@ora11p
@$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem_advanced.sql
USERS
TESTDBFS2
compress-high
deduplicate
noencrypt
partition

Dropping a File System

To drop a file system the script “dbfs_drop_filesystem.sql” must be used. Calling syntax is:

sqlplus <username>/<password>@<database>
@$ORACLE_HOME/rdbms/admin/dbfs_drop_filesystem.sql <file system name>

Mounting a DBFS file system

Mounting a DBFS file system uses FUSE (Filesystem in Userspace), so every user (even non-privileged users) can mount DBFS file systems, and file systems are kept separate between users. With FUSE it is even possible to mount two completely different file systems on the same mountpoint, separated only by the user looking at it (see below for an example).

To mount a dbfs file system use the userspace tool “dbfs_client”, which is located in $ORACLE_HOME/bin. It is important that LD_LIBRARY_PATH includes the correct path to the Oracle client libraries. In addition, remember that dbfs_client will NOT return until the file system is unmounted.

The calling syntax is:

Usage: dbfs_client <db_user>@<db_server> [options] <mount point>

    db_user: Name of Database user that owns DBFS content store filesystem(s)
    db_server:   A valid connect string Oracle database server
                 (for example, hrdb_host:1521/hrservice).
    mount point: Path to mount Database File System(s).
                 All the file systems owned by the database user will be seen
                 at the mount point.

DBFS options:
  -o direct_io   Bypasses the Linux page cache.  Gives much better performance
                 for large files.
                 Programs in the file system cannot be executed with this option.
                 This option is recommended when DBFS is used as an ETL staging
                 area.
  -o wallet      Run dbfs_client in background. Wallet must be configured to get
                 credentials.
  -o failover    DBFS Client fails over to surviving database instance with no
                 data loss.
                 Some performance cost on writes, especially for small files.
  -o allow_root  Allows root access to the filesystem.
                 This option requires setting 'user_allow_other' parameter in
                 /etc/fuse.conf
  -o allow_other    Allows other users access to the filesystem.
                    This option requires setting 'user_allow_other' parameter in
                    /etc/fuse.conf
  -o rw             Mount the filesystem read-write [Default]
  -o ro             Mount the filesystem read-only. Files cannot be modified.
  -o trace_level=N  Trace Level: 1->DEBUG, 2->INFO, 3->WARNING, 4->ERROR,
                    5->CRITICAL [Default: 4]
  -o trace_file     <file> | 'syslog'
  -h help
  -V version

(Source: http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10645/adlob_client.htm)

Unmounting a DBFS

Unmounting a dbfs can be achieved with:

fusermount -u <mount point>

Mounting DBFS Through fstab

Mounting a DBFS through fstab is a little bit complicated and requires an Oracle wallet to store the password. See the DBFS documentation for more information.
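
As a rough sketch of the wallet-based approach (the wallet directory, the TNS alias DBFS as defined further below, and the mount point /mnt/dbfs are placeholders; check the documentation for the exact wallet and fstab syntax):

ora11p@host#>: mkstore -wrl $HOME/dbfs_wallet -create
ora11p@host#>: mkstore -wrl $HOME/dbfs_wallet -createCredential DBFS dbfs2 <password>

# sqlnet.ora must point to the wallet and allow it to be used for authentication:
#   WALLET_LOCATION = (SOURCE = (METHOD = FILE)(METHOD_DATA = (DIRECTORY = /home/ora11p/dbfs_wallet)))
#   SQLNET.WALLET_OVERRIDE = TRUE

ora11p@host#>: dbfs_client /@DBFS -o wallet /mnt/dbfs

# an /etc/fstab entry can then use dbfs_client as a mount helper, for example:
#   /sbin/mount.dbfs#/@DBFS  /mnt/dbfs  fuse  rw,user,noauto  0 0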

DBFS restrictions

DBFS imposes the following restrictions:

  • no ioctl
  • no locking
  • no memory-mapped files
  • no Async IOs
  • no O_DIRECT file open flags
  • no hard links

DBFS Client Failover

According to the oracle documentation failover can be achieved by modifying the service as shown below:

exec DBMS_SERVICE.MODIFY_SERVICE(service_name => 'service_name',
aq_ha_notifications => true,
failover_method => 'BASIC',
failover_type => 'SELECT',
failover_retries => 180,
failover_delay => 1);

Instead of modifying the default service, a new service named “DBFS” was created and configured according to the example given above.

For POLICY MANAGED Clusters issue:

srvctl add service -d ORA11P -s DBFS -g Default -y AUTOMATIC
-e SELECT -m BASIC -w 1 -z 180 -q TRUE

(“-g Default” specifies the name of the server pool)

For ADMIN MANAGED clusters issue:

srvctl add service -d ORA11P -s DBFS -r ora11p1,ora11p2 -y AUTOMATIC
-e SELECT -m BASIC -w 1 -z 180 -q TRUE

(“-r ora11p1,ora11p2” specifies the instances the service runs on)

Mounting the file system requires the parameter “-o failover” to be added:

ora11p@rac1#>: dbfs_client <db_user>@<db_server> -o failover /mnt/dbfs

Sharing and Caching

A DBFS file system can be accessed by multiple clients at the same time. Sharing and caching behavior is like NFS: the client caches writes and flushes them after a timeout or when the file handle is closed. Writes to a file become visible to other clients after the file handle has been closed. Client-side write caching can be bypassed by opening the file with the O_SYNC flag.
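
For example, from the shell a synchronous write can be forced with dd and its oflag=sync option, which opens the target file with O_SYNC (a minimal sketch; the source path and the /test mount point are only illustrative):

ora11p@host#>: dd if=/tmp/products.xml of=/test/products.xml bs=1M oflag=sync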

Backup of DBFS

  • backing up the database also backs up the DBFS in a consistent way
  • backing up the DBFS as a normal file system enables single file restores (a sketch of both approaches follows)
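
A minimal sketch of both approaches (the tablespace, mount point and backup path are placeholders taken from the examples in this post):

# database-level backup: DBFS content is stored as SecureFile LOBs, so a normal
# RMAN backup of the database or of the DBFS tablespace includes it consistently
rman target / <<EOF
backup tablespace users;
EOF

# file-level backup of the mounted file system allows restoring single files later
tar czf /backup/dbfs_test.tar.gz -C /test .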

Small File Performance of DBFS

Note that the performance of DBFS on file systems with tens of thousands of files is not fully tuned in the current revision.

Some examples

File system separation between users

The following three listings show the output of “df -h” from three different users: grid, ora11p and root. Note that “ora11p” mounted the dbfs file system to /test4 and is the only user which is able to use the file system:

-bash-3.2$ id
uid=500(grid) gid=500(dba) groups=500(dba)
-bash-3.2$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              26G   15G  9.6G  60% /
/dev/sda1              99M   12M   83M  13% /boot
tmpfs                 3.0G  1.4G  1.7G  45% /dev/shm
/dev/asm/ora11p_home-132 10G  4.7G  5.4G  47%
                   /u01/app/oracle/product/11.2.0/ora11p
-bash-3.2$ id
uid=501(ora11p) gid=500(dba) groups=500(dba)
-bash-3.2$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              26G   15G  9.6G  60% /
/dev/sda1              99M   12M   83M  13% /boot
tmpfs                 3.0G  1.4G  1.7G  45% /dev/shm
/dev/asm/ora11p_home-132 10G  4.7G  5.4G  47%
                   /u01/app/oracle/product/11.2.0/ora11p
dbfs                   14M  3.3M   11M  25% /test4
[root@rac1 ~]# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
[root@rac1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              26G   15G  9.6G  60% /
/dev/sda1              99M   12M   83M  13% /boot
tmpfs                 3.0G  1.4G  1.7G  45% /dev/shm
/dev/asm/ora11p_home-132 10G  4.7G  5.4G  47%
                    /u01/app/oracle/product/11.2.0/ora11p

Even root is by default not allowed to see the contents of /test4. This setting can be influenced by a mount parameter (see above):

[root@rac1 ~]# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
[root@rac1 ~]# cd /test4
-bash: cd: /test4: Permission denied
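
To allow root (or other users) to see the mounted file system, use the allow_root or allow_other mount options from the usage output above. A sketch (as root, enable the option in /etc/fuse.conf first, then remount):

[root@rac1 ~]# echo "user_allow_other" >> /etc/fuse.conf
ora11p@host#>: dbfs_client <db_user>@<db_server> -o allow_root /test4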

Mounting a file system

The command syntax to mount a file system is described above. In the following we will try to mount the highly compressed and de-duplicated database file system of user “dbfs2” created above. Trying to mount this file system results in the following error message:

-bash-3.2$ dbfs_client dbfs2/dbfs2@rac1:1521/ora11p /test
dbfs_client: error while loading shared libraries:
libclntsh.so.11.1: cannot open shared object file: No such file or directory

Obviously an Oracle library can't be found. To see a list of all dependent libraries we check with “ldd”:

ora11p@rac1#>: ldd $ORACLE_HOME/bin/dbfs_client
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000303f400000)
libfuse.so.2 => /usr/lib/libfuse.so.2 (0x00002b5e455cb000)
libclntsh.so.11.1 => not found
libnnz11.so => not found
libdl.so.2 => /lib64/libdl.so.2 (0x000000302ca00000)
libm.so.6 => /lib64/libm.so.6 (0x000000302c600000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000000302ce00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003030200000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000303c400000)
libc.so.6 => /lib64/libc.so.6 (0x000000302c200000)
librt.so.1 => /lib64/librt.so.1 (0x000000302d600000)
/lib64/ld-linux-x86-64.so.2 (0x000000302be00000)

It turns out there are two libraries missing: libclntsh.so.11.1 and libnnz11.so. They are located in $ORACLE_HOME/lib. There are several ways to adjust the library search path:

  1. Modifying /etc/ld.so.conf to include $ORACLE_HOME/lib
  2. Exporting “LD_LIBRARY_PATH=$ORACLE_HOME/lib”

For our tests we chose to export LD_LIBRARY_PATH, which adds the path to the user's library search path:

ora11p@host#>: export LD_LIBRARY_PATH=$ORACLE_HOME/lib
ora11p@host#>: ldd dbfs_client
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000303f400000)
libfuse.so.2 => /usr/lib/libfuse.so.2 (0x00002b84ba9d9000)
libclntsh.so.11.1 => /u01/app/oracle/product/11.2.0/ora11p/lib/libclntsh.so.11.1 (0x00002b84babf8000)
libnnz11.so => /u01/app/oracle/product/11.2.0/ora11p/lib/libnnz11.so (0x00002b84bd223000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000302ca00000)
libm.so.6 => /lib64/libm.so.6 (0x000000302c600000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000000302ce00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003030200000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000303c400000)
libc.so.6 => /lib64/libc.so.6 (0x000000302c200000)
librt.so.1 => /lib64/librt.so.1 (0x000000302d600000)
/lib64/ld-linux-x86-64.so.2 (0x000000302be00000)
libaio.so.1 => /usr/lib64/libaio.so.1 (0x00002b84bd5ed000)
In order to make this change permanent you can either modify /etc/ld.so.conf or add the export command to the user's profile file.

Now we are able to mount the dbfs file system:

-bash-3.2$ dbfs_client dbfs2@rac1:1521/ora11p /test
Password:
<WAIT>

Remember: dbfs_client will not return until the file system has been unmounted. To free the terminal, put the password into a file and mount in the background:

echo "dbfs2" > dbfs2password
ora11p@host#>: nohup dbfs_client dbfs2@rac1:1521/ora11p /test  < dbfs2password &
[1] 24237
ora11p@host#>: nohup: appending output to `nohup.out'

ora11p@host#>: df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              26G   15G  9.6G  60% /
/dev/sda1              99M   12M   83M  13% /boot
tmpfs                 3.0G  1.4G  1.7G  45% /dev/shm
/dev/asm/ora11p_home-132  10G  4.7G  5.4G  47%
                     /u01/app/oracle/product/11.2.0/ora11p
dbfs                  6.4M  192K  6.2M   3% /test

Using the file system

Readers might ask: “Does the file system behave like a normal file system such as ext3?” – the answer is: YES.

A few screenshots to illustrate it:

[Screenshots: dbfs1 to dbfs5]
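
For instance, ordinary shell commands behave as expected on the mount (a hypothetical session on the /test mount used above):

ora11p@host#>: mkdir /test/docs
ora11p@host#>: cp /etc/hosts /test/docs/
ora11p@host#>: ls -l /test/docs
ora11p@host#>: rm -r /test/docs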

Testing de-duplication and compression

When creating the file system we specified high compression and de-duplication. Let's see the results:

Compression

For this test we took the file “products.xml” from the 11g Release 2 database install source. The file is approximately 930 KB in size but, being XML, compresses very well.

ora11p@host#>: df -h .
Filesystem            Size  Used Avail Use% Mounted on
dbfs                  7.4M  272K  7.1M   4% /test
ora11p@host#>: cp /raw_software/database/stage/products.xml  .
ora11p@host#>: ll
total 910
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:19 products.xml
ora11p@host#>: df -h .
Filesystem            Size  Used Avail Use% Mounted on
dbfs                  7.4M  352K  7.0M   5% /test

Although the file is 930 KB in size, the used space has only increased from 272 KB to 352 KB – an increase of 80 KB. So compression reduced the stored size from 930 KB down to roughly 80 KB.

Deduplication

For testing de-duplication we copied the same file (products.xml) from the previous example sixteen times and checked the allocated space:

ora11p@host#>: ll
total 15470
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 1
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 10
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 11
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 12
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 13
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 14
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 15
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 16
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 2
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 3
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 4
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 5
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 6
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 7
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 8
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 9
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:19 products.xml
ora11p@host#>: df -h .
Filesystem            Size  Used Avail Use% Mounted on
dbfs                   12M  704K   11M   7% /test

Based on the directory listing above we copied products.xml 16 times. With compression alone each copy should add approximately 80 KB to the file system, so sixteen files would add up to approximately 1.3 MB. But the file system's current usage is only 704 KB. So there must be some sort of de-duplication at work (or an extremely intelligent compression algorithm) – but remember: this is only a first and short test. I will investigate this further in a later post.

Implementing Failover

Before testing failover we need to create an extra service which will be configured according to the guidelines from oracle:

srvctl add service -d ORA11P -s DBFS -r ora11p1,ora11p2
-y AUTOMATIC -e SELECT -m BASIC -w 1 -z 180 -q TRUE

srvctl start service -d ORA11P -s DBFS

To use this service add a connection descriptor to TNSNAMES.ORA:

DBFS =
 (DESCRIPTION =
 (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
 (CONNECT_DATA =
 (SERVER = DEDICATED)
 (SERVICE_NAME = DBFS)
 )
 )

Now mount the file system normally. In our case we mount the file system on host “rac2” while the database session is connected to instance 1 on host “rac1” (inst_id=1):

-bash-3.2$ date
 Thu Oct  8 16:03:54 CEST 2009
 -bash-3.2$ ll
 total 2730
 -rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 1
 -rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 9
 -rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:19 products.xml
SQL> select inst_id, username, to_char(logon_time,'dd.mm.yyyy hh24:mi:ss') as
 TIME from gv$session where username='DBFS2'
INST_ID USERNAME  TIME
 ---------- --------- -------------------
 1 DBFS2     08.10.2009 15:54:17

We now do a “shutdown abort” on instance “ora11p1″ (i.e. the instance running on host “rac1″ and the instance the dbfs is connected to) and check the file system and sessions:

Shutdown the instance “ora11p1″:

SQL@ORA11P!> shutdown abort;

Alert.log information from instance “ora11p2″ showing instance “ora11p1″ was shut down at 16:05:26:

Thu Oct 08 16:05:26 2009
Reconfiguration started (old inc 28, new inc 30)
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Thu Oct 08 16:05:26 2009
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Thu Oct 08 16:05:26 2009
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Started redo scan
Completed redo scan
 read 1725 KB redo, 701 data blocks need recovery
Started redo application at
 Thread 1: logseq 454, block 1674
Recovery of Online Redo Log: Thread 1 Group 2 Seq 454 Reading mem 0
 Mem# 0: +DATA/ora11p/onlinelog/group_2.258.698489181
 Mem# 1: +DATA2/ora11p/onlinelog/group_2.260.698489185
Completed redo application of 1.01MB
Completed instance recovery at
 Thread 1: logseq 454, block 5124, scn 5701524
 561 data blocks read, 723 data blocks written, 1725 redo k-bytes read
Thread 1 advanced to log sequence 455 (thread recovery)

Session Information on instance “ora11p2″:

SQL> select inst_id, username, to_char(logon_time,'dd.mm.yyyy hh24:mi:ss')
from gv$session where username='DBFS2'

 INST_ID   USERNAME   TO_CHAR(LOGON_TIME,
---------- ---------- -------------------
 2         DBFS2      08.10.2009 16:05:29

The session was reconnected to the surviving instance three seconds after the instance “ora11p1″ was shut down.

Listing the file system was only briefly interrupted, as shown below:

-bash-3.2$ date
Thu Oct  8 16:05:28 CEST 2009
-bash-3.2$ ll

<< WAITED a few seconds >>

total 2730
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:26 1
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:27 9
-rwxr-xr-x 1 ora11p dba 931475 Oct  8 14:19 products.xml
-bash-3.2$ date
Thu Oct  8 16:05:36 CEST 2009
  1. Herve Etche
    April 6th, 2010 at 21:56 | #1

    The actual dbfs prerequisites on the oracle website are:

    DBFS Prerequisites
    The dbfs_client can be used as a direct RDBMS client using the DBFS Command Interface only on Linux, Linux.X64, Solaris, Solaris64, AIX and HPUX platforms.

    The dbfs_client host must have the Oracle client libraries installed.

    The dbfs_client can be used as a mount client only on Linux and Linux.X64 platforms, and the following are also required:

    The dbfs_client host must have the kernel-devel package installed to configure and build FUSE.

    The dbfs_client host must have the FUSE Linux package installed.

    A group named fuse must be created and the user name that is running the dbfs_client must be a member of the fuse group.

  2. Kevin Jernigan
    April 13th, 2010 at 14:30 | #2

    Full Disclosure: I am the Product Manager for DBFS, SecureFiles, Total Recall, and other database performance-related features and products at Oracle.

    1. In the “Small File Performance of DBFS” section, you say “Note that the performance of DBFS on file systems with tens of thousands of files is not fully tuned in the current revision.” The opposite is true: DBFS is designed to meet or exceed the performance of traditional file systems, and to scale – in terms of the number and size of files – well beyond the limits of file systems. DBFS is not optimized for small files, or for block-style access to files.

    2. DBFS is a feature of Oracle Database 11 Release 2, available on all platforms / OS ports. The DBFS client can be used in command-line mode on all platforms, and can be used to mount DBFS file systems only on Linux. For example, you could set up an Oracle 11g Release 2 database on Solaris, create DBFS file systems in that database, and then use Linux clients to mount those DBFS file systems.

    3. DBFS in combination with Oracle Total Recall gives the ability to recover deleted or incorrectly modified files without having to restore from a backup. The user simply uses a query with the “AS OF” or “VERSIONS BETWEEN” syntax to find older versions of files in a DBFS file system.

    As always, more details are available at http://www.oracle.com.

  3. Rajkumar
    June 29th, 2012 at 01:45 | #3

    Is dbfs_client available for Mac OS… since you have said command line mode is available in all platforms. Then how do we get this dbfs_client for Mac.
