Recently I came across a project where they built their own cheap storage. The whole story is documented here and here.
A colleague of mine and I saw this project and wondered whether this kind of storage could be used for databases as well. So we analyzed the design and noticed some problems from our point of view:
- data access only via HTTP
- they used the JFS file system, which is not widely used
- the hard disks are generally hot-swappable, but this feature is not used due to fear of problems
- optimized for space rather than for speed
- relatively “weak” power supply
So we tried to improve the layout with the following constraints:
- approx. 10,000 euros (approx. 15,000 US dollars) in total
- Storage accessible via multiple protocols:
- NFS
- iSCSI
- CIFS
- if possible SAN
- Reliable
- Optimized for speed rather than capacity (remember: we talked about databases)
- Hot-swappable hard disks
This is part I of our journey towards building a storage system ourselves. Part II is here and Part III here.
Required Components
What components do we need?
- Case
- Power Supply
- Motherboard
- CPU
- Memory
- Storage HBAs for the hard disks (SAS/SATA)
- SAS/SATA Expander if needed
- Network Interface Cards (1 / 10 Gbit/s)
- SAN HBA
- Operating System
Case
Cases in several sizes are available from various vendors. Most of them are built in Asia and sold by resellers all over the world.
To support hot-swappable hard disks, the case has to support this as well. So we looked at several cases and came up with this one:
- up to 42 hot swappable hard disks
- three redundant power supplies
- room for a quad cpu board
- Backplane
- ten backplanes connecting four drives each
- backplanes connected to the controller by SFF-8087 (aka “mini-SAS”) cables
- a separate SFF-8087 cable is required to connect the system drives to the controller
- so for connecting all 42 disks (40 + 2) you need 11 SFF-8087 cables
We are running this project in Germany, so we chose a German vendor. Searching Google for the case’s name (“RSC-8ED-0Q1”) shows that you can buy the case from several vendors, for instance here.
Some pictures and technical data sheet:
Mainboard
According to the data sheet there are two system boards available. We chose the Tyan board with the following technical specifications:
- integrated graphic controller
- Slots
- 2x PCI-E x16; x16 sig.
- 2x PCI-E x16; x4 sig.
- 1 PCI 32-bit
- 8 SATA-2-Ports
- 2x USB 2.0
- 3x 1 Gbit/s-NIC
- 4x 1207-pin sockets for AMD Opteron (Rev. F) 8000-series CPUs
- 16x DDR2-DIMMS (max. 64 GB RAM)
The system board costs approx. 250 euros (368 US-$).
Memory
- the system board supports up to 16 DDR2 DIMM modules
- up to 64 GB of memory
Due to the currently low price of memory we decided to fit the maximum supported amount (64 GB) on the system board.
One 4 GB memory module costs approx. 80 euros (or 117.8 US-$), so sixteen of these modules add up to 1280 euros or 1885 US-$.
CPU
The board imposes some restrictions on the type of CPU we can use. We decided to fit four quad-core CPUs. The CPU chosen was:
- AMD Opteron 2350 (Socket F, 65 nm, Barcelona, OS2350WAL4BGHWOF)
- Socket F
- Clock frequency: 2,000 MHz
- Quad-Core
- Type of core: Barcelona
- Stepping (Revision): B3-Stepping
- FSB: 1,000 MHz
- HyperTransport: 2.0 GT/s
- Second-Level-Cache: 4 x 512 KB
One CPU costs 250 euros (368.2 US-$), so four CPUs cost 1000 euros (1472.75 US-$).
SAS/SATA HBA and Expander Cards
We needed to attach 42 hard disks to the system board. Two disks are attached with a single mini-SAS cable. The remaining forty disks are attached by ten mini-SAS cables (each cable attaching four disks). So we had to attach a total of eleven mini-SAS (SFF-8087) cables.
SAS Cabling
- two disks connected via a single mini-SAS cable directly to the system board
- 20 disks connected via 5 mini-SAS cables to SAS expander A
- SAS expander A connected via one mini-SAS cable to SAS controller A
- 20 disks connected via 5 mini-SAS cables to SAS expander B
- SAS expander B connected via one mini-SAS cable to SAS controller B
Storage Layout
Summarizing our cabling layout, we have three different failure components from a storage point of view:
- SAS controller on the system board
- SAS controller A
- SAS controller B
We bear this in mind when designing the RAID groups later on.
Components
- 2x SAS Controller: Adaptec 2405 with the following features according to the documentation:
- 128 MB Cache
- Supports 4 direct-attached or up to 128 SATA or SAS disk drives using SAS expanders
- Quick initialization
- Online Capacity Expansion
- Copyback Hot Spare
- Dynamic caching algorithm
- Native Command Queuing (NCQ)
- Background initialization
- Hot-plug drive support
- RAID Level Migration
- Hot spares – global, dedicated, and pooled
- Automatic/manual rebuild of hot spares
- SAF-TE enclosure management
- Configurable stripe size
- S.M.A.R.T. support
- Multiple arrays per disk drive
- Bad stripe table
- Dynamic sector repair
- Staggered drive spin-up
- Bootable array support
- Optimized Disk Utilization
- The controller offers RAID 0, RAID 1 and RAID 10 in “hardware”; we do not use this, however, because we use ZFS
- 2x SAS Expander: CHENBRO Low Profile 28-port SAS expander card
- Input: up to 6 mini-SAS cables coming from the backplane
- Output: one mini-SAS cable to the raid controller
NICs (1 Gbit/s)
Three NICs are already on-board, so nothing needs to be done here.
When building the system we have to measure the network performance of these interfaces. If the performance is too poor we need to replace the on-board NICs with dedicated NICs, for instance from Intel. In that case the price of 10 Gbit/s NICs should be re-checked as well.
NICs (10 Gbit/s)
Due to lack of support in the core switches we did not include a 10 Gbit/s NIC.
If you want to do so, the following card will fit:
- Intel 10 GE XF SR NIC, PCI-E
- Price: 2300 euros or 3387 US-$
SAN HBAs
Due to the lack of a SAN environment we did not add this feature. But if you want to turn your storage system into a SAN target (i.e. one that exports storage), you can do this with the COMSTAR project shipping with OpenSolaris.
For an impressive demonstration you can refer here.
To enable SAN features you need a QLogic (not Emulex) HBA. I recommend two single-port HBAs. One single-port HBA costs approx. 500 euros.
Operating system
A main goal of our storage project was minimal cost, so paying for an operating system was not desirable. In addition, freely available operating systems like *BSD, Linux or Solaris are extremely stable, ship with a lot of features and often perform much better than commercial products.
We tested several operating systems and finally ended up with the OpenSolaris project, for many reasons:
- extremely stable
- updated on a regular basis
- wide protocol support available (iSCSI, FCoE, NFS, CIFS, HTTP, …)
- the ZFS file system
- the COMSTAR project
The last two features are the main reasons for choosing OpenSolaris. ZFS is a highly integrated, extremely powerful and flexible file system, perhaps the most powerful file system currently available, while the COMSTAR project enables us to export our storage over a SAN.
Storage Layout
Based on the hardware and software configuration we decided not to use the RAID features of our SAS controllers and to use ZFS with double-parity RAID (raidz2) instead. Because parity calculations are CPU-intensive, we fitted four quad-core CPUs.
Due to our cabling we have three main failure components:
- the internal (on-board) storage controller
- SAS controller A in a PCIe slot
- SAS controller B in a PCIe slot
To keep the failure groups separated we designed the following RAID configuration (a sketch of the corresponding ZFS pool layout follows the list):
- two disks attached to the on-board controller:
- one two-way mirror for the operating system
- twenty disks attached via a SAS expander card to SAS controller A:
- one RAID group consisting of 20 disks with one hot spare, two parity disks and 17 data disks
- twenty disks attached via a SAS expander card to SAS controller B:
- one RAID group consisting of 20 disks with one hot spare, two parity disks and 17 data disks
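As an illustration, here is a minimal sketch of how such a layout could be created with ZFS. It is not taken from the actual build: the pool names (tankA, tankB) and the Solaris-style device names (c2t0d0, c3t0d0, …) are assumptions and have to be replaced with the names your system really reports (e.g. via the format utility). The sketch only prints the zpool commands instead of running them.

```python
# Minimal sketch of the intended ZFS layout: per SAS controller one pool with
# a 19-disk raidz2 vdev (17 data + 2 parity) plus one hot spare = 20 disks.
# Device and pool names below are assumptions, not the real ones.

def zpool_create_cmd(pool, controller, disks_per_pool=20):
    disks = ["c%dt%dd0" % (controller, target) for target in range(disks_per_pool)]
    raidz2_disks = disks[:-1]   # first 19 disks form the double-parity vdev
    hot_spare = disks[-1]       # last disk becomes the hot spare
    return "zpool create %s raidz2 %s spare %s" % (pool, " ".join(raidz2_disks), hot_spare)

if __name__ == "__main__":
    # One pool per failure group (SAS controller A and B); the OS mirror on the
    # on-board controller is created by the installer (rpool) and not shown here.
    print(zpool_create_cmd("tankA", controller=2))
    print(zpool_create_cmd("tankB", controller=3))
```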
Hard disks
In our calculation there is one component left: the hard disks. We fit 42 of them. While forty disks are used as storage to be exported, two disks are used for the operating system. These two disks can be somewhat smaller.
In the following calculations we will evaluate four different scenarios:
- 40 disks with 1 TB capacity each
- 40 “server” disks with 1 TB capacity each
- 40 disks with 1.5 TB each
- 40 disks with 2 TB each
For the operating system we chose two disks with 500 GB each. The total price of these two disks is approx. 100 euros or 148 US-$.
Scenario #1: 40 disks with 1 TB each
- 1 TB HDD: 53 euros (for instance: “Maxtor DiamondMax 23”)
- 40x 1 TB HDDs: 2120 euros
Scenario #2: 40 “server” disks with 1 TB capacity each
- 1 TB “server” HDD: 84 euros (for instance: “Samsung F1 RAID Class”)
- 40x 1 TB “server” HDDs: 3360 euros
Scenario #3: 40 disks with 1.5 TB each
- 1.5 TB HDD: 84 euros (for instance: “Seagate Barracuda 7200.11”)
- 40x 1.5 TB HDDs: 3360 euros
Scenario #4: 40 disks with 2 TB each
- 2 TB HDD: 134 euros (for instance: “WD20000CSRTL2”)
- 40x 2 TB HDDs: 5360 euros
Calculations
Usable Capacity
- Usable capacity with 1 TB disks:
- 2 pools x (20 - 1 hot spare - 2 parity) x 1000 GB ~ 34 TB
- usable capacity approx. 31.6 TB
- Usable capacity with 1.5 TB disks:
- 2 pools x (20 - 1 hot spare - 2 parity) x 1500 GB ~ 51 TB
- usable capacity approx. 47.5 TB
- Usable capacity with 2 TB disks:
- 2 pools x (20 - 1 hot spare - 2 parity) x 2000 GB ~ 68 TB
- usable capacity approx. 63.3 TB (the short script below reproduces these figures)
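The capacity figures follow directly from the pool layout. Here is a small Python sketch of the calculation; TB is used as a decimal unit (1 TB = 1000 GB), and the “usable capacity” figures above are roughly 7% lower than the raw data capacity, presumably to account for file-system and formatting overhead.

```python
# Raw data capacity per scenario: two pools, each with 17 data disks
# (20 disks minus 1 hot spare minus 2 parity). Decimal TB (1 TB = 1000 GB).

POOLS = 2
DATA_DISKS_PER_POOL = 20 - 1 - 2   # 17 data disks per pool

def data_capacity_tb(disk_size_gb):
    return POOLS * DATA_DISKS_PER_POOL * disk_size_gb / 1000.0

for disk_size_gb in (1000, 1500, 2000):
    print("%4d GB disks -> ~%d TB of data capacity" % (disk_size_gb, data_capacity_tb(disk_size_gb)))
# Output: 34 TB, 51 TB and 68 TB; the "usable capacity" figures quoted above
# (31.6 / 47.5 / 63.3 TB) are about 7% lower to allow for overhead.
```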
Throughput calculation
- 34 data disks in total (2 pools x 17 data disks)
- technical data for the Seagate 1.5 TB disk:
- Spindle speed: 7,200 rpm
- Average latency: 4.16 ms
- Random read seek time: <8.5 ms
- Random write seek time: <10.0 ms
- Calculation: avg. seek time = (8.5 + 10) / 2 = 9.25 ms (for a 50/50 mixture of reads and writes)
- I/O calculations (reproduced in the script below):
- rotations per millisecond = 7,200 rpm / 60,000 = 0.12
- full rotation time = 1 / rotations per ms = 1 / 0.12 = 8.33 ms
- avg. rotational latency = 8.33 / 2 = 4.17 ms
- I/O time = avg. seek time + avg. rotational latency = 9.25 + 4.17 = 13.42 ms
- IOPS per disk = (1 / I/O time) x 1000 = 74.53 IOPS
- Total possible IOPS: 34 disks x ~74 IOPS ~ 2516 IOPS
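The same estimate as a small Python sketch; the disk parameters are the Seagate values quoted above, and the 50/50 read/write mix is the assumption used in the calculation.

```python
# Per-disk and total IOPS estimate for 7,200 rpm disks with a 50/50 read/write mix.

READ_SEEK_MS = 8.5
WRITE_SEEK_MS = 10.0
RPM = 7200
DATA_DISKS = 2 * 17                                     # two pools, 17 data disks each

avg_seek_ms = (READ_SEEK_MS + WRITE_SEEK_MS) / 2        # 9.25 ms
full_rotation_ms = 60000.0 / RPM                        # 8.33 ms
avg_rotational_latency_ms = full_rotation_ms / 2        # 4.17 ms
io_time_ms = avg_seek_ms + avg_rotational_latency_ms    # 13.42 ms
iops_per_disk = 1000.0 / io_time_ms                     # ~74.5 IOPS

print("IOPS per disk: %.1f" % iops_per_disk)
print("Total IOPS:    %.0f" % (DATA_DISKS * iops_per_disk))  # ~2534; rounding per disk to 74 gives ~2516
```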
Cost calculation
System components
| Component | Price in euros |
|---|---|
| Case | 2375 € |
| System board | 250 € |
| 16x 4 GB DDR2 memory | 1280 € |
| 4x CPU | 1000 € |
| 2x Adaptec 2405 SAS controller | 270 € |
| 11x SFF-8087 to SFF-8087 cable, 0.5 m | 228 € |
| 2x CHENBRO low-profile 28-port SAS expander card | 356 € |
| 2x 500 GB HDDs for the operating system | 100 € |
| Sum | 5859 € (approx. 8729 US-$) |
Harddisks
| Component | Price per disk in euros | Total price for 40 HDDs in euros |
|---|---|---|
| 1 TB HDD | 53 € | 2120 € |
| 1 TB “server” HDD | 84 € | 3360 € |
| 1.5 TB HDD | 84 € | 3360 € |
| 2 TB HDD | 134 € | 5360 € |
| Hard disk size | Base components in euros | Hard disks in euros | Total in euros | Total in US-$ | Price per usable TB in euros |
|---|---|---|---|---|---|
| 1 TB | 5859 € | 2120 € | 7979 € | 11877 US-$ | 252.5 € |
| 1 TB “server” | 5859 € | 3360 € | 9219 € | 13735 US-$ | 291.7 € |
| 1.5 TB | 5859 € | 3360 € | 9219 € | 13735 US-$ | 194 € |
| 2 TB | 5859 € | 5360 € | 11219 € | 16714 US-$ | 177.24 € |
The short script below cross-checks these totals.
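A small Python sketch to cross-check the totals and the price per usable TB; the base component price and the usable capacities are the figures from the tables and calculations above.

```python
# Cross-check of the total prices and the price per usable TB.
# Per scenario: (disk price in euros, usable capacity in TB), values from above.

BASE_COMPONENTS_EUR = 5859
NUM_DISKS = 40

scenarios = [
    ("1 TB",          53, 31.6),
    ("1 TB 'server'", 84, 31.6),
    ("1.5 TB",        84, 47.5),
    ("2 TB",         134, 63.3),
]

for name, disk_price_eur, usable_tb in scenarios:
    total_eur = BASE_COMPONENTS_EUR + NUM_DISKS * disk_price_eur
    print("%-14s total %5d EUR, %6.1f EUR per usable TB" % (name, total_eur, total_eur / usable_tb))
```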
Usage scenarios
- cheap, but reliable and performant multi-purpose storage
- if one system does not seem reliable enough:
- use two of them
- and mirror with Oracle ASM for highest availability
The authors
- Volodymr Dubinin, SIV.AG, Konrad-Zuse-Straße 1, 18184 Rogentin, Germany (vd@siv.de)
- Ronny Egner, Ronny Egner Consulting, Vinckestraße 22, 40470 Dusseldorf, Germany (ronnyegner@gmx.de)