A few days ago I posted a short howto on configuring iSCSI multipathing with Nexenta. This post covers the configuration of the Linux initiator side using iSCSI multipathing.
Before we start, a preliminary note: it is a very good idea (I'd call it: "required") to use separate subnets for each physical interface. Do NOT use the same subnet across different network interfaces!
If you do not comply with this simple rule you will end up having problems with so-called ARP flux (widely documented elsewhere), which requires further modifications to the kernel configuration.
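For completeness: if you absolutely have to run several interfaces in one subnet, the usual mitigation is to tighten the kernel's ARP behavior via sysctl. This is only a sketch of that workaround; separate subnets remain the recommended setup:

# append to /etc/sysctl.conf, then apply with "sysctl -p"
net.ipv4.conf.all.arp_ignore = 1      # answer ARP only if the target IP is configured on the receiving interface
net.ipv4.conf.all.arp_announce = 2    # always use the best matching local address in ARP requests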
To configure and use iSCSI multipathing, the following packages are needed (installation example below):
- device-mapper-multipath
- device-mapper-multipath-libs
- iscsi-initiator-utils
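On Enterprise Linux systems they can be installed with yum:

yum install device-mapper-multipath device-mapper-multipath-libs iscsi-initiator-utils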
Our test lab used a VM based on Oracle Enterprise Linux 6 Update 2 with two physical interfaces, each in its own subnet (sample configuration files below):
- eth1: 192.168.1.200/24
- eth3: 192.168.10.2/24
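For reference, the matching interface configuration files in standard EL6 ifcfg syntax (values taken from our lab) would look roughly like this:

# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.1.200
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth3 -- analogous, in its own subnet
DEVICE=eth3
BOOTPROTO=static
IPADDR=192.168.10.2
NETMASK=255.255.255.0
ONBOOT=yes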
Multipathing
Why choose multipathing over network bonding?
Multipathing and network bonding both protect against failing network components such as cables, switches and network cards, and at the same time provide higher throughput by using more than one interface.
Problem 1: Speed aggregation
When using network bonding for port aggregation you need to decide which algorithm should be used to distribute the packets over the available interfaces. For all except the 'active / backup' algorithm, both the switch and the operating system must support the chosen algorithm. This includes the widely used LACP algorithm. Another downside is that algorithms such as LACP cannot be used across physically independent switches.
In addition, all algorithms except 'round robin' do NOT offer a speed improvement for a single connection beyond the speed of a single interface. However, by distributing the different connections over all available interfaces, the total throughput of all connections can indeed be higher than 1 Gbit/s (if your system consists of several NICs with 1 Gbit/s speed).
The only exception to this rule is the 'round robin' algorithm, which offers the whole aggregated network speed even for a single connection. The downside is that most switches do not support this algorithm.
In contrast, multipathing offers real aggregation by distributing the data over all available paths, effectively aggregating the throughput.
Problem 2: High Availability
While every network bonding algorithm protects against failing network cards and cables, protection against failing switches is difficult. All algorithms except 'active / backup' require the switches to communicate with each other and to explicitly support this configuration, which is usually only available in expensive switches.
Multipathing handles each connection separately and does not require any support in the switch firmware.
Basic Configuration
Create a file /etc/multipath.conf with the following content:
[root@mhvtl media]# cat /etc/multipath.conf
defaults {
        udev_dir                /dev
        polling_interval        10
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        path_checker            readsector0
        rr_min_io               100
        max_fds                 8192
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}
Note that
no_path_retry fail
is required for failover to work as shown in this howto. Without this setting, I/Os on failed paths end up being queued, causing the whole device to become unresponsive.
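With the configuration file in place, make sure the multipath daemon is enabled and running (Enterprise Linux 6 style commands):

chkconfig multipathd on
service multipathd start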
iSCSI: Connecting to the Target
Step #1: Discover iSCSI Targets
[root@mhvtl iscsi]# iscsiadm -m discovery -t sendtargets -p 192.168.10.1
Starting iscsid:                                           [  OK  ]
192.168.10.1:3260,2 iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2
192.168.1.5:3260,3 iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2
Step #2: Login to the Targets
Portal #1: 192.168.10.1
[root@mhvtl iscsi]# iscsiadm -m node --target iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2 --portal 192.168.10.1 --login
Logging in to [iface: default, target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.10.1,3260] (multiple)
Login to [iface: default, target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.10.1,3260] successful.
Portal #2: 192.168.1.5
[root@mhvtl iscsi]# iscsiadm -m node --target iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2 --portal 192.168.1.5 --login
Logging in to [iface: default, target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.1.5,3260] (multiple)
Login to [iface: default, target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.1.5,3260] successful.
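If the sessions should be re-established automatically after a reboot, the node startup mode can be set to automatic. This is a standard iscsiadm operation, shown here for one portal only:

iscsiadm -m node --target iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2 --portal 192.168.10.1 --op update -n node.startup -v automatic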
Status query
[root@mhvtl iscsi]# iscsiadm --mode node
192.168.1.5:3260,3 iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2
192.168.10.1:3260,2 iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2
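The currently established sessions can be listed as well:

iscsiadm -m session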
/var/log/messages
Connection to portal 192.168.10.1:
Jun 24 18:03:51 mhvtl kernel: scsi6 : iSCSI Initiator over TCP/IP
Jun 24 18:03:51 mhvtl kernel: scsi 6:0:0:0: Direct-Access     NEXENTA  NEXENTASTOR      1.0  PQ: 0 ANSI: 5
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: Attached scsi generic sg3 type 0
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: [sdc] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: [sdc] Write Protect is off
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 24 18:03:51 mhvtl kernel: sdc: detected capacity change from 0 to 5368709120
Jun 24 18:03:51 mhvtl kernel: sdc: sdc1
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: [sdc] Attached SCSI disk
Jun 24 18:03:51 mhvtl multipathd: sdc: add path (uevent)
Jun 24 18:03:51 mhvtl kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
Jun 24 18:03:51 mhvtl multipathd: mpatha: load table [0 10485760 multipath 0 0 1 1 round-robin 0 1 1 8:32 1]
Jun 24 18:03:51 mhvtl multipathd: mpatha: event checker started
Jun 24 18:03:51 mhvtl multipathd: sdc path added to devmap mpatha
Jun 24 18:03:52 mhvtl iscsid: Connection4:0 to [target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.10.1,3260] through [iface: default] is operational now
Connection to portal 192.168.1.5:
Jun 24 18:04:01 mhvtl kernel: connection3:0: detected conn error (1020)
Jun 24 18:04:01 mhvtl iscsid: Connection3:0 to [target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.1.5,3260] through [iface: default] is shutdown.
Jun 24 18:04:03 mhvtl kernel: scsi7 : iSCSI Initiator over TCP/IP
Jun 24 18:04:04 mhvtl kernel: scsi 7:0:0:0: Direct-Access     NEXENTA  NEXENTASTOR      1.0  PQ: 0 ANSI: 5
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: Attached scsi generic sg4 type 0
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: [sdd] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: [sdd] Write Protect is off
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 24 18:04:04 mhvtl kernel: sdd: detected capacity change from 0 to 5368709120
Jun 24 18:04:04 mhvtl kernel: sdd: sdd1
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: [sdd] Attached SCSI disk
Jun 24 18:04:04 mhvtl multipathd: sdd: add path (uevent)
Jun 24 18:04:04 mhvtl multipathd: mpatha: load table [0 10485760 multipath 0 0 1 1 round-robin 0 2 1 8:32 1 8:48 1]
Jun 24 18:04:04 mhvtl multipathd: sdd path added to devmap mpatha
Jun 24 18:04:04 mhvtl iscsid: Connection5:0 to [target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.1.5,3260] through [iface: default] is operational now
Multipath Status
[root@mhvtl iscsi]# multipath -ll
mpatha (3600144f02ba6460000004fe7032f0001) dm-1 NEXENTA,NEXENTASTOR
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:0:0 sdc 8:32 active ready running
  `- 7:0:0:0 sdd 8:48 active ready running
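The multipath device is now available under /dev/mapper/mpatha and can be used like any other block device. Since the kernel log above shows an existing partition (sdc1), device-mapper typically exposes it as /dev/mapper/mpathap1; the mount point below is just an example:

mount /dev/mapper/mpathap1 /mnt/iscsi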
Testing Multipath
After setting up and configuring multipathing it is highly recommended to test the behavior when paths fail. For a simple test I took down the interface "eth3", which leaves one remaining interface:
ifconfig eth3 down
/var/log/messages shows:
Jun 24 18:05:34 mhvtl kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299040130, last ping 4299045130, now 4299050130
Jun 24 18:05:34 mhvtl kernel: connection4:0: detected conn error (1011)
Jun 24 18:05:35 mhvtl multipathd: mpatha: sdc - readsector0 checker reports path is down
Jun 24 18:05:35 mhvtl multipathd: checker failed path 8:32 in map mpatha
Jun 24 18:05:35 mhvtl multipathd: mpatha: remaining active paths: 1
Jun 24 18:05:35 mhvtl kernel: device-mapper: multipath: Failing path 8:32.
Jun 24 18:05:35 mhvtl iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
As you can see, multipathd recognized the failing path and stopped issuing I/O to the underlying device (in our test "/dev/sdc"):
[root@mhvtl media]# multipath -ll
mpatha (3600144f02ba6460000004fe7032f0001) dm-1 NEXENTA,NEXENTASTOR
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:0:0 sdc 8:32 failed faulty running
  `- 7:0:0:0 sdd 8:48 active ready running
In our test there was a small I/O stall of approximately 5 seconds, after which everything continued to function normally. When re-enabling the network interface eth3, multipathd automatically repaired the faulty path.
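For completeness, bringing the interface back up and re-checking the paths:

ifconfig eth3 up
multipath -ll

After the next path check, "sdc" should be reported as "active ready running" again.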