A few days ago I posted a short howto on configuring iSCSI multipathing with Nexenta. This post covers the configuration of the Linux initiator side using iSCSI multipathing.
Before we start, a preliminary note: it is a very good idea (I'd call it: "required") to use separate subnets for each physical interface. Do NOT use the same subnet across different network interfaces!
If you do not comply with this simple rule you will end up having problems with so-called ARP flux (widely documented elsewhere), which requires further modifications to the kernel configuration.
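For completeness: if you absolutely have to run several interfaces in one subnet, the usual mitigation is to tighten the kernel's ARP behavior via sysctl. This is only a sketch of that workaround; separate subnets remain the recommended setup:

# append to /etc/sysctl.conf, then apply with "sysctl -p"
net.ipv4.conf.all.arp_ignore = 1      # answer ARP only if the target IP is configured on the receiving interface
net.ipv4.conf.all.arp_announce = 2    # always use the best matching local address in ARP requests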
To configure and use iSCSI multipathing, the following packages are needed (installation example below):
- device-mapper-multipath
- device-mapper-multipath-libs
- iscsi-initiator-utils
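On Enterprise Linux systems they can be installed with yum:

yum install device-mapper-multipath device-mapper-multipath-libs iscsi-initiator-utils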
Our test lab used a VM based on Oracle Enterprise Linux 6 Update 2 with two physical interfaces, each in its own subnet (sample configuration files below):
- eth1: 192.168.1.200/24
- eth3: 192.168.10.2/24
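For reference, the matching interface configuration files in standard EL6 ifcfg syntax (values taken from our lab) would look roughly like this:

# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.1.200
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth3 -- analogous, in its own subnet
DEVICE=eth3
BOOTPROTO=static
IPADDR=192.168.10.2
NETMASK=255.255.255.0
ONBOOT=yes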
Multipathing
Why choose multipathing over network bonding?
Multipathing and network bonding both protect against failing network components such as cables, switches and network cards, and at the same time provide higher throughput by using more than one interface.
Problem 1: Speed aggregation
When using network bonding for port aggregation you need to decide which algorithm should be used to distribute the packets over the available interfaces. For all except the 'active / backup' algorithm, both the switch and the operating system must support the chosen algorithm. This includes the widely used LACP algorithm. Another downside is that algorithms such as LACP cannot be used across physically independent switches.
In addition, all algorithms except 'round robin' do NOT offer a speed improvement for a single connection beyond the speed of a single interface. However, by distributing the different connections over all available interfaces, the total throughput of all connections can indeed be higher than 1 Gbit/s (if your system consists of several NICs with 1 Gbit/s speed).
The only exception to this rule is the 'round robin' algorithm, which offers the whole aggregated network speed even for a single connection. The downside is that most switches do not support this algorithm.
In contrast, multipathing offers real aggregation by distributing the data over all available paths, effectively aggregating the throughput.
Problem 2: High Availability
While every network bonding algorithm protects against failing network cards and cables, protection against failing switches is difficult. All algorithms except 'active / backup' require the switches to communicate with each other and to explicitly support this configuration, which is usually only available in expensive switches.
Multipathing handles each connection separately and does not require any support in the switch firmware.
Basic Configuration
Create a file /etc/multipath.conf with the following content:
[root@mhvtl media]# cat /etc/multipath.conf
defaults {
        udev_dir                /dev
        polling_interval        10
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        path_checker            readsector0
        rr_min_io               100
        max_fds                 8192
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}
Note that
no_path_retry fail
is required for failover to work as shown in this howto. Without this setting, I/Os on failed paths end up being queued, causing the whole device to become unresponsive.
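With the configuration file in place, make sure the multipath daemon is enabled and running (Enterprise Linux 6 style commands):

chkconfig multipathd on
service multipathd start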
iSCSI: Connecting to the Target
Step #1: Discover iSCSI Targets
[root@mhvtl iscsi]# iscsiadm -m discovery -t sendtargets -p 192.168.10.1
Starting iscsid:                                           [  OK  ]
192.168.10.1:3260,2 iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2
192.168.1.5:3260,3 iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2
Step #2: Login to the Targets
Portal #1: 192.168.10.1
[root@mhvtl iscsi]# iscsiadm -m node --target iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2 --portal 192.168.10.1 --login
Logging in to [iface: default, target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.10.1,3260] (multiple)
Login to [iface: default, target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.10.1,3260] successful.
Portal #2: 192.168.1.5
[root@mhvtl iscsi]# iscsiadm -m node --target iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2 --portal 192.168.1.5 --login
Logging in to [iface: default, target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.1.5,3260] (multiple)
Login to [iface: default, target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.1.5,3260] successful.
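If the sessions should be re-established automatically after a reboot, the node startup mode can be set to automatic. This is a standard iscsiadm operation, shown here for one portal only:

iscsiadm -m node --target iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2 --portal 192.168.10.1 --op update -n node.startup -v automatic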
Status query
[root@mhvtl iscsi]# iscsiadm --mode node
192.168.1.5:3260,3 iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2
192.168.10.1:3260,2 iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2
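The currently established sessions can be listed as well:

iscsiadm -m session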
/var/log/messages
Connection to portal 192.168.10.1:
Jun 24 18:03:51 mhvtl kernel: scsi6 : iSCSI Initiator over TCP/IP
Jun 24 18:03:51 mhvtl kernel: scsi 6:0:0:0: Direct-Access     NEXENTA  NEXENTASTOR      1.0  PQ: 0 ANSI: 5
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: Attached scsi generic sg3 type 0
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: [sdc] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: [sdc] Write Protect is off
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 24 18:03:51 mhvtl kernel: sdc: detected capacity change from 0 to 5368709120
Jun 24 18:03:51 mhvtl kernel: sdc: sdc1
Jun 24 18:03:51 mhvtl kernel: sd 6:0:0:0: [sdc] Attached SCSI disk
Jun 24 18:03:51 mhvtl multipathd: sdc: add path (uevent)
Jun 24 18:03:51 mhvtl kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
Jun 24 18:03:51 mhvtl multipathd: mpatha: load table [0 10485760 multipath 0 0 1 1 round-robin 0 1 1 8:32 1]
Jun 24 18:03:51 mhvtl multipathd: mpatha: event checker started
Jun 24 18:03:51 mhvtl multipathd: sdc path added to devmap mpatha
Jun 24 18:03:52 mhvtl iscsid: Connection4:0 to [target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.10.1,3260] through [iface: default] is operational now
Connection to portal 192.168.1.5:
Jun 24 18:04:01 mhvtl kernel: connection3:0: detected conn error (1020)
Jun 24 18:04:01 mhvtl iscsid: Connection3:0 to [target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.1.5,3260] through [iface: default] is shutdown.
Jun 24 18:04:03 mhvtl kernel: scsi7 : iSCSI Initiator over TCP/IP
Jun 24 18:04:04 mhvtl kernel: scsi 7:0:0:0: Direct-Access     NEXENTA  NEXENTASTOR      1.0  PQ: 0 ANSI: 5
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: Attached scsi generic sg4 type 0
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: [sdd] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: [sdd] Write Protect is off
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 24 18:04:04 mhvtl kernel: sdd: detected capacity change from 0 to 5368709120
Jun 24 18:04:04 mhvtl kernel: sdd: sdd1
Jun 24 18:04:04 mhvtl kernel: sd 7:0:0:0: [sdd] Attached SCSI disk
Jun 24 18:04:04 mhvtl multipathd: sdd: add path (uevent)
Jun 24 18:04:04 mhvtl multipathd: mpatha: load table [0 10485760 multipath 0 0 1 1 round-robin 0 2 1 8:32 1 8:48 1]
Jun 24 18:04:04 mhvtl multipathd: sdd path added to devmap mpatha
Jun 24 18:04:04 mhvtl iscsid: Connection5:0 to [target: iqn.1986-03.com.sun:02:f1bb8f6d-b6d2-c3f7-cfb9-ff56689144d2, portal: 192.168.1.5,3260] through [iface: default] is operational now
Multipath Status
[root@mhvtl iscsi]# multipath -ll
mpatha (3600144f02ba6460000004fe7032f0001) dm-1 NEXENTA,NEXENTASTOR
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:0:0 sdc 8:32 active ready running
  `- 7:0:0:0 sdd 8:48 active ready running
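The multipath device is now available under /dev/mapper/mpatha and can be used like any other block device. Since the kernel log above shows an existing partition (sdc1), device-mapper typically exposes it as /dev/mapper/mpathap1; the mount point below is just an example:

mount /dev/mapper/mpathap1 /mnt/iscsi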
Testing Multipath
After setting up and configuring multipathing it is highly recommended to test the behavior when paths fail. For a simple test I took down the interface "eth3", which leaves one remaining interface:
ifconfig eth3 down
/var/log/messages shows:
Jun 24 18:05:34 mhvtl kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4299040130, last ping 4299045130, now 4299050130
Jun 24 18:05:34 mhvtl kernel: connection4:0: detected conn error (1011)
Jun 24 18:05:35 mhvtl multipathd: mpatha: sdc - readsector0 checker reports path is down
Jun 24 18:05:35 mhvtl multipathd: checker failed path 8:32 in map mpatha
Jun 24 18:05:35 mhvtl multipathd: mpatha: remaining active paths: 1
Jun 24 18:05:35 mhvtl kernel: device-mapper: multipath: Failing path 8:32.
Jun 24 18:05:35 mhvtl iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
As you can see, multipathd recognized the failing path and stopped issuing I/O to the underlying device (in our test "/dev/sdc"):
[root@mhvtl media]# multipath -ll
mpatha (3600144f02ba6460000004fe7032f0001) dm-1 NEXENTA,NEXENTASTOR
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:0:0 sdc 8:32 failed faulty running
  `- 7:0:0:0 sdd 8:48 active ready running
In our test there was a small I/O stall of approximately 5 seconds, after which everything continued to function normally. When re-enabling the network interface eth3, multipathd automatically repaired the faulty path.
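For completeness, bringing the interface back up and re-checking the paths:

ifconfig eth3 up
multipath -ll

After the next path check, "sdc" should be reported as "active ready running" again.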