Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


Linux multipath


The most common multipathed environment today is a Fibre Channel (FC) Storage Area Network (SAN). These beasts can be found in most datacenters.


Novell has a useful chapter about multipathing: "Managing Multipath I/O for Devices".

This section describes how to manage failover and path load balancing for multiple paths between the servers and block storage devices.


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contains some broken links, as it develops like a living tree... Please try to use Google, Open Directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate it if you could mail us a correct link.


Old News ;-)

QLogic forum thread

Gurus,

I have the total opposite problem of what most folks are describing here.
I have a PE8650 with two QLE2460s.

One does exactly what I want, but the other acts weird.

It shows 26 devices although I only have 25 devices masked to this WWPN. The extra one has a * on it, which means the OS doesn't see it. The OS never discovers it either, yet the HBA insists it's there.
Here are the details.

1. Relevant output of /proc/scsi/qla2xxx/2
before zoning:

scsi-qla1-adapter-port=2100001b321734f0;

FC Port Information:

SCSI LUN Information:
(Id:Lun)  * - indicates lun is not registered with the OS.

As you can see, there is no visible target and therefore no devices.

On the DMX I have a masking entry for 25 devices for this WWPN (6 gatekeepers and 19 real devices):

2100001b321734f0  Fibre  JOLIE_QLE_P2     2100001b321734f0  005E:0063
                                                            0070
                                                            0098
                                                            009C
                                                            00A0
                                                            00F1
                                                            012B
                                                            0155
                                                            0169
                                                            016D
                                                            0175
                                                            0179
                                                            0181
                                                            0185
                                                            0199
                                                            019D
                                                            01A7
                                                            01BD
                                                            01C1
                                                            01C7

When I create the zone and activate it, /proc/scsi/qla2xxx/2 immediately shows 26 devices:

FC Port Information:
scsi-qla1-port-0=5006048ad5f0c350:5006048ad5f0c350:010000:81;

SCSI LUN Information:
(Id:Lun)  * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:25): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:26): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:27): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:28): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:29): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:30): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:31): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:32): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:33): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:34): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:35): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:36): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:37): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:38): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:39): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:40): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:41): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:42): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:43): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:44): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:45): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:46): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:47): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:48): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:49): Total reqs 0, Pending reqs 0, flags 0x0*, 1:0:81 00

At this point none of them is registered with the OS.
The first weird thing is how I can see 26, but the worst thing is that after rebooting the box the output looks like this:

FC Port Information:
scsi-qla1-port-0=5006048ad5f0c350:5006048ad5f0c350:010000:81;

SCSI LUN Information:
(Id:Lun)  * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 3, Pending reqs 0, flags 0x0*, 1:0:81 00
( 0:25): Total reqs 278, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:26): Total reqs 102, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:27): Total reqs 102, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:28): Total reqs 102, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:29): Total reqs 102, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:30): Total reqs 102, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:31): Total reqs 181, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:32): Total reqs 170, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:33): Total reqs 170, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:34): Total reqs 181, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:35): Total reqs 181, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:36): Total reqs 170, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:37): Total reqs 181, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:38): Total reqs 170, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:39): Total reqs 159, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:40): Total reqs 170, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:41): Total reqs 167, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:42): Total reqs 178, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:43): Total reqs 167, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:44): Total reqs 178, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:45): Total reqs 167, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:46): Total reqs 178, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:47): Total reqs 185, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:48): Total reqs 5817, Pending reqs 0, flags 0x0, 1:0:81 00
( 0:49): Total reqs 241, Pending reqs 0, flags 0x0, 1:0:81 00

As you can see, 25 devices are there plus one extra one.
The problem is simply: what is this, why do I see it, and why does the OS not see it?
The other QLE card on the same host only sees 25 devices and has no *, just as desired.

On closer inspection I realize that this is LUN 0, which should only be visible via QLE port 1, but it seems stuck on QLE port 2 as well. It is almost as if the SCSI layer still thinks it's there when it isn't anymore. But shouldn't a reboot fix this, since udev will kick in?
According to symcli, LUN 0 and LUN 19 are the same device (gatekeeper 005E). Something seems double visible, but I can't see where.
PowerPath (PP) is happy too and sees the 25 devices I want it to see.

Has anyone seen this ?
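A quick way to spot the odd LUN in listings like the ones above is to parse the /proc/scsi/qla2xxx/<n> text and report the entries flagged with *. The Python sketch below is only an illustration: it assumes the "( id:lun): ... flags 0x.." line format shown in this post (other driver versions may print something else), and the function name summarize_luns is made up for this example.

```python
import re
import sys

# Matches lines such as "( 0:25): Total reqs 278, Pending reqs 0, flags 0x0, 1:0:81 00".
# A "*" right after the flags value means the LUN is not registered with the OS.
LUN_LINE = re.compile(r"^\(\s*(\d+):\s*(\d+)\):.*flags\s+0x[0-9a-fA-F]+(\*?)")


def summarize_luns(text):
    """Return (all_luns, unregistered_luns) as lists of (target_id, lun) tuples."""
    all_luns, unregistered = [], []
    for line in text.splitlines():
        m = LUN_LINE.match(line.strip())
        if not m:
            continue  # header lines, blank lines, FC port information, etc.
        key = (int(m.group(1)), int(m.group(2)))
        all_luns.append(key)
        if m.group(3) == "*":
            unregistered.append(key)
    return all_luns, unregistered


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 qla_luns.py /proc/scsi/qla2xxx/2
    with open(sys.argv[1]) as f:
        luns, hidden = summarize_luns(f.read())
    print(f"{len(luns)} LUNs reported, not registered with the OS: {hidden}")
```

Run against the post-reboot listing above, this would report 26 LUNs with only (0, 0) flagged, which is the LUN 0 the poster is asking about.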

Storage August 27, 2006

Over the past couple of years a flurry of OS Native multipathing solutions have become available. As a result we are seeing a trend towards these solutions and away from vendor specific multipathing software.

The latest OS-native multipathing solution is Device Mapper-Multipath (DM-Multipath), available with Red Hat Enterprise Linux 4.0 U2 and SuSE SLES 9.0 SP2.

I had the opportunity to configure it in my lab a couple of days ago and I was pleasantly surprised at how easy it was to configure. Before I show how it's done, let me talk a little about how it works.

The multipathing layer sits above the protocols (FCP or iSCSI) and determines whether the devices discovered on the target represent separate devices or are just separate paths to the same device. On Linux, Device Mapper (DM) is the multipathing layer.

To determine which SCSI devices/paths correspond to the same LUN, DM initiates a SCSI Inquiry. The inquiry response, among other things, carries the LUN serial number. Regardless of the number of paths a LUN is associated with, the serial number for the LUN will always be the same. This is how multipathing software determines which paths, and how many, are associated with each LUN.
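The grouping step itself is simple once each path's serial is known. A minimal Python sketch, assuming the serials have already been collected (for example by the getuid_callout shown in the config below); the Inquiry itself is not modelled, and group_paths_by_serial is a hypothetical helper name:

```python
from collections import defaultdict


def group_paths_by_serial(path_serials):
    """Group SCSI path names (e.g. "sdc") by the serial/WWID each one reported.

    path_serials: dict mapping a path's device name to the serial string
    returned for it by a SCSI Inquiry (however it was obtained).
    Returns a dict: serial -> sorted list of paths leading to that LUN.
    """
    groups = defaultdict(list)
    for path, serial in path_serials.items():
        groups[serial].append(path)
    return {serial: sorted(paths) for serial, paths in groups.items()}


if __name__ == "__main__":
    # Illustrative input: four paths reporting the same serial are one LUN.
    wwid = "360a9800043346461436f373279574b53"
    print(group_paths_by_serial({"sdc": wwid, "sde": wwid, "sdd": wwid, "sdf": wwid}))
```

Every serial becomes one multipath device, and the list length is the number of paths to it.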

Before you get started, make sure the following things are loaded:

 

 

Make a copy of the /etc/multipath.conf file. Edit the original file and make sure only the following entries are uncommented. If you don't have the NetApp section, add it.

defaults {
user_friendly_names yes
}
#
devnode_blacklist {
devnode "sd[a-b]$"
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^cciss!c[0-9]d[0-9]*"
}


devices {
device {
vendor "NETAPP"
product "LUN"
path_grouping_policy group_by_prio
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
prio_callout "/opt/netapp/santools/mpath_prio_ontap /dev/%n"
features "1 queue_if_no_path"
path_checker readsector0
failback immediate
}
}

The devnode_blacklist includes devices for which you do not want multipathing enabled. So if you have a couple of local SCSI drives (e.g., sda and sdb), the first entry in the blacklist will exclude them. The same goes for IDE drives (hd).
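To sanity-check a blacklist before restarting multipathd, you can try the patterns against some device names. A small Python sketch using the devnode patterns from the config above; it assumes multipath applies them as unanchored POSIX-style searches (so only the explicit ^ and $ anchor), and is_blacklisted is a hypothetical helper, not part of multipath-tools:

```python
import re

# devnode patterns from the devnode_blacklist section above.
BLACKLIST = [
    r"sd[a-b]$",
    r"^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*",
    r"^hd[a-z]",
    r"^cciss!c[0-9]d[0-9]*",
]


def is_blacklisted(devnode, patterns=BLACKLIST):
    """Return True if any pattern matches anywhere in the kernel device name.

    re.search (not re.match) is used on the assumption that multipath
    searches the name the way regexec() does, so "^" and "$" must anchor explicitly.
    """
    return any(re.search(p, devnode) for p in patterns)


if __name__ == "__main__":
    for name in ["sda", "sdb", "sdc", "sdab", "dm-0", "hda", "cciss!c0d0"]:
        print(name, "blacklisted" if is_blacklisted(name) else "multipath candidate")
```

Note that "sd[a-b]$" only catches sda and sdb themselves; a name such as sdab is still a multipath candidate.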

Add the multipath service to the boot sequence by entering the following:

chkconfig --add multipathd
chkconfig multipathd on


Multipathing on Linux is Active/Active: within the active Path Group, I/O is distributed across the paths with a round-robin algorithm.


The path_grouping_policy is group_by_prio. It assigns paths into Path Groups based on path priority values. Each path is given a priority (high value = high priority) based on a callout program written by Netapp Engineering (part of the FCP Linux Host Utilities 3.0).

The priority values for each path in a Path Group are summed and you obtain a group priority value. The paths belonging to the Path Group with the higher priority value are used for I/O.

If a path fails, the priority value of the failed path is subtracted from the Path Group priority value. If the Path Group priority value is still higher than the values of the other Path Groups, I/O will continue within that Path Group. If not, I/O will switch to the Path Group with the highest priority.
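The summing rule just described can be modelled in a few lines of Python. This is a sketch of the rule as stated in this post, not multipath-tools code; the priority numbers in the example (4 for direct paths, 1 for partner paths) are hypothetical and are not necessarily what mpath_prio_ontap returns:

```python
def pick_path_group(groups, failed=frozenset()):
    """Return the name of the Path Group that should carry I/O.

    groups: dict mapping a group name to {path_name: priority}.
    failed: set of path names that are currently down; their priority
    is not counted, mirroring the "subtract the failed path" rule.
    """
    def group_priority(paths):
        return sum(prio for path, prio in paths.items() if path not in failed)

    # max() keeps the first group on a tie, so dict order breaks ties here.
    return max(groups, key=lambda name: group_priority(groups[name]))


if __name__ == "__main__":
    groups = {"direct": {"sdc": 4, "sde": 4}, "partner": {"sdd": 1, "sdf": 1}}
    print(pick_path_group(groups))                      # both direct paths up
    print(pick_path_group(groups, failed={"sdc"}))      # one direct path lost
    print(pick_path_group(groups, failed={"sdc", "sde"}))  # both direct paths lost
```

With these numbers the direct group keeps I/O after losing one path (4 vs 2) and only hands over to the partner group once both direct paths are gone (0 vs 2).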

Create and map some LUNs to the host. If you are using the latest Qlogic or Emulex drivers, then run the respective utilities they provide to discover the LUN:

To view a list of multipathed devices:

# multipath -d -l

[root@rhel-a ~]# multipath -l

360a9800043346461436f373279574b53
[size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 2:0:0:0 sdc 8:32 [active]
 \_ 3:0:0:0 sde 8:64 [active]
\_ round-robin 0 [enabled]
 \_ 2:0:1:0 sdd 8:48 [active]
 \_ 3:0:1:0 sdf 8:80 [active]

The above shows 1 LUN with 4 paths. Done. It's that easy to set up. 
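If you have many LUNs, counting paths by eye gets tedious. A rough Python sketch that counts paths per multipath device from "multipath -l" text, assuming the output format shown above; other multipath-tools versions draw the tree differently, so treat the regular expression as an assumption to adjust, and count_paths as a hypothetical helper:

```python
import re

# A path line looks like "\_ 2:0:0:0 sdc 8:32 [active]"; only the
# H:C:T:L tuple and the device name after it are matched.
PATH_LINE = re.compile(r"\\_\s+(\d+:\d+:\d+:\d+)\s+(\S+)")


def count_paths(output):
    """Return {map_header: [device names]} parsed from `multipath -l` text."""
    maps, current = {}, None
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("["):
            continue  # blank line or the [size=...] attribute line
        m = PATH_LINE.search(stripped)
        if m:
            if current is not None:
                maps[current].append(m.group(2))
        elif not stripped.startswith("\\_"):
            current = stripped  # a new map header, e.g. a WWID or "mpath0 (...)"
            maps[current] = []
        # remaining "\_ round-robin ..." lines are path-group headers; skipped
    return maps
```

Feeding it the output above should give one entry (the 360a98... WWID) with four devices: sdc, sde, sdd and sdf.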

Posted by Nick Triantos at 10:19 AM  


30 comments:

Bernard said...
Nice blog on Linux native multipathing. I keep reading your blog regularly and follow all update postings. Please post more SAN and storage infrastructure information, and all the troubleshooting involved.
Please post more, guys; we have been reading your posts regularly. It's very informative and I thank you for your time.

http://storage-jobs.blogspot.com storage area network or SAN jobs
Anonymous said...
Do you know if multipath and FCP utility are available for RHEL3?
Nick Triantos said...
Device-Mapper multipath is only available with the RHEL 4.0 U2, SuSE 9 SP2 or above.

RHEL3 uses mdadm (multiple device administration driver) for path failover. It's active/passive, and personally I've found the setup process prone to error.

mdadm creates a multipath device accessible as /dev/md[x] using partition 1 on each of the LUNs (e.g., /dev/sda and /dev/sdb). However, it doesn't check whether they are paths to the same LUN or partition, so it is very easy to mess up. The last time I played with it, it supported automatic failover, but failback was a manual process.

If you are running RHEL 3 with Qlogic cards then you may want to use the Qlogic driver which provides failover/failback capability.

While the Qlogic driver does not provide I/O load balancing for the same LUN across 2 different HBA host ports, it does give you the ability to balance the LUNs across host ports. In this scenario, you'd use Qlogic's SANsurfer to assign a preferred path to each of your LUNs.
Anonymous said...
How do you get multiple paths into a path group? We've got 2 HDS LUNs presented to 2 HBAs. 'multipath -l' looks like this:

mpath1 (1HITACHI_R4509C66106C)
[size=339 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 1:0:0:1 sdc 8:32 [active][ready]
\_ round-robin 0 [enabled]
\_ 2:0:0:1 sde 8:64 [active][ready]

mpath0 (1HITACHI_R4509C661F15)
[size=13 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 1:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [enabled]
\_ 2:0:0:0 sdd 8:48 [active][ready]


with each path in its own path group, we're in active/standby for each LUN, when we want to be active/active and make use of the round-robin load balancing.
Nick Triantos said...
Hi, in order to get them into different Path Groups you have to use the group_by_prio path grouping policy. Then you need a callout program, provided by the vendor (HDS in this case), that assigns priority values to each path.

You need to look at the /etc/multipath.conf and look at your path grouping policy. There are 2 policies:

1) group_by_prio which we discussed.

2) The default path grouping policy is "multibus". The algorithm used in this case is simple. All the paths available for a LUN are grouped into one single group. This group will be active all the time. I/O is started on the first path in the group.

After "rr_min_io" I/Os have completed on that path, I/O is switched to the next available path in the group in round-robin fashion. If one of the paths fails, that path is excluded from selection for I/O; that is, I/O is round-robined across the remaining paths if you have more than 2 per LUN. If the "failback" parameter is set to "immediate" in /etc/multipath.conf, the path is added back into the group for I/O selection as soon as it becomes available.

The rr_min_io setting for dm-multipath specifies the number of I/Os sent through a path before switching to the next path.
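The multibus behaviour and the role of rr_min_io can be illustrated with a small user-space model. This Python sketch follows the description above; it is not the kernel's round-robin path selector, and the class and method names are hypothetical:

```python
class RoundRobinSelector:
    """Model of round-robin over the paths of a single (multibus) path group.

    After rr_min_io I/Os on the current path, switch to the next usable
    path; failed paths are skipped until they are reinstated.
    """

    def __init__(self, paths, rr_min_io=1000):
        self.paths = list(paths)
        self.rr_min_io = rr_min_io
        self.failed = set()
        self.index = 0   # position of the current path in self.paths
        self.issued = 0  # I/Os sent on the current path so far

    def fail(self, path):
        self.failed.add(path)

    def reinstate(self, path):
        # Roughly what failback "immediate" does once the path checker sees it up.
        self.failed.discard(path)

    def next_path(self):
        """Return the path for the next I/O, or None if every path is down."""
        if all(p in self.failed for p in self.paths):
            return None  # with "1 queue_if_no_path", I/O would be queued instead
        if self.paths[self.index] in self.failed or self.issued >= self.rr_min_io:
            self._advance()
        self.issued += 1
        return self.paths[self.index]

    def _advance(self):
        # Move to the next non-failed path; next_path() guarantees one exists.
        self.issued = 0
        for _ in range(len(self.paths)):
            self.index = (self.index + 1) % len(self.paths)
            if self.paths[self.index] not in self.failed:
                return


if __name__ == "__main__":
    sel = RoundRobinSelector(["sdc", "sde"], rr_min_io=2)
    print([sel.next_path() for _ in range(6)])
    # ['sdc', 'sdc', 'sde', 'sde', 'sdc', 'sdc']
```

A small rr_min_io (2 here, 128 in the config below) switches paths often; the default of 1000 keeps a path busy much longer before moving on.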

Lowering this value from the default value of 1000 has been seen to dramatically affect overall throughput for dm-multipath, especially for large I/O workloads (64 KB).

The rr_min_io value can only be set by changing the defaults section in /etc/multipath.conf. You cannot change it in the devices section. You must put a section such as the following at the very top of the /etc/multipath.conf file:
defaults {
rr_min_io 128
}

Here's what the /etc/multipath.conf for a Netapp box would look like using multibus grouping policy:

devices {
device {
vendor "NETAPP"
product "LUN"
path_grouping_policy multibus
features "1 queue_if_no_path"
path_checker readsector0
failback immediate
}
}

The above will round-robin I/O across all available paths in a single Path Group.

The "1 queue_if_no_path" entry enables I/O queuing in case all paths to a device are lost.
Eyal Traitel said...
I want to note that it's possible to follow the same procedure with CentOS - but I have encountered some package name inconsistencies.
I have documented it here:
http://filers.blogspot.com/2006/08/configuring-fcp-multipathing-in-redhat.html

Eyal Traitel
http://filers.blogspot.com
http://stupidstorage.blogspot.com
Anonymous said...
Hi Nick,

I'd like to add multipathd to the boot sequence, but when I run chkconfig --add multipathd, I see the following: multipathd: unknown service. I installed multipath-tools-0.4.7. Any ideas why this command is returning with unknown service? Any help is greatly appreciated.

Thanx!
Anonymous said...
Thank you for all the information about DM Multipath for Linux.

I read through the comment for Hitachi. I have a similar question.

I have 2 HBAs, 2 array ports for my HITACHI Tagmastore array. Is there a way I can set a "preferred" path for I/O or redirect I/O from one path to another?

multipath output :
------------------
mpath0 (360060e800427dd00000027dd00003f80)
[size=500 MB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1][active]
\_ 2:0:0:0 sdc 8:32 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 3:0:0:0 sde 8:64 [active][ready]

iostat displays that the I/O is routed through the path "sdc". Is there a way to route it through "sde" ? And I want to be able to do this without any I/O disruption.

extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
sdc 0 4625 0.0 74.4 0.0 4700.9 63.2 36.6 492.9 10.9 81
sde 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
dm-2 0 0 0.0 4700.9 0.0 4700.9 1.0 2364.9 503.1 0.2 81
dm-4 0 0 0.0 4700.9 0.0 4700.9 1.0 2364.9 503.1 0.2 81

Also why do I see I/O on dm-2 and dm-4?

Thanks, really appreciate the help.
Nick Triantos said...
Response to the "multipathd: unknown service" question:

Check whether the device-mapper-multipath package is installed. Most likely it's not. When I was installing RHEL in my lab, the RPM was not installed by default. Also check for device-mapper.

For RHEL
# rpm -q device-mapper
# rpm -q device-mapper-multipath

If you're on SLES:

# rpm -q device-mapper
Anonymous said...
Device-mapper appears to be loaded :

[root@dot4 dev]# rpm -q device-mapper
device-mapper-1.02.07-4.0.RHEL4
[root@dot4 dev]# rpm -q device-mapper-multipath
device-mapper-multipath-0.4.5-16.1.RHEL4
[root@dot4 dev]#
Nick Triantos said...
What kind of Tagma do you have?

Is it a USP Tagma or is it an AMS Tagma?

HDS provides a callout prioritizer for the AMS series, "/sbin/pp_hds_modular", but I don't know which distro it is part of.

If you have a USP, then the policy should be multibus.

Regardless of what you have (AMS or USP), here's the HDS section. Which policy is used depends on the product ID (OPEN-V/DF600F etc.):

devices {
device {
vendor "HITACHI"
product "DF600F"
path_grouping_policy group_by_prio
prio_callout "/sbin/pp_hds_modular %d"
path_checker readsector0
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
}
device {
vendor "HITACHI"
product "DF600F-CM"
path_grouping_policy group_by_prio
prio_callout "/sbin/pp_hds_modular %d"
path_checker readsector0
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
}
device {
vendor "HITACHI"
product "OPEN-V"
path_grouping_policy multibus
path_checker readsector0
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
}
device {
vendor "HITACHI"
product "OPEN-V-CM"
path_grouping_policy multibus
path_checker readsector0
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
}
}
Anonymous said...
Thanks Nick,

I have Hitachi USP 100. I don't want to use "multibus" since I am testing one of our company's product which specifically requires me to turn off load balancing/round-robin. So I intend to use "failover" policy even though I have Hitachi USP 100.

Also the other main question I had was, how would I re-direct I/O to a specific path? When I am testing my product, I will be presenting more paths to the USP 100 LUN and I want to be directing I/O to the newly presented path.

Basically I need I/O routed through only one path and I want to be able to choose which path I can route I/O through.

Thanks again for the help.
fileX said...
Does anyone have any experience on Multipath with OpenFiler either in SUSE/RHEL?
Anonymous said...
Today I tested your recipe and found some small errors: the line in multipath.conf should be
prio_callout "/opt/netapp/santools/mpath_prio_ontap /dev/%n"
and to get multipath devices I had to run multipath without -l.

m.schlett@fzd.de
Maurice said...
Hi Nick,

Excellent blog, good info. A quick question if I may. Have you any experience with IBM DS4700 and dm-multipath? IBM keeps telling me to use RDAC but I can't get SLES 9 SP3, boot from SAN and RDAC working reliably. Novell provided good info on setting up dm-multipath but I don't recall reading anything about callouts. Any insight is most appreciated.
Nick Triantos said...
Hi, thanks for the kind words.

IBM's nudging you towards RDAC because DM-Multipath on SLES 9 or RHEL 4.0 does not support SAN Boot. In fact it is problematic.

This is addressed in RHEL 5.0 and SLES 10, although I have not done any testing with those.

The multipath-tools package supports pretty much every vendor. IBM should have several entries in there some of which use a callout program and some don't depending on the array.
Anonymous said...
Hi Nick,
very good blog and useful info.

We have many servers with no internal disk at all, only SAN disks (USP600), booting from the SAN.
How should I configure multipath?
I know multipath does not support the system/root disk, but I need multipath for all other LUNs.
Some parameters are common to all LUNs; for example, ql2xfailover=0 (which is needed) in modprobe.conf disables failover for all LUNs, even the root disk.

Thanks for your help
/David
david_at_dahey_dot_com
Nick Triantos said...
David,

Like you said, DM-Multipath does not support SAN booting right now. However, one route you could take is to use Qlogic HBAs and deploy the Qlogic driver. The parameter you mentioned is a Qlogic driver parameter (0 = disable failover, 1 = enable). It supports both boot and non-boot LUNs and works well. While it doesn't provide any dynamic I/O load balancing for a given LUN, you can manually balance the LUNs across a set of host HBAs.
ethan.john said...
Basically I need I/O routed through only one path and I want to be able to choose which path I can route I/O through.
I've been working toward this end extensively for a while. The best method I've found is pretty nasty:

- Use a custom script as the prio_callout, so that you can make the script return whatever values you need for path priorities.
- Set rr_min_io to something very large -- 2 billion or so.

Linux multipathing doesn't currently allow you to set custom path priorities.
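A sketch of the custom prio_callout trick described above, in Python. It assumes the older multipath-tools contract this thread uses: the callout is run with the device name substituted for %n (e.g. prio_callout "/usr/local/sbin/pin_prio.py %n", a hypothetical path) and multipath reads a single integer priority from its stdout. The PREFERRED table is purely an example:

```python
#!/usr/bin/env python3
"""Hypothetical prio_callout: print a fixed priority for a device name."""
import sys

# Example table only: give sde a much higher priority than anything else,
# so that with group_by_prio its path group is the one carrying I/O.
PREFERRED = {"sde": 50}
DEFAULT_PRIO = 1


def priority_for(devname):
    # Accept both "sde" and "/dev/sde", since the callout template may pass either.
    name = devname.rsplit("/", 1)[-1]
    return PREFERRED.get(name, DEFAULT_PRIO)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(priority_for(sys.argv[1]))
```

To move I/O to another path, you would edit the table and have multipathd reload its maps; whether that happens without any I/O disruption on a given distro is something to test rather than assume.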
Anonymous said...
We have been testing Linux DM in both RedHat and SUSE against our storage with both QLogic and Emulex HBAs. All scenarios work except for SUSE 9 SP3 with Emulex HBAs.

In the SUSE 9 SP3 w/ Emulex environment, DM successfully fails I/O over on an Active/Active service (in this test, a cable-pull is performed). Replugging the cable back in, the I/O does not failback. We can only bring the paths back to an A/A state by entering a "multipath -v2" command.

With a QLogic HBA, or running I/O in RedHat, we always see:
"tur checker reports path is up
.. multipathd: 66:240: reinstated
.. mpathA6: remaining active paths: 2"

Are there any unique settings with Emulex in SUSE that must be set for DM to work automatically?
Nick Triantos said...
Hi,

You're not the only one that has seen this issue with Emulex. Somebody told me that it happens only with LUNs with an ID of 0 but haven't tried it.
Anonymous said...
Hello!
Can I use this kind of multipathing if I boot blade servers from an external storage controller? I need multiple paths to the root partition.
Nick Triantos said...
Hi,

If you are using RHEL 4, you can't use dm-multipath for SAN booting. However, I have been told that RHEL 5 U1 will support this.
Anonymous said...
As others have said, this is great info... Do you have any experience with RHEL4 on IBM's POWER5 (pSeries)? I know dm-multipath doesn't support SAN boot, but on POWER5, through a VIO server, the LUNs are presented to the client Linux LPAR as vscsi. I still haven't been able to get the boot partitions to work with dm-multipath, though non-boot partitions work fine with this method. Any suggestions? mdadm? If I can get this to work, we'll have redundancy and SAN boot via the VIO servers. Any help is appreciated.
Nick Triantos said...
Hi,

I have not played with VIO to set up Linux MP using Device-Mapper. Having said that, I'm not so sure this would alter anything from a SAN booting perspective, regardless of whether the LUN is presented as a vscsi device.
The issue with device-mapper and SAN boot support does not depend on the HW or its capabilities (i.e., virtualization).

As far as mdadm goes, you'd have the same challenge. You have to create software-based RAID partitions and then configure mdadm for multiple paths. This works for any partition other than /boot. I tried it a long time ago and failed miserably, but that was with RHEL 3.

If you'd like mdadm instructions you can find them here:

http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/en-US/rhel-ig-s390-multi-en-4/s2-s390info-raid.html
Anonymous said...
There is new documentation on RHEL multipath:
http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/en-US/RHEL510/DM_Multipath/index.html

Also - multipath boot from SAN is supported on RHEL 5.1
Paul said...
Does the default example you post send i/o through the Netapp interconnect cables as well or does it just send i/o to the controller that owns the lun? Does the Netapp group policy automatically know which path to send the i/o so that they do not cross the interconnect on filer?
Nick Triantos said...
No I/O is sent through the interconnect. Although you have 4 paths per LUN, you round-robin ONLY through the 2 direct paths to the controller that owns the LUN.

Cheers
Anonymous said...
Great blog! nice content, good responses.

I have a Compellent SAN connected to SLES 10 via both iSCSI and FC. Currently the FC and iSCSI paths are in a single group with no priority. I'm trying to find a way to give greater priority to the FC path and use the iSCSI path for failover only. It looks like this needs a custom script for prio_callout. Does anyone know what it would look like?
BCEgg said...
Hi Nick - I'm a little confused about the whole Multipathing configuration. Do I use fdisk to create the partitions or kpartx?

Thanks in advance.

 

SuSE mailing list: RE: [suse-sles-e] Adding Disk on the fly

Date: Tue, 23 May 2006 14:57:52 +0200
Message-ID: <CA1361E9F77A4243A99E04D98F5CC724023E8E9C@ZARDPEXCH001.medscheme.com>
From: "Stephen Hughes" <stephenh@Medscheme.co.za>
Subject: RE: [suse-sles-e] Adding Disk on the fly

Hi Group,
 

Thanks for all your help with this matter. I managed to use the
rescan-scsi-bus.sh command on one of my servers to add a SAN attached
disk, but now I've assigned more disk to another server I have running
SLES9. I run the rescan-scsi-bus.sh command with the various switches
but it still does not pick up my new disk. Below is the output from my
lsscsi command as well as the command I ran to try and pick up the disk.
The new disk according to my Navisphere client should come in after
/dev/sdbp.
I also looked at some of the other replies but I don't have the rescan
file to echo a "1" to as suggested in one of the replies:
"# echo 1 > /sys/bus/scsi/devices/0:0:0:0/rescan"
 

mamba:/usr/local/bin # lsscsi
[0:0:6:0] process DELL 1x4 U2W SCSI BP 1.27 -
[0:2:0:0] disk MegaRAID LD0 RAID0 69356R 161J /dev/sda
[1:0:0:0] disk EMC SYMMETRIX 5671 /dev/sdb
[1:0:0:1] disk EMC SYMMETRIX 5671 /dev/sdc
[1:0:0:2] disk EMC SYMMETRIX 5671 /dev/sdd
[1:0:0:3] disk EMC SYMMETRIX 5671 /dev/sde
[1:0:0:4] disk EMC SYMMETRIX 5671 /dev/sdf
[1:0:0:5] disk EMC SYMMETRIX 5671 /dev/sdg
[1:0:0:6] disk EMC SYMMETRIX 5671 /dev/sdh
[1:0:0:7] disk EMC SYMMETRIX 5671 /dev/sdi
[1:0:0:8] disk EMC SYMMETRIX 5671 /dev/sdj
[1:0:0:9] disk EMC SYMMETRIX 5671 /dev/sdk
[1:0:0:10] disk EMC SYMMETRIX 5671 /dev/sdl
[1:0:0:11] disk EMC SYMMETRIX 5671 /dev/sdm
[1:0:0:12] disk EMC SYMMETRIX 5671 /dev/sdn
[1:0:0:13] disk EMC SYMMETRIX 5671 /dev/sdo
[1:0:0:14] disk EMC SYMMETRIX 5671 /dev/sdp
[1:0:0:15] disk EMC SYMMETRIX 5671 /dev/sdq
[1:0:0:16] disk EMC SYMMETRIX 5671 /dev/sdr
[1:0:0:17] disk EMC SYMMETRIX 5671 /dev/sds
[1:0:0:18] disk EMC SYMMETRIX 5671 /dev/sdt
[1:0:0:19] disk EMC SYMMETRIX 5671 /dev/sdu
[2:0:0:0] disk EMC SYMMETRIX 5671 /dev/sdv
[2:0:0:1] disk EMC SYMMETRIX 5671 /dev/sdw
[2:0:0:2] disk EMC SYMMETRIX 5671 /dev/sdx
[2:0:0:3] disk EMC SYMMETRIX 5671 /dev/sdy
[2:0:0:4] disk EMC SYMMETRIX 5671 /dev/sdz
[2:0:0:5] disk EMC SYMMETRIX 5671 /dev/sdaa
[2:0:0:6] disk EMC SYMMETRIX 5671 /dev/sdab
[2:0:0:7] disk EMC SYMMETRIX 5671 /dev/sdac
[2:0:0:8] disk EMC SYMMETRIX 5671 /dev/sdad
[2:0:0:9] disk EMC SYMMETRIX 5671 /dev/sdae
[2:0:0:10] disk EMC SYMMETRIX 5671 /dev/sdaf
[2:0:0:11] disk EMC SYMMETRIX 5671 /dev/sdag
[2:0:0:12] disk EMC SYMMETRIX 5671 /dev/sdah
[2:0:0:13] disk EMC SYMMETRIX 5671 /dev/sdai
[2:0:0:14] disk EMC SYMMETRIX 5671 /dev/sdaj
[2:0:0:15] disk EMC SYMMETRIX 5671 /dev/sdak
[2:0:0:16] disk EMC SYMMETRIX 5671 /dev/sdal
[2:0:0:17] disk EMC SYMMETRIX 5671 /dev/sdam
[2:0:0:18] disk EMC SYMMETRIX 5671 /dev/sdan
[3:0:0:0] disk DGC LUNZ 0219 /dev/sdao
[3:0:0:28] disk DGC RAID 5 0219 /dev/sdap
[3:0:1:0] tape STK T9940B 1.34 /dev/st0
[3:0:2:0] tape STK T9940B 1.34 /dev/st1
[3:0:3:0] tape SEAGATE ULTRIUM06242-XXX 1613 /dev/st2
[3:0:3:1] tape SEAGATE ULTRIUM06242-XXX 1613 /dev/st3
[3:0:4:0] tape SEAGATE ULTRIUM06242-XXX 1613 /dev/st4
[3:0:4:1] tape SEAGATE ULTRIUM06242-XXX 1536 /dev/st5
[3:0:5:0] disk DGC RAID 5 0219 /dev/sdaq
[3:0:5:1] disk DGC RAID 5 0219 /dev/sdar
[3:0:5:2] disk DGC RAID 5 0219 /dev/sdas
[3:0:5:3] disk DGC RAID 5 0219 /dev/sdat
[3:0:5:4] disk DGC RAID 5 0219 /dev/sdau
[3:0:5:5] disk DGC RAID 5 0219 /dev/sdav
[3:0:5:6] disk DGC RAID 5 0219 /dev/sdaw
[3:0:5:7] disk DGC RAID 5 0219 /dev/sdax
[3:0:5:8] disk DGC RAID 5 0219 /dev/sday
[3:0:5:9] disk DGC RAID 5 0219 /dev/sdaz
[3:0:5:10] disk DGC RAID 5 0219 /dev/sdba
[3:0:5:11] disk DGC RAID 5 0219 /dev/sdbb
[3:0:5:12] disk DGC RAID 5 0219 /dev/sdbc
[3:0:5:13] disk DGC RAID 5 0219 /dev/sdbd
[3:0:5:14] disk DGC RAID 5 0219 /dev/sdbe
[3:0:5:15] disk DGC RAID 5 0219 /dev/sdbf
[3:0:5:16] disk DGC RAID 5 0219 /dev/sdbg
[3:0:5:17] disk DGC RAID 5 0219 /dev/sdbh
[3:0:5:18] disk DGC RAID 5 0219 /dev/sdbi
[3:0:5:19] disk DGC RAID 5 0219 /dev/sdbj
[3:0:5:20] disk DGC RAID 5 0219 /dev/sdbk
[3:0:5:21] disk DGC RAID 5 0219 /dev/sdbl
[3:0:5:22] disk DGC RAID 5 0219 /dev/sdbm
[3:0:5:23] disk DGC RAID 5 0219 /dev/sdbn
[3:0:5:24] disk DGC RAID 5 0219 /dev/sdbo
[3:0:5:25] disk DGC RAID 5 0219 /dev/sdbp
 

Command I executed: (26,27,28,29 because I'm adding 4 LUNS)
mamba:/usr/local/bin # rescan-scsi-bus.sh --hosts=3 --ids=5 --luns=26,27,28,29
 

Thanks
Stephen
 

-----Original Message-----
From: Matt Gillard [mailto:Matt.Gillard@colesmyer.com.au]
Sent: 04 May 2006 08:06 AM
To: Stephen Hughes; Denis Brown; suse-sles-e@suse.com
Subject: RE: [suse-sles-e] Adding Disk on the fly
 

 

/bin/rescan-scsi-bus.sh is what you are after.
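The per-device rescan file quoted in the thread (/sys/bus/scsi/devices/H:C:T:L/rescan) only exists for devices the kernel already knows about, so it cannot discover new LUNs. New addresses can instead be probed by writing "channel target lun" to /sys/class/scsi_host/host<N>/scan ("-" acts as a wildcard). A minimal Python sketch, assuming a 2.6 kernel that exposes that attribute; it defaults to a dry run that only prints the equivalent shell commands, and the helper names are hypothetical:

```python
def scan_commands(host, target, luns, channel=0):
    """Build (sysfs file, text) pairs that ask the kernel to probe specific LUNs."""
    path = f"/sys/class/scsi_host/host{host}/scan"
    return [(path, f"{channel} {target} {lun}") for lun in luns]


def run_scan(commands, dry_run=True):
    """Print the equivalent shell commands, or perform the writes (needs root)."""
    for path, text in commands:
        if dry_run:
            print(f"echo '{text}' > {path}")
        else:
            with open(path, "w") as f:
                f.write(text)


if __name__ == "__main__":
    # Same addresses as the rescan-scsi-bus.sh call above: host 3, channel 0,
    # target 5, LUNs 26-29 (the existing DGC LUNs sit at 3:0:5:x in lsscsi).
    run_scan(scan_commands(3, 5, range(26, 30)))
```

If the LUNs still do not appear after a scan, the masking on the array side is the next thing to check.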
 

 

Fun with your SAN and Multi-path » SUSE Linux Enterprise in the Americas

Customers are always looking for ways to get their cost of Linux deployments down lower, and make management easier on their staff. One of, at least in my opinion, the best options they have is to get rid of 3rd party multi path IO solutions for your SAN and disk management.

I was at one of my customers’ sites the other day, helping them set up the MPIO that is built into SLES 10. While I was there I took a few notes on what we did to get things working in their environment. These same instructions should work with other SANs that support multipath I/O.

SLES 10 supports a lot of SANs right out of the box and detects them automatically, so you don’t really need an /etc/multipath.conf. My customer wanted to be able to change the blacklist for the various types of hardware they use, and wanted user-friendly names. To do this I created a multipath.conf for them that looked like the following…

## /etc/multipath.conf file for SLES 10
## You may find a full copy of this file, with comments, here..
## /usr/share/doc/packages/multipath-tools/multipath.conf

# Set up user-friendly names

# name : user_friendly_names
# scope : multipath
# desc : If set to "yes", use the bindings file
# /var/lib/multipath/bindings to assign a persistent and
# unique alias to the multipath, in the form of mpath<n>.
# If set to "no", use the WWID as the alias. In either case
# this will be overridden by any specific aliases in this
# file.
# values : yes|no
# default : no

defaults {
user_friendly_names yes

}

# Set up the blacklisted devices...

# name : blacklist
# scope : multipath & multipathd
# desc : list of device names that are not multipath candidates
# default : cciss, fd, hd, md, dm, sr, scd, st, ram, raw, loop
#

blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^cciss!c[0-9]d[0-9]*"
}
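To see how devnode patterns behave, here is a small sketch that tests kernel device names against a few blacklist regular expressions using grep -E. multipath matches these patterns as POSIX extended regular expressions. The pattern list is an assumption for illustration. The first pattern comes from the file above. ^hd[a-z] and ^cciss!c[0-9]d[0-9]* are simplified IDE and cciss patterns. The function and array names are hypothetical.

```shell
#!/usr/bin/env bash
# is_blacklisted NAME: succeed (exit 0) if a kernel device name such as
# "sda", "dm-3" or "cciss!c0d0" matches one of the devnode patterns.
# The pattern list is illustrative; keep it in sync with your multipath.conf.
BLACKLIST_PATTERNS=(
  '^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*'
  '^hd[a-z]'
  '^cciss!c[0-9]d[0-9]*'
)

is_blacklisted() {
  local name=$1 re
  for re in "${BLACKLIST_PATTERNS[@]}"; do
    # grep -E uses POSIX extended regular expressions; -q suppresses output.
    if printf '%s\n' "$name" | grep -Eq -- "$re"; then
      return 0
    fi
  done
  return 1
}

for dev in sda sdbr hda ram0 dm-3 'cciss!c0d0' st0; do
  if is_blacklisted "$dev"; then
    echo "$dev: blacklisted"
  else
    echo "$dev: multipath candidate"
  fi
done
```

The SAN paths (sda, sdbr) remain multipath candidates. RAM disks, existing device-mapper nodes, tapes, IDE disks and cciss devices are excluded.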
 

If you’re curious about which platforms SLES 10 supports out of the box, a list is in the SLES documentation.

This assumes your LUNs are already assigned to you. Once the /etc/multipath.conf file is in place, a few services need to be started to make all this work.

# service boot.multipath start
# service multipathd start

That should start the daemons and load the kernel modules you need. To check, run lsmod and look for dm_multipath and multipath. Once that is done, you can check whether your setup is correct…
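The lsmod check can be scripted. lsmod formats /proc/modules, and the first field on each line of that file is the module name, so a sketch can read the file directly. The function name and its file argument, which makes testing possible, are hypothetical. The module names are the ones mentioned above.

```shell
#!/usr/bin/env bash
# modules_loaded FILE MODULE...: report each module as loaded or missing and
# return non-zero if any is missing.
# FILE is normally /proc/modules (the data lsmod formats); the first
# whitespace-separated field of each line is the module name.
modules_loaded() {
  local file=$1 mod missing=0
  shift
  for mod in "$@"; do
    # awk exits 0 only if some line's first field equals the module name.
    if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$file"; then
      echo "$mod: loaded"
    else
      echo "$mod: missing"
      missing=1
    fi
  done
  return "$missing"
}

# On the real system:
# modules_loaded /proc/modules dm_multipath multipath
```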

# multipath -v2 -d

create: mpath10 (360080480000290100601544032363831) EMC,SYMMETRIX
[size=200G][features=0][hwhandler=0]
\_ round-robin 0 [prio=4][undef]
\_ 11:0:0:39 sdbr 68:80 [undef][ready]
\_ 11:0:1:39 sdcc 69:0 [undef][ready]
\_ 10:0:0:39 sdl 8:176 [undef][ready]
\_ 10:0:1:39 sdw 65:96 [undef][ready]

create: mpath11 (360080480000290100601544032363832) EMC,SYMMETRIX
[size=400G][features=0][hwhandler=0]
\_ round-robin 0 [prio=4][undef]
\_ 11:0:0:40 sdbs 68:96 [undef][ready]
\_ 11:0:1:40 sdcd 69:16 [undef][ready]
\_ 10:0:0:40 sdm 8:192 [undef][ready]
\_ 10:0:1:40 sdx 65:112 [undef][ready]

That is what it looks like for the EMC Symmetrix I was working with, so your mileage may vary.
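A quick way to confirm that every LUN got the expected number of paths (four per LUN in the output above) is to count the path lines for each map. This awk sketch assumes the dry-run layout shown above: each map starts with a create: line, and each path line starts with \_ followed by a host:channel:target:lun address. The function name is hypothetical, and other multipath-tools versions may format their output differently.

```shell
#!/usr/bin/env bash
# count_paths: read the dry-run output of `multipath -v2 -d` on stdin and
# print "<map name> <number of paths>" for each map, in input order.
# Assumes the layout shown above: a "create:" line starts each map and each
# path line is "\_ host:channel:target:lun devnode major:minor [state]".
count_paths() {
  awk '
    $1 == "create:" { name = $2; order[++n] = name; paths[name] = 0; next }
    $1 == "\\_" && $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ { paths[name]++ }
    END { for (i = 1; i <= n; i++) print order[i], paths[order[i]] }
  '
}

# On the real system:
# multipath -v2 -d | count_paths
```

The "\_ round-robin 0" path-group lines are skipped because their second field is not an H:C:T:L address.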

Once you have the devices showing up correctly, you need to make sure multipathing is started again after a reboot. To do that, run the following commands…

# chkconfig multipathd on
# chkconfig boot.multipath on

The next thing is to configure LVM to scan these devices so you can use them in your volume groups. To do this you will need to edit /etc/lvm/lvm.conf in the following places…

filter = [ "a|/dev/disk/by-id/.*|", "r|.*|" ]
types = [ "device-mapper", 253 ]

The filter above limits the devices LVM will scan to those that show up in /dev/disk/by-id. If you’re using LVM to manage other disks that are not in that directory (think local SCSI drives), you will need to make sure those are still available by adjusting your filter to something more like this…

filter = [ "a|/dev/disk/by-id/.*|", "a|/dev/sda1$|", "r|.*|" ]
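LVM evaluates filter patterns in order. The first pattern that matches a device path decides whether the device is accepted (a) or rejected (r), and a device that matches no pattern is accepted. The sketch below emulates that rule for the second filter so you can check which paths would survive. It is a simplification: real LVM also considers every alias (symlink) of a device. The function and array names are hypothetical.

```shell
#!/usr/bin/env bash
# lvm_filter_verdict PATH: print "accept" or "reject" for a device path using
# LVM's first-match rule. The entries mirror:
#   filter = [ "a|/dev/disk/by-id/.*|", "a|/dev/sda1$|", "r|.*|" ]
# Each entry is "<a|r> <extended regex>"; the |...| delimiters are dropped.
FILTER=(
  'a /dev/disk/by-id/.*'
  'a /dev/sda1$'
  'r .*'
)

lvm_filter_verdict() {
  local path=$1 entry action re
  for entry in "${FILTER[@]}"; do
    action=${entry%% *}   # text before the first space: a or r
    re=${entry#* }        # text after the first space: the regex
    if [[ $path =~ $re ]]; then
      if [[ $action == a ]]; then echo accept; else echo reject; fi
      return 0
    fi
  done
  echo accept             # no pattern matched: LVM accepts the device
}

for p in /dev/disk/by-id/scsi-360080480000290100601544032363831 /dev/sda1 /dev/sda2 /dev/sdbr; do
  echo "$p -> $(lvm_filter_verdict "$p")"
done
```

Because of the trailing "r|.*|", every path not explicitly accepted is rejected. That is why the local partition has to be listed before it.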

Once that is done, run lvmdiskscan so that LVM sees the new drives.

Another thing customers often ask is how to have SLES scan for new LUNs on the SAN without rebooting. With SLES 10 it’s as simple as writing a few values to the sysfs file system.

# echo 1 > /sys/class/fc_host/host<number>/issue_lip

That will make the kernel aware of the new devices at a very low level, but the devices are not yet usable. To make them usable do the following…

# echo "- - -" > /sys/class/scsi_host/host<number>/scan

That will scan all devices and add the new ones for you. All of this information is in the SLES 10 Storage Administration Guide, including various ways to recover from issues.
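The two steps can be wrapped in a loop over every Fibre Channel host. This sketch issues a LIP on each /sys/class/fc_host/host* entry. It then writes the "- - -" wildcard (all channels, targets and LUNs) to the matching scsi_host scan file. The function name and the SYSFS_ROOT override are hypothetical; the override only lets you try the sketch against a scratch directory.

```shell
#!/usr/bin/env bash
# rescan_fc_hosts: issue a LIP on every FC host, then scan all
# channels/targets/LUNs ("- - -") on the corresponding SCSI host.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
rescan_fc_hosts() {
  local root=${SYSFS_ROOT:-/sys} fc host
  for fc in "$root"/class/fc_host/host*; do
    [ -e "$fc/issue_lip" ] || continue   # glob matched nothing, or no LIP attribute
    host=${fc##*/}                       # e.g. "host3"
    echo 1 > "$fc/issue_lip"
    echo "- - -" > "$root/class/scsi_host/$host/scan"
    echo "rescanned $host"
  done
}

# As root on the real system:
# rescan_fc_hosts
```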

Also, since SP1, SLES 10 has been able to boot from an MPIO device on the SAN. The documentation for doing that in SP1 is located here.

Have fun and enjoy.


2 Responses to “ Fun with your SAN and Multi-path ”

Comments:

  1. kkhenson says:

    I would just add, for IBM DS8000, DS6000, ESS, or SVC 4.2 disk systems, you need to use the multipath.conf file located in a table here: http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000107&loc=en_US&cs=utf-8&lang=en

  2. bmcleod says:

    Daniel - Nice article, helped me with MPIO. I’m trying MPIO with EMC CX-300 / SLES 10 SP2 and Xen. Thanks -Bruce

    Think you have one typo:
    # service boot.multipth start
    should be
    # service boot.multipath start

 

How to setup - use multipathing on SLES

Information

Preamble: The procedure described within this article is only supported on SLES9 SP2 level and higher. Earlier releases may not work as expected.

1. Introduction

The Multipath IO (MPIO) support in SLES9 (SP2) is based on the Device Mapper (DM) multipath module of the Linux kernel, and the multipath-tools user-space package. These have been enhanced and integrated into SLES9 SP2 by SUSE Development.

DM MPIO is the preferred form of MPIO on SLES9 and the only option completely supported by Novell/SUSE.

DM MPIO features automatic configuration of the MPIO subsystem for a large variety of setups. Active/passive or active/active (with round-robin load balancing) configurations of up to 8 paths to each device are supported.

The framework is extensible, either via specific hardware handlers (see below) or via more sophisticated load balancing algorithms than round-robin.

The user-space component takes care of automatic path discovery and grouping, as well as automated path retesting, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes, if not obviates, the need for administrator attention in a production environment.

2. Supported configurations

3. Installation notes

4. Using the MPIO devices

Recommended Links

Novell Doc SLES 10 Storage Administration Guide - SLES 10 Storage Administration Guide

This guide provides information about how to manage storage devices on a SUSE® Linux Enterprise Server 10 Support Pack 2 server with an emphasis on using the Enterprise Volume Management System (EVMS) 2.5.5 or later to manage devices. Related storage administration issues are also covered as noted below.

Multipath I-O - Wikipedia, the free encyclopedia

Linux Multipath Howto (RHEL 4) - SWiK

Storage- Linux Native Multipathing (Device Mapper-Multipath)

The Linux multipath implementation

Linux Multipath Howto (RHAS4)

  1. up2date device-mapper-multipath
  2. Edit /etc/multipath.conf
    For detailed information, see: "SAN Persistent Binding and Multipathing in the 2.6 Kernel"
  3. modprobe dm-multipath
  4. modprobe dm-round-robin
  5. service multipathd start
  6. multipath -v2
    This shows the multipath LUNs and groups. Look for the multipath group number; this is the dm-# listed in /proc/partitions. The multipath LUN is accessed via the /dev/dm-# device entry.
  7. Partition each SCSI device:
    • sfdisk /dev/sdX
  8. (Optional) Create multipath devices for each partition:
    (not needed if using LVM, since you will just mount the logical volume device)
    • kpartx -a /dev/dm-#
  9. Enable multipath to start on bootup:
    • chkconfig multipathd on
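Step 6 says to look for the dm-# entries in /proc/partitions. Here is a small sketch that lists them. It assumes the usual /proc/partitions layout: a header line, then columns for major, minor, #blocks (1 KiB units) and name. The function name and its optional file argument are hypothetical.

```shell
#!/usr/bin/env bash
# list_dm_devices [FILE]: print the dm-# device-mapper nodes listed in
# FILE (default /proc/partitions) with their size in GiB.
# /proc/partitions columns: major minor #blocks name (#blocks is in 1 KiB units).
list_dm_devices() {
  awk '$4 ~ /^dm-[0-9]+$/ { printf "/dev/%s %.1f GiB\n", $4, $3 / 1048576 }' \
    "${1:-/proc/partitions}"
}

# On the real system:
# list_dm_devices
```

The header line and any sd* paths are skipped because only names of the form dm-<number> match.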

Configuring Linux to enable Multipath I/O

Linux multipath IO (MPIO) using multipath-tools | Calivia



Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Disclaimer:

Created May 16, 1996; Last modified: August 15, 2009