Device Mapper Multipath
Overview
The device-mapper-multipath subsystem (a sub-component of device-mapper) is the native way of combining 2 or more individual paths to the same storage LUN into one device, typically for HA (failover). If one underlying path fails, the system transfers I/O to another path; higher-level layers (such as LVM) use the single multipath pseudo device and are abstracted from the underlying physical links.
Initial Setup
A standard setup requires 2 RPMs, which provide the multipathd service and the udev rules for naming the multipaths:
- device-mapper
- device-mapper-multipath
For a Dell DAS such as the MD32xx, 2 more packages are required, typically from the vendor install media:
- dkms (Dynamic Kernel Module Support, the framework required for the RPM below)
- scsi_dh_rdac (Dell's custom version; the kernel also contains one)
The multipathd service is what pulls it all together: it monitors the paths, marks failed ones and reinstates them when they recover.
DAS Config
A well-formed config for a deployed Dell MD32xx DAS might look like:
# DAS /etc/multipath.conf
blacklist {
    device {
        vendor "*"
        product "Universal Xport"
    }
    device {
        vendor "*"
        product "MD3000"
    }
    device {
        vendor "*"
        product "MD3000i"
    }
    device {
        vendor "*"
        product "Virtual Disk"
    }
    device {
        vendor "*"
        product "PERC|Perc"
    }
}
defaults {
    user_friendly_names yes
    max_fds 8192
    polling_interval 5
}
devices {
    device {
        vendor "DELL"
        product "MD32xxi"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD32xx"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD36xxi"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD36xxf"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
}
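The four Dell device stanzas above differ only in their product string, so they can be generated instead of copied by hand. The following is a minimal sketch of that idea; the function names md_device_stanza and md_devices_section are made up for illustration, and the settings are simply the ones from the config above.

```shell
# Hypothetical helper: print one device entry for the given product string,
# with the same settings as the Dell stanzas above.
md_device_stanza() {
    cat <<EOF
    device {
        vendor "DELL"
        product "$1"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
EOF
}

# Hypothetical helper: print a complete devices section for the four
# product strings used above.
md_devices_section() {
    echo "devices {"
    for product in MD32xxi MD32xx MD36xxi MD36xxf; do
        md_device_stanza "$product"
    done
    echo "}"
}
```

Running md_devices_section prints the section to stdout, so it can be reviewed before anything is pasted into /etc/multipath.conf.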
NAS iSCSI Config
An example config for a NetApp NAS using iSCSI might look like:
# NAS iSCSI /etc/multipath.conf
blacklist {
    device {
        vendor "*"
        product "PERC|Perc"
    }
    device {
        vendor "*"
        product "Universal Xport"
    }
    device {
        vendor "*"
        product "Virtual Disk"
    }
}
defaults {
    user_friendly_names yes
    max_fds max
    queue_without_daemon no
}
devices {
    device {
        vendor "NETAPP"
        product "LUN"
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        #
        # RHEL5 style
        prio_callout "/sbin/mpath_prio_ontap /dev/%n"
        # RHEL6 style
        # prio ontap
        #
        features "1 queue_if_no_path"
        hardware_handler "0"
        path_grouping_policy group_by_prio
        failback immediate
        rr_weight uniform
        rr_min_io 128
        path_checker directio
        flush_on_last_del yes
    }
}
Multipath Names
By default on RHEL/CentOS, multipath devices appear in /dev/mapper/ with names that begin with "mpath" followed by a number (v5) or a letter (v6). A partition on a multipath then gets "p" followed by its number. The names are controlled by udev and a rules file installed by the device-mapper-multipath RPM; on RHEL6/CentOS6, for example, it is /lib/udev/rules.d/40-multipath.rules.
Examples:
/dev/mapper/mpath1p2 - 2nd partition on multipath #1 (1) (v5)
/dev/mapper/mpathbp1 - 1st partition on multipath #2 (b) (v6)
These names are a human-friendly stand-in for the WWID, enabled by the setting user_friendly_names yes in the config file. They can be changed to suit your needs; it's easy and can save a lot of confusion later if a dozen LUNs are used as RAW devices (such as in an Oracle RAC).
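To illustrate the naming convention, here is a small sketch that splits a partition node name into its multipath name and partition number. The function name split_mpath_part is hypothetical, and it assumes partitions are always named with "p" plus a number as described above.

```shell
# Hypothetical helper: split "mpathbp1" into "mpathb 1".
# Assumes partition nodes are named <multipath>p<number>.
# A name without a trailing p<number> is printed unchanged.
split_mpath_part() {
    name=${1##*/}                       # drop any /dev/mapper/ prefix
    part=${name##*p}                    # text after the last 'p'
    case "$part" in
        ''|*[!0-9]*) echo "$name" ;;    # not a partition node
        *) echo "${name%p*} $part" ;;   # multipath name, partition number
    esac
}
```

For example, split_mpath_part /dev/mapper/mpath1p2 prints "mpath1 2". The name alone is ambiguous for an alias that itself ends in "p" plus digits, so treat this as a convenience rather than a reliable parser.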
Administering Multipaths
The main tool for administering multipaths is called multipath and is normally found in /sbin/ (root only). The primary use day-to-day will be the -l or -ll flags to simply list multipaths and their associated 'real' SCSI devices (paths). Using this tool you can examine the health of the (multi)paths and all associated information.
Example:
## DAS multipath
# multipath -l
[...]
VOTING5 (3690b11c0001b99ba0000098f5192345e) dm-5 DELL,MD32xx
size=1.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| `- 2:0:0:4 sdw 65:96 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 1:0:0:4 sdf 8:80 active undef running
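When many LUNs are listed, it can help to boil the listing above down to one line per path. The sketch below does that with awk; the function name list_paths is hypothetical, and it assumes the output layout shown above (a "name (wwid) dm-N" header line, then path lines containing an H:C:T:L address followed by the sd device).

```shell
# Hypothetical helper: read "multipath -l" or "multipath -ll" style output on
# stdin and print "<multipath> <sd device> <path state>" for each path.
list_paths() {
    awk '
        # Map header, e.g. "VOTING5 (3690b...) dm-5 DELL,MD32xx"
        $2 ~ /^\(.*\)$/ && $3 ~ /^dm-[0-9]+$/ { map = $1; next }
        # Path line: locate the H:C:T:L field; the device name follows it
        {
            for (i = 1; i + 3 <= NF; i++)
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                    print map, $(i + 1), $(i + 3)
                    break
                }
        }
    '
}
```

Typical use would be multipath -ll | list_paths, run as root. Header lines printed without a separate name (user_friendly_names off and no alias) are not recognized by this sketch.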
The system knows which SCSI devices belong together by the WWID (aka WWN, UUID) that the storage host presents for each one - if the WWIDs match, the devices belong together in a multipath. For the example LUN above, the -v3 flag shows that they match:
## DAS WWIDs (WWN/UUID)
# multipath -v3
[...]
uuid hcil dev dev_t pri dm_st chk_st vend/p
3690b11c0001b99ba0000098f5192345e 2:0:0:4 sdw 65:96 14 undef ready DELL,M
3690b11c0001b99ba0000098f5192345e 1:0:0:4 sdf 8:80 9 undef ready DELL,M
The WWID 3690b11c0001b99ba0000098f5192345e matches on both SCSI devices, so now the multipath daemon knows they belong together and creates a pseudo device for us to work with. If one underlying path (device) fails, it goes over to the other one without any manual intervention. Magic.
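The matching described above can be reproduced by hand from the path table. The following sketch groups devices by WWID; the function name group_by_wwid is hypothetical, and it assumes the column order shown above (WWID first, H:C:T:L second, device third).

```shell
# Hypothetical helper: read the path table from "multipath -v3" on stdin and
# print each WWID followed by the devices that present it.
group_by_wwid() {
    awk '
        # Only rows whose second column is an H:C:T:L address are paths
        $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
            if ($1 in devs) {
                devs[$1] = devs[$1] " " $3
            } else {
                order[++n] = $1          # remember first-seen order
                devs[$1] = $3
            }
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i] ": " devs[order[i]]
        }
    '
}
```

A WWID that ends up with a single device here has only one visible path, which is worth investigating before relying on failover.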
The multipath tool has other uses, such as the -f/-F flags (flush unused multipath maps) and -p (force a path grouping policy) -- be careful using these on a live server. Check the man page for detailed information, and know there is a -d (dry run) option to test things before committing. Depending on what you're doing (such as renaming - see below), it's sometimes easier to restart the multipathd daemon instead.
Partitioning Multipaths
The tool kpartx is what an administrator will use to have the kernel re-examine newly partitioned multipaths and create new device entries for us; it's the equivalent of using partx on normal devices.
## Normal SCSI device
# parted /dev/sdb (create new partition 1)
# partx -a /dev/sdb
# ls -1 /dev/sdb*
/dev/sdb
/dev/sdb1
## Multipath device
# parted /dev/mapper/mpathb (create new partition 1)
# kpartx -a /dev/mapper/mpathb
# ls -1 /dev/mapper/mpathb*
/dev/mapper/mpathb
/dev/mapper/mpathbp1
The device /dev/mapper/mpathbp1 is now used just like /dev/sdb1 would be for any other tools (mkfs, pvcreate, vgextend, etc.) -- the multipath daemon takes care of routing the actual SCSI commands out to the active device (path) in the multipath to storage.
Clustered Multipaths
Using the WWIDs as described above, you can verify that a host group of LUNs presented to 2 or more servers maps to the same multipaths on each of them. The mapping of a WWID to a multipath name on one node must match on all other nodes, otherwise you're writing to different storage areas on different nodes. If your examination finds they do not match you may need to rename them manually - see below.
Always double-check the WWID to multipath mappings match on all nodes in a cluster! This may not be quick but it's extremely important the time be spent doing this work. Never assume it's "just right" on a new build.
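The check can be partly automated. The sketch below assumes you have saved the output of multipath -ll from each node into a file; the function names wwid_table and compare_nodes and the file arguments are hypothetical. Only the WWID to name mapping is compared, since dm-N numbers may legitimately differ between nodes.

```shell
# Hypothetical helper: reduce "multipath -ll" output (stdin) to sorted
# "<wwid> <name>" lines, taken from the "name (wwid) dm-N" header lines.
wwid_table() {
    awk '$2 ~ /^\(.*\)$/ && $3 ~ /^dm-[0-9]+$/ {
        gsub(/[()]/, "", $2)
        print $2, $1
    }' | sort
}

# Hypothetical helper: compare saved listings from two nodes.
# Prints the differing mappings (diff format) and returns non-zero on mismatch.
compare_nodes() {
    t1=$(mktemp)
    t2=$(mktemp)
    wwid_table < "$1" > "$t1"
    wwid_table < "$2" > "$t2"
    diff "$t1" "$t2"
    rc=$?
    rm -f "$t1" "$t2"
    return $rc
}
```

For instance, compare_nodes node1.txt node2.txt stays silent and returns 0 when both nodes agree. It does not replace a manual look at the result on a new build.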
Renaming Multipaths
Renaming them is easy - add a multipaths stanza to the bottom of multipath.conf containing one multipath block per device to rename. The setting user_friendly_names yes is required in multipath.conf for this to work as expected. For example, here is a rename of a shared Oracle RAC voting LUN from its auto-generated name to something that makes sense for use inside Oracle as a RAW device:
multipaths {
    multipath {
        wwid 3690b11c0001b99ba0000098f5192345e
        alias VOTING5
    }
}
Restart the multipathd service and the multipath is now named like so:
# ls -1 /dev/mapper/VOTING5
/dev/mapper/VOTING5
The partitions within a renamed multipath follow the same convention, 'p' followed by a number. You would expect names like /dev/mapper/VOTING5p1, /dev/mapper/VOTING5p2, etc. if you partitioned this LUN for use as a normal filesystem.
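With a dozen LUNs, the multipaths stanza described in this section can be generated from a list rather than typed by hand. This is a minimal sketch; the function name make_multipaths_block is hypothetical, and the input format (one "wwid alias" pair per line) is an assumption made for this example.

```shell
# Hypothetical helper: read "<wwid> <alias>" pairs on stdin and print a
# multipaths block with one multipath entry per pair.
make_multipaths_block() {
    echo "multipaths {"
    while read -r wwid name; do
        # Skip blank or incomplete lines
        [ -n "$wwid" ] && [ -n "$name" ] || continue
        printf '    multipath {\n'
        printf '        wwid %s\n' "$wwid"
        printf '        alias %s\n' "$name"
        printf '    }\n'
    done
    echo "}"
}
```

Feeding it the line "3690b11c0001b99ba0000098f5192345e VOTING5" yields a block equivalent to the VOTING5 example in this section. Review the output before appending it to multipath.conf, and use the same list on every node of a cluster so the names stay consistent.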
Multipath Ownership
One of the other common desires is to set the UID, GID and mode on the multipaths; alas there's a different method for RHEL/CentOS v5 and v6.
RHEL5 / CentOS5
This is done in the same multipaths block used for renaming, like so:
multipaths {
    multipath {
        wwid 3690b11c0001b99ba0000098f5192345e
        alias VOTING5
        uid 503
        gid 503
        mode 755
    }
}
Note that the system requires the numerical UID/GID and octal mode as shown above.
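Since the numerical IDs are required, they can be looked up with the standard id command instead of being read out of /etc/passwd by eye. A small sketch, where the function name owner_lines is hypothetical and the user's primary group is assumed to be the group you want:

```shell
# Hypothetical helper: print the uid and gid lines for a multipath entry,
# resolved from a user name. Uses the user's primary group.
owner_lines() {
    uid=$(id -u "$1") || return 1
    gid=$(id -g "$1") || return 1
    printf 'uid %s\ngid %s\n' "$uid" "$gid"
}
```

Calling owner_lines oracle on a node where that account exists prints the two lines ready to paste into the stanza. The IDs should be checked on every node of a cluster, because the same user name can map to different numbers on different hosts.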
RHEL6 / CentOS6
The above method was deprecated in RHEL6 in favor of udev rules. Red Hat's article on how to set it up is a wee bit lacking; use a ruleset like this instead of their official doc:
/etc/udev/rules.d/12-dm-permissions.rules
ENV{DM_NAME}=="VOTING5", OWNER:="oracle", GROUP:="oinstall", MODE:="660"
This builds on the rename outlined above. To get the DM_NAME value of the multipath you want to match, use the udevadm tool to query the raw device-mapper node.
- Get the raw node-name with a simple ls:
# ls -l /dev/mapper/VOTING5
lrwxrwxrwx 1 root root 7 May 30 22:41 /dev/mapper/VOTING5 -> ../dm-5
- Use that dm-?? number against the sysfs interface for it:
# udevadm info --query=all --path=/devices/virtual/block/dm-5/
P: /devices/virtual/block/dm-5
N: dm-5
S: mapper/VOTING5
S: disk/by-id/dm-name-VOTING5
S: disk/by-id/dm-uuid-mpath-3690b11c0001b99ba0000098f5192345e
S: block/253:5
E: UDEV_LOG=3
E: DEVPATH=/devices/virtual/block/dm-5
E: MAJOR=253
E: MINOR=5
E: DEVNAME=/dev/dm-5
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: DM_SBIN_PATH=/sbin
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_NAME=VOTING5
E: DM_UUID=mpath-3690b11c0001b99ba0000098f5192345e
E: DM_SUSPENDED=0
E: MPATH_SBIN_PATH=/sbin
E: DEVLINKS=/dev/mapper/VOTING5 /dev/disk/by-id/dm-name-VOTING5 /dev/disk/by-id/dm-uuid-mpath-3690b11c0001b99ba0000098f5192345e /dev/block/253:5
Any line that begins with "E: " can serve as the match clause in your udev rule; DM_NAME is the most obvious choice, but your situation may require one of the others.
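The lookup and the rule can be tied together with two small helpers. This is only a sketch: the function names udev_prop and dm_perm_rule are hypothetical, and the rule format is copied from the 12-dm-permissions.rules example above.

```shell
# Hypothetical helper: read "udevadm info --query=all" output on stdin and
# print the value of one "E:" property, e.g. DM_NAME.
udev_prop() {
    awk -v key="$1" -F= '
        $1 == ("E: " key) { print substr($0, length($1) + 2); exit }
    '
}

# Hypothetical helper: print a permissions rule for one multipath.
# Arguments: DM_NAME owner group mode
dm_perm_rule() {
    printf 'ENV{DM_NAME}=="%s", OWNER:="%s", GROUP:="%s", MODE:="%s"\n' \
        "$1" "$2" "$3" "$4"
}
```

For example, udevadm info --query=all --path=/devices/virtual/block/dm-5/ | udev_prop DM_NAME would print VOTING5 for the device shown above, and dm_perm_rule VOTING5 oracle oinstall 660 prints the rule line from the example.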
References
- http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/DM_Multipath/index.html#multipath_consistent_names
- http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/DM_Multipath/index.html#config_file_defaults
- http://technologist.pro/storage/multipathing-netapp-lun-on-rhel-5-3
- https://github.com/torvalds/linux/blob/master/drivers/scsi/device_handler/scsi_dh_rdac.c