# Device Mapper Multipath

## Contents

- [Overview](#overview)
- [Initial Setup](#initial-setup)
  - [DAS Config](#das-config)
  - [NAS iSCSI Config](#nas-iscsi-config)
- [Multipath Names](#multipath-names)
- [Administrating Multipaths](#administrating-multipaths)
  - [Partitioning Multipaths](#partitioning-multipaths)
  - [Clustered Multipaths](#clustered-multipaths)
  - [Renaming Multipaths](#renaming-multipaths)
- [Multipath Ownership](#multipath-ownership)
  - [RHEL5 / CentOS5](#rhel5--centos5)
  - [RHEL6 / CentOS6](#rhel6--centos6)
- [References](#references)

## Overview

The `device-mapper-multipath` subsystem (a sub-component of `device-mapper`) is the native way to combine two or more individual paths to the same storage LUN, typically for high availability (failover). If one underlying path fails, the system transfers I/O to another path; higher-level layers (such as LVM) use the single multipath pseudo device and are abstracted from the underlying physical links.

## Initial Setup

A standard setup requires two RPMs, which provide the `multipathd` service and the udev rules for naming the multipaths:

1. device-mapper
2. device-mapper-multipath

For a Dell DAS such as the MD32xx, two more packages are required, typically from the vendor install media:

1. dkms (Dynamic Kernel Module Support - framework required for the RPM below)
2. scsi\_dh\_rdac (Dell custom version, the [kernel also contains one](https://github.com/torvalds/linux/blob/master/drivers/scsi/device_handler/scsi_dh_rdac.c))

The `multipathd` service is what pulls it all together.

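A quick pre-flight check can confirm the packages are present before the service is started. The sketch below assumes `rpm -q` reports an absent package as `package NAME is not installed`; the helper name `missing_rpms` is made up for illustration.

```shell
# Print the names of packages that `rpm -q` reported as not installed.
# Reads `rpm -q` output on stdin and relies on the
# "package NAME is not installed" message format (an assumption).
missing_rpms() {
    awk '/^package .* is not installed$/ { print $2 }'
}

# Example usage on RHEL5/6 (not run here):
#   rpm -q device-mapper device-mapper-multipath | missing_rpms
#   chkconfig multipathd on
#   service multipathd start
```

No output from `missing_rpms` means every queried package was found.
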
### DAS Config

A well-formed config for a deployed Dell MD32xx DAS might look like:

```
# DAS /etc/multipath.conf

blacklist {
    device {
        vendor "*"
        product "Universal Xport"
    }
    device {
        vendor "*"
        product "MD3000"
    }
    device {
        vendor "*"
        product "MD3000i"
    }
    device {
        vendor "*"
        product "Virtual Disk"
    }
    device {
        vendor "*"
        product "PERC|Perc"
    }
}
defaults {
    user_friendly_names yes
    max_fds 8192
    polling_interval 5
}
devices {
    device {
        vendor "DELL"
        product "MD32xxi"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD32xx"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD36xxi"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD36xxf"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
}
```

### NAS iSCSI Config

An example config for a NetApp NAS using iSCSI might look like:

```
# NAS iSCSI /etc/multipath.conf

blacklist {
    device {
        vendor "*"
        product "PERC|Perc"
    }
    device {
        vendor "*"
        product "Universal Xport"
    }
    device {
        vendor "*"
        product "Virtual Disk"
    }
}

defaults {
    user_friendly_names yes
    max_fds max
    queue_without_daemon no
}

devices {
    device {
        vendor "NETAPP"
        product "LUN"
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        #
        # RHEL5 style
        prio_callout "/sbin/mpath_prio_ontap /dev/%n"
        # RHEL6 style
        # prio ontap
        #
        features "1 queue_if_no_path"
        hardware_handler "0"
        path_grouping_policy group_by_prio
        failback immediate
        rr_weight uniform
        rr_min_io 128
        path_checker directio
        flush_on_last_del yes
    }
}
```

## Multipath Names

By default in RHEL/CentOS, multipath names live in `/dev/mapper/`, begin with "mpath" and are followed by a number (RHEL5) or a letter (RHEL6). A partition within that multipath then has "p" followed by its number. These names are controlled by `udev` and a config file installed by the `device-mapper-multipath` RPM; for example, on RHEL6/CentOS6 it's named `/lib/udev/rules.d/40-multipath.rules`.

Examples:

```
/dev/mapper/mpath1p2 - 2nd partition on path #1 (1) (v5)
/dev/mapper/mpathbp1 - 1st partition on path #2 (b) (v6)
```

These are a human-friendly form of the WWID, triggered by the setting `user_friendly_names yes` in the config file. They can be changed to suit your needs - it's easy and can save a lot of confusion later if a dozen LUNs are used as RAW devices (such as in an Oracle RAC).

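The naming convention above can be taken apart mechanically when scripting against `/dev/mapper/`. The following is a minimal sketch, assuming the "p" plus number partition suffix described here; the function name `split_mpath_name` is hypothetical.

```shell
# Split a /dev/mapper name into "<multipath> <partition-number>".
# A name without a partition suffix is printed unchanged.
# Caveat: an alias that itself ends in "p<digits>" (e.g. "backup2")
# cannot be told apart from a partition and would be split as well.
split_mpath_name() {
    printf '%s\n' "${1##*/}" |
        sed -E 's/^(.+)p([0-9]+)$/\1 \2/'
}

# Example usage:
#   split_mpath_name /dev/mapper/mpath1p2    # RHEL5-style name
#   split_mpath_name /dev/mapper/mpathbp1    # RHEL6-style name
```
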
## Administrating Multipaths

The main tool for administering multipaths is called `multipath` and is normally found in `/sbin/` (root only). The primary day-to-day use is the `-l` or `-ll` flag, which lists the multipaths and their associated 'real' SCSI devices (paths). Using this tool you can examine the health of the (multi)paths and all associated information.

Example:

```
## DAS multipath
# multipath -l
[...]
VOTING5 (3690b11c0001b99ba0000098f5192345e) dm-5 DELL,MD32xx
size=1.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| `- 2:0:0:4 sdw 65:96 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 1:0:0:4 sdf 8:80  active undef running
```

The system knows which SCSI devices belong together by the WWID (aka WWN, UUID) presented by the storage host - if the WWIDs match, the devices belong together in a multipath. For the example LUN above, the `-v3` flag shows that they match:

```
## DAS WWIDs (WWN/UUID)
# multipath -v3
[...]
uuid                              hcil    dev dev_t pri dm_st chk_st vend/p
3690b11c0001b99ba0000098f5192345e 2:0:0:4 sdw 65:96 14  undef ready  DELL,M
3690b11c0001b99ba0000098f5192345e 1:0:0:4 sdf 8:80  9   undef ready  DELL,M
```

The WWID 3690b11c0001b99ba0000098f5192345e matches on both SCSI devices, so the multipath daemon knows they belong together and creates a pseudo device for us to work with. If one underlying path (device) fails, I/O goes over to the other one without any manual intervention. Magic.

There are other uses of the multipath tool, such as the `-f`/`-F` flags (flush paths) and `-p` (change policies) -- be careful using these on a live server. Check the man page for detailed information, and note there is a `-d` (dry run) option to test things before committing. Depending on what you're doing (such as renaming - see below), it's sometimes easier to restart the `multipathd` daemon instead.

### Partitioning Multipaths

The tool `kpartx` is what an administrator uses to have the kernel re-examine newly partitioned multipaths and create the new device entries; it's the equivalent of using `partx` on normal devices.

```
## Normal SCSI device

# parted /dev/sdb (create new partition 1)
# partx -a /dev/sdb
# ls -1 /dev/sdb*
/dev/sdb
/dev/sdb1

## Multipath device

# parted /dev/mapper/mpathb (create new partition 1)
# kpartx -a /dev/mapper/mpathb
# ls -1 /dev/mapper/mpathb*
/dev/mapper/mpathb
/dev/mapper/mpathbp1
```

The device `/dev/mapper/mpathbp1` is now used just like `/dev/sdb1` would be by any other tool (mkfs, pvcreate, vgextend, etc.) -- the multipath layer takes care of routing the actual SCSI commands out to the active device (path) in the multipath to storage.

### Clustered Multipaths

When a host group of LUNs is presented to two or more servers, use the WWIDs as described above to ensure each LUN maps to the same multipath name on every server. **The mapping of a WWID to multipath on one node must match on all other nodes**, otherwise you're writing to different storage areas on different nodes. If your examination finds they do not match, you may need to rename them manually - see below.

> Always double-check the WWID to multipath mappings match on all nodes in a cluster! This may not be quick but it's extremely important the time be spent doing this work. Never assume it's "just right" on a new build.

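One way to make that cross-node check less tedious is to reduce each node's `multipath -l` output to sorted "WWID name" pairs and diff them. The sketch below assumes header lines shaped like `NAME (WWID) dm-N VENDOR,PRODUCT` as in the example earlier; the function name `wwid_map` and the host names `node1`/`node2` are hypothetical.

```shell
# Reduce `multipath -l` output (stdin) to sorted "<wwid> <name>" pairs.
# Only header lines of the form "NAME (WWID) dm-N ..." are considered;
# multipaths listed without a friendly name or alias are skipped.
wwid_map() {
    awk '$2 ~ /^\(.*\)$/ && $3 ~ /^dm-[0-9]+$/ {
             gsub(/[()]/, "", $2)
             print $2, $1
         }' | sort
}

# Example usage (not run here):
#   ssh node1 multipath -l | wwid_map > node1.map
#   ssh node2 multipath -l | wwid_map > node2.map
#   diff node1.map node2.map && echo "mappings match"
```

Any line `diff` prints is a WWID whose name differs between the two nodes, or a LUN that only one node can see.
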
### Renaming Multipaths

Renaming them is easy - add a `multipaths` stanza to the bottom of multipath.conf containing one `multipath` block per LUN, each mapping a WWID to an alias. The setting `user_friendly_names yes` is required in multipath.conf for this to work as expected. For example, here is a rename of a shared Oracle RAC voting LUN from the spurious name into something that makes sense for use inside Oracle as a RAW device:

```
multipaths {
    multipath {
        wwid 3690b11c0001b99ba0000098f5192345e
        alias VOTING5
    }
}
```

Restart the `multipathd` service and the multipath is now named like so:

```
# ls -1 /dev/mapper/VOTING5
/dev/mapper/VOTING5
```

The partitions within a renamed multipath follow the same convention, 'p' followed by a number. You would expect names like `/dev/mapper/VOTING5p1`, `/dev/mapper/VOTING5p2`, etc. if you partitioned this LUN for use as a normal filesystem.

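With a dozen LUNs, writing the stanza by hand gets error-prone. As a sketch, the block can be generated from a list of "WWID alias" pairs; the function name `make_multipaths_block` and the input file `luns.txt` are assumptions for illustration, and the output should be reviewed before it goes into multipath.conf.

```shell
# Emit a "multipaths { ... }" stanza from "<wwid> <alias>" pairs on stdin.
# Blank lines and lines starting with "#" are skipped.
make_multipaths_block() {
    echo 'multipaths {'
    while read -r wwid name; do
        case "$wwid" in
            ''|'#'*) continue ;;
        esac
        printf '    multipath {\n'
        printf '        wwid %s\n' "$wwid"
        printf '        alias %s\n' "$name"
        printf '    }\n'
    done
    echo '}'
}

# Example usage (not run here):
#   make_multipaths_block < luns.txt
#   service multipathd restart    # after adding the output to multipath.conf
```
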
## Multipath Ownership

Another common need is to set the UID, GID and mode on the multipaths; alas, the method differs between RHEL/CentOS v5 and v6.

### RHEL5 / CentOS5

This is done in the same `multipath` block used for renaming, like so:

```
multipaths {
    multipath {
        wwid 3690b11c0001b99ba0000098f5192345e
        alias VOTING5
        uid 503
        gid 503
        mode 755
    }
}
```

Note that the system requires the numerical UID/GID and octal mode as shown above.

### RHEL6 / CentOS6

The above method was deprecated in RHEL6 in favor of udev rules - Red Hat's article on how to set it up is a wee bit lacking; use a ruleset like this instead of their official doc:

```
# /etc/udev/rules.d/12-dm-permissions.rules

ENV{DM_NAME}=="VOTING5", OWNER:="oracle", GROUP:="oinstall", MODE:="660"
```

This builds on the rename outlined above; to get the DM\_NAME value of the device you want to match, use the `udevadm` tool to query the raw device-mapper node.

- Get the raw node name with a simple ls:

```
# ls -l /dev/mapper/VOTING5
lrwxrwxrwx 1 root root 7 May 30 22:41 /dev/mapper/VOTING5 -> ../dm-5
```

- Use that dm-?? number against the sysfs interface for it:

```
# udevadm info --query=all --path=/devices/virtual/block/dm-5/
P: /devices/virtual/block/dm-5
N: dm-5
S: mapper/VOTING5
S: disk/by-id/dm-name-VOTING5
S: disk/by-id/dm-uuid-mpath-3690b11c0001b99ba0000098f5192345e
S: block/253:5
E: UDEV_LOG=3
E: DEVPATH=/devices/virtual/block/dm-5
E: MAJOR=253
E: MINOR=5
E: DEVNAME=/dev/dm-5
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: DM_SBIN_PATH=/sbin
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_NAME=VOTING5
E: DM_UUID=mpath-3690b11c0001b99ba0000098f5192345e
E: DM_SUSPENDED=0
E: MPATH_SBIN_PATH=/sbin
E: DEVLINKS=/dev/mapper/VOTING5 /dev/disk/by-id/dm-name-VOTING5 /dev/disk/by-id/dm-uuid-mpath-3690b11c0001b99ba0000098f5192345e /dev/block/253:5
```

Use any line item that begins with "E: " as the match clause in your udev rule; DM\_NAME is the most obvious choice, but your situation may require one of the others.

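The two steps above (pull DM\_NAME out of the `udevadm` output, then write the rule) can be scripted. This is a minimal sketch that follows the rule format shown in this section; the function names are hypothetical, and the reload commands in the usage comments are the usual `udevadm` subcommands, which you should confirm against your release.

```shell
# Extract the DM_NAME value from `udevadm info --query=all` output (stdin).
dm_name_from_udevadm() {
    sed -n 's/^E: DM_NAME=//p'
}

# Print a udev permissions rule for one device-mapper name.
# Arguments: <dm_name> <owner> <group> <mode>
dm_permission_rule() {
    printf 'ENV{DM_NAME}=="%s", OWNER:="%s", GROUP:="%s", MODE:="%s"\n' \
        "$1" "$2" "$3" "$4"
}

# Example usage (as root, not run here):
#   name=$(udevadm info --query=all --path=/devices/virtual/block/dm-5/ |
#          dm_name_from_udevadm)
#   dm_permission_rule "$name" oracle oinstall 660 \
#       >> /etc/udev/rules.d/12-dm-permissions.rules
#   udevadm control --reload-rules
#   udevadm trigger
```
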
## References

- <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/DM_Multipath/index.html#multipath_consistent_names>
- <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/DM_Multipath/index.html#config_file_defaults>
- <http://technologist.pro/storage/multipathing-netapp-lun-on-rhel-5-3>
- <https://github.com/torvalds/linux/blob/master/drivers/scsi/device_handler/scsi_dh_rdac.c>