initial import
This commit is contained in:
parent
e3e0eb7656
commit
e8fb7b288e
43 changed files with 14946 additions and 0 deletions
363
md/device_mapper_multipath.md
Normal file
@@ -0,0 +1,363 @@

# Device Mapper Multipath

## Contents

- [Overview](#overview)
- [Initial Setup](#initial-setup)
    - [DAS Config](#das-config)
    - [NAS iSCSI Config](#nas-iscsi-config)
- [Multipath Names](#multipath-names)
- [Administrating Multipaths](#administrating-multipaths)
    - [Partitioning Multipaths](#partitioning-multipaths)
    - [Clustered Multipaths](#clustered-multipaths)
    - [Renaming Multipaths](#renaming-multipaths)
- [Multipath Ownership](#multipath-ownership)
    - [RHEL5 / CentOS5](#rhel5--centos5)
    - [RHEL6 / CentOS6](#rhel6--centos6)
- [References](#references)


## Overview

The `device-mapper-multipath` subsystem (a sub-component of `device-mapper`) is the native way to combine two or more individual paths to the same storage LUN, typically for high availability (failover). If one underlying path fails, the system transfers I/O to another path; higher-level layers (such as LVM) use the single multipath pseudo device and are abstracted from the underlying physical links.


## Initial Setup

A standard setup requires two RPMs, which provide the `multipathd` service and the udev rules for naming the multipaths:

1. device-mapper
2. device-mapper-multipath

For a Dell DAS such as the MD32xx, two more packages are required, typically from the vendor install media:

1. dkms (Dynamic Kernel Module Support - framework required for the below RPM)
2. scsi\_dh\_rdac (Dell custom version, the [kernel also contains one](https://github.com/torvalds/linux/blob/master/drivers/scsi/device_handler/scsi_dh_rdac.c))

The `multipathd` service is what pulls it all together.


### DAS Config

A well-formed config for a deployed Dell MD32xx DAS might look like:

```
# DAS /etc/multipath.conf

blacklist {
    device {
        vendor "*"
        product "Universal Xport"
    }
    device {
        vendor "*"
        product "MD3000"
    }
    device {
        vendor "*"
        product "MD3000i"
    }
    device {
        vendor "*"
        product "Virtual Disk"
    }
    device {
        vendor "*"
        product "PERC|Perc"
    }
}
defaults {
    user_friendly_names yes
    max_fds 8192
    polling_interval 5
}
devices {
    device {
        vendor "DELL"
        product "MD32xxi"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD32xx"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD36xxi"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
    device {
        vendor "DELL"
        product "MD36xxf"
        path_grouping_policy group_by_prio
        prio rdac
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
}
```


### NAS iSCSI Config

An example config for NetApp NAS iSCSI LUNs might look like:

```
# NAS iSCSI /etc/multipath.conf

blacklist {
    device {
        vendor "*"
        product "PERC|Perc"
    }
    device {
        vendor "*"
        product "Universal Xport"
    }
    device {
        vendor "*"
        product "Virtual Disk"
    }
}

defaults {
    user_friendly_names yes
    max_fds max
    queue_without_daemon no
}

devices {
    device {
        vendor "NETAPP"
        product "LUN"
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        #
        # RHEL5 style
        prio_callout "/sbin/mpath_prio_ontap /dev/%n"
        # RHEL6 style
        # prio ontap
        #
        features "1 queue_if_no_path"
        hardware_handler "0"
        path_grouping_policy group_by_prio
        failback immediate
        rr_weight uniform
        rr_min_io 128
        path_checker directio
        flush_on_last_del yes
    }
}
```


## Multipath Names

By default in RHEL/CentOS, multipath devices appear in `/dev/mapper/` with names that begin with "mpath" followed by a number (v5) or a letter (v6). A partition within that multipath then has "p" followed by its number. The names are controlled by `udev` and a rules file installed by the `device-mapper-multipath` RPM; for example, on RHEL6/CentOS6 it's named `/lib/udev/rules.d/40-multipath.rules`.

Examples:

```
/dev/mapper/mpath1p2 - 2nd partition on path #1 (1) (v5)
/dev/mapper/mpathbp1 - 1st partition on path #2 (b) (v6)
```

These names are a human-friendly stand-in for the WWID, enabled by the setting `user_friendly_names yes` in the config file. They can be changed to suit your needs - it's easy and can save a lot of confusion later if a dozen LUNs are used as RAW devices (such as in an Oracle RAC).
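
As a small illustration of the naming convention, the sketch below builds the expected partition device path from a multipath name and a partition number. The function name `mpath_partition_path` is hypothetical, and it only assumes the "p" plus number convention described above for RHEL/CentOS v5 and v6.

```shell
#!/bin/sh
# Build the /dev/mapper path of a partition on a multipath device,
# following the "p<number>" convention (assumed from the text above).
mpath_partition_path() {
    # $1 = multipath name (e.g. mpathb or an alias), $2 = partition number
    printf '/dev/mapper/%sp%s\n' "$1" "$2"
}

mpath_partition_path mpath1 2   # RHEL5-style name
mpath_partition_path mpathb 1   # RHEL6-style name
```

The same helper works for renamed multipaths, since aliases follow the same partition suffix.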


## Administrating Multipaths

The main tool for administering multipaths is called `multipath` and is normally found in `/sbin/` (root only). Its primary day-to-day use is with the `-l` or `-ll` flags, which list multipaths and their associated 'real' SCSI devices (paths). Using this tool you can examine the health of the (multi)paths and all associated information.

Example:

```
## DAS multipath
# multipath -l
[...]
VOTING5 (3690b11c0001b99ba0000098f5192345e) dm-5 DELL,MD32xx
size=1.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| `- 2:0:0:4 sdw 65:96 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 1:0:0:4 sdf 8:80 active undef running
```

The system knows which SCSI devices belong together by the WWID (aka WWN, UUID) presented by the storage host - if the WWIDs match, the devices belong together in a multipath. For the example LUN above, the `-v3` flag shows that they match:

```
## DAS WWIDs (WWN/UUID)
# multipath -v3
[...]
uuid                              hcil    dev dev_t pri dm_st chk_st vend/p
3690b11c0001b99ba0000098f5192345e 2:0:0:4 sdw 65:96 14  undef ready  DELL,M
3690b11c0001b99ba0000098f5192345e 1:0:0:4 sdf 8:80  9   undef ready  DELL,M
```

The WWID 3690b11c0001b99ba0000098f5192345e matches on both SCSI devices, so the multipath daemon knows they belong together and creates a pseudo device for us to work with. If one underlying path (device) fails, I/O moves over to the other one without any manual intervention. Magic.
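
To make the grouping rule concrete, here is a minimal sketch that groups path devices by WWID from rows shaped like the `-v3` table above. The function name `group_paths_by_wwid` is hypothetical, and the column layout (WWID first, H:C:I:L second, device third) is assumed from the sample output; real `multipath -v3` output contains many other lines and may differ between versions.

```shell
#!/bin/sh
# Group SCSI path devices by WWID from "multipath -v3"-style path rows.
# Reads rows on stdin; assumes column 1 = WWID, column 2 = H:C:I:L,
# column 3 = device name (layout taken from the sample output).
group_paths_by_wwid() {
    awk '
        # Only rows whose second field looks like an H:C:I:L address count.
        $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
            paths[$1] = (paths[$1] == "" ? $3 : paths[$1] " " $3)
        }
        END {
            for (wwid in paths) print wwid ": " paths[wwid]
        }
    ' | sort
}

# Sample rows copied from the example output.
sample='uuid hcil dev dev_t pri dm_st chk_st vend/p
3690b11c0001b99ba0000098f5192345e 2:0:0:4 sdw 65:96 14 undef ready DELL,M
3690b11c0001b99ba0000098f5192345e 1:0:0:4 sdf 8:80 9 undef ready DELL,M'

printf '%s\n' "$sample" | group_paths_by_wwid
```

Each output line lists one WWID followed by the devices that share it; a WWID with only one device suggests a missing path.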

There are other uses of the `multipath` tool, such as the `-f`/`-F` flags (flush paths) and `-p` (change policies) -- be careful using these on a live server. Check the man page for detailed information, and know there is a `-d` (dry run) option to test things before committing. Depending on what you're doing (such as renaming - see below), it's sometimes easier to restart the `multipathd` daemon instead.


### Partitioning Multipaths

The tool `kpartx` is what an administrator uses to have the kernel re-examine newly partitioned multipaths and create new device entries for us; it's the equivalent of using `partx` on normal devices.

```
## Normal SCSI device

# parted /dev/sdb (create new partition 1)
# partx -a /dev/sdb
# ls -1 /dev/sdb*
/dev/sdb
/dev/sdb1

## Multipath device

# parted /dev/mapper/mpathb (create new partition 1)
# kpartx -a /dev/mapper/mpathb
# ls -1 /dev/mapper/mpathb*
/dev/mapper/mpathb
/dev/mapper/mpathbp1
```

The device /dev/mapper/mpathbp1 is now used just like /dev/sdb1 would be by any other tools (mkfs, pvcreate, vgextend, etc.) -- the multipath layer takes care of routing the actual SCSI commands out to the active device (path) in the multipath to storage.


### Clustered Multipaths

Using the WWIDs as described above lets you verify that, when a host group of LUNs is presented to two or more servers, the multipath names match across them. **The mapping of a WWID to multipath on one node must match on all other nodes**, otherwise you're writing to different storage areas on different nodes. If your examination finds they do not match, you may need to rename them manually - see below.

> Always double-check that the WWID to multipath mappings match on all nodes in a cluster! This may not be quick, but it's extremely important that the time be spent doing this work. Never assume it's "just right" on a new build.

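One way to make the cross-node check less error-prone is to reduce each node's listing to sorted "WWID name" pairs and compare the results. The sketch below is only an illustration: `wwid_map` is a hypothetical helper, and it assumes map header lines of the form `NAME (WWID) dm-N VENDOR,PRODUCT` as in the `multipath -l` example earlier (the form shown when friendly names or aliases are in use).

```shell
#!/bin/sh
# Print "WWID NAME" pairs, sorted by WWID, from "multipath -l" output on stdin.
# Assumes header lines shaped like: NAME (WWID) dm-N VENDOR,PRODUCT
wwid_map() {
    awk '
        $2 ~ /^\([0-9a-f]+\)$/ && $3 ~ /^dm-[0-9]+$/ {
            gsub(/[()]/, "", $2)   # strip the parentheses around the WWID
            print $2, $1
        }
    ' | sort
}

# Sample header line copied from the earlier example.
printf '%s\n' 'VOTING5 (3690b11c0001b99ba0000098f5192345e) dm-5 DELL,MD32xx' | wwid_map
```

On a live cluster one might run `multipath -l | wwid_map` on each node, save the output to a file per node, and `diff` the files; any difference means a WWID is mapped to a different name somewhere.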
### Renaming Multipaths

Renaming them is easy - add a `multipaths` stanza to the bottom of multipath.conf, with one `multipath` entry for each LUN to rename. The setting `user_friendly_names yes` is required in multipath.conf for this to work as expected. For example, here is a rename of a shared Oracle RAC voting LUN from the spurious name into something that makes sense for use inside Oracle as a RAW device:

```
multipaths {
    multipath {
        wwid 3690b11c0001b99ba0000098f5192345e
        alias VOTING5
    }
}
```

Restart the `multipathd` service and the multipath is now named like so:

```
# ls -1 /dev/mapper/VOTING5
/dev/mapper/VOTING5
```

The partitions within a renamed multipath follow the same convention, 'p' followed by a number. You would expect names like `/dev/mapper/VOTING5p1`, `/dev/mapper/VOTING5p2`, etc. if you partitioned this LUN for use as a normal filesystem.
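
When many LUNs need aliases, typing the stanza by hand invites mistakes. The following sketch generates a `multipaths` section from a list of "WWID alias" pairs; the helper name `make_multipaths_section` and the input format are assumptions made for illustration, and the output should be reviewed before it is appended to multipath.conf.

```shell
#!/bin/sh
# Emit a multipaths { } section from "WWID ALIAS" pairs read on stdin.
make_multipaths_section() {
    echo 'multipaths {'
    while read -r wwid name; do
        # Skip blank lines and comment lines in the input list.
        case "$wwid" in ''|'#'*) continue ;; esac
        printf '    multipath {\n'
        printf '        wwid %s\n' "$wwid"
        printf '        alias %s\n' "$name"
        printf '    }\n'
    done
    echo '}'
}

# Example input using the WWID and alias from the text.
printf '%s\n' '3690b11c0001b99ba0000098f5192345e VOTING5' | make_multipaths_section
```

Keeping the pair list in one file and generating the stanza from it on every node is one way to keep aliases identical across a cluster.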


## Multipath Ownership

Another common need is to set the UID, GID and mode on the multipaths; alas, the method differs between RHEL/CentOS v5 and v6.


### RHEL5 / CentOS5

This is done in the same `multipaths` block used for renaming, like so:

```
multipaths {
    multipath {
        wwid 3690b11c0001b99ba0000098f5192345e
        alias VOTING5
        uid 503
        gid 503
        mode 755
    }
}
```

Note that the system requires the numerical UID/GID and octal mode as shown above.


### RHEL6 / CentOS6

The above method was deprecated in RHEL6 in favor of udev rules - Red Hat's article on how to set it up is a wee bit lacking; use a ruleset like this instead of their official doc:

```
# /etc/udev/rules.d/12-dm-permissions.rules

ENV{DM_NAME}=="VOTING5", OWNER:="oracle", GROUP:="oinstall", MODE:="660"
```

This builds on the renaming of the multipath outlined above; to find the DM\_NAME value to match on, use the `udevadm` tool to query the raw device-mapper node.

- Get the raw node name with a simple ls:

```
# ls -l /dev/mapper/VOTING5
lrwxrwxrwx 1 root root 7 May 30 22:41 /dev/mapper/VOTING5 -> ../dm-5
```

- Query the sysfs path for that dm-N node:

```
# udevadm info --query=all --path=/devices/virtual/block/dm-5/
P: /devices/virtual/block/dm-5
N: dm-5
S: mapper/VOTING5
S: disk/by-id/dm-name-VOTING5
S: disk/by-id/dm-uuid-mpath-3690b11c0001b99ba0000098f5192345e
S: block/253:5
E: UDEV_LOG=3
E: DEVPATH=/devices/virtual/block/dm-5
E: MAJOR=253
E: MINOR=5
E: DEVNAME=/dev/dm-5
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: DM_SBIN_PATH=/sbin
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_NAME=VOTING5
E: DM_UUID=mpath-3690b11c0001b99ba0000098f5192345e
E: DM_SUSPENDED=0
E: MPATH_SBIN_PATH=/sbin
E: DEVLINKS=/dev/mapper/VOTING5 /dev/disk/by-id/dm-name-VOTING5 /dev/disk/by-id/dm-uuid-mpath-3690b11c0001b99ba0000098f5192345e /dev/block/253:5
```

Any line item that begins with "E: " can serve as the match clause in your udev rule; DM\_NAME is the most obvious choice, however your situation may require using one of the others.
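
The lookup and rule-writing steps can be sketched as two small helpers. Both function names (`dm_name_from_udevadm`, `dm_permission_rule`) are hypothetical; the first only assumes the `E: DM_NAME=...` line format shown in the output above, and the second reproduces the rule format from the example ruleset.

```shell
#!/bin/sh
# Pull the DM_NAME value out of "udevadm info --query=all" output on stdin.
dm_name_from_udevadm() {
    sed -n 's/^E: DM_NAME=//p'
}

# Print a udev permissions rule for one device-mapper name.
dm_permission_rule() {
    # $1 = DM_NAME, $2 = owner, $3 = group, $4 = octal mode
    printf 'ENV{DM_NAME}=="%s", OWNER:="%s", GROUP:="%s", MODE:="%s"\n' \
        "$1" "$2" "$3" "$4"
}

# Sample lines copied from the udevadm output above.
name=$(printf '%s\n' 'E: DM_UDEV_RULES_VSN=2' 'E: DM_NAME=VOTING5' | dm_name_from_udevadm)
dm_permission_rule "$name" oracle oinstall 660
```

On a live host the sample lines would be replaced by the real `udevadm info` output, and the printed rule appended to the rules file. Reloading with `udevadm control --reload-rules` followed by `udevadm trigger` is a common way to apply new rules, though it is worth testing on a non-production node first.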


## References

- <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/DM_Multipath/index.html#multipath_consistent_names>
- <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/DM_Multipath/index.html#config_file_defaults>
- <http://technologist.pro/storage/multipathing-netapp-lun-on-rhel-5-3>
- <https://github.com/torvalds/linux/blob/master/drivers/scsi/device_handler/scsi_dh_rdac.c>