RHCS Mechanics
Contents
- Acronyms
- Configuration Files
- Filesystem Locations
- Operational Commands
- Cluster Components
- Operational Examples
- Configuration Examples
- References
Acronyms
- AIS: Application Interface Specification
- AMF: Availability Management Framework
- CCS: Cluster Configuration System
- CLM: Cluster Membership
- CLVM: Cluster Logical Volume Manager
- CMAN: Cluster Manager
- DLM: Distributed Lock Manager
- GFS2: Global File System 2
- GNBD: Global Network Block Device
- STONITH: Shoot The Other Node In The Head
- TOTEM: Group communication protocol used for reliable messaging among cluster members
Configuration Files
- /etc/cluster/cluster.conf - The main cluster configuration file
- /etc/lvm/lvm.conf - The LVM configuration file; typically locking_type and a filter are configured here (see the excerpt below)
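A minimal lvm.conf excerpt for a CLVM-based setup might look like the following; the filter patterns and device names are placeholders and have to match the local storage layout (for tag-based HA-LVM, locking_type stays at 1 and a volume_list is added instead):

# /etc/lvm/lvm.conf (excerpt) - example values only
# locking_type 3 = built-in clustered locking via clvmd;
# the default of 1 is local locking, used for tag-based HA-LVM
locking_type = 3

# Scan only the devices the cluster should see and reject the rest;
# the patterns below are placeholders
filter = [ "a|/dev/sda2|", "a|/dev/mapper/mpath.*|", "r|.*|" ]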
Filesystem Locations
- /usr/share/cluster/ - The main directory of code used for cluster objects
- /var/log/cluster/ - The main logging directory (RHEL6)
Operational Commands
Graphical Cluster Configuration
- luci - Cluster Management Web Interface, primarily used with RHEL6
- system-config-cluster - Cluster Management X11/Motif Interface, primarily used with RHEL5
RGManager - Resource Group Manager
- clustat - Command used to display the status of the cluster, including node membership and services running
- clusvcadm - Command used to manually enable, disable, relocate, and restart user services in a cluster
- rg_test - Debug and test services and resource ordering (see the example below)
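rg_test can evaluate a service definition without involving rgmanager; a hedged sketch, using the pgsql-svc service from the configuration examples further down:

# Show the resource rules rgmanager knows about
rg_test rules

# Show how the resource tree for a service would be started, without doing it
rg_test noop /etc/cluster/cluster.conf start service pgsql-svc

# Actually start and stop the service outside of rgmanager (use with care)
rg_test test /etc/cluster/cluster.conf start service pgsql-svc
rg_test test /etc/cluster/cluster.conf stop service pgsql-svc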
CCS - Cluster Configuration System
- ccs_config_validate - Verify a configuration; can validate the running config or a named file (RHEL6) - see the examples below
- ccs_config_dump - Tool to generate XML output of the running configuration (RHEL6)
- ccs_sync - Synchronize the cluster configuration file to one or more machines in a cluster (RHEL6)
- ccs_update_schema - Update the cluster relaxng schema that validates cluster.conf (RHEL6)
- ccs_test - Diagnostic and testing command used to retrieve information from configuration files via ccsd
- ccs_tool - Used to make online updates of CCS configuration files - considered obsolete
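Typical RHEL6 usage looks roughly like this; the candidate file name is a placeholder and ccs_sync assumes ricci is running on all nodes:

# Validate the running configuration
ccs_config_validate
# Validate a candidate file before putting it in place
ccs_config_validate -f /root/cluster.conf.new
# Dump the configuration the cluster is currently running with
ccs_config_dump > /tmp/running-cluster.conf
# Push /etc/cluster/cluster.conf to the other cluster nodes
ccs_sync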
CMAN - Cluster Manager
- cman_tool - The administrative front end to CMAN; starts and stops the CMAN infrastructure and can perform changes (see the examples below)
- group_tool - Used to list the groups related to fencing, DLM and GFS, and to gather debug information
- fence_XXXX - Fence agent for an XXXX type of device, for example fence_drac (Dell DRAC), fence_ipmilan (IPMI) and fence_ilo (HP iLO)
- fence_check - Test the fence configuration for each node in the cluster
- fence_node - A program which performs I/O fencing on a single node
- fence_tool - A program to join and leave the fence domain
- dlm_tool - Utility for the dlm and dlm_controld daemons
- gfs_control - Utility for the gfs_controld daemon
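A few everyday CMAN and fencing checks; node2 stands in for whichever node needs to be fenced:

# Quorum and cluster-wide status, plus the local view of the member nodes
cman_tool status
cman_tool nodes

# List the fence, dlm and gfs groups and their members
group_tool ls

# Verify that every node has a working fence configuration (RHEL6)
fence_check

# Manually fence a misbehaving node (it will normally be power-cycled)
fence_node node2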
GFS2 - Global File System 2
- mkfs.gfs2 - Creates a GFS2 file system on a storage device (see the examples below)
- mount.gfs2 - Mounts a GFS2 file system; normally not used by the user directly
- fsck.gfs2 - Repairs an unmounted GFS2 file system
- gfs2_grow - Grows a mounted GFS2 file system
- gfs2_jadd - Adds journals to a mounted GFS2 file system
- gfs2_quota - Manages quotas on a mounted GFS2 file system
- gfs2_tool - Configures, tunes and gathers information about a GFS2 file system
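A hedged example of the common GFS2 operations; the cluster name, lock table name, device and mount point below are placeholders:

# Create a GFS2 file system with two journals for cluster "cluster1"
mkfs.gfs2 -p lock_dlm -t cluster1:gfsdata -j 2 /dev/vgsan00/lvgfs00

# Add one more journal before a third node mounts the file system
gfs2_jadd -j 1 /mnt/gfsdata

# Grow the file system after extending the underlying logical volume
gfs2_grow /mnt/gfsdata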
Quorum Disk
mkqdisk- Cluster Quorum Disk Utility
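For example, a small shared LUN can be labelled as the quorum disk and then verified; the device name and label are placeholders, and labelling is destructive:

# Label the shared device as a quorum disk
mkqdisk -c /dev/sdq -l qdisk1

# List the quorum disks visible from this node
mkqdisk -L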
Cluster Components
RGManager - Resource Group Manager
- rgmanager - Daemon used to handle user service requests including service start, service disable, service relocate, and service restart (RHEL6)
- clurgmgrd - Daemon used to handle user service requests including service start, service disable, service relocate, and service restart (RHEL5)
- cpglockd - Utilizes the extended virtual synchrony features of Corosync to implement a simplistic, distributed lock server for rgmanager
CLVM - Cluster Logical Volume Manager
- clvmd - The daemon that distributes LVM metadata updates around a cluster; requires cman to be running first (see the sketch below)
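A minimal sketch of enabling CLVM, assuming the lvm2-cluster package is installed and cman is already running; the VG and PV names are placeholders:

# Switch lvm.conf to clustered locking (locking_type = 3)
lvmconf --enable-cluster

# Start clvmd and make it persistent across reboots
service clvmd start
chkconfig clvmd on

# Create a clustered volume group, or mark an existing one as clustered
vgcreate -cy vgsan00 /dev/mapper/mpathb
vgchange -cy vgsan00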
CCS - Cluster Configuration System
- ricci - CCS daemon that runs on all cluster nodes and provides configuration file data to the cluster software (RHEL6) - see the sketch below
- ccsd - CCS daemon that runs on all cluster nodes and provides configuration file data to the cluster software (RHEL5)
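On RHEL6, ricci needs a password and has to be running before luci or ccs_sync can push configurations to a node; a minimal sketch:

# Set the password the management tools authenticate with
passwd ricci

# Start ricci and enable it at boot
service ricci start
chkconfig ricci on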
CMAN - Cluster Manager
- cman - Cluster initscript used to start/stop all the CMAN daemons (a typical start/stop order is shown below)
- corosync - Corosync cluster communications infrastructure daemon using TOTEM (RHEL6)
- aisexec - OpenAIS cluster communications infrastructure daemon using TOTEM (RHEL5)
- fenced - Fences cluster nodes that have failed (fencing generally means rebooting)
- dlm_controld - Daemon that configures the DLM according to cluster events
- gfs_controld - Daemon that coordinates GFS mounts and recovery
- groupd - Compatibility daemon for fenced, dlm_controld and gfs_controld
- qdiskd - Talks to CMAN and provides a mechanism for determining node fitness in a cluster environment
- cmannotifyd - Talks to CMAN and provides a mechanism to notify external entities about cluster changes
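The daemons are normally driven through their initscripts in a fixed order; a typical RHEL6 sequence (skip clvmd and gfs2 when they are not used):

# Bring the stack up
service cman start
service clvmd start
service gfs2 start
service rgmanager start

# Take it down again in the reverse order
service rgmanager stop
service gfs2 stop
service clvmd stop
service cman stop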
Operational Examples
The man pages for clustat and clusvcadm explain all of these options in more depth; more options exist than are shown here.
Configuration Validation
RHEL5 does not ship the ccs_config_validate utility, but the configuration can still be validated against the cluster schema with xmllint:
xmllint --relaxng /usr/share/system-config-cluster/misc/cluster.ng /etc/cluster/cluster.conf
When run, the command prints the well-formed XML file followed by a final message indicating whether validation succeeded.
Status Check
Use the clustat command to check the cluster status:
# clustat
Cluster Status for cluster1 @ Fri Jan 17 16:49:45 2014
Member Status: Quorate
 Member Name                       ID   Status
 ------ ----                       ---- ------
 node1                                1 Online, Local, rgmanager
 node2                                2 Online, rgmanager

 Service Name                      Owner (Last)                      State
 ------- ----                      ----- ------                      -----
 service:pgsql-svc                 node1                             started
Service Manipulation
Use the clusvcadm command to manipulate the services:
# Restart PostgreSQL in place on the same server
clusvcadm -R pgsql-svc
# Relocate PostgreSQL to a specific node
clusvcadm -r pgsql-svc -m <node name>
# Disable PostgreSQL
clusvcadm -d pgsql-svc
# Enable PostgreSQL
clusvcadm -e pgsql-svc
# Freeze PostgreSQL on the current node
clusvcadm -Z pgsql-svc
# Unfreeze PostgreSQL after it was frozen
clusvcadm -U pgsql-svc
Configuration Examples
Standard LVM and PgSQL Initscript
This example uses a single standard LVM mount from SAN (as opposed to HA-LVM) and a normal initscript to start the service. An IP on a secondary backup network is included, as are the Dell DRAC fencing devices on the same VLAN.
/etc/hosts
127.0.0.1 localhost localhost.localdomain
10.11.12.10 pgdb1.example.com pgdb1
10.11.12.11 pgdb2.example.com pgdb2
10.11.12.20 pgdb1-drac
10.11.12.21 pgdb2-drac
/etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="18" name="pgdbclus1">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon post_fail_delay="5" post_join_delay="15"/>
  <clusternodes>
    <clusternode name="pgdb1" nodeid="1">
      <fence>
        <method name="drac">
          <device name="pgdb1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pgdb2" nodeid="2">
      <fence>
        <method name="drac">
          <device name="pgdb2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.20" login="root" module_name="pgdb1" name="pgdb1-drac" passwd="calvin"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.21" login="root" module_name="pgdb2" name="pgdb2-drac" passwd="calvin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="pgsql-fd" nofailback="1" restricted="1">
        <failoverdomainnode name="pgdb1"/>
        <failoverdomainnode name="pgdb2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.11.12.25" monitor_link="1" sleeptime="10"/>
      <ip address="10.9.8.7" monitor_link="0" sleeptime="5"/>
      <fs device="/dev/vgsan00/lvdata00" fsid="64301" fstype="ext4" mountpoint="/var/lib/pgsql/" name="pgsql-fs" options="noatime"/>
      <script file="/etc/init.d/postgresql-9.3" name="pgsql-srv"/>
    </resources>
    <service domain="pgsql-fd" name="pgsql-svc" recovery="relocate">
      <ip ref="10.11.12.25">
        <fs ref="pgsql-fs">
          <script ref="pgsql-srv"/>
        </fs>
      </ip>
      <ip ref="10.9.8.7"/>
    </service>
  </rm>
</cluster>
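After editing cluster.conf by hand, the config_version attribute must be incremented before the new file is distributed; a hedged sketch of the usual procedure on the node where the edit was made:

# RHEL6: validate, then distribute and activate the new version
ccs_config_validate
cman_tool version -r

# RHEL5: propagate the updated file with ccs_tool instead
ccs_tool update /etc/cluster/cluster.conf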
HA-LVM and MySQL Object
This example uses a single HA-LVM mount from SAN (activated with exclusive locks) and an RHCS-provided service object to start MySQL. An IP on a secondary backup network is included, as are the Dell DRAC fencing devices on the same VLAN.
/etc/hosts
127.0.0.1 localhost localhost.localdomain
10.11.12.10 mydb1.example.com mydb1
10.11.12.11 mydb2.example.com mydb2
10.11.12.20 mydb1-drac
10.11.12.21 mydb2-drac
/etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="18" name="mydbclus1">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon post_fail_delay="5" post_join_delay="15"/>
  <clusternodes>
    <clusternode name="mydb1" nodeid="1">
      <fence>
        <method name="drac">
          <device name="mydb1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="mydb2" nodeid="2">
      <fence>
        <method name="drac">
          <device name="mydb2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.20" login="root" module_name="mydb1" name="mydb1-drac" passwd="calvin"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.21" login="root" module_name="mydb2" name="mydb2-drac" passwd="calvin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="mysql-fd" nofailback="1" restricted="1">
        <failoverdomainnode name="mydb1"/>
        <failoverdomainnode name="mydb2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.11.12.25" monitor_link="1" sleeptime="10"/>
      <ip address="10.9.8.7" monitor_link="0" sleeptime="5"/>
      <lvm lv_name="lvdata00" name="mysql-lv" vg_name="vgsan00"/>
      <fs device="/dev/vgsan00/lvdata00" force_fsck="0" force_unmount="0" fsid="64301" fstype="ext4" mountpoint="/var/lib/mysql/" name="mysql-fs" options="noatime" self_fence="0"/>
      <mysql config_file="/etc/my.cnf" listen_address="10.11.12.25" mysqld_options="" name="mysql" shutdown_wait="600"/>
    </resources>
    <service domain="mysql-fd" name="mysql-svc" recovery="relocate">
      <ip ref="10.11.12.25"/>
      <lvm ref="mysql-lv"/>
      <fs ref="mysql-fs"/>
      <mysql ref="mysql"/>
      <ip ref="10.9.8.7"/>
    </service>
  </rm>
</cluster>
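The lvm resource activates the volume exclusively on whichever node owns the service. The rough manual equivalent, useful during maintenance while the service is disabled or frozen, would be something like:

# Activate the HA-LVM volume exclusively on this node only
lvchange -aey /dev/vgsan00/lvdata00

# ... perform maintenance on /var/lib/mysql ...

# Deactivate it again before the service is enabled elsewhere
lvchange -an /dev/vgsan00/lvdata00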
Standard LVM, MySQL script and NFS
This example uses two standard LVM mounts from DAS, a MySQL initscript and RHCS-provided NFS objects to run two services. An IP on a secondary backup network is included for each service, as are the Dell DRAC fencing devices on the same VLAN. Note that failover-domain priorities (controlling where each service prefers to run) are also set so that the cluster balances itself after a cold start: each node runs one service.
/etc/hosts
127.0.0.1 localhost localhost.localdomain
10.11.12.10 node1.example.com node1
10.11.12.11 node2.example.com node2
10.11.12.20 node1-drac
10.11.12.21 node2-drac
/etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="18" name="cluster1">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon post_fail_delay="5" post_join_delay="15"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="drac">
          <device modulename="" name="node1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="drac">
          <device modulename="" name="node2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.20" login="root" name="node1-drac" passwd="calvin"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.21" login="root" name="node2-drac" passwd="calvin"/>
  </fencedevices>
  <rm log_facility="local4" log_level="7">
    <failoverdomains>
      <failoverdomain name="mysql-fd" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
      </failoverdomain>
      <failoverdomain name="nfs-fd" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="2"/>
        <failoverdomainnode name="node2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.11.12.25" monitor_link="1" sleeptime="10"/>
      <ip address="10.11.12.26" monitor_link="1" sleeptime="10"/>
      <ip address="10.9.8.7" monitor_link="0" sleeptime="5"/>
      <ip address="10.9.8.6" monitor_link="0" sleeptime="5"/>
      <fs device="/dev/vgdas00/mysql00" force_fsck="0" force_unmount="0" fsid="14404" fstype="ext3" mountpoint="/das/mysql-fs" name="mysql-fs" options="noatime" self_fence="0"/>
      <fs device="/dev/vgdas01/nfs00" force_fsck="0" force_unmount="1" fsid="31490" fstype="ext3" mountpoint="/das/nfs-fs" name="nfs-fs" options="noatime" self_fence="0"/>
      <script file="/etc/init.d/mysqld" name="mysql-script"/>
      <nfsexport name="nfs-res"/>
      <nfsclient name="nfs-export" options="rw,no_root_squash" path="/das/nfs-fs" target="10.11.12.0/24"/>
    </resources>
    <service autostart="1" domain="mysql-fd" name="mysql-svc">
      <ip ref="10.11.12.25">
        <fs ref="mysql-fs">
          <script ref="mysql-script"/>
        </fs>
      </ip>
      <ip ref="10.9.8.7"/>
    </service>
    <service autostart="1" domain="nfs-fd" name="nfs-svc" nfslock="1">
      <ip ref="10.11.12.26">
        <fs ref="nfs-fs">
          <nfsexport ref="nfs-res">
            <nfsclient ref="nfs-export"/>
          </nfsexport>
        </fs>
      </ip>
      <ip ref="10.9.8.6"/>
    </service>
  </rm>
</cluster>
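Once nfs-svc is running, the export can be checked from the owning node and mounted from a client; the addresses and paths are the ones used in this example, and /mnt is an arbitrary client-side mount point:

# On the node currently running nfs-svc
exportfs -v

# From any client on the 10.11.12.0/24 network
showmount -e 10.11.12.26
mount -t nfs 10.11.12.26:/das/nfs-fs /mnt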
HA-LVM and NFS Object
This example uses a single HA-LVM mount from SAN (activated with exclusive locks) and an RHCS-provided service object to start NFS. An IP on a secondary backup network is included, as are the Dell DRAC fencing devices on the same VLAN.
/etc/hosts
127.0.0.1 localhost localhost.localdomain
10.11.12.10 nfs1.example.com nfs1
10.11.12.11 nfs2.example.com nfs2
10.11.12.20 nfs1-drac
10.11.12.21 nfs2-drac
/etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="18" name="nfsclus1">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon post_fail_delay="5" post_join_delay="15"/>
  <clusternodes>
    <clusternode name="nfs1" nodeid="1">
      <fence>
        <method name="drac">
          <device name="nfs1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nfs2" nodeid="2">
      <fence>
        <method name="drac">
          <device name="nfs2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.20" login="root" module_name="nfs1" name="nfs1-drac" passwd="calvin"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.21" login="root" module_name="nfs2" name="nfs2-drac" passwd="calvin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="nfs-fd" nofailback="1" restricted="1">
        <failoverdomainnode name="nfs1"/>
        <failoverdomainnode name="nfs2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.11.12.25" monitor_link="1" sleeptime="10"/>
      <ip address="10.9.8.7" monitor_link="0" sleeptime="5"/>
      <lvm lv_name="lvdata00" name="nfs-lv" vg_name="vgsan00"/>
      <fs device="/dev/vgsan00/lvdata00" force_fsck="0" force_unmount="0" fsid="64301" fstype="ext4" mountpoint="/san/nfs-fs" name="nfs-fs" options="noatime" self_fence="0"/>
      <nfsserver name="nfs-srv" nfspath=".clumanager/nfs"/>
      <nfsclient allow_recover="on" name="nfsclient1" options="rw,no_root_squash,no_subtree_check" target="10.11.12.0/24"/>
    </resources>
    <service domain="nfs-fd" name="nfs-svc" recovery="relocate">
      <lvm ref="nfs-lv"/>
      <fs ref="nfs-fs">
        <nfsserver ref="nfs-srv">
          <ip ref="10.11.12.25"/>
          <ip ref="10.9.8.7"/>
          <nfsclient ref="nfsclient1"/>
        </nfsserver>
      </fs>
    </service>
  </rm>
</cluster>
References
- https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/index.html
- http://www.sourceware.org/cluster/conga/
- https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Logical_Volume_Manager_Administration/LVM_Cluster_Overview.html
- http://en.wikipedia.org/wiki/Fencing_%28computing%29
- http://en.wikipedia.org/wiki/Distributed_lock_manager
- http://en.wikipedia.org/wiki/STONITH
- http://en.wikipedia.org/wiki/Network_block_device