RHCS Mechanics

Acronyms

  • AIS: Application Interface Specification
  • AMF: Availability Management Framework
  • CCS: Cluster Configuration System
  • CLM: Cluster Membership
  • CLVM: Cluster Logical Volume Manager
  • CMAN: Cluster Manager
  • DLM: Distributed Lock Manager
  • GFS2: Global File System 2
  • GNBD: Global Network Block Device
  • STONITH: Shoot The Other Node In The Head
  • TOTEM: Group communication algorithm for reliable group messaging among cluster members

Configuration Files

  • /etc/cluster/cluster.conf - The main cluster configuration file
  • /etc/lvm/lvm.conf - The LVM configuration file; locking_type and a device filter are typically the settings configured here
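
As a rough sketch of the two usual lvm.conf approaches (values are illustrative, not taken from the examples below): CLVM uses cluster-wide locking via clvmd, while HA-LVM in its tagging variant restricts automatic activation with volume_list, after which the initramfs typically needs to be rebuilt so the change also applies at boot.

# CLVM: cluster-wide locking handled by clvmd (lvm.conf)
locking_type = 3

# HA-LVM, tagging variant: only the root VG and VGs tagged with this
# host's name are activated outside the cluster (lvm.conf)
# locking_type = 1
# volume_list = [ "vg_root", "@node1" ]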

Filesystem Locations

  • /usr/share/cluster/ - The main directory of resource agent code used for cluster resource objects
  • /var/log/cluster/ - The main logging directory (RHEL6)

Operational Commands

Graphical Cluster Configuration

  • luci - Cluster Management Web Interface primarily used with RHEL6
  • system-config-cluster - Cluster Management graphical X11 interface primarily used with RHEL5
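
To bring luci and its ricci agents up on RHEL6, the usual steps look roughly like the following (port numbers are the defaults; adjust for local firewall policy):

# On each cluster node: start the ricci agent (listens on 11111/tcp by default)
service ricci start

# On the management host: start luci and browse to https://<hostname>:8084/
service luci start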

RGManager - Resource Group Manager

  • clustat - Command used to display the status of the cluster, including node membership and services running
  • clusvcadm - Command used to manually enable, disable, relocate, and restart user services in a cluster
  • rg_test - Debug and test services and resource ordering
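
A few illustrative rg_test invocations (the configuration path is the standard one; the service name matches the PostgreSQL example later in this document):

# List the resource rules known to rgmanager
rg_test rules

# Display the resource tree parsed from a configuration file
rg_test test /etc/cluster/cluster.conf

# Start and stop a single service outside of rgmanager for debugging
rg_test test /etc/cluster/cluster.conf start service pgsql-svc
rg_test test /etc/cluster/cluster.conf stop service pgsql-svc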

CCS - Cluster Configuration System

  • ccs_config_validate - Verify a configuration; can validate the running config or a named file (RHEL6)
  • ccs_config_dump - Tool to generate XML output of running configuration (RHEL6)
  • ccs_sync - Synchronize the cluster configuration file to one or more machines in a cluster (RHEL6)
  • ccs_update_schema - Update the cluster relaxng schema that validates cluster.conf (RHEL6)
  • ccs_test - Diagnostic and testing command used to retrieve information from configuration files via ccsd
  • ccs_tool - Used to make online updates of CCS configuration files - considered obsolete
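
Typical invocations of the RHEL6 tools above, as a sketch (the candidate file name is hypothetical):

# Validate /etc/cluster/cluster.conf against the cluster schema
ccs_config_validate

# Validate a candidate file before putting it in place
ccs_config_validate -f /root/cluster.conf.new

# Push /etc/cluster/cluster.conf to the other nodes listed in it (uses ricci)
ccs_sync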

CMAN - Cluster Manager

  • cman_tool - The administrative front end to CMAN, starts and stops CMAN infrastructure and can perform changes
  • group_tool - Used to get a list of groups related to fencing, DLM, GFS, and getting debug information
  • fence_XXXX - Fence agent for the XXXX device type; for example fence_drac (Dell DRAC), fence_ipmilan (IPMI), and fence_ilo (HP iLO)
  • fence_check - Test the fence configuration for each node in the cluster
  • fence_node - A program which performs I/O fencing on a single node
  • fence_tool - A program to join and leave the fence domain
  • dlm_tool - Utility for the dlm and dlm_controld daemon
  • gfs_control - Utility for the gfs_controld daemon
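
A few illustrative invocations of the tools above (node name as in the status example further down):

# Show cluster name, quorum state and vote counts
cman_tool status

# List nodes and their membership state
cman_tool nodes

# Show fence, dlm and gfs group state on this node
group_tool ls

# Manually fence a failed node
fence_node node2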

GFS2 - Global File System 2

  • mkfs.gfs2 - Creates a GFS2 file system on a storage device
  • mount.gfs2 - Mount a GFS2 file system; normally not used by the user directly
  • fsck.gfs2 - Repair an unmounted GFS2 file system
  • gfs2_grow - Grows a mounted GFS2 file system
  • gfs2_jadd - Adds journals to a mounted GFS2 file system
  • gfs2_quota - Manage quotas on a mounted GFS2 file system
  • gfs2_tool - Configures, tunes, and gathers information on a GFS2 file system
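
A minimal creation sketch, assuming a two-node cluster named cluster1, a clustered LV /dev/vgsan00/lvgfs00, and a mount point /mnt/gfsdata (all hypothetical):

# DLM locking, lock table <clustername>:<fsname>, one journal per node
mkfs.gfs2 -p lock_dlm -t cluster1:gfsdata -j 2 /dev/vgsan00/lvgfs00

# Later: add one more journal and grow the mounted file system
gfs2_jadd -j 1 /mnt/gfsdata
gfs2_grow /mnt/gfsdata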

Quorum Disk

  • mkqdisk - Cluster Quorum Disk Utility
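
An illustrative invocation (device and label are hypothetical); the matching <quorumd> stanza and heuristics still need to be added to cluster.conf:

# Initialize a small shared LUN as the quorum disk
mkqdisk -c /dev/sdc -l qdisk1

# List quorum disks visible from this node
mkqdisk -L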

Cluster Components

RGManager - Resource Group Manager

  • rgmanager - Daemon used to handle user service requests including service start, service disable, service relocate, and service restart; RHEL6
  • clurgmgrd - Daemon used to handle user service requests including service start, service disable, service relocate, and service restart; RHEL5
  • cpglockd - Utilizes the extended virtual synchrony features of Corosync to implement a simplistic, distributed lock server for rgmanager

CLVM - Cluster Logical Volume Manager

  • clvmd - The daemon that distributes LVM metadata updates around a cluster. Requires cman to be running first

CCS - Cluster Configuration System

  • ricci - CCS daemon that runs on all cluster nodes and provides configuration file data to the cluster software; RHEL6
  • ccsd - CCS daemon that runs on all cluster nodes and provides configuration file data to the cluster software; RHEL5

CMAN - Cluster Manager

  • cman - Cluster initscript used to start/stop all the CMAN daemons
  • corosync - Corosync cluster communications infrastructure daemon using TOTEM; RHEL6
  • aisexec - OpenAIS cluster communications infrastructure daemon using TOTEM; RHEL5
  • fenced - Fences cluster nodes that have failed (fencing generally means rebooting)
  • dlm_controld - Daemon that configures dlm according to cluster events
  • gfs_controld - Daemon that coordinates GFS mounts and recovery
  • groupd - Compatibility daemon for fenced, dlm_controld and gfs_controld
  • qdiskd - Talks to CMAN and provides a mechanism for determining node-fitness in a cluster environment
  • cmannotifyd - Talks to CMAN and provides a mechanism to notify external entities about cluster changes
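
On RHEL6 the init scripts are normally brought up in the following order (and stopped in reverse); a sketch, with the optional pieces marked:

# Start order on each node
service cman start
service clvmd start       # only if clustered LVM is in use
service gfs2 start        # only if GFS2 mounts are in use
service rgmanager start

# Enable the same set at boot
chkconfig cman on; chkconfig clvmd on; chkconfig gfs2 on; chkconfig rgmanager on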

Operational Examples

The man pages for clustat and clusvcadm contain more in-depth explanations of all the shown options; more options exist than are shown here.

Configuration Validation

RHEL5 does not ship the ccs_config_validate utility, so XML validation against the cluster schema can be performed with xmllint instead:

xmllint --relaxng /usr/share/system-config-cluster/misc/cluster.ng /etc/cluster/cluster.conf

When run, xmllint prints the well-formed XML document followed by a final line indicating whether the file validates.
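
On success that final line looks similar to:

/etc/cluster/cluster.conf validates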

Status Check

Use the clustat command to check the cluster status:

# clustat
Cluster Status for cluster1 @ Fri Jan 17 16:49:45 2014
Member Status: Quorate

 Member Name                    ID   Status
 ------ ----                    ---- ------
 node1                          1    Online, Local, rgmanager
 node2                          2    Online, rgmanager

 Service Name                   Owner (Last)             State
 ------- ----                   ----- ------             -----
 service:pgsql-svc              node1                    started

Service Manipulation

Use the clusvcadm command to manipulate the services:

# Restart PostgreSQL in place on the same server
clusvcadm -R pgsql-svc

# Relocate PostgreSQL to a specific node
clusvcadm -r pgsql-svc -m <node name>

# Disable PostgreSQL
clusvcadm -d pgsql-svc

# Enable PostgreSQL
clusvcadm -e pgsql-svc

# Freeze PostgreSQL on the current node
clusvcadm -Z pgsql-svc

# Unfreeze PostgreSQL after it was frozen
clusvcadm -U pgsql-svc

Configuration Examples

Standard LVM and PgSQL Initscript

This example uses a single standard LVM mount from SAN (as opposed to HA-LVM) and a normal initscript to start the service. An IP for a secondary backup network is included as well as the Dell DRAC fencing devices on the same VLAN.

/etc/hosts

127.0.0.1    localhost localhost.localdomain
10.11.12.10  pgdb1.example.com pgdb1
10.11.12.11  pgdb2.example.com pgdb2
10.11.12.20  pgdb1-drac
10.11.12.21  pgdb2-drac

/etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster config_version="18" name="pgdbclus1">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon post_fail_delay="5" post_join_delay="15"/>
  <clusternodes>
    <clusternode name="pgdb1" nodeid="1">
      <fence>
        <method name="drac">
          <device name="pgdb1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pgdb2" nodeid="2">
      <fence>
        <method name="drac">
          <device name="pgdb2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.20" login="root" module_name="pgdb1" name="pgdb1-drac" passwd="calvin"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.21" login="root" module_name="pgdb2" name="pgdb2-drac" passwd="calvin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="pgsql-fd" nofailback="1" restricted="1">
        <failoverdomainnode name="pgdb1"/>
        <failoverdomainnode name="pgdb2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.11.12.25" monitor_link="1" sleeptime="10"/>
      <ip address="10.9.8.7" monitor_link="0" sleeptime="5"/>
      <fs device="/dev/vgsan00/lvdata00" fsid="64301" fstype="ext4" mountpoint="/var/lib/pgsql/" name="pgsql-fs" options="noatime"/>
      <script file="/etc/init.d/postgresql-9.3" name="pgsql-srv"/>
    </resources>
    <service domain="pgsql-fd" name="pgsql-svc" recovery="relocate">
      <ip ref="10.11.12.25">
        <fs ref="pgsql-fs">
          <script ref="pgsql-srv"/>
        </fs>
      </ip>
      <ip ref="10.9.8.7"/>
    </service>
  </rm>
</cluster>
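
After editing cluster.conf, the usual workflow is to increment config_version, validate, and push the change out to the running cluster; a sketch for RHEL6 (RHEL5 used ccs_tool instead):

# RHEL6: validate, sync to the other nodes, and activate the new version
ccs_config_validate
ccs_sync
cman_tool version -r

# RHEL5 equivalent
ccs_tool update /etc/cluster/cluster.conf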

HA-LVM and MySQL Object

This example uses a single HA-LVM mount from SAN (activated with exclusive locks) and an RHCS-provided mysql resource object to start MySQL. An IP on a secondary backup network is included, as well as Dell DRAC fencing devices on the same VLAN.
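
The lvm resource agent performs the activation itself, but for manual testing the exclusive activation roughly corresponds to the following (LV path taken from the configuration below):

# Activate the LV exclusively on this node
lvchange -a ey /dev/vgsan00/lvdata00

# Deactivate it again before relocating the service
lvchange -a n /dev/vgsan00/lvdata00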

/etc/hosts

127.0.0.1    localhost localhost.localdomain
10.11.12.10  mydb1.example.com mydb1
10.11.12.11  mydb2.example.com mydb2
10.11.12.20  mydb1-drac
10.11.12.21  mydb2-drac

/etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster config_version="18" name="mydbclus1">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon post_fail_delay="5" post_join_delay="15"/>
  <clusternodes>
    <clusternode name="mydb1" nodeid="1">
      <fence>
        <method name="drac">
          <device name="mydb1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="mydb2" nodeid="2">
      <fence>
        <method name="drac">
          <device name="mydb2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.20" login="root" module_name="mydb1" name="mydb1-drac" passwd="calvin"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.21" login="root" module_name="mydb2" name="mydb2-drac" passwd="calvin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="mysql-fd" nofailback="1" restricted="1">
        <failoverdomainnode name="mydb1"/>
        <failoverdomainnode name="mydb2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.11.12.25" monitor_link="1" sleeptime="10"/>
      <ip address="10.9.8.7" monitor_link="0" sleeptime="5"/>
      <lvm lv_name="data00" name="mysql-lv" vg_name="vgsan00"/>
      <fs device="/dev/vgsan00/lvdata00" force_fsck="0" force_unmount="0" fsid="64301" fstype="ext4" mountpoint="/var/lib/mysql/" name="mysql-fs" options="noatime" self_fence="0"/>
      <mysql config_file="/etc/my.cnf" listen_address="10.11.12.25" mysqld_options="" name="mysql" shutdown_wait="600"/>
    </resources>
    <service domain="mysql-fd" name="mysql-svc" recovery="relocate">
      <ip ref="10.11.12.25"/>
      <lvm ref="mysql-lv"/>
      <fs ref="mysql-fs"/>
      <mysql ref="mysql"/>
      <ip ref="10.9.8.7"/>
    </service>
  </rm>
</cluster>

Standard LVM, MySQL script and NFS

This example uses two standard LVM mounts from DAS, a MySQL initscript, and RHCS-provided NFS objects to run two services. An IP on a secondary backup network is included for each, as well as Dell DRAC fencing devices on the same VLAN. Notice that preferred priorities are also set (controlling where each service runs by default) so that the cluster self-balances from a cold start: each node runs one service.

/etc/hosts

127.0.0.1    localhost localhost.localdomain
10.11.12.10  node1.example.com node1
10.11.12.11  node2.example.com node2
10.11.12.20  node1-drac
10.11.12.21  node2-drac

/etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster config_version="18" name="cluster1">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon post_fail_delay="5" post_join_delay="15"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="drac">
          <device modulename="" name="node1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="drac">
          <device modulename="" name="node2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.20" login="root" name="node1-drac" passwd="calvin"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.21" login="root" name="node2-drac" passwd="calvin"/>
  </fencedevices>
  <rm log_facility="local4" log_level="7">
    <failoverdomains>
      <failoverdomain name="mysql-fd" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
      </failoverdomain>
      <failoverdomain name="nfs-fd" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="2"/>
        <failoverdomainnode name="node2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.11.12.25" monitor_link="1" sleeptime="10"/>
      <ip address="10.11.12.26" monitor_link="1" sleeptime="10"/>
      <ip address="10.9.8.7" monitor_link="0" sleeptime="5"/>
      <ip address="10.9.8.6" monitor_link="0" sleeptime="5"/>
      <fs device="/dev/vgdas00/mysql00" force_fsck="0" force_unmount="0" fsid="14404" fstype="ext3" mountpoint="/das/mysql-fs" name="mysql-fs" options="noatime" self_fence="0"/>
      <fs device="/dev/vgdas01/nfs00" force_fsck="0" force_unmount="1" fsid="31490" fstype="ext3" mountpoint="/das/nfs-fs" name="nfs-fs" options="noatime" self_fence="0"/>
      <script file="/etc/init.d/mysqld" name="mysql-script"/>
      <nfsexport name="nfs-res"/>
      <nfsclient name="nfs-export" options="rw,no_root_squash" path="/das/nfs-fs" target="10.11.12.0/24"/>
    </resources>
    <service autostart="1" domain="mysql-fd" name="mysql-svc">
      <ip ref="10.11.12.25">
        <fs ref="mysql-fs">
          <script ref="mysql-script"/>
        </fs>
      </ip>
      <ip ref="10.9.8.7"/>
    </service>
    <service autostart="1" domain="nfs-fd" name="nfs-svc" nfslock="1">
      <ip ref="10.11.12.26">
        <fs ref="nfs-fs">
          <nfsexport ref="nfs-res">
            <nfsclient ref="nfs-export"/>
          </nfsexport>
        </fs>
      </ip>
      <ip ref="10.9.8.6"/>
    </service>
  </rm>
</cluster>

HA-LVM and NFS Object

This example uses a single HA-LVM mount from SAN (activated with exclusive locks) and an RHCS-provided nfsserver resource object to serve NFS. An IP on a secondary backup network is included, as well as Dell DRAC fencing devices on the same VLAN.

/etc/hosts

127.0.0.1    localhost localhost.localdomain
10.11.12.10  nfs1.example.com nfs1
10.11.12.11  nfs2.example.com nfs2
10.11.12.20  nfs1-drac
10.11.12.21  nfs2-drac

/etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster config_version="18" name="nfsclus1">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon post_fail_delay="5" post_join_delay="15"/>
  <clusternodes>
    <clusternode name="nfs1" nodeid="1">
      <fence>
        <method name="drac">
          <device name="nfs1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="nfs2" nodeid="2">
      <fence>
        <method name="drac">
          <device name="nfs2-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.20" login="root" module_name="nfs1" name="nfs1-drac" passwd="calvin"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1" ipaddr="10.11.12.21" login="root" module_name="nfs2" name="nfs2-drac" passwd="calvin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="nfs-fd" nofailback="1" restricted="1">
        <failoverdomainnode name="nfs1"/>
        <failoverdomainnode name="nfs2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.11.12.25" monitor_link="1" sleeptime="10"/>
      <ip address="10.9.8.7" monitor_link="0" sleeptime="5"/>
      <lvm lv_name="data00" name="nfs-lv" vg_name="vgsan00"/>
      <fs device="/dev/vgsan00/lvdata00" force_fsck="0" force_unmount="0" fsid="64301" fstype="ext4" mountpoint="/san/nfs-fs" name="nfs-fs" options="noatime" self_fence="0"/>
      <nfsserver name="nfs-srv" nfspath=".clumanager/nfs"/>
      <nfsclient allow_recover="on" name="nfsclient1" options="rw,no_root_squash,no_subtree_check" target="10.11.12.0/24"/>
    </resources>
    <service domain="nfs-fd" name="nfs-svc" recovery="relocate">
      <lvm ref="nfs-lv"/>
      <fs ref="nfs-fs">
        <nfsserver ref="nfs-srv">
          <ip ref="10.11.12.25"/>
          <ip ref="10.9.8.7"/>
          <nfsclient ref="nfsclient1"/>
        </nfsserver>
      </fs>
    </service>
  </rm>
</cluster>
