Red Hat Module Deprecation Methodology
Overview
For each major RHEL release, Red Hat selects a specific kernel version and keeps it for the lifetime of that release. Red Hat therefore curates the RHEL kernel source itself: upstream fixes and changes are cherry-picked and then backported in-house by Red Hat kernel developers.
As part of this process, Red Hat also decides which hardware to deprecate in the new release, in line with its commercial support (SLA) stance. This often means disabling the kernel modules ("drivers") for older hardware so that the hardware does not function at all. This article describes the kernel module deprecation techniques used by Red Hat.
This article uses the CentOS kernel source for ease of use and learning; the CentOS kernel is a recompilation of the Red Hat sources and is identical for the purposes herein.
Method 1 - Entire Module Disabled
If the hardware Red Hat wishes to deprecate is serviced entirely by one kernel module, Red Hat uses the standard kernel configuration mechanism to leave that module out when the RPM package is built. The resulting kernel simply lacks the module, which prevents the hardware from being recognized and initialized when the kernel boots.
Example: EL8 Brocade Fiber Adapter
The BR-1860-2p PCI device is serviced by the bfa.ko (Brocade Fiber Adapter) and bna.ko (Brocade Network Adapter) kernel modules; the hardware can operate in several modes (often called a "converged" adapter). Red Hat has excluded both modules from the build of its kernel RPM package.
Runtime Observation
Because the kernel RPM includes a copy of the configuration used to build it, the disabled modules can be observed without looking at the source. We first check the configuration settings required to build the modules, then the filesystem of the installed package. Modules of this kind (bfa, bna) would ship in the kernel-modules subpackage, which is new in the EL8 packaging design. On an installed EL8 system:
$ grep -E "(BFA|BNA)" /boot/config-$(uname -r)
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_BNA is not set
$ find /lib/modules/$(uname -r) -name \*bfa\*
(no output)
$ find /lib/modules/$(uname -r) -name \*bna\*
(no output)
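The configuration check above can be wrapped in a small helper when several options need inspecting. This is a minimal sketch, not part of any Red Hat tooling: the function name kconfig_state is hypothetical, and it assumes the usual kernel config conventions (OPTION=y or OPTION=m when enabled, "# OPTION is not set" when disabled).

```shell
# Report how a kernel build option is set in a kernel config file.
# Usage: kconfig_state OPTION [CONFIG_FILE]
# Example: kconfig_state CONFIG_BNA
# (kconfig_state is a hypothetical helper name used for illustration.)
kconfig_state() (
    opt=$1
    # Default to the config shipped with the running kernel.
    cfg=${2:-/boot/config-$(uname -r)}

    if grep -q "^${opt}=" "$cfg"; then
        # Enabled: print the value, e.g. "m" (module) or "y" (built in).
        grep "^${opt}=" "$cfg" | head -n 1 | cut -d= -f2
    elif grep -q "^# ${opt} is not set\$" "$cfg"; then
        # Explicitly disabled at build time.
        echo "not set"
    else
        # The option does not appear in this config at all.
        echo "absent"
    fi
)
```

Given the output shown above, this would be expected to report "not set" for both CONFIG_BNA and CONFIG_SCSI_BFA_FC on that EL8 system.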
Source Code
The kernel source RPM contains a build configuration for each architecture; here we look at the x86_64 configuration file specifically.
$ curl -O http://vault.centos.org/8.0.1905/BaseOS/Source/SPackages/kernel-4.18.0-80.7.1.el8_0.src.rpm
$ bsdtar xf kernel-4.18.0-80.7.1.el8_0.src.rpm
$ bsdtar xf linux-4.18.0-80.7.1.el8_0.tar.xz
$ cd linux-4.18.0-80.7.1.el8_0/
$ grep -E "(BFA|BNA)" kernel-x86_64.config
# CONFIG_BNA is not set
# CONFIG_SCSI_BFA_FC is not set
Method 2 - Shared Module Filtering
If the PCI hardware Red Hat wishes to disable is serviced by a shared module, a different technique filters the device by its PCI ID. When the module discovers the device during initialization, the code is routed to a shared custom function which emits an error message on screen and reports the match back to the kernel module. The module then declines to initialize that specific PCI device, while other devices handled by the same code continue to function normally.
Example: EL8 PERC 6/i RAID Controller (Dell R710)
The Dell R710 with the basic PERC 6/i RAID controller PCI hardware device is serviced by the megaraid_sas.ko kernel module which also services many other types of RAID controllers based on the same technology; internally to the Linux kernel, this is known as the LSI SAS1078R device.
Red Hat has disabled this PCI device by its PCI ID within the shared module, using a custom function added to the kernel's PCI code; this can only be observed in the source code. In technical terms, the PERC 6/i RAID controller is identified on the PCI bus as follows:
- Device (vendor:device): 1000:0060
- Subsystem (vendor:device): 1028:1f0c
- https://pci-ids.ucw.cz/read/PC/1000/0060/10281f0c
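To check whether a running machine actually contains this controller, the vendor and device IDs listed above can be compared against sysfs. This is a minimal sketch, assuming the standard /sys/bus/pci/devices layout in which each device directory exposes vendor and device files holding values such as 0x1000; the function name find_pci_device is hypothetical.

```shell
# Print every PCI device matching VENDOR:DEVICE and whether a driver is bound.
# Usage: find_pci_device VENDOR DEVICE [SYSFS_ROOT]
# Example: find_pci_device 1000 0060
# (find_pci_device is a hypothetical helper name used for illustration.)
find_pci_device() (
    vendor=$1
    device=$2
    root=${3:-/sys/bus/pci/devices}

    for dir in "$root"/*; do
        # Skip anything that does not look like a PCI device directory.
        [ -r "$dir/vendor" ] && [ -r "$dir/device" ] || continue

        if [ "$(cat "$dir/vendor")" = "0x$vendor" ] &&
           [ "$(cat "$dir/device")" = "0x$device" ]; then
            # "driver" is a symlink that exists only while a driver is bound.
            if [ -L "$dir/driver" ]; then
                echo "$dir bound to $(basename "$(readlink "$dir/driver")")"
            else
                echo "$dir has no driver bound"
            fi
        fi
    done
)
```

A matching device with no driver bound is consistent with the module having declined to initialize it, although other causes (such as the module not being loaded) would look the same from here.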
Start by unpacking the kernel source code:
$ curl -O http://vault.centos.org/8.0.1905/BaseOS/Source/SPackages/kernel-4.18.0-80.7.1.el8_0.src.rpm
$ bsdtar xf kernel-4.18.0-80.7.1.el8_0.src.rpm
$ bsdtar xf linux-4.18.0-80.7.1.el8_0.tar.xz
$ cd linux-4.18.0-80.7.1.el8_0/
Specific Module Source
Working backwards from the specific kernel module, we identify it by name within the source, then identify how it's re-routed to the common function. The identification of the device is declared in the shared header file megaraid_sas.h using a standard C define:
drivers/scsi/megaraid/megaraid_sas.h:
#define PCI_DEVICE_ID_LSI_SAS1078R 0x0060
Given this definition, we can observe that the device ID is now listed in a custom array in the source code, aptly named megasas_pci_ids_removed:
drivers/scsi/megaraid/megaraid_sas_base.c:
static struct pci_device_id megasas_pci_ids_removed[] = {
    ...
    {PCI_DEVICE(PCI_VENDOR_ID_LSI_LOGIC, PCI_DEVICE_ID_LSI_SAS1078R)},
    ...
};
We then observe in the same source code how the above two items are used together:
drivers/scsi/megaraid/megaraid_sas_base.c:
if (pci_device_support_removed(megasas_pci_table,
                               megasas_pci_ids_removed, pdev))
    return -ENODEV;
As shown, this routine calls a shared function, pci_device_support_removed, passing it the driver's table of supported IDs, the table of removed IDs, and the PCI device currently being probed. If the function reports a match, the driver returns -ENODEV and the device is not initialized.
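The same call can be used to locate other drivers that filter devices this way. This is a rough sketch, assuming GNU grep and that it is pointed at the top of the unpacked kernel source tree; the helper name list_removed_support_callers is hypothetical. The file that defines the shared function will be listed alongside the drivers that call it.

```shell
# List kernel source files under drivers/ that reference the shared
# pci_device_support_removed function.
# Usage: list_removed_support_callers [SOURCE_TREE]
# (list_removed_support_callers is a hypothetical helper name.)
list_removed_support_callers() (
    tree=${1:-.}
    # -r: recurse, -l: print file names only, --include: C sources only.
    grep -rl --include='*.c' 'pci_device_support_removed' "$tree/drivers" | sort
)
```

Run from inside linux-4.18.0-80.7.1.el8_0/ with no arguments, this searches the tree unpacked earlier.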
Shared Module Disablement
The pci_device_support_removed function is defined in the common PCI source code, and includes a very descriptive header:
drivers/pci/pci-driver.c:
/**
 * pci_device_support_removed - Tell if a PCI device support is removed
 * @ids: array of PCI device id structures to search in
 * @dev: the PCI device structure to match against
 *
 * Used by a driver to check whether this device is in its list of removed
 * devices. Returns the matching pci_device_id structure or %NULL if there is
 * no match.
 *
 * Reserved for Internal Red Hat use only.
 */
const struct pci_device_id *pci_device_support_removed(
    ...
)
Internally, the function simply compares its inputs and returns either a match or no match. However, it also prints the on-screen message telling the user that the device is disabled, and informs the kernel to remove this PCI ID from its internal structures. The important part of that last step:
ret = pci_match_id(removed_ids, dev);
if (ret) {
    snprintf(devinfo, sizeof(devinfo), "%s %s [%04x:%04x]",
             dev_driver_string(&dev->dev), dev_name(&dev->dev),
             dev->vendor, dev->device);
    mark_hardware_removed(devinfo);
}
This effectively handles the deprecation in two ways: it returns the matching PCI ID entry for the deprecated device to the calling module (hooking into the code above, which then returns -ENODEV), and it instructs the kernel itself to remove this PCI ID from the data structures it holds internally. At this stage, it is as if the device does not exist to the kernel.