2024-03-20 11:40:22 -05:00


# Kernel Module Weak Updates
## Contents
- [Overview](#overview)
- [Module Loading](#module-loading)
- [Functional Examples](#functional-examples)
  - [Example: bfa](#example-bfa)
  - [Example: lpfc](#example-lpfc)
- [Compatible Modules](#compatible-modules)
  - [Kernel Symbols](#kernel-symbols)
  - [Module Symbols](#module-symbols)
- [Methodology](#methodology)
  - [Dependencies](#dependencies)
  - [Weak Modules](#weak-modules)
- [Incompatible Example](#incompatible-example)
- [Caveats](#caveats)
- [References](#references)
## Overview
Red Hat based Linux distributions provide a method for third-party vendors to ship a single kernel module for an entire family of kernel releases. It rests on the premise that the kernel's exported symbol tables and module interface do not change within that family. This document covers the basic design behind the solution.
This method is popular among third-party vendors who provide a precompiled kernel module for their hardware and want the same binary module to work with a number of compatible kernels. Shipping a new binary module for every Red Hat kernel release is therefore unnecessary, which reduces the effort of producing the module and of maintaining it on a running server.
In common usage, these modules are delivered in packages named `kmod-<foo>`, where `<foo>` is the name of the existing stock kernel module shipped by the distribution. The overall compatibility is referred to as **kABI**, the _Kernel Application Binary Interface_.
## Module Loading
Key to understanding the method is how the module tools (`depmod` and `modprobe`) decide which copy of a module to load. The details vary by distribution, but generally modules are looked for in this order:
- `/lib/modules/(kernel-version)/updates`
  - manually controlled area where a sysadmin can place a module by hand to override everything else
- `/lib/modules/(kernel-version)/extra`
  - overrides everything shipped with the kernel, as well as weak-updates (see below)
- `/lib/modules/(kernel-version)/*`
  - stock kernel modules (usually in the `kernel/` subdirectory) and other named directories; a vendor may choose to use a top-level directory here, as the EMC PowerPath software does with `/lib/modules/(kernel-version)/powerpath`
- `/lib/modules/(kernel-version)/weak-updates`
  - modules that are compatible with this kernel but were actually compiled against another, similar kernel in the same family
The _weak-updates_ concept works in tandem with the `extra/` location: the original module is typically installed in the `extra/` directory of the kernel it was compiled against, and every other compatible kernel gets a symlink to it in its own `weak-updates/` directory.
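The lookup order above can be emulated with a short sketch, which helps when reasoning about which copy of a module wins. It is a simplification, not how the real tools work: `find_module` and its `modroot` argument are hypothetical (pass `/lib/modules` on a real system, or a scratch directory to experiment), and the actual decision is made by `depmod` from the `search` and `override` directives in `/etc/depmod.d/`.

```shell
# Hypothetical emulation of the module search order described above.
# Usage: find_module <modroot> <kernel-version> <module-name>
# Prints the first matching .ko path, or returns 1 if none is found.
find_module() {
    local modroot=$1 krel=$2 name=$3 base dir hit
    base="$modroot/$krel"
    # 1) updates/  2) extra/
    for dir in "$base/updates" "$base/extra"; do
        hit=$(find "$dir" -name "$name.ko" 2>/dev/null | head -n 1)
        [ -n "$hit" ] && { echo "$hit"; return 0; }
    done
    # 3) any other directory (kernel/, vendor directories, ...)
    hit=$(find "$base" -path "$base/updates" -prune \
        -o -path "$base/extra" -prune \
        -o -path "$base/weak-updates" -prune \
        -o -name "$name.ko" -print 2>/dev/null | head -n 1)
    [ -n "$hit" ] && { echo "$hit"; return 0; }
    # 4) weak-updates/ last (entries there are usually symlinks)
    hit=$(find "$base/weak-updates" -name "$name.ko" 2>/dev/null | head -n 1)
    [ -n "$hit" ] && { echo "$hit"; return 0; }
    return 1
}
```

Note that with a stock `bfa.ko` still present under `kernel/`, this emulation returns the stock module even when a weak-updates symlink exists; the `override` entries described under Dependencies exist to change exactly that outcome.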
## Functional Examples
Using a Red Hat Enterprise Linux (RHEL) 7 system, we first note that only one kernel is installed:
```
# rpm -qa | grep ^kernel-3
kernel-3.10.0-327.49.1.el7.x86_64
```
In Red Hat's versioning scheme, this release string is read in two parts:
- 3.10.0-327.\* - the suite or "family" of kernel releases
- .49.1.el7 - the specific patched release of this kernel within the family
This scheme implies that any kernel module built for the 3.10.0-327.\* family of kernels _should be_ compatible with every specific kernel in that family; because this cannot be fully guaranteed ahead of time, safety checks exist (more on this below). This server has kmod packages installed that replace some of the stock kernel modules.
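The family/release split can be captured in a couple of hypothetical helpers. The rule they encode (the family is everything up to the first release number after the dash) is an assumption drawn from the version strings in this document, not an official Red Hat parsing rule. Belonging to the same family is only a precondition; the symbol checks described below still decide compatibility.

```shell
# Hypothetical helper: reduce a RHEL kernel release to its family,
# e.g. 3.10.0-327.49.1.el7.x86_64 -> 3.10.0-327
kernel_family() {
    # Keep "<major>.<minor>.<patch>-<first release number>", drop the rest
    printf '%s\n' "$1" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+-[0-9]+).*/\1/'
}

# Hypothetical helper: succeed when two releases share a family
same_family() {
    [ "$(kernel_family "$1")" = "$(kernel_family "$2")" ]
}
```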
### Example: bfa
The `kmod-bfa` package was obtained from a third-party provider for Brocade Fibre Channel adapters.
```
/lib/modules/3.10.0-327.el7.x86_64/extra/bfa:
-rw-r--r--. 1 root root 23431886 Apr 22 2016 bfa.ko
/lib/modules/3.10.0-327.49.1.el7.x86_64/weak-updates/bfa:
lrwxrwxrwx. 1 root root 51 Feb 14 12:54 bfa.ko -> /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko
```
Notice that the real file lives under a kernel that is not installed: it is in the `extra/` directory of the "base" kernel for the family (the first release in the family, in this case RHEL 7.2), and the running kernel's `weak-updates/` directory holds a symlink back to it. This module is weak-updates compatible: it was compiled against kernel 3.10.0-327 but works with kernel 3.10.0-327.49.1 as is, with no modifications needed.
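To see this layout for every weak module at once, a small sketch can walk a kernel's `weak-updates/` tree and report where each symlink points. The function name `weak_link_report` is hypothetical, and it assumes the link targets follow the `/lib/modules/<kernel-version>/extra/...` layout shown above; anything else is reported as `unknown`.

```shell
# Hypothetical helper: list the weak-updates symlinks of one kernel's
# module directory, where each points, and which kernel release the
# target was built for (parsed from a /lib/modules/<krel>/extra/ path).
weak_link_report() {
    local moddir=$1
    find "$moddir/weak-updates" -type l 2>/dev/null | sort |
    while read -r link; do
        target=$(readlink "$link")
        # Empty when the target does not follow the extra/ layout
        built_for=$(printf '%s\n' "$target" |
            sed -nE 's:.*/lib/modules/([^/]+)/extra/.*:\1:p')
        printf '%s -> %s (built for %s)\n' "$link" "$target" "${built_for:-unknown}"
    done
}
```

On the system above, `weak_link_report "/lib/modules/$(uname -r)"` should list the `bfa.ko` link as built for `3.10.0-327.el7.x86_64`.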
### Example: lpfc
The `kmod-lpfc` package is provided by Red Hat in the main RHEL 7 software repository and carries newer upstream code for Emulex Fibre Channel adapters.
```
/lib/modules/3.10.0-327.el7.x86_64/extra/lpfc:
-rw-r--r--. 1 root root 1180268 Sep 5 02:51 lpfc.ko
/lib/modules/3.10.0-327.49.1.el7.x86_64/weak-updates/lpfc:
lrwxrwxrwx. 1 root root 53 Feb 16 15:09 lpfc.ko -> /lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko
```
The layout is exactly that of the previous example: this module is also weak-updates compatible; it was compiled against kernel 3.10.0-327 but works with kernel 3.10.0-327.49.1 as is, with no modifications needed.
## Compatible Modules
Ensuring that a module compiled for one kernel version is compatible with another kernel is key to the system working correctly; the topic deals heavily with compilers, assemblers and linkers, which provide the data needed for the comparison. A compiled binary has a _symbol table_ listing the functions and variables it defines or needs; this applies both to the kernel itself and to any module being loaded into it. On kernels built with symbol versioning (`modversions`, as the RHEL kernels here are), each exported symbol also carries a checksum (CRC) computed from its type signature, so a change to a function's interface changes its CRC.
A module records the CRC of every kernel symbol it expects to use, as well as of the symbols it exports for others (imagine a module using a module). At its simplest, the check asks the target kernel whether the interface the module knows about has changed. If nothing has changed, the module is compatible, provided no internal change has happened that is invisible from the outside. This is the **kABI** in effect: the module is kernel ABI compatible across several compiled kernels.
### Kernel Symbols
Each kernel ships with a table of its exported symbols, stored in the `/boot` directory next to the kernel:
```
# ls -l /boot/symvers-3.10.0-327.49.1.el7.x86_64.gz
-rw-r--r--. 1 root root 252731 Jan 25 11:37 /boot/symvers-3.10.0-327.49.1.el7.x86_64.gz
# zgrep blk_queue_init_tags /boot/symvers-3.10.0-327.49.1.el7.x86_64.gz
0x00a006aa blk_queue_init_tags vmlinux EXPORT_SYMBOL
```
The output above shows that the function `blk_queue_init_tags` is exported for general use (`EXPORT_SYMBOL`) by `vmlinux` (the kernel itself) with the version checksum (CRC) `0x00a006aa`. This function is used as an example throughout; there are many more in use.
The shipped `vmlinuz` file (a compressed, stripped copy of `vmlinux`) does not carry this table in a form userspace tools can readily use, so the table is generated while the kernel package is built and saved as a separate file. The `symvers` file lists every exported symbol of the main kernel and of every module built with it, making it quite a large set of data. If a symbol is exported by a module, that module's name appears where `vmlinux` is shown above.
Also note that some exports are reserved for GPL-compatible modules; they have the `EXPORT_SYMBOL_GPL` type and can only be used by modules that declare a GPL-compatible license.
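The `symvers` layout lends itself to simple field lookups. The sketch below assumes the tab-separated CRC, symbol, exporter and export-type columns shown above, and GNU `zcat -f` (which passes uncompressed input through unchanged); `symvers_lookup` and `symvers_gpl_only` are hypothetical helper names.

```shell
# Hypothetical helper: look up one symbol in a symvers table (.gz or plain).
# Assumes tab-separated columns: CRC, symbol, exporter, export type.
symvers_lookup() {
    local file=$1 sym=$2
    zcat -f "$file" | awk -F '\t' -v s="$sym" \
        '$2 == s { printf "crc=%s exporter=%s type=%s\n", $1, $3, $4 }'
}

# Hypothetical helper: list the GPL-only exports in a symvers table.
symvers_gpl_only() {
    zcat -f "$1" | awk -F '\t' '$4 == "EXPORT_SYMBOL_GPL" { print $2 }'
}
```

For example, `symvers_lookup /boot/symvers-$(uname -r).gz blk_queue_init_tags` should print `crc=0x00a006aa exporter=vmlinux type=EXPORT_SYMBOL` on the system examined above.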
### Module Symbols
A module is nothing fancier than an ELF object file built specifically to be linked into the running kernel (a relocatable object rather than a shared library). As such, the normal symbol-table tools work on it, for example GNU `nm`:
```
# nm /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko | grep blk_queue_init_tags
U blk_queue_init_tags
# nm /lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko | grep blk_queue_init_tags
U blk_queue_init_tags
```
As shown, this is not very useful: `U` only says that the symbol is undefined in the module and must be supplied by the kernel, not which version of it the module expects. Those expected versions are stored separately in the module (in its `__versions` section), and `modprobe --dump-modversions` prints them as `CRC<TAB>symbol` pairs. A bit of `sed` normalizes the CRC to `0x` followed by eight hex digits, stripping any extra zero padding, so it can be compared with the kernel's table:
```
# modprobe --dump-modversions /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko \
| sed -r -e 's:^0x0*([0-9a-f]{8}\t.*):0x\1:' | grep blk_queue_init_tags
0x00a006aa blk_queue_init_tags
# modprobe --dump-modversions /lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko \
| sed -r -e 's:^0x0*([0-9a-f]{8}\t.*):0x\1:' | grep blk_queue_init_tags
0x00a006aa blk_queue_init_tags
```
This shows that the kernel exports `blk_queue_init_tags` with CRC `0x00a006aa`, and that each _weak module_ (compiled for another kernel) expects that same CRC: the function's interface has not changed. From here, all that's left is to put every symbol the module uses or exports through the same kABI check.
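Checking every symbol is mechanical once both sides are reduced to `CRC<TAB>symbol` lines. The sketch below is a hypothetical `classify_symbols` helper, not part of any Red Hat tooling; unlike the `join` comparison described under Weak Modules, it matches by symbol name so it can tell a changed CRC from a missing symbol. It assumes tab-separated input and that both sides print CRCs with the same zero padding.

```shell
# Hypothetical helper: classify each symbol a module requires.
# $1: kernel table, "CRC<TAB>symbol[<TAB>...]" lines (e.g. zcat'd symvers)
# $2: module table, "CRC<TAB>symbol" lines (e.g. modprobe --dump-modversions)
# Prints "ok", "changed" or "missing" followed by the symbol name.
classify_symbols() {
    awk -F '\t' '
        NR == FNR { provided[$2] = $1; next }   # first file: kernel exports
        !($2 in provided)  { print "missing", $2; next }
        provided[$2] != $1 { print "changed", $2; next }
                           { print "ok", $2 }
    ' "$1" "$2"
}
```

On a live system the inputs could come from process substitution, for example `classify_symbols <(zcat /boot/symvers-$(uname -r).gz) <(modprobe --dump-modversions "$module")`.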
## Methodology
There are several steps to integrating a weak-updates kernel module with the system and verifying that it is compatible with a given target kernel. Each kernel is checked on its own, so one kernel in a family may use the module (a symlink in its `weak-updates/` directory points to the original file) while another does not (no symlink exists).
### Dependencies
Dependencies must be handled first. Because the search order ranks the stock kernel directories above `weak-updates/`, `depmod` has to be told to prefer the weak-update copy of the module; otherwise the module dependency data (including that of any other module using this one) would still point at the stock module, whose symbols may differ, causing a cascaded incompatibility by accident.
The entity shipping the module creates a file in `/etc/depmod.d/` with an `override` directive, like so:
```
# cat /etc/depmod.d/bfa.conf
override bfa 3.10.0-* weak-updates/bfa
# cat /etc/depmod.d/lpfc.conf
override lpfc 3.10.0-327.* weak-updates/lpfc
```
This tells `depmod` to use the `weak-updates/bfa` copy of `bfa`, if one is found, for all kernels in the 3.10.0-\* suite (all of RHEL 7 in this example), while the `lpfc` pattern is narrowed to 3.10.0-327.\* kernels only, as an alternative example.
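Whether an `override` line applies depends on how its kernel-version field is matched. The sketch below assumes, as recent kmod releases of `depmod` appear to do, that the field is an unanchored POSIX extended regular expression, with a lone `*` meaning every kernel; under that reading `.` and `*` carry regex rather than glob meaning, which happens to cover the examples above. `kver_matches` is a hypothetical name, and the `depmod.d(5)` man page on the target system is the authority if it disagrees.

```shell
# Hypothetical helper: would an override's kernel-version field apply?
# Assumption: the field is an unanchored extended regex; "*" alone = all.
kver_matches() {
    local pattern=$1 krel=$2
    [ "$pattern" = "*" ] && return 0
    printf '%s\n' "$krel" | grep -Eq -- "$pattern"
}
```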
The entity shipping the module then runs the following command after the module has been added (via an RPM post-install scriptlet, etc.); in this example the module was compiled for 3.10.0-327, so `depmod` is run for that kernel version:
```
# depmod -aeF "/boot/System.map-3.10.0-327.el7.x86_64" "3.10.0-327.el7.x86_64"
```
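This regenerates the module dependency files (such as `modules.dep`) under `/lib/modules/3.10.0-327.el7.x86_64/`, now including the new module in `extra/`. The `-F` option points `depmod` at `/boot/System.map-3.10.0-327.el7.x86_64` to resolve kernel symbols, and `-e` reports any symbols the modules need that cannot be resolved; `System.map` itself is only read, not modified.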
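After `depmod` has run, one way to see which copy of a module it settled on is to look the module up in that kernel's `modules.dep`. `moddep_path` is a hypothetical helper; it assumes the usual `path: dependencies` line format and uncompressed `.ko` files, as on RHEL 7. For the running kernel, `modinfo -n bfa` reports the chosen file directly.

```shell
# Hypothetical helper: print the path recorded for a module in modules.dep.
# Lines look like "path/to/mod.ko: dep1.ko dep2.ko".
moddep_path() {
    local depfile=$1 name=$2
    awk -F ':' -v n="$name.ko" '{
        base = $1
        sub(/.*\//, "", base)      # strip leading directories
        if (base == n) print $1
    }' "$depfile"
}
```

For example: `moddep_path /lib/modules/3.10.0-327.49.1.el7.x86_64/modules.dep bfa`.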
### Weak Modules
The second step is to create the compatibility symlinks in the `weak-updates/` subdirectories of every installed kernel that is actually kABI compatible with the new module. From the outside, this is all wrapped in a script that the entity shipping the module can simply call (again, in its package post-install):
```
# weak-modules --add-modules /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko
...or:
# weak-modules --add-modules /lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko
```
> The `weak-modules` script also rebuilds the _initramfs_ files in `/boot` for the kernels it adjusted, so the new module setup is available during stage 1 boot. This is the perceived lag when installing a kmod package: after RPM has placed the files on disk, the script is updating those initramfs images.
The process inside the script can be broken down into these basic steps:
1. Reduce the kernel's symbol table to `CRC symbol` pairs in a format that `join` can use later (the script loops over every installed kernel):
```
# krel=$(uname -r)
# zcat /boot/symvers-$krel.gz \
| sed -r -ne 's:^(0x[0]*[0-9a-f]{8}\t[0-9a-zA-Z_.]+)\t.*:\1:p' \
> symvers-$krel
```
2. If there are any (a kernel may have none), extract the same information from the modules in the target kernel's `extra/` directory; the `__crc_` symbols that `nm` lists carry the CRCs of the symbols those modules export. This also loops over every installed kernel, and the resulting file may be zero bytes if `extra/` is empty:
```
# krel=$(uname -r)
# find /lib/modules/$krel/extra -name '*.ko' \
| xargs -r nm \
| sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p' \
> addon-symvers-$krel
```
3. Do the same as above, but **specifically for the kernel the module was built against**, which is recorded as the `vermagic` string in the module's metadata:
```
# modinfo -F vermagic bfa lpfc
3.10.0-327.el7.x86_64 SMP mod_unload modversions
3.10.0-327.el7.x86_64 SMP mod_unload modversions
# module_krel=3.10.0-327.el7.x86_64
# find /lib/modules/$module_krel/extra -name '*.ko' \
| xargs -r nm \
| sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p' \
> extra-symvers-$module_krel
```
4. Take the data from the above steps and simply combine and sort it for use:
```
# sort -u symvers-$krel \
extra-symvers-$module_krel \
addon-symvers-$krel \
> all-symvers-$krel-$module_krel
```
5. Now extract the symbol versions required by the new module being added to the system:
```
# module="/lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko"
...or:
# module="/lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko"
# /sbin/modprobe --dump-modversions "$module" \
| sed -r -e 's:^0x0*([0-9a-f]{8}\t.*):0x\1:' \
| sort -u \
> modvers
```
6. Finally, use `join` in reverse mode (think `grep -v`): `-v 2` prints every line of the second file (the module's requirements) whose CRC does **not** appear among all the symbols known to be provided:
```
# join -j 1 -v 2 all-symvers-$krel-$module_krel modvers
```
This set of steps tells us whether the symbols the incoming module requires match what the target kernel (plus any other modules) actually provides. Any output from the last step means something differs, either a changed CRC or a missing symbol, and the module is not compatible with that kernel. No output means it is fully compatible and can safely be symlinked into the target kernel's `weak-updates/` directory.
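Steps 4 and 6 reduce to a small reusable check. `kabi_check` is a hypothetical wrapper, not a function from the `weak-modules` script: it takes the module's requirements file from step 5 followed by one or more kernel-side files from steps 1 to 3, and returns success only when `join` finds nothing unmatched. It assumes every CRC has the same fixed width (`0x` plus eight hex digits, as in this document) and forces the C locale so `sort` and `join` agree on ordering.

```shell
# Hypothetical wrapper around steps 4 and 6.
# Usage: kabi_check <modvers> <kernel-side-file>...
# Prints each required symbol whose CRC is not provided; returns 0 only
# when nothing is printed (i.e. safe to symlink), 1 otherwise.
kabi_check() {
    local modvers=$1 tmp out
    shift
    tmp=$(mktemp -d) || return 2
    # Step 4: merge every kernel-side table
    LC_ALL=C sort -u "$@" > "$tmp/all-symvers"
    LC_ALL=C sort -u "$modvers" > "$tmp/modvers"
    # Step 6: -v 2 keeps module lines whose CRC has no match in file 1
    out=$(LC_ALL=C join -j 1 -v 2 "$tmp/all-symvers" "$tmp/modvers")
    rm -rf "$tmp"
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
        return 1
    fi
    return 0
}
```

With the files from the steps above: `kabi_check modvers symvers-$krel extra-symvers-$module_krel addon-symvers-$krel && echo compatible`.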
## Incompatible Example
Using the methodology above, we can examine a kernel module that is incompatible. This specific version of `kmod-bna` has 4 symbols whose CRCs do not match a specific (older) kernel that is also installed. Each of the steps above is run in order:
```
The setup:
# rpm -qa | egrep "^(kernel-3|kmod-bna)"
kernel-3.10.0-229.20.1.el7.x86_64
kernel-3.10.0-327.49.1.el7.x86_64
kmod-bna-3.2.7.0-0.el7.x86_64
Step 1:
# krel=3.10.0-229.20.1.el7.x86_64
# zcat /boot/symvers-$krel.gz \
> | sed -r -ne 's:^(0x[0]*[0-9a-f]{8}\t[0-9a-zA-Z_.]+)\t.*:\1:p' \
> > symvers-$krel
Step 2:
# find /lib/modules/$krel/extra -name '*.ko' \
> | xargs nm \
> | sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p' \
> > addon-symvers-$krel
Step 3:
# modinfo -F vermagic bna | cut -f1 -d' '
3.10.0-327.el7.x86_64
# module_krel=3.10.0-327.el7.x86_64
# find /lib/modules/$module_krel/extra -name '*.ko' \
> | xargs nm \
> | sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p' \
> > extra-symvers-$module_krel
Step 4:
# sort -u symvers-$krel \
> extra-symvers-$module_krel \
> addon-symvers-$krel \
> > all-symvers-$krel-$module_krel
Step 5:
# module="/lib/modules/3.10.0-327.el7.x86_64/extra/bna/bna.ko"
# /sbin/modprobe --dump-modversions "$module" \
> | sed -r -e 's:^(0x[0]*[0-9a-f]{8}\t.*):\1:' \
> | sort -u \
> > modvers
Step 6:
# join -j 1 -v 2 all-symvers-$krel-$module_krel modvers
0x7efd609f __netif_napi_add
0x905307be napi_complete_done
0xd93737a0 napi_disable
0xe1d1af76 __dev_kfree_skb_any
```
The methodology shows 4 symbols whose CRCs do not match between the older kernel and this newer module, making them incompatible for use together; that kernel therefore gets no weak-updates symlink to this module.
## Caveats
A module deemed compatible by the _weak-updates_ methodology has only been checked from the outside, by comparing symbol CRCs; there is no way to transparently test that the module works at runtime, only that it can be loaded into the target kernel without error. A module can still fail because of an internal change: the kernel engineers patching a given kernel may have changed behavior behind an unchanged interface, causing breakage.
Any existing weak module must be tested against a newly updated kernel to ensure all functionality is retained.
## References
- <https://lists.fedoraproject.org/pipermail/devel/2006-August/088293.html>