20240723: Installing DOCA-OFED on a ConnectX-4 Lx at home
MLNX_OFED has transitioned into DOCA-Host, and now available as DOCA-OFED (learn about DOCA-Host profiles here).
MLNX_OFED last standalone release is October 2024 Long Term Support (3 years). Starting January 2025 all new features will be included in DOCA-OFED only.
So I'd like to move to DOCA-OFED if I can.
- Pre-check:
Device Type: ConnectX4LX
Part Number: MCX4121A-XCA_Ax
Description: ConnectX-4 Lx EN network interface card; 10GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
PSID: MT_2420110004
PCI Device Name: 0000:01:00.0
Base MAC: b83fd20fd7ee
Versions: Current Available
FW 14.32.1010 N/A
PXE 3.6.0502 N/A
UEFI 14.25.0017 N/A
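The pre-check output above has the layout of `mlxfwmanager --query` output. That is an assumption: the post doesn't say which command produced it. Below is a minimal sketch for pulling the current firmware version out of such text, for example to compare it before and after an install. `fw_version` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the "Current" FW version from text in the
# mlxfwmanager --query layout, read on stdin.
# Assumes a line whose first field is "FW" and whose second field is the
# current version (the third field is "Available").
fw_version() {
    awk '$1 == "FW" { print $2; exit }'
}

# Example (needs the MFT tools and root):
# sudo mlxfwmanager --query | fw_version
```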
DOCA
Here is the manual.
About doca-host, the manual says:
These files contain the following components suitable for their respective OS version.
DOCA Devel v2.7.0
DOCA Runtime v2.7.0
DOCA Extra v2.7.0
DOCA OFED v2.7.0
So installing this should be enough.
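The repository URL in the commands below follows a `<version>/<distro><release>/<arch>/` pattern. The sketch below assembles it from the local system. It assumes that other DOCA versions and distributions use the same layout on linux.mellanox.com, which is not verified here. `doca_repo_url` and `distro_tag` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: build a DOCA apt repo URL following the pattern of the
# 2.7.0 / ubuntu22.04 / x86_64 URL used in this post.
# ASSUMPTION: other version/distro/arch combinations use the same layout.
doca_repo_url() {
    version="$1"   # e.g. 2.7.0
    distro="$2"    # e.g. ubuntu22.04
    arch="$3"      # e.g. x86_64
    printf 'https://linux.mellanox.com/public/repo/doca/%s/%s/%s/\n' \
        "$version" "$distro" "$arch"
}

# Hypothetical helper: derive a tag like "ubuntu22.04" from an os-release file
# (defaults to /etc/os-release). The subshell keeps ID/VERSION_ID from
# leaking into the caller's environment.
distro_tag() {
    osrel="${1:-/etc/os-release}"
    ( . "$osrel" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Example:
# DOCA_URL=$(doca_repo_url 2.7.0 "$(distro_tag)" "$(uname -m)")
```

On Ubuntu 22.04 / x86_64, this reproduces the URL used below.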
sudo su
export DOCA_URL="https://linux.mellanox.com/public/repo/doca/2.7.0/ubuntu22.04/x86_64/"
curl https://linux.mellanox.com/public/repo/doca/GPG-KEY-Mellanox.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub
echo "deb [signed-by=/etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub] $DOCA_URL ./" > /etc/apt/sources.list.d/doca.list
exit
sudo apt-get update
sudo apt-get -y install doca-all
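The same steps can also be done without an interactive root shell from `sudo su`. In this sketch, each privileged write goes through `sudo tee` instead. The URLs and paths are the ones above. `doca_list_entry` and `install_doca` are hypothetical names, and `install_doca` is deliberately not called.

```shell
#!/bin/sh
# Same repo URL and keyring path as the steps above.
DOCA_URL="https://linux.mellanox.com/public/repo/doca/2.7.0/ubuntu22.04/x86_64/"
KEY=/etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub

# Hypothetical helper: print the apt source line ($1 = repo URL, $2 = keyring),
# identical to what the `echo` above writes.
doca_list_entry() {
    printf 'deb [signed-by=%s] %s ./\n' "$2" "$1"
}

# Hypothetical wrapper: the privileged steps, each run through sudo.
install_doca() {
    # Fetch and de-armor the signing key; `sudo tee` writes the root-owned file.
    curl -fsSL https://linux.mellanox.com/public/repo/doca/GPG-KEY-Mellanox.pub \
        | gpg --dearmor | sudo tee "$KEY" > /dev/null
    doca_list_entry "$DOCA_URL" "$KEY" \
        | sudo tee /etc/apt/sources.list.d/doca.list > /dev/null
    sudo apt-get update
    sudo apt-get -y install doca-all
}

# Uncomment to run for real (needs network access and sudo rights):
# install_doca
```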
...
knem.ko:
- Uninstallation
- Deleting from: /lib/modules/5.15.0-100-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
depmod...
Module knem-1.1.4.90mlnx2 for kernel 5.15.0-97-generic (x86_64).
Before uninstall, this module version was ACTIVE on this kernel.
knem.ko:
- Uninstallation
- Deleting from: /lib/modules/5.15.0-97-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
...
Setting up mlnx-ofed-kernel-dkms (24.04.OFED.24.04.0.7.0.1-1) ...
Loading new mlnx-ofed-kernel-24.04.OFED.24.04.0.7.0.1 DKMS files...
First Installation: checking all kernels...
Building for 5.15.0-97-generic and 5.15.0-100-generic
Building for architecture x86_64
Building initial module for 5.15.0-97-generic
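The log shows DKMS building the modules for both installed kernels, 5.15.0-97 and 5.15.0-100. The sketch below checks whether a given DKMS module ended up "installed" for a given kernel, normally the running one. It matches `dkms status` text loosely because the exact line format differs between dkms versions. `dkms_installed_for` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `dkms status` text on stdin has a line
# mentioning module $1, kernel $2 and the word "installed".
# Substring matching keeps it tolerant of different dkms output formats.
dkms_installed_for() {
    module="$1"
    kernel="$2"
    grep -F "$module" | grep -F "$kernel" | grep -q 'installed'
}

# Example:
# dkms status | dkms_installed_for mlnx-ofed-kernel "$(uname -r)" && echo ok
```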
sudo /etc/init.d/openibd restart
sudo mst restart
sudo mst start
sudo systemctl status rshim
sudo /etc/init.d/openibd restart
That should do it, but...
ubuntu@optiplex:~$ sudo /etc/init.d/openibd restart
Unloading mlx_compat [FAILED]
rmmod: ERROR: Module mlx_compat is in use by: nvme_core nvme_fabrics
ubuntu@optiplex:~$ ls -alt /usr/bin/mlxfwmanager
-rwxr-xr-x 1 root root 10887728 Apr 25 15:27 /usr/bin/mlxfwmanager
ubuntu@optiplex:~$ sudo mlxconfig -d /dev/mst/mt4117_pciconf0 q
Device #1:
----------
Device type: ConnectX4LX
Name: MCX4121A-XCA_Ax
Description: ConnectX-4 Lx EN network interface card; 10GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device: /dev/mst/mt4117_pciconf0
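`rmmod` refused to unload `mlx_compat` because `nvme_core` and `nvme_fabrics` still use it. Presumably the DOCA-OFED packages rebuilt the NVMe modules against `mlx_compat`. If the system disk is NVMe, those modules cannot be unloaded while the machine is running, which would make a reboot the practical way out. The kernel lists a module's users under `/sys/module/<name>/holders/`. The sketch below prints them. `module_holders` is a hypothetical helper, and its optional second argument only exists so it can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Hypothetical helper: list the modules currently holding module $1,
# read from <sysfs root>/module/<name>/holders (root defaults to /sys).
module_holders() {
    mod="$1"
    root="${2:-/sys}"
    dir="$root/module/$mod/holders"
    [ -d "$dir" ] || return 1   # module not loaded (no such directory)
    ls "$dir"
}

# Example:
# module_holders mlx_compat
```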
Repeatedly resetting the mlx side doesn't seem to get anywhere, so I'm rebooting:
# sudo shutdown -r now
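After the reboot, one way to confirm that the stack came up is to check that the relevant modules are loaded. The sketch below reads `/proc/modules`, where the first field of each line is the module name. `module_loaded` is a hypothetical helper. In the example, `mlx_compat` comes from the error above, and `mlx5_core` is the driver used by ConnectX-4 Lx.

```shell
#!/bin/sh
# Hypothetical helper: succeed if module $1 appears in /proc/modules
# (path overridable via $2, e.g. for testing).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example after the reboot:
# for m in mlx_compat mlx5_core; do
#     if module_loaded "$m"; then echo "$m: loaded"; else echo "$m: missing"; fi
# done
```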