Chapter 42. GPUDirect RDMA Peer Memory Client

Table of Contents

Background
Usage
Module Parameters
Known Issues

Background

GPUDirect RDMA (Remote Direct Memory Access) is a technology that enables a direct path for data exchange between the GPU and a third-party peer device using standard features of PCI Express.

The NVIDIA GPU driver package includes a kernel module, nvidia-peermem.ko, which gives Mellanox InfiniBand-based HCAs (Host Channel Adapters) direct peer-to-peer read and write access to the NVIDIA GPU's video memory. This allows GPUDirect RDMA-based applications to use GPU computing power with the RDMA interconnect without copying data to host memory.

This capability is supported with Mellanox ConnectX-3 VPI or newer adapters. It works with both InfiniBand and RoCE (RDMA over Converged Ethernet) technologies.

Mellanox OFED (Open Fabrics Enterprise Distribution), or MOFED, introduces an API called PeerDirect between the InfiniBand Core and peer memory clients such as NVIDIA GPUs. For details, see https://community.mellanox.com/s/article/howto-implement-peerdirect-client-using-mlnx-ofed.

The nvidia-peermem.ko module registers the NVIDIA GPU with the InfiniBand subsystem by using peer-to-peer APIs provided by the NVIDIA GPU driver.

This module, originally maintained by Mellanox on GitHub, is now included with the NVIDIA Linux GPU driver. The original GitHub project at https://github.com/Mellanox/nv_peer_memory should be considered deprecated and only critical bugs will be addressed for existing installations.

Usage

As a prerequisite for loading and using nvidia-peermem.ko, the kernel must support RDMA peer memory, either through additional kernel patches or through the Mellanox OFED (MOFED) package.

The nv_peer_mem module from the GitHub project may already be installed and loaded on the system. Installing nvidia-peermem.ko does not affect the functionality of the existing nv_peer_mem module. However, to load and use nvidia-peermem.ko, users must first disable the nv_peer_mem service. Uninstalling the nv_peer_mem package is also encouraged, to avoid any conflict with nvidia-peermem.ko, since only one of the two modules can be loaded at any time.

Stop the nv_peer_mem service:

    # service nv_peer_mem stop

Check if nv_peer_mem.ko is still loaded after stopping the service:

    # lsmod | grep nv_peer_mem

If nv_peer_mem.ko is still loaded, unload it with:

    # rmmod nv_peer_mem
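The stop, check, and unload steps above can be combined into a small script. The following is a minimal sketch, not part of the driver package: module_loaded and disable_nv_peer_mem are hypothetical helper names, the module listing is read from /proc/modules rather than from lsmod, and the commands are only printed unless DRY_RUN=0 is set (running them for real requires root).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module appears in a module listing.
# $1 = module name as the kernel reports it (with underscores)
# $2 = listing file in /proc/modules format (defaults to /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Hypothetical helper: stop the legacy service, then unload the module
# only if it is still present. Commands are echoed unless DRY_RUN=0.
# $1 = optional listing file, passed through to module_loaded
disable_nv_peer_mem() {
    run=echo
    [ "${DRY_RUN:-1}" = 0 ] && run=
    $run service nv_peer_mem stop
    if module_loaded nv_peer_mem "$1"; then
        $run rmmod nv_peer_mem
    fi
}
```

Calling disable_nv_peer_mem with no arguments prints the commands that would be run on the current system; DRY_RUN=0 disable_nv_peer_mem executes them.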

Uninstall the nv_peer_mem package:

For DEB-based distributions:

    # dpkg -P nvidia-peer-memory

    # dpkg -P nvidia-peer-memory-dkms

For RPM-based distributions:

    # rpm -e nvidia_peer_memory

After ensuring kernel support and installing the GPU driver, load nvidia-peermem.ko by running the following command as root:

    # modprobe nvidia-peermem

Note: If the NVIDIA GPU driver is installed before MOFED, the GPU driver must be uninstalled and installed again to make sure nvidia-peermem.ko is compiled with the RDMA APIs that are provided by MOFED.

Module Parameters

peerdirect_support: this parameter takes the following integer values:
  • 0, which is the default and is appropriate for a kernel that has the PeerDirect APIs roughly corresponding to MOFED 5.1 or newer.

  • 1, which is required in combination with the legacy PeerDirect APIs, as currently shipping in MOFED 5.0 and older releases, notably in MOFED LTS.

For reference, in the legacy PeerDirect APIs, the peer_memory_client structure declared in peer_mem.h has the two extra function pointers shown below:

    void* (*get_context_private_data)(u64 peer_id);
    void  (*put_context_private_data)(void *context);

Note that MOFED LTS, as well as MOFED 5.0 and previous releases, ships with the legacy PeerDirect APIs. For example, when using MOFED LTS, GPUDirect RDMA support for the Mellanox HCAs will not work correctly unless peerdirect_support is set to one.

For MOFED 5.1 or newer, by contrast, the default value of zero is appropriate, so no special action is needed.
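The version rule above can be sketched as a small shell helper. This is an illustration under stated assumptions rather than a tool shipped with the driver: peerdirect_support_for is a hypothetical name, the input is assumed to be a MOFED version string of the form major.minor-..., and LTS releases are assumed to carry version numbers older than 5.1, which should be verified for the release in use.

```shell
#!/bin/sh
# Hypothetical helper: map a MOFED version string such as "5.1-2.5.8.0"
# to the peerdirect_support value described above.
peerdirect_support_for() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}  # leading digits of the remainder
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 1 ]; }; then
        echo 0   # MOFED 5.1 or newer: default value
    else
        echo 1   # MOFED 5.0 and older: legacy PeerDirect APIs
    fi
}

# Example: print (not run) the load command for an assumed 4.9 release.
echo "modprobe nvidia-peermem peerdirect_support=$(peerdirect_support_for 4.9-2.2.4.0)"
```

On a system with MOFED installed, the version string can typically be taken from the output of ofed_info -s. To keep the setting across reboots, the usual modprobe mechanism is a line such as "options nvidia-peermem peerdirect_support=1" in a file under /etc/modprobe.d/; the file name is left to the administrator.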

Known Issues

  • Currently, there is no service to automatically load nvidia-peermem.ko. Users need to load the module manually.

  • When loading nvidia-peermem.ko on a kernel with legacy PeerDirect APIs, the module parameter peerdirect_support has to be set to one.

  • The PeerDirect APIs shipping in MOFED releases 5.1 and later are affected by a lock inversion bug which may lead to a kernel-side deadlock. This is tracked by the NVIDIA-internal reference number 2696789. The PeerDirect APIs in newer MOFED releases belonging to some branches, like 5.3-1.0.0.1.43, offer an opt-in feature to mitigate that problem. Starting from this release, the nvidia-peermem.ko kernel module explicitly enables that feature, unless peerdirect_support is set to one.
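For the first issue in this list, one possible workaround on distributions that use systemd is a modules-load.d entry, which asks the system to load the named modules at boot. The sketch below is an assumption about the local setup, not something the driver installs: enable_peermem_autoload is a hypothetical helper, and the file name nvidia-peermem.conf is chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a modules-load.d entry for nvidia-peermem.
# $1 = target directory (defaults to /etc/modules-load.d, which requires root)
enable_peermem_autoload() {
    dir=${1:-/etc/modules-load.d}
    # modules-load.d files list one module name per line; "#" starts a comment.
    printf '%s\n' '# Load the GPUDirect RDMA peer memory client at boot' \
                  'nvidia-peermem' > "$dir/nvidia-peermem.conf"
}
```

Passing a scratch directory as the argument makes it possible to inspect the generated file before installing it. If peerdirect_support must be set to one, that option still has to be supplied separately, since a modules-load.d entry names only the module.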