Chapter 10. Allocating DMA Buffers on 64-bit Platforms

NVIDIA GPUs have limits on how much physical memory they can address. This directly impacts DMA buffers: a DMA buffer allocated in physical memory that the NVIDIA GPU cannot address cannot be used (or its address may be truncated, resulting in bad memory accesses).

All pre-PCI Express GPUs and non-native PCI Express GPUs (often known as bridged GPUs) are limited to 32 bits of physical address space, which corresponds to 4 GB of memory. On a system with more than 4 GB of memory, allocating usable DMA buffers can be a problem. Native PCI Express GPUs can address more than 32 bits of physical address space and do not experience the same problems.
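The 32-bit limit described above can be illustrated with a small sketch. It is not driver code; the function names and the example address are hypothetical and chosen only to show why a buffer above 4 GB is unusable and what address truncation would do.

```python
# Illustrative sketch only; these helpers are hypothetical, not a driver API.

LIMIT_32BIT = 1 << 32  # 4 GB: first address outside a 32-bit physical address space


def buffer_is_addressable(phys_addr: int, size: int, limit: int = LIMIT_32BIT) -> bool:
    """Return True if the whole buffer [phys_addr, phys_addr + size) lies below limit."""
    if phys_addr < 0 or size <= 0:
        raise ValueError("phys_addr must be non-negative and size must be positive")
    return phys_addr + size <= limit


def truncate_to_32_bits(phys_addr: int) -> int:
    """Keep only the low 32 bits, as a 32-bit-limited device would see the address."""
    return phys_addr & 0xFFFFFFFF


if __name__ == "__main__":
    addr = 0x1_2000_0000  # hypothetical buffer at 4.5 GB
    print(buffer_is_addressable(addr, 4096))   # False: above the 4 GB limit
    print(hex(truncate_to_32_bits(addr)))      # 0x20000000: points at unrelated memory
```

The truncated address refers to a different physical location (here, at 512 MB), which is how a bad memory access can arise.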

Newer kernels provide a simple way to allocate memory that is guaranteed to reside within the 32-bit physical address space. Linux 2.6.15 and later provide this functionality with the __GFP_DMA32 interface. Earlier kernels instead provide a software I/O TLB (SWIOTLB) on Intel's EM64T platform and IOMMU support on AMD's AMD64 platform.

Unfortunately, some problems exist with both of these earlier interfaces. Early implementations of the Linux SWIOTLB set aside a very small amount of memory for the memory pool (only 4 MB). Also, when this memory pool is exhausted, some SWIOTLB implementations forcibly panic the kernel. The same is true of some implementations of the IOMMU interface.

The NVIDIA Linux driver does not support the SWIOTLB. NVIDIA recommends that users of Intel's EM64T platform upgrade to Linux 2.6.15 or a more recent Linux kernel.

On AMD's AMD64 platform, the size of the IOMMU can be configured in the system BIOS or, if no IOMMU BIOS option is available, using the 'iommu=memaper' kernel parameter. This kernel parameter expects an order and instructs the Linux kernel to create an IOMMU of size 32 MB * 2^order (that is, 32 MB shifted left by order) overlapping physical memory. If the system's default IOMMU is smaller than 64 MB, the Linux kernel automatically replaces it with a 64 MB IOMMU.
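As a rough sketch of the arithmetic, the following reads the IOMMU size as 32 MB shifted left by the order (32 MB * 2^order), which matches the kernel's boot-option documentation for 'memaper'. The helper names are hypothetical, and the 'iommu=memaper=<order>' spelling of the parameter is taken from that kernel documentation rather than from this chapter; verify it against the documentation for the kernel in use.

```python
# Illustrative sketch only; helper names are hypothetical.

MB = 1 << 20


def memaper_size_bytes(order: int) -> int:
    """IOMMU size for a given order, read as 32 MB shifted left by order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (32 * MB) << order


def smallest_order_for(size_mb: int) -> int:
    """Smallest order whose IOMMU size is at least size_mb megabytes."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    order = 0
    while memaper_size_bytes(order) < size_mb * MB:
        order += 1
    return order


if __name__ == "__main__":
    for order in range(4):
        print(order, memaper_size_bytes(order) // MB, "MB")
    # Assumed parameter spelling for a 128 MB IOMMU:
    print(f"iommu=memaper={smallest_order_for(128)}")
```

Under this reading, order 1 gives the 64 MB default mentioned above and order 2 gives 128 MB.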

To reduce the risk of stability problems as a result of IOMMU space exhaustion on the X86-64 platform, the NVIDIA Linux driver internally limits its use of these interfaces. By default, the driver will not use more than 60 MB of IOMMU space, leaving at least 4 MB for the rest of the system (assuming a 64 MB IOMMU).

This limit can be adjusted with the 'NVreg_RemapLimit' NVIDIA kernel module option. Specifically, if the IOMMU is larger than 64 MB, the limit can be adjusted to take advantage of the additional space. The 'NVreg_RemapLimit' option expects the size argument in bytes.

NVIDIA recommends leaving 4 MB available for the rest of the system when changing the limit. For example, if the internal limit is to be relaxed to account for a 128 MB IOMMU, the recommended remap limit is 124 MB. This remap limit can be specified by passing 'NVreg_RemapLimit=0x7c00000' to the NVIDIA kernel module.
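The recommended value can be derived for any IOMMU size with a short calculation. The sketch below is only an illustration of that arithmetic; the helper names are hypothetical, and only the 'NVreg_RemapLimit' option name and the 4 MB reserve come from the text above.

```python
# Illustrative sketch only; helper names are hypothetical.

MB = 1 << 20
RESERVED_MB = 4  # space NVIDIA recommends leaving for the rest of the system


def remap_limit_bytes(iommu_mb: int, reserved_mb: int = RESERVED_MB) -> int:
    """Remap limit in bytes: the IOMMU size minus the reserved space."""
    if iommu_mb <= reserved_mb:
        raise ValueError("IOMMU must be larger than the reserved space")
    return (iommu_mb - reserved_mb) * MB


def remap_limit_option(iommu_mb: int) -> str:
    """Format the limit as an NVreg_RemapLimit module option in hexadecimal."""
    return f"NVreg_RemapLimit={remap_limit_bytes(iommu_mb):#x}"


if __name__ == "__main__":
    print(remap_limit_option(64))   # NVreg_RemapLimit=0x3c00000 (60 MB, the default)
    print(remap_limit_option(128))  # NVreg_RemapLimit=0x7c00000 (124 MB)
```

The resulting string is what would be passed to the NVIDIA kernel module when it is loaded; how module options are supplied depends on the distribution.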

Also see the section 'The X86-64 platform (AMD64/EM64T) and early Linux 2.6 kernels'.