Many PCIe devices are limited in which memory addresses they can access for DMA, based on the number of address lines (bits) they dedicate to memory addressing. This can cause problems if the host system has memory mapped to addresses beyond what the device supports: if a device is given a buffer at such an address, the address may be truncated and the device will access the wrong memory location.
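As a rough illustration, a device with n usable address bits can reach physical addresses in [0, 2^n). The sketch below models what happens when a higher address is handed to such a device and the upper bits are silently dropped. The function names and the 39-bit (512 GB) example device are assumptions for illustration; real hardware may fail in other ways.

```python
GB = 1 << 30  # the text's "GB" is a binary gigabyte (2**30 bytes)


def max_dma_address(address_bits: int) -> int:
    """Highest physical address reachable with `address_bits` address lines."""
    return (1 << address_bits) - 1


def truncate_address(phys_addr: int, address_bits: int) -> int:
    """Model a device that silently drops the address bits it does not have."""
    return phys_addr & max_dma_address(address_bits)


# Hypothetical device with 39 address bits, i.e. 512 GB of reach.
bits = 39
buffer_addr = 512 * GB + 0x1000  # a buffer placed just above 512 GB

print(hex(max_dma_address(bits)))                # 0x7fffffffff
print(hex(truncate_address(buffer_addr, bits)))  # 0x1000: the wrong location
```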
Note that because certain system resources, such as ACPI tables and PCI I/O regions, are mapped to address ranges below the 4 GB boundary, the RAM installed in x86/x86-64 systems cannot necessarily be mapped contiguously. In addition, system firmware is free to place RAM wherever it, or its user configuration, chooses. As a result, it is common for systems to have RAM mapped outside of the address range [0, RAM_SIZE], where RAM_SIZE is the amount of RAM installed in the system.
For example, it is common for a system with 512 GB of RAM installed to have physical addresses up to ~513 GB. In this scenario, because some RAM lies above the 512 GB that a GPU with a 512 GB addressing capability can reach, the driver must fall back to the 4 GB DMA zone for that GPU.
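The scenario above can be modeled with a hypothetical memory map in which the 3 GB to 4 GB window is reserved for firmware and PCI I/O, so 512 GB of RAM ends up extending to 513 GB. The sketch checks whether every RAM address fits within the GPU's reach and, if not, falls back to a 32-bit (4 GB) DMA limit. The memory map and the `choose_dma_limit` helper are assumptions for illustration, not the driver's actual logic.

```python
GB = 1 << 30  # binary gigabyte, as used in the text

# Hypothetical memory map: (start, end_exclusive) ranges of installed RAM.
# The 3-4 GB window is left out because firmware and PCI I/O occupy it,
# so the last 1 GB of RAM is mapped above 512 GB.
ram_ranges = [
    (0, 3 * GB),
    (4 * GB, 513 * GB),
]

ram_size = sum(end - start for start, end in ram_ranges)
highest_addr = max(end for _, end in ram_ranges) - 1


def choose_dma_limit(highest_addr: int, gpu_address_bits: int) -> int:
    """Return the (exclusive) DMA address limit to use for the GPU.

    If all RAM is reachable, use the GPU's full range; otherwise fall back
    to the 4 GB (32-bit) DMA zone described in the text.
    """
    gpu_limit = 1 << gpu_address_bits
    return gpu_limit if highest_addr < gpu_limit else 4 * GB


print(ram_size // GB)                            # 512
print((highest_addr + 1) // GB)                  # 513
print(choose_dma_limit(highest_addr, 39) // GB)  # 4: a 512 GB GPU falls back
print(choose_dma_limit(highest_addr, 40) // GB)  # 1024: a 1 TB GPU is fine
```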
There are several ways to resolve a mismatch between your system configuration and a GPU's addressing capabilities.
Select a GPU with addressing capabilities that match your target configuration.
The best way to achieve optimal system and GPU performance is to make sure that the capabilities of the two are in alignment. This is especially important with multiple GPUs in the system, as the GPUs may have different addressing capabilities. In this multiple GPU scenario, other solutions could needlessly impact the GPU that has larger addressing capabilities.
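When matching a GPU to a configuration, one simple check is to compute how many address bits are needed to cover the highest physical RAM address and compare that with each GPU's capability. In the sketch below, the GPU names and their bit widths are hypothetical placeholders; the actual capability should come from the GPU's documentation.

```python
GB = 1 << 30  # binary gigabyte, as used in the text


def required_address_bits(highest_phys_addr: int) -> int:
    """Minimum number of DMA address bits needed to reach every address
    in [0, highest_phys_addr]."""
    return max(1, highest_phys_addr.bit_length())


def gpus_that_fit(highest_phys_addr: int, gpu_bits: dict) -> dict:
    """Map each (hypothetical) GPU name to whether it can reach all RAM."""
    need = required_address_bits(highest_phys_addr)
    return {name: bits >= need for name, bits in gpu_bits.items()}


highest = 513 * GB - 1  # the ~513 GB example from the text
print(required_address_bits(highest))  # 40
print(gpus_that_fit(highest, {"gpu_a": 39, "gpu_b": 47}))
# {'gpu_a': False, 'gpu_b': True}
```

In a multi-GPU system, running the same check per GPU shows which device would force a system-wide workaround.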
Configure the system's IOMMU to match the GPU's addressing capabilities.
This is a solution targeted at developers and system builders. The use of IOMMU may be an option, depending on system configuration and IOMMU capabilities. Please contact NVIDIA to discuss solutions for specific configurations.
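Conceptually, an IOMMU helps because the device issues I/O virtual addresses (IOVAs) that the IOMMU translates to physical addresses, so the OS can hand the device IOVAs within its reach even when the underlying RAM is mapped higher. The toy model below illustrates only that idea; the `ToyIommu` class and its simple bump allocator are hypothetical, do not reflect how any real IOMMU driver works, and say nothing about which configurations NVIDIA supports.

```python
GB = 1 << 30
PAGE = 4096


class ToyIommu:
    """Conceptual model only: hands out IOVAs below a device's addressing
    limit and records which physical page each IOVA page maps to."""

    def __init__(self, device_address_bits: int):
        self.limit = 1 << device_address_bits
        self.next_iova = PAGE  # leave IOVA 0 unused
        self.table = {}        # IOVA page -> physical page

    def map_page(self, phys_page_addr: int) -> int:
        if self.next_iova + PAGE > self.limit:
            raise MemoryError("IOVA space exhausted")
        iova = self.next_iova
        self.table[iova] = phys_page_addr
        self.next_iova += PAGE
        return iova

    def translate(self, iova: int) -> int:
        """What the IOMMU does on each device access."""
        page = iova & ~(PAGE - 1)
        return self.table[page] + (iova - page)


iommu = ToyIommu(device_address_bits=39)
phys = 512 * GB + 0x2000  # a page above the device's 512 GB reach
iova = iommu.map_page(phys)
print(hex(iova))                                    # 0x1000, within reach
print(iommu.translate(iova + 0x10) == phys + 0x10)  # True
```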
Use kernel configuration to limit the memory seen by the operating system to your GPU's addressing capabilities.
This is best used when RAM is mapped to addresses that slightly exceed a GPU's capabilities and other solutions are either not achievable or more intrusive. A good example is the 512 GB RAM scenario outlined above with a GPU capable of addressing 512 GB: a kernel boot parameter can be used to ignore the RAM mapped above 512 GB.
This solution does affect the entire system and will limit how much memory the OS and other devices can use. In scenarios where there is a large discrepancy between the system configuration and GPU capabilities, this is not a desirable solution.
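Assuming a Linux x86-64 system, the kernel's documented `mem=` boot parameter is one such mechanism; on x86 it makes the kernel ignore RAM above the given address. The sketch below only computes a candidate value from the GPU's address width. The `mem_param_for` helper is hypothetical, and how the parameter is added to the boot command line depends on the bootloader.

```python
def mem_param_for(gpu_address_bits: int) -> str:
    """Build a 'mem=' value that caps usable RAM at the GPU's reach.

    Hypothetical helper: expresses 2**gpu_address_bits in the largest of
    the G, M, or K suffixes that divides it evenly.
    """
    limit = 1 << gpu_address_bits
    for suffix, unit in (("G", 1 << 30), ("M", 1 << 20), ("K", 1 << 10)):
        if limit % unit == 0:
            return f"mem={limit // unit}{suffix}"
    return f"mem={limit}"


print(mem_param_for(39))  # mem=512G
print(mem_param_for(40))  # mem=1024G
```

For the 512 GB example above, a 39-bit GPU yields `mem=512G`.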
Remove RAM from the system to align with the GPU's addressing capabilities.
This is the most heavy-handed solution, but it may ultimately be the most reliable.