Chapter 38. Addressing Capabilities

Many PCIe devices have limitations in what memory addresses they can access for DMA purposes (based on the number of lines dedicated to memory addressing). This can cause problems if the host system has memory mapped to addresses beyond what the PCIe device can support. If a PCIe device is allocated memory at an address beyond what the device can support, the address may be truncated and the device will access the incorrect memory location.

Note that since certain system resources, such as ACPI tables and PCI I/O regions, are mapped to address ranges below the 4 GB boundary, the RAM installed in x86/x86-64 systems cannot necessarily be mapped contiguously. Similarly, system firmware is free to map the available RAM at its or its users' discretion. As a result, it is common for systems to have RAM mapped outside of the address range [0, RAM_SIZE], where RAM_SIZE is the amount of RAM installed in the system.

For example, it is common for a system with 512 GB of RAM installed to have physical addresses up to ~513 GB. In this scenario, a GPU that can address only 512 GB (39 address bits) cannot reach the RAM mapped above 512 GB, so the driver is forced to fall back to the 4 GB DMA zone for this GPU.

The NVIDIA Linux driver attempts to identify the scenario where the host system has memory mapped at addresses beyond what a given GPU can address. If this scenario is detected, the NVIDIA driver will fall back to allocations from the 4 GB DMA zone to avoid address truncation. This means that the driver will use the __GFP_DMA32 flag and limit itself to memory addresses below the 4 GB boundary. This is done on a per-GPU basis, so limiting one GPU will not limit other GPUs in the system.
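
The check described above can be illustrated with a small sketch. This is not the driver's actual code; the function names and the `max_phys_addr` input are hypothetical, and the comparison is simply an assumption about what such a check could look like: the highest mapped physical address is compared against the largest address the device's DMA bits allow.

```python
GIB = 1 << 30


def dma_mask(dma_bits: int) -> int:
    """Largest physical address reachable with `dma_bits` address bits."""
    return (1 << dma_bits) - 1


def needs_dma32_fallback(max_phys_addr: int, dma_bits: int) -> bool:
    """True if some RAM is mapped beyond what the device can address."""
    return max_phys_addr > dma_mask(dma_bits)


# 512 GB of RAM with physical addresses reaching ~513 GB.
highest = 513 * GIB - 1

print(needs_dma32_fallback(highest, 39))  # 39 bits = 512 GB -> True
print(needs_dma32_fallback(highest, 40))  # 40 bits = 1 TB   -> False
```

With 39 address bits, the RAM between 512 GB and 513 GB is unreachable, which is the case where the text above says the driver limits itself to the 4 GB DMA zone.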

The addressing capabilities of an NVIDIA GPU can be queried at runtime via the procfs interface:

% cat /proc/driver/nvidia/gpus/domain:bus:device.function/information
...
DMA Size:        40 bits
DMA Mask:        0xffffffffff
...
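
For scripting, the two DMA fields can be pulled out of that text. The following is a minimal sketch that assumes the field layout shown above ("DMA Size: N bits" and "DMA Mask: 0x..."); the helper name `parse_dma_info` is hypothetical and not part of any NVIDIA tool.

```python
import re


def parse_dma_info(text: str) -> dict:
    """Extract the DMA size (in bits) and DMA mask from information-file text."""
    info = {}
    size = re.search(r"^DMA Size:\s*(\d+)\s*bits", text, re.MULTILINE)
    mask = re.search(r"^DMA Mask:\s*(0x[0-9a-fA-F]+)", text, re.MULTILINE)
    if size:
        info["dma_bits"] = int(size.group(1))
    if mask:
        info["dma_mask"] = int(mask.group(1), 16)
    return info


# Sample text copied from the output above; on a real system, read the
# contents of the GPU's "information" file instead.
sample = "DMA Size:        40 bits\nDMA Mask:        0xffffffffff\n"
info = parse_dma_info(sample)
print(info["dma_bits"], hex(info["dma_mask"]))
```

A 40-bit DMA size corresponds to a mask of 0xffffffffff, so the two fields carry the same information in different forms.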

The memory mapping of RAM on a given system can be seen in the BIOS-e820 table printed out by the kernel and available via `dmesg`. Note that the 'usable' ranges are actual RAM:

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
[    0.000000]  BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000003fe5a800 (usable)
[    0.000000]  BIOS-e820: 000000003fe5a800 - 0000000040000000 (reserved)
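
To find the highest RAM address on a system, the 'usable' ranges can be extracted from this output. The sketch below assumes the line format shown above ("BIOS-e820: start - end (type)"); other kernel versions may print the table differently, and the helper name `highest_usable_end` is hypothetical.

```python
import re

# Matches lines such as:
#   BIOS-e820: 0000000000100000 - 000000003fe5a800 (usable)
E820_LINE = re.compile(
    r"BIOS-e820:\s*([0-9a-fA-F]+)\s*-\s*([0-9a-fA-F]+)\s*\(([^)]*)\)"
)


def highest_usable_end(dmesg_text: str) -> int:
    """Return the end address of the highest 'usable' range, or 0 if none."""
    ends = [
        int(m.group(2), 16)
        for m in E820_LINE.finditer(dmesg_text)
        if m.group(3) == "usable"
    ]
    return max(ends, default=0)


sample = """\
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
[    0.000000]  BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000003fe5a800 (usable)
[    0.000000]  BIOS-e820: 000000003fe5a800 - 0000000040000000 (reserved)
"""

top = highest_usable_end(sample)
print(hex(top))                   # 0x3fe5a800
print(top - 1 <= (1 << 40) - 1)   # fits under a 40-bit DMA mask: True
```

In this sample, RAM ends just below 1 GB, well within the range of a GPU with a 40-bit DMA mask.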

Solutions

There are multiple potential ways to solve a discrepancy between your system configuration and a GPU's addressing capabilities.

  1. Select a GPU with addressing capabilities that match your target configuration.

    The best way to achieve optimal system and GPU performance is to make sure that the capabilities of the two are in alignment. This is especially important with multiple GPUs in the system, as the GPUs may have different addressing capabilities. In this multiple GPU scenario, other solutions could needlessly impact the GPU that has larger addressing capabilities.

  2. Configure the system's IOMMU to the GPU's addressing capabilities.

    This is a solution targeted at developers and system builders. The use of IOMMU may be an option, depending on system configuration and IOMMU capabilities. Please contact NVIDIA to discuss solutions for specific configurations.

  3. Use a kernel parameter to limit the amount of memory seen by the operating system to match your GPU's addressing capabilities.

    This is best used in the scenario where RAM is mapped to addresses that slightly exceed a GPU's capabilities and other solutions are either not achievable or more intrusive. A good example is the 512 GB RAM scenario outlined above with a GPU capable of addressing 512 GB. A kernel parameter can be used to make the kernel ignore the RAM mapped above 512 GB.

    This can be achieved in Linux by use of the "mem" kernel parameter. See the kernel-parameters.txt documentation for more details on this parameter.

    This solution does affect the entire system and will limit how much memory the OS and other devices can use. In scenarios where there is a large discrepancy between the system configuration and GPU capabilities, this is not a desirable solution.

  4. Remove RAM from the system to align with the GPU's addressing capabilities.

    This is the most heavy-handed, but may ultimately be the most reliable solution.
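
For solution 3, the value to pass to the "mem" kernel parameter can be derived from the GPU's DMA size. The sketch below is an illustration only: the helper name `mem_parameter` is hypothetical, and it assumes that the limit is expressed in whole gigabytes with the "G" suffix and that the parameter caps the highest RAM address as described above. Consult kernel-parameters.txt for the exact behavior on your kernel and architecture.

```python
def mem_parameter(dma_bits: int) -> str:
    """Build a 'mem=' kernel parameter capping RAM at 2**dma_bits bytes."""
    if dma_bits < 30:
        # Keep the sketch simple: only limits of 1 GB or more are handled.
        raise ValueError("this sketch only handles limits of 1 GB or more")
    limit_gb = (1 << dma_bits) >> 30
    return f"mem={limit_gb}G"


# A GPU that can address 512 GB has a 39-bit DMA size.
print(mem_parameter(39))  # mem=512G
```

The resulting string would be appended to the kernel command line through the boot loader configuration.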