NVIDIA GPUs have limits on how much physical memory they can address. This directly impacts DMA buffers: a DMA buffer allocated in physical memory that the NVIDIA GPU cannot address cannot be used (or its address may be truncated, resulting in bad memory accesses). See Chapter 34, Addressing Capabilities, for details on the addressing limitations of specific GPUs.
When an NVIDIA GPU has an addressing limit less than the maximum possible physical system memory address, the NVIDIA Linux driver will use the __GFP_DMA32 flag to limit system memory allocations to the lowest 4 GB of physical memory in order to guarantee accessibility. This restriction applies even if hardware capable of remapping physical memory into an accessible range, such as an IOMMU, is present, because the NVIDIA Linux driver cannot determine at the time of memory allocation whether the memory can be remapped. This limitation can significantly reduce the amount of physical memory available to the NVIDIA GPU in some configurations.
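The addressing check described above can be illustrated with a minimal sketch. It is not driver code: the function names, and the idea of expressing the GPU limit as a number of address bits, are assumptions made for illustration only.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not part of the NVIDIA Linux driver.

def is_addressable(phys_addr: int, size: int, gpu_addr_bits: int) -> bool:
    """Return True if the whole buffer [phys_addr, phys_addr + size)
    lies below the GPU's addressing limit of 2**gpu_addr_bits bytes."""
    if phys_addr < 0 or size <= 0:
        return False
    limit = 1 << gpu_addr_bits
    return phys_addr + size <= limit


def needs_low_4gb_allocations(gpu_addr_bits: int, max_phys_addr: int) -> bool:
    """Return True if the highest physical address in the system
    (inclusive) is beyond what the GPU can address, which is the case
    in which allocations would be restricted to the lowest 4 GB."""
    return max_phys_addr >= (1 << gpu_addr_bits)
```

For example, under these assumptions a GPU limited to 40 address bits in a system whose highest physical address is at or above 1 TB would fall into the restricted case, while the same GPU in a system with 64 GB of memory mapped below 1 TB would not.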
The Linux kernel requires that device drivers use the DMA remapping APIs to make physical memory accessible to devices, even when no remapping hardware is present. The NVIDIA Linux driver generally adheres to this requirement, except when it detects that the remapping is implemented using the SWIOTLB, which is not supported by the NVIDIA Linux driver. When the NVIDIA Linux driver detects that the SWIOTLB is in use, it calculates the correct bus address needed to access a physical allocation itself rather than calling the kernel DMA remapping APIs, because SWIOTLB space is very limited and exhausting it can result in a kernel panic.
The NVIDIA Linux driver does not generally limit its usage of the Linux kernel DMA remapping APIs, and this can result in IOMMU space exhaustion when large amounts of physical memory are remapped for use by the NVIDIA GPU. Most modern IOMMU drivers fail gracefully when IOMMU space is exhausted, but NVIDIA recommends configuring the IOMMU so as to avoid resource exhaustion if possible, either by increasing the size of the IOMMU or by disabling the IOMMU.
On AMD's AMD64 platform, the size of the IOMMU can be configured in the system BIOS or, if no IOMMU BIOS option is available, using the 'iommu=memaper' kernel parameter. This kernel parameter expects an order and instructs the Linux kernel to create an IOMMU of size 32 MB * 2^order (that is, 32 MB shifted left by order) overlapping physical memory. If the system's default IOMMU is smaller than 64 MB, the Linux kernel automatically replaces it with a 64 MB IOMMU.
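As a rough aid for choosing an order, the arithmetic can be sketched as follows. This assumes the aperture size is 32 MB shifted left by the order, as the Linux kernel's x86-64 boot options documentation describes for 'memaper'; the helper names are hypothetical.

```python
# Illustrative arithmetic only; helper names are hypothetical.
MIB = 1 << 20
BASE_APERTURE = 32 * MIB  # size for order 0, per the assumption above


def memaper_size(order: int) -> int:
    """Return the aperture size in bytes for a given order (32 MB << order)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return BASE_APERTURE << order


def order_for_size(min_bytes: int) -> int:
    """Return the smallest order whose aperture is at least min_bytes."""
    order = 0
    while memaper_size(order) < min_bytes:
        order += 1
    return order
```

Under that assumption, an order of 1 corresponds to 64 MB and an order of 3 to 256 MB, so a configuration needing at least 200 MB of IOMMU space would use order 3 (for instance, 'iommu=memaper=3' on the kernel command line).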
Also see the section 'The X86-64 platform (AMD64/EM64T) and early Linux 2.6 kernels'.