Appendix L. Known Issues

The following problems still exist in this release and are in the process of being resolved.

Known Issues

OpenGL and dlopen()

Older versions of the glibc dynamic loader (for example, the version that shipped with Red Hat Linux 7.2) have problems with applications that use dlopen(), such as Quake3 and Radiant. Please see Chapter 4, Frequently Asked Questions, for more details.

Multicard, Multimonitor

In some cases, the secondary card is not initialized correctly by the NVIDIA kernel module. You can work around this by enabling the XFree86 Int10 module to soft-boot all secondary cards. See Appendix D, X Config Options for details.

Interaction with pthreads

Single-threaded applications that use dlopen() to load NVIDIA's libGL library and then use dlopen() to load any other library that is linked against libpthread will crash in libGL. NVIDIA's new ELF TLS OpenGL libraries are not affected (please see Appendix C, Installed Components, for a description of the ELF TLS OpenGL libraries). Possible workarounds for this problem are:

  1. Load the library that is linked with libpthread before loading libGL.

  2. Link the application with libpthread.
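The first workaround can be sketched in C as follows. This is a minimal illustration, not NVIDIA's code: the function name load_in_safe_order and its return codes are hypothetical, and the library names passed in are chosen by the caller. The only point it demonstrates is the ordering, with the libpthread-linked library opened before libGL.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: open the library that is linked against libpthread
 * first, and only then open libGL.
 * Returns 0 on success, -1 if the threaded library could not be loaded,
 * and -2 if libGL could not be loaded.
 */
int load_in_safe_order(const char *threaded_lib, const char *gl_lib,
                       void **threaded_handle, void **gl_handle)
{
    *threaded_handle = NULL;
    *gl_handle = NULL;

    /* Step 1: the library that depends on libpthread. */
    *threaded_handle = dlopen(threaded_lib, RTLD_NOW | RTLD_GLOBAL);
    if (*threaded_handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", threaded_lib, dlerror());
        return -1;
    }

    /* Step 2: libGL, loaded after libpthread is already in the process. */
    *gl_handle = dlopen(gl_lib, RTLD_NOW | RTLD_GLOBAL);
    if (*gl_handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", gl_lib, dlerror());
        dlclose(*threaded_handle);
        *threaded_handle = NULL;
        return -2;
    }
    return 0;
}
```

A caller might pass "libGL.so.1" as gl_lib and the name of its own threaded plug-in as threaded_lib. On older glibc versions the program must be linked with -ldl.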

The X86-64 platform (AMD64/EM64T) and 2.6 kernels

Many 2.4 and 2.6 x86_64 kernels have an accounting problem in their implementation of the change_page_attr kernel interface. Early 2.6 kernels include a check that triggers a BUG() when this situation is encountered. Triggering a BUG() causes the kernel to kill the current application, which would be your OpenGL application or potentially the X server. The accounting issue has been resolved in the 2.6.11 kernel.

We have added checks to recognize when the NVIDIA kernel module is being compiled for the x86-64 platform on a kernel between 2.6.0 and 2.6.11. In this case, the driver disables its use of the change_page_attr kernel interface. This avoids the accounting issue but leaves the system in danger of cache aliasing (see the Cache Aliasing entry below for more information). Note that this change_page_attr accounting issue and BUG() can be triggered by other kernel subsystems that rely on this interface.

If you are using a 2.6 x86_64 kernel, it is recommended that you upgrade to a 2.6.11 or later kernel.
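To check whether a given kernel release falls in the range described above, a small helper along the following lines may be useful. It is only a sketch: the function name is hypothetical, it treats 2.6.11 itself as unaffected (an assumption based on the statement that the issue is resolved in 2.6.11), and it does not check the architecture, which the caller must do separately.

```c
#include <stdio.h>

/*
 * Hypothetical helper: takes a kernel release string such as the one
 * printed by "uname -r" and reports whether it lies in the 2.6.0 to
 * 2.6.10 range.
 * Returns 1 if affected, 0 if not, -1 if the string cannot be parsed.
 * Assumption: 2.6.11 is treated as fixed, so the upper bound is exclusive.
 */
int kernel_in_affected_range(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    /* Parse the leading "major.minor.patch"; any suffix is ignored. */
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) != 3)
        return -1;

    return major == 2 && minor == 6 && patch >= 0 && patch < 11;
}
```

For example, a release string of "2.6.9-1.667smp" is reported as affected, while "2.6.11" and "2.4.27" are not.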

IOMMU/SWIOTLB interaction on the X86-64 platform

Linux does not currently provide a mechanism for allocating memory with addresses that fall within the first 4GB of the physical memory installed in a Linux/x86-64 system. Addresses within this range are necessary for 32-bit PCI hardware to provide DMA capabilities. Instead, the Linux kernel provides a software I/O TLB on Intel's EM64T and IOMMU support on AMD's AMD64 platform.

Unfortunately, some problems exist with both interfaces. Early implementations of the Linux SWIOTLB set aside only 4MB of memory for the SWIOTLB memory pool. Also, when this memory pool is exhausted, some SWIOTLB implementations forcibly panic the kernel. This is also true for some implementations of the IOMMU interface.

Kernel panics and related stability problems on Intel's EM64T platform can be avoided by increasing the size of the SWIOTLB pool with the 'swiotlb' kernel parameter. This kernel parameter expects the desired size in KB, divided by two. NVIDIA suggests raising the size of the SWIOTLB pool to 64MB; this is accomplished by passing 'swiotlb=32768' to the kernel.
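The conversion from a pool size to the 'swiotlb' parameter value is simple arithmetic: 64MB is 65536KB, and 65536 divided by two is 32768. The sketch below repeats that calculation for an arbitrary size; the function names are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Value for the 'swiotlb' kernel parameter: pool size in KB, divided by two. */
unsigned long swiotlb_param_for_mb(unsigned long pool_mb)
{
    unsigned long pool_kb = pool_mb * 1024UL;
    return pool_kb / 2UL;
}

/*
 * Writes the kernel parameter (for example "swiotlb=32768") into buf.
 * Returns the number of characters snprintf would have written.
 */
int format_swiotlb_option(char *buf, size_t len, unsigned long pool_mb)
{
    return snprintf(buf, len, "swiotlb=%lu", swiotlb_param_for_mb(pool_mb));
}
```

Calling swiotlb_param_for_mb(64) yields 32768, matching the value suggested above.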

Starting with Linux 2.6.9, the default size of the SWIOTLB is 64MB and overflow handling is improved. Both of these changes are expected to greatly improve stability on Intel's EM64T platform. If you are considering upgrading your Linux kernel to benefit from these improvements, NVIDIA recommends that you upgrade to Linux 2.6.11 or a more recent Linux kernel. Please see the previous section for additional information.

On AMD's AMD64 platform, the size of the IOMMU can be configured in the system BIOS or, if no IOMMU BIOS option is available, using the 'iommu=memaper' kernel parameter. This kernel parameter expects an order and instructs the Linux kernel to create an IOMMU of size 32MB * 2^order (32MB doubled 'order' times) overlapping physical memory. For example, an order of 1 gives 64MB and an order of 2 gives 128MB. If the system's default IOMMU is smaller than 64MB, the Linux kernel automatically replaces it with a 64MB IOMMU.
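The relationship between the order and the IOMMU size can be sketched as below. This assumes the size rule is 32MB shifted left by the order (32MB, 64MB, 128MB, ...), which is how the Linux kernel's x86-64 boot option documentation describes memaper; both function names are hypothetical.

```c
/* Assumed rule: IOMMU size in MB for a given memaper order is 32MB << order. */
unsigned long iommu_size_mb(unsigned int order)
{
    return 32UL << order;
}

/*
 * Smallest order whose IOMMU is at least wanted_mb megabytes.
 * The search stops at order 16 so that an oversized request cannot
 * overflow the shift; that cap is arbitrary and for illustration only.
 */
unsigned int memaper_order_for_mb(unsigned long wanted_mb)
{
    unsigned int order = 0;

    while (order < 16 && iommu_size_mb(order) < wanted_mb)
        order++;
    return order;
}
```

Under this assumption, a 128MB IOMMU corresponds to an order of 2.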

To reduce the risk of stability problems as a result of IOMMU or SWIOTLB exhaustion on the X86-64 platform, the NVIDIA Linux driver internally limits its use of these interfaces. By default, the driver will not use more than 60MB of IOMMU/SWIOTLB space, leaving 4MB for the rest of the system (assuming a 64MB IOMMU/SWIOTLB).

This limit can be adjusted with the 'NVreg_RemapLimit' NVIDIA kernel module option. Specifically, if the IOMMU/SWIOTLB is larger than 64MB, the limit can be adjusted to take advantage of the additional space. The 'NVreg_RemapLimit' option expects the size argument in bytes.

NVIDIA recommends leaving 4MB available for the rest of the system when changing the limit. For example, if the internal limit is to be relaxed to account for a 128MB IOMMU/SWIOTLB, the recommended remap limit is 124MB. This remap limit can be specified by passing 'NVreg_RemapLimit=0x7c00000' to the NVIDIA kernel module.
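The recommended limit is the IOMMU/SWIOTLB size minus 4MB, expressed in bytes. The following sketch reproduces the arithmetic behind the 0x7c00000 value; the function name is hypothetical and the 4MB reserve is taken from the recommendation above.

```c
#define BYTES_PER_MB (1024ULL * 1024ULL)

/*
 * Recommended NVreg_RemapLimit value in bytes for a pool of pool_mb
 * megabytes, leaving 4MB for the rest of the system.
 * Returns 0 if the pool is too small to leave that reserve.
 */
unsigned long long remap_limit_bytes(unsigned long long pool_mb)
{
    const unsigned long long reserve_mb = 4ULL;

    if (pool_mb <= reserve_mb)
        return 0ULL;
    return (pool_mb - reserve_mb) * BYTES_PER_MB;
}
```

For a 128MB pool this gives 124 * 1048576 = 130023424 bytes, which is 0x7c00000. For the default 64MB pool it gives 60MB, or 0x3c00000.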

Please also read the previous known issues section for information on additional stability problems on this platform.

Cache Aliasing

Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for DMA by a driver such as NVIDIA's graphics driver, this can lead to hardware stability problems and system lockups.

NVIDIA has encountered bugs with some Linux kernel versions that lead to cache aliasing. Although some systems will run perfectly fine when cache aliasing occurs, other systems will experience severe stability problems, including random lockups. Users experiencing stability problems due to cache aliasing will benefit from updating to a kernel that does not cause cache aliasing to occur.

NVIDIA has added driver logic to detect cache aliasing and to print a warning with a message similar to the following:

NVRM: bad caching on address 0x1cdf000: actual 0x46 != expected 0x73

If you see this message in your log files and are experiencing stability problems, you should update your kernel to the latest version.

If the message persists after updating your kernel, please send a bug report to NVIDIA.
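When scanning log files for this warning, the three hexadecimal values can be extracted with a short parser such as the one below. It is a sketch that assumes the message has exactly the form shown above; real messages are only described as "similar", so the format string may need adjusting. The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Values carried by the warning; field names follow the message wording. */
struct bad_caching {
    unsigned long address;
    unsigned int actual;
    unsigned int expected;
};

/*
 * Looks for the warning anywhere in a log line, so that a leading
 * timestamp or hostname is tolerated.
 * Returns 1 and fills *out if all three values were parsed, 0 otherwise.
 */
int parse_bad_caching(const char *line, struct bad_caching *out)
{
    const char *msg = strstr(line, "NVRM: bad caching on address");

    if (msg == NULL)
        return 0;
    return sscanf(msg,
                  "NVRM: bad caching on address 0x%lx: actual 0x%x != expected 0x%x",
                  &out->address, &out->actual, &out->expected) == 3;
}
```

A log-scanning tool could call parse_bad_caching() on each line and count the matches before deciding whether a kernel update is warranted.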

64-Bit BARs (Base Address Registers)

Starting with native PCI Express GPUs, NVIDIA's GPUs will advertise a 64-bit BAR capability (a Base Address Register stores the location of a PCI I/O region, such as registers or a frame buffer). This means that the GPU's PCI I/O regions (registers and frame buffer) can be placed above the 32-bit address space (the first 4 gigabytes of memory).

The decision of where the BAR is placed is made by the system BIOS at boot time. If the BIOS supports 64-bit BARs, then the NVIDIA PCI I/O regions may be placed above the 32-bit address space. If the BIOS does not support this feature, then our PCI I/O regions will be placed within the 32-bit address space as they have always been.
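To make the terms concrete, the sketch below decodes a memory BAR the way the PCI specification lays it out: bit 0 is clear for a memory region, bits 2:1 equal to binary 10 mark a 64-bit BAR, and the next BAR register supplies the upper 32 address bits. The function names and any register values used with them are hypothetical; this is not driver code.

```c
#include <stdint.h>

/* Returns 1 if the low BAR register describes a 64-bit memory BAR. */
int bar_is_64bit_memory(uint32_t bar_low)
{
    int is_memory = (bar_low & 0x1u) == 0;          /* bit 0: 0 = memory */
    int is_64bit = ((bar_low >> 1) & 0x3u) == 0x2u; /* bits 2:1: 10b = 64-bit */

    return is_memory && is_64bit;
}

/* Combines the two registers of a 64-bit BAR into its base address. */
uint64_t bar_base_address(uint32_t bar_low, uint32_t bar_high)
{
    /* The low four bits of a memory BAR are flags, not address bits. */
    return ((uint64_t)bar_high << 32) | (uint64_t)(bar_low & ~0xFu);
}

/* Returns 1 if the region was placed above the 32-bit address space. */
int bar_above_4gb(uint32_t bar_low, uint32_t bar_high)
{
    return bar_base_address(bar_low, bar_high) > 0xFFFFFFFFull;
}
```

With a low register of 0x0000000C and a high register of 0x00000001, the base address is 0x100000000, which lies just above the first 4 gigabytes.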

Unfortunately, current Linux kernels (as of 2.6.11.x) do not understand or support 64-bit BARs. If the BIOS does place any NVIDIA PCI I/O regions above the 32-bit address space, the kernel will reject the BAR and the NVIDIA driver will not work.

There is no known workaround at this point.

Laptops

If you are using a laptop, please see the "Known Laptop Issues" in Appendix I, Configuring a Laptop.

FSAA

When FSAA is enabled (the __GL_FSAA_MODE environment variable is set to a value that enables FSAA and a multisample visual is chosen), the rendering may be corrupted when resizing the window.

libGL DSO finalizer and pthreads

When a multithreaded OpenGL application exits, it is possible for libGL's DSO finalizer (also known as the destructor, or "_fini") to be called while other threads are executing OpenGL code. The finalizer needs to free resources allocated by libGL, which can cause problems for threads that are still using these resources. Setting the environment variable "__GL_NO_DSO_FINALIZER" to "1" works around this problem by forcing libGL's finalizer to leave its resources in place. These resources will still be reclaimed by the operating system when the process exits. Note that the finalizer is also executed as part of dlclose(3), so if you have an application that repeatedly calls dlopen(3) and dlclose(3) on libGL, "__GL_NO_DSO_FINALIZER" will cause libGL to leak resources until the process exits. Using this option can improve stability in some multithreaded applications, including Java3D applications.
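The variable is normally set in the environment from which the application is launched. A launcher or wrapper program could also set it programmatically, as in the sketch below. The function name is hypothetical, and the sketch assumes the variable must be in place before libGL is loaded, which is the conservative choice because this document does not say when libGL reads it.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: sets __GL_NO_DSO_FINALIZER=1 for the current
 * process and any child processes it starts afterwards.
 * Assumption: call this before libGL is loaded or the application is
 * executed. Returns 0 on success, -1 on failure (as setenv does).
 */
int disable_libgl_dso_finalizer(void)
{
    return setenv("__GL_NO_DSO_FINALIZER", "1", 1);
}
```

A wrapper would typically call this function and then start the OpenGL application with one of the exec functions, so that the application inherits the setting.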

XVideo and the Composite X extension

XVideo will not work correctly when Composite is enabled. See Appendix S, The X Composite Extension.

The following problems will not be fixed. Usually, the source of the problem is beyond NVIDIA's control.

Problems that Will Not Be Fixed

Gigabyte GA-6BX Motherboard

This motherboard uses a LinFinity regulator on the 3.3 V rail that is only rated to 5 A -- less than the AGP specification, which requires 6 A. When diagnostics or applications are running, the temperature of the regulator rises, causing the voltage to the NVIDIA chip to drop as low as 2.2 V. Under these circumstances, the regulator cannot supply the current on the 3.3 V rail that the NVIDIA chip requires.

This problem does not occur when the graphics board has a switching regulator or when an external power supply is connected to the 3.3 V rail.

VIA KX133 and 694X Chipsets with AGP 2x

On Athlon motherboards with the VIA KX133 or 694X chipset, such as the ASUS K7V motherboard, NVIDIA drivers default to AGP 2x mode to work around insufficient drive strength on one of the signals.

Irongate Chipsets with AGP 1x

AGP 1x transfers are used on Athlon motherboards with the Irongate chipset to work around a problem with signal integrity.

ALi chipsets, ALi1541 and ALi1647

On ALi1541 and ALi1647 chipsets, NVIDIA drivers disable AGP to work around timing issues and signal integrity issues. See Chapter 5, Common Problems for more information on ALi chipsets.

I/O APIC (SMP)

If you are experiencing stability problems with a Linux SMP machine and seeing I/O APIC warning messages from the Linux kernel, system reliability may be greatly improved by setting the "noapic" kernel parameter.

Local APIC (UP)

On some systems, setting the "Local APIC Support on Uniprocessors" kernel configuration option can have adverse effects on system stability and performance. If you are experiencing lockups with a Linux UP machine and have this option set, try disabling local APIC support.