This section provides answers to frequently asked questions associated with the NVIDIA Linux x86_64 Driver and its installation. Common problem diagnoses can be found in Chapter 8, Common Problems and tips for new users can be found in Appendix I, Tips for New Linux Users. Also, detailed information for specific setups is provided in the Appendices.
7.1. NVIDIA-INSTALLER |
|
How do I extract the contents of the |
|
Run the installer as follows: # sh NVIDIA-Linux-x86_64-346.16.run --extract-only This will create the directory
NVIDIA-Linux-x86_64-346.16, containing the uncompressed
contents of the |
|
How can I see the source code to the kernel interface layer? |
|
The source files to the kernel interface layer are in the kernel directory of the extracted .run file. To get to these sources, run: # sh NVIDIA-Linux-x86_64-346.16.run --extract-only # cd NVIDIA-Linux-x86_64-346.16/kernel/ |
|
How and when are the NVIDIA device files created? |
|
When a user-space NVIDIA driver component needs to communicate with the NVIDIA kernel module, and the NVIDIA character device files do not yet exist, the user-space component will first attempt to load the kernel module and create the device files itself. Device file creation and kernel module loading generally require root privileges. The X driver, running within a setuid root X server, will have these privileges, but not, e.g., the CUDA driver within the environment of a normal user. If the user-space NVIDIA driver component cannot load the kernel module or create the device files itself, it will attempt to invoke the setuid root nvidia-modprobe utility, which will perform these operations on behalf of the non-privileged driver. See the nvidia-modprobe(1) man page, or its source code, available here: ftp://download.nvidia.com/XFree86/nvidia-modprobe/ When possible, it is recommended to use your Linux distribution's native mechanisms for managing kernel module loading and device file creation. nvidia-modprobe is provided as a fallback to work out-of-the-box in a distribution-independent way. Whether a user-space NVIDIA driver component does so itself, or invokes nvidia-modprobe, it will default to creating the device files with the following attributes: UID: 0 - 'root' GID: 0 - 'root' Mode: 0666 - 'rw-rw-rw-' Existing device files are changed if their attributes don't match these defaults. If you want the NVIDIA driver to create the device files with different attributes, you can specify them with the "NVreg_DeviceFileUID" (user), "NVreg_DeviceFileGID" (group) and "NVreg_DeviceFileMode" NVIDIA Linux kernel module parameters. For example, the NVIDIA driver can be instructed to create device files with UID=0 (root), GID=44 (video) and Mode=0660 by passing the following module parameters to the NVIDIA Linux kernel module: NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=44 NVreg_DeviceFileMode=0660 The "NVreg_ModifyDeviceFiles" NVIDIA kernel module parameter will disable dynamic device file management, if set to 0. |
|
Why does NVIDIA not provide RPMs? |
|
Not every Linux distribution uses RPM, and NVIDIA provides a single solution that works across all Linux distributions. NVIDIA encourages Linux distributions to repackage and redistribute the NVIDIA Linux driver in their native package management formats. These repackaged NVIDIA drivers are likely to inter-operate best with the Linux distribution's package management technology. For this reason, NVIDIA encourages users to use their distribution's repackaged NVIDIA driver, where available. |
|
Can the nvidia-installer use a proxy server? |
|
Yes, because the FTP support in nvidia-installer is based on
snarf, it will honor the |
|
What is the significance of the |
|
To distinguish between Linux-x86_64 driver package files that do
or do not also contain 32-bit compatibility libraries,
"-no-compat32" is be appended to the latter. |
|
Can I add my own precompiled kernel interfaces to a
|
|
Yes, the |
|
Where can I find the source code for the |
|
The |
|
How can I minimize software overhead when driving many GPUs in a single system? |
|
Coordinating access to many GPUs within a system can introduce software overhead, hurting performance. When separate GPUs are intended to process independent workloads (e.g., CUDA running on separate GPUs, or a virtualized environment where different GPUs are dedicated to different virtual machines), the NVIDIA GPU driver can be configured to use multiple NVIDIA kernel modules, where each GPU is assigned to a specific instance of the NVIDIA kernel module. This can reduce the software overhead needed to coordinate across GPUs. You can build and install multiple kernel modules with a command such as: # sh NVIDIA-Linux-x86_64-346.16.run \ --multiple-kernel-modules=MULTIPLE-KERNEL-MODULES Replace 'MULTIPLE-KERNEL-MODULES' with the number of NVIDIA kernel modules to build. The maximum number of modules that can be built is 8. This will install nvidia-frontend.ko and nvidia[0-7].ko modules, instead of nvidia.ko, in /lib/modules/`uname -r`/kernel/drivers/video. By default, all of the NVIDIA GPUs will be assigned to the same module. The 'NVreg_AssignGpus' NVIDIA Linux kernel module option can be used to assign GPUs to specific module instances. E.g., If you wish to assign NVIDIA GPU 0:01:00.0 and 0:02:00.0 to the nvidia0 module: # modprobe nvidia0 NVreg_AssignGpus="0:01:00.0,0:02:00.0" The nvidia-frontend module is responsible for dispatching work to the individual nvidia[0-7] modules, and does not directly drive GPUs on its own. Therefore, GPUs cannot be assigned to the nvidia-frontend module. When multiple NVIDIA kernel modules are used, any applications which use the NVIDIA driver will need to be directed to a particular module instance. The module instances are numbered sequentially, beginning with 0, and the instance number is reflected in the name of the module. The environment variable '__NVIDIA_KERNEL_MODULE_INSTANCE' controls the module instance used by each application. E.g., if an application uses a device that is associated with module nvidia2, the module instance would be 2, and __NVIDIA_KERNEL_MODULE_INSTANCE should be set to 2 in that application's environment. This can be done by exporting the value in a shell before launching the application. If you do not set the variable '__NVIDIA_KERNEL_MODULE_INSTANCE', it will result in a failure with the following message. FATAL: Module nvidia not found. |
|
7.2. NVIDIA Driver |
|
Where should I start when diagnosing display problems? |
|
One of the most useful tools for diagnosing problems is the X
log file in (==) Using config file: Also make sure that the NVIDIA driver is being used, rather than another driver. Search for (II) LoadModule: "nvidia" Lines from the driver should begin with: (II) NVIDIA(0) |
|
How can I increase the amount of data printed in the X log file? |
|
By default, the NVIDIA X driver prints relatively few messages
to stderr and the X log file. If you need to troubleshoot, then it
may be helpful to enable more verbose output by using the X command
line options % startx -- -verbose 5 -logverbose 5 |
|
What is NVIDIA's policy towards development series Linux kernels? |
|
NVIDIA does not officially support development series kernels.
However, all the kernel module source code that interfaces with the
Linux kernel is available in the |
|
Where can I find the tarballs? |
|
Plain tarballs are not available. The |
|
How do I tell if I have my kernel sources installed? |
|
If you are running on a distro that uses RPM (Red Hat, Mandriva, SuSE, etc), then you can use rpm to tell you. At a shell prompt, type: % rpm -qa | grep kernel and look at the output. You should see a package that corresponds to your kernel (often named something like kernel-2.6.15-7) and a kernel source package with the same version (often named something like kernel-devel-2.6.15-7). If none of the lines seem to correspond to a source package, then you will probably need to install it. If the versions listed mismatch (e.g., kernel-2.6.15-7 vs. kernel-devel-2.6.15-10), then you will need to update the kernel-devel package to match the installed kernel. If you have multiple kernels installed, you need to install the kernel-devel package that corresponds to your running kernel (or make sure your installed source package matches the running kernel). You can do this by looking at the output of uname -r and matching versions. |
|
What is SELinux and how does it interact with the NVIDIA driver ? |
|
Security-Enhanced Linux (SELinux) is a set of modifications
applied to the Linux kernel and utilities that implement a security
policy architecture. When in use it requires that the security type
on all shared libraries be set to 'shlib_t'. The installer detects
when to set the security type, and sets it on all shared libraries
it installs. The option |
|
How can I build multiple NVIDIA kernel modules? |
|
Multiple NVIDIA kernel modules can be built by using the standard Linux kernel module building technique: # sh ./NVIDIA-Linux-x86_64-346.16.run --extract-only # cd NVIDIA-Linux-x86_64-346.16/kernel # make module and passing the NV_BUILD_MODULE_INSTANCES variable to make. Set
this variable to the desired number of NVIDIA kernel modules, not
greater than eight. This is equivalent to the # make module NV_BUILD_MODULE_INSTANCES=2 This will generate two NVIDIA kernel modules: nvidia0.ko and nvidia1.ko. An additional nvidia-frontend.ko module will also be generated, which handles the redirection of system calls corresponding to GPUs to their respective NVIDIA kernel modules. |
|
Why does X use so much memory? |
|
When measuring any application's memory usage, you must be careful to distinguish between physical system RAM used and virtual mappings of shared resources. For example, most shared libraries exist only once in physical memory but are mapped into multiple processes. This memory should only be counted once when computing total memory usage. In the same way, the video memory on a graphics card or register memory on any device can be mapped into multiple processes. These mappings do not consume normal system RAM. This has been a frequently discussed topic on XFree86 mailing lists; see, for example: http://marc.theaimsgroup.com/?l=xfree-xpert&m=96835767116567&w=2 The pmap utility described in the above thread is available in the "procps" package shipped with most recent Linux distributions, and is a useful tool in distinguishing between types of memory mappings. For example, while top may indicate that X is using several hundred MB of memory, the last line of output from the output of pmap (note that pmap may need to be run as root): # pmap -d `pidof X` | tail -n 1 mapped: 161404K writeable/private: 7260K shared: 118056K reveals that X is really only using roughly 7MB of system RAM (the "writeable/private" value). Note, also, that X must allocate resources on behalf of X clients (the window manager, your web browser, etc); the X server's memory usage will increase as more clients request resources such as pixmaps, and decrease as you close X applications. The IndirectMemoryAccess X configuration option may cause additional virtual address space to be reserved. |
|
Why do applications that use DGA graphics fail? |
|
The NVIDIA driver does not support the graphics component of the XFree86-DGA (Direct Graphics Access) extension. Applications can use the XDGASelectInput() function to acquire relative pointer motion, but graphics-related functions such as XDGASetMode() and XDGAOpenFramebuffer() will fail. The graphics component of XFree86-DGA is not supported because it requires a CPU mapping of framebuffer memory. As graphics cards ship with increasing quantities of video memory, the NVIDIA X driver has had to switch to a more dynamic memory mapping scheme that is incompatible with DGA. Furthermore, DGA does not cooperate with other graphics rendering libraries such as Xlib and OpenGL because it accesses GPU resources directly. NVIDIA recommends that applications use OpenGL or Xlib, rather than DGA, for graphics rendering. Using rendering libraries other than DGA will yield better performance and improve interoperability with other X applications. |
|
My kernel log contains messages that are prefixed with "Xid"; what do these messages mean? |
|
"Xid" messages indicate that a general GPU error occurred, most often due to the driver misprogramming the GPU or to corruption of the commands sent to the GPU. These messages provide diagnostic information that can be used by NVIDIA to aid in debugging reported problems. |
|
I use the Coolbits overclocking interface to adjust my graphics card's clock frequencies, but the defaults are reset whenever X is restarted. How do I make my changes persistent? |
|
Clock frequency settings are not saved/restored automatically by
default to avoid potential stability and other problems that may be
encountered if the chosen frequency settings differ from the
defaults qualified by the manufacturer. You can add an |
|
Why is the refresh rate not reported correctly by utilities that use the XF86VidMode X extension and/or RandR X extension versions prior to 1.2 (e.g., `xrandr --q1`)? |
|
These extensions are not aware of multiple display devices on a single X screen; they only see the MetaMode bounding box, which may contain one or more actual modes. This means that if multiple MetaModes have the same bounding box, these extensions will not be able to distinguish between them. In order to support dynamic display configuration, the NVIDIA X driver must make each MetaMode appear to be unique and accomplishes this by using the refresh rate as a unique identifier. You can use `nvidia-settings -q RefreshRate` to query the actual refresh rate on each display device. |
|
Why does starting certain applications result in Xlib error messages indicating extensions like "XFree86-VidModeExtension" or "SHAPE" are missing? |
|
If your X config file has a Xlib: extension "SHAPE" missing on display ":0.0" Xlib: extension "XFree86-VidModeExtension" missing on display ":0.0" Xlib: extension "XFree86-DGA" missing on display ":0.0" You can solve this problem by adding the line below to your X
config file's Load "extmod" |
|
Where can I find older driver versions? |
|
Please visit ftp://download.nvidia.com/XFree86/Linux-x86_64/ |
|
What is the format of a PCI Bus ID? |
|
Different tools have different formats for the PCI Bus ID of a PCI device. The X server's "BusID" X configuration file option interprets the BusID string in the format "bus@domain:device:function" (the "@domain" portion is only needed if the PCI domain is non-zero), in decimal. More specifically, "%d@%d:%d:%d", bus, domain, device, function in printf(3) syntax. NVIDIA X driver logging, nvidia-xconfig, and nvidia-settings match the X configuration file BusID convention. The lspci(8) utility, in contrast, reports the PCI BusID of a PCI device in the format "domain:bus:device.function", printing the values in hexadecimal. More specifically, "%04x:%02x:%02x.%x", domain, bus, device, function in printf(3) syntax. The "Bus Location" reported in the information file matches the lspci format. Also, the name of per-GPU directory in /proc/driver/nvidia/gpus is the same as the corresponding GPU's PCI BusID in lspci format. On systems where both an integrated GPU and a PCI slot are present, setting the "BusID" option to "AXI" selects the integrated GPU. By default, not specifying this option or setting it to an empty string selects a discrete GPU if available, the integrated GPU otherwise. |
|
How do I interpret X server version numbers? |
|
X server version numbers can be difficult to interpret because some X.Org X servers report the versions of different things. In 2003, X.Org created a fork of the XFree86 project's code base, which used a monolithic build system to build the X server, libraries, and applications together in one source code repository. It resumed the release version numbering where it left off in 2001, continuing with 6.7, 6.8, etc., for the releases of this large bundle of code. These version numbers are sometimes written X11R6.7, X11R6.8, etc. to include the version of the X protocol. In 2005, an effort was made to split the monolithic code base into separate modules with their own version numbers to make them easier to maintain and so that they could be released independently. X.Org still occasionally releases these modules together, with a single version number. These releases are simply referred to as “X.Org releases”, or sometimes “katamari” releases. For example, X.Org 7.6 was released on December 20, 2010 and contains version 1.9.3 of the xorg-server package, which contains the core X server itself. The release management changes from XFree86, to X.Org monolithic
releases, to X.Org modular releases impacted the behavior of the X
server's XFree86 Version 4.3.0 (Red Hat Linux release: 4.3.0-2) Release Date: 27 February 2003 X Protocol Version 11, Revision 0, Release 6.6 X servers in X.Org monolithic and early “katamari” releases did something similar: X Window System Version 7.1.1 Release Date: 12 May 2006 X Protocol Version 11, Revision 0, Release 7.1.1 However, X.Org later modified the X server to start printing its individual module version number instead: X.Org X Server 1.9.3 Release Date: 2010-12-13 X Protocol Version 11, Revision 0 Please keep this in mind when comparing X server versions: what looks like “version 7.x” is older than version 1.x. |
|
Why doesn't the NVIDIA X driver make more display resolutions and refresh rates available via RandR? |
|
Prior to the 302.* driver series, the list of modes reported to applications by the NVIDIA X driver was not limited to the list of modes natively supported by a display device. In order to expose the largest possible set of modes on digital flat panel displays, which typically do not accept arbitrary mode timings, the driver maintained separate sets of "front-end" and "back-end" mode timings, and scaled between them to simulate the availability of more modes than would otherwise be supported. Front-end timings were the values reported to applications, and back-end timings were what was actually sent to the display. Both sets of timings went through the full mode validation process, with the back-end timings having the additional constraint that they must be provided by the display's EDID, as only EDID-provided modes can be safely assumed to be supported by the display hardware. Applications could request any available front-end timings, which the driver would implicitly scale to either the "best fit" or "native" mode timings. For example, an application might request an 800x600 @ 60 Hz mode and the driver would provide it, but the real mode sent to the display would be 1920x1080 @ 30 Hz. While the availability of modes beyond those natively supported by a display was convenient for some uses, it created several problems. For example:
Version 1.2 of the X Resize and Rotate extension (henceforth referred to as "RandR 1.2") allows configuration of display scaling in a much more flexible and standardized way. The protocol allows applications to choose exactly which (back-end) mode timing is used, and exactly how the screen is scaled to fill that mode. It also allows explicit control over which displays are enabled, and which portions of the screen they display. This also provides much-needed transparency: the mode timings reported by RandR 1.2 are the actual mode timings being sent to the display. However, this means that only modes actually supported by the display are reported in the RandR 1.2 mode list. Scaling configurations, such as the 800x600 to 1920x1080 example above, need to be configured via the RandR 1.2 transform feature. Adding implicitly scaled modes to the mode list would conflict with the transform configuration options and reintroduce the same problems that the previous front-end/back-end timing system had. With the introduction of RandR 1.2 support to the 302.* driver series, the front-end/back-end timing system was abandoned, and the list of mode timings exposed by the NVIDIA X driver was simplified to include only those modes which would actually be driven by the hardware. Although it remained possible to manually configure all of the scaling configurations that were previously possible, and many scaling configurations which were previously impossible, this change resulted in some inconvenient losses of functionality:
Subsequent driver releases restored some of this functionality without reverting to the front-end/back-end system:
As mentioned previously, the RandR 1.2 mode list contains only modes which are supported by the display. Modern applications that wish to set modes other than those available in the RandR 1.2 mode list are encouraged to use RandR 1.2 transformations to program any required scaling operations. For example, the xrandr utility can program RandR scaling transformations, and the following command can scale a 1280x720 mode to a display connected to output DVI-I-0 that does not support the desired mode, but does support 1920x1080: xrandr --output DVI-I-0 --mode 1920x1080 --scale-from 1280x720 |