readme.txt for NVShaderPerf.exe
Copyright 1998-2006 NVIDIA Corporation. All rights reserved.

NVShaderPerf takes a shader file (.ps/.psh or .fx (HLSL) for Direct3D, .fp and .glsl for OpenGL) and produces scheduling information for the NVIDIA GeForce FX and GeForce 6 Series families.

Usage: nvshaderperf [OPTION] files

  -?
      [print this message]
  -o/-output outputfilename
      [location for results, no specification = stdout]
  -e/-error errorfilename
      [location for error file, no specification = stderr]
  -a/-arch chipname
      [specify the chip name to run the performance analysis on.
       Supports wildcards like NV3X, NV4*, *, etc.]
  -c/-chips
      [output chip names to outputfilename, does not run performance
       (i.e. inputfilename is ignored)]
  -v X
      [set the verbosity level (0-2, default is 1)]
  -t/-technique techid
      [when using .fx files, specify which technique (either name or
       0 based index, * = all) in a given fx file to run performance on,
       defaults to index 0]
  -p/-pass passid
      [when using .fx files, specify which pass (either name or
       0 based index, * = all) in a given technique to run performance on,
       defaults to index 0]
  -f/-function funcname
      [when using .fx files, specify the function name (* = all) to
       profile (instead of using technique/pass). Must specify -s/-shader]
  -s/-shader shaderversion
      [when using .fx files, specify the target version for the
       -f/-function (ps_1_1, ps_1_2, ps_1_3, ps_1_4, ps_2_0, ps_2_a,
       ps_2_sw, ps_3_0). Defaults to ps_2_a]
  -d/-dxversion ver
      [set the DX Application Version: (8 or 9) default is 9]
  -z/-zreplace
      [enable depth replace (default is off)]
  -Color16
      [destination color register is 16 bpc wide instead of 32
       (default: off)]
  -d3dfog mode
      [mode is 1 for EXP fog, 2 for EXP2 fog, and 3 for Linear Fog.
       Only applies to PS text shaders.]
  -m/-mrt count
      [enable MRT programs and set the surface count to 'count']
  -texrange hexbitfield
      [2 bits per texture,
       0: Texture range FP32,
       1: Texture range FP16,
       2: Texture range (-1, 1) FX9,
       3: Texture range (0, 1) FX8]
  -signtex hexbitfield
      [1 bit per texture,
       0: does not support signed remap,
       1: supports signed remap]
  -texdim hexbitfield
      [2 bits per texture, 0: unused, 1: 1D, 2: 2D, 3: 3D,
       default is 0xAAAAAAAA or all 2D]

Examples:

  nvshaderperf -c
      *** Displays the list of chips supported

  nvshaderperf -o myshaderresult.txt -a NV30 *.ps
      *** Runs NV30 performance on all of the .ps shader files found in
          that dir

  nvshaderperf -o myeffectresult.txt -t firsttech -p firstpass -a NV3X myeffect.fx
      *** Runs all NV3X family chips performance on the fx/technique/pass
          specified

  nvshaderperf -o myeffectresult.txt -f MyCoolPSFunc -v 2 -a NV30 myeffect.fx
      *** Runs NV30 performance on the function called MyCoolPSFunc with
          verbosity=2

  nvshaderperf -f * -a NV30 myeffect.fx
      *** Runs NV30 performance on all of the UNIQUE functions (code and
          shaderversion) in myeffect.fx, this is the same as -t * -p *

When using -texrange, -signtex, and -texdim, the parameter is a hexbitfield. All of the bitfields are filled from the LSB to the MSB (right to left for PCs) for textures 0 - N, so texture 0 occupies the lowest bits.

For example, when setting the texture dimension for 3 textures (2D, 3D, 2D respectively), the bit pattern would be 101110. This translates to 0x0000002E for a hexbitfield.

Similarly for -signtex, 4 textures (signed, not signed, signed, signed respectively) put texture 0 in the rightmost bit, so the bit pattern would be 1101, or 0x0000000D.
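
The packing rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic and is not part of NVShaderPerf: the helper names pack_hexbitfield and format_hexbitfield are hypothetical, and the 16-texture count used to reproduce the 0xAAAAAAAA default is an assumption based on a 32-bit field with 2 bits per texture.

```python
def pack_hexbitfield(values, bits_per_texture):
    """Pack per-texture values into one integer, texture 0 in the lowest bits."""
    limit = 1 << bits_per_texture
    field = 0
    for index, value in enumerate(values):
        if not 0 <= value < limit:
            raise ValueError("value %r does not fit in %d bit(s)"
                             % (value, bits_per_texture))
        # Texture N is shifted left by N * bits_per_texture (LSB first).
        field |= value << (index * bits_per_texture)
    return field


def format_hexbitfield(field):
    """Format the packed value as an 8-digit hex string, e.g. 0x0000002E."""
    return "0x{:08X}".format(field)


# -texdim codes as listed in the option description.
TEXDIM = {"unused": 0, "1D": 1, "2D": 2, "3D": 3}

# Three textures: 2D, 3D, 2D (2 bits per texture).
texdim = pack_hexbitfield([TEXDIM[d] for d in ("2D", "3D", "2D")], 2)
print(format_hexbitfield(texdim))          # 0x0000002E

# Assumed 16 textures, all 2D, which fills a 32-bit field.
default_texdim = pack_hexbitfield([TEXDIM["2D"]] * 16, 2)
print(format_hexbitfield(default_texdim))  # 0xAAAAAAAA
```

The printed string can then be passed on the command line, for example as the argument to -texdim. For -signtex the same helper applies with bits_per_texture set to 1.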

Directory Contents:

  NVShaderPerf.exe: main executable
  plugins.xml: plugin specification
  nv_sys.dll: common libraries for our tool set
  nv_perf3x.dll, nv_perf4x.dll: chip level performance module

Output:

Example output from NVShaderPerf:

  --------------------    NV35   --------------------
  Target: GeForceFX 5900 Ultra (NV35) :: Unified Compiler: v61.77
  Cycles: 1 :: # R Registers: 2
  Pixel throughput (assuming 1 cycle texture lookup) 1.80 GP/s

The output shows the architecture targeted (NV35 in this case), the version of the NVIDIA Unified Compiler used (v61.77), the number of cycles per pixel, the number of R registers used, and the pixel throughput (which takes into account clock speed and number of pixel pipes). Because this is mainly a measure of ALU performance, NVShaderPerf does not measure texture lookup stalls; it assumes only 1 cycle for every texture lookup.

NOTE: GeForce6 performance shows the maximum R register index used, as this is what impacts performance. If (max index + 1) > # R Registers, please email SDKFeedback@nvidia.com.

Please also send questions or comments to SDKFeedback@nvidia.com.
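
As a rough cross-check of the pixel throughput figure in the example output, the relationship between clock speed, pixel pipes and cycles per pixel might be sketched as below. The formula is an assumption about how the three quantities combine, not a description of NVShaderPerf internals. The 450 MHz clock and 4 pixel pipes are illustrative inputs that do not come from this readme, and the function name pixel_throughput_gps is hypothetical.

```python
def pixel_throughput_gps(clock_mhz, pixel_pipes, cycles_per_pixel):
    """Estimate throughput in gigapixels per second (assumed formula)."""
    if cycles_per_pixel <= 0:
        raise ValueError("cycles_per_pixel must be positive")
    # Each pipe is assumed to retire one pixel every cycles_per_pixel clocks.
    pixels_per_second = clock_mhz * 1e6 * pixel_pipes / cycles_per_pixel
    return pixels_per_second / 1e9


# Illustrative inputs (assumptions): 450 MHz, 4 pipes, and the
# 1 cycle per pixel reported in the example output.
print("{:.2f} GP/s".format(pixel_throughput_gps(450, 4, 1)))

# Doubling the cycle count halves the estimate under this model.
print("{:.2f} GP/s".format(pixel_throughput_gps(450, 4, 2)))
```

Under this model, a shader that takes more cycles per pixel lowers the estimate proportionally, which is why the Cycles value is the figure to watch when comparing shader variants.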