GoForce 3D: New Low-Power Architecture
Transform/Setup
Raster
Texture
Fragment ALU
Data Write
•Flexible Fragment ALU
•Raster – fragment generation and loop management
•Pipelines only trigger on activity
•Low Power
–< 50 mW per 100M pixels/sec
–During actual gameplay
•Very scalable architecture
(~50 pipe stages)
GoForce 3D is a completely new architecture
•Designed from the ground up to be efficient in both performance and power.
•“Low power” here means:
•Implementations require less than 100 mW, or in some cases less than 50 mW, per 100 megapixels per second.
•That’s a significant reduction in power consumption compared to the traditional graphics pipeline.
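
To put the power figures in per-pixel terms, the arithmetic can be sketched as below. The function name is hypothetical; the only inputs are the two figures quoted in these notes (50 mW and 100 mW per 100 megapixels per second).

```python
def energy_per_pixel_nj(power_mw, fill_rate_mpix_per_s):
    """Energy spent per pixel, in nanojoules.

    mW / (Mpixel/s) = 1e-3 W / (1e6 pixel/s) = 1e-9 J/pixel,
    so the ratio of the two inputs is already in nJ per pixel.
    """
    return power_mw / fill_rate_mpix_per_s


# The two figures quoted in the notes.
print(energy_per_pixel_nj(50, 100))    # 0.5 nJ per pixel
print(energy_per_pixel_nj(100, 100))   # 1.0 nJ per pixel
```

In other words, the quoted budget works out to roughly half a nanojoule to one nanojoule per rendered pixel.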

•Much shallower architecture, with fewer pipeline stages.
•Stages only trigger on activity.
•In the traditional pipeline, the units are always triggering. Here only the stages that are busy are clocked, which is more power efficient.
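
A toy model can illustrate why clocking only the busy stages saves work. This is a minimal sketch, not the hardware's gating logic: it assumes each item advances one stage per cycle, and the function name and the counting scheme are invented for illustration.

```python
def count_stage_clocks(work_pattern, num_stages):
    """Return (always_on, gated) stage-clock counts for an input stream.

    work_pattern: sequence of booleans, True when a new item enters the
    first stage on that cycle. Extra cycles are added to drain the pipe.
    """
    stages = [False] * num_stages            # one "holds work" bit per stage
    always_on = gated = 0
    for has_input in list(work_pattern) + [False] * num_stages:
        stages = [has_input] + stages[:-1]   # every item moves one stage on
        always_on += num_stages              # traditional: all stages clock
        gated += sum(stages)                 # gated: only busy stages clock
    return always_on, gated


# One item followed by three idle cycles, through a 3-stage pipeline.
print(count_stage_clocks([True, False, False, False], 3))   # (21, 3)
```

In this model the gated count grows with the amount of work, while the always-on count grows with elapsed time, so a mostly idle pipeline benefits most.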

•Built around a “fragment ALU”
•Programmable unit that we use to implement most per-pixel operations

•Transform and setup engine
•VLIW-like unit that handles the transformation and setup tasks
•Native fixed-point and floating-point data, but setup work is computed in floating point, which ensures accurate rendering.
•Vertex cache – vertex re-use improves performance. Indexed triangle lists or strips allow the hardware to use a cached copy of a vertex.
•Frustum clipper – eliminates/clips triangles outside of the view frustum.
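
The benefit of the vertex cache with indexed geometry can be sketched with a small simulation. The cache size and the first-in, first-out replacement policy are assumptions made for illustration; the notes do not describe the actual cache organization.

```python
from collections import deque


def count_vertex_transforms(indices, cache_size):
    """Count how many vertices must be transformed for an index stream.

    Models a FIFO cache of already-transformed vertices (hypothetical
    policy). A hit re-uses the cached copy; a miss costs one transform.
    """
    cache = deque(maxlen=cache_size)   # oldest entry is evicted when full
    transforms = 0
    for index in indices:
        if index not in cache:
            transforms += 1
            cache.append(index)
    return transforms


# Two triangles sharing an edge: 6 indices, but only 4 distinct vertices.
quad = [0, 1, 2, 2, 1, 3]
print(count_vertex_transforms(quad, cache_size=4))   # 4
```

Without indexing, the same two triangles would be submitted as six separate vertices and transformed six times.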

•Raster – two roles
•Generates pixel fragments (Z, colors, texture coordinates, etc.)
•Manages recirculation of pixels to downstream parts of the pipeline
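
Fragment generation can be sketched in software as follows. This uses the common edge-function approach with barycentric interpolation, which is an assumption for illustration; the notes do not say how the raster unit computes coverage or interpolates attributes. Only Z is interpolated here, but colors and texture coordinates would be handled the same way.

```python
def edge(a, b, p):
    """Twice the signed area of the triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return (x, y, z) fragments for a triangle with (x, y, z) vertices.

    No fill rule is applied, so pixels exactly on a shared edge would be
    generated by both neighbouring triangles.
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return []                          # degenerate triangle
    fragments = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)         # sample at the pixel center
            # Dividing by the signed area gives barycentric weights that
            # are non-negative inside the triangle for either winding.
            b0 = edge(v1, v2, p) / area
            b1 = edge(v2, v0, p) / area
            b2 = edge(v0, v1, p) / area
            if b0 >= 0 and b1 >= 0 and b2 >= 0:
                z = b0 * v0[2] + b1 * v1[2] + b2 * v2[2]
                fragments.append((x, y, z))
    return fragments


frags = rasterize((0, 0, 0.25), (4, 0, 0.25), (0, 4, 0.25), 4, 4)
print(len(frags))   # 10
```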

•Texture Unit
•Z-fetch and Z-comparison (early)
•Color fetch operation – when doing framebuffer (FB) blending, this unit fetches the FB color
•Undithering (optional)
•Texture Fetch
•Cache, Filtering, Format Conversion
•Decompression
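
The point of doing the Z comparison early is that hidden fragments are dropped before any texture work is spent on them. A minimal sketch, assuming a less-than depth test and assuming the depth buffer is updated at the same point (the notes do not say where the Z write happens):

```python
def early_z_filter(fragments, depth_buffer):
    """Keep only fragments that pass a less-than depth test.

    fragments: iterable of (x, y, z); depth_buffer: 2D list indexed [y][x].
    Fragments that fail are dropped before texturing and never reach the
    later stages. The in-place depth update is an assumption of this sketch.
    """
    visible = []
    for x, y, z in fragments:
        if z < depth_buffer[y][x]:
            depth_buffer[y][x] = z
            visible.append((x, y, z))
    return visible


depth = [[1.0, 1.0]]                               # 2x1 buffer, cleared to far
frags = [(0, 0, 0.5), (0, 0, 0.7), (1, 0, 0.9)]
print(early_z_filter(frags, depth))   # [(0, 0, 0.5), (1, 0, 0.9)]
```

The second fragment lies behind the first at the same pixel, so it is rejected and costs no texture fetch or ALU work.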

•Fragment ALU
•Signed 10-bit math per-component
•Programmable – used to implement Texture Combine modes
•Per-pixel ops:  fog, alpha blend, alpha test
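
The per-pixel operations listed above can be sketched with integer math. The notes only say the ALU uses signed 10-bit math per component; the number format below (8-bit color values with 255 meaning 1.0, results clamped to the signed 10-bit range) is an assumption, and the function names are invented. The combine functions are modeled on the OpenGL ES 1.x MODULATE and ADD_SIGNED modes.

```python
S10_MIN, S10_MAX = -512, 511      # range of a signed 10-bit integer


def clamp_s10(value):
    """Clamp an intermediate result to the signed 10-bit range."""
    return max(S10_MIN, min(S10_MAX, value))


def modulate(a, b):
    """MODULATE-style combine: a * b, with 255 treated as 1.0 (rounded)."""
    return clamp_s10((a * b + 127) // 255)


def add_signed(a, b):
    """ADD_SIGNED-style combine: a + b - 0.5, with 128 standing in for 0.5."""
    return clamp_s10(a + b - 128)


def alpha_blend(src, dst, alpha):
    """Source-over blend: src * alpha + dst * (1 - alpha)."""
    return clamp_s10((src * alpha + dst * (255 - alpha) + 127) // 255)


def alpha_test(alpha, reference):
    """Pass fragments whose alpha is at least the reference value."""
    return alpha >= reference


print(modulate(255, 200))         # 200
print(add_signed(0, 0))           # -128, representable because math is signed
print(alpha_blend(255, 0, 128))   # 128
```

The negative and above-255 intermediates are where the extra range of a signed 10-bit component matters; an unsigned 8-bit datapath would have to clamp them immediately.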

•Data Write
•Writes data to the framebuffer
•Format conversions when writing data to the framebuffer (i.e., it can do “dithering”)
•Optional: recirculation
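
Dithering during format conversion can be sketched as below for an 8-bit-per-channel color written to a 16-bit RGB565 framebuffer. The 2x2 ordered-dither matrix and the RGB565 target are assumptions chosen for illustration; the notes do not specify the dither pattern or the framebuffer formats.

```python
BAYER_2X2 = ((0, 2), (3, 1))      # hypothetical 2x2 ordered-dither matrix


def dither_channel(value, bits, x, y):
    """Reduce an 8-bit value to `bits` bits using position-based dither."""
    drop = 8 - bits                                      # bits discarded
    threshold = (BAYER_2X2[y & 1][x & 1] << drop) >> 2   # in [0, 2**drop)
    return min(255, value + threshold) >> drop


def pack_rgb565(r, g, b, x, y):
    """Dither and pack 8-bit R, G, B into one 16-bit RGB565 word."""
    return (dither_channel(r, 5, x, y) << 11
            | dither_channel(g, 6, x, y) << 5
            | dither_channel(b, 5, x, y))


# An 8-bit value of 4 is half of one 5-bit step, so half the pixels in
# each 2x2 block round up and half round down.
print([dither_channel(4, 5, x, y) for y in (0, 1) for x in (0, 1)])   # [0, 1, 1, 0]
```

Averaged over the block, the output preserves the intensity that plain truncation would lose.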

•Key Benefits
•Scalable
•Low Power
•Programmable


Floating point VLIW machine
Precision for accurate rendering
Vertex Buffer for vertex re-use
Triangle strips, fans, meshes …
Supports Float or Fixed formats
Frustum clipper

Scoreboard – basically a “traffic cop” that manages recirculation of pixel traffic

Z Fetch and compare (Early Z)
Color Fetch and compare (color keying)
Optional un-dithering of color
Texture Fetch
Cache
Filter
Format conversion
Decompression

Signed 10-bit math per component
Texture Combine
Fog
Alpha Blend
Alpha Test


Format conversion
Optional Dithering
Optional Re-circulation forwarding