Revision History

Version   Date   Description

                 Initial Release

PIO versus DMA

Graphics-related hardware modules on Tegra contain a number of registers that configure, control, and/or invoke the module’s functionality.

These registers are exposed in a memory-mapped fashion, as is typical for hardware. This allows the CPU to perform direct reads from or writes to those registers. This mode of operation can be described as direct PIO; direct because the CPU is directly accessing the registers using a register-specific address, and PIO meaning Programmed IO, implying that the CPU’s program is performing the access.

However, this mode of operation is inefficient from a software or CPU point of view; the CPU must wait for each register access to synchronously complete, or at least to be posted into some form of write buffer. Equally, the code running on the CPU must explicitly contain instructions not only to generate the data to be written into the registers, but also to perform the actual IO itself.

For this reason, hardware modules often allow the use of some form of DMA engine, so that the raw register IO can be offloaded from the CPU, allowing the CPU to perform other tasks, or go idle, while the raw register IO is being performed by dedicated hardware.


The Tegra host1x module is the DMA engine for register access to Tegra’s graphics- and multimedia-related modules. The modules served by host1x are referred to as clients. host1x includes some other functionality, such as synchronization.

host1x contains a FIFO (in fact, both a write and a read FIFO) for each client module. These FIFOs contain the register addresses and values for data written to, or read from, the client modules.

host1x contains memory-mapped registers that allow the CPU to control these FIFOs. Accessing client module registers in this manner can be described as indirect access, since the client’s registers are accessed indirectly through the host1x FIFO, rather than directly through a memory map. However, use of this access mechanism is discouraged.


host1x is typically used as a complete DMA engine. In this mode, host1x reads a memory buffer consisting of host1x opcodes (a representation of register accesses or related operations), and streams these into host1x’s FIFOs for processing by client modules. The memory buffers are known as push buffers. Multiple buffers may be active at once, with each client module’s FIFO being fed from up to one push buffer at a given time.

The memory region for a push buffer begins at the address in register HOST1X_CHANNEL_DMASTART. The end of the buffer is defined by register HOST1X_CHANNEL_DMAEND, which points at the first byte past the end of the buffer.

Software fills the buffer, then informs host1x which region it has written by updating register HOST1X_CHANNEL_DMAPUT. This register points at the first byte that does not contain valid data, i.e. it points immediately after the data that was written. host1x hardware reports which portion of the buffer it has read in register HOST1X_CHANNEL_DMAGET. This register again points at the first byte not processed by hardware, i.e. it points immediately after the data that has been read. Whenever HOST1X_CHANNEL_DMAPUT ("PUT") and HOST1X_CHANNEL_DMAGET ("GET") differ, host1x will execute opcodes from the push buffer.
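The PUT/GET protocol above can be modeled as a simple ring of 32-bit words. The sketch below is illustrative only: the struct, field names, and helpers are hypothetical, with the fields corresponding to the HOST1X_CHANNEL_DMASTART/DMAPUT/DMAGET registers described above; a real driver would write `put` back to HOST1X_CHANNEL_DMAPUT after filling the buffer.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical CPU-side model of one channel's push buffer.  Offsets are
 * in 32-bit words; DMASTART maps to `base`, DMAEND to `base + words`. */
struct pushbuf {
    uint32_t *base;   /* CPU mapping of the buffer (DMASTART)          */
    size_t    words;  /* buffer size in words                          */
    size_t    put;    /* software write pointer (mirrors DMAPUT)       */
    size_t    get;    /* last DMAGET value read back from hardware     */
};

/* Words written by software that hardware has not yet consumed.
 * Simplified: a real driver must also track free space so PUT never
 * overruns GET. */
static size_t pushbuf_pending(const struct pushbuf *pb)
{
    return (pb->put + pb->words - pb->get) % pb->words;
}

/* Append opcode words, then advance PUT.  The modulo here is only
 * CPU-side indexing; in hardware, wrapping past DMAEND requires an
 * explicit RESTART opcode. */
static void pushbuf_emit(struct pushbuf *pb, const uint32_t *ops, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pb->base[(pb->put + i) % pb->words] = ops[i];
    pb->put = (pb->put + n) % pb->words;
}
```

As soon as the updated PUT differs from GET, host1x begins fetching and executing the newly written words.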

Each channel has some stored state, such as the client unit this channel is talking to.

The most important opcodes are:

SETCL
Sets the target client unit. This is sometimes referred to as a class rather than a client.

INCR / NONINCR
Write values to registers of the client unit. Registers are sometimes referred to as methods.

GATHER
Instructs command DMA to fetch from another memory region.

RESTART
Instructs command DMA to start over from the beginning of the push buffer.
The GATHER opcode requests that host1x process another memory region. This region can contain either a command stream of host1x opcodes, or simply data parameters for an INCR/NONINCR opcode encoded as part of the GATHER opcode itself. When an opcode is embedded into the GATHER, the referenced region contains only data for that one opcode, never any additional opcodes. The size of the region is indicated in the GATHER opcode, and the base address is read from the following word. GATHERs are not recursive.
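Each opcode is a 32-bit word with the opcode number in the top bits. As an illustrative sketch of how a command stream might be assembled: the field layouts below (opcode in bits 31:28, register offset in bits 27:16) follow the Tegra20-era encodings used by the Linux host1x driver, and should be verified against the TRM before use.

```c
#include <stdint.h>

/* Opcode numbers, packed into bits 31:28 of each command word
 * (Tegra20/Tegra30 layout; illustrative). */
enum host1x_opcode {
    OP_SETCL   = 0,
    OP_INCR    = 1,
    OP_NONINCR = 2,
    OP_RESTART = 5,
    OP_GATHER  = 6,
};

/* SETCL: select the target class (client unit). */
static uint32_t op_setcl(unsigned offset, unsigned classid, unsigned mask)
{
    return ((uint32_t)OP_SETCL << 28) | (offset << 16) | (classid << 6) | mask;
}

/* INCR: write `count` values to consecutive registers starting at `offset`.
 * The values follow in the next `count` words of the stream. */
static uint32_t op_incr(unsigned offset, unsigned count)
{
    return ((uint32_t)OP_INCR << 28) | (offset << 16) | count;
}

/* GATHER: process `count` words from another region; the base address of
 * that region is read from the word following this one. */
static uint32_t op_gather(unsigned count)
{
    return ((uint32_t)OP_GATHER << 28) | count;
}
```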

Note that the push buffer does not automatically loop; if host1x reaches the end of the push buffer without having seen a RESTART opcode, this is an error condition, and host1x will stop processing that push buffer. Special care must be taken when writing PUT to execute a RESTART opcode. PUT typically points just past the valid push-buffer content. However, this would cause a problem for a RESTART, since the execution of RESTART resets GET to the start of the buffer. At this point, GET would not be equal to PUT, and hence host1x would execute the buffer another time, and in fact loop forever. For this reason, set PUT to the start of the buffer rather than pointing it after the RESTART.
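The wrap procedure described above might be sketched as follows. Everything here is hypothetical scaffolding: the register write is modeled with a stub, and the RESTART encoding (opcode 5 in bits 31:28) assumes the Tegra20 opcode layout.

```c
#include <stdint.h>
#include <stddef.h>

#define OP_RESTART_WORD (5u << 28)  /* RESTART opcode in bits 31:28 (assumed) */

/* Stand-in for the HOST1X_CHANNEL_DMAPUT register; a real driver would use
 * its own MMIO accessors here. */
static uint32_t dmaput_reg = 0xdeadbeef;

static void write_dmaput(uint32_t value)
{
    dmaput_reg = value;
}

/* Wrap the push buffer: make RESTART the last word hardware will consume,
 * then point PUT at the buffer start.  Setting PUT just past the RESTART
 * instead would leave GET != PUT after RESTART resets GET, so host1x
 * would re-execute the buffer forever. */
static void pushbuf_wrap(uint32_t *buf, size_t restart_word)
{
    buf[restart_word] = OP_RESTART_WORD;
    write_dmaput(0);  /* PUT = offset 0 = start of buffer */
}
```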


A sync point is a register in host1x. These registers simply hold a 32-bit value. There are 32 sync points in Tegra20 and Tegra30. Sync points should be initialized to zero at boot-up, and treated as monotonically incrementing counters with wrapping.
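Because the counters wrap, "has the sync point reached a threshold" must not be tested with a plain unsigned comparison. A minimal sketch of the usual wraparound-safe test (the helper name is ours):

```c
#include <stdint.h>
#include <stdbool.h>

/* True if sync point value `val` has reached or passed `threshold`,
 * treating the 32-bit counter as wrapping.  Correct as long as the two
 * values are within 2^31 of each other. */
static bool syncpt_reached(uint32_t val, uint32_t threshold)
{
    return (int32_t)(val - threshold) >= 0;
}
```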

Sync point values can be changed by opcodes in a push buffer. Client units all have sync point increment registers at offset 0. Some clients have separate sync point registers at other locations. A write to one of these registers will wait for some (client-defined) condition, and then increment the specified sync point value by 1. The exact set of legal conditions and their semantics are defined by each client module. Some examples are:


Waits for all previous operations to be complete.


Waits until the client unit has finished all reads from buffers.


Waits until it’s safe to send more register writes without blocking. Most clients complete this condition immediately.


Immediately completes; no wait.
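The value written to a client’s sync point increment register packs the condition and the sync point index together. As an illustrative sketch, assuming the Tegra20 field layout (condition in bits 15:8, sync point index in bits 7:0) and the Tegra20 condition numbering; both should be checked against the TRM:

```c
#include <stdint.h>

/* Condition codes for the sync point increment register (Tegra20
 * numbering, assumed; names match the conditions described above). */
enum syncpt_cond {
    COND_IMMEDIATE   = 0,
    COND_OP_DONE     = 1,
    COND_RD_DONE     = 2,
    COND_REG_WR_SAFE = 3,
};

/* Word to write to a client's sync point increment register: wait for
 * `cond`, then increment sync point `id` (0-31 on Tegra20/Tegra30). */
static uint32_t incr_syncpt_word(enum syncpt_cond cond, unsigned id)
{
    return ((uint32_t)cond << 8) | (id & 0xff);
}
```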

The CPU can increment a sync point by writing the sync point id (0-31 in Tegra20 and Tegra30) to register HOST1X_SYNC_SYNCPT_CPU_INCR.

Any channel can be frozen waiting for a sync point value to exceed a threshold. This is achieved by setting the channel’s client to host1x, and accessing register NV_CLASS_HOST_WAIT_SYNCPT.
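The word written to NV_CLASS_HOST_WAIT_SYNCPT encodes both the sync point index and the threshold. The layout assumed below (index in bits 31:24, 24-bit threshold in bits 23:0) follows the Tegra20-era encoding used by the Linux host1x driver and is illustrative only:

```c
#include <stdint.h>

/* Value written to NV_CLASS_HOST_WAIT_SYNCPT: sync point index in the top
 * byte, 24-bit threshold below it (assumed Tegra20 layout; verify against
 * the TRM).  The channel freezes until the sync point passes the threshold. */
static uint32_t wait_syncpt_word(unsigned id, uint32_t threshold)
{
    return ((uint32_t)(id & 0xff) << 24) | (threshold & 0xffffff);
}
```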

host1x can assert an interrupt to the CPU when a sync point value exceeds a threshold. This allows the CPU to wait until a specific command in a push buffer is executed; one which increments the sync point register to (or past) the specified value.