Flushes dirty data from UCHE, and also writes a GPU timestamp to
the address if one is provided.
If A6XX_RB_SAMPLE_COUNT_CONTROL.copy is true, writes OQ Z passed
sample counts to RB_SAMPLE_COUNT_ADDR. This writes to main
memory, skipping UCHE.
Writes the GPU timestamp to the address that follows, once RB
access and flushes are complete.
Invalidates depth attachment data from the CCU. We assume this
happens in the last stage.
Invalidates color attachment data from the CCU. We assume this
happens in the last stage.
Flushes the small cache used by CP_EVENT_WRITE::BLIT (which,
along with its registers, would be better named RESOLVE).
Flushes depth attachment data from the CCU. We assume this
happens in the last stage.
Flushes color attachment data from the CCU. We assume this
happens in the last stage.
2D blit to resolve GMEM to system memory (skipping CCU) at the
end of a render pass. Compare to CP_BLIT's BLIT_OP_SCALE for
more general blitting.
Clears either the fast-clear buffer or the LRZ direction,
depending on the GRAS_LRZ_CNTL configuration.
The LRZ direction is stored at lrz_fc_offset + 0x200 as a single
byte, which can be expressed by this enum:
CUR_DIR_DISABLED = 0x0
CUR_DIR_GE = 0x1
CUR_DIR_LE = 0x2
CUR_DIR_UNSET = 0x3
Clearing the direction means setting it to CUR_DIR_UNSET.
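The direction byte can be modelled on the CPU side as a small enum. This is only a sketch: the enum values and the 0x200 offset come from the text, while `LrzDirection`, `clear_direction`, `read_direction`, and the bytearray standing in for the fast-clear buffer are hypothetical names used for illustration.

```python
from enum import IntEnum

# Byte offset of the direction byte relative to lrz_fc_offset (from the text).
LRZ_DIR_OFFSET = 0x200


class LrzDirection(IntEnum):
    CUR_DIR_DISABLED = 0x0
    CUR_DIR_GE = 0x1
    CUR_DIR_LE = 0x2
    CUR_DIR_UNSET = 0x3


def read_direction(buf: bytearray, lrz_fc_offset: int) -> LrzDirection:
    """Hypothetical helper: read the one-byte direction from a buffer copy."""
    return LrzDirection(buf[lrz_fc_offset + LRZ_DIR_OFFSET])


def clear_direction(buf: bytearray, lrz_fc_offset: int) -> None:
    """Hypothetical helper: model a direction clear as a write of CUR_DIR_UNSET."""
    buf[lrz_fc_offset + LRZ_DIR_OFFSET] = LrzDirection.CUR_DIR_UNSET
```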
Invalidates UCHE.
Doesn't seem to do anything
initialize CP's micro-engine
skip N 32-bit words to get to the next packet
indirect buffer dispatch. prefetch parser uses this packet
type to determine whether to pre-fetch the IB
Takes the same arguments as CP_INDIRECT_BUFFER, but jumps to
another buffer at the same level. Must be at the end of the IB,
and doesn't work with draw state IBs.
indirect buffer dispatch. same as IB, but init is pipelined
Waits for the IDLE state of the engine before further drawing.
This is pipelined, so the CP may continue.
wait until a register or memory location is a specific value
wait until a register location is equal to a specific value
wait until a register location is >= a specific value
wait until a read completes
wait until all base/size writes from an IB_PFD packet have completed
register read/modify/write
Set binning configuration registers
reads register in chip and writes to memory
write N 32-bit words to memory
write CP_PROG_COUNTER value to memory
conditional execution of a sequence of packets
conditional write to memory or register
generate an event that creates a write to memory when completed
generate a VS|PS_done event
generate a cache flush done event
generate a z_pass done event
not sure of the real name, but this seems to be what is used for
opencl, instead of CP_DRAW_INDX
initiate fetch of index buffer and draw
draw using supplied indices in packet
initiate fetch of index buffer and binIDs and draw
initiate fetch of bin IDs and draw using supplied indices
begin/end initiator for viz query extent processing
fetch state sub-blocks and initiate shader code DMAs
load constant into chip and to memory
load sequencer instruction memory (pointer-based)
load sequencer instruction memory (code embedded in packet)
load constants from a location in memory
selective invalidation of state pointers
dynamically changes shader instruction memory partition
sets the 64-bit BIN_MASK register in the PFP
sets the 64-bit BIN_SELECT register in the PFP
updates the current context, if needed
generate interrupt from the command stream
copy sequencer instruction memory to system memory
sets draw initiator flags register in PFP, gets bitwise-ORed into
every draw initiator
sets the register protection mode
load high level sequencer command
Conditionally load an IB based on a flag, prefetch enabled
Conditionally load an IB based on a flag, prefetch disabled
Load a buffer with pre-fetch enabled
Set bin (?)
test 2 memory locations against the dword values specified
Write register, ignoring context state for context sensitive registers
Record the real-time when this packet is processed by PFP
PFP waits until the FIFO between the PFP and the ME is empty
Used a bit like CP_SET_CONSTANT on a2xx, but can write multiple
groups of registers. Looks like it can be used to create state
objects in GPU memory, and on state change only emit pointer
(via CP_SET_DRAW_STATE), which should be nice for reducing CPU
overhead:
(A4x) save PM4 stream pointers to execute upon a visible draw
Enable or disable predication globally. Also resets the
predicate to "passing" and the local bit to enabled when
enabling global predication.
Enable or disable predication locally. Unlike globally enabling
predication, this packet doesn't touch any other state.
Predication only happens when enabled globally and locally and a
predicate has been set. This should be used for internal draws
which aren't supposed to use the predication state:
CP_DRAW_PRED_ENABLE_LOCAL(0)
... do draw...
CP_DRAW_PRED_ENABLE_LOCAL(1)
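The interaction of the global bit, the local bit, and the latched predicate can be sketched as a tiny state model. Everything here is an assumption made for illustration: the `PredState` class and its method names are hypothetical, and "resets the predicate to passing" is modelled as "no predicate is set", which may not match the firmware's internal representation.

```python
from dataclasses import dataclass


@dataclass
class PredState:
    global_enabled: bool = False
    local_enabled: bool = True
    predicate_set: bool = False

    def set_global(self, enable: bool) -> None:
        self.global_enabled = enable
        if enable:
            # Per the text: enabling globally also resets the predicate to
            # "passing" (modelled here as "not set") and re-enables the local bit.
            self.predicate_set = False
            self.local_enabled = True

    def set_local(self, enable: bool) -> None:
        # Per the text: the local packet touches no other state.
        self.local_enabled = enable

    def latch_predicate(self) -> None:
        self.predicate_set = True

    def predication_applies(self) -> bool:
        # Predication only happens when all three conditions hold.
        return self.global_enabled and self.local_enabled and self.predicate_set
```

Wrapping an internal draw in `set_local(False)` / `set_local(True)` makes `predication_applies()` return False for that draw only, which mirrors the CP_DRAW_PRED_ENABLE_LOCAL(0) / (1) pattern.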
Latch a draw predicate into the internal register.
for A4xx
Write to register with address that does not fit into type-0 pkt
copy from ME scratch RAM to a register
Copy from REG to ME scratch RAM
Wait for memory writes to complete
Conditional execution based on register comparison
Memory to REG copy
for a5xx
Tells CP the current mode of GPU operation
Instruct CP to set a few internal CP registers
Enables IB2 skipping. If both GLOBAL and LOCAL are 1 and
nothing is left in the visibility stream, then
CP_INDIRECT_BUFFER will be skipped, and draws will early return
from their IB.
General purpose 2D blit engine for image transfers and mipmap
generation. Reads through UCHE, writes through the CCU cache in
the PS stage.
Write CP_CONTEXT_SWITCH_*_INFO from CP to the following dwords,
and forcibly switch to the indicated context.
These first appear in a650_sqe.bin. They can in theory be used
to loop any sequence of IB1 commands, but in practice they are
used to loop over bins. There is a fixed-size per-iteration
prefix, used to set per-bin state, and then the following IB1
commands are executed until CP_END_BIN which are always the same
for each iteration and usually contain a list of
CP_INDIRECT_BUFFER calls to IB2 commands which setup state and
execute restore/draw/save commands. This replaces the previous
technique of just repeating the CP_INDIRECT_BUFFER calls and
"unrolling" the loop.
Make next dword 1 to disable preemption, 0 to re-enable it.
Can clear BV/BR counters, or wait until one catches up to another
Clears, adds to local, or adds to global timestamp
Write to a scratch memory that is read by CP_REG_TEST with
SOURCE_SCRATCH_MEM set. It's not the same scratch as scratch registers.
However it uses the same memory space.
Executes an array of fixed-size command buffers where each
buffer is assumed to have one draw call, skipping buffers with
non-visible draw calls.
Reset various on-chip state used for synchronization
Load state, a3xx (and later?)
inline with the CP_LOAD_STATE packet
in buffer pointed to by EXT_SRC_ADDR
Load state, a4xx+
Load state, a6xx+
SS6_UBO used by the a6xx vulkan blob with tessellation constants
in this case, EXT_SRC_ADDR is (ubo_id shl 16 | offset)
to load constants from a UBO loaded with DST_OFF = 14 and offset 0,
EXT_SRC_ADDR = 0xe0000
(offset is a guess, should be in bytes given that maxUniformBufferRange=64k)
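The EXT_SRC_ADDR encoding for this case can be checked with a few lines of arithmetic. This is a sketch only: `pack_ubo_ext_src_addr` is a hypothetical helper, and the 16-bit width of the offset field is inferred from the "shl 16" in the text rather than confirmed.

```python
def pack_ubo_ext_src_addr(ubo_id: int, offset: int = 0) -> int:
    """Hypothetical helper: build (ubo_id shl 16 | offset) as described above."""
    if ubo_id < 0:
        raise ValueError("ubo_id must be non-negative")
    # Assumption: the offset occupies the low 16 bits.
    if not 0 <= offset < (1 << 16):
        raise ValueError("offset must fit in 16 bits")
    return (ubo_id << 16) | offset
```

With `ubo_id = 14` and `offset = 0` this gives 0xe0000, matching the value quoted in the text.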
DST_OFF same as in CP_LOAD_STATE6 - vec4 VS const at this offset will
be updated for each draw to {draw_id, first_vertex, first_instance, 0}
value of 0 disables it
Read a 64-bit value at the given address and
test if it equals/doesn't equal 0.
value at offset 0 always seems to be 0x00000000..
Like CP_SET_BIN_DATA5, but set the pointers as offsets from the
pointers stored in VSC_PIPE_{DATA,DATA2,SIZE}_ADDRESS. Useful
for Vulkan where these values aren't known when the command
stream is recorded.
Modifies DST_REG using two sources that can either be registers
or immediates. If SRC1_ADD is set, then do the following:
$dst = (($dst & $src0) rot $rotate) + $src1
Otherwise:
$dst = (($dst & $src0) rot $rotate) | $src1
Here "rot" means rotate left.
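The two formulas can be written out as a short reference model. It is a sketch under stated assumptions: registers are taken to be 32 bits wide, the add is assumed to wrap at 32 bits, and `rotl32` and `reg_rmw` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def reg_rmw(dst: int, src0: int, src1: int, rotate: int, src1_add: bool) -> int:
    """Model of the read-modify-write described above.

    Assumption: results are truncated to 32 bits, so the add wraps around.
    """
    rotated = rotl32(dst & src0, rotate)
    if src1_add:
        return (rotated + src1) & MASK32
    return (rotated | src1) & MASK32
```

For example, `reg_rmw(0xff, 0x0f, 1, 4, True)` masks to 0x0f, rotates left by 4 to 0xf0, and adds 1.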
Like CP_REG_TO_MEM, but the memory address to write to can be
offset using either one or two registers or scratch
registers.
Like CP_REG_TO_MEM, but the memory address to write to can be
offset using a DWORD in memory.
Wait until a memory value is greater than or equal to the
reference, using signed comparison.
This uses the same internal comparison as CP_COND_WRITE,
but waits until the comparison is true instead. It busy-loops in
the CP for the given number of cycles before trying again.
Waits for REG0 to not be 0 or REG1 to not equal REF
Tell CP the current operation mode, indicates save and restore procedure
Set internal CP registers, used to indicate context save data addresses
Tests bit in specified register and sets predicate for CP_COND_REG_EXEC.
So:
opcode: CP_REG_TEST (39) (2 dwords)
{ REG = 0xc10 | BIT = 0 }
0000: 70b90001 00000c10
opcode: CP_COND_REG_EXEC (47) (3 dwords)
0000: 70c70002 10000000 00000004
opcode: CP_INDIRECT_BUFFER (3f) (4 dwords)
Will execute the CP_INDIRECT_BUFFER only if b0 in the register at
offset 0x0c10 is 1
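The header dwords in the dump above can be decoded to cross-check the opcode and size annotations. This sketch assumes the usual type-7 packet layout (packet type in bits 28..31, a 7-bit opcode in bits 16..22, payload dword count in bits 0..13, with the remaining bits used for parity); that layout is not stated in the text, and `decode_pkt7_header` is a hypothetical helper.

```python
def decode_pkt7_header(header: int) -> tuple[int, int]:
    """Return (opcode, payload_dwords) for a type-7 packet header.

    Assumed layout: type in bits 28..31, opcode in bits 16..22,
    payload dword count in bits 0..13. Parity bits are ignored.
    """
    if (header >> 28) != 0x7:
        raise ValueError("not a type-7 packet header")
    opcode = (header >> 16) & 0x7F
    payload_dwords = header & 0x3FFF
    return opcode, payload_dwords
```

Under that assumption 0x70b90001 decodes to opcode 0x39 with one payload dword, and 0x70c70002 to opcode 0x47 with two, which is consistent with the "(2 dwords)" and "(3 dwords)" totals once the header itself is counted.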
Executes the following DWORDs of commands if the dword at ADDR0
is not equal to 0 and the dword at ADDR1 is less than REF
(signed comparison).
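The condition can be expressed as a small predicate, which makes the signed comparison explicit. This is a sketch: `to_signed32` and `cond_exec_passes` are hypothetical names, and the values are assumed to be 32-bit dwords interpreted as two's complement.

```python
def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def cond_exec_passes(dword_at_addr0: int, dword_at_addr1: int, ref: int) -> bool:
    """True when the following commands would execute, per the text above."""
    nonzero = (dword_at_addr0 & 0xFFFFFFFF) != 0
    return nonzero and to_signed32(dword_at_addr1) < to_signed32(ref)
```

Note that 0xffffffff at ADDR1 compares as -1, so it is less than a REF of 0 under the signed comparison.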
Used by the userspace driver to set various IBs which are
executed during context save/restore, for handling state that
isn't restored by the context switch routine itself.
Executed unconditionally when switching back to the context.
Executed when switching back after switching
away during execution of
a CP_SET_MARKER packet with RM6_YIELD as the
payload *and* the normal save routine was
bypassed for a shorter one. I think this is
connected to the "skipsaverestore" bit set by
the kernel when preempting.
Executed when switching away from the context,
except for context switches initiated via
CP_YIELD.
This can only be set by the RB (i.e. the kernel)
and executes with protected mode off, but
is otherwise similar to SAVE_IB.
Note, kgsl calls this CP_KMD_AMBLE_TYPE
Keep shadow copies of these registers and only set them
when drawing, avoiding redundant writes:
- VPC_CNTL_0
- HLSQ_CONTROL_1_REG
- HLSQ_UNKNOWN_B980
Track RB_RENDER_CNTL, and insert a WFI in the following
situation:
- There is a write that disables binning
- There was a draw with binning left enabled, but in
BYPASS mode
Presumably this is a hang workaround?
Do a mysterious CP_EVENT_WRITE 0x3f when the low bit of
the data to write is 0. Used by the Vulkan blob with
PC_MULTIVIEW_CNTL, but this isn't predicated on particular
register(s) like the others.
Tracks GRAS_LRZ_CNTL::GREATER, GRAS_LRZ_CNTL::DIR, and
GRAS_LRZ_DEPTH_VIEW with previous values, and if one of
the following is true:
- GRAS_LRZ_CNTL::GREATER has changed
- GRAS_LRZ_CNTL::DIR has changed, the old value is not
CUR_DIR_GE, and the new value is not CUR_DIR_DISABLED
- GRAS_LRZ_DEPTH_VIEW has changed
then it does an LRZ_FLUSH with GRAS_LRZ_CNTL::ENABLE
forced to 1.
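The three trigger conditions can be restated as a pure function over the previous and new tracked values. This is a sketch of the rule as described, not of the firmware: `LrzTracked` and `needs_lrz_flush` are hypothetical names, and the direction constants reuse the CUR_DIR_* values given earlier in the text.

```python
from dataclasses import dataclass

CUR_DIR_DISABLED = 0x0
CUR_DIR_GE = 0x1
CUR_DIR_LE = 0x2
CUR_DIR_UNSET = 0x3


@dataclass(frozen=True)
class LrzTracked:
    greater: bool      # GRAS_LRZ_CNTL::GREATER
    direction: int     # GRAS_LRZ_CNTL::DIR
    depth_view: int    # GRAS_LRZ_DEPTH_VIEW


def needs_lrz_flush(old: LrzTracked, new: LrzTracked) -> bool:
    """True if any of the three listed conditions holds."""
    if old.greater != new.greater:
        return True
    if (old.direction != new.direction
            and old.direction != CUR_DIR_GE
            and new.direction != CUR_DIR_DISABLED):
        return True
    return old.depth_view != new.depth_view
```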
Only exists in a650_sqe.fw.
Note that the SMMU's definition of TTBRn can take different forms
depending on the pgtable format. But a5xx+ only uses aarch64
format.
Unused, does not apply to aarch64 pgtable format
Size of prefix for each bin. For each bin index i, the
prefix commands at PREFIX_ADDR + i * PREFIX_DWORDS are
executed in an IB2 before the IB1 commands following
this packet.
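The per-bin prefix location can be sketched as simple address arithmetic. The text gives the formula as PREFIX_ADDR + i * PREFIX_DWORDS without stating units; this sketch assumes PREFIX_ADDR is a byte address and that each dword is 4 bytes, so the stride is scaled accordingly. `bin_prefix_addr` is a hypothetical helper.

```python
DWORD_BYTES = 4  # assumption: PREFIX_ADDR is in bytes, so the stride is scaled


def bin_prefix_addr(prefix_addr: int, prefix_dwords: int, bin_index: int) -> int:
    """Byte address of the prefix commands for bin `bin_index` (assumed units)."""
    if bin_index < 0 or prefix_dwords < 0:
        raise ValueError("bin_index and prefix_dwords must be non-negative")
    return prefix_addr + bin_index * prefix_dwords * DWORD_BYTES
```

If the hardware instead treats the address in dword units, the `DWORD_BYTES` factor would be dropped.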
Number of dwords after this packet until CP_END_BIN
Best guess is that it is a faster way to fetch all the VSC_STATE registers
and keep them in a local scratch memory instead of fetching every time
when skipping IBs.
Scratch memory size is 48 dwords