Flushes dirty data from UCHE, and also writes a GPU timestamp to the address if one is provided.

If A6XX_RB_SAMPLE_COUNT_CONTROL.copy is true, writes OQ Z passed sample counts to RB_SAMPLE_COUNT_ADDR. This writes to main memory, skipping UCHE.

Writes the GPU timestamp to the address that follows, once RB access and flushes are complete.

Invalidates depth attachment data from the CCU. We assume this happens in the last stage.

Invalidates color attachment data from the CCU. We assume this happens in the last stage.

Flushes the small cache used by CP_EVENT_WRITE::BLIT (which, along with its registers, would be better named RESOLVE).

Flushes depth attachment data from the CCU. We assume this happens in the last stage.

Flushes color attachment data from the CCU. We assume this happens in the last stage.

2D blit to resolve GMEM to system memory (skipping CCU) at the end of a render pass. Compare to CP_BLIT's BLIT_OP_SCALE for more general blitting.

Clears based on GRAS_LRZ_CNTL configuration, could clear fast-clear buffer or LRZ direction. LRZ direction is stored at lrz_fc_offset + 0x200, has 1 byte which could be expressed by enum:
    CUR_DIR_DISABLED = 0x0
    CUR_DIR_GE = 0x1
    CUR_DIR_LE = 0x2
    CUR_DIR_UNSET = 0x3
Clear of direction means setting the direction to CUR_DIR_UNSET.

Invalidates UCHE.

Doesn't seem to do anything

initialize CP's micro-engine

skip N 32-bit words to get to the next packet

indirect buffer dispatch. prefetch parser uses this packet type to determine whether to pre-fetch the IB

Takes the same arguments as CP_INDIRECT_BUFFER, but jumps to another buffer at the same level. Must be at the end of IB, and doesn't work with draw state IB's.

indirect buffer dispatch. same as IB, but init is pipelined

Waits for the IDLE state of the engine before further drawing. This is pipelined, so the CP may continue.
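The LRZ direction byte described above can be sketched in a few lines. This is only an illustration of the layout stated in the text (one byte at lrz_fc_offset + 0x200, cleared by writing CUR_DIR_UNSET). The names LrzDir, LRZ_DIR_OFFSET, lrz_dir_address and clear_lrz_direction are hypothetical, and the bytearray stands in for the buffer that holds the fast-clear data.

```python
from enum import IntEnum


class LrzDir(IntEnum):
    """Direction values as listed in the text."""
    CUR_DIR_DISABLED = 0x0
    CUR_DIR_GE = 0x1
    CUR_DIR_LE = 0x2
    CUR_DIR_UNSET = 0x3


# Offset of the direction byte relative to lrz_fc_offset (from the text).
LRZ_DIR_OFFSET = 0x200


def lrz_dir_address(lrz_fc_offset: int) -> int:
    """Return the offset of the 1-byte LRZ direction field."""
    return lrz_fc_offset + LRZ_DIR_OFFSET


def clear_lrz_direction(buf: bytearray, lrz_fc_offset: int) -> None:
    """Model a direction clear: the byte is set to CUR_DIR_UNSET.

    `buf` is a hypothetical stand-in for the memory holding the
    fast-clear buffer; real hardware performs this write itself.
    """
    buf[lrz_dir_address(lrz_fc_offset)] = LrzDir.CUR_DIR_UNSET


def read_lrz_direction(buf: bytearray, lrz_fc_offset: int) -> LrzDir:
    """Decode the direction byte back into the enum."""
    return LrzDir(buf[lrz_dir_address(lrz_fc_offset)])
```

For example, with a zero-filled buffer the direction initially decodes as CUR_DIR_DISABLED, and after clear_lrz_direction it decodes as CUR_DIR_UNSET.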
wait until a register or memory location is a specific value

wait until a register location is equal to a specific value

wait until a register location is >= a specific value

wait until a read completes

wait until all base/size writes from an IB_PFD packet have completed

register read/modify/write

Set binning configuration registers

reads register in chip and writes to memory

write N 32-bit words to memory

write CP_PROG_COUNTER value to memory

conditional execution of a sequence of packets

conditional write to memory or register

generate an event that creates a write to memory when completed

generate a VS|PS_done event

generate a cache flush done event

generate a z_pass done event

not sure the real name, but this seems to be what is used for opencl, instead of CP_DRAW_INDX..

initiate fetch of index buffer and draw

draw using supplied indices in packet

initiate fetch of index buffer and binIDs and draw

initiate fetch of bin IDs and draw using supplied indices

begin/end initiator for viz query extent processing

fetch state sub-blocks and initiate shader code DMAs

load constant into chip and to memory

load sequencer instruction memory (pointer-based)

load sequencer instruction memory (code embedded in packet)

load constants from a location in memory

selective invalidation of state pointers

dynamically changes shader instruction memory partition

sets the 64-bit BIN_MASK register in the PFP

sets the 64-bit BIN_SELECT register in the PFP

updates the current context, if needed

generate interrupt from the command stream

copy sequencer instruction memory to system memory

sets draw initiator flags register in PFP, gets bitwise-ORed into every draw initiator

sets the register protection mode

load high level sequencer command

Conditionally load an IB based on a flag, prefetch enabled

Conditionally load an IB based on a flag, prefetch disabled

Load a buffer with pre-fetch enabled

Set bin (?)
test 2 memory locations to dword values specified

Write register, ignoring context state for context sensitive registers

Record the real-time when this packet is processed by PFP

PFP waits until the FIFO between the PFP and the ME is empty

Used a bit like CP_SET_CONSTANT on a2xx, but can write multiple groups of registers. Looks like it can be used to create state objects in GPU memory, and on state change only emit pointer (via CP_SET_DRAW_STATE), which should be nice for reducing CPU overhead:
(A4x) save PM4 stream pointers to execute upon a visible draw

Enable or disable predication globally. Also resets the predicate to "passing" and the local bit to enabled when enabling global predication.

Enable or disable predication locally. Unlike globally enabling predication, this packet doesn't touch any other state. Predication only happens when enabled globally and locally and a predicate has been set. This should be used for internal draws which aren't supposed to use the predication state:
    CP_DRAW_PRED_ENABLE_LOCAL(0)
    ... do draw...
    CP_DRAW_PRED_ENABLE_LOCAL(1)

Latch a draw predicate into the internal register.

for A4xx

Write to register with address that does not fit into type-0 pkt

copy from ME scratch RAM to a register

Copy from REG to ME scratch RAM

Wait for memory writes to complete

Conditional execution based on register comparison

Memory to REG copy

for a5xx

Tells CP the current mode of GPU operation

Instruct CP to set a few internal CP registers

Enables IB2 skipping. If both GLOBAL and LOCAL are 1 and nothing is left in the visibility stream, then CP_INDIRECT_BUFFER will be skipped, and draws will early return from their IB.

General purpose 2D blit engine for image transfers and mipmap generation. Reads through UCHE, writes through the CCU cache in the PS stage.

Write CP_CONTEXT_SWITCH_*_INFO from CP to the following dwords, and forcibly switch to the indicated context.

These first appear in a650_sqe.bin.
They can in theory be used to loop any sequence of IB1 commands, but in practice they are used to loop over bins. There is a fixed-size per-iteration prefix, used to set per-bin state, and then the following IB1 commands are executed until CP_END_BIN, which are always the same for each iteration and usually contain a list of CP_INDIRECT_BUFFER calls to IB2 commands which set up state and execute restore/draw/save commands. This replaces the previous technique of just repeating the CP_INDIRECT_BUFFER calls and "unrolling" the loop.

Make next dword 1 to disable preemption, 0 to re-enable it.

Can clear BV/BR counters, or wait until one catches up to another

Clears, adds to local, or adds to global timestamp

Write to a scratch memory that is read by CP_REG_TEST with SOURCE_SCRATCH_MEM set. It's not the same scratch as scratch registers. However it uses the same memory space.

Executes an array of fixed-size command buffers where each buffer is assumed to have one draw call, skipping buffers with non-visible draw calls.

Reset various on-chip state used for synchronization

Load state, a3xx (and later?)

inline with the CP_LOAD_STATE packet

in buffer pointed to by EXT_SRC_ADDR

Load state, a4xx+

Load state, a6xx+

SS6_UBO used by the a6xx vulkan blob with tessellation constants. In this case, EXT_SRC_ADDR is (ubo_id shl 16 | offset). To load constants from a UBO loaded with DST_OFF = 14 and offset 0, EXT_SRC_ADDR = 0xe0000 (offset is a guess, should be in bytes given that maxUniformBufferRange=64k).

DST_OFF same as in CP_LOAD_STATE6 - vec4 VS const at this offset will be updated for each draw to {draw_id, first_vertex, first_instance, 0}. A value of 0 disables it.

Read a 64-bit value at the given address and test if it equals/doesn't equal 0.

value at offset 0 always seems to be 0x00000000..

Like CP_SET_BIN_DATA5, but set the pointers as offsets from the pointers stored in VSC_PIPE_{DATA,DATA2,SIZE}_ADDRESS.
Useful for Vulkan where these values aren't known when the command stream is recorded.

Modifies DST_REG using two sources that can either be registers or immediates. If SRC1_ADD is set, then do the following:
    $dst = (($dst & $src0) rot $rotate) + $src1
Otherwise:
    $dst = (($dst & $src0) rot $rotate) | $src1
Here "rot" means rotate left.

Like CP_REG_TO_MEM, but the memory address to write to can be offset using either one or two registers or scratch registers.

Like CP_REG_TO_MEM, but the memory address to write to can be offset using a DWORD in memory.

Wait until a memory value is greater than or equal to the reference, using signed comparison.

This uses the same internal comparison as CP_COND_WRITE, but waits until the comparison is true instead. It busy-loops in the CP for the given number of cycles before trying again.

Waits for REG0 to not be 0 or REG1 to not equal REF

Tell CP the current operation mode, indicates save and restore procedure

Set internal CP registers, used to indicate context save data addresses

Tests bit in specified register and sets predicate for CP_COND_REG_EXEC. So:
    opcode: CP_REG_TEST (39) (2 dwords)
            { REG = 0xc10 | BIT = 0 }
    0000: 70b90001 00000c10
    opcode: CP_COND_REG_EXEC (47) (3 dwords)
    0000: 70c70002 10000000 00000004
    opcode: CP_INDIRECT_BUFFER (3f) (4 dwords)
Will execute the CP_INDIRECT_BUFFER only if bit 0 in the register at offset 0x0c10 is 1

Executes the following DWORDs of commands if the dword at ADDR0 is not equal to 0 and the dword at ADDR1 is less than REF (signed comparison).

Used by the userspace driver to set various IB's which are executed during context save/restore for handling state that isn't restored by the context switch routine itself.

Executed unconditionally when switching back to the context.

Executed when switching back after switching away during execution of a CP_SET_MARKER packet with RM6_YIELD as the payload *and* the normal save routine was bypassed for a shorter one.
I think this is connected to the "skipsaverestore" bit set by the kernel when preempting.

Executed when switching away from the context, except for context switches initiated via CP_YIELD.

This can only be set by the RB (i.e. the kernel) and executes with protected mode off, but is otherwise similar to SAVE_IB. Note, kgsl calls this CP_KMD_AMBLE_TYPE.

Keep shadow copies of these registers and only set them when drawing, avoiding redundant writes:
- VPC_CNTL_0
- HLSQ_CONTROL_1_REG
- HLSQ_UNKNOWN_B980

Track RB_RENDER_CNTL, and insert a WFI in the following situation:
- There is a write that disables binning
- There was a draw with binning left enabled, but in BYPASS mode
Presumably this is a hang workaround?

Do a mysterious CP_EVENT_WRITE 0x3f when the low bit of the data to write is 0. Used by the Vulkan blob with PC_MULTIVIEW_CNTL, but this isn't predicated on particular register(s) like the others.

Tracks GRAS_LRZ_CNTL::GREATER, GRAS_LRZ_CNTL::DIR, and GRAS_LRZ_DEPTH_VIEW with previous values, and if one of the following is true:
- GRAS_LRZ_CNTL::GREATER has changed
- GRAS_LRZ_CNTL::DIR has changed, the old value is not CUR_DIR_GE, and the new value is not CUR_DIR_DISABLED
- GRAS_LRZ_DEPTH_VIEW has changed
then it does an LRZ_FLUSH with GRAS_LRZ_CNTL::ENABLE forced to 1. Only exists in a650_sqe.fw.

Note that the SMMU's definition of TTBRn can take different forms depending on the pgtable format. But a5xx+ only uses aarch64 format.

Unused, does not apply to aarch64 pgtable format

Size of prefix for each bin. For each bin index i, the prefix commands at PREFIX_ADDR + i * PREFIX_DWORDS are executed in an IB2 before the IB1 commands following this packet.

Number of dwords after this packet until CP_END_BIN

Best guess is that it is a faster way to fetch all the VSC_STATE registers and keep them in a local scratch memory instead of fetching every time when skipping IBs. Scratch memory size is 48 dwords
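The LRZ tracking rule above (the three conditions that trigger an LRZ_FLUSH) can be written as a small predicate. This is a sketch of the stated conditions only, not of the firmware itself. LrzState, LrzDir and needs_lrz_flush are hypothetical names, and the assumption that GRAS_LRZ_CNTL::DIR takes the CUR_DIR_* values is read off the wording of the rule.

```python
from enum import IntEnum
from typing import NamedTuple


class LrzDir(IntEnum):
    """CUR_DIR_* values as listed in the text."""
    CUR_DIR_DISABLED = 0x0
    CUR_DIR_GE = 0x1
    CUR_DIR_LE = 0x2
    CUR_DIR_UNSET = 0x3


class LrzState(NamedTuple):
    """Hypothetical snapshot of the three tracked values."""
    greater: bool       # GRAS_LRZ_CNTL::GREATER
    direction: LrzDir   # GRAS_LRZ_CNTL::DIR
    depth_view: int     # GRAS_LRZ_DEPTH_VIEW (raw register value)


def needs_lrz_flush(old: LrzState, new: LrzState) -> bool:
    """Return True if any of the three listed conditions holds."""
    greater_changed = old.greater != new.greater
    # A DIR change only counts when the old value is not CUR_DIR_GE
    # and the new value is not CUR_DIR_DISABLED.
    dir_triggers = (
        old.direction != new.direction
        and old.direction != LrzDir.CUR_DIR_GE
        and new.direction != LrzDir.CUR_DIR_DISABLED
    )
    view_changed = old.depth_view != new.depth_view
    return greater_changed or dir_triggers or view_changed
```

Under this reading, switching DIR from CUR_DIR_LE to CUR_DIR_GE triggers a flush, while switching from CUR_DIR_GE to anything, or from anything to CUR_DIR_DISABLED, does not unless one of the other two values also changed.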