Configures the mapping between VSC_PIPE buffer and bin. X/Y specify the
bin index in the horiz/vert direction (0,0 is upper left, 0,1 is leftmost
bin on second row, and so on). W/H specify the number of bins assigned to
this VSC_PIPE in the horiz/vert dimension.

LRZ:  (Low Resolution Z ??)
----

I think it serves two functions: early discard of primitives in the
binning pass without needing the full resolution depth buffer, and also
as a depth-prepass, used during the GMEM draws to discard primitives
that would not be visible due to later draws.

The LRZ buffer always seems to be z16 format, regardless of actual
depth buffer format.

Note that LRZ write should be disabled when blend/stencil/etc is
enabled, since the occluded primitive can still contribute to the final
color value of a fragment.

Only enabled for GL_LESS/GL_LEQUAL/GL_GREATER/GL_GEQUAL?

LRZ write also disabled for blend/etc.

update MAX instead of MIN value, ie. GL_GREATER/GL_GEQUAL

Pitch is depth width (in pixels) / 8 (aligned to 32). Height is also
divided by 8 (ie. covers 8x8 pixels)

Z_READ_ENABLE bit is set for zfunc other than GL_ALWAYS or GL_NEVER

stride of depth/stencil buffer

size of layer

Blits:
------

Blits are triggered by CP_EVENT_WRITE:BLIT, compared to previous
generations where they shared most of the gl pipeline and were
triggered by CP_DRAW_INDX*

For gmem->mem blob uses RB_BLIT_CNTL.BUF to specify src of blit
(ie MRTn, ZS, etc) and RB_BLIT_DST_LO/HI for destination gpuaddr.
The gmem offset is taken from RB_MRT[n].BASE_LO/HI

For mem->gmem blob uses just MRT0 or ZS and RB_BLIT_DST_LO/HI for
the GMEM offset, and gpuaddr from RB_MRT[0].BASE_LO/HI (I suppose this
is just to avoid trashing RB_MRT[1..7]??)
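The LRZ sizing rule from the LRZ notes above (pitch is depth width / 8 aligned to 32, height / 8, z16 samples) can be written out as a small sketch. The function names are hypothetical, and three things are assumptions rather than facts from these notes: that the divisions round up, that the alignment is applied to the already-divided width, and that a z16 sample occupies 2 bytes.

```python
def div_round_up(value, divisor):
    """Integer division rounding up (assumed rounding mode)."""
    return (value + divisor - 1) // divisor


def align(value, alignment):
    """Round value up to the next multiple of alignment."""
    return div_round_up(value, alignment) * alignment


def lrz_dims(depth_width, depth_height):
    """Return (pitch, height, size_in_bytes) of a hypothetical LRZ buffer.

    One LRZ sample covers an 8x8 block of depth pixels.
    """
    pitch = align(div_round_up(depth_width, 8), 32)  # assumption: align after dividing
    height = div_round_up(depth_height, 8)
    size = pitch * height * 2                        # assumption: z16 = 2 bytes/sample
    return pitch, height, size


if __name__ == "__main__":
    print(lrz_dims(1920, 1080))
```

Under these assumptions a 1920x1080 depth buffer gets an LRZ pitch of 256 samples and a height of 135 rows.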
For MASK, if RB_BLIT_CNTL.BUF=BLIT_ZS:
    1 - depth
    2 - stencil
    3 - depth+stencil
if RB_BLIT_CNTL.BUF=BLIT_MRTn
    then probably a component mask, I always see 0xf

Buffer Metadata (flag buffers):
-------------------------------

Blob seems to stick some metadata at the front of the buffer,
both z/s and MRT. I think this is same as UBWC (bandwidth
compression) metadata that mdp 1.7 and later supports. See
1d3fae5698ce5358caab87a15383b690941697e8 in downstream kernel.
UBWC seems to stand for "universal bandwidth compression".

Before glReadPixels() it does a pair of BYPASS blits (at least
if metadata is used) presumably to resolve metadata.

NOTES: see: getUBwcBlockSize(), getUBwcMetaBufferSize() at
https://android.googlesource.com/platform/hardware/qcom/display/+/android-6.0.1_r40/msm8994/libgralloc/alloc_controller.cpp
(note that bpp in bytes, not bits, so really cpp)

Example Layout 2d w/ mipmap levels:

    100x2000, ifmt=GL_RG, fmt=GL_RG16F, type=GL_FLOAT, meta=64x512@0x8000 (7x500)
    base=c072e000, offset=16384, size=1703936

    level   color       flags
    0       c073a000    c0732000    - level 0 flags is the address
    1       c0838000    c0834000      programmed in texture state
    2       c0879000    c0877000
    3       c089a000    c0899000
    4       c08ab000    c08aa000
    5       c08b4000    c08b3000
    6       c08b9000    c08b8000
    7       c08bc000    c08bb000
    8       c08be000    c08bd000
    9       c08c0000    c08bf000
    10      c08c2000    c08c1000

ARRAY_PITCH is the combined size of all the levels plus flags,
so 0xc08c3000 - 0xc0732000 = 0x00191000 (1642496); each level
takes up a minimum of 2 pages (since color and flags parts are
each page aligned).

    { TILE_MODE = TILE5_3 | SWIZ_X = A5XX_TEX_X | SWIZ_Y = A5XX_TEX_Y | SWIZ_Z = A5XX_TEX_ZERO | SWIZ_W = A5XX_TEX_ONE | MIPLVLS = 0 | FMT = TFMT5_16_16_FLOAT | SWAP = WZYX }
    { WIDTH = 100 | HEIGHT = 2000 }
    { FETCHSIZE = TFETCH5_4_BYTE | PITCH = 512 | TYPE = A5XX_TEX_2D }
    { ARRAY_PITCH = 1642496 | 0x18800000 }
    { BASE_LO = 0xc0732000 }
    { BASE_HI = 0 | DEPTH = 1 }

NOTE: c2dc always has 0x18800000 (in the ARRAY_PITCH dword) but this
varies for blob gles driver.. not sure what it is

num of varyings plus four for gl_Position (plus one if gl_PointSize)
plus # of transform-feedback (streamout) varyings if using the
hw streamout (rather than stg instructions in shader)

Stream-Out:
-----------

VPC_SO[0..3] registers setup details about streamout buffers, and
number of components to write to each.

VPC_SO_PROG provides the mapping between output varyings and the SO
buffers. It is written multiple times (via a CP_CONTEXT_REG_BUNCH
packet, not sure if that matters), each write can handle up to two
components of stream-out output. Order matches up to OUTLOC,
including padding. So, if outputting first 3 varyings:

    SP_VS_OUT[0].REG: { A_REGID = r0.w | A_COMPMASK = 0xf | B_REGID = r0.x | B_COMPMASK = 0x7 }
    SP_VS_OUT[0x1].REG: { A_REGID = r1.w | A_COMPMASK = 0x3 | B_REGID = r2.y | B_COMPMASK = 0xf }
    SP_VS_VPC_DST[0].REG: { OUTLOC0 = 0 | OUTLOC1 = 4 | OUTLOC2 = 8 | OUTLOC3 = 12 }

Then:

    VPC_SO_PROG: { A_BUF = 0 | A_OFF = 0 | A_EN | B_BUF = 0 | B_OFF = 4 | B_EN }
    VPC_SO_PROG: { A_BUF = 0 | A_OFF = 8 | A_EN | B_BUF = 0 | B_OFF = 12 | B_EN }
    VPC_SO_PROG: { A_BUF = 2 | A_OFF = 0 | A_EN | B_BUF = 2 | B_OFF = 4 | B_EN }
    VPC_SO_PROG: { A_BUF = 2 | A_OFF = 8 | A_EN | B_BUF = 0 | B_OFF = 0 }
    VPC_SO_PROG: { A_BUF = 1 | A_OFF = 0 | A_EN | B_BUF = 1 | B_OFF = 4 | B_EN }

Note that varying order is OUTLOC0, OUTLOC2, OUTLOC1, and note
the padding between OUTLOC1 and OUTLOC2 (the fourth write leaves
its B half disabled).

The BUF bitfield indicates which of the four streamout buffers
to write into at the specified offset.

The VPC_SO[n].FLUSH_BASE_LO/HI is used for hw to write back next
offset which gets loaded back into VPC_SO[n].BUFFER_OFFSET via a
CP_MEM_TO_REG. Probably can be ignored until we have GS/etc, at
which point we can't calculate the offset on the CPU.

The size of memory that ldp/stp can address

Guessing that this is the same as a3xx/a6xx.
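The VPC_SO_PROG sequence in the Stream-Out example above can be reproduced with a short sketch. It assumes each write covers two consecutive OUTLOC component slots (even slot in the A half, odd slot in the B half) and that the OFF field is a byte offset of 4 per component, which is what the example values suggest. The function name and the tuple layout are hypothetical, and this is a model of the register contents only, not of how the packet is emitted.

```python
def build_so_prog(outputs):
    """Build a list of VPC_SO_PROG values as dicts.

    outputs: list of (outloc, num_components, buf, dst_offset_in_dwords),
    one tuple per streamed-out varying (hypothetical layout).
    """
    max_loc = max(loc + n for loc, n, _, _ in outputs)
    # One write per pair of OUTLOC component slots, padding included.
    prog = [{"A_BUF": 0, "A_OFF": 0, "A_EN": False,
             "B_BUF": 0, "B_OFF": 0, "B_EN": False}
            for _ in range((max_loc + 1) // 2)]
    for loc, n, buf, dst in outputs:
        for j in range(n):
            slot = loc + j
            half = "B" if slot & 1 else "A"   # odd slots use the B half
            entry = prog[slot // 2]
            entry[half + "_BUF"] = buf
            entry[half + "_OFF"] = (dst + j) * 4  # assumed: offset in bytes
            entry[half + "_EN"] = True
    return prog


if __name__ == "__main__":
    # The three varyings from the example: OUTLOC0 (4 comps) -> buffer 0,
    # OUTLOC1 (3 comps) -> buffer 2, OUTLOC2 (2 comps) -> buffer 1.
    for write in build_so_prog([(0, 4, 0, 0), (4, 3, 2, 0), (8, 2, 1, 0)]):
        print(write)
```

With the three varyings from the example this yields five writes, and the fourth one has its B half disabled, which corresponds to the padding slot between OUTLOC1 and OUTLOC2.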
per MRT

Texture sampler dwords

Texture constant dwords

Pitch in bytes (so actually stride)

Pitch in bytes (so actually stride)
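As a cross-check of the flag-buffer example layout earlier in these notes (100x2000, meta=64x512@0x8000 (7x500)), the numbers can be recomputed with a small sketch. The 16x4 pixel block size is inferred from 100 -> 7 and 2000 -> 500 for this 4-byte-per-pixel format, and the alignment to 64 blocks wide, 16 blocks high and 4096 bytes total is an assumption chosen because it is consistent with 7x500 -> 64x512 -> 0x8000. The function name is hypothetical and is not the gralloc helper itself.

```python
def align(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def flag_buffer_size(width, height, block_w=16, block_h=4):
    """Return (blocks_w, blocks_h, meta_w, meta_h, size_in_bytes).

    block_w/block_h: pixels covered per flag byte (assumed 16x4 for cpp=4).
    """
    blocks_w = (width + block_w - 1) // block_w
    blocks_h = (height + block_h - 1) // block_h
    meta_w = align(blocks_w, 64)              # assumption: 64-block width alignment
    meta_h = align(blocks_h, 16)              # assumption: 16-block height alignment
    size = align(meta_w * meta_h, 4096)       # assumption: page-aligned size
    return blocks_w, blocks_h, meta_w, meta_h, size


if __name__ == "__main__":
    blocks_w, blocks_h, meta_w, meta_h, size = flag_buffer_size(100, 2000)
    flags_addr = 0xc0732000                   # level 0 flags address from the example
    print(blocks_w, blocks_h, meta_w, meta_h, hex(size))
    print(hex(flags_addr + size))             # where level 0 color data would start
```

Under these assumptions the level 0 color data starts 0x8000 bytes after the level 0 flags, which agrees with the c0732000 / c073a000 pair in the example table.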