Configures the mapping between a VSC_PIPE buffer and
the bins it covers. X/Y specify the bin index in the
horiz/vert direction (0,0 is upper left, 0,1 is the
leftmost bin on the second row, and so on). W/H specify
the number of bins assigned to this VSC_PIPE in the
horiz/vert dimension.
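A minimal sketch of the X/Y/W/H convention described above, listing which bins one pipe covers. The function name is hypothetical and nothing here reflects the actual register encoding, only the indexing convention.

```python
def bins_for_pipe(x, y, w, h):
    """Return the (X, Y) bin indices covered by one VSC_PIPE.

    X counts bins horizontally, Y counts rows from the top, so (0, 1)
    is the leftmost bin on the second row. Hypothetical helper.
    """
    return [(bx, by) for by in range(y, y + h) for bx in range(x, x + w)]


# A pipe starting at the leftmost bin of the second row, 2 bins wide, 2 tall.
print(bins_for_pipe(0, 1, 2, 2))
```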
LRZ: (Low Resolution Z ??)
----
I think it serves two functions: early discard of primitives in the binning
pass without needing the full resolution depth buffer, and a depth-prepass
used during the GMEM draws to discard primitives that would not be visible
due to later draws.
The LRZ buffer always seems to be z16 format, regardless of actual
depth buffer format.
Note that LRZ write should be disabled when blend/stencil/etc is enabled,
since the occluded primitive can still contribute to final color value
of a fragment.
Only enabled for GL_LESS/GL_LEQUAL/GL_GREATER/GL_GEQUAL?
LRZ write also disabled for blend/etc.
update MAX instead of MIN value, ie. GL_GREATER/GL_GEQUAL
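The LRZ rules noted so far can be collected into one small decision function. This is only a sketch of the guesses above (including the uncertain "only enabled for" list); the function and key names are hypothetical and are not register field names.

```python
# Depth functions for which LRZ appears to be enabled (a guess per the notes).
LRZ_FUNCS = {"GL_LESS", "GL_LEQUAL", "GL_GREATER", "GL_GEQUAL"}


def lrz_state(zfunc, blend_or_stencil):
    """Summarise the LRZ behaviour described in the notes (hypothetical names)."""
    enable = zfunc in LRZ_FUNCS
    return {
        "enable": enable,
        # Occluded primitives can still affect the final color when
        # blend/stencil/etc is on, so LRZ must not be written then.
        "write": enable and not blend_or_stencil,
        # GREATER/GEQUAL track the MAX value instead of the MIN value.
        "track_max": enable and zfunc in ("GL_GREATER", "GL_GEQUAL"),
    }


print(lrz_state("GL_GEQUAL", blend_or_stencil=False))
```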
Pitch is depth width (in pixels) / 8 (aligned to 32). Height
is also divided by 8 (ie. covers 8x8 pixels)
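A worked example of the LRZ sizing above. Two assumptions that the notes do not confirm: the division by 8 rounds up, and the alignment to 32 is applied after the division. The 2 bytes per entry follows from the z16 observation. Helper names are hypothetical.

```python
def align(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def lrz_dims(width, height):
    """Return (pitch, rows) of the LRZ buffer, one entry per 8x8 pixels.

    Assumes round-up division and alignment applied after dividing by 8.
    """
    pitch = align((width + 7) // 8, 32)
    rows = (height + 7) // 8
    return pitch, rows


pitch, rows = lrz_dims(1920, 1080)
size_bytes = pitch * rows * 2  # z16: 2 bytes per LRZ entry
print(pitch, rows, size_bytes)
```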
Z_READ_ENABLE bit is set for zfunc other than GL_ALWAYS or GL_NEVER
stride of depth/stencil buffer
size of layer
Blits:
------
Blits are triggered by CP_EVENT_WRITE:BLIT, compared to previous
generations where they shared most of the gl pipeline and were
triggered by CP_DRAW_INDX*
For gmem->mem, the blob uses RB_BLIT_CNTL.BUF to specify the src
of the blit (ie. MRTn, ZS, etc) and RB_BLIT_DST_LO/HI for the
destination gpuaddr. The gmem offset is taken from RB_MRT[n].BASE_LO/HI
For mem->gmem, the blob uses just MRT0 or ZS and RB_BLIT_DST_LO/HI
for the GMEM offset, and the gpuaddr from RB_MRT[0].BASE_LO/HI
(I suppose this is just to avoid trashing RB_MRT[1..7]??)
For MASK, if RB_BLIT_CNTL.BUF=BLIT_ZS:
1 - depth
2 - stencil
3 - depth+stencil
if RB_BLIT_CNTL.BUF=BLIT_MRTn
then probably a component mask, I always see 0xf
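The BLIT_ZS values of MASK listed above read as a two-bit field, which can be sketched as below. The helper name is hypothetical; the MRT case is left out since the component-mask interpretation is only a guess.

```python
def zs_blit_planes(mask):
    """Decode MASK for RB_BLIT_CNTL.BUF=BLIT_ZS: bit 0 depth, bit 1 stencil."""
    planes = []
    if mask & 1:
        planes.append("depth")
    if mask & 2:
        planes.append("stencil")
    return planes


for mask in (1, 2, 3):
    print(mask, zs_blit_planes(mask))
```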
Buffer Metadata (flag buffers):
-------------------------------
Blob seems to stick some metadata at the front of the buffer,
both z/s and MRT. I think this is same as UBWC (bandwidth
compression) metadata that mdp 1.7 and later supports. See
1d3fae5698ce5358caab87a15383b690941697e8 in downstream kernel.
UBWC seems to stand for "universal bandwidth compression".
Before glReadPixels() it does a pair of BYPASS blits (at least
if metadata is used) presumably to resolve metadata.
NOTES: see: getUBwcBlockSize(), getUBwcMetaBufferSize() at
https://android.googlesource.com/platform/hardware/qcom/display/+/android-6.0.1_r40/msm8994/libgralloc/alloc_controller.cpp
(note that bpp is in bytes, not bits, so really cpp)
Example Layout 2d w/ mipmap levels:
100x2000, ifmt=GL_RG, fmt=GL_RG16F, type=GL_FLOAT, meta=64x512@0x8000 (7x500)
base=c072e000, offset=16384, size=1703936
color flags
0 c073a000 c0732000 - level 0 flags is address
1 c0838000 c0834000 programmed in texture state
2 c0879000 c0877000
3 c089a000 c0899000
4 c08ab000 c08aa000
5 c08b4000 c08b3000
6 c08b9000 c08b8000
7 c08bc000 c08bb000
8 c08be000 c08bd000
9 c08c0000 c08bf000
10 c08c2000 c08c1000
ARRAY_PITCH is the combined size of all the levels plus flags,
so 0xc08c3000 - 0xc0732000 = 0x00191000 (1642496); each level
takes up a minimum of 2 pages (since color and flags parts are
each page aligned).
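The arithmetic can be re-checked from the address table above. This sketch assumes 4096-byte pages and that the last (1x1) level's color data occupies exactly one page, which is what the 0xc08c3000 end address implies.

```python
PAGE = 0x1000  # assumed page size

# (color, flags) addresses per mip level, copied from the table above.
levels = [
    (0xc073a000, 0xc0732000), (0xc0838000, 0xc0834000),
    (0xc0879000, 0xc0877000), (0xc089a000, 0xc0899000),
    (0xc08ab000, 0xc08aa000), (0xc08b4000, 0xc08b3000),
    (0xc08b9000, 0xc08b8000), (0xc08bc000, 0xc08bb000),
    (0xc08be000, 0xc08bd000), (0xc08c0000, 0xc08bf000),
    (0xc08c2000, 0xc08c1000),
]

start = levels[0][1]        # level 0 flags, the address in BASE_LO
end = levels[-1][0] + PAGE  # one page past the last level's color data
array_pitch = end - start

# base + offset from the layout dump lands on the level 0 flags address.
assert 0xc072e000 + 16384 == start
# Level 0 flags span 0x8000 bytes, matching meta=64x512@0x8000.
assert levels[0][0] - levels[0][1] == 0x8000

print(hex(array_pitch), array_pitch)
```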
{ TILE_MODE = TILE5_3 | SWIZ_X = A5XX_TEX_X | SWIZ_Y = A5XX_TEX_Y | SWIZ_Z = A5XX_TEX_ZERO | SWIZ_W = A5XX_TEX_ONE | MIPLVLS = 0 | FMT = TFMT5_16_16_FLOAT | SWAP = WZYX }
{ WIDTH = 100 | HEIGHT = 2000 }
{ FETCHSIZE = TFETCH5_4_BYTE | PITCH = 512 | TYPE = A5XX_TEX_2D }
{ ARRAY_PITCH = 1642496 | 0x18800000 } - NOTE c2dc always has 0x18800000 but
{ BASE_LO = 0xc0732000 } this varies for blob gles driver..
{ BASE_HI = 0 | DEPTH = 1 } not sure what it is
num of varyings plus four for gl_Position (plus one if gl_PointSize)
plus # of transform-feedback (streamout) varyings if using the
hw streamout (rather than stg instructions in shader)
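The count described above, written out as a sketch. The notes do not say whether "num of varyings" is in components or vec4 slots, so the units here are whatever the caller already uses; the function and parameter names are hypothetical.

```python
def vpc_count(num_varyings, has_point_size, num_so_varyings, hw_streamout):
    """Varyings + 4 for gl_Position (+1 for gl_PointSize) (+ SO varyings)."""
    count = num_varyings + 4
    if has_point_size:
        count += 1
    if hw_streamout:
        # Only counted when the hw streamout path is used rather than
        # stg instructions in the shader.
        count += num_so_varyings
    return count


print(vpc_count(8, has_point_size=True, num_so_varyings=3, hw_streamout=True))
```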
Stream-Out:
-----------
VPC_SO[0..3] registers set up details about streamout buffers, and
the number of components to write to each.
VPC_SO_PROG provides the mapping between output varyings and the SO
buffers. It is written multiple times (via a CP_CONTEXT_REG_BUNCH
packet, not sure if that matters), each write can handle up to two
components of stream-out output. Order matches up to OUTLOC,
including padding. So, if outputting first 3 varyings:
SP_VS_OUT[0].REG: { A_REGID = r0.w | A_COMPMASK = 0xf | B_REGID = r0.x | B_COMPMASK = 0x7 }
SP_VS_OUT[0x1].REG: { A_REGID = r1.w | A_COMPMASK = 0x3 | B_REGID = r2.y | B_COMPMASK = 0xf }
SP_VS_VPC_DST[0].REG: { OUTLOC0 = 0 | OUTLOC1 = 4 | OUTLOC2 = 8 | OUTLOC3 = 12 }
Then:
VPC_SO_PROG: { A_BUF = 0 | A_OFF = 0 | A_EN | B_BUF = 0 | B_OFF = 4 | B_EN }
VPC_SO_PROG: { A_BUF = 0 | A_OFF = 8 | A_EN | B_BUF = 0 | B_OFF = 12 | B_EN }
VPC_SO_PROG: { A_BUF = 2 | A_OFF = 0 | A_EN | B_BUF = 2 | B_OFF = 4 | B_EN }
VPC_SO_PROG: { A_BUF = 2 | A_OFF = 8 | A_EN | B_BUF = 0 | B_OFF = 0 }
VPC_SO_PROG: { A_BUF = 1 | A_OFF = 0 | A_EN | B_BUF = 1 | B_OFF = 4 | B_EN }
Note that varying order is OUTLOC0, OUTLOC2, OUTLOC1, and note
the padding between OUTLOC1 and OUTLOC2.
The BUF bitfield indicates which of the four streamout buffers
to write into at the specified offset.
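One way to picture how the VPC_SO_PROG sequence is derived: fill one slot per OUTLOC component location, leave padding slots disabled, then pair the slots up two per write. This is a sketch of that reading of the example, not the driver's code; the function name and tuple layout are hypothetical, and no register bit positions are encoded.

```python
def build_so_prog(outputs, num_locs):
    """Pair per-component stream-out slots, two per VPC_SO_PROG write.

    outputs: list of (outloc, ncomp, buf, dst_off_bytes), hypothetical layout.
    Returns a list of (A, B) pairs where each half is (buf, offset) or
    None for a disabled (padding) slot.
    """
    slots = [None] * (num_locs + num_locs % 2)  # pad to an even count
    for outloc, ncomp, buf, dst_off in outputs:
        for c in range(ncomp):
            slots[outloc + c] = (buf, dst_off + 4 * c)  # 4 bytes per component
    return [(slots[i], slots[i + 1]) for i in range(0, len(slots), 2)]


# The example above: OUTLOC0 (4 comps) -> buf 0, OUTLOC1 (3 comps) -> buf 2,
# OUTLOC2 (2 comps) -> buf 1, with location 7 left as padding.
prog = build_so_prog([(0, 4, 0, 0), (4, 3, 2, 0), (8, 2, 1, 0)], num_locs=10)
for a, b in prog:
    print(a, b)
```

The five pairs correspond to the five VPC_SO_PROG writes, with the disabled B half in the fourth write being the padding slot.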
VPC_SO[n].FLUSH_BASE_LO/HI is used by the hw to write back the next
offset, which gets loaded back into VPC_SO[n].BUFFER_OFFSET via a
CP_MEM_TO_REG. Probably can be ignored until we have GS/etc, at
which point we can't calculate the offset on the CPU.
The size of memory that ldp/stp can address.
Guessing that this is the same as a3xx/a6xx.
per MRT
Texture sampler dwords
Texture constant dwords
Pitch in bytes (so actually stride)
Pitch in bytes (so actually stride)