Hmm, very interesting thread.
I can fill in some info about how the CLX2 works.
The CLX2 is split into two parts, the TA (tile accelerator) and the CORE (rasteriser).
On Dreamcast a game normally generates 3D geometry with the CPU (frame n+2), sends it to the TA using DMA (frame n+1) and renders a frame using CORE (frame n). The SGX probably works in a similar 'pipelined' fashion...
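To make the overlap concrete, here is a tiny sketch of which frame each stage is busy with at a given tick. The function name and the idea of a discrete 'tick' are my own assumptions for illustration; real timing depends on the game.

```python
def stage_frames(tick):
    """Frame number each stage works on at a given tick (None = stage idle).

    Assumes an idealised pipeline where every stage takes exactly one tick.
    """
    def frame_or_idle(frame):
        return frame if frame >= 0 else None

    return {
        "cpu": frame_or_idle(tick),       # generating geometry (frame n+2)
        "ta": frame_or_idle(tick - 1),    # receiving geometry over DMA (frame n+1)
        "core": frame_or_idle(tick - 2),  # rendering (frame n)
    }

if __name__ == "__main__":
    for tick in range(4):
        print(tick, stage_frames(tick))
```

So once the pipeline is full, three different frames are in flight at once.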
TA takes as input screen-space vertex strips, quads ('sprites') and modifier volumes and converts em to the format CORE understands. That’s all it does really.
TA has inputs for fp32 colours (various modes), 8888 colours, 16 or 32 bit vertex coordinates and other formats. TA can handle strips of any size, single quads (well, it can do quad-lists but not quad-strips), and triangle-lists (for modifier volumes only).
TA has some internal memory to store information per tile (iirc maximum render target size is 2048x2048, with 32x32 tiles).
TA converts vertices to the CORE format, splits strips to fit the CORE format (only up to 5 triangles supported natively), stores vertices to memory and generates the display lists for the tiles (in a linked-list-like format).
TA can also do some basic clipping (by leaving the geometry out of the tile lists, so it only works in 32 pixel units).
TA calculates the bounding box of the triangle(s) and uses that to generate the display lists.
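A rough sketch of the bounding-box binning idea, in Python. This is not how the hardware stores anything: the function names, the dict of per-tile lists and the clamping to the render target are my assumptions. It only shows why clipping falls out for free at tile granularity.

```python
import math

TILE = 32  # tile size in pixels on CLX2, per the post

def tiles_for_triangle(verts, width, height, tile=TILE):
    """Return the (tx, ty) tiles overlapped by the triangle's bounding box."""
    xs = [x for x, _ in verts]
    ys = [y for _, y in verts]
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    # Bounding box in tile units, clamped to the render target.
    tx0 = max(math.floor(min(xs) / tile), 0)
    tx1 = min(math.floor(max(xs) / tile), tiles_x - 1)
    ty0 = max(math.floor(min(ys) / tile), 0)
    ty1 = min(math.floor(max(ys) / tile), tiles_y - 1)
    # An off-screen triangle gives an empty range, so it lands in no tile list.
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]

def bin_triangles(tris, width, height):
    """Build per-tile lists of triangle indices (stand-in for display lists)."""
    lists = {}
    for index, verts in enumerate(tris):
        for t in tiles_for_triangle(verts, width, height):
            lists.setdefault(t, []).append(index)
    return lists
```

Note that a bounding box is conservative: a thin diagonal triangle ends up in tiles it never actually touches.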
Region array (this must be generated by the CPU; it lives in VRAM)
The region array stores information about how CORE should render the tiles. For each tile it includes the number of passes to do, whether the buffers should be cleared, and pointers to the display lists.
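Something like this, as a mental model only. The real region array is a packed binary format in VRAM that I'm not reproducing here; the class, the field names and the `lists_for_tile` callback are all made up for illustration.

```python
from dataclasses import dataclass, field

TILE = 32  # tile size in pixels on CLX2, per the post

@dataclass
class RegionEntry:
    """Hypothetical per-tile entry; field names are illustrative only."""
    tile_x: int
    tile_y: int
    clear_buffers: bool
    # One list of display-list addresses per pass; len(passes) is the pass count.
    passes: list = field(default_factory=list)

def build_region_array(width, height, lists_for_tile):
    """Build one entry per tile, row by row.

    lists_for_tile(tx, ty) is a hypothetical callback returning the passes
    (each a list of display-list addresses) for that tile.
    """
    entries = []
    for ty in range((height + TILE - 1) // TILE):
        for tx in range((width + TILE - 1) // TILE):
            entries.append(RegionEntry(tx, ty, True, lists_for_tile(tx, ty)))
    return entries
```

For a 640x480 target that gives 20x15 = 300 entries, one per tile.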
CORE is split in two parts, the ISP (Image Synthesiser Processor) and TSP (Texture and Shading Processor).
ISP does z buffering to an internal z buffer, then generates spans from it, RLE compresses the spans and sends em to TSP for calculation of the colour/etc. After all opaque triangles are processed, alpha-tested triangles follow and then alpha-blended triangles.
For alpha-tested polygons the ISP has to work in parallel with the TSP (to drop the pixels that fail the alpha test).
For alpha-blended polygons ISP does multipass processing (using layer peeling) and sends each layer to be rendered on the TSP in the correct order (always in RLE spans). This is done for all passes as described in the region array. After processing is done the tile colour buffer is written to memory, with possible 2:1 downsampling (for 2x AA in the x direction) and colour conversion (internal buffers are argb8888, output can be 8888/4444/565/1555/0555). The z buffer is lost after that and processing for the next tile begins.
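The write-out step can be sketched like this. I'm assuming the 2:1 downsample is a plain average of adjacent pixel pairs and that the 565 conversion just truncates the low bits; I don't know the exact filter or rounding the hardware uses, so treat both as guesses.

```python
def downsample_2to1(row):
    """Average each horizontal pair of (a, r, g, b) pixels (assumed box filter)."""
    return [
        tuple((p + q) // 2 for p, q in zip(row[i], row[i + 1]))
        for i in range(0, len(row) - 1, 2)
    ]

def to_rgb565(pixel):
    """Pack an (a, r, g, b) 8-bit-per-channel pixel into 16-bit RGB565.

    Alpha is dropped and the low bits are truncated (an assumption).
    """
    _, r, g, b = pixel
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

if __name__ == "__main__":
    row = [(255, 255, 0, 0), (255, 0, 0, 255)]  # one red and one blue pixel
    for pixel in downsample_2to1(row):
        print(hex(to_rgb565(pixel)))
```

The tile is rendered at twice the output width internally, so the downsample halves the row length on the way out.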
Tiles on CLX2 are 32x32. The PC version used 16x32 tiles.
The SGX should work more or less similarly. TA (or w/e it's called now :p) is most likely fed directly from the geometry processing stage (vertex shaders or geometry shaders or whatever ...) and stores display lists in RAM. The rendering engine reads from RAM and renders to internal buffers, which are written back to RAM when a tile is done. Z-buffer write back seems to be possible (it was possible on Kyro2 too) but it's optional (and that’s why the docs suggest always clearing the z buffer). There are probably still some fixed function blocks for Z processing and such (doesn’t really make sense to waste programmable resources on em, that’s the 'ISP' part) and TSP seems to be totally replaced by the USSE units.
Also, about the 14 MTriangles that the SGX is supposed to handle: the CLX2 was marketed as up to 7 MTriangles and games never even reached 1 million. The best games I have seen use around 800K …