SCU DSP for matrix transformation?
XL2 - Oct 31, 2017


   XL2 Oct 31, 2017 
So I've started to work on a BSP/PVS solution (very early) for my game, which made me realize that I could also save several clock cycles by changing the 3d implementation to skip the z-sort (I'm not saying I will do it, just that it would be interesting to look at).
According to Sega's documentation, it's possible to use the SCU DSP for matrix transformation and Sega suggested back in 1995 to use the SCU DSP for the matrix and the SH2 for the polygon processing in parallel.
Sega even gives an example of assembly code to do the matrix transformation. (https://antime.kapsi.fi/sega/files/ST-240-A-042795...)
Now, I know that SGL doesn't support matrix transformation with the SCU DSP and the SGL functions for the SCU DSP seem pretty much useless for almost everything, but before I waste too much time on this, has anyone tried doing it?
The SCU DSP doesn't support divisions, but it can do multiplications/additions, so it should be fine for matrix transformation, and even if it's slower than the SH2, it won't need to do slower operations such as near clipping, light normal processing or Gouraud shading, so it might still be possible to keep it in sync.
It seems like it can hold 256 sint32 values, which is more than what my maps have on each quad plane (since each quad is like 64 pixels wide and each plane is 256 pixels wide, I should be fine).
It would require writing a new 3d implementation or using the obscure one from SBL that nobody used (AFAIK), but it would still be nice to know about others' experience with it.
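For readers following along, here is a minimal C sketch of the multiply-and-add work described above, written for the CPU so the arithmetic is easy to check. It is not Sega's DSP code and not SGL or SBL code. The 16.16 fixed-point format, the 3x4 matrix layout and the names `fixed_t`, `matrix_t`, `point_t` and `transform_points` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 16.16 fixed-point format (hypothetical type names). */
typedef int32_t fixed_t;
#define FIXED_ONE ((fixed_t)65536)

/* Multiply two 16.16 values through a 64-bit intermediate.
 * Assumes >> on a negative value is an arithmetic shift (true for GCC). */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* Three rows of [r0 r1 r2 t]: rotation/scale plus translation. */
typedef struct { fixed_t m[3][4]; } matrix_t;
typedef struct { fixed_t x, y, z; } point_t;

/* Transform `count` points. Only multiplications and additions are
 * needed, which is why this step suits a unit with no divider. */
static void transform_points(const matrix_t *mat, const point_t *in,
                             point_t *out, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        fixed_t v[3];
        for (int r = 0; r < 3; r++) {
            v[r] = fixed_mul(mat->m[r][0], in[i].x)
                 + fixed_mul(mat->m[r][1], in[i].y)
                 + fixed_mul(mat->m[r][2], in[i].z)
                 + mat->m[r][3];
        }
        out[i].x = v[0];
        out[i].y = v[1];
        out[i].z = v[2];
    }
}
```

Each point costs nine multiplications and nine additions here. The perspective divide is deliberately left out, since it needs a division.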

Thanks!

   mrkotfw Nov 1, 2017 
It's not 100% clear to me what you want the SCU DSP to do.

Do you want the SCU DSP to spit out 4 projected points while the CPU walks the BSP tree and feeds the SCU DSP more quads (points)?

I would also really look at the assembly output and see if you can optimize on the CPU end before thinking about the DSP. More specifically, how well the CPU cache is being used, and if any of the SH-2 DSP instructions are being used effectively (reordering to avoid pipeline stalls).

It's a pain in the ass...

I was also thinking that you could split the slave CPU cache into two and have it spin for jobs. The jobs would be batches (< 2KiB) worth of points to project. When done, DMA to HWRAM.
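As a rough illustration of the batching idea, the sketch below splits a point list into jobs that each stay under 2 KiB. It assumes 12 bytes per point (three 32-bit coordinates). The names `POINTS_PER_BATCH`, `batch_count` and `batch_len` are hypothetical, and the job queue and DMA side are not shown.

```c
#include <assert.h>

#define POINT_BYTES      12    /* assumed: x, y, z as 32-bit values */
#define BATCH_BYTES      2048  /* each job must stay under this size */
#define POINTS_PER_BATCH ((BATCH_BYTES - 1) / POINT_BYTES)  /* 170 */

/* Number of jobs needed to cover `total` points. */
static int batch_count(int total)
{
    assert(total >= 0);
    return (total + POINTS_PER_BATCH - 1) / POINTS_PER_BATCH;
}

/* Number of points in job `batch` (0-based) out of `total` points. */
static int batch_len(int total, int batch)
{
    int remaining = total - batch * POINTS_PER_BATCH;
    assert(batch >= 0 && remaining > 0);
    return remaining > POINTS_PER_BATCH ? POINTS_PER_BATCH : remaining;
}
```

With these assumptions a full job holds 170 points (2040 bytes), so a leaf with a few dozen points would fit in a single job.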

   XL2 Nov 1, 2017 

  mrkotfw said:


Well, quads and points are 2 different things since you can use the same point in a couple of quads. Each point is also 3 int32 values (x,y,z).
The idea would be to transform a small list of points (like 10) from one BSP leaf, while the CPU does other things, like uncompressing the BSP tree data, frustum culling, walking the tree, calculating lighting, Gouraud shading, stuff like that.
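The distinction between points and quads can be sketched as an indexed layout: each unique point in a leaf is transformed once, and quads only store indices into that list. This is a hypothetical layout for illustration, not the format used by the game, SGL or SBL. The names `quad_t`, `quad_corners` and `transforms_saved` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t x, y, z; } point_t;   /* 3 x int32 per point */
typedef struct { uint16_t idx[4]; } quad_t;    /* indices into the leaf's point list */

/* Transforms avoided by sharing points, compared with transforming
 * four corners for every quad separately. */
static int transforms_saved(int point_count, int quad_count)
{
    return quad_count * 4 - point_count;
}

/* Gather the four already-transformed corners of one quad. */
static void quad_corners(const point_t *transformed, int point_count,
                         const quad_t *q, point_t out[4])
{
    for (int i = 0; i < 4; i++) {
        assert(q->idx[i] < point_count);
        out[i] = transformed[q->idx[i]];
    }
}
```

Two quads sharing an edge use 6 points instead of 8, so only the 6-entry point list needs to go through the transform, whichever processor does it.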

Sega suggested using the SCU DSP for matrix transformation (so, the points), so I'm wondering if anyone tried and if so, does it create issues.

Like you said, optimizing the CPU code first is the most important step, but before I move too far in one direction I'd like to know my options and try to plan a use for the SCU DSP.

   mrkotfw Nov 2, 2017 
To really answer your question, I don't think anyone has really tried the SCU DSP. Except for maybe Rockin'-B, but he's been MIA for a few years.

How about also doing some tests to see if it's worth all the trouble? Maybe it's faster to do it on the slave CPU since you still have to do the perspective divide on the CPU.
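To make the remaining CPU step concrete, here is a minimal sketch of a perspective divide on an already-transformed point. It assumes 16.16 fixed point, a simple near-plane rejection and a focal-length style projection. The names `project`, `focal` and `near_z` are hypothetical, and real code would also add the screen centre offset.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t;                 /* assumed 16.16 fixed point */
#define FIXED_ONE ((fixed_t)65536)

typedef struct { fixed_t x, y, z; } point_t;

/* Assumes >> on a negative value is an arithmetic shift (true for GCC). */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> 16);
}

static fixed_t fixed_div(fixed_t a, fixed_t b)
{
    assert(b != 0);
    return (fixed_t)(((int64_t)a * FIXED_ONE) / b);
}

/* Project a view-space point to integer screen offsets.
 * Returns 0 when the point is closer than near_z, 1 otherwise. */
static int project(const point_t *p, fixed_t focal, fixed_t near_z,
                   int32_t *sx, int32_t *sy)
{
    if (p->z < near_z)
        return 0;
    fixed_t scale = fixed_div(focal, p->z);   /* the one division per point */
    *sx = fixed_mul(p->x, scale) / FIXED_ONE;
    *sy = fixed_mul(p->y, scale) / FIXED_ONE;
    return 1;
}
```

Only one division per point is needed, followed by two multiplications, so this is the part that would stay on the SH2 in the split discussed here.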

   XL2 Nov 2, 2017 

  mrkotfw said:

Yeah, if nobody tried it I might, but I need to plan ahead.
I'll dig deeper in Sega's documentation.
If I ever make it work I'll make sure to update the fps demo with it.

   mrkotfw Nov 2, 2017 
The SCU DSP documentation has wrong information (opcodes) I believe. So be wary of that...

   XL2 Nov 3, 2017 
I don't know who said that Quake on Saturn didn't use the SCU DSP, but it seems like it does actually according to Yabause.
I've yet to learn assembly, so I'm not sure what it is exactly, but it seems to involve several multiplications/additions.
Could it be matrix transformation?

   mrkotfw Nov 3, 2017 
Can you dump the 1KiB and disassemble it using antime's SCU DSP disassembler?

You can probably use the DSP as a non-VLIW arch at the beginning.

I myself don't fully understand how the memory is segmented.

   XL2 Nov 3, 2017 
Code:
  
000: 00001c10 nop nop nop mov 10,CT0
001: 81000000 mvi 1000000,MC0
002: 00001e01 nop nop nop mov 1,CT2
003: 88010000 mvi 10000,MC2
004: 00001e03 nop nop nop mov 3,CT2
005: 88000001 mvi 1,MC2
006: 88000100 mvi 100,MC2
007: 88010000 mvi 10000,MC2
008: 94800000 mvi 800000,PL
009: 10040000 add nop mov ALU,A nop
00a: 28003209 sl nop nop mov ALL,MC2
00b: 00001c00 nop nop nop mov 0,CT0
00c: 00003600 nop nop nop mov M0,RA0
00d: 00001c01 nop nop nop mov 1,CT0
00e: c001000f dma2 D0,MC0,f
00f: d340000f jmp T0,f
010: 00001c0e nop nop nop mov e,CT0
011: 00003604 nop nop nop mov MC0,RA0
012: 00003704 nop nop nop mov MC0,WA0
013: 00001c0d nop nop nop mov d,CT0
014: 00823500 nop clr A mov M0,PL
015: 08040000 or nop mov ALU,A nop
016: d208001a jmp NZ,1a
017: 00000000 nop nop nop nop
018: f8000000 endi
019: 00000000 nop nop nop nop
01a: 00001e02 nop nop nop mov 2,CT2
01b: 00801514 nop nop mov 14,PL
01c: 14040000 sub nop mov ALU,A nop
01d: d2100023 jmp NS,23
01e: 00000000 nop nop nop nop
01f: 00023200 nop nop clr A mov M0,MC2
020: 00003009 nop nop nop mov ALL,MC0
021: d0000025 jmp 25
022: 00000000 nop nop nop nop
023: 00003009 nop nop nop mov ALL,MC0
024: 88000014 mvi 14,MC2
025: 00001d00 nop nop nop mov 0,CT1
026: c001010f dma2 D0,MC1,f
027: d3400027 jmp T0,27
028: 00001e03 nop nop nop mov 3,CT2
029: 00001d00 nop nop nop mov 0,CT1
02a: 00001f00 nop nop nop mov 0,CT3
02b: 00001c10 nop nop nop mov 10,CT0
02c: 00071a0e nop nop mov MC0,A mov e,LOP
02d: 00001b30 nop nop nop mov 30,TOP
02e: 02598000 nop mov MC1,X mov MC2,Y nop
02f: 01098000 nop mov MUL,P mov MC2,Y nop
030: 0509b309 and mov MUL,P mov MC2,Y mov ALL,MC3
031: 0509b309 and mov MUL,P mov MC2,Y mov ALL,MC3
032: 00001e03 nop nop nop mov 3,CT2
033: 0759b309 and mov MC1,X mov MUL,P mov MC2,Y mov ALL,MC3
034: e0000000 btm
035: 0509b309 and mov MUL,P mov MC2,Y mov ALL,MC3
036: 00001e04 nop nop nop mov 4,CT2
037: 00001d00 nop nop nop mov 0,CT1
038: 00021f00 nop nop clr A mov 0,CT3
039: 00001a0e nop nop nop mov e,LOP
03a: 00001b30 nop nop nop mov 3d,TOP
03b: 02788000 nop mov MC3,X mov M2,Y nop
03c: 03700000 nop mov MC3,X mov MUL,P nop nop
03d: 1b70310a ad2 mov MC3,X mov MUL,P nop mov ALH,MC1
03e: 1b70310a ad2 mov MC3,X mov MUL,P nop mov ALH,MC1
03f: 1b70310a ad2 mov MC3,X mov MUL,P nop mov ALH,MC1
040: e0000000 btm
041: 1b70310a ad2 mov MC3,X mov MUL,P nop mov ALH,MC1
042: 94000001 mvi 1,PL
043: 00001c01 nop nop nop mov 1,CT0
044: 00001e02 nop nop nop mov 2,CT2
045: 00069d00 nop nop mov M2,A mov 0,CT1
046: 14041f00 sub nop mov ALU,A mov 0,CT3
047: 00003a09 nop nop nop mov ALL,LOP
048: 00001b4b nop nop nop mov 4b,TOP
049: 00001e00 nop nop nop mov 0,CT2
04a: 00821500 nop clr A mov 0,PL
04b: 12593209 add mov MC1,X mov MC0,Y mov ALL,MC2
04c: 035b1e00 nop mov MC1,X mov MUL,P mov MC0,Y clr A mov 0,CT2
04d: 1b5d3d06 ad2 mov MC1,X mov MUL,P mov MC0,Y mov ALU,A mov MC2,CT1
04e: 1b2d0000 ad2 mov M2,X mov MUL,P mov MC0,Y mov ALU,A nop
04f: 1b5d1e00 ad2 mov MC1,X mov MUL,P mov MC0,Y mov ALU,A mov 0,CT2
050: 1b5b330a ad2 mov MC1,X mov MUL,P mov MC0,Y clr A mov ALH,MC3
051: 1b5d3d06 ad2 mov MC1,X mov MUL,P mov MC0,Y mov ALU,A mov MC2,CT1
052: 1b2d0000 ad2 mov M2,X mov MUL,P mov MC0,Y mov ALU,A nop
053: 1b5d0000 ad2 mov MC1,X mov MUL,P mov MC0,Y mov ALU,A nop
054: 1b5b330a ad2 mov MC1,X mov MUL,P mov MC0,Y clr A mov ALH,MC3
055: 1b5d0000 ad2 mov MC1,X mov MUL,P mov MC0,Y mov ALU,A nop
056: 1b2d1e00 ad2 mov M2,X mov MUL,P mov MC0,Y mov ALU,A mov 0,CT2
057: 19041c01 ad2 mov MUL,P mov ALU,A mov 1,CT0
058: 1900330a ad2 mov MUL,P nop mov ALH,MC3
059: e0000000 btm
05a: 00869503 nop mov M2,A mov 3,PL
05b: 00001f00 nop nop nop mov 0,CT3
05c: c001133c dma2 MC3,D0,3c
05d: d340005d jmp T0,5d
05e: 00000000 nop nop nop nop
05f: d0000013 jmp 13
060: 00000000 nop nop nop nop


(after 060 it's just end code)


The code seems to stay the same every time I look at it in-game, which (I guess) means it's always using the same function.

   XL2 Nov 3, 2017 
Ok, I feel retarded : SBL (Sega Basic Library) already has all the functions in, with source code.
Including 3d processing using the SH2 and SCU DSP working in parallel.
I don't know the performance level, so maybe SGL is still faster, but it does include the source code.

From the SPR manual :

(3) USE_DSP

When USE_DSP is defined, the coordinate transform matrix
calculations are done with the DSP in parallel the SH side.
Commenting the define out disables this feature.

   WingMantis Nov 5, 2017 

  XL2 said:


Is USE_DSP defined by default in one of the required libraries, and the developers comment it out if they don't want to use it?

Or do they have to define it explicitly in their project?

I just loaded up Elan Doree in Yabause and checked the SCU-DSP debug and it just has 20 lines saying "END" so that game isn't using it.

How do we tell whether a game is using SGL or not?

   XL2 Nov 5, 2017 

  WingMantis said:


You just need to define it and call the proper functions, but I'm pretty sure SGL is faster as they already stopped supporting the SBL 3d functions in 1995/1996.

You can't tell if a game is using SGL, but only a few games are, as it was too little too late.

Nights is supposedly using SGL, and isn't using the SCU DSP.

Games using the SCU DSP that I know of : Sonic R, Quake and Burning Rangers. They all have in common that they are late games and look amazing.

   mrkotfw Nov 6, 2017 
Have you been able to understand what Quake is doing?

   XL2 Nov 6, 2017 

  mrkotfw said:

It's almost exactly 1:1 the same code as the matrix transformation example in the SCU DSP manual / SBL SCU DSP functions, so I guess it's matrix transformation.

It could also be related to lighting.

   WingMantis Nov 7, 2017 

  XL2 said:


I got out Fighters Megamix and tried it in Yabause and it looks like that is using the SCU DSP. That was released Dec 1996 in Japan. The debug code list didn't seem as dense as what you posted for Quake though.

Can you point me to the manual that has the USE_DSP and functions etc? I was looking through some of the antime list of stuff the other day but didn't see it yet.

Thanks

   XL2 Nov 7, 2017 

  WingMantis said:

It's in the SBL folder under the MAN (manual) folder if I remember correctly. The 3d functions are under SPR in the Segalib folder (again, if I remember right).
The USE_DSP is simply in the SBL code, always on.
I never tried to display a quad in SBL, but I will look at it and maybe try to modify it a bit just to see if it could be improved.

   WingMantis Nov 7, 2017 

  XL2 said:


Can you cheat the quad 3D system by having 2 points share the same coordinates to get triangles? Obviously no speed advantage but more flexibility in 3D model design if possible

   XL2 Nov 7, 2017 

  WingMantis said:

Yes, you can. My map tool for Sonic Z-Treme does it when I detect a triangle.
But the textures get all squished and it just looks bad, unless you use untextured triangles, so I would avoid it.
It's also easy to merge 2 triangles in Blender to make a quad, so you can avoid it most of the time.
Rockin'-B also made a texture mapping demo, but I never looked at how he did it.
It will be slower than just using sprites, so I would avoid it too.
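The triangle trick mentioned in this reply can be sketched in a few lines of C: a triangle becomes a quad by repeating one of its corners. The types and the name `quad_from_triangle` are hypothetical, and which corner gets duplicated is an arbitrary choice here.

```c
#include <stdint.h>

typedef struct { int16_t x, y; } vertex2_t;   /* screen-space corner */
typedef struct { vertex2_t v[4]; } quad2_t;

/* Build a degenerate quad from a triangle by repeating the last corner.
 * A texture mapped onto it collapses toward that corner, which is the
 * squishing described above. */
static quad2_t quad_from_triangle(vertex2_t a, vertex2_t b, vertex2_t c)
{
    quad2_t q = { { a, b, c, c } };
    return q;
}
```

For flat-shaded or untextured polygons the duplicated corner has no visible effect, which matches the advice to only use it in that case.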

   XL2 Nov 8, 2017 
On a side-note, mixing maps from different games can lead to weird moments :




At least nobody is shooting at Sonic...

   itsstillthinking Nov 12, 2017 
Just wait, some Sonic fans are somewhere on the map waiting to do some bashing

