SegaXtreme

about Saturn dual CPUs

vbt - Dec 30, 2003

		vbt	Dec 30, 2003
		After playing with the dual-processor sample from SBL6, I tried to use the slave processor in a simple program. The result is that it only works if I call sprintf (???) in my slave function, like this: sprintf(toto, "%08d", SlaveParam); SlaveParam is used to share a parameter between the master processor and the slave processor. Is there a reason for this strange behavior?

		antime	Dec 30, 2003
		If you're sharing data between the CPUs, the most important things to remember are to declare the variables as volatile and to access them bypassing the cache. The SH2s do not feature bus snooping, so one CPU has no idea that the other one wrote something to memory. They also don't maintain cache coherency, so data that is still only in the cache of one CPU is invisible to the other one. EDIT: Another important thing I forgot to mention is that memory accesses are not atomic, i.e. any operation can be interrupted at any time. Therefore any shared variables or memory areas should be guarded with locks or some other signalling mechanism. Sega's libraries should provide the necessary primitives, and GCC comes with a basic set as well.
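As an illustration of the advice above, here is a minimal sketch in C. It is not taken from SBL or SGL. It assumes the SH7604 convention that OR-ing 0x20000000 into a cached address yields the cache-through alias of the same memory, and every name in it (`cache_through_addr`, `mailbox_t`, `mailbox_post`, `mailbox_pending`, `mailbox_complete`) is hypothetical.

```c
#include <stdint.h>

/* Assumption: on the SH7604, setting address bit 29 selects the
   cache-through alias of the same physical memory. */
#define SH2_CACHE_THROUGH 0x20000000u

/* Return the cache-through alias of a cached address. */
static inline uint32_t cache_through_addr(uint32_t cached_addr)
{
    return cached_addr | SH2_CACHE_THROUGH;
}

/* Hypothetical mailbox shared by both CPUs. The fields are volatile so
   the compiler re-reads them from memory on every access. */
typedef struct {
    volatile uint32_t command;  /* 0 = idle, nonzero = work pending */
    volatile uint32_t param;
} mailbox_t;

/* Master side: write the parameter first and the command last, so the
   slave never sees a command without its parameter. */
static inline void mailbox_post(mailbox_t *mb, uint32_t command, uint32_t param)
{
    mb->param = param;
    mb->command = command;
}

/* Either side: is a command still waiting or being processed? */
static inline int mailbox_pending(const mailbox_t *mb)
{
    return mb->command != 0;
}

/* Slave side: mark the current command as finished. */
static inline void mailbox_complete(mailbox_t *mb)
{
    mb->command = 0;
}

/* On the Saturn one would reach the mailbox through its cache-through
   alias, for example (hypothetical variable name):
     mailbox_t *mb = (mailbox_t *)cache_through_addr((uint32_t)&shared_mailbox);
*/
```

In this sketch the master only ever sets `command` to a nonzero value and the slave only ever clears it, which is the same handshake as `slave_command` later in the thread. Anything more elaborate than a single flag would need a real lock.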

vbt

Dec 30, 2003

Originally posted by antime@Dec 30, 2003 @ 08:58 PM

If you're sharing data between the CPUs the most important things to remember are to declare the variables as volatile and to access them bypassing the cache

I declared my variable as volatile and it solved my problem

Thanks a million

Thanks also for memory access tips

		M3d10n	Dec 30, 2003
		Any evil plans for the newly found dual CPU power, VBT?

		Gallstaff	Dec 31, 2003
		How the hell do you guys learn how to do this?

		slinga	Dec 31, 2003
		Yeah these guys are impressive. What really amazes me though, is how Charles Macdonald did all of his programming WITHOUT using the SGL\SBL libraries, and he did this years ago.

		antime	Dec 31, 2003
		Do you mean Charles Doty? His programs use SGL. Charles MacDonald didn't start coding for the Saturn until I had released the C version of my (libless) copperbar sample. Bart Trzynadlowski, Tyranid and others made some programs without libraries, and Azuco started on his own set of libraries. Some of the documentation (VDP1 and VDP2 manuals, plus a few others) has been available on the net since around 1997; it's just a matter of being able to read and understand them.

vbt

Dec 31, 2003

Originally posted by M3d10n@Dec 31, 2003 @ 04:19 AM

Any evil plans for the newly found dual CPU power, VBT?

Nothing for now. In fact, in my simple tests (first on a throwaway program, then applied to SMS Plus) I lost speed. I registered the background rendering function to run on the slave processor, and while it was running, the master processor ran the sprite rendering function. Something like this:

Code:

    ....
    #ifdef SLAVE2
        useSlave((Uint32)line, render_bg_sms);
    #else
        render_bg(line);
    #endif
        /* Draw sprites */
        render_obj(line);
    #ifdef SLAVE2
        waitSlave();
    #endif
    ...
    //---------------------------------------------------------------------------------------------
    void useSlave(Uint32 param, volatile Uint32 *function)
    {
        slave_command = function;
        SlaveParam = param;
        *(Uint16 *)0x21000000 = 0xffff; /* slave FRT inp invoke */
    }
    //---------------------------------------------------------------------------------------------
    void waitSlave()
    {
        while( slave_command != 0 ) ;
    }
    //---------------------------------------------------------------------------------------------

		antime	Jan 1, 2004
		Both CPUs are connected to the rest of the Saturn through a single, shared bus. When both CPUs want to access something, one CPU gets the bus and the other has to wait until it's free, resulting in slowdown. To help against this, the cache of the CPUs can be configured as 2K shared cache and 2K RAM (normal mode is 4K mixed cache), and IIRC the slave CPU is configured like this by default. By working out of the cache on data held in the internal RAM, external bus accesses can be minimized, which should lead to better performance.

		vbt	Jan 1, 2004
		OK, I'll try to use the cache. If I understood correctly, I only have to create a second variable each time that points to the source variable's address with 0x20000000 added, and it will copy the variable to the cache automatically.

		antime	Jan 1, 2004
		No, that would bypass the cache entirely. When reading a data location with the top three address bits set to zero, an entire cache line (16 bytes on the Saturn's CPUs) is read into the cache (which is why hardware register accesses have to bypass the cache). The cache chapter in the 7604 manual describes how it works. It's a tricky subject and not really worth bothering with unless you suspect you actually have a performance problem due to it (like having arranged your data so you always get cache misses). When the cache is configured as cache+RAM, 0xc0000000 to 0xc00007ff become RAM, so copy your data there, do whatever operations you want on it and copy it back out to wherever you want it. The code that operates on this data should be as small as possible to make effective use of the remaining cache, which means many small loops rather than one big loop and so forth.
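The copy-in, operate, copy-out pattern described above could look roughly like the following sketch. The address range is the one given in the post; `add_via_scratch` and its parameters are hypothetical names, and the scratch buffer is passed in as a pointer so the same routine can be tried on a PC with an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* On-chip RAM window in cache+RAM mode, as given in the post above:
   0xC0000000 to 0xC00007FF, i.e. 2 KB. */
#define SCRATCH_BASE 0xC0000000u
#define SCRATCH_SIZE 0x800u

/* Hypothetical helper: add `delta` (wrapping modulo 256) to every byte of
   `data`, staging the work through `scratch` in chunks of at most
   `scratch_size` bytes. */
static void add_via_scratch(uint8_t *data, size_t len, uint8_t delta,
                            uint8_t *scratch, size_t scratch_size)
{
    size_t off = 0;

    if (scratch_size == 0)
        return;                          /* nothing to stage through */

    while (off < len) {
        size_t chunk = len - off;
        size_t i;

        if (chunk > scratch_size)
            chunk = scratch_size;

        memcpy(scratch, data + off, chunk);      /* copy in */
        for (i = 0; i < chunk; i++)              /* small, tight loop */
            scratch[i] = (uint8_t)(scratch[i] + delta);
        memcpy(data + off, scratch, chunk);      /* copy back out */

        off += chunk;
    }
}

/* On the Saturn the call might look like this (assumes the cache has
   already been switched to cache+RAM mode):
     add_via_scratch(buf, len, 3, (uint8_t *)SCRATCH_BASE, SCRATCH_SIZE);
*/
```

Whether this pays off depends on how much work is done per byte: the two copies themselves still go over the external bus, so the gain only comes from the processing in between.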

		ExCyber	Jan 1, 2004
		Ideally, shouldn't you just disable the cache and use the full 4K for code+data?

		antime	Jan 1, 2004
		Yes, you can do that as well, I forgot about that possibility. To create code that runs from that area you must play around with your link script and use GCC's section attribute to map the code and data to the right addresses. The ld manual has an example on how to create a section with different load and virtual addresses which you can use pretty much as-is.
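The C side of that setup might be sketched as below. The section name `.onchip_text`, the macro `ONCHIP_CODE` and the functions `sum_line` and `copy_section` are all hypothetical. The link script still has to give that section a run address in the on-chip RAM and a separate load address (the `AT` keyword in ld), and export start/end symbols for the loaded image; that part is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Place a function in a named section. The name is an assumption and
   must match whatever the link script uses. The guard keeps the sketch
   compilable on non-ELF hosts, where this section syntax is not valid. */
#if defined(__ELF__)
#define ONCHIP_CODE __attribute__((section(".onchip_text")))
#else
#define ONCHIP_CODE
#endif

/* Example routine intended to run from on-chip RAM: sum n bytes. */
ONCHIP_CODE int sum_line(const uint8_t *p, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Copy a section image from its load address to its run address.
   On the Saturn, load_start and load_end would be symbols defined by
   the link script and run_addr the start of the on-chip RAM. */
static void copy_section(void *run_addr, const void *load_start,
                         const void *load_end)
{
    size_t size = (size_t)((const char *)load_end - (const char *)load_start);

    memcpy(run_addr, load_start, size);
}
```

The copy has to happen before the first call into the relocated code, typically early in startup on the CPU that will run it.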

AntiPasta

Jan 2, 2004

Originally posted by antime@Dec 31, 2003 @ 03:12 PM

it's just a matter of being able to read and understand them.

That's where I pretty much gave up on the libless approach

		slinga	Jan 3, 2004
		VBT: There's another sample program and some more information in Saturn Technical Bulletin #28 if you need some more code to look at.