SegaXtreme

Saturn SCU DSP Notes

Ponut - Oct 10, 2022

Ponut

Oct 10, 2022

I wanted to make this a formal write-up summary, going over the uses of the SCU-DSP & the things I've bothered to do with it, in detail. However, I'm running short on time in my life so I'll have to make it quick.

First: Should you be reading this? What is your level of understanding?

This is a question that I start with just to try and touch base with you, the reader. I am going to talk about computer science stuff: architecture, pipeline, bits, bytes, data types, memory bus width, even/odd addresses, and pointers.
If you do not have a computer science background, this post will not be useful to you. Either close the tab now, or read on with caution.
And even if you do have a computer science background, I have a further filter for you:

Do you know what these are expressing? Don't have to answer in the comments, just think about it.
If you get it, that's good. You have a level of C mastery that means you might actually be able to make use of the DSP.

void function(void(*thing)(void))

unsigned int notiCommand = 0x4A<<25 | ((unsigned int)dsp_noti_addr)>>3;

I am being pedantic about this just because I want to make a point: the SCU-DSP is difficult to use.
It is so difficult that fine-tuning your program written for the SH2s in C is going to net you a far greater gain than learning the SCU-DSP.
I would even argue that even at the point where your code is logically where you want to be, messing with compiler flags is a better use of your time.

This of course assumes you just want to "GSD", as it were. If you want to learn something unique and fun in programming, by all means, go ahead.

Second: What is the SCU-DSP?

The Saturn actually has two processors referred to as "the DSP". There is the DSP inside the Saturn Control Unit / System Control Unit (SCU), and the DSP inside of the Saturn Custom Sound Processor (SCSP). These two processors are COMPLETELY DIFFERENT. They do not function even remotely the same way. It is extremely important to specify which DSP you are talking about. Today, we are talking about the SCU-DSP. The SCSP-DSP is a piece of hardware covered by a different discipline, that being a digital audio engineer.

What you see above here is a logical diagram of the SCU-DSP.

...

Yes, that's really it. That's the whole thing. I'm not kidding, it is that simple. Some of you of course are saying, "What?! SIMPLE?!"
Well, trying to fit the logical diagram of something as slow as a M68K onto a single sheet of paper is a challenge. Go look up a diagram of one; it's probably going to have to span multiple entire pages. I think a bigger note is that the manual for a 68K might be hundreds of pages whereas the SCU-DSP is a part of the SCU manual taking up 86 pages. The point of this is that while using the SCU-DSP is complicated, it is not because the processor is complicated. It just requires intricate management by the programmer to function.

The core feature highlights of the SCU-DSP are as follows:
A 48-bit MAC unit and ALU (Accumulator) unit with the processor in total running at about 14.3 MHz (half the SH2 clock).
A simplified RISC-like instruction set where one instruction is executed in one clock cycle.
1KB of program RAM, at four bytes per instruction, allowing a single program to be 256 instructions (program changing covered later).
Technically 1KB of on-chip "RAM". This however is not RAM, it is indexed access memory, split across four banks of 64 x 4 byte entries.
A very wide instruction bus, with the capacity for a single 4-byte instruction to execute five simultaneous actions.
A "pipeline" wherein the instruction after the current one being executed is pre-loaded and prepared for execution-in-sequence.
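To make the sizes in that list concrete, here is a small C sketch of the arithmetic. The type and function names (dsp_image_t, dsp_program_slots, dsp_data_bytes) are made up for illustration and are not from SBL, Yaul, or the SCU manual; only the numbers come from the list above.

```c
#include <stdint.h>

/* Sizes taken from the feature list above. */
enum {
    DSP_PROGRAM_RAM_BYTES = 1024, /* 1KB of program RAM */
    DSP_INSTRUCTION_BYTES = 4,    /* one instruction word is four bytes */
    DSP_DATA_BANKS        = 4,    /* four banks of indexed data memory */
    DSP_BANK_ENTRIES      = 64,   /* 64 entries per bank */
    DSP_ENTRY_BYTES       = 4     /* each entry is four bytes */
};

/* Hypothetical host-side mirror of the DSP's memories (illustrative only). */
typedef struct {
    uint32_t program[DSP_PROGRAM_RAM_BYTES / DSP_INSTRUCTION_BYTES];
    uint32_t data[DSP_DATA_BANKS][DSP_BANK_ENTRIES];
} dsp_image_t;

/* Number of instruction slots available to a single program. */
static unsigned dsp_program_slots(void)
{
    return DSP_PROGRAM_RAM_BYTES / DSP_INSTRUCTION_BYTES;
}

/* Total bytes of banked data memory. */
static unsigned dsp_data_bytes(void)
{
    return DSP_DATA_BANKS * DSP_BANK_ENTRIES * DSP_ENTRY_BYTES;
}
```

So a whole program plus all of its working data fits in 2KB, which is why the SH2 has to feed it work in small pieces.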

Due to the extremely wide instruction bus, the SCU-DSP can only be programmed for in its own assembly language.
This is not Assembly like you will see for most other processors; however, if you've worked with a 56K before, it will probably look familiar.

Here is a primitive program example written using the SCU-DSP's Assembly, highlighted via a language plugin for notepad++ to color-code the different fields of this processor's instruction bus.

What you will notice in this primitive (which should run in the DSP simulator), is a conditional jump. Unlike the SCSP-DSP, the SCU-DSP has a feature of critical usefulness and that is a number of logic flags and conditional instructions. Of course, you probably haven't looked at the SCU-DSP's instruction set yet, so that screenshot doesn't make sense. I am just using it to highlight that the SCU-DSP is a fully functional logical processor. In other words, this chip is more like a CPU than a DSP.

I will now insert a link to Antime's Saturn page, where you can find the SCU-DSP manual.
Saturn documentation, libs & stuff ...
I have no idea where the hell I found the tools attached here. They could be on Antime's page too. Note that they are DOS tools.
These tools (the dsp-simulator and the dsp-assembler) are needed to test and assemble DSP programs.
They must be run in DOSBox. Also, you'll need their respective manuals. (See Attached Files)

Okay, those are the basic footnotes about the SCU-DSP. I will not cover its explicit architecture as GameHut already did a video on that. I will also attach a file that is a sample program demonstrating the SCU-DSP's logical capabilities in a program that allows the DSP to divide numbers using recursive logic and a root seeking algorithm (i just googled the method, nothing special). If you intend to understand how it actually works & write code for it, bouncing back and forth between sample code and the manual whilst writing your own program is more helpful than what I could cram in here.
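For readers who want a feel for what "dividing by root seeking" can mean before opening the attached sample, here is a minimal C sketch. It is NOT the attached DSP program: it assumes 16.16 fixed point, uses a plain bisection search (written as a loop rather than recursion), and the names fx16_t and fx_div_bisect are invented for this example. The point is only that a quotient can be found with multiplies, compares, and shifts, which are things the DSP does have.

```c
#include <stdint.h>

typedef int32_t fx16_t;              /* 16.16 fixed point (assumed format) */
#define FX_ONE ((fx16_t)1 << 16)

/*
 * Divide n by d (n >= 0, d > 0) without a divide instruction.
 * Bisection: build the largest q, one bit at a time, such that q * d <= n.
 * In other words, seek the root of f(q) = q * d - n.
 */
static fx16_t fx_div_bisect(fx16_t n, fx16_t d)
{
    const int64_t target = (int64_t)n << 16;   /* n rescaled to match q * d */
    fx16_t q = 0;

    for (int bit = 30; bit >= 0; bit--) {
        fx16_t trial = q | ((fx16_t)1 << bit);
        if ((int64_t)trial * d <= target)
            q = trial;                         /* keep the bit if we did not overshoot */
    }
    return q;                                  /* saturates if n / d does not fit */
}
```

The result is rounded down to the nearest 1/65536. A real DSP version would have to fit the same idea into the 48-bit MAC and the memory bank rules, which is where the actual work is.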

		Ponut	Oct 10, 2022
		Except for a few things...

Another major complication regarding the use of the SCU-DSP is its RAM access. That is, its lack of access. It cannot be overstated that the SCU-DSP cannot directly address memory from its own instructions. It can select a memory index from its own memory blocks, which takes effect on the next instruction. Any instruction that moves data to or from its memory banks specifies which memory bank to use and whether or not to index that memory bank (MC,X or M,X).

<- edited out misinformation, see fafling's reply ->

A quick aside, covered in fafling's reply, is that HWRAM addresses are only going to be 27 bits long, byte-wise. DSP-DMA addresses are on a four-byte boundary. To turn a 27-bit byte-wise address into a four-byte-wise address, you have to divide it by four, or shift it right twice. That results in a 25-bit value. This should fit in the SCU-DSP's move-immediate instruction, which can handle up to 25 bits.

Sega's own manuals seem to indicate that using <mvi hwramaddress,RA0> is a functional method of directly moving a HWRAM address to the DMA address registers (RA0, WA0). However, I have tested it and can confirm this does not work. The reason is that the move-immediate instruction is for signed 25-bit data. In moving a signed 25-bit value to the address register, it sign-extends the 24th bit of your address out to bits 25-31, resulting in an incorrect address being used.

Here are screenies of the test scenario:

Let's not pretend that anyone can understand exactly what is happening from such isolated code snippets. The point is that the address, from the SH2, is put into the command shifted right twice, so that it goes in as a 4-byte aligned address. Then the DSP manipulates it into RAM3 58.
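The sign-extension problem described above is easy to reproduce in plain C. This is only a model of the behaviour as described in this post, not DSP code; the function names are invented for the example, and 0x06000000 is used as the start of HWRAM.

```c
#include <stdint.h>

/* Widen a signed 25-bit immediate to 32 bits, as the post describes MVI doing. */
static uint32_t sign_extend_25(uint32_t imm25)
{
    imm25 &= 0x01FFFFFFu;            /* keep the 25-bit field */
    if (imm25 & 0x01000000u)         /* bit 24 acts as the sign bit */
        imm25 |= 0xFE000000u;        /* copy it out to bits 25-31 */
    return imm25;
}

/* Convert a byte address into the 4-byte-unit address that DSP DMA expects. */
static uint32_t to_word_address(uint32_t byte_addr)
{
    return byte_addr >> 2;
}
```

For example, to_word_address(0x06000000) is 0x01800000, which has bit 24 set. sign_extend_25 then turns it into 0xFF800000, which is not the address that was intended. Every HWRAM address has that bit set after shifting right twice, so the direct approach fails for all of them.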
The weird procedure of routing the address through RAM3 58, instead of inputting 'MVI NOTI,MC3', is followed because changing a DSP program in a way that removes or adds instructions is way more difficult than just replacing a 'sl' with a 'nop' on a line. The 'working' variant is pictured as follows (and makes the weird procedure make more sense):

The scientists among you will no doubt have already noticed that I did not, in fact, use <mvi hwramaddress,RA0>. The logic here is the same. The MVI instruction will sign-extend a 25-bit address into a 32-bit signed integer. That won't crash the DSP, but it's also not the address you intended to use. To get a valid address for DMA, you can follow the pictured procedure of passing in a 24-bit value and then shifting it left once to produce an 8-byte aligned address at the valid bit depth for the DSP to use for DMA. Alternatively, you can pass in the 25-bit address and follow a different procedure to mask out the bits you do not want to be high. Just be aware that the shifting instructions on the ALU also sign-extend, and there's otherwise no immediate way to generate 25 high bits (since, you know, the move-immediate instruction will sign-extend that).

The other way is of course to use your DSP control library to write an address that has already been converted by the SH2 into the DSP's memory blocks. If you do that, just be aware that ANY access to the DSP's memory blocks will increment the memory counters. You have to keep very explicit control of the memory counters regardless (CT0, CT1, CT2, CT3).

The reason I point out this 'complication' with memory addresses and the DSP is how it relates to the organization of the game software as a whole. The DSP does not have enough program RAM to itself to control an entire 'pipeline' of your software on its own.
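Going back to the 'working' variant described above (pass in a 24-bit value, shift left once): here is a C model of why that survives the sign extension. The DSP side is modelled in C purely for illustration; the function names are hypothetical and the real thing is a couple of DSP instructions, not a function call.

```c
#include <stdint.h>

/* Model of a signed 25-bit immediate being widened to 32 bits. */
static uint32_t sign_extend_25(uint32_t imm)
{
    imm &= 0x01FFFFFFu;
    return (imm & 0x01000000u) ? (imm | 0xFE000000u) : imm;
}

/* SH2 side: pre-shift an 8-byte-aligned byte address right by three. */
static uint32_t sh2_pack_address(uint32_t byte_addr)
{
    return byte_addr >> 3;
}

/* DSP side (modelled in C): widen the immediate, then shift left once. */
static uint32_t dsp_unpack_address(uint32_t imm)
{
    return sign_extend_25(imm) << 1;
}
```

Because a HWRAM address shifted right three times has bit 24 clear, the sign extension leaves it alone, and the single left shift gives back the same value as shifting the byte address right twice. The cost is that the lowest bit of the 4-byte-unit address is always zero, hence the 8-byte alignment requirement.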
Since the DSP cannot run a whole pipeline by itself, the SH2 must, no matter what, ordain and allocate the memory that the DSP is allowed to use in HWRAM, in order to communicate to the DSP program the size of its workload, or perhaps even where to look for the address of its workload.

In my own software, this has a two-part solution. In the first part, the DSP program starts with a 'header' which, as written, holds dummy addresses. Before the program is loaded to the DSP, the SH2 changes them to the actual 8-byte aligned yet 25-bit addresses that the DSP will use. The reason I didn't just pass that data into the DSP's RAM banks is that, for whatever reason, it didn't work. I suspect that upon system reset, the DSP program must be loaded and run a procedure to set the CTs to a known state before data can be safely loaded, as a system reset (not a fresh power-on, but a reset) may leave the DSP's memory bank access counters in an unknown state. So rather than mess with that, I decided to just write the addresses I want to use into the instructions that the DSP will run itself. This was easy enough to do since the SCU manual lays out the instructions bitwise.

Oh, and if it wasn't obvious already, your DSP program must be loaded by the SH2 from HWRAM following a specific sequence. SBL or Yaul can take care of that sequence for you.

P.0: The DSP part

P.00: The C / SH2 Part

The SCU-DSP has DMA access to HWRAM and the B-Bus. Its internal bus is 32 bits wide and its addressing is 32-bit / 4-byte aligned. It can't access LWRAM (sad horn).

The SCU-DSP also has an issue regarding access to the B-Bus: a couple of things come together to mean that the SCU-DSP will only be able to access the first two bytes of every four bytes on any B-Bus address. This is because the B-Bus is a 16-bit bus and the SCU-DSP can only address on a four-byte alignment.
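Returning to the header patching described earlier in this comment, a rough C sketch of the SH2 side might look like the following. The 0x4A<<25 prefix and the >>3 are copied from the notiCommand line near the top of the post. Everything else is an assumption made for the example: the function names, the idea that the header is simply the first few words of the program image, and the 24-bit mask (added here as a guard so a stray high address bit cannot spill into the opcode field).

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode prefix taken from the notiCommand expression earlier in the post. */
#define MVI_PREFIX (0x4Au << 25)

/* Build one instruction word carrying an 8-byte-aligned address as a 24-bit immediate. */
static uint32_t make_address_word(uint32_t byte_addr)
{
    return MVI_PREFIX | ((byte_addr >> 3) & 0x00FFFFFFu);
}

/* Overwrite the dummy words at the start of the program image before it is loaded. */
static void patch_header(uint32_t *program, const uint32_t *byte_addrs, size_t count)
{
    for (size_t i = 0; i < count; i++)
        program[i] = make_address_word(byte_addrs[i]);
}
```

The patched image is then handed to whatever loads DSP programs in your setup (SBL or Yaul, as mentioned above). Instructions after the header are left untouched.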
The manual does have a procedure that indicates pulsing the DMA in succession will have the SCU correct the DMA such that the second DMA pulse will read to/from the other two bytes, but I wasn't able to get that to work. Other developers have said that it did work, but they ran into another critical issue: the SCU-DSP triggering a DMA to a B-Bus address has a random chance of just, you know, crashing the system. Either crashing/locking up the DSP, or hard-locking the Saturn. There is probably an untraced condition on which this will or will not happen, so if you are using Yaul, you may not run into that.

A final note on memory issues with the DSP is that the Master SH2, Slave SH2, and SCU-DSP all end up fighting for access to HWRAM. When bus contention from all of them adds up at once, the delay for either SH2 may be as long as 30 cycles before the SCU has released the bus for the SH2s to use, due to the strangely slow behavior of SCU-DSP DMA. Though, it can be easy enough to schedule the time that the SCU-DSP is accessing RAM to fall outside of the times either SH2 is heavily accessing RAM, especially if you are aware that the SCU-DSP should spend as little time as possible using its DMA (which means, as little data as possible).

Part Four: What good is it then?

Objectively, what can the SCU-DSP do that the other CPUs in the Saturn can't? Oh, but before I tell you that, let me remind you that the SCU-DSP has no arithmetic division operation. However, with fixed-point numbers, you can manage a division by multiplying by a fraction using the MAC unit. It also has both left and right shifts, both rotation shifts and normal shifts.

To be blunt, the SCU-DSP does not give the Saturn any additional features. It only exists to add extra MIPS. The SCU-DSP has two main advantages that, depending on the code chunk you are speaking of, can end up being a break from a CPU-bottlenecked game.
The first advantage of the DSP has to do with logical branches: the SCU-DSP does not pipeline stall on a logical branch; it executes jumps in one cycle like any other instruction. Remember that the SCU-DSP has a tiny little 'pipeline' where it pre-loads the instruction after the current one being executed. This even counts for jumps, both conditional and not. Depending on the circumstances, you can either lose 1 instruction to a jump (for not wanting or having no use for the instruction slot after the jump) or lose nothing. Usually though, you end up losing 1 instruction because of how strict memory control must be.

This is an advantage compared to the time loss that the SH2 and M68K experience from logical branches. The SH2 will often experience a total pipeline reset upon a branch cut, which can cost up to 10 cycles in and of itself, not including the execution states of the jump instruction and the instructions within the regeneration of the pipeline. The DSP is clocked at half the rate of the SH2 though, so the comparison works out to an average of 4 SH2-equivalent cycles lost on the DSP versus 10 cycles lost on the SH2. The M68K is dramatically slower in comparison; a branch on the 68K might take 10 cycles, but said 68K is running at a comparable clock rate to the SCU-DSP.

It should be noted of course that an SH2 expert is going to understand what exactly the SH2 loses on a branch cut a lot more than I do. And that's why I say that studying & optimizing your SH2 code is going to get you further than the DSP will.

Ponut

Oct 10, 2022

The second advantage of the SCU-DSP is actually the reason it exists to begin with. Behold! The fully unlocked potential of the DSP's wide instruction bus!

This is an example primitive of what transforming a fixed-point vertex by a matrix might look like on the SCU-DSP.

This is also why Sega included the SCU-DSP in the Saturn in the first place; it was intended to perform this task to enable the system to have the grunt necessary to be a truly 3D system. Most developer partners of Sega, and most developers at Sega themselves, knew the SCU-DSP for this purpose and this purpose only. If a retail game used the SCU-DSP, this is most likely what they used it for: matrix transformation.

Of course, the myth and the legend goes that after Sega learned of the Sony PlayStation's specifications, they were alarmed. They knew the Saturn, at that point having only one SH2 and the SCU-DSP, could not compete with Sony's hardware. To improve the Saturn's power to be a better match for the PlayStation, Sega added a second SH2 to the Saturn. At that point, the SCU-DSP was obsolescent. It was also probably at that point that hardware development on the SCU-DSP was stopped; its feature-set shows that it was thought of as more than a maths unit... because a maths unit doesn't need logical branches or DMA access... but it wasn't developed to a point where it could serve nearly as well as a second SH2, simply due to how difficult it was to use & coordinate with the rest of the program. With its proclivity to crash the system and a few other bugs (did I mention the DSP end interrupt, as triggered by ENDI, sometimes just doesn't work?) I imagine the SCU-DSP was finalized right after the second SH2 was added. Maybe. I don't know.

What I do know is that the SCU-DSP is fast at MACs, since it can do a 48-bit MAC in one cycle (or two SH2-equivalent cycles). It takes the SH2 an average of three cycles to do a 64-bit MAC. But listen, seriously. The SH2 can do its own set-up for said MAC. It can also do a 64-bit division in-line with the MAC thanks to the SH7604's DIVU. The SH2 can also arrange such that it can work with 8-bit or 16-bit vertices. Yadda yadda yadda. The complication means that your Master SH2 is very likely to lose more than one cycle per transformed vertex in simply setting up the DSP to do its work. In fairness, the DSP does not need to spend instructions rearranging a fixed-point operation back into 32-bits, whereas the SH2 will end up using the ``xtrct`` instruction a lot.
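To show what the fixed-point MAC and the "rearranging back into 32 bits" step amount to, here is a generic C sketch of a point-by-matrix transform. It is not the author's SH2 or DSP code. It assumes 16.16 fixed point and a matrix stored as three rotation rows followed by one translation row; that layout and all of the names are assumptions for this example.

```c
#include <stdint.h>

typedef int32_t fx16_t;   /* 16.16 fixed point (assumed format) */

/* Multiply-accumulate one row: a 64-bit sum of three 32x32 products. */
static fx16_t dot3(const fx16_t row[3], const fx16_t v[3])
{
    int64_t acc = 0;
    for (int i = 0; i < 3; i++)
        acc += (int64_t)row[i] * v[i];
    /* Keep the middle 32 bits of the 64-bit sum. This is the step that
     * xtrct performs on the SH2. (Right-shifting a negative value is
     * implementation-defined in C, arithmetic on common compilers.) */
    return (fx16_t)(acc >> 16);
}

/* out = rotation * in + translation, with the matrix as 4 rows of 3. */
static void transform_point(const fx16_t m[4][3], const fx16_t in[3], fx16_t out[3])
{
    for (int r = 0; r < 3; r++)
        out[r] = dot3(m[r], in) + m[3][r];
}
```

Each output component is three multiplies accumulated into one wide sum, which is the shape of work the DSP's MAC unit was built for.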

The SH2 version of point-by-matrix:

Though, none of this matters much when the main end goal of the SCU-DSP was always to add more MIPS, not to enable some specific new feature. Because of this, it is often the case that making use of the SCU-DSP will improve the performance of your software in CPU-bound scenarios, no matter what the SCU-DSP is doing. As long as it is taking enough of a load off of the SH2s that one or two milliseconds are saved, that might be the one or two milliseconds you needed to make that 33.3ms frametime.

Of course, if you are not CPU-limited, the SCU-DSP will do literally nothing for you. The attached "dsp_bench" ZIP file demonstrates this. It contains two versions of a 'game' of sorts with an unlocked frame-rate. If you performance test them, you'll notice that they perform exactly the same, yet one version of the game is using the DSP whereas the other is not. This is because that build never runs into a CPU bottleneck. I do know of course that if I manufacture a CPU bottleneck, the DSP version runs faster, but those are tests that I ran long ago to come to the conclusions I present here.

Part What: Program Switching

This is a short note. The SCU-DSP having only 256 instructions to be loaded in a single program is kind of a bummer. I thought it'd be dope if the SCU-DSP could run a program, load a new program at the end of that one, and continue running the new program that was loaded. And then that second program would re-load the first program and then enter a wait-state pending SH2 communication. I tried this, and it seemed buggy. Some emulators would let it run, but The Codex As- i mean Mednafen would not let it run more than a few times. Real hardware seemed to corroborate that the DSP would crash after running this loop back and forth a few times. Sometimes, it wouldn't run it more than once. It was confusing that it would sometimes work and sometimes not work.

My memories of it are foggy, since isolated test-runs with simplified programs (not running next to the real game) would not work at all, not even loop through once. I deleted all traces of it... I kind of feel like this one demands more study for an interested party, but really, does it? The DSP is hard enough to use as it is.

After reading the hardware manuals up and down for an explanation of why the DSP would stop itself after loading a new program, it seems this is actually the intended behavior, since loading a program into program RAM is supposed to always halt the PC. These rules are kind of implied by the DSP control port.

Of course, you have sections like this (which I definitely read & tried)

It's confusing, because it's like Sega intended you to do this, but doesn't tell you that the program stops after you load it. I suggest you research further if curious.

Conclusion

In an alternate universe, there exists a Saturn that released without a second SH2; in this same universe, a 32X did not release. In this universe, Sega contracted with Motorola to build the SCU instead of Yamaha (or was it Hitachi?). In doing so, Motorola was able to include a 56K inside the SCU. In this universe, the Saturn was able to do this but at an even higher frame-rate with the added support of VDP1:

https://youtu.be/WpwlZgQPCpk

Of course, said Saturn cost a lot more to manufacture.

/e: i got the clockspeeds wrong

We don't live in that universe, and frankly, we should all be thankful we do not. As we will soon learn this coming January, the Saturn does not need a DSP to do that, if that wasn't clear already. Sega's inclusion of a second SH2 as a reaction to the PlayStation was successful if short-sighted because what Sega needed more than anything was a Saturn that was easy to use. Two 28.6 MHz CPUs may not be as good as one 57.2 MHz CPU, but they sure as hell are better than one 28.6 MHz CPU and a 14.3 MHz enigma machine. And we got two 28.6 MHz CPUs, and a 14.3 MHz busted enigma!

fafling

Oct 11, 2022

Very interesting read @Ponut... !

Another issue is that the immediate-data instruction can, at most, host a 25-bit number. This is a problem, because a DMA instruction takes a 4-byte aligned address. If you divide the total Saturn memory map by four (>>2), you get the addressing in terms of four-bytes aligned. That ends up being 30 bits; five bits more than the immediate-data instruction of the SCU-DSP can express. Unless you calculate a specific hard-coded address, or calculate addresses from an offset, a known address can only be known by the SCU-DSP on a 32-byte boundary instead of a 4-byte boundary

RAM addresses fit on 29 bits on Saturn. The 3 extra bits are used for the access space by the SH2s, but they're useless on the SCU.

And in fact, you can address the whole usable RAM space on Saturn with just 27 bits, as the highest address you need is the end of HWRAM at 0x60FFFFF. That's why the SCU DMA level 0-2 start address 32 bit registers only require the 27 lower bits to be set.
Since the SCU DSP DMA start address must be aligned on 4 bytes, 25 bits are enough to address the full usable Saturn memory space.
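The bit counts above can be checked with a few lines of C. The helper name bit_width is made up for this example; the address 0x60FFFFF is the end of HWRAM as given in this reply.

```c
#include <stdint.h>

/* Number of bits needed to represent v (0 when v is 0). */
static unsigned bit_width(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}
```

bit_width(0x060FFFFF) is 27, and bit_width(0x060FFFFF >> 2) is 25, matching the figures given here.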

Two 25.4 MHz CPUs may not be as good as one 51 MHz CPU, but they sure as hell are better than one 25.4 MHz CPU and a 12.7 MHz enigma machine. And we got two 25.4 MHz CPUs, and a 12.7 MHz busted enigma!

A fine conclusion, however it seems you're trimming a bit on the frequencies : SH2s run at 26.8 or 28.6 MHz, so the SCU DSP runs at 13.4 or 14.3 MHz.

Ponut

Oct 11, 2022

Since the SCU DSP DMA start address must be aligned on 4 bytes, 25 bits are enough to address the full usable Saturn memory space.

Should I edit that part? So you know, memory fuzzy and all.
The move immediate instruction is signed.
I did see in my code I only need to shift it left once to get the address I want.
Why didn't I just pass it through as if it were unsigned? I don't remember lol

EDIT: I remember now. I will edit the original post with an explanation.

Anyway, which processor does run at 25.4 mhz? i'm guessing none of them do

fafling

Dec 9, 2022

	Ponut said:

There could be an explanation for the B-bus access issue of the DSP (and maybe for the related crash) found in p. 48 of Sega Developers Conference Conference Proceedings ... :

So the assembler would be to blame, and you'd have to patch its output result to make it work.

		Ponut	Dec 9, 2022
		A+ level digging there, @fafling... ! That's sure to help some folks out. Editing a single instruction is not too difficult, the manual lays them out byte-wise.

		Ponut	Sep 9, 2023
		I would like to come back to this thread and report a few things. Firstly, I have written a new DSP program which achieves great results in improving the performance of a 3D game, on the order of ~4ms of improvement for 441 vertices tested. The program uses a chirality (winding) check algorithm to see if a vertex is in an on-screen space, or not. It can then apply user-specified clip flags whether IN or OUT of the area, to achieve portal IN (window) or portal OUT (occlusion). However, in writing and integrating this program, I have gone back and discovered that the SH2 code (particularly the Slave SH2 code) also had issues which were causing frames to miss the 29ms target (yes, 29ms, harsh). So it goes to show that you're going to need to work on profiling the code to grasp performance issues before proceeding with the DSP to try and improve things. Another issue was that while theoretically the DSP code would provide a 6-7ms performance boost, synchronization issues mean that significant time is lost compared to what might be theoretically possible; this is an issue that will always occur when using the DSP to perform a task in time-step parallel to the SH2. To be fair, it exceeded my expectations after the synchronization code was entered. In addition to that, I was able to get things done a lot faster thanks to "The Purist of Greed"'s / @buhman's new Windows-compatible DSP Assembler, which you can find here: GitHub - buhman/scu-dsp-asm at 3 Very big thanks for that.
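		For anyone unfamiliar with a chirality (winding) check, here is a generic C sketch of the idea. This is not the DSP program described above, and the names, the flag scheme, and the coordinate convention (convex area, vertices listed counter-clockwise with y pointing up) are all assumptions made for the example.

```c
#include <stdint.h>

typedef struct { int32_t x, y; } point2_t;

/* z component of (b - a) x (p - a). Its sign says which side of edge a->b holds p. */
static int64_t edge_side(point2_t a, point2_t b, point2_t p)
{
    return (int64_t)(b.x - a.x) * (p.y - a.y)
         - (int64_t)(b.y - a.y) * (p.x - a.x);
}

/* 1 if p is inside (or on the border of) a convex area given in
 * counter-clockwise order, 0 otherwise. */
static int inside_convex(const point2_t *area, int count, point2_t p)
{
    for (int i = 0; i < count; i++) {
        point2_t a = area[i];
        point2_t b = area[(i + 1) % count];
        if (edge_side(a, b, p) < 0)
            return 0;   /* wrong winding relative to this edge: outside */
    }
    return 1;
}

/* Apply one of two caller-chosen flags depending on the result
 * (portal IN vs portal OUT behaviour is decided by the flag values). */
static uint32_t apply_clip(uint32_t flags, int inside, uint32_t in_flag, uint32_t out_flag)
{
    return flags | (inside ? in_flag : out_flag);
}
```

The test per edge is two multiplies and a subtract, which maps well onto a MAC unit, and the per-edge early-out is the kind of conditional jump the SCU-DSP handles cheaply.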

srg320

Sep 10, 2023

	Ponut said:

Perhaps interrupts have not been disabled for the Slave SH2. The Slave SH2 interrupts are described in "Sega Saturn Technical Bulletin #28".

Dr.Wily

Jan 8, 2025

Bump this very interesting topic.

	Ponut said:

Are you sure of that ? I mean, we never really knew which processor(s) had been added (or changed) following the announcement of the PSX's capabilities.

First, adding a second CPU at the last minute seems unlikely because of the side effects in terms of electronics and development, specifically the devkits, which are designed before the console, at least for the Saturn. Second, why does the 32X also have two SH2s?

In my opinion, both SH2 were already planned to replace the V60 that was originally planned. Two assumptions :

- First the "chip" added at the last minute corresponds to the "second" SH2, which was planned as a dual processor anyway. By this I mean that these 2 SH2s should be seen as a single chip and that an error of interpretation or translation has created the myth. In reality, the V60, which was planned as a single chip, has been replaced by 2 other chips, which are the two SH2s. The public saw this second SH2 as an additional chip added at the last minute, whereas the replacement for the V60 was already planned with 2 SH2s and not just one.

- Second, the SCU is the chip added after the announcement of the PSX's specifications. Why? The SCU is the only chip that isn't needed for the Saturn to work. A game can work perfectly well without the help of the SCU. The SCU reminds me of the chips added to cartridges to help with certain processing (like the DSP-1/2/3 in Super Nintendo games). Even the SCU's implementation in the Saturn doesn't seem to fit in with the rest of the machine's architecture. What's more, it's the only chip specialising in matrix calculation, whereas the VDP1 is still a chip specialising in sprites (and their affine transformation) but not really a chip designed for polygonal 3D like the PSX's GTE. The SCU is the only chip that comes close to being a 'dedicated' 3D processor, and it was added at the last minute to compete with the PSX.

These are just assumptions, but what I'm sure of is that it's impossible for there to have been a second SH2 added at the last minute.

		TrekkiesUnite118	Jan 8, 2025
		We have interviews from Hitachi engineers, Hideki Sato, and Shoichiro Irimajiri that all point to the 2nd SH2 being the last minute change in response to the PS1. From those interviews we can piece together this timeline:
		September 1992 - Sega chooses to go with the SH2.
		Late summer of 1993 - Sega learns of Sony's specs and realizes Saturn needs to be beefed up. They investigate either beefing up the CPU or improving VDP1. They choose to beef up the CPU.
		Fall of 1993 - Sega asks Hitachi to increase the clock speed of the SH2, Hitachi isn't able to do this on short notice but suggests they add a 2nd SH2 to use the Master Slave configuration mode they had worked into the design. So Sega adds the 2nd SH2 to the Saturn.
		January 1994 - In designing the 32X Sega of America looks at the near final Saturn design and lifts the dual SH2 design from it to use in 32X. So we know by this point Saturn's design was pretty much near final.
		From the dev manuals not much changes after this point.

antime

Jan 8, 2025

	Dr.Wily said:

On the contrary, without the SCU the Saturn would not work at all. It is the glue that connects the various parts. Without it, the CPUs would only be able to communicate with the SMPC, the only other device on the C-bus.

	TrekkiesUnite118 said:

Well, we have the "Introduction to Saturn Game Development" document, and the developer presentation slideshow revisions from April and May 1994 that point to at least the memory being changed from 1.5MB SDRAM to 1MB SDRAM + 1MB DRAM. There's also the different SMPC command set in some old SDK headers, though I'm not sure when it was finalized.

Sega would have been able to keep iterating on the SCU fairly late, since it's a gate array and not a fully custom chip, and the amount of bugs in it certainly suggests that they did.

TrekkiesUnite118

Jan 8, 2025

	antime said:

Right, when I said near final I simply meant the main chips (2x SH-2s, SCU + DSP, VDP1, VDP2, M68k, SCSP, SH-1, etc.) and their arrangement was probably decided by then. The inner workings of some of those chips like the SMPC, VDP1, etc. still seemed to get updates after then. I think VDP1 got HSS sometime in early 1994?

Dr.Wily

Jan 9, 2025

	TrekkiesUnite118 said:

I've already read these interviews and none of them clearly states the name of the added chip. Especially this one... from Hitachi, which says on the one hand that it was because of the Nintendo 64 and not the PSX that the SH2's performance would be inadequate, and on the other hand that the choice of a second SH-2 was not made at the last minute but in the midst of the Saturn's design.

Moreover, the Sato interview did not mention the name of the chip added at the last minute. He's simply talking about a chip added in response to the release of the PSX's capabilities. But he doesn't say specifically which chip.

	antime said:

I'm talking only about the DSP part of the SCU, not the glue part. The DSP part may well have been designed at the last minute. Especially if, as you say, it is a gate array and could very well be reconfigured according to needs and the evolution of the market.

Facts and testimonies don't agree enough to be sure that an SH2 was added at the last minute.

		antime	Jan 9, 2025
		Did Hitachi have any SMP-capable SH-1 CPUs? It would have seemed a bit out of their wheelhouse, since they were mostly making small embedded controllers with an emphasis on motor control. If the master/slave operating modes were created specifically for the SH7095 it's clear it's something that had been planned for a long time.

		fafling	Jan 9, 2025
		The master/slave multiprocessor capability was added to SH-2 in response to a request by a side project at Hitachi, and it doesn't appear to have been planned in advance. This is told in paragraph "A Request from the Research Lab" in this story of the SH-2 development.... That article implies that this capability was added not long before Sega requested more CPU performance for the Saturn, a sort of nice coincidence which led to the addition of the 2nd SH-2 in the console.

Dr.Wily

Jan 10, 2025

	fafling said:

Yes, but that doesn't mean that Sega chose a single processor at first, and then after a few months of testing decided to add a second one, instead of looking for something more powerful and less difficult to integrate. Integrating 2 microprocessors in a master/slave configuration was unusual in the video game world at the time, especially if you consider, as this myth tells us, that only 1 SH2 was originally planned. It's just not credible.

In my opinion, both SH2s were already included in the console's architecture. The dual-processor function was already known and in place, since the request came from System Development Lab.

I doubt the engineers thought: “this microprocessor isn't powerful enough, so we'll add a second one”. Rather than looking for something equivalent in terms of MIPS. Electronics isn't that simple, you can't just double a microprocessor to get more power, especially when each of them shares the same bus.

Especially as the 32X didn't need 2 SH2s, a V810 would have done the trick. In my opinion, this double SH2 story has nothing to do with competition, but is more a matter of negotiation and, above all, the price of microprocessors at the time. The SH2 had to be the most competitive, and Hitachi needed high-volume production to impose its new chip. Sega was the driving force behind the SH2, and in return, Hitachi offered them a very low price per unit. I don't think Nintendo or Sony had anything to do with Sega's choice of dual SH2.

TrekkiesUnite118

Jan 10, 2025

	Dr.Wily said:

This Sato interview flat out states the Saturn just had one SH2 early on and that the 2nd SH2 was added at the last minute in response to the PS1:

Hideki Sato Discussing the Sega Saturn - Mega Drive Shock

This one from Irimajiri gives us the date around when this happened as he mentions they were debating between adding another CPU or adding a new GPU:

Irimajiri Speaks Out About the Saturn, the 32X, and SOA's Financial Troubles - Mega Drive Shock

So it's pretty obvious that the last minute decision was the 2nd SH2. And it makes a lot of sense when you see the Saturn's design. The SCU DSP suddenly makes more sense for why it's there if you only had one SH2.

As for the 32X, we know exactly why it has 2 SH2s. Sega of America lifted that from the Saturn when designing the 32X in 1994.

You need to remember "last minute change" here is referring to fall of 1993. Which for a system that's to be on store shelves in a year, that is a dramatic last minute change to the hardware. There are plenty of quotes and data to back up that the SH2 was the last minute change in response to Sony. So if you think it's a myth you need to provide some solid evidence of this, not speculation and conjecture.

Tiviat

Jan 10, 2025

	TrekkiesUnite118 said:

....or could the Saturn JUST be simply a 32x CD, like they did with the MegaDrive->MegaDrive CD ? I'm guessing either way we ended up with the Saturn

TrekkiesUnite118

Jan 10, 2025

	Tiviat said:

Saturn started development long before the 32X did.