| Saturn SCU DSP Notes |
| Ponut - Oct 10, 2022 |
| Ponut | Oct 10, 2022 | ||||||
| I wanted to make this a formal write-up going over the uses of the SCU-DSP and the things I've bothered to do with it, in detail. However, I'm running short on time in my life, so I'll have to make it quick.

First: Should you be reading this?

What is your level of understanding? I start with this question just to try and touch base with you, the reader. I am going to talk about computer science stuff: architecture, pipeline, bits, bytes, data types, memory bus width, even/odd addresses, and pointers. If you do not have a computer science background, this post will not be useful to you. Either close the tab now, or read on with caution.

And even if you do have a computer science background, I have a further filter for you: Do you know what these are expressing? You don't have to answer in the comments, just think about it. If you get it, that's good. You have a level of C mastery that means you might actually be able to make use of the DSP.

I am being pedantic about this because I want to make a point: the SCU-DSP is difficult to use. It is so difficult that fine-tuning your program written for the SH2s in C is going to net you a far greater gain than learning the SCU-DSP. I would even argue that once your code is logically where you want it to be, messing with compiler flags is a better use of your time. This of course assumes you just want to "GSD", as it were. If you want to learn something unique and fun in programming, by all means, go ahead.

Second: What is the SCU-DSP?

The Saturn actually has two processors referred to as "the DSP". There is the DSP inside the Saturn Control Unit / System Control Unit (SCU), and the DSP inside the Saturn Custom Sound Processor (SCSP). These two processors are COMPLETELY DIFFERENT. They do not function even remotely the same way. It is extremely important to specify which DSP you are talking about. Today, we are talking about the SCU-DSP.
The SCSP-DSP is a piece of hardware covered by a different discipline, that of the digital audio engineer.

What you see above here is a logical diagram of the SCU-DSP. ... Yes, that's really it. That's the whole thing. I'm not kidding, it is that simple. Some of you of course are saying, "What?! SIMPLE?!" Well, trying to fit the logical diagram of something as slow as an M68K onto a single sheet of paper is a challenge. Go look up a diagram of one; it's probably going to span multiple entire pages. I think a bigger note is that the manual for a 68K might be hundreds of pages, whereas the SCU-DSP is a part of the SCU manual, taking up 86 pages. The point of this is that while using the SCU-DSP is complicated, it is not because the processor is complicated. It just requires intricate management by the programmer to function.

The core feature highlights of the SCU-DSP are as follows:

- A 48-bit MAC unit and ALU (Accumulator) unit, with the processor in total running at about 14.3 MHz.
- A simplified RISC-like instruction set where one instruction is executed in one clock cycle.
- 1KB of program RAM, at four bytes per instruction, allowing a single program to be 256 instructions (program changing covered later).
- Technically 1KB of on-chip "RAM". This however is not RAM; it is indexed access memory, split across four banks of 64 x 4 byte entries.
- A very wide instruction bus, with the capacity for a single 4-byte instruction to execute five simultaneous actions.
- A "pipeline" wherein the instruction after the one currently being executed is pre-loaded and prepared for execution in sequence.

Due to the extremely wide instruction bus, the SCU-DSP can only be programmed in its own assembly language. This is not assembly like you will see for most other processors; however, if you've worked with a 56K before, it's probably familiar.
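The sizes in the feature list above can be sanity-checked with a few lines of C. This is only a host-side sketch for illustration: every name in it (`DSP_PROGRAM_RAM_BYTES`, `dsp_data_ram`, `dsp_data_read` and so on) is made up here and does not come from any Sega SDK or from the DSP manual. It simply restates the arithmetic and models the data memory as entries picked by (bank, index) rather than by byte address.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the feature list; all names are hypothetical. */
enum {
    DSP_PROGRAM_RAM_BYTES = 1024, /* 1KB of program RAM */
    DSP_INSTR_BYTES       = 4,    /* one instruction word */
    DSP_DATA_BANKS        = 4,
    DSP_BANK_ENTRIES      = 64,
    DSP_ENTRY_BYTES       = 4
};

/* Number of instructions that fit in one loaded program. */
int dsp_max_instructions(void)
{
    return DSP_PROGRAM_RAM_BYTES / DSP_INSTR_BYTES;
}

/* Total size of the banked data memory, in bytes. */
int dsp_data_bytes(void)
{
    return DSP_DATA_BANKS * DSP_BANK_ENTRIES * DSP_ENTRY_BYTES;
}

/* Host-side model of the data memory: an entry is selected by
   (bank, index), never by a byte address. */
typedef struct {
    int32_t bank[DSP_DATA_BANKS][DSP_BANK_ENTRIES];
} dsp_data_ram;

/* Reads one entry. Returns 0 on success, -1 if bank or index is out of range. */
int dsp_data_read(const dsp_data_ram *ram, unsigned bank, unsigned index,
                  int32_t *out)
{
    assert(ram && out);
    if (bank >= (unsigned)DSP_DATA_BANKS || index >= (unsigned)DSP_BANK_ENTRIES)
        return -1;
    *out = ram->bank[bank][index];
    return 0;
}
```

Both totals come out to 1KB, and the program limit to 256 instructions, which is why a bigger job has to be split across several programs.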
Here is a primitive program example written in the SCU-DSP's assembly, highlighted via a language plugin for Notepad++ to color-code the various instruction buses of this processor. What you will notice in this primitive (which should run in the DSP simulator) is a conditional jump. Unlike the SCSP-DSP, the SCU-DSP has a feature of critical usefulness: a number of logic flags and conditional instructions. Of course, you probably haven't looked at the SCU-DSP's instruction set yet, so that screenshot doesn't make sense. I am just using it to highlight that the SCU-DSP is a fully functional logical processor. In other words, this chip is more like a CPU than a DSP.

I will now insert a link to Antime's Saturn page, where you can find the SCU-DSP manual. Saturn documentation, libs & stuff ...

I have no idea where the hell I found these (they could be on Antime's page too), but these are DOS tools. These tools (the dsp-simulator and the dsp-assembler) are needed to test and assemble DSP programs. They must be run in DOSBox. You'll also need their respective manuals. (See Attached Files)

Okay, those are the basic footnotes about the SCU-DSP. I will not cover its explicit architecture, as GameHut already did a video on that. I will also attach a sample program demonstrating the SCU-DSP's logical capabilities: it lets the DSP divide numbers using recursive logic and a root-seeking algorithm (I just googled the method, nothing special). If you intend to understand how it actually works and write code for it, bouncing back and forth between sample code and the manual whilst writing your own program is more helpful than what I could cram in here. | |||||||
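The attached division sample is not reproduced in the post above. As a rough idea of what "dividing by root seeking" can look like on a processor with multiply, add, compare and shift but no divide instruction, here is a C sketch that finds the quotient by bisection, i.e. it searches for the largest q with q * d <= n. This is an assumption chosen for illustration, not the method used in the attached DSP program, and `div_by_bisection` is a made-up name. It is also written as a loop rather than with recursion.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned division without a divide instruction.
   Bisects for the largest q such that q * d <= n, which is the
   integer root of f(q) = q * d - n.
   Precondition: d != 0. */
uint32_t div_by_bisection(uint32_t n, uint32_t d)
{
    assert(d != 0);

    uint32_t lo = 0;  /* lo * d <= n always holds */
    uint32_t hi = n;  /* the quotient can never exceed n */

    while (lo < hi) {
        /* Upper midpoint, computed so that it cannot overflow. */
        uint32_t span = hi - lo;
        uint32_t mid  = lo + (span >> 1) + (span & 1u);

        if ((uint64_t)mid * d <= n)
            lo = mid;       /* mid is still a valid quotient candidate */
        else
            hi = mid - 1;   /* mid is too large */
    }
    return lo;
}
```

The loop halves the search range each pass, so a 32-bit quotient takes at most 32 iterations, each costing one multiply and one compare.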
| Ponut | Oct 10, 2022 | ||
| The second advantage of the SCU-DSP is actually the reason it exists to begin with. Behold! The fully unlocked potential of the DSP's wide instruction bus! This is an example primitive of what transforming a fixed-point vertex by a matrix might look like on the SCU-DSP. This is also why Sega included the SCU-DSP in the Saturn in the first place; it was intended to perform this task to give the system the grunt necessary to be a truly 3D system. Most developer partners of Sega, and most developers at Sega themselves, knew the SCU-DSP for this purpose and this purpose only. If a retail game used the SCU-DSP, this is most likely what it used it for: matrix transformation.

Of course, the myth and the legend goes that after Sega learned of the Sony PlayStation's specifications, they were alarmed. They knew the Saturn, at that point having only one SH2 and the SCU-DSP, could not compete with Sony's hardware. To make the Saturn a better match for the PlayStation, Sega added a second SH2. At that point, the SCU-DSP was obsolescent. It was also probably at that point that hardware development on the SCU-DSP was stopped. Its feature set shows that it was thought of as more than a maths unit (a maths unit doesn't need logical branches or DMA access), but it wasn't developed to a point where it could serve nearly as well as a second SH2, simply due to how difficult it was to use and to coordinate with the rest of the program. With its proclivity to crash the system and a few other bugs (did I mention the DSP end interrupt, as triggered by ENDI, sometimes just doesn't work?), I imagine the SCU-DSP was finalized right after the second SH2 was added. Maybe. I don't know.

What I do know is that the SCU-DSP is fast at MACs, since it can do a 48-bit MAC in one cycle (or two SH2-equivalent cycles). It takes the SH2 an average of three cycles to do a 64-bit MAC. But listen, seriously.
The SH2 can do its own set-up for said MAC. It can also do a 64-bit division in-line with the MAC thanks to the SH7604's DIVU. The SH2 can also be arranged to work with 8-bit or 16-bit vertices. Yadda yadda yadda. The complication means that your Master SH2 is very likely to lose more than one cycle per transformed vertex simply in setting up the DSP to do its work. In fairness, the DSP does not need to spend instructions rearranging a fixed-point result back into 32 bits, whereas the SH2 will end up using the ``xtrct`` instruction a lot. The SH2 version of point-by-matrix:

None of this matters much, though, when the main end goal of the SCU-DSP was always to add more MIPS, not to enable some specific new feature. Because of this, making use of the SCU-DSP will often improve the performance of your software in CPU-bound scenarios, no matter what the SCU-DSP is doing. As long as it takes enough of a load off the SH2s that one or two milliseconds are saved, that might be the one or two milliseconds you needed to make that 33.3ms frametime.

Of course, if you are not CPU-limited, the SCU-DSP will do literally nothing for you. The attached "dsp_bench" ZIP file demonstrates this. It contains two versions of a 'game' of sorts with an unlocked frame-rate. If you performance test them, you'll notice that they perform exactly the same, yet one version is using the DSP and the other is not. This is because that build never runs into a CPU bottleneck. I do know of course that if I manufacture a CPU bottleneck, the DSP version runs faster, but those are tests that I ran long ago to come to the conclusions I present here.

Part What: Program Switching

This is a short note. The SCU-DSP having only 256 instructions to be loaded in a single program is kind of a bummer.
I thought it'd be dope if the SCU-DSP could run a program, load a new program at the end of that one, and continue running the new program that was loaded. That second program would then re-load the first program and enter a wait-state pending SH2 communication. I tried this, and it seemed buggy. Some emulators would let it run, but The Codex As- I mean Mednafen would not let it run more than a few times. Real hardware seemed to corroborate that the DSP would crash after running this loop back and forth a few times. Sometimes, it wouldn't run it more than once. It was confusing that it would sometimes work and sometimes not. My memories of it are foggy, since isolated test-runs with simplified programs (not running next to the real game) would not work at all, not even loop through once. I deleted all traces of it... I kind of feel like this one demands more study from an interested party, but really, does it? The DSP is hard enough to use as it is.

After reading the hardware manuals up and down for an explanation of why the DSP would stop itself after loading a new program, it seems this is actually the intended behavior, since loading a program into program RAM is supposed to always halt the PC. These rules are kind of implied by the DSP control port. Of course, you have sections like this (which I definitely read & tried). It's confusing, because it's like Sega intended you to do this, but doesn't tell you that the program stops after you load it. I suggest you research further if curious.

Conclusion

In an alternate universe, there exists a Saturn that released without a second SH2; in this same universe, a 32X did not release. In this universe, Sega contracted with Motorola to build the SCU instead of Yamaha (or was it Hitachi?). In doing so, Motorola was able to include a 56K inside the SCU. In this universe, the Saturn was able to do this but at an even higher frame-rate with the added support of VDP1:
Of course, said Saturn cost a lot more to manufacture.

/e: i got the clockspeeds wrong

We don't live in that universe, and frankly, we should all be thankful we do not. As we will soon learn this coming January, the Saturn does not need a DSP to do that, if that wasn't clear already. Sega's inclusion of a second SH2 as a reaction to the PlayStation was successful, if short-sighted, because what Sega needed more than anything was a Saturn that was easy to use. Two 28.6 MHz CPUs may not be as good as one 57.2 MHz CPU, but they sure as hell are better than one 28.6 MHz CPU and a 14.3 MHz enigma machine. And we got two 28.6 MHz CPUs, and a 14.3 MHz busted enigma! | |||
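For readers who want something concrete for the point-by-matrix transform discussed in the post above, here is a plain C sketch of the arithmetic. It is not the DSP or SH2 code from the post: the 16.16 fixed-point format, the row-major 3x4 matrix layout and the names `fixed`, `FIXED_ONE` and `transform_point` are all assumptions made for this example. Products are accumulated at 64-bit width and the result is brought back to 32 bits once per row, which is the rearranging step the post attributes to ``xtrct`` on the SH2.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed;               /* assumed 16.16 fixed-point format */
#define FIXED_ONE ((fixed)1 << 16)   /* 1.0 in 16.16 */

/* Transforms a 3D point by a 3x4 matrix stored row-major in m[12]:
   each row is three rotation/scale terms followed by a translation.
   The layout is an assumption for this sketch. */
void transform_point(const fixed m[12], const fixed in[3], fixed out[3])
{
    assert(m && in && out);

    for (int r = 0; r < 3; r++) {
        int64_t acc = 0;

        /* Three multiply-accumulates per row, kept at full width. */
        for (int c = 0; c < 3; c++)
            acc += (int64_t)m[r * 4 + c] * in[c];

        /* Drop the 16 extra fractional bits to get back to 16.16,
           then add the translation. The right shift of a negative
           value is assumed to be arithmetic, as on common compilers. */
        out[r] = (fixed)(acc >> 16) + m[r * 4 + 3];
    }
}
```

Nine multiply-accumulates and three adds per vertex is the whole job, which is why a unit that can issue a MAC every cycle is attractive for it, and also why the set-up cost per vertex matters so much.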
| Ponut | Oct 11, 2022 | |||
Should I edit that part? So you know, memory fuzzy and all. The move immediate instruction is signed. I did see in my code that I only need to shift it left once to get the address I want. Why didn't I just pass it through as if it were unsigned? I don't remember lol EDIT: I remember now. I will edit the original post with an explanation. Anyway, which processor does run at 25.4 MHz? I'm guessing none of them do | ||||
| fafling | Dec 9, 2022 | |||
There could be an explanation for the B-bus access issue of the DSP (and maybe for the related crash) on p. 48 of Sega Developers Conference Conference Proceedings ... : So the assembler would be to blame, and you'd have to patch its output to make it work. | ||||
| Ponut | Dec 9, 2022 | ||
| A+ level digging there, @fafling... ! That's sure to help some folks out. Editing a single instruction is not too difficult; the manual lays them out byte-wise. | |||
| Ponut | Sep 9, 2023 | ||
| I would like to come back to this thread and report a few things.

Firstly, I have written a new DSP program which achieves great results in improving the performance of a 3D game, on the order of ~4ms for the 441 vertices tested. The program uses a chirality (winding) check algorithm to see whether a vertex is in an on-screen space or not. It can then apply user-specified clip flags depending on whether the vertex is IN or OUT of the area, to achieve portal IN (window) or portal OUT (occlusion).

However, in writing and integrating this program, I have gone back and discovered that the SH2 code (particularly the Slave SH2 code) also had issues which were causing frames to miss the 29ms target (yes, 29ms, harsh). So it goes to show that you need to profile the code and grasp its performance issues before turning to the DSP to try and improve things.

Another issue was that while the DSP code would theoretically provide a 6-7ms performance boost, synchronization issues mean that significant time is lost compared to what might be theoretically possible; this will always occur when using the DSP to perform a task in time-step parallel with the SH2. To be fair, it exceeded my expectations after the synchronization code was entered.

In addition to that, I was able to get things done a lot faster thanks to "The Purist of Greed"'s / @buhman new Windows-compatible DSP Assembler, that you can find here: Very big thanks to that. | |||
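The DSP program itself is not shown in the post above. As a loose illustration of a winding check of this kind, here is a C sketch that tests a screen-space point against a convex portal using the sign of a 2D cross product per edge, then picks a clip flag by mode. Everything in it is an assumption for the example: the names (`vec2`, `edge_side`, `point_in_convex`, `portal_clip_flag`), the flag value `CLIP_FLAG`, the convex-polygon restriction, and the reading that portal IN clips what lies outside the area while portal OUT clips what lies inside it.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t x, y; } vec2;

/* Z component of (b - a) x (p - a). Its sign says on which side of
   the edge a->b the point p lies; 0 means p is on the edge's line. */
static int64_t edge_side(vec2 a, vec2 b, vec2 p)
{
    return ((int64_t)b.x - a.x) * ((int64_t)p.y - a.y)
         - ((int64_t)b.y - a.y) * ((int64_t)p.x - a.x);
}

/* Returns 1 if p is inside or on the border of a convex polygon.
   Works for either winding direction: p is inside when it never
   ends up on both sides of the edges. */
int point_in_convex(const vec2 *poly, int n, vec2 p)
{
    assert(poly && n >= 3);

    int pos = 0, neg = 0;
    for (int i = 0; i < n; i++) {
        int64_t s = edge_side(poly[i], poly[(i + 1) % n], p);
        if (s > 0) pos = 1;
        if (s < 0) neg = 1;
    }
    return !(pos && neg);
}

enum portal_mode { PORTAL_IN, PORTAL_OUT };  /* window / occlusion */
#define CLIP_FLAG 0x01u                      /* hypothetical flag value */

/* PORTAL_IN flags points outside the area, PORTAL_OUT flags points inside it. */
uint32_t portal_clip_flag(const vec2 *poly, int n, vec2 p, enum portal_mode mode)
{
    int inside = point_in_convex(poly, n, p);
    int clip   = (mode == PORTAL_IN) ? !inside : inside;
    return clip ? CLIP_FLAG : 0u;
}
```

Each edge test is two multiplies and a subtract, so a four-sided portal costs eight multiplies per vertex, the kind of multiply-heavy, branch-light work a MAC unit handles well.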
| srg320 | Sep 10, 2023 | |||
Perhaps interrupts have not been disabled for the Slave SH2. The Slave SH2 interrupts are described in "Sega Saturn technical bulletin #28". | ||||
| Dr.Wily | Jan 8, 2025 | |||
Bump this very interesting topic.
Are you sure of that? I mean, we never really knew which processor(s) had been added (or changed) following the announcement of the PSX's capabilities.

First, adding a second CPU at the last minute seems unlikely because of the side effects in terms of electronics and development, specifically for devkits, which are designed before the console, at least for the Saturn. Second, why does the 32X also have two SH2s? In my opinion, both SH2s were already planned as the replacement for the V60 that was originally planned.

Two assumptions:

- First, the "chip" added at the last minute corresponds to the "second" SH2, which was planned as a dual processor anyway. By this I mean that these two SH2s should be seen as a single chip, and that an error of interpretation or translation has created the myth. In reality, the V60, which was planned as a single chip, was replaced by two other chips, the two SH2s. The public saw this second SH2 as an additional chip added at the last minute, whereas the replacement for the V60 was already planned with two SH2s and not just one.
- Second, the SCU is the chip added after the announcement of the PSX's specifications. Why? The SCU is the only chip that isn't needed for the Saturn to work. A game can work perfectly well without the help of the SCU. The SCU reminds me of the chips added to cartridges to help with certain processing (like the DSP-1/2/3 in Super Nintendo games). Even the SCU's implementation in the Saturn doesn't seem to fit in with the rest of the machine's architecture. What's more, it's the only chip specialising in matrix calculation, whereas the VDP1 is still a chip specialising in sprites (and their affine transformation), not really a chip designed for polygonal 3D like the PSX's GTE. The SCU is the only chip that comes close to being a 'dedicated' 3D processor, added at the last minute to compete with the PSX.

These are just assumptions, but what I'm sure of is that it's impossible for a second SH2 to have been added at the last minute. | ||||
| TrekkiesUnite118 | Jan 8, 2025 | ||
| We have interviews from Hitachi engineers, Hideki Sato, and Shoichiro Irimajiri that all point to the 2nd SH2 being the last minute change in response to the PS1. From those interviews we can piece together this timeline:

- September 1992 - Sega chooses to go with the SH2.
- Late summer of 1993 - Sega learns of Sony's specs and realizes Saturn needs to be beefed up. They investigate either beefing up the CPU or improving VDP1. They choose to beef up the CPU.
- Fall of 1993 - Sega asks Hitachi to increase the clock speed of the SH2. Hitachi isn't able to do this on short notice but suggests they add a 2nd SH2 to use the Master Slave configuration mode they had worked into the design. So Sega adds the 2nd SH2 to the Saturn.
- January 1994 - In designing the 32X, Sega of America looks at the near final Saturn design and lifts the dual SH2 design from it to use in 32X.

So we know by this point Saturn's design was pretty much near final. From the dev manuals not much changes after this point. | |||
| antime | Jan 8, 2025 | ||||||
On the contrary, without the SCU the Saturn would not work at all. It is the glue that connects the various parts. Without it, the CPUs would only be able to communicate with the SMPC, the only other device on the C-bus.
Well, we have the "Introduction to Saturn Game Development" document, and the developer presentation slideshow revisions from April and May 1994, which point to at least the memory being changed from 1.5MB SDRAM to 1MB SDRAM + 1MB DRAM. There's also the different SMPC command set in some old SDK headers, though I'm not sure when it was finalized. Sega would have been able to keep iterating on the SCU fairly late, since it's a gate array and not a fully custom chip, and the number of bugs in it certainly suggests that they did. | |||||||
| TrekkiesUnite118 | Jan 8, 2025 | |||
Right, when I said near final I simply meant the main chips (2x SH-2s, SCU + DSP, VDP1, VDP2, M68k, SCSP, SH-1, etc.) and their arrangement was probably decided by then. The inner workings of some of those chips like the SMPC, VDP1, etc. still seemed to get updates after then. I think VDP1 got HSS sometime in early 1994? | ||||
| Dr.Wily | Jan 9, 2025 | ||||||
I've already read these interviews and none of them clearly names the added chip. Especially this one... from Hitachi, which says on the one hand that it was because of the Nintendo 64, not the PSX, that the SH2's performance would be inadequate, and on the other hand that the choice of a second SH-2 was not made at the last minute but in the midst of the Saturn's design. Moreover, the Sato interview did not mention the name of the chip added at the last minute. He's simply talking about a chip added in response to the release of the PSX's capabilities. But he doesn't say specifically which chip.
I'm talking only about the DSP part of the SCU, not the glue part. The DSP part may well have been designed at the last minute. Especially if, as you say, it is a gate array that could very well be reconfigured according to needs and the evolution of the market. Facts and testimonies don't agree enough to be sure that an SH2 was added at the last minute. | |||||||
| antime | Jan 9, 2025 | ||
| Did Hitachi have any SMP-capable SH-1 CPUs? It would have seemed a bit out of their wheelhouse, since they were mostly making small embedded controllers with an emphasis on motor control. If the master/slave operating modes were created specifically for the SH7095, it's clear it's something that had been planned for a long time. | |||
| fafling | Jan 9, 2025 | ||
| The master/slave multiprocessor capability was added to SH-2 in response to a request by a side project at Hitachi, and it doesn't appear to have been planned in advance. This is told in paragraph "A Request from the Research Lab" in this story of the SH-2 development.... That article implies that this capability was added not long before Sega requested more CPU performance for the Saturn, a sort of nice coincidence which led to the addition of the 2nd SH-2 in the console. | |||
| Dr.Wily | Jan 10, 2025 | |||
Yes, but that doesn't mean that Sega chose a single processor at first, and then after a few months of testing decided to add a second one, instead of looking for something more powerful and less difficult to integrate. Integrating two microprocessors in a master/slave configuration was unusual in the video game world at the time, especially if you consider, as this myth tells us, that only one SH2 was originally planned. It's just not credible.

In my opinion, both SH2s were already included in the console's architecture. The dual-processor function was already known and in place, since the request came from the System Development Lab. I doubt the engineers thought "this microprocessor isn't powerful enough, so we'll add a second one" rather than looking for something equivalent in terms of MIPS. Electronics isn't that simple; you can't just double a microprocessor to get more power, especially when both share the same bus. Especially as the 32X didn't need two SH2s; a V810 would have done the trick.

In my opinion, this double SH2 story has nothing to do with competition, but is more a matter of negotiation and, above all, the price of microprocessors at the time. The SH2 had to be the most competitive, and Hitachi needed high-volume production to impose its new chip. Sega was the driving force behind the SH2, and in return, Hitachi offered them a very low price per unit. I don't think Nintendo or Sony had anything to do with Sega's choice of dual SH2. | ||||
| TrekkiesUnite118 | Jan 10, 2025 | |||
This Sato interview flat out states the Saturn just had one SH2 early on and that the 2nd SH2 was added at the last minute in response to the PS1: This one from Irimajiri gives us the date around when this happened, as he mentions they were debating between adding another CPU or adding a new GPU:

So it's pretty obvious that the last minute decision was the 2nd SH2. And it makes a lot of sense when you see the Saturn's design. The SCU DSP's presence suddenly makes more sense if you only had one SH2.

As for the 32X, we know exactly why it has 2 SH2s. Sega of America lifted that from the Saturn when designing the 32X in 1994. You need to remember that "last minute change" here refers to fall of 1993. For a system that's to be on store shelves in a year, that is a dramatic last minute change to the hardware.

There are plenty of quotes and data to back up that the SH2 was the last minute change in response to Sony. So if you think it's a myth, you need to provide some solid evidence of this, not speculation and conjecture. | ||||
| Tiviat | Jan 10, 2025 | |||
....or could the Saturn JUST be simply a 32x CD, like they did with the MegaDrive->MegaDrive CD ? I'm guessing either way we ended up with the Saturn | ||||
| TrekkiesUnite118 | Jan 10, 2025 | |||
Saturn started development long before the 32X did. | ||||