Saturn's 3D capabilities |
VladR - Sep 28, 2018 |
VladR | Sep 28, 2018 | |||
I came across this hardware comparison between Saturn, PS1 and N64, but I'm not sure I can fully trust it: https://segaretro.org/Sega_Saturn/Hardware_compari... I don't really care about the comparison itself (e.g. against PS1), but I like that all the benchmark numbers are together on one page, which gives a really nice summary of Saturn's bandwidth and hardwired shading/texturing capabilities. Is that comparison from a bogus fanboy site, or is it legit?
One area doesn't appear to make much sense, however. Flat shading (both 8-bit and 15-bit) shows an identical value: 28 MPixels/s via VDP1. I don't know how it's implemented internally in silicon, but half as much data is written for 8-bit color, so it should be roughly double the 16-bit value (minus the scanline traversal overhead, of course, which is identical in both cases), given that all the other benchmarks show differences at different color depths.
Another thing: Gouraud shading is listed at 16 MPixels/s (10x10). According to note No. 45, Saturn can shade 164,576 polygons/s (10x10 pixels). That would be roughly 2,743 quads per frame at 60 fps. Given that these are theoretical numbers where all performance is spent on that single feature alone, it sounds kind of low, no?
Those numbers are obviously still an awesome boost over Jaguar, let alone the fact that on the Jag you have to do scanline traversal and compute endpoint data for the Blitter for every scanline (basically killing 90% of the RISC's performance on just that). Here, it appears you just give the Saturn the coordinates and colors, and it shades and interpolates the whole polygon for you, automagically. Which is pretty awesome, as you suddenly also gain 90% of the GPU's performance for other stuff and effects, since you don't have to handhold the Blitter for each scanline of the polygon...
What's the most detailed Gouraud shaded game on Saturn? |
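For anyone who wants to redo the per-frame arithmetic from the post above, here is a minimal sketch. It only uses the two figures quoted from the comparison page (164,576 polygons/s and 16 MPixels/s at 10x10 pixels); the function and variable names are made up for illustration, and none of this says anything about real in-game throughput.

```python
# Figures quoted in the post above (taken from the linked comparison page).
POLYS_PER_SECOND = 164_576               # Gouraud-shaded 10x10 polygons per second
FILL_RATE_PIXELS_PER_SECOND = 16_000_000 # quoted Gouraud fill rate
PIXELS_PER_POLY = 10 * 10                # each benchmark polygon covers 100 pixels


def polys_per_frame(polys_per_second, fps):
    """Theoretical polygons available per frame at a given frame rate."""
    return polys_per_second / fps


# Per-frame budgets at two common frame rates.
quads_at_60 = polys_per_frame(POLYS_PER_SECOND, 60)
quads_at_30 = polys_per_frame(POLYS_PER_SECOND, 30)

# Cross-check: polygons per second implied by the fill rate alone.
fill_limited_polys = FILL_RATE_PIXELS_PER_SECOND / PIXELS_PER_POLY

print(round(quads_at_60), round(quads_at_30), round(fill_limited_polys))
```

The fill-rate cross-check lands close to the quoted polygon figure, so the two numbers on that page are at least consistent with each other.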
Mr^Burns | Sep 29, 2018 | |||
Quake, maybe http://www.shinforce.com/saturn/information/3D-Cap... |
VladR | Sep 29, 2018 | |||||||||
That's fine, I am taking those numbers only in the context of a theoretical benchmark, outside of any real-world gameplay scenario.
Wow! Full. Stop. Does it mean that for direct CPU access to the framebuffer, 320x224 at 4-bit color actually stores 16 bits for every pixel? So if I was flat shading at 4-bit, one 16-bit write wouldn't fill 4 pixels, but just one? OMG... So all the sub-16-bit coloring on Saturn is merely indirect, internally? I am asking because on Jaguar a 320x200 framebuffer at 4-bit color takes 32 KB, but at 16-bit color it takes 128 KB. That's a huge difference in terms of bandwidth and how much performance is left for the system. There are many 16-color scenarios where the much higher performance would be quite useful, but if 16-bit is forced upon the coder, that really sucks... |
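The framebuffer sizes mentioned above are easy to check. A small sketch, assuming tightly packed pixels with no padding or alignment (the helper name is hypothetical, and KB here means 1000 bytes, which is what makes the 32 KB and 128 KB figures come out exactly):

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of a tightly packed framebuffer (no padding assumed)."""
    return width * height * bits_per_pixel // 8


# Jaguar figures from the post: 320x200 at 4 bpp and at 16 bpp.
jag_4bpp = framebuffer_bytes(320, 200, 4)
jag_16bpp = framebuffer_bytes(320, 200, 16)

# A 320x224 framebuffer that always stores 16 bits per pixel.
saturn_16bpp = framebuffer_bytes(320, 224, 16)

print(jag_4bpp, jag_16bpp, saturn_16bpp)
```

The 4x ratio between the two Jaguar sizes is the bandwidth difference the post is worried about.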
VladR | Sep 29, 2018 | |||||||||||||||||
Yeah, but your engine is BSP/portal, right? That's a lot of CPU overhead every single frame. Still, a very sexy number. How much use do you make of the slave SH2?
Wait, does it mean that internally the VDP still processes each pixel of the bounding box of such a quad? Because in the worst case (when a rectangle is rotated by 45°), about half of them are transparent, as they don't belong to the quad in the first place (they're outside of the edges). I was hoping the VDP would internally do scanline-based rasterization via edge tracing...
Only 2000, eh ? OK, nice to know, but since it's likely textured, it's not a bad number...
Yeah, BSP and portals are a CPU killer unfortunately, which is why I stay away from those types of engines personally... |
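The "about half" worst case for a bounding-box fill can be verified with a bit of geometry. This is only a sketch of that hypothetical bounding-box approach (the function name is made up) and makes no claim about how VDP1 actually rasterizes:

```python
import math


def bbox_coverage(width, height, angle_deg):
    """Fraction of the axis-aligned bounding box covered by a rotated rectangle."""
    angle = math.radians(angle_deg)
    c, s = abs(math.cos(angle)), abs(math.sin(angle))
    # Extents of the rotated rectangle along the screen axes.
    bbox_w = width * c + height * s
    bbox_h = width * s + height * c
    return (width * height) / (bbox_w * bbox_h)


print(bbox_coverage(10, 10, 0))   # axis-aligned square fills its bounding box
print(bbox_coverage(10, 10, 45))  # rotated square covers about half of it
```

For a square rotated by 45 degrees the bounding box has twice the area of the square, so roughly half of the visited pixels would be wasted.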
Ponut | Sep 29, 2018 | |||||||||||||
don't worry about me, just idle conjecture of someone with far less programming experience than anyone else here...
That can actually be how it ends up working if you don't organize your commands right. I obviously don't, because I have no idea what I'm doing.
IIRC it's not a hardware limit, rather it is an SGL limit. If there are more polygons than the limit, they simply do not display with SGL default behavior. Other folks should know how to get past this limit, where the limit is, or if I am just seeing some other glitch. Other folks say there are games that exceed that and I certainly believe it.
2. Not really a game yet
1. No, it's not transform-bound. It's bound based on B-bus saturation (which is on the same communication bus as the SCSP [sound], VDP1, and VDP2). I think if you used MIDIs you could get good improvements.
I'm the least experienced and least accomplished here, so please, defer to those with more experience. Thanks for reading. (I'm aware, based on others' more detailed answers, that I should not have even posted.) |
antime | Sep 29, 2018 | |||||
No such thing. Each framebuffer pixel consists of 1 bit to indicate the format (RGB or palette data), and then either an RGB555 colour, or a format-dependent mix of palette address and palette index. |
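To make the pixel layout described above concrete, here is a rough decoding sketch. Two things in it are assumptions that the post does not state: that the format flag is the top bit of the 16-bit word, and that the RGB555 channels are ordered with red in the lowest 5 bits. The palette case is returned as a raw 15-bit value because its layout is format-dependent. `decode_pixel` is a made-up name.

```python
RGB_FLAG = 0x8000  # ASSUMPTION: the top bit marks an RGB pixel


def decode_pixel(word):
    """Split a 16-bit framebuffer word into (kind, payload)."""
    if word & RGB_FLAG:
        # ASSUMPTION: red in bits 0-4, green in bits 5-9, blue in bits 10-14.
        red = word & 0x1F
        green = (word >> 5) & 0x1F
        blue = (word >> 10) & 0x1F
        return ("rgb", (red, green, blue))
    # Palette data: the split into address and index depends on the format,
    # so this sketch just hands back the raw 15 bits.
    return ("palette", word & 0x7FFF)


print(decode_pixel(0x801F))
print(decode_pixel(0x0123))
```

Either way, every pixel occupies a full 16-bit word, which is the point being made in the post.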
VladR | Sep 29, 2018 | |||||
Ouch, what a faceplant. And here I was, thinking about what kind of super high-poly scenes I could do at 25% bandwidth, like on the Jag. That would explain why that comparison site shows identical numbers for 8-bit and 16-bit. What a waste of performance. Saturn is fast, but not 4x as fast to compensate for 4x more data... So basically one is indeed forced to 16-bit, unless in hi-res? So Jaguar is the last 32-bit machine to offer fast flat shading at 4-bit color depth. It's a shame, as flat shading looks awesomely clean and sharp at higher resolution. |
XL2 | Sep 29, 2018 | |||||
I didn't read all the posts after mine, but the framebuffer is 16 bpp; you just use color lookup tables and palettes to index actual 16-bit colors (or 8-bit in high res). The only speed gain from using palettes is that VDP1 doesn't need to do the lookup, only VDP2, which is very fast (a really small gain, if any).
About my engine, it's currently just an octree with LOD and mipmaps, so it's not that hard on the CPU. I'm now working on a BSP compiler with PVS, but no portals in-game (I accept some overdraw to reduce CPU usage, since the mipmapping and LOD help a lot). I'm not done writing it, so I can't compare the performance yet.
About why distorted sprites have a huge overdraw issue: it's because of how the hardware does some kind of antialiasing while writing pixels, to prevent "holes" in the texture. But it means that in some extreme situations you draw all the pixels twice per sprite. It also means that you can forget transparency on distorted quads.
Also, the Saturn, like the 3DO, has no notion of UV coordinates, so all the quads have 4 vertices with implicit texture coordinates: 0,0 1,0 1,1 0,1. Of course, if you use a SW renderer you can forget all that.
For the slave usage, SGL makes the slave process the transformations and the drawing routine (on the CPU side at least), so while it's not optimal it's quite good. But I suggest you use the hardware; you could pull way more this way. Just by playing a bit with SGL you will see how quickly you can get good results. But you could make good use of a small software renderer for effects like transparency, just writing to an NBG0 or NBG1 bitmap layer and letting VDP2 pull off the transparency. |
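The implicit texture coordinates mentioned above (0,0 1,0 1,1 0,1 at the four vertices) can be illustrated with a plain bilinear mapping from texture space to the quad. This is only a sketch of the idea that a whole texture is stretched across four vertices; it is not a description of VDP1's actual drawing algorithm, and `quad_point` is a made-up helper.

```python
# Implicit UVs for the four vertices, in the order given in the post.
IMPLICIT_UVS = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def quad_point(vertices, u, v):
    """Screen position of texture coordinate (u, v) on a 4-vertex quad."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = vertices
    # Interpolate along the top edge (vertex 0 to 1) and bottom edge (3 to 2).
    top_x, top_y = x0 + (x1 - x0) * u, y0 + (y1 - y0) * u
    bot_x, bot_y = x3 + (x2 - x3) * u, y3 + (y2 - y3) * u
    # Then interpolate between the two edge points.
    return (top_x + (bot_x - top_x) * v, top_y + (bot_y - top_y) * v)


quad = [(10, 10), (50, 20), (60, 70), (0, 40)]
for (u, v), vertex in zip(IMPLICIT_UVS, quad):
    print((u, v), "->", quad_point(quad, u, v), "vertex:", vertex)
```

Each implicit UV corner lands exactly on its vertex, which is why no per-vertex UVs need to be supplied: the texture corners are tied to the vertex order.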
VladR | Sep 30, 2018 | |||||||||||||||||||||
Well, I guess Jaguar had a pretty smart design in that particular regard, as it was doing the translation to 16-bit color (from any bit depth: 1, 2, 4 or 8 bit) at runtime, while drawing each picture line. It was a separate chip on the Object Processor. Especially for 4-bit, since it was natively reading 64 bits per read (per cycle, really), it meant reading 16 pixels per cycle, which is phenomenal throughput. And for flat shading, 16 colors can give you some nice base colors, so it is actually quite usable, and the speed is just phenomenal. But I'll shut up about 4-bit now...
Yeah, I did the same kind of thing on PC around 2002, but that wasn't a ~30 MHz CPU (more like a 600 MHz Athlon at the time), so I'd still consider an octree a pretty hard load on a sub-30 MHz CPU.
Yeah, my idea was to combine the HW and SW rasterizers, given that I already spent a great deal of effort on a RISC-based rasterizer on Jaguar, so the code should be totally transferable. For generic texturing, I'd obviously leave it to the VDP, but I also did some perspective-texturing stuff for axis-aligned quads (walls of buildings and floor/ceiling) that runs completely in SW, rasterizing the picture line by line, preparing the current scanline within the 4 KB cache, and in parallel drawing the previous scanline via the Blitter to the framebuffer. Given that it's not real texturing on Saturn, let alone perspective-correct, I suppose I would totally reuse that texturing code too. Still, for arbitrary textured polygons, I'd defer to the VDP.
Wait, so the slave SH2 is not fully available ? I just assumed its load was zero. So, Sega actually had some baseline multithreaded codebase for developers ? WOW, that's quite a pipe dream in jaguar land...
Oh, yeah. The power of a SW rasterizer, where you can do anything you imagine; you just must be willing to pay the development cost. Right this moment, I'm working on 16-bit texturing that automatically applies antialiasing along the edges. As the code has to draw scanline by scanline, which is what takes the majority of the performance, you might as well do a few additional reads and apply antialiasing at minimal cost. It also made me realize (as until now I was only working in 4-bit and 8-bit color space) that I can quickly adjust my line drawing routine to apply antialiasing there too. No such freedom with HW-based functionality, but then again, submitting an array of polys totally beats rolling your own 4 KB rasterizer in RISC |
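The "16 pixels per 64-bit read" point from earlier in this post can be sketched as unpacking sixteen 4-bit palette indices from one 64-bit word and expanding them through a 16-entry color table. The nibble order (leftmost pixel in the most significant nibble) and the table contents are assumptions for illustration only, and the function names are made up.

```python
def unpack_4bpp(word64):
    """Return the sixteen 4-bit palette indices packed in one 64-bit word."""
    # ASSUMPTION: the leftmost pixel sits in the most significant nibble.
    return [(word64 >> shift) & 0xF for shift in range(60, -1, -4)]


def expand_to_16bit(word64, clut):
    """Translate one packed 64-bit word into sixteen 16-bit colors."""
    return [clut[index] for index in unpack_4bpp(word64)]


# Hypothetical 16-entry lookup table with placeholder 16-bit values.
clut = [0x1000 + i for i in range(16)]

indices = unpack_4bpp(0x0123456789ABCDEF)
colors = expand_to_16bit(0x0123456789ABCDEF, clut)
print(indices)
print([hex(c) for c in colors])
```

One 64-bit read yields sixteen pixels at 4 bpp, versus four pixels at 16 bpp, which is where the 4x bandwidth advantage comes from.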
XL2 | Sep 30, 2018 | |||||
For software rendering, just use an NBG0 or NBG1 bitmap layer at 4 bpp; you can do the same thing you did on the Jaguar. The VDP2 will take care of filling the framebuffer. No need to write directly to the framebuffer. |
David Gámiz Jiménez | Oct 5, 2018 | |||||
I absolutely agree with this data. And I'd add something important: the numbers on the PSX side are about the same. Really, the SS and PSX were very similar machines at the time. The data on SegaRetro is generally amazing work. In some cases it is not very accurate, or is slanted to make one side win. But I feel very thankful for the amount of data in this wiki. Finally, all this "war of numbers" is very boring. The 3DO, Jaguar, SS and PSX are all great pieces of electronic technology for their time, and we cannot quantify their value from today's view of 3D hardware. The texel rate is "almost" impossible to calculate for these graphics chips/systems, and the same goes for the effective pixel rate. The real polycounts are there, in the games that were made. Is it possible to get somewhat higher numbers? Yes, absolutely, but not double, and not the millions of polygons that were claimed in the past, because those numbers were a big lie. To follow the upcoming updates: http://forum.jo-engine.org/index.php?topic=854.0... 258 games analyzed right now.... |