My new project is up!
mrkotfw - Oct 9, 2006
Chilly Willy | Mar 9, 2012
The crt0.s and linker script work together to initialize the data and bss sections (if needed for the platform), and they are particularly important for C++ startup, where they handle the execution of constructors (which also requires another file I include with the C++ source, called crtstuff.c). In general, if a platform I'm looking at already has a crt0.s and linker script, I'll work from those as needed. I find that the crt0.s file is usually pretty good already (other than not having .init/.fini support), but the linker scripts are woefully out of date. So I may or may not use the crt0.s and linker script you have... I figure they will probably need at least a few minor changes. Maybe they won't need any changes at all - I won't know until I check.
What I have planned for the Saturn... well, a few of the "standard" ports I always do for a platform, like Wolf3D and Doom. While the Saturn version of Doom is pretty good, it's still based on the PSX version, which means it's not quite the same as the PC version everyone loves so. So I'll make a nice PC port. It will require the 4M RAM cart, but given how damn cheap the ARP carts are, that shouldn't be a problem.
I also plan to get my port of Tremor working. I figure it will be more useful on the Saturn than on the 32X. I was also looking at a port of OpenJazz (Jazz Jackrabbit 1/2). There are a number of things that are possible on the Saturn, thanks to the extra RAM and the CD, that aren't possible on the 32X due to its lack of RAM.
mrkotfw | Mar 10, 2012
Hey, awesome idea! Add in C++ support to my linker script! That's what's severely lacking. My crt0.S clears the BSS/SBSS sections suboptimally. At this point, the biggest limiting factor is the ease of testing. I have a USB Data Link, but testing (viewing) is what is stopping me. Any (cheap/good) capture cards or (extremely cheap) LCDs that I can use?
Chilly Willy | Mar 10, 2012
Well, the way I do it is to have .init and .fini sections in the linker script along with CTOR/DTOR lists. Then in crtstuff.c I put a function into .init and .fini that calls a function to go through the CTOR/DTOR lists. Then crt0.s needs to call __INIT__ and __FINI__. crtstuff.c
Code:
then in sh2_crt0.s, after clearing the bss for the main SH2 startup
Code:
and finally, the ldscript looks like this
Code:
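As a rough, host-testable sketch of the list walk that crtstuff.c performs in the scheme above (this is not the poster's file): on the target, the start and end pointers would come from symbols the linker script places around the CTOR/DTOR lists, and their names depend on that script. `run_ctor_list`, `run_dtor_list`, and the `demo_*` names are hypothetical. Running constructors back to front mirrors what traditional GCC startup code does, which is an assumption about what is wanted here.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Call every entry in [start, end), last entry first (constructor order
 * assumed here; see the note above). NULL entries are skipped. */
void run_ctor_list(const init_fn *start, const init_fn *end)
{
    assert(start != NULL && start <= end);
    while (end > start) {
        --end;
        if (*end != NULL)
            (*end)();
    }
}

/* Call every entry in [start, end), first entry first (destructor order). */
void run_dtor_list(const init_fn *start, const init_fn *end)
{
    assert(start != NULL && start <= end);
    for (; start < end; ++start) {
        if (*start != NULL)
            (*start)();
    }
}

/* Hypothetical stand-ins so the walk can be tried on a host compiler.
 * On the console, the list would be emitted by the compiler and bracketed
 * by linker-script symbols instead of being a plain array. */
int call_trace[8];
int call_count;

void demo_first(void)  { call_trace[call_count++] = 1; }
void demo_second(void) { call_trace[call_count++] = 2; }

const init_fn demo_list[] = { demo_first, demo_second };
```

In this arrangement the function placed in .init would call `run_ctor_list` and the one placed in .fini would call `run_dtor_list`, each with the bounds of its own list.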
Hmm - I'd recommend hitting a few yard sales and looking for old, used TVs. A 20" color TV would work nicely with the Saturn, and I'd bet you could get one at a yard sale for $20 or less. LCDs with RF, composite, or S-Video inputs cost more than plain RGB LCD panels. I've seen plain RGB LCD panels on sale at NewEgg for $50, but then you'd need some way to convert the Saturn video to VGA for the panel.
mrkotfw | Mar 10, 2012
Great. I'll add in support sometime later today.
Chilly Willy | Mar 10, 2012
On the linker script, this would clearly all just load into high ram. My 32X linker script (you may have noticed) put .text in rom and .data/.bss in sdram. For the Saturn, it would be more like
Code:
mrkotfw | Mar 10, 2012
Thanks! I had made those exact changes before your new post. Any particular copyright (your real name, e-mail?) you'd like? I'm also unsure of your licensing. I just found a small (fast) library for malloc()/free() so I'm working on that as well. I'll have to add newlib hooks sometime. I also have some ideas as to how to handle all VDP1 command tables (polygons, etc) using a simple mark/sweep garbage collector (for textures too). If you have any suggestions/ideas, I'd like to hear them.
Chilly Willy | Mar 10, 2012
Anything I do is MIT unless otherwise stated, or based off an existing base with its own license (obviously, I can't change GPL code to something else). I've mentioned my licensing in a few places before, but I really should get around to sticking these things in the files. MIT or the new BSD license are fine with me - I want anything I do to be as useful to as many people as possible. Joe Fenton
Which one? I mostly use the standard allocator in libc, or msys - Simple Malloc & Free Functions - by malkia@digitald.com.... I made two versions of msys that are identical, but meant to be used by the separate SH2s - msys for the Master SH2, and ssys for the Slave SH2. That avoids any cache coherency and locking needed to try to share allocators between the two processors.
Make a third allocator from msys called vsys for the VDP1, then just alloc blocks as needed, and reinit the zone to clear everything at once. Maybe FOUR allocators from msys: one for the Master SH2, one for the Slave SH2, one for VDP1, and one for VDP2. I've thought of altering msys to take a zone for an input argument. Then you could have any number of zones. For VDP1, allocate a block from the main vram zone, then create a new zone using that block for tables and whatnot. Then you can clear everything associated with one zone without affecting the others. Zones inside zones...
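The "zones inside zones" idea can be sketched with a plain bump allocator. This is deliberately much simpler than MSYS (no per-block free, only a whole-zone reset), and every name here (`zone_t`, `zone_init`, `zone_alloc`, `zone_reset`, `zone_init_child`, `demo_vram`) is hypothetical, not part of MSYS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One zone: a contiguous block handed out front to back. */
typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
} zone_t;

void zone_init(zone_t *z, void *mem, size_t size)
{
    assert(z != NULL && mem != NULL);
    z->base = (uint8_t *)mem;
    z->size = size;
    z->used = 0;
}

/* Hand out the next chunk, rounded up so offsets stay multiples of 8.
 * Returns NULL when the zone cannot fit the request. */
void *zone_alloc(zone_t *z, size_t bytes)
{
    size_t rounded = (bytes + 7u) & ~(size_t)7u;
    if (rounded < bytes || rounded > z->size - z->used)
        return NULL;
    void *p = z->base + z->used;
    z->used += rounded;
    return p;
}

/* Forget every allocation made from this zone in one step. */
void zone_reset(zone_t *z)
{
    z->used = 0;
}

/* Carve a child zone out of a parent zone. Returns 1 on success. */
int zone_init_child(zone_t *child, zone_t *parent, size_t size)
{
    void *mem = zone_alloc(parent, size);
    if (mem == NULL)
        return 0;
    zone_init(child, mem, size);
    return 1;
}

/* Host-side stand-in for a region of video RAM. */
uint8_t demo_vram[256];
```

Resetting the child zone clears the tables allocated from it while the parent's bookkeeping, and any sibling zones, stay untouched.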
Chilly Willy | Mar 10, 2012
Here's MSYS in case you haven't seen it before. EDIT: This version has a new function I added - MSYS_Set() that allows you to set the zone. MSYS_Init() was altered to take a pointer to a zone struct to return the values of the initialized zone. So this MSYS allows you to make zones inside of zones using those features. EDIT 2: I also added MSYS_Realloc() - it takes care of both special cases that can occur with realloc(), but is otherwise pretty dumb, allocating a new block, copying the data, then freeing the old block. I also renamed MSYS_Alloc to MSYS_Malloc and added MSYS_Calloc. msys.c
Code:
msys.h
Code:
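For illustration only, the "dumb" realloc strategy described in this post (handle the two special cases, otherwise allocate, copy, free) looks roughly like the following when written against the standard C allocator. This is not the MSYS_Realloc() source. `naive_realloc` is a hypothetical name, and the explicit `old_size` parameter is an assumption made so the sketch needs no knowledge of allocator internals; a real allocator would read the old size from its own block header.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate-copy-free realloc, with the two special cases handled first. */
void *naive_realloc(void *old, size_t old_size, size_t new_size)
{
    assert(old != NULL || old_size == 0);

    /* Special case 1: no old block, so this is just an allocation. */
    if (old == NULL)
        return malloc(new_size);

    /* Special case 2: new size of zero, so this is just a free. */
    if (new_size == 0) {
        free(old);
        return NULL;
    }

    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return NULL;            /* the old block is left intact */

    /* Copy whichever is smaller: the old contents or the new capacity. */
    memcpy(fresh, old, old_size < new_size ? old_size : new_size);
    free(old);
    return fresh;
}
```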
mrkotfw | Mar 12, 2012
Done. I will commit as soon as I actually test it out with some STL examples.
I'm using the one from: http://tlsf.baisoku.org/... I haven't tested it, but I'm going to make the necessary changes. I would say that using locks would be best (that'll add in the framework for threads). Aside from that, why not make the code thread-safe by avoiding the use of global/static variables?
Yeah, I was thinking something similar, except in a tree structure. You can have subtrees and such of command tables. The tree itself is kept in WORKRAM-H, as are all the command tables (there should be an upper bound on the number of command tables in memory). Priority and order are based on how the tree is to be traversed. Before the command tables are updated in VDP1 VRAM (that is, only the ones that have changed in WORKRAM-H; essentially this tree is a cache of VDP1 VRAM), they're sorted properly by the LUT of transfers passed to any of the three SCU DMA levels. Or they're sorted by using the linked list which tells VDP1 what command table to access next. Chances are, it's going to be a mixture of both.
Example: consider adding X command tables in WORKRAM-H at address W. Starting at offset Y in VDP1 VRAM, where the first command table is stored, I could traverse the tree and create a LUT of transfers for SCU-DMA (indirect mode):
src: W[X - 1], dst: Y[0], size: 32B
src: W[X - 2], dst: Y[1], size: 32B
src: W[0], dst: Y[2], size: 32B
src: W[1], dst: Y[3], size: 32B
src: W[4], dst: Y[4], size: 32B
src: W[5], dst: Y[5], size: 32B
src: W[6], dst: Y[6], size: 32B
And so on.
Now what if I want to update command table W[5] and delete W[6]? Update them in WORKRAM-H: for W[6], write a bit that tells VDP1 to skip the command table; as for W[5], update whatever. Then do another SCU-DMA of only two transfers:
src: W[5], dst: Y[5], size: 32B
src: W[6], dst: Y[6], size: 32B
I'm going to have to keep track of which ones have changed.
As for allocating memory, yeah, that should be done by the standard malloc/free. Textures, on the other hand, could be done through garbage collection. For example, if I delete W[6] and it used a texture, then decrement the ref. counter. Then put it back on the free/used list. I could use that standard allocator just for this purpose!
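One way the "keep track of which have changed" step could look, as a sketch under assumptions: slot i in work RAM mirrors slot i in VDP1 VRAM (as in the W[5]/W[6] update case), a byte-per-slot dirty flag array is kept alongside the cache, and `xfer_t` is a generic src/dst/size record, not the actual SCU indirect-mode table layout, which should be taken from the SCU manual. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMDT_SIZE 32u   /* one command table is 32 bytes, as in the post */

/* Generic transfer record; NOT the hardware's indirect table format. */
typedef struct {
    uint32_t src;
    uint32_t dst;
    uint32_t size;
} xfer_t;

/* Emit one transfer per dirty slot and clear that slot's flag.
 * Stops early if 'out' fills up; remaining flags stay set so the
 * leftover slots are picked up by the next call.
 * Returns the number of transfers written. */
size_t build_dirty_transfers(uint8_t *dirty, size_t slots,
                             uint32_t work_base, uint32_t vram_base,
                             xfer_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(dirty != NULL && out != NULL);
    for (size_t i = 0; i < slots && n < out_cap; ++i) {
        if (!dirty[i])
            continue;
        out[n].src  = work_base + (uint32_t)i * CMDT_SIZE;
        out[n].dst  = vram_base + (uint32_t)i * CMDT_SIZE;
        out[n].size = CMDT_SIZE;
        dirty[i] = 0;
        ++n;
    }
    return n;
}
```

Any code that edits a cached command table would set its dirty flag; the list built here would then be converted into whatever format the DMA hardware expects before the transfer is started.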
Speed it up since the smallest we'll go is for an 8x1 4-bit texture (padded to be an 8x2 4-bit texture). And this code is in the public domain or MIT/BSD licensed? What's difficult about texture allocation is the fact that we can allow texture sizes in the Y direction that are not powers of 2. So we're going to waste some VDP1 VRAM by padding everything. What do you think? Do you think this is viable for 3D?
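To put numbers on the padding cost, here is a small sketch of a size calculation. It assumes an 8-byte allocation granule, which is what the 8x1 to 8x2 padding mentioned above implies for 4-bit textures, and it assumes widths are rounded up to a multiple of 8 pixels. `texture_padded_bytes` and `TEX_GRANULE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TEX_GRANULE 8u   /* assumed allocation unit in bytes */

/* Bytes a width x height texture occupies once padded.
 * bpp is bits per pixel: 4, 8, or 16. */
uint32_t texture_padded_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    assert(width > 0 && height > 0);
    assert(bpp == 4 || bpp == 8 || bpp == 16);

    /* Round the width up to a multiple of 8 pixels (assumption). */
    uint32_t padded_w = (width + 7u) & ~7u;
    uint32_t raw = padded_w * height * bpp / 8u;

    /* Round the byte count up to the allocation granule. */
    return (raw + TEX_GRANULE - 1u) & ~(TEX_GRANULE - 1u);
}
```

Under these assumptions an 8x1 and an 8x2 4-bit texture both cost 8 bytes, and an 8x3 one costs 16, so the waste from odd heights is at most one granule per texture.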
mrkotfw | Mar 12, 2012
Looks like your standard K&R used/free list allocator. If things go downhill with TLSF, I'll go with this much simpler implementation. By the way, if there is any code you want to add, just clone and I'll be sure to merge your work in!
mrkotfw | Mar 12, 2012
One more update. I'm really itching to just write a DSP assembler in Python. All I need is to flesh out the BNF grammar and write my own lexer and parser (an LL(1) parser, because I'm lazy). Or I can just get that shit done by using a lexer/parser packaged in a nice module. antime, if you're lurking... do you have a list of the errors you found in the SCU manual? Maybe I could just take a peek inside Yabause's disassembler! It also has a nice list of games using the DSP.
ExCyber | Mar 12, 2012
Most assemblers that I've used don't seem to even bother with a proper grammar; they seem to just go line-by-line and have fairly brittle parsing of each line. A lot of them will barf if an instruction line doesn't start with enough whitespace, for example.
Chilly Willy | Mar 12, 2012
One thing I found from my own C++ example - do not include iostream! The linker doesn't seem able to tell what code is used and what isn't due to the binary bits at the start of console binaries (at least not on the MD and 32X). That means that whatever you include is left in its entirety in the binary at the end... which means my 8K TicTacToe ballooned to several HUNDRED K. The iostream is HUGE, mainly because it deals with text in and out.
Oh, that's nice! I saw that in RockBox, but the license was different. I didn't see this one.
With old, slow consoles (and computers), it's often best to ignore as much of that as possible. Always locking/unlocking can kill your performance, and having the extra overhead of actual threads can cut speeds by a third or worse. That said, sometimes you NEED locking between two CPUs (32X or Saturn). For example, my sound mixer for the 32X: the Master SH2 sets/changes the entries in the voice table, but the Slave goes through the table to do the mixing. Clearly, I need to lock the list from one side or the other for changes/mixing. The interface in the 32X is not designed to handle TAS atomic bus cycles, so I wound up using one of the communications registers like this:
Code:
That worked well - the communications registers are fast and uncached, and can be read and written by both CPUs at the same time. Since they CAN both write at the same time, it's not guaranteed which CPU will actually have its data stored; hence the race condition check. The Saturn doesn't have communications registers. I'm also not sure if any of the blocks of ram are capable of TAS - I haven't seen anything about that in any of the manuals yet. Code like above can be done on uncached ram, but would be slower if the ram is wired for burst read access.
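A related pattern, sketched here as a different technique from the claim-and-check lock described above: a three-state handoff through one shared 16-bit word, arranged so that only one CPU ever writes in any given state and the simultaneous-write case never arises. The word is a plain variable standing in for a communications register (or uncached RAM), and all names and state values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    COMM_IDLE    = 0,   /* slave may walk the voice table          */
    COMM_REQUEST = 1,   /* master wants to edit; slave must answer */
    COMM_GRANTED = 2    /* slave has stepped aside; master edits   */
};

/* Master side: ask for the table. Only the master writes in IDLE. */
void master_request(volatile uint16_t *reg)
{
    assert(*reg == COMM_IDLE);
    *reg = COMM_REQUEST;
}

/* Master side: poll until this returns nonzero, then edit the table. */
int master_may_edit(volatile uint16_t *reg)
{
    return *reg == COMM_GRANTED;
}

/* Master side: hand the table back. Only the master writes in GRANTED. */
void master_release(volatile uint16_t *reg)
{
    assert(*reg == COMM_GRANTED);
    *reg = COMM_IDLE;
}

/* Slave side: call between mixing passes. Grants a pending request
 * (only the slave writes in REQUEST) and reports whether mixing may
 * proceed on this pass. */
int slave_may_mix(volatile uint16_t *reg)
{
    if (*reg == COMM_REQUEST)
        *reg = COMM_GRANTED;
    return *reg == COMM_IDLE;
}

/* Host-side stand-in for the shared hardware word. */
uint16_t demo_comm_reg = COMM_IDLE;
```

The cost of this arrangement is latency: the master waits until the slave finishes its current pass and polls again, so it suits cases like the mixer where one side owns the data most of the time.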
If you need extra info on how to go through a list of data, a linked list or tree is better. If you don't have that, just allocating blocks is probably better. Sounds like a list/tree is what you want here, from the example.
Back when I was using it on the 32X for Tremor, I hunted around until I found a post from malkia where he told someone the code can be used any way they wish. By the way, I'm sure you're familiar with this forum? https://mollyrocket.com/forums/viewforum.php?f=16... That's one of the best sites for PD code for things. stb_image.c is one of the most widely used pieces of PD code out.
Sounds good so far. I'll go over it more thoroughly when you have more to go over. If the scheme gets too wasteful, people can always just allocate a large block to cover all their data instead of handling it individually. Something to keep in mind - in 3D many folks put a BUNCH of different textures in the same texture block. That's because many GPUs require textures to be powers of two in both directions, meaning lots of waste for single textures in many cases... unless you pack more than one texture in the same block.
Chilly Willy | Mar 12, 2012
Yes, it's VERY simple - very small code base with extremely low ram overhead. It was perfect for the 32X given you need to keep both of those to a minimum. I used it on the Slave SH2 for allocations made by Tremor. Between songs, I'd just reinit the heap to free all the memory. The Tremor lowmem branch leaks ram, so the current recommendation is to use your own allocator for the Tremor allocations and reinit the heap after each song to make sure the leak doesn't propagate. If you don't, it runs out of ram on the third or fourth song depending on the allocations and the size of the heap.
I'm not as up on git in this area... too used to svn and cvs. I need to review the git manual on that.
antime | Mar 13, 2012
The only one I found was the alternative encoding for X-bus NOPs which I documented in my old disassembler. Yabause's and MAME's sources are a much better reference.
The only warning given in the documentation regarding atomic operations is a prohibition on using the MC68000's TAS instruction. Sega's own libraries include a SYS_TASSEM function which is described as using the TAS instruction, but I haven't checked the actual implementation. Additionally, memory locations 0x1000000-0x17FFFFF and 0x1800000-0x1FFFFFF are connected to the slave and master SH2's FRT input capture, respectively.
mrkotfw | Mar 13, 2012
Yeah, I'm not going to do that. That just opens the possibility for more bugs. The grammar really shouldn't be difficult. I just need a better delimiter for when there are parallel instructions. If I can't find a good parser, then a simple LL(1) will do granted that my grammar has no left-recursion (shouldn't be a problem).
...binary bits? Not sure what you mean. Thanks for the link. I never heard of that place. And thanks for the lock code. As for textures, that's what I was thinking. A large collection of unique textures could be allocated by simply allocating a large single texture. There's the upside of not having to do more than 1 allocation (that is if all the textures you need are within that large collection of textures). Yeah, a tree would be best since
If you have Github, they have documentation on how you can set this up and track my repos. If you commit/push in your own, I can merge your work into mine and vice versa.
Got it. I'll be looking at both for reference/documentation as well as your disassembler.
mrkotfw | Mar 13, 2012
antime, I'm interested in getting your transfer tool to work. http://koti.kapsi.fi/~antime/sega/usb.html... Do you have a pin-out of what needs to be soldered onto the Saturn serial port? Images of your setup and where/what you soldered would be great. Basically, any schematics? I'd like to incorporate this into libyaul and hopefully get a GDB-stub going as well. I'm not particularly fond of the AR cartridge. I can't seem to find the original thread (if there ever was one) about your transfer tool.
antime | Mar 13, 2012
Looking at the bottom of the Saturn mainboard, the link port's pinout is
Code:
I just soldered wires directly to the corresponding TxD/RxD/SCK pins of a DLP-2232M... module. Unfortunately the FT2232 IC uses incompatible pinouts in asynchronous and synchronous serial modes, so you have to decide which one you want to use on channel A. I used synchronous mode because of the faster transfer speed (~1.2Mbit/s vs. ~224Kbit/s), but async is a lot easier to work with.
Chilly Willy | Mar 13, 2012
The rom header, mainly. The 32X has two headers - the normal MD rom header, and the 32X header for the SH2s. It also has an exception jump table that replaces the regular exception vectors.
If you need some general purpose jpg/png code, stb_image.c is really good. Your alternative is libjpeg and libpng, both of which are rather complicated to use - they don't have a single entry point that returns an image, you have to read in the image one line at a time. I haven't used his TTF code yet, but it looks better/easier than the corresponding TTF library.
I'll look into that.