Author Topic: CD functions without the system card (Read 1203 times)

Bonknuts · « **on:** June 02, 2016, 09:18:25 AM »

I know some people have duplicated or written their own CD access code for the units, outside of using the system card, but most of them are probably lost over the ages.

Notably, Charles MacDonald had an ADPCM source lib that could be used for hucard projects (enhanced hucards).

I'm really interested in something like this: a game that's on the hucard, but is meant for the CD system (you supply the CD for the audio, etc).

Has anyone else experimented with doing this? Or is anyone interested in it?

elmer · « **Reply #1 on:** June 02, 2016, 10:22:47 AM »

Quote from: Bonknuts on June 02, 2016, 09:18:25 AM

Notably, Charles MacDonald had an ADPCM source lib that could be used for hucard projects (enhanced hucards).

Having an ADPCM library could be useful.

Then you could just run a huge game off of a Turbo Everdrive v2 and still have some ADPCM sound ... but I don't think that you could make it stream the ADPCM, and so that would limit its usefulness.

As for running a CD game off of a custom HuCard ... I can't see the advantage myself, but there's obviously something that you're seeing in the idea that I'm missing.

It would just seem to bump up the costs in duplicating a homebrew game for little real advantage.

Now ... having Hudson's fast-access CD code disassembled so that we could all use it in CD games ... that is something that I can see being a huge benefit to homebrew.

Bonknuts · « **Reply #2 on:** June 02, 2016, 10:55:21 AM »

Quote

Now ... having Hudson's fast-access CD code disassembled so that we could all use it in CD games ... that is something that I can see being a huge benefit to homebrew.

That I already have and use it from time to time, if you want it. It's 8k because it has other new functions. I didn't strip them out.

Quote

I can't see the advantage myself

Hacking hucard games for ADPCM sound FX and to use an audio CD, if present, for music. So much easier than trying to convert a hucard to fit the CDRAM form factor, bank layout, etc. Just throw in the hucard and audio CD, and be done with it.

Or make bi-compatible hucards. Function as normal hucards, or enhanced if used on a CD system (the base 64k is available to hucards, which is nice too).

TheOldMan · « **Reply #3 on:** June 02, 2016, 11:33:27 AM »

Quote

That I already have and use it from time to time, if you want it.

Yes, please.

Quote

Or make bi-compatible hucards. Function as normal hucards, or enhanced if used on a CD system

Or add extra levels on the cd...
(I'd love to see touko's chuck no-rice game with loadable characters to use)

Bonknuts · « **Reply #4 on:** June 02, 2016, 12:07:59 PM »

Here's the lib that I ripped from Seiya Monogatari: http://pcedev.net/CD_read_lib/CD_read_lib.zip

Just to note: for some reason the CD lib routine zeroes out the uppermost of the three LBA offset bytes, so you're limited to a 128-megabyte address range within a data track. That can be fixed/altered, but I didn't bother changing it.
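As a rough check of that 128-megabyte figure, here is a small Python sketch. It assumes a 3-byte LBA and 2048-byte data sectors (standard CD-ROM Mode 1). The names `SECTOR_BYTES`, `mask_lba` and `addressable_bytes` are made up for illustration and are not part of the lib.

```python
SECTOR_BYTES = 2048  # assumption: Mode 1 data sectors hold 2048 user bytes


def mask_lba(lba: int) -> int:
    """Mimic the lib zeroing the uppermost of the three LBA bytes."""
    return lba & 0x00FFFF


def addressable_bytes(lba_bits: int) -> int:
    """Bytes reachable with an LBA offset of the given width."""
    return (1 << lba_bits) * SECTOR_BYTES


# With the top byte zeroed only 16 bits of LBA survive.
print(addressable_bytes(16) // (1024 * 1024), "MiB with the top byte zeroed")
# With all three bytes kept the range would be far larger than any CD.
print(addressable_bytes(24) // (1024 * 1024), "MiB with the full 3-byte LBA")
```

So 65536 sectors of 2048 bytes each is where the 128-megabyte ceiling comes from under these assumptions.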

The CD read lib is fast; much faster than the system card one. I listed what I figured out about the other library entries, but I didn't document their arguments, since I was only really interested in the CD read routines (the CD-to-VRAM one is documented).

For the curious, here's the list of lib functions:
4000 initialize system environment
4003 play audio track
4006 internal (reinitialize)
4009 wait_vblank (ver1)
400c LZ decompress to ram
400f mem->mem DMA
4012 setup for 404b
4015 NULL (jumps to itself)
4018 LZ decompress to vram
401b header/prep setup
401e LZ decomp ADPCM to vram
4021 bram wait loop
4024 LZ decomp ADPCM to mem
4027 MEM->VRAM DMA
402a 402b + wait_vsync(ver2)
402d VCE pal arg update
4030 VCE pal write
4033 wait_vsync(ver2)
4036 update satb
4039 CD_READ to vram
403c CD_READ to ram
403f another CD_READ
4042 CD_audio/ADPCM related
4045 NULL
4048 NULL
404b (long) time delay loop
404e internal reinitialize
4051 internal update
4054 internal check
4057 get status
405a internal reinitialize
405d update INT jmp addr(hsync branch)
4060 PSG reinitialize
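
The entry addresses in the list are three bytes apart, which matches the size of an absolute JMP on the 6502 family, so the block is presumably a jump table at the start of the bank. That is an inference from the addresses, not something stated in the lib. A small Python sketch (the helper name `entry_address` is hypothetical) that turns a slot number into an entry address under that assumption:

```python
BASE = 0x4000   # first entry in the list above
JMP_SIZE = 3    # assumption: each slot is a 3-byte absolute JMP


def entry_address(slot: int) -> int:
    """Address of the Nth library entry, counting from zero."""
    return BASE + slot * JMP_SIZE


# A few addresses copied from the list, keyed by slot number.
known = {0: 0x4000, 19: 0x4039, 20: 0x403C, 32: 0x4060}
for slot, addr in known.items():
    assert entry_address(slot) == addr

print(f"CD_READ to vram is slot 19 at ${entry_address(19):04X}")
```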

TheOldMan · « **Reply #5 on:** June 02, 2016, 06:00:57 PM »

Thank you tom. And I don't see 128M per data track as a problem. Yet.

Bonknuts · « **Reply #6 on:** June 06, 2016, 11:30:43 AM »

Cool. The data limit is easy to fix. Not sure why they did that, but long seek ranges probably present a problem (stall more?).

Also, the LZ routines are decent, especially the ring buffer stuff (port to port, or local to port). I could probably adapt an LZSS compressor for it. Though I've been using PuCrunch (also aplib, and one other one), since they compress better but decompress slower. That, and I saw some other things that elmer and touko posted about fast decompress routines (LZ4?).

touko · « **Reply #7 on:** June 07, 2016, 07:52:15 AM »

Is there a PCE game that doesn't use a SATB transfer routine and instead writes sprite attributes directly to VRAM?

elmer · « **Reply #8 on:** June 07, 2016, 06:45:20 PM »

Quote from: Bonknuts on June 06, 2016, 11:30:43 AM

Also, the LZ routines are decent, especially the ring buffer stuff (port to port, or local to port). I could probably adapt an LZSS compressor for it. Though I've been using PuCrunch (also aplib, and one other one), since they compress better but decompress slower. That, and I saw some other things that elmer and touko posted about fast decompress routines (LZ4?).

Hmmm ... I'll have to take a look to see what they're doing.

It's always fun to steal good ideas from other people's code!

IMHO, you can avoid looking too hard at LZ4 ... but then, I'm somewhat biased. O:)

The current version of SWD5 that's in the Xanadu games is compromised in terms of performance by having to fit into the minimum space (approx 256 bytes), and to avoid zero-page optimizations because I didn't know what Falcom was using or relying on.

Allowing it to use $F8-$FF for a temporary block-transfer instruction call makes a number of other optimizations possible.

Anyway ... getting back to the System Card ... I finally got pissed-enough with the PDF of the Hu7 CD documentation that's on Tom's PCEDEV site, that I used Acrobat Pro to disassemble the thing back into individual pages, then edited every single page in Photoshop to remove the most nasty of the artifacts from the original scan to leave almost-clean pages, and finally reloaded it all back into Acrobat to be deskewed and OCR'd.

So, there's now a copy of the CD documentation that looks clean and reasonably-horizontal, and that contains "searchable text" so that you can quickly find particular function call documentation with a simple search.

It's still a fairly crappy scan, there's nothing that I can do about that, particularly in comparison to the rest of the system docs that are "out-there", but it's a million times better than not having the documentation at all!

Anyone that's interested can get it from here (for a while) ...

https://www.dropbox.com/s/yqxxee0893378nv/Hu7%20CD%20System%20Development%20Manual.pdf?dl=0

TheOldMan · « **Reply #9 on:** June 07, 2016, 07:02:33 PM »

I'm not 100% positive (I could be wrong), but I think there's already a block transfer instruction area at $26cc in the user area. It might be something HuC does, though.
The other 'neat' option I saw was setting it up in an empty area in the card memory. The code tom posted has a tia (?) instruction that gets modified in place, iirc.

Bonknuts · « **Reply #10 on:** June 08, 2016, 04:15:20 AM »

Quote from: TheOldMan on June 07, 2016, 07:02:33 PM

I'm not 100% positive (I could be wrong), but I think there's already a block transfer instruction area at $26cc in the user area. It might be something HuC does, though.
The other 'neat' option I saw was setting it up in an empty area in the card memory. The code tom posted has a tia (?) instruction that gets modified in place, iirc.

That's a pretty common practice with PCE devs (from what I've seen). Putting the Txx instruction in RAM lets you patch its operands at run time, which turns it into a DMA-style call instead of a fixed instruction. The layout is: Txx, src, dest, len, RTS.
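
To make that layout concrete, here is a hedged Python sketch that assembles the eight bytes such a RAM stub would contain. The opcode values are the HuC6280 block-transfer opcodes and RTS; the function name `build_txx_stub` and the example addresses are assumptions for illustration only. On real hardware you would write these bytes to RAM once, then patch the src/dest/len fields and JSR to the stub.

```python
import struct

# HuC6280 block-transfer opcodes and RTS.
TXX_OPCODES = {"TII": 0x73, "TDD": 0xC3, "TIN": 0xD3, "TIA": 0xE3, "TAI": 0xF3}
RTS = 0x60


def build_txx_stub(mnemonic: str, src: int, dst: int, length: int) -> bytes:
    """Return the 8 bytes to place in RAM: opcode, src, dest, len, RTS.

    The three operands are 16-bit little-endian words, in that order.
    """
    return struct.pack("<BHHHB", TXX_OPCODES[mnemonic], src, dst, length, RTS)


# Example values only: copy $20 bytes from $3000 to an alternating port pair.
stub = build_txx_stub("TIA", 0x3000, 0x0002, 0x0020)
print(stub.hex(" "))
```

The stub is 7 bytes of instruction plus 1 byte of RTS, so it fits in an 8-byte scratch area.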

I was just mentioning the compression stuff, because sometimes you want the slower, better compression ratio, and sometimes you need fast decompression with an OK ratio.

elmer: Is it ok if I host this at my site?

elmer · « **Reply #11 on:** June 08, 2016, 05:08:19 AM »

Quote from: TheOldMan on June 07, 2016, 07:02:33 PM

I'm not 100% positive (I could be wrong), but I think there's already a block transfer instruction area at $26cc in the user area. It might be something HuC does, though.

Yep, that's just HuC, I'm afraid, and so not something that I want to count on. Although any compiler library for HuCard usage is going to have the same thing somewhere in memory.

In my case, I need the TII in zero-page memory while a file is decompressing in order to get a speed improvement.

That would allow the decompressor to use the TII's src and dst addresses as indirect pointers for short 1 byte (and maybe 2 byte) copies in order to avoid the subroutine call and TII overheads.

Quote from: Bonknuts on June 08, 2016, 04:15:20 AM

That's a pretty common practice with PCE devs (from what I've seen). Putting the Txx instruction in RAM lets you patch its operands at run time, which turns it into a DMA-style call instead of a fixed instruction. The layout is: Txx, src, dest, len, RTS.

Yep, Hudson really added some very useful stuff to the PCE's CPU.

Quote

I was just mentioning the compression stuff, because sometimes you want the slower, better compression ratio, and sometimes you need fast decompression with an OK ratio.

Yep, I agree.

I'll have to take a look at that code more deeply, but I'll be a bit surprised if it's a lot faster than a speed-optimized version of SWD.

We're just talking about variants of the same LZSS concept. They both end up doing lots of memory-to-memory copies.

I'm curious if they're really keeping a separate ring-buffer ... that's pretty slow!

But what I'm more curious about, is what compromises they made in order to decompress directly into VRAM or ADPCM memory.

If they're keeping a 4KB ring-buffer in RAM, then that may negate a lot of their potential performance improvements over just decompressing into RAM and then copying to VRAM.

OTOH ... it is probably the most sensible way to do it if you're dealing with the limited 8KB RAM in the base PCE.

Quote

elmer: Is it ok if I host this at my site?

Sure, that would be great. I have no idea how long it will last on my dropbox.

I should really write out a 2nd PDF file with just the section about the BIOS calls ... that's what I want to refer to all the time, not the stuff at the start or the end of the document.

Bonknuts · « **Reply #12 on:** June 08, 2016, 09:50:37 AM »

Quote

If they're keeping a 4KB ring-buffer in RAM, then that may negate a lot of their potential performance improvements over just decompressing into RAM and then copying to VRAM.

I'd rather go for a smaller ring buffer like 4k, even for a CD project, if decompressing to a port. 256k is already tight as it is, and keeping free work mem for other things is important (speeds things up).

While I only tested Pucrunch, which IS more than just LZSS, the LZ window size didn't impact much in the tests that I did. Even windows as small as 512 bytes were very efficient. While I realize plain LZSS wouldn't have the same results for a window that small, 4k is decent for LZSS. Note: I have no idea how big the LZ window is in that CD lib.

I wonder how much performance increase one could get with specialized code using the arcade card registers, for C.

elmer · « **Reply #13 on:** June 08, 2016, 05:26:38 PM »

Quote from: Bonknuts on June 08, 2016, 09:50:37 AM

I'd rather go for a smaller ring buffer like 4k, even for a CD project, if decompressing to a port.

Sometimes I feel like Morpheus in "The Matrix".

I can only say ...

To paraphrase ... "There Is No Ring Buffer".

The ring-buffer concept is an artifact of Haruhiko Okumura's need to find a way for his LZSS.EXE program to LZSS compress floppy-disk sized files within the confines of a 64KB-or-less computer.

There is no need for the concept within the actual LZSS algorithm, or any of the derivative compressors/decompressors that don't share the same limitations.

It is just a way to access the last 4KB of decompressed data ... the "window".

If you're decompressing directly into RAM, then you automatically have that "window" available to you without the need for any ring-buffer processing.

****************

My SWD code is neither unique nor particularly smart.

I've followed the same kind of logic paths that lead to PuCrunch ... how to more-efficiently store the basic LZSS repeat-count & offset pairs.

You can just decompress directly into the destination buffer ... no extra memory is needed, and there's particularly no need for that 4KB ring-buffer.
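
A toy illustration of that point, in Python rather than 6280 assembly. The token format here (an int for a literal, an `(offset, length)` tuple for a match) is invented for the example and is not the SWD or PuCrunch format. The only thing it is meant to show is that a match copy reads from the output already written, so the output buffer itself is the window.

```python
def lz_decode(tokens):
    """Decode a toy LZSS-style token stream straight into the output buffer.

    A token is either an int (literal byte) or an (offset, length) pair
    that copies from `offset` bytes back in the output written so far.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            offset, length = tok
            start = len(out) - offset
            # Copy one byte at a time so a match may overlap its own output.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


data = lz_decode([ord("a"), ord("b"), ord("c"), (3, 6)])
print(data)
```

No history buffer is allocated anywhere; `out` serves as both destination and window.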

The only exception that I can see is when decompressing to VRAM/ADPCM RAM, and then, IMHO, it's trivially easy (and much faster) to decompress directly into RAM and then copy the data to VRAM afterwards.

Since the LZSS "window" is usually only 4KB anyway, it doesn't really hurt you much to just split your data-to-be-compressed into 8KB-maximum chunks, and to process them separately. That also means that the output will fit within a single bank if started at the beginning, or 2 banks if you want a "stream" of decompressed data.
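
On the tool side, that chunking is only a few lines. A minimal Python sketch, assuming 8KB chunks to match the PCE's 8KB bank size; `split_chunks` is a hypothetical helper, and each chunk would then be handed to whatever compressor you use:

```python
CHUNK = 8 * 1024  # one 8KB bank per chunk


def split_chunks(data: bytes, size: int = CHUNK):
    """Split data into pieces of at most `size` bytes, in order."""
    return [data[i:i + size] for i in range(0, len(data), size)]


chunks = split_chunks(bytes(20000))
print([len(c) for c in chunks])
```

Each chunk is compressed on its own, so no match ever refers back across a chunk boundary.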

That's the approach that Falcom take in Xanadu 1 and Xanadu 2 ... and I agree with them.

The LZSS window size has only a limited effect upon the compression (statistically), which is why longer offset codes are allocated to matches that are further away, and shorter offset codes are allocated to closer matches.

The question is whether you even bother processing RLE sequences at all.

Since an RLE sequence is both rather rare, and easily compressed within the regular LZ77 scheme, I decided "no".
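
A small sketch of why a run needs no special case in an LZ77-style scheme: a match with offset 1 that overlaps its own output repeats the previous byte. This is Python for illustration, and `copy_match` is a made-up name, not a routine from SWD.

```python
def copy_match(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes copied from `offset` bytes back in `out`."""
    start = len(out) - offset
    # Byte-by-byte so that the copy can read bytes it has just written.
    for i in range(length):
        out.append(out[start + i])


buf = bytearray(b"\x00")   # one literal zero
copy_match(buf, 1, 15)     # offset 1, length 15 behaves like an RLE run
print(len(buf), set(buf))
```

One literal plus one match encodes the whole run, which is the same cost an RLE token would have had.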

Bonknuts · « **Reply #14 on:** June 09, 2016, 08:56:25 AM »

Quote from: elmer on June 08, 2016, 05:26:38 PM

To paraphrase ... "There Is No Ring Buffer".

The ring-buffer concept is an artifact of Haruhiko Okumura's need to find a way for his LZSS.EXE program to LZSS compress floppy-disk sized files within the confines of a 64KB-or-less computer.

The ring buffer very much exists, but only if you write the decoder to use a ring buffer instead of a sliding window. The two work slightly differently. I modified Pucrunch's sliding LZ window to a ring buffer function so that I could decompress large chunks of data to vram without needing to decompress the whole thing into ram first (which I couldn't do for hucard projects because of the ram limitations). The ring buffer concept didn't exist inside the original decompressor.

Quote

If you're decompressing directly into RAM, then you automatically have that "window" available to you without the need for any ring-buffer processing.

Yes, to ram. Which is why I added "if decompressing to a port".

I rarely ever need to decompress anything with LZ-level compression to ram. Almost all assets are vram bound (tiles, sprites, but not tilemaps). Even if I did use it for text blocks, they'd be pretty small (easily 4k-8k). I wouldn't want to waste sys card ram for more than that (having a larger buffer just to avoid using a ring buffer, for vram stuff). One size doesn't fit all, but if you're decompressing to the ring buffer and vram at the same time, it's going to be faster since you're writing directly to vram; the ring buffer is just for history access. That's faster than decompressing the whole thing to local ram, and then re-copying it to vram. Of course if it's a background process, and the objects are small enough, local ram decompression would be the better choice for obvious reasons (to avoid vram write pointer corruption).
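
A rough Python sketch of that ring-buffer variant, for comparison. The token format (int literal or `(offset, length)` tuple), the 4KB size, and the names `lz_decode_to_port` and `write_port` are all assumptions for illustration. `write_port` stands in for a write-only destination such as the VDC data port, and this is not the Pucrunch or CD lib code.

```python
RING_SIZE = 4096            # assumption: 4KB history, must be a power of two
RING_MASK = RING_SIZE - 1


def lz_decode_to_port(tokens, write_port):
    """Decode toy LZ tokens to a write-only port, keeping history in a ring."""
    ring = bytearray(RING_SIZE)
    pos = 0  # total bytes emitted so far

    def emit(b):
        nonlocal pos
        write_port(b)                 # the data goes straight to the port
        ring[pos & RING_MASK] = b     # and is remembered for later matches
        pos += 1

    for tok in tokens:
        if isinstance(tok, int):
            emit(tok)
        else:
            offset, length = tok
            for _ in range(length):
                emit(ring[(pos - offset) & RING_MASK])


vram = []  # stand-in for the port: just collects what was written
lz_decode_to_port([ord("a"), ord("b"), (2, 4)], vram.append)
print(bytes(vram))
```

Every byte is written twice (port and ring), but the output is never held in ram as a whole, so only the ring has to fit in memory.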
