Author Topic: CD functions without the system card  (Read 1209 times)

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
CD functions without the system card
« on: June 02, 2016, 09:18:25 AM »
I know some people have duplicated or written their own CD access code for the units, outside of using the system card, but most of them are probably lost over the ages.

 Notably, Charles MacDonald had an ADPCM source lib that could be used for hucard projects (enhanced hucards).

 I'm really interested in something like this. A game that's on the hucard, but is meant for the CD system (you supply the CD for the audio, etc).

 Has anyone else experimented with doing this? Or is anyone interested in this?

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: CD functions without the system card
« Reply #1 on: June 02, 2016, 10:22:47 AM »
Quote
Notably, Charles MacDonald had an ADPCM source lib that could be used for hucard projects (enhanced hucards).

Having an ADPCM library could be useful.  :-k

Then you could just run a huge game off of a Turbo Everdrive v2 and still have some ADPCM sound ... but I don't think that you could make it stream the ADPCM, and so that would limit its usefulness.

As for running a CD game off of a custom HuCard ... I can't see the advantage myself, but there's obviously something that you're seeing in the idea that I'm missing.

It would just seem to bump up the costs in duplicating a homebrew game for little real advantage.

Now ... having Hudson's fast-access CD code disassembled so that we could all use it in CD games ... that is something that I can see being a huge benefit to homebrew.  :wink:

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: CD functions without the system card
« Reply #2 on: June 02, 2016, 10:55:21 AM »
Quote
Now ... having Hudson's fast-access CD code disassembled so that we could all use it in CD games ... that is something that I can see being a huge benefit to homebrew.  :wink:
That I already have and use it from time to time, if you want it. It's 8k because it has other new functions. I didn't strip them out.

Quote
I can't see the advantage myself
Hacking hucard games for ADPCM sound FX and to use an audio CD, if present, for music. So much easier than trying to convert a hucard to fit the CDRAM form factor, bank layout, etc. Just throw in the hucard and audio CD, and be done with it ;)

 Or make bi-compatible hucards. Function as normal hucards, or enhanced if used on a CD system (the base 64k is available to hucards, which is nice too).

TheOldMan

  • Hero Member
  • *****
  • Posts: 958
Re: CD functions without the system card
« Reply #3 on: June 02, 2016, 11:33:27 AM »
Quote
That I already have and use it from time to time, if you want it.

Yes, please.

Quote
Or make bi-compatible hucards. Function as normal hucards, or enhanced if used on a CD system

Or add extra levels on the cd...
(I'd love to see touko's chuck no-rice game with loadable characters to use)

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: CD functions without the system card
« Reply #4 on: June 02, 2016, 12:07:59 PM »
Here's the lib that I ripped from Seiya Monogatari: http://pcedev.net/CD_read_lib/CD_read_lib.zip



 Just to note: for some reason the CD lib routine zeroes out the third (uppermost) byte of the LBA offset, so you're limited to a 128-megabyte address range within a data track. It can be fixed/altered, but I didn't bother changing it.
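
 For anyone wondering where the 128-megabyte figure comes from, here is a quick arithmetic sketch. It assumes the LBA offset is three bytes wide (so zeroing the top byte leaves 16 bits) and that each data sector carries 2048 bytes, as on a standard Mode 1 CD-ROM. The function name is made up for illustration.

```python
SECTOR_BYTES = 2048  # user data per CD-ROM Mode 1 sector (assumed here)


def addressable_bytes(lba_bits):
    """Bytes reachable within a data track for an LBA offset of the given width."""
    return (1 << lba_bits) * SECTOR_BYTES


# Top byte of a 3-byte LBA zeroed: only 16 bits remain.
limited = addressable_bytes(16)   # 134,217,728 bytes = 128 MiB
# All three bytes honoured: far more than any CD can hold.
full = addressable_bytes(24)
```

 So the limit only matters if a single data track holds more than 65536 sectors.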

 The CD read lib is fast; much faster than the system card one. I listed what I've figured out about the other library entries, but I didn't document their arguments, since I was only really interested in the CD read routines (the CD to VRAM one is documented).



For the curious, here's the list of lib functions:
  4000  initialize system environment
  4003  play audio track (???)
  4006  internal (reinitialize ???)
  4009  wait_vblank (ver1)
  400c  LZ decompress to ram
  400f  mem->mem DMA
  4012  setup for 404b
  4015  NULL (jumps to itself)
  4018  LZ decompress to vram
  401b  header/prep setup (???)
  401e  LZ decomp ADPCM to vram
  4021  bram wait loop (???)
  4024  LZ decomp ADPCM to mem
  4027  MEM->VRAM DMA
  402a  402b + wait_vsync(ver2)
  402d  VCE pal arg update
  4030  VCE pal write
  4033  wait_vsync(ver2)
  4036  update satb
  4039  CD_READ to vram
  403c  CD_READ to ram
  403f  another CD_READ
  4042  CD_audio/ADPCM related
  4045  NULL
  4048  NULL
  404b  (long) time delay loop
  404e  internal reinitialize
  4051  internal update (???)
  4054  internal check (???)
  4057  get status (???)
  405a  internal reinitialize
  405d  update INT jmp addr(hsync branch)
  4060  PSG reinitialize
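
 The entries in the list above sit exactly 3 bytes apart, which is the size of a 6502-family JMP absolute instruction. Treating the start of the lib as a table of JMPs is an assumption drawn from that spacing, not something confirmed by the list itself. Under that assumption, a small sketch for converting between slot numbers and entry addresses (the helper names are hypothetical):

```python
BASE = 0x4000      # first entry in the list above
ENTRY_SIZE = 3     # assumed: one "JMP absolute" per entry


def entry_address(index):
    """Address of the Nth entry (0-based) in the assumed jump table."""
    return BASE + index * ENTRY_SIZE


def entry_index(address):
    """Slot number for an entry address; rejects addresses between slots."""
    offset = address - BASE
    if offset < 0 or offset % ENTRY_SIZE:
        raise ValueError("not an entry address: %#06x" % address)
    return offset // ENTRY_SIZE


# Slot 19 is the "CD_READ to vram" entry, slot 32 is "PSG reinitialize".
cd_read_vram = entry_address(19)   # 0x4039
last_entry = entry_address(32)     # 0x4060
```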
« Last Edit: June 02, 2016, 12:12:43 PM by Bonknuts »

TheOldMan

  • Hero Member
  • *****
  • Posts: 958
Re: CD functions without the system card
« Reply #5 on: June 02, 2016, 06:00:57 PM »
Thank you tom. And I don't see 128M per data track as a problem. Yet.

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: CD functions without the system card
« Reply #6 on: June 06, 2016, 11:30:43 AM »
Cool. The data limit is easy to fix. Not sure why they did that, but long seek ranges probably present a problem (stall more?).

 Also, the LZ routines are decent. Especially the ring buffer stuff (port to port, or local to port). I could probably adapt an LZSS compressor for it. Though I've been using PuCrunch (aplib, and one other one), since they compress better - but decompress slower. That, and I saw some other things that elmer and touko posted about fast decompress routines (LZ4 ?).
« Last Edit: June 06, 2016, 11:33:36 AM by Bonknuts »

touko

  • Hero Member
  • *****
  • Posts: 953
Re: CD functions without the system card
« Reply #7 on: June 07, 2016, 07:52:15 AM »
There's a PCE game that doesn't use a SATB transfer routine and writes sprite attributes directly into VRAM ??

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: CD functions without the system card
« Reply #8 on: June 07, 2016, 06:45:20 PM »
Quote
Also, the LZ routines are decent. Especially the ring buffer stuff (port to port, or local to port). I could probably adapt an LZSS compressor for it. Though I've been using PuCrunch (aplib, and one other one), since they compress better - but decompress slower. That, and I saw some other things that elmer and touko posted about fast decompress routines (LZ4 ?).

Hmmm ... I'll have to take a look to see what they're doing.

It's always fun to steal good ideas from other people's code!  :wink:

IMHO, you can avoid looking too hard at LZ4 ... but then, I'm somewhat biased.  O:)

The current version of SWD5 that's in the Xanadu games is compromised in terms of performance by having to fit into the minimum space (approx 256 bytes), and to avoid zero-page optimizations because I didn't know what Falcom was using or relying on.

Allowing it to use $F8-$FF for a temporary block-transfer instruction call makes a number of other optimizations possible.

Anyway ... getting back to the System Card ... I finally got pissed-enough with the PDF of the Hu7 CD documentation that's on Tom's PCEDEV site, that I used Acrobat Pro to disassemble the thing back into individual pages, then edited every single page in Photoshop to remove the most nasty of the artifacts from the original scan to leave almost-clean pages, and finally reloaded it all back into Acrobat to be deskewed and OCR'd.

So, there's now a copy of the CD documentation that looks clean and reasonably-horizontal, and that contains "searchable text" so that you can quickly find particular function call documentation with a simple search.

It's still a fairly crappy scan, there's nothing that I can do about that, particularly in comparison to the rest of the system docs that are "out-there", but it's a million times better than not having the documentation at all!  :)

Anyone that's interested can get it from here (for a while) ...

https://www.dropbox.com/s/yqxxee0893378nv/Hu7%20CD%20System%20Development%20Manual.pdf?dl=0

TheOldMan

  • Hero Member
  • *****
  • Posts: 958
Re: CD functions without the system card
« Reply #9 on: June 07, 2016, 07:02:33 PM »
I'm not 100% positive (I could be wrong), but I think there's already a block transfer instruction area at $26cc in the user area. It might be something HuC does, though.
The other 'neat' option I saw was setting it up in an empty area in the card memory. The code tom posted has a tia (?) instruction that gets modified in place, iirc :)

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: CD functions without the system card
« Reply #10 on: June 08, 2016, 04:15:20 AM »
Quote
I'm not 100% positive (I could be wrong), but I think there's already a block transfer instruction area at $26cc in the user area. It might be something HuC does, though.
The other 'neat' option I saw was setting it up in an empty area in the card memory. The code tom posted has a tia (?) instruction that gets modified in place, iirc :)


That's a pretty common practice with PCE devs (from what I've seen). It turns the Txx instructions into DMA instructions instead of fixed instructions: Txx, src, dest, len, RTS.
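
 To make the "Txx, src, dest, len, rts" layout concrete, here is a sketch that builds the byte sequence you would place in RAM and then patch before each call. The opcode values are the HuC6280 block-transfer opcodes as I know them (worth checking against the CPU docs before relying on them), and the function name and example addresses are made up for illustration.

```python
import struct

# HuC6280 block-transfer opcodes (as I understand them; verify before use).
OPCODES = {"TII": 0x73, "TDD": 0xC3, "TIN": 0xD3, "TIA": 0xE3, "TAI": 0xF3}
RTS = 0x60


def build_transfer_stub(mnemonic, src, dst, length):
    """Return opcode + src + dst + len (16-bit little-endian each) + RTS."""
    return struct.pack("<BHHHB", OPCODES[mnemonic], src, dst, length, RTS)


# Copy $0100 bytes from $2000 to $3000; patch bytes 1-6 to reuse the stub.
stub = build_transfer_stub("TII", 0x2000, 0x3000, 0x0100)
```

 The stub comes out at 8 bytes, which is why an 8-byte range such as the $F8-$FF one mentioned earlier in the thread is enough to hold it.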

 I was just mentioning the compression stuff, because sometimes you want the slower, better compression ratio, and sometimes you need fast decompression with an OK ratio.

 elmer: Is it ok if I host this at my site?
« Last Edit: June 08, 2016, 04:17:48 AM by Bonknuts »

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: CD functions without the system card
« Reply #11 on: June 08, 2016, 05:08:19 AM »
Quote
I'm not 100% positive (I could be wrong), but I think there's already a block transfer instruction area at $26cc in the user area. It might be something HuC does, though.

Yep, that's just HuC, I'm afraid, and so not something that I want to count on. Although any compiler library for HuCard usage is going to have the same thing somewhere in memory.

In my case, I need the TII in zero-page memory while a file is decompressing in order to get a speed improvement.

That would allow the decompressor to use the TII's src and dst addresses as indirect pointers for short 1 byte (and maybe 2 byte) copies in order to avoid the subroutine call and TII overheads.


Quote
That's a pretty common practice with PCE devs (from what I've seen). It turns the Txx instructions into DMA instructions instead of fixed instructions: Txx, src, dest, len, RTS.

Yep, Hudson really added some very useful stuff to the PCE's CPU.


Quote
I was just mentioning the compression stuff, because sometimes you want the slower, better compression ratio, and sometimes you need fast decompression with an OK ratio.

Yep, I agree.  :wink:

I'll have to take a look at that code more deeply, but I'll be a bit surprised if it's a lot faster than a speed-optimized version of SWD.

We're just talking about variants of the same LZSS concept. They both end up doing lots of memory-to-memory copies.

I'm curious if they're really keeping a separate ring-buffer ... that's pretty slow!

But what I'm more curious about, is what compromises they made in order to decompress directly into VRAM or ADPCM memory.

If they're keeping a 4KB ring-buffer in RAM, then that may negate a lot of their potential performance improvements over just decompressing into RAM and then copying to VRAM.

OTOH ... it is probably the most sensible way to do it if you're dealing with the limited 8KB RAM in the base PCE.


Quote
elmer: Is it ok if I host this at my site?

Sure, that would be great. I have no idea how long it will last on my dropbox.  :)

I should really write out a 2nd PDF file with just the section about the BIOS calls ... that's what I want to refer to all the time, not the stuff at the start or the end of the document.

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: CD functions without the system card
« Reply #12 on: June 08, 2016, 09:50:37 AM »
Quote
If they're keeping a 4KB ring-buffer in RAM, then that may negate a lot of their potential performance improvements over just decompressing into RAM and then copying to VRAM.
I'd rather go for a smaller ring buffer like 4k, even for a CD project, if decompressing to a port. 256k is already tight as it is, and keeping free work mem for other things is important (speeds things up).

 While I only tested Pucrunch, which IS more than just LZSS, the LZ window size didn't have much impact in the tests that I did. Even windows as small as 512 bytes were very efficient. While I realize plain LZSS wouldn't have the same results for a window that small, 4k is decent for LZSS. Note: I have no idea how big the LZ window is in that CD lib.

 I wonder how much performance increase one could get with specialized code using the arcade card registers, for C.

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: CD functions without the system card
« Reply #13 on: June 08, 2016, 05:26:38 PM »
Quote
I'd rather go for a smaller ring buffer like 4k, even for a CD project, if decompressing to a port.

Sometimes I feel like Morpheus in "The Matrix".

I can only say ...



To paraphrase ... "There Is No Ring Buffer".

The ring-buffer concept is an artifact of Haruhiko Okumura's need to find a way for his LZSS.EXE program to LZSS compress floppy-disk sized files within the confines of a 64KB-or-less computer.

There is no need for the concept within the actual LZSS algorithm, or any of the derivative compressors/decompressors that don't share the same limitations.

It is just a way to access the last 4KB of decompressed data ... the "window".

If you're decompressing directly into RAM, then you automatically have that "window" available to you without the need for any ring-buffer processing.
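
A minimal sketch of that idea, assuming a made-up token format of ("lit", byte) and ("match", offset, length) tuples. This is not the SWD format or the format used by the CD lib, just an illustration of how the already-written output serves as the window.

```python
def lz_decode_to_ram(tokens):
    """Decode hypothetical LZ tokens straight into the output buffer.

    Matches read back from the output itself, so no separate
    ring buffer is kept.
    """
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            if offset < 1 or start < 0:
                raise ValueError("match reaches before start of output")
            # Copy one byte at a time so overlapping matches
            # (offset smaller than length) behave correctly.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


tokens = [("lit", ord("a")), ("lit", ord("b")), ("lit", ord("c")),
          ("match", 3, 6)]
decoded = lz_decode_to_ram(tokens)   # b"abcabcabc"
```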

****************

My SWD code is neither unique nor particularly smart.

I've followed the same kind of logic paths that led to PuCrunch ... how to more-efficiently store the basic LZSS repeat-count & offset pairs.

You can just decompress directly into the destination buffer ... no extra memory is needed, and there's particularly no need for that 4KB ring-buffer.

The only exception that I can see is when decompressing to VRAM/ADPCM RAM, and then, IMHO, it's trivially easy (and much faster) to decompress directly into RAM and then copy the data to VRAM afterwards.

Since the LZSS "window" is usually only 4KB anyway, it doesn't really hurt you much to just split your data-to-be-compressed into 8KB-maximum chunks, and to process them separately. That also means that the output will fit within a single bank if started at the beginning, or 2 banks if you want a "stream" of decompressed data.
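
As a rough sketch of the chunking step on the tool side (the compressor itself is not shown, and the function name is invented for illustration), splitting the source data into 8KB-maximum pieces could look like this:

```python
BANK_SIZE = 8 * 1024  # 8KB, the chunk limit described above


def split_into_chunks(data, chunk_size=BANK_SIZE):
    """Split data into pieces of at most chunk_size bytes, in order."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# A 20000-byte asset becomes two full chunks and one short one;
# each would then be compressed and decompressed on its own.
chunks = split_into_chunks(bytes(20000))
sizes = [len(c) for c in chunks]   # [8192, 8192, 3616]
```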

That's the approach that Falcom take in Xanadu 1 and Xanadu 2 ... and I agree with them.

The LZSS window size has only a limited effect upon the compression (statistically), which is why longer offset codes are allocated to matches that are further away, and shorter offset codes are allocated to closer matches.

The question is whether you even bother processing RLE sequences at all.

Since an RLE sequence is both rather rare, and easily compressed within the regular LZ77 scheme, I decided "no".
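
To illustrate why a run needs no special handling: in an LZ77-style scheme a run of one byte is just a literal followed by a match at offset 1, because the copy overlaps its own output. A sketch with a hypothetical helper:

```python
def copy_match(out, offset, length):
    """Append `length` bytes copied from `offset` bytes back in `out`."""
    start = len(out) - offset
    for i in range(length):
        # Byte-by-byte, so bytes appended earlier in this loop can be re-read.
        out.append(out[start + i])


buf = bytearray(b"A")      # one literal
copy_match(buf, 1, 7)      # offset 1, length 7: repeats the literal
run = bytes(buf)           # b"AAAAAAAA"
```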
« Last Edit: June 09, 2016, 04:57:48 AM by elmer »

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: CD functions without the system card
« Reply #14 on: June 09, 2016, 08:56:25 AM »
Quote
To paraphrase ... "There Is No Ring Buffer".

The ring-buffer concept is an artifact of Haruhiko Okumura's need to find a way for his LZSS.EXE program to LZSS compress floppy-disk sized files within the confines of a 64KB-or-less computer.

 The ring buffer very much exists, but only if you write the decoder to use a ring buffer instead of a sliding window. The two work slightly differently. I modified Pucrunch's sliding LZ window to a ring buffer function so that I could decompress large chunks of data to vram without needing to decompress the whole thing into ram first (which I couldn't for hucard projects because of the ram limitations). The ring buffer concept didn't exist inside the original decompressor.



Quote
If you're decompressing directly into RAM, then you automatically have that "window" available to you without the need for any ring-buffer processing.
Yes, to ram. Which is why I added "if decompressing to a port".

 I rarely ever need to decompress anything with LZ-level compression to ram. Almost all assets are vram bound (tiles, sprites, but not tilemaps). Even if I did use it for text blocks, they'd be pretty small (easily 4k-8k). I wouldn't want to waste sys card ram for more than that (have a larger buffer just to avoid using a ring buffer, for vram stuff). One size doesn't fit all, but if you're decompressing to the ring buffer and vram at the same time, it's going to be faster since you're writing directly to vram - the ring buffer is just for history access. Faster than decompressing the whole thing to local ram, and then re-copying it to vram. Of course if it's a background process, and the objects are small enough, local ram decompression would be the better choice for obvious reasons (to avoid vram write pointer corruption).
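
 A sketch of the ring-buffer variant, under stated assumptions: the token format, the Port class and the function name are all invented for illustration, and this is not Pucrunch's actual stream or the CD lib's code. Every output byte goes to a write-only port (standing in for a VRAM data register) and into a small ring that exists only for history lookups.

```python
class Port:
    """Stand-in for a write-only data port; it just records what was written."""

    def __init__(self):
        self.written = bytearray()

    def write(self, value):
        self.written.append(value)


def lz_decode_to_port(tokens, port, window_size=4096):
    """Decode ("lit", byte) / ("match", offset, length) tokens to a port."""
    ring = bytearray(window_size)
    pos = 0  # next write position in the ring

    def emit(value):
        nonlocal pos
        port.write(value)       # goes straight to the destination
        ring[pos] = value       # kept only so later matches can find it
        pos = (pos + 1) % window_size

    for token in tokens:
        if token[0] == "lit":
            emit(token[1])
        else:
            _, offset, length = token
            if not 1 <= offset <= window_size:
                raise ValueError("offset outside the window")
            src = (pos - offset) % window_size
            for _ in range(length):
                emit(ring[src])
                src = (src + 1) % window_size
    return port


# Tiny 4-byte window so the ring wraps during the example.
tokens = [("lit", ord(c)) for c in "abcde"] + [("match", 4, 3)]
result = bytes(lz_decode_to_port(tokens, Port(), window_size=4).written)
```

 Here result is b"abcdebcd", and local memory use stays at window_size bytes no matter how much data goes out the port.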