Just in case anyone is interested ... I took a break and had a look at the original Legend of Xanadu 1 again.
The scripting language is almost identical.
The code "markers" that identify where the scripts are located are all different, but it doesn't take too long to find them and to fix up the search algorithm.
I've found and extracted 143 script chunks from it so far.
Hmmm ... well Xanadu 1 is simultaneously both an "easier" and a "harder" game to hack than Xanadu 2.
The nice thing is that the architecture is basically the same as Xanadu 2, with a "permanent" set of boot/utility code from $2000-$3fff in memory, and then "overlay" code from $4000-$9fff, and "script" chunks that get decompressed as needed into $a000-$bfff.
The game "overlay" code is the same for every top-down level, and the game just loads in a different 176KB META-BLOCK compressed data file that contains the level's graphics and scripts.
So far, so good.
Common scripts, such as item names, are located in the game overlay to save memory.
But then they were still running out of memory ... a couple of the levels have only a few bytes free in the 176KB allowed for each level.
So they also mapped another block semi-permanently into $c000-$dfff and started putting some scripts into that area.
And they were still running out, so the last level splits the script area into two 4KB chunks and hacks the loading system to decompress 2 different scripts into $a000-$afff and $b000-$bfff. This let them get some more reuse out of the code in that level.
Yuk!
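For anyone trying to keep the layout straight, here is a rough Python sketch of the memory map as described above. The region labels, the `region_for` and `script_slot` names, and the `split` flag are just my own shorthand for illustration, not anything taken from the game's code.

```python
# Address ranges as described in the post; the labels are illustrative wording.
REGIONS = [
    (0x2000, 0x3FFF, "permanent boot/utility code"),
    (0x4000, 0x9FFF, "game overlay code"),
    (0xA000, 0xBFFF, "decompressed script chunk"),
    (0xC000, 0xDFFF, "semi-permanent extra block (more scripts)"),
]


def region_for(addr):
    """Return the label of the region that contains addr."""
    for start, end, label in REGIONS:
        if start <= addr <= end:
            return label
    return "not described in the post"


def script_slot(addr, split=False):
    """Return which script buffer addr falls in, or None if outside $a000-$bfff.

    split=True models the last level, which uses two 4KB buffers
    ($a000-$afff and $b000-$bfff) instead of one 8KB buffer.
    """
    if not 0xA000 <= addr <= 0xBFFF:
        return None
    if not split:
        return 0
    return 0 if addr < 0xB000 else 1


if __name__ == "__main__":
    for addr in (0x2000, 0x4000, 0xA000, 0xB000, 0xC000):
        print(f"${addr:04x}: {region_for(addr)}")
```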
So finding all the scripts has been a bit nasty ... particularly the side-view Weapon Shop scripts, which are done in a very different way from everything else. (BTW ... up to 181 scripts with text, now.)
Anyway ... the conclusion from all of this is that Xanadu 1 is pretty short on free memory for the translation.
***********************
Falcom already compresses the original SJIS text by encoding the 192 most-common katakana/kanji into a single byte, and this really works out well for them.
Xanadu 1 has 235,523 SJIS glyphs stored using 271,679 bytes.
Xanadu 2 has 96,139 SJIS glyphs stored using 116,346 bytes.
That's an approx 1.2 multiplier, much better than the 2.0 multiplier of pure SJIS.
Apart from showing that Xanadu 2 really is a much shorter story than Xanadu 1, it shows that we've got a problem.
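To make the multiplier concrete, here is a small Python sketch. The first part just divides the byte counts quoted above by the glyph counts. The second part, `encoded_size`, is a toy model of the general idea (the most frequent glyphs cost 1 byte, everything else costs 2); it is an assumption for illustration and not Falcom's actual encoding, which may handle control codes and table selection differently.

```python
from collections import Counter

# Glyph and byte counts quoted in the post.
COUNTS = {
    "Xanadu 1": (235_523, 271_679),
    "Xanadu 2": (96_139, 116_346),
}


def bytes_per_glyph(glyphs, stored_bytes):
    """Average storage cost of one glyph (plain SJIS would be 2.0)."""
    return stored_bytes / glyphs


def encoded_size(text, table_size=192):
    """Toy model: the table_size most frequent glyphs cost 1 byte, the rest 2.

    Hypothetical helper, not the game's real encoder.
    """
    counts = Counter(text)
    common = {glyph for glyph, _ in counts.most_common(table_size)}
    return sum(1 if glyph in common else 2 for glyph in text)


if __name__ == "__main__":
    for game, (glyphs, stored) in COUNTS.items():
        print(f"{game}: {bytes_per_glyph(glyphs, stored):.2f} bytes per glyph")
```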
In order to get a good English translation, SamIAm estimates that we're going to need approx 1.5 to 2.0 times as many English characters as Kanji glyphs.
This gives me 2 problems ... how to fit all this English text into a level's compressed 176KB META-BLOCK that gets loaded into memory ... and then how to actually free up enough memory so that a large English script-chunk can be decompressed and accessed in the game.
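As a back-of-envelope check on how big the problem is, this Python sketch applies the 1.5x to 2.0x estimate to the Xanadu 1 glyph count. It assumes one byte per English character with no text compression at all, which is my simplification for illustration; the function names are made up.

```python
import math

XANADU1_GLYPHS = 235_523        # SJIS glyphs, from the post
XANADU1_TEXT_BYTES = 271_679    # bytes used to store them, from the post


def english_bytes(glyphs, chars_per_glyph):
    """Estimated English text size, assuming 1 byte per character (assumption)."""
    return math.ceil(glyphs * chars_per_glyph)


def growth(glyphs, stored_bytes, chars_per_glyph):
    """How much larger the English text is than the stored Japanese text."""
    return english_bytes(glyphs, chars_per_glyph) / stored_bytes


if __name__ == "__main__":
    for factor in (1.5, 2.0):
        size = english_bytes(XANADU1_GLYPHS, factor)
        ratio = growth(XANADU1_GLYPHS, XANADU1_TEXT_BYTES, factor)
        print(f"{factor}x glyphs: {size:,} bytes ({ratio:.2f}x the original text)")
```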
***********************
Luckily, the first part is easy(ish) ... Xanadu 1 stores all its data compressed in what I've called the FALCOM1 data format.
If I recompress ALL the data in the game with SWD, it'll shrink each level's compressed META-BLOCK so that there will be enough memory to store all the extra English text.
The expectation that we were going to hit this sort of problem is one of the reasons for spending so much time messing around with compression earlier.
So here are the results of recompressing each of the 12 levels in different formats (the numbers in parentheses are the Falcom1 compressed and decompressed sizes).
Blk $00d9800 71 Chk (161,912 / 313,390), Fal2 135,530, Swd4 131,437, Swd5 130,601
Blk $0105800 69 Chk (150,455 / 304,021), Fal2 126,807, Swd4 123,600, Swd5 122,706
Blk $0131800 85 Chk (179,364 / 346,665), Fal2 150,903, Swd4 146,514, Swd5 145,484
Blk $015d800 83 Chk (177,218 / 344,515), Fal2 148,163, Swd4 143,686, Swd5 142,635
Blk $0189800 76 Chk (168,666 / 334,124), Fal2 141,733, Swd4 137,780, Swd5 136,851
Blk $01b5800 78 Chk (175,358 / 333,670), Fal2 146,457, Swd4 142,661, Swd5 141,742
Blk $01e1800 80 Chk (169,941 / 329,813), Fal2 142,395, Swd4 138,714, Swd5 137,754
Blk $020d800 79 Chk (179,178 / 334,334), Fal2 147,208, Swd4 143,309, Swd5 142,467
Blk $0239800 67 Chk (160,874 / 302,719), Fal2 136,316, Swd4 132,410, Swd5 131,443
Blk $0265800 84 Chk (178,110 / 338,782), Fal2 146,571, Swd4 142,236, Swd5 141,287
Blk $0291800 60 Chk (133,598 / 266,361), Fal2 113,042, Swd4 109,641, Swd5 108,871
Blk $02bd800 102 Chk (177,269 / 361,138), Fal2 144,124, Swd4 137,984, Swd5 137,007
The game allows for 180,224 bytes (176KB) for the compressed block.
Taking a look at the largest level ...
FAL1 compressed 179,364
SWD5 compressed 145,484
Which means we should be able to afford not only to have SamIAm do the best translation, but I should also be able to afford to leave 8KB of that space free to use for decompressing the text.
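The headroom arithmetic for every level can be checked with a few lines of Python. The sizes are copied from the list above (Falcom1 compressed and Swd5 only); the `headroom` and `tightest` helpers and the decision to treat the 8KB reserve as exactly 8,192 bytes are my own framing.

```python
BLOCK_BUDGET = 176 * 1024   # 180,224 bytes allowed per compressed level
RESERVE = 8 * 1024          # the 8KB to keep free for decompressed text

# (block offset, Falcom1 compressed size, Swd5 compressed size), from the list above.
LEVELS = [
    (0x00D9800, 161_912, 130_601),
    (0x0105800, 150_455, 122_706),
    (0x0131800, 179_364, 145_484),
    (0x015D800, 177_218, 142_635),
    (0x0189800, 168_666, 136_851),
    (0x01B5800, 175_358, 141_742),
    (0x01E1800, 169_941, 137_754),
    (0x020D800, 179_178, 142_467),
    (0x0239800, 160_874, 131_443),
    (0x0265800, 178_110, 141_287),
    (0x0291800, 133_598, 108_871),
    (0x02BD800, 177_269, 137_007),
]


def headroom(compressed_size, reserve=0):
    """Bytes left in the level budget after the data and an optional reserve."""
    return BLOCK_BUDGET - compressed_size - reserve


def tightest(levels):
    """Return (offset, fal1_size, swd5_size) of the level with the largest Swd5 size."""
    return max(levels, key=lambda level: level[2])


if __name__ == "__main__":
    for offset, fal1, swd5 in LEVELS:
        print(f"Blk ${offset:07x}: Fal1 free {headroom(fal1):>6,}, "
              f"Swd5 free {headroom(swd5):>6,}, "
              f"Swd5 free after 8KB reserve {headroom(swd5, RESERVE):>6,}")
```

On the largest level this works out to 860 bytes free with the original compression versus 34,740 bytes with Swd5, or 26,548 bytes once the 8KB reserve is taken out.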
It's definitely going to be a pain to completely replace Xanadu 1's original compression code/data, but I really don't think that there's much of an alternative.
***********************
The second part, allowing decompressed scripts to be larger, is going to be tricky.
If the Xanadu games had just used a nice-and-simple text-printing routine, then it would have been easy to hack the code to switch in a new bank of text at the start of the routine, and then switch it back again at the end.
Unfortunately, since the text is contained within the game's scripting language, and those scripts are located in nearly every possible banked-region in memory, and the script itself is read from multiple different pieces of code ... I don't think that I can get away with anything that simple.
The only solution that I can come up with at the moment is going to mean switching out the CD BIOS code that's mapped into $e000-$ffff.
If I map the 8KB bank of RAM that I've freed up from the 176KB of compressed level data into that area, then I'm going to have a lot of extra space for the translation and for the English font code.
It'll mean hacking the loading code to decompress scripts into both $a000-$bfff and $f000-$ffef, but that's not too horrible.
The idea would be to map the CD BIOS vectors and interrupt vectors to new code that switches to the original BIOS and executes the original BIOS functions, and then switches back to my RAM bank afterwards.
IIRC, Bonknuts suggested doing something like this on his blog, but I'm not sure if anyone has done this yet in practice.
If they have, then I'd love to hear about it, and about what the potential problems are!
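To show the shape of the BIOS-wrapper idea, here is a tiny Python model of it. It is only a simulation of the control flow: the real thing would be a few instructions of 6502-style assembly sitting somewhere outside $e000-$ffff. The bank numbers, the `Mapper` class, and `make_bios_wrapper` are all hypothetical placeholders, and the model ignores interrupt timing entirely.

```python
BIOS_BANK = 0x00       # placeholder bank number for the CD BIOS (assumption)
TEXT_RAM_BANK = 0x87   # placeholder bank number for the freed 8KB of RAM (assumption)


class Mapper:
    """Minimal model of eight 8KB mapping slots covering a 64KB address space."""

    def __init__(self):
        self.mpr = [0] * 8
        self.mpr[7] = BIOS_BANK   # $e000-$ffff starts out mapped to the BIOS

    def bank_at(self, addr):
        """Return the bank currently visible at a 16-bit address."""
        return self.mpr[addr >> 13]


def make_bios_wrapper(mapper, bios_func):
    """Wrap a BIOS routine so it always runs with the BIOS mapped at $e000.

    The previous mapping is restored afterwards, even if the routine fails.
    """
    def wrapper(*args):
        saved = mapper.mpr[7]
        mapper.mpr[7] = BIOS_BANK
        try:
            return bios_func(*args)
        finally:
            mapper.mpr[7] = saved
    return wrapper


if __name__ == "__main__":
    mapper = Mapper()
    mapper.mpr[7] = TEXT_RAM_BANK   # game runs with the text bank mapped in

    def fake_cd_read(sector):
        # Stand-in for a BIOS call; reports which bank it saw at $e000.
        return (sector, mapper.bank_at(0xE000))

    cd_read = make_bios_wrapper(mapper, fake_cd_read)
    print(cd_read(123), hex(mapper.bank_at(0xE000)))
```

The interrupt vectors would need the same treatment, since an interrupt can arrive while the RAM bank is mapped in.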