Author Topic: Xanadu II Translation Development Blog (Read 47477 times)

elmer · « **Reply #45 on:** September 24, 2015, 02:25:52 PM »

Quote from: Bonknuts on September 24, 2015, 05:38:55 AM

Offtopic:

Not really, I'd say ... it's a "blog" about the programmer's side of trying to get a translation done.

I'm trying to give people an insight into what-it-is that the programmer has to do to get the "translator's" work into a game.

That might give people a better idea of why most translations need a programmer's help, and why these things often get stalled or abandoned.

IMHO anything programming-related is fair-game. Even if it doesn't directly apply to a game, it'll probably apply to the tools that may need to be written.

Quote

I learned C++ first, or C with OOP (back in '96), and then switched over to straight C. To this day, I still don't know what all the fuss is about with classes and objects. I should know this, being a computer science major, but I won't be taking any CS courses until after my gen eds are out of the way.

I'm sure that you'll be taught all sorts of wonderful stories about how OOP is wonderful, and how to program "properly" ... usually by someone that's never worked on a large software project, or had to maintain someone else's code.

Be skeptical!

IMHO, there are some really good things that you can do with OOP techniques, but if you get "religious" about it, you can easily overcomplicate things.

This site is a fun read after you've had someone evangelizing the benefits of C++ to you ... http://yosefk.com/c++fqa/

Personally, I mostly write C code with the C++ compiler, but there are just some things that are much easier to express with classes.

But, like most old game devs, I avoid std:: and templates like the plague!

Quote

The only thing that annoyed me with C (C99), was that I couldn't create a struct, which contained an array and a set of pointers with offsets into that array. The runtime thingy in C, or whatever it's called, won't initialize the pointers. I don't see why not; those pointers should be relative to the array in which they all belong (the struct).

I'd be curious to see what you were trying to accomplish with that struct?

TailChao · « **Reply #46 on:** September 25, 2015, 07:00:19 AM »

Quote from: elmer on September 24, 2015, 02:25:52 PM

Not really, I'd say ... it's a "blog" about the programmer's side of trying to get a translation done.

Just wanted to chime in and say I'm really enjoying this.

It's extremely helpful to "compare notes" with how others structure their games, especially in regards to scripting and compression.

dshadoff · « **Reply #47 on:** September 25, 2015, 01:39:54 PM »

Quote from: elmer on September 24, 2015, 02:25:52 PM

Personally, I mostly write C code with the C++ compiler, but there are just some things that are much easier to express with classes.

Same here.
A good enough programmer is going to write code that has a beginning, a middle and an end, and is divided into all the necessary pieces neatly - no matter what language. Kind of like the dream that C++ tries to sell.

On the other hand, a careless C++ programmer will write mangled crap anyway, no matter how much of the Kool-Aid they drank.

Quote

But, like most old game devs, I avoid std:: and templates like the plague!

Or the clap. "STD" only meant "sexually transmitted disease" when I was younger. Then C++ came along and made it mean something equally unpalatable.

-Dave

NightWolve · « **Reply #48 on:** September 26, 2015, 03:25:23 PM »

Quote from: dshadoff on September 25, 2015, 01:39:54 PM

Or the clap. "STD" only meant "sexually transmitted disease" when I was younger. Then C++ came along and made it mean something equally unpalatable.

-Dave

HAHAHAHAHAHA!

flame · « **Reply #49 on:** September 29, 2015, 02:28:49 PM »

Quote from: elmer on September 16, 2015, 09:58:24 AM

I looked on RomHacking, and someone there was asking about hacking one of Falcom's PSP games, and it still seemed to be using the FALCOM2 compression scheme.

I guess I did that. It's like halfway between a straight copy of the MIPS code and a functional copy of the algorithm. I'm not smart enough to do anything more. I worked on Nayuta which used a little more complicated version of what you're calling FALCOM2. Trails in the Sky 3 PC was using straight FALCOM2.

I don't claim to be a good programmer. I can only do simple stuff. I like Python and can't understand C code. It has all those brackets {} and things. Also you have to understand C builtin functions and I'm not sure I do. I get that Python is slow. Unless you're doing a decryption algorithm though, it doesn't have to be fast, at least for Romhacking work.

Beginning, middle and end: My programs tend to follow: input, data processing, output. Is that what's meant?

elmer · « **Reply #50 on:** September 30, 2015, 06:08:43 AM »

Quote from: Dicer on September 24, 2015, 06:18:33 AM

I have no idea what most of this means, but it's interesting reading...

I'm just glad if it's not sending everyone to sleep!

***********************

Quote from: TailChao on September 25, 2015, 07:00:19 AM

It's extremely helpful to "compare notes" with how others structure their games, especially in regards to scripting and compression.

I totally agree ... it's one of the big reasons that I'm doing this.

I've often learned a lot from looking at how other people put their games together.

Everyone used to do that back-in-the-day when someone came up with a particularly "clever" effect.

***********************

Quote from: dshadoff on September 25, 2015, 01:39:54 PM

Or the clap. "STD" only meant "sexually transmitted disease" when I was younger. Then C++ came along and made it mean something equally unpalatable.

It's amazing, to me, just how much all that templated junk and the overuse of class inheritance totally kills compiler performance.

I took a look at the Unreal Engine when they made it cheaply available a year or so back.

There's definitely some powerful stuff in there ... but OMG, it was so slow to compile!

It took my old quad-core PC nearly 60 minutes to do a clean compile of their codebase.

It takes the same machine approx 3 minutes to do a clean compile of my last X360 game.

***********************

Quote from: flame on September 29, 2015, 02:28:49 PM

I worked on Nayuta which used a little more complicated version of what you're calling FALCOM2. Trails in the Sky 3 PC was using straight FALCOM2.

Welcome!

Congratulations on getting those games decompressing, it's interesting to hear about the progression of Falcom's compression over the years.

Are those translations released, yet?

Quote

Beginning, middle and end: My programs tend to follow: input, data processing, output. Is that what's meant?

I'm going to guess that Dave is talking about classic "procedural" or "functional" programming.

That's where you can generally look at the source code and see the flow of the program execution.

Once you get too deep into some of the tricks that "object-oriented" programming makes look simple, like event handlers, and lists/trees/etc of objects running update/collision/etc methods, then things quickly become hard to follow, and harder to debug.

When you add in multiple threads on modern systems, then the complexity ramps up by orders-of-magnitude.

Things very quickly get to the point that only the person/people that originally wrote it have any chance of debugging it. And as modern games show, even they can't catch a lot of the problems.

NightWolve · « **Reply #51 on:** September 30, 2015, 06:59:28 AM »

Quote from: flame on September 29, 2015, 02:28:49 PM

I worked on Nayuta which used a little more complicated version of what you're calling FALCOM2. Trails in the Sky 3 PC was using straight FALCOM2.

That's interesting, so they went back to their LZSS stuff and didn't use the general zlib like they did for Ys VI/Felghana/Origin ? Huh. The nice thing with zlib as a standard was you grab the public DLL and you've got both encode/decode functions, you wouldn't just be looking at their decode function in x86 ASM and then have to study it enough to reverse it...

Another thing that Falcom are bastards for was their decision after Ys II Complete to begin splitting the base script file into over a thousand! It varied between 1300 and 1600 text files when it came to Ys VI/Felghana/Origin/etc. ED6 FC used about 600 as I recall. But yeah, it seemed a nightmare at first given you'd have to rebuild all of them and it took me a while to come up with a clever solution to handle it. Which then seemed not so much clever as obvious, just that I'm a slow learner. Heh. I paid a great many penalties in lost time just staring at the screen.

elmer · « **Reply #52 on:** September 30, 2015, 08:05:03 AM »

Quote from: NightWolve on September 30, 2015, 06:59:28 AM

But yeah, it seemed a nightmare at first given you'd have to rebuild all of them and it took me a while to come up with a clever solution to handle it.

I'd be curious to see a snippet from one of those script files, if you'd like to post one.

***********************

I'm beginning the process of dumping the Xanadu 2 scripts now, and it's already showing that the script really is a fully-fledged programming language.

Much to my disappointment, they're also interleaving "script" code with "assembler" code.

That means that I may have to add a complete HuC6280 assembler/disassembler into the translation tool!

At this point I'm very curious as to how Falcom originally wrote this whole thing.

I'd guess that it was all done with a good macro-assembler ... but if so, it was either a custom-developed one in order to deal with their text-encoding system, or they did a lot of cut-n-paste with both SJIS and "encoded" text strings in the same file.

Anyway, here's just a small section of the very first script chunk in the game ...

    $a6a3  .scriptA6A3:
    $a6a3    _enable_8x12_font()
    $a6a5    _set_pen_then_call_then_eol( orange, .scriptAE05 )
    $a6a9    _disable_8x12_font()
    $a6ab    _tst_2b03_x_bnz( $01, $02, .scriptA877 )
    $a6b0    _tst_2b03_x_bnz( $01, $10, .scriptA748 )
    $a6b5    _tst_2b03_x_bnz( $01, $20, .scriptA71B )
    $a6ba    _tst_2b03_x_beq( $20, $01, .scriptA8AB )
    $a6bf    _tst_2b03_x_beq( $20, $02, .scriptA8AB )
    $a6c4    _tst_2b03_x_beq( $20, $04, .scriptA8AB )
    $a6c9    _tst_2b03_x_beq( $20, $08, .scriptA8AB )
    $a6ce    {アリオスさま、準備が整ったようですな。そういえば、航海長が}
    $a6f1    _eol()
    $a6f2    {お話があるとのことです。}
    $a6ff    _wait_for_keypress_then_clear()
    $a700    {後部甲板に行ってみてはいかがですか？}
    $a717    _set_bits_2b03_x( $01, $20 )
    $a71a    _wait_for_keypress_then_end()

NightWolve · « **Reply #53 on:** September 30, 2015, 08:54:38 AM »

Quote from: elmer on September 30, 2015, 08:05:03 AM

Quote from: NightWolve on September 30, 2015, 06:59:28 AM
But yeah, it seemed a nightmare at first given you'd have to rebuild all of them and it took me a while to come up with a clever solution to handle it.
I'd be curious to see a snippet from one of those script files, if you'd like to post one.

Sure, let's pick one from Felghana. So the script was expanded out to 1,766 .XSO files when they used to use 1 or 2 files for Ys I & II Complete... Not every XSO has S-JIS text in it though, so you can eliminate a hundred or so that don't have it. But yeah, I always wondered why they did that, if it was intentional to possibly make the job of fan translation tougher...

So that's after it was ZLIB decoded/decompressed - the files had a .Z extension if compressed.

Here's the whole file:

http://www.mediafire.com/download/hqzh8xd9hedzmdu/TALKRANDOLF.XSO

I dunno if you have a hex editor with S-JIS-to-Unicode mapping to allow easy viewing so I took that little snapshot.

So, I took the easy route here in the aftermath, just scanned for S-JIS lead byte and 2nd byte pairs and loaded that as a string till null, repeat, etc. I escaped having to rebuild any of these files since I came up with the idea of intercepting the print function to crunch the current Japanese string to a CRC32, take that as a 4 byte index and match it to these but return the English replacement I would have next to it in a database record, etc.
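For anyone following along, the scanning step could be sketched roughly like this in Python. This is only an illustration, not the tool that was actually used: the function name `find_sjis_strings` and the "decode must succeed" filter are assumptions made for the example, while the lead-byte and trail-byte ranges are the standard Shift-JIS ones.

```python
def is_lead(b):
    # Standard Shift-JIS lead-byte ranges for double-byte characters.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_trail(b):
    # Standard Shift-JIS trail-byte ranges.
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def find_sjis_strings(data):
    """Return (offset, text) for each null-terminated run that starts with
    a plausible S-JIS lead/trail pair and decodes cleanly (hypothetical helper)."""
    results = []
    i = 0
    while i + 1 < len(data):
        if is_lead(data[i]) and is_trail(data[i + 1]):
            end = data.find(b"\x00", i)
            if end == -1:
                end = len(data)
            try:
                text = data[i:end].decode("shift_jis")
            except UnicodeDecodeError:
                i += 1          # false positive, keep scanning
                continue
            results.append((i, text))
            i = end + 1         # resume after the terminator
        else:
            i += 1
    return results


# Made-up test buffer: two strings with some non-text bytes around them.
blob = (b"\x01\x02" + "アリオス".encode("shift_jis") + b"\x00"
        + b"\x10\x20" + "ダイモス".encode("shift_jis") + b"\x00")
found = find_sjis_strings(blob)
```

A scan like this will happily pick up binary data that just happens to look like S-JIS, so the results would still need checking by eye.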

So in the database, you'd store the FileID, Offset, CRC32, Japanese string, and English string (after your translator did his/her job), etc. and then output that data as arrays in a "C" header file for compilation/usage. One array for the CRC32 and another for the English string, both sorted by CRC32. So when you search for a CRC32 based on what the print function was about to do, the index you find it at is the same index that'll fetch the English string in the other array. Blah blah, you get the idea.

It was pretty cool how it all worked out like a charm. I wondered if there'd be a detectable slowdown in implementing this though, but you couldn't tell the difference in the slightest bit! I did sort the CRC32, English String pairs by the CRC32 so I could use binary search instead of linear 1-to-n max iteration searches as a novice would do, but I'd bet even if I cheaped out and did a basic for loop for a linear search, I still wouldn't have noticed any difference because something like that with only 4,000 to 6,000 4-byte elements shouldn't have been much of a big deal.
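The lookup described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the actual DLL code: the sample string pairs are invented, `translate` is a hypothetical name, and `zlib.crc32` (the common CRC-32) stands in for whichever CRC32 variant the real patch used.

```python
import bisect
import zlib

# Invented (Japanese, English) pairs; the real ones would come from the database.
PAIRS = [
    ("こんにちは", "Hello."),
    ("ありがとう", "Thank you."),
    ("さようなら", "Goodbye."),
]


def crc_of(sjis_bytes):
    return zlib.crc32(sjis_bytes) & 0xFFFFFFFF


# Two parallel arrays, both ordered by CRC32, so that the index found in
# one array fetches the matching entry in the other.
_rows = sorted((crc_of(jp.encode("shift_jis")), en) for jp, en in PAIRS)
CRC_TABLE = [crc for crc, _ in _rows]
ENGLISH_TABLE = [en for _, en in _rows]


def translate(sjis_bytes):
    """Binary-search the CRC of the string about to be printed.
    Returns the English replacement, or None to fall back to the original."""
    crc = crc_of(sjis_bytes)
    i = bisect.bisect_left(CRC_TABLE, crc)
    if i < len(CRC_TABLE) and CRC_TABLE[i] == crc:
        return ENGLISH_TABLE[i]
    return None
```

With only a few thousand entries, the binary search needs about a dozen comparisons per printed string, which fits with there being no noticeable slowdown.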

I DID cause a detectable slowdown in another area though for image replacement, which I never got to correct in a released patch! But that's another paragraph or so in your thread.

elmer · « **Reply #54 on:** October 01, 2015, 05:29:05 AM »

Quote from: NightWolve on September 30, 2015, 08:54:38 AM

But yeah, I always wondered why they did that, if it was intentional to possibly make the job of fan translation tougher...

Hahaha ... they could care less!

They'll have done it for their own reasons, because it made sense at the time.

Probably so that they could have multiple designers working on different parts of the game at the same time.

Quote

I dunno if you have a hex editor with S-JIS-to-Unicode mapping to allow easy viewing so I took that little snapshot.

Thanks, I took a quick look at that .xso file.

So there's a bunch of data (and script code?) at the start, and the whole thing ends with the string data.

The string data consists of a table of offsets to each string, and then the strings themselves in regular C format.

That seems like a very standard sort-of-thing for the 32-bit era when writing the game in "C".

It's nice that all of the string data is right at the end of the file ... that would have made it about as trivial to hack/replace as you can possibly get!

But your DLL hack is a really nice solution that avoids changing too many of the original files and bloating up the size of the patch.

Unfortunately, the Xanadu 2 patch is likely to be another "windows-executable" style patch, since almost all of the game data is going to get re-compressed.

I'm still curious what the PCE YsIV data looked like ... the 16-bit era was when developers were still coming up with "creative" solutions in order to fit things into the limited RAM/ROM.

Bonknuts · « **Reply #55 on:** October 02, 2015, 06:04:19 AM »

Do any of the Xanadu games prime the LZ window/buffer before decompression? I can't remember if it was Dracula X or Gate of Thunder, or some other PCECD game, but the game would prime the buffer with a series of values before running the decompression routine. The first back-references in the compressed data would rely on the presence of these values (not just cleared or zero'd data in the buffer).
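To make the "priming" idea concrete, here is a minimal Python sketch of an LZ-style decoder whose window can be preloaded. The token format is made up for the example (a flag byte read LSB-first, 1 = literal, 0 = a two-byte distance/length pair) and is not the format of any of the games mentioned; `lz_decode` and `prime` are hypothetical names.

```python
WINDOW_SIZE = 4096


def lz_decode(data, out_len, prime=b""):
    """Decode a made-up LZ format, optionally priming the window first."""
    window = bytearray(WINDOW_SIZE)   # all zeros unless primed
    out = bytearray()
    pos = 0

    # Priming: the window already "contains" these bytes before any output
    # exists, so the very first match tokens can refer back to them.
    for byte in prime[-WINDOW_SIZE:]:
        window[pos] = byte
        pos = (pos + 1) % WINDOW_SIZE

    i = 0
    while len(out) < out_len:
        flags = data[i]
        i += 1
        for bit in range(8):
            if len(out) >= out_len:
                break
            if flags & (1 << bit):            # literal byte
                pending = [data[i]]
                i += 1
            else:                             # match: 12-bit distance, 4-bit length
                lo, hi = data[i], data[i + 1]
                i += 2
                distance = (lo | ((hi & 0xF0) << 4)) + 1
                pending = [None] * ((hi & 0x0F) + 3)
            for byte in pending:
                if byte is None:              # copy from the window
                    byte = window[(pos - distance) % WINDOW_SIZE]
                out.append(byte)
                window[pos] = byte
                pos = (pos + 1) % WINDOW_SIZE
    return bytes(out[:out_len])


# One match token (distance 8, length 3) followed by the literal "Z".
packed = bytes([0x02, 0x07, 0x00, ord("Z")])
```

Decoding `packed` with the window primed with `b"ABCDEFGH"` gives a different result from decoding it with an empty window, which is why a decompressor that skips the priming step produces garbage at the start of the data.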

elmer · « **Reply #56 on:** October 02, 2015, 09:06:35 AM »

Quote from: Bonknuts on October 02, 2015, 06:04:19 AM

Do any of the Xanadu games prime the LZ window/buffer before decompression?

Not unless I'm totally missing something!

From what I'm seeing, the game loads up a complete 128KB META_BLOCK into RAM, and then when it wants to decompress an 8KB DATA_CHUNK, it maps the appropriate section of the META_BLOCK into $8000-$BFFF, and then decompresses it into $C000-$DFFF.

That memory layout pretty much stops them from using the preload trick.

elmer · « **Reply #57 on:** October 08, 2015, 10:01:10 AM »

It's been a while, so time for an update.

The "script" code seems to be all extracted ... but that's not much use if it can't be modified and replaced.

The problem is that there's a lot of interleaved script code and assembly code ... there's even some bits of "dead" code and script in there!

That makes me absolutely certain that this was all created with a macro-assembler and not a "level editor".

I've written an HuC6280 disassembler and am now running that as part of the script-extraction.

It was actually quite fun to go "old-skool" with that and try to get it as small as possible so that I can have a version of it that runs in-game on the PCE, just like Chris Covell's excellent PCEmon. I think that it should fit into approx 1024 bytes (hopefully less) on a PCE, including instruction cycle counts.
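As a rough idea of what a table-driven disassembler looks like, here is a tiny Python sketch. It is not the disassembler described above: it only knows eight of the 6502/65C02 opcodes that the HuC6280 inherits, the names `OPCODES`, `SIZES` and `disassemble` are invented for the example, and the sample bytes were hand-assembled from a short loop rather than dumped from the game.

```python
# opcode -> (mnemonic, addressing mode); a real table covers every opcode.
OPCODES = {
    0x20: ("jsr", "abs"),
    0x60: ("rts", "imp"),
    0x90: ("bcc", "rel"),
    0x9D: ("sta", "abs_x"),
    0xA9: ("lda", "imm"),
    0xB9: ("lda", "abs_y"),
    0xC0: ("cpy", "imm"),
    0xC8: ("iny", "imp"),
}

# Instruction length in bytes for each addressing mode.
SIZES = {"imp": 1, "imm": 2, "rel": 2, "abs": 3, "abs_x": 3, "abs_y": 3}


def format_operand(mode, operand, addr):
    if mode == "imp":
        return ""
    if mode == "imm":
        return f"#${operand[0]:02x}"
    if mode == "rel":
        # Branch offsets are signed and relative to the next instruction.
        offset = operand[0] - 0x100 if operand[0] >= 0x80 else operand[0]
        return f"${(addr + 2 + offset) & 0xFFFF:04x}"
    word = operand[0] | (operand[1] << 8)      # little-endian address
    suffix = {"abs": "", "abs_x": ",x", "abs_y": ",y"}[mode]
    return f"${word:04x}{suffix}"


def disassemble(code, origin):
    lines = []
    pc = 0
    while pc < len(code):
        addr = origin + pc
        entry = OPCODES.get(code[pc])
        if entry is None or pc + SIZES[entry[1]] > len(code):
            # Unknown or truncated: emit it as a data byte instead.
            lines.append(f"${addr:04x}  _byte( ${code[pc]:02x} )")
            pc += 1
            continue
        mnemonic, mode = entry
        size = SIZES[mode]
        text = format_operand(mode, code[pc + 1:pc + size], addr)
        lines.append(f"${addr:04x}  {mnemonic} {text}".rstrip())
        pc += size
    return lines


# Hand-assembled sample: a short copy loop ending in jsr/rts.
SAMPLE = bytes.fromhex("b9faad 9d0027 a920 20638a c8 c005 90f0 20eb7f 60")
listing = disassemble(SAMPLE, 0xADE6)
```

The hard part in practice is not the table but deciding which bytes are code, script or data in the first place, which a linear sweep like this does not attempt.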

***********************

AFAIK (and I'd love to know if I'm missing some other alternative), there are only 3 basic strategies for changing the text in a translation ...

1. Just overwrite the existing text and only allow strings the same size or shorter than the original.
2. Change the "pointer" to the string to point to your translated string that's somewhere else.
3. Reassemble the original code/script from "source" with the new translated strings, just like the original developers would have done.

Given the lack of free memory in the PCE, I've been thinking that option 3 is probably the best thing to do, especially since it imposes the least limits on the translator.

But the way that Falcom are mixing code and script makes this problematic ... for a start, I've actually got to reverse-engineer the script chunks back into a "source" format that I can either feed to PCEAS, or assemble/compile myself.

That's complicated by not knowing exactly where the code/script/data is in a chunk, and having to try to figure it out from various clues.

***********************

Which leads us on to this example from the very first script chunk.

We've got script that calls an assembly language function, that's next to other code that references a data table, and is followed by yet more script.

That's ugly ... but it's just about OK.

It does mean that I need to output this all in a format that some macro-assembler can handle.

    $ad89    _set_pen_then_call_then_eol( orange, .scriptAE17 )
    $ad8d    {アイアイサー！}
    $ad94    _call_asm_from_script( .codeADFF )
    $ad97    _wait_for_keypress_then_end()
    .....
    $ade6  .codeADE6:
    $ade6    lda .dataADFA,y
    $ade9    sta $2700,x
    $adec    lda #$20
    $adee    jsr $8a63
    $adf1    iny
    $adf2    cpy #$05
    $adf4    bcc .codeADE6
    $adf6    jsr $7feb
    $adf9    rts
    $adfa  .dataADFA:
    $adfa    _byte( $08 )
    $adfb    _byte( $00 )
    $adfc    _byte( $0a )
    $adfd    _byte( $00 )
    $adfe    _byte( $0d )
    $adff  .codeADFF:
    $adff    lda #$01
    $ae01    trb $2c00
    $ae04    rts
    $ae05  .scriptAE05: ; 8x12 font
    $ae05    {ダイモス}
    $ae09    _end()

***********************

Next up, here's some old-fashioned self-modifying code with a jump table.

This disassembly has been hand-tweaked, because I still need to write a specific disassembler-helper function to actually get this kind of code into a usable format.

    $a442  .codeA442:
    $a442    lda $26c0,x
    $a445    asl a
    $a446    tay
    $a447    lda .tableA487+0,y
    $a44a    sta .dataA454
    $a44d    lda .tableA488+1,y
    $a450    sta .dataA455
    $a453    jmp $0000
    $a487  .tableA487:
    $a487    _eptr( .codeA456 )
    $a489    _eptr( .codeA460 )
    $a48b    _eptr( .codeA469 )
    $a48d    _eptr( .codeA473 )

***********************

So ... there's definitely progress, but it's slow going.

NightWolve · « **Reply #58 on:** October 08, 2015, 12:48:14 PM »

Yeah, games like Xak III had 16-bit pointers, sometimes a bit before the text block, sometimes after, so I would load the array after spotting it, and could then recompute each pointer to pack as much English text back into the text block, so you weren't limited by the original string size, you just had to mind the whole text block size and not go over it. In this way, you pretty much were able to fit accurate translations for every string in the block and not have to trim them to the point where loss of quality had to occur. (At least, that was the experience with Xak III.)
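The pointer-recomputing approach boils down to something like the following Python sketch. The details here are assumptions for illustration only: little-endian 16-bit pointers, a pointer table that holds absolute addresses, zero padding, and the name `repack_text_block` are all made up, and the real Xak III layout may differ.

```python
import struct


def repack_text_block(strings, block_base, block_size, encoding="ascii"):
    """Pack null-terminated strings back to back and rebuild the pointers.

    block_base is the address that the text block is loaded at, and
    block_size is the original size of the block, which must not be exceeded.
    Returns (pointer_table_bytes, text_block_bytes).
    """
    block = bytearray()
    pointers = []
    for s in strings:
        pointers.append(block_base + len(block))   # where this string starts
        block += s.encode(encoding) + b"\x00"
    if len(block) > block_size:
        raise ValueError(
            f"text is {len(block) - block_size} bytes too big for the block")
    block += b"\x00" * (block_size - len(block))   # pad to the original size
    table = struct.pack(f"<{len(pointers)}H", *pointers)
    return table, bytes(block)


table, block = repack_text_block(["Hi", "Hello"], 0x8000, 16)
```

Individual strings can grow or shrink freely; only the total has to fit, which is why the translator rarely has to trim anything.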

With a compressed text block, it's a whole other beast in how it operates. As far as I know, the way it works is the game code specifies an index based on the string that it wants at the time. So, if it wants the 5th string in the block, it specifies say 4 (if we're starting at 0), and the game keeps decompressing while counting the 0/null terminators. Once it has counted the 4th null terminator, that's the end of the 4th string and the start of the 5th, so it knows to finish off with that string and then stop decompressing any further into the block. Something like that.
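In Python terms, the index-by-counting-terminators idea looks roughly like this. It is a sketch of the general technique as described, not the game's routine: `nth_string` is a hypothetical name, and a plain byte iterator stands in for the decompressor's output stream.

```python
def nth_string(decompressed, index):
    """Return the index-th (0-based) null-terminated string from a stream
    of decompressed bytes, stopping as soon as that string is complete."""
    nulls_seen = 0
    current = bytearray()
    for byte in decompressed:
        if nulls_seen == index:
            if byte == 0:
                return bytes(current)     # done, no need to go further
            current.append(byte)
        elif byte == 0:
            nulls_seen += 1               # skipped past one more string
    raise IndexError("string index is past the end of the block")


# A plain iterator stands in for the decompressor here.
stream = iter(b"one\x00two\x00three\x00")
wanted = nth_string(stream, 1)
left_over = bytes(stream)                 # bytes that never got "decompressed"
```

The cost is that fetching a string near the end of the block means decompressing everything in front of it, every single time.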

EDIT:

Quote from: elmer on October 01, 2015, 05:29:05 AM

I'm still curious what the PCE YsIV data looked like ... the 16-bit era was when developers were still coming up with "creative" solutions in order to fit things into the limited RAM/ROM.

Oh right, about your Ys IV question, you basically saw it in that image of S-JIS. A decompressed text block was just null-terminated S-JIS text, that's it! No switching tricks with half-width characters, hiragana, etc. and what not. Just all S-JIS all the time... The game that uses switching tricks is Emerald Dragon and David did extensive work to decode it all to where what SamIAm and I see is S-JIS which was converted to Unicode for easier viewing on a Windows desktop.

elmer · « **Reply #59 on:** October 09, 2015, 06:33:37 AM »

Quote from: NightWolve on October 08, 2015, 12:48:14 PM

Yeah, games like Xak III had 16-bit pointers, sometimes a bit before the text block, sometimes after, so I would load the array after spotting it, and could then recompute each pointer to pack as much English text back into the text block, so you weren't limited by the original string size, you just had to mind the whole text block size and not go over it.

It's so nice when a developer uses a nice-and-simple scheme like that, it really makes a programmer's life so much easier.

The Zeroigar scripts were basically like that.

Quote

With a compressed text block, it's a whole other beast in how it operates. As far as I know, the way it works is the game code specifies an index based on the string that it wants at the time.

Now that's just plain slow and fugly!

I've not seen that trick done before.

I can just-about imagine that being used for a HuCard game on the PCE (because of its limited memory), but it's horrible!

At least it should be fairly easy to translate since you've only got to worry about overall size of the complete block of compressed data.

Quote

Oh right, about your Ys IV question, you basically saw it in that image of S-JIS. A decompressed text block was just null-terminated S-JIS text, that's it! No switching tricks with half-width characters, hiragana, etc. and what not. Just all S-JIS all the time...

That was nice of them.

Quote

The game that uses switching tricks is Emerald Dragon and David did extensive work to decode it all to where what SamIAm and I see is S-JIS which was converted to Unicode for easier viewing on a Windows desktop.

Haha ... yes, you definitely want to hide the behind-the-scenes lunacy away from the poor translator.

Xanadu 2 uses a byte-to-sjis conversion table ... actually 2 of them, 1 for 12x12 glyphs and 1 for 8x12 glyphs.
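A conversion-table scheme of that kind can be sketched like this in Python. The table contents below are invented purely for illustration and are not the game's real tables, and `GLYPHS_12X12`, `GLYPHS_8X12` and `to_sjis` are hypothetical names.

```python
# Invented tables: game byte -> character. The real tables live in the game
# data, with one table for each font size.
GLYPHS_12X12 = {0x01: "ア", 0x02: "リ", 0x03: "オ", 0x04: "ス"}
GLYPHS_8X12 = {0x01: "ダ", 0x02: "イ", 0x03: "モ", 0x04: "ス"}


def to_sjis(encoded, table):
    """Expand one-byte game codes into Shift-JIS bytes using the given table.
    Raises KeyError if a byte has no entry in the table."""
    out = bytearray()
    for byte in encoded:
        out += table[byte].encode("shift_jis")
    return bytes(out)


# The same encoded bytes mean different text depending on the active font.
raw = bytes([0x01, 0x02, 0x03, 0x04])
big = to_sjis(raw, GLYPHS_12X12).decode("shift_jis")
small = to_sjis(raw, GLYPHS_8X12).decode("shift_jis")
```

This is why the extraction tool has to track which font is active at each point in the script before it can show the translator anything readable.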

***********************

Anyway ... back to Xanadu 2.

There are 8 really large script-chunks that I've been concerned about, because they're nearly 8KB big, but only seemed to contain about 1KB of "script".

That immediately made me concerned that I was missing something important.

Now that I've finally been able to disassemble the whole chunk, it turns out that I wasn't missing much, and that there really is a lot of (ugly) code in those particular chunks.

They're the ones that handle the 8 different Weapon Shops in the game.

The good news is that this all means that it's time to write the insertion tools and start testing a chunk with some real translated text.
