Topic: PC-FX homebrew development.

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #210 on: March 22, 2016, 03:04:44 PM »
Using eris_low_scsi_status() freezes the system. Every. Single. Time. No matter what's going on... even if nothing's going on.

All that I can say is that Alex's SCSI and SCSI_DMA examples seem to work ... well, they do in Mednafen; I've not tried them on my PC-FXGA.

That suggests that he's got the basics right ... but he may not have provided all the functions that you need (i.e. a REQUEST SENSE function).


On my side of things ... GCC is now correctly passing the varargs to "sprintf" ... where it all just dies.  ](*,)

So, it turns out that I'd fixed the varargs stack allocation, but I still had an error in the code-generation that actually saved the varargs themselves to the stack.  #-o

With that fixed, then modifying Alex's "hello" example's printing code to ...

        printstr("Hello World!", 10, 0x20, 1);
        printstr("Love, NEC", 11, 0x38, 0);
        i = sprintf(str, "Eat %X!", 0xdeadbeef);
        printstr(str, ((32 - i) / 2), 0x48, 0);

... gives ...

[Screenshot of the output; image not preserved]

That's using "strlen" and "sprintf" ... and "sprintf" requires a "malloc", so we've got functional memory-allocations, too!  :D

It's "Miller Time" ... but with something that actually has some flavor, instead.  :wink:
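For anyone following along, the centering arithmetic in that snippet and the variadic hand-off the compiler fix was about can both be sketched in portable C. This is a hypothetical helper, not part of Alex's examples or any PC-FX library. It assumes a 32-column text line, as the ((32 - i) / 2) expression suggests, and it formats through vsnprintf so that the arguments travel through va_start/va_arg, which is the machinery that breaks when the compiler saves varargs to the wrong stack slots.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not from Alex's examples): format into buf and
   return the column that centers the text on a line `width` characters
   wide, or -1 if the text was truncated or could not be formatted. */
int fmt_centered(char *buf, size_t size, int width, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);      /* relies on the callee having saved its varargs */
    len = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (len < 0 || (size_t)len >= size)
        return -1;          /* encoding error, or output truncated */
    if (len >= width)
        return 0;           /* too wide to center: start at column 0 */
    return (width - len) / 2;
}
```

For example, fmt_centered(str, sizeof str, 32, "Eat %X!", 0xdeadbeefu) leaves "Eat DEADBEEF!" (13 characters) in str and returns 9, the same column that ((32 - i) / 2) computes.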

nodtveidt

  • Guest
Re: PC-FX homebrew development.
« Reply #211 on: March 22, 2016, 03:43:56 PM »
Yes, they work... but unfortunately, they explain nada. He never once uses eris_low_scsi_status() in his examples, although I am seeing that the function is being called in the assembly source in several other functions.

I am not sure what eris_low_scsi_data_in() does. Perhaps this is how to get the return values from things like REQUEST SENSE. Of course, the docs offer absolutely no context... as usual. Oh well... only one way to find out, I guess.
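For reference: on a drive that follows standard SCSI-2, the reply to REQUEST SENSE (opcode 0x03) is transferred during the DATA IN phase, so a "data in" routine is a plausible place for it to arrive. Whether eris_low_scsi_data_in() is that routine isn't confirmed anywhere in this thread. Once the bytes are in a buffer, a minimal decoder looks like this, assuming the standard fixed sense format (response code 0x70 or 0x71); the struct and function names are made up for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* The fields of SCSI fixed-format sense data that usually tell you what
   went wrong.  Assumes standard SCSI-2 layout; names are hypothetical. */
struct sense_info {
    uint8_t key;    /* sense key, e.g. 0x2 NOT READY, 0x5 ILLEGAL REQUEST */
    uint8_t asc;    /* additional sense code (byte 12) */
    uint8_t ascq;   /* additional sense code qualifier (byte 13) */
};

/* Decode a REQUEST SENSE reply.  Returns 0 on success, or -1 if the buffer
   is too short or is not fixed-format sense data. */
int parse_fixed_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
    uint8_t code;

    if (len < 14)                   /* need everything up to byte 13 */
        return -1;
    code = buf[0] & 0x7F;           /* bit 7 is the VALID flag */
    if (code != 0x70 && code != 0x71)
        return -1;                  /* 0x70 = current, 0x71 = deferred */

    out->key  = buf[2] & 0x0F;
    out->asc  = buf[12];
    out->ascq = buf[13];
    return 0;
}
```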

Quote from: elmer
That's using "strlen" and "sprintf" ... and "sprintf" requires a "malloc", so we've got functional memory-allocations, too!  :D

It's "Miller Time" ... but with something that actually has some flavor, instead.  :wink:
Haha :D well that's awesome... good string functions are always nice to have. Hopefully this doesn't introduce too much overhead. And root beer, please... :lol:

EDIT: Because REQUEST_SENSE requires me to use eris_low_scsi_command(), of course actually using it crashes the machine after I've already sent 0x48. Something is clearly jamming up SCSI when I tell it to play an audio track.
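For comparison, here are the two commands as standard SCSI-2 command descriptor blocks. REQUEST SENSE is a 6-byte CDB with opcode 0x03 and the allocation length in byte 4. PLAY AUDIO TRACK/INDEX is a 10-byte CDB with opcode 0x48, which may be the 0x48 mentioned above. This sketch only builds the bytes. It assumes the PC-FX drive accepts the standard layout, the helper names are hypothetical, and the eris_low_* call that would send them isn't shown because its exact signature isn't covered here.

```c
#include <stdint.h>
#include <string.h>

/* Assumes standard SCSI-2 CDB layouts; helper names are hypothetical. */

/* REQUEST SENSE: 6-byte CDB, opcode 0x03, byte 4 = allocation length. */
void build_request_sense(uint8_t cdb[6], uint8_t alloc_len)
{
    memset(cdb, 0, 6);
    cdb[0] = 0x03;
    cdb[4] = alloc_len;     /* 18 bytes covers standard fixed-format sense */
}

/* PLAY AUDIO TRACK/INDEX: 10-byte CDB, opcode 0x48.
   Bytes 4/5 = starting track/index, bytes 7/8 = ending track/index. */
void build_play_audio_ti(uint8_t cdb[10], uint8_t start_track,
                         uint8_t start_index, uint8_t end_track,
                         uint8_t end_index)
{
    memset(cdb, 0, 10);
    cdb[0] = 0x48;
    cdb[4] = start_track;
    cdb[5] = start_index;
    cdb[7] = end_track;
    cdb[8] = end_index;
}
```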

EDIT2: ...and if I send a REQUEST_SENSE before I send the audio play command, it *also* jams up the system. What. The. f*ck.

EDIT3: I commented out the eris_low_scsi_data_in() line... which was returning 0 anyway so I am pretty sure that this is *not* how to get return results. The system crashes when I send a second SCSI command. I'm no expert here but I am positive at this point that this is exactly where the flaw is. Something to do with using eris_low_scsi_command() from within the C code is the culprit. Using eris_low_scsi_reset() does, of course, reset the SCSI subsystem so additional commands can be sent... but I am 99.9999% positive that this is not the way you're supposed to have to do things. The only other thing I can think of is something to do with the SCSI phase... although it appears that eris_low_scsi_command() is already waiting for the correct phase... AAAAAAAAAAAAAA
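On the phase question: on a standard SCSI bus the target signals the current phase on the MSG, C/D and I/O lines, and a command is expected to run COMMAND, then any DATA IN/OUT, then STATUS (one status byte), then MESSAGE IN (COMMAND COMPLETE, 0x00) before the bus goes free. If the status and message bytes are never read, the target may still be holding the bus when the next command starts, which would look a lot like the hang described above. Whether that is what happens on the PC-FX is not confirmed. A sketch of the phase decode, with the signals passed in as plain flags because the PC-FX register bit layout isn't given in this thread:

```c
/* SCSI information-transfer phases, as signalled by the target.
   The MSG/C/D/I/O values come in as flags; the PC-FX register layout
   is not assumed here. */
enum scsi_phase {
    PHASE_DATA_OUT, PHASE_DATA_IN, PHASE_COMMAND, PHASE_STATUS,
    PHASE_RESERVED, PHASE_MESSAGE_OUT, PHASE_MESSAGE_IN
};

/* Decode the phase from the MSG, C/D and I/O lines (non-zero = asserted). */
enum scsi_phase scsi_decode_phase(int msg, int cd, int io)
{
    static const enum scsi_phase table[8] = {
        PHASE_DATA_OUT,     /* MSG=0 C/D=0 I/O=0 */
        PHASE_DATA_IN,      /* MSG=0 C/D=0 I/O=1 */
        PHASE_COMMAND,      /* MSG=0 C/D=1 I/O=0 */
        PHASE_STATUS,       /* MSG=0 C/D=1 I/O=1 */
        PHASE_RESERVED,     /* MSG=1 C/D=0 I/O=0 */
        PHASE_RESERVED,     /* MSG=1 C/D=0 I/O=1 */
        PHASE_MESSAGE_OUT,  /* MSG=1 C/D=1 I/O=0 */
        PHASE_MESSAGE_IN    /* MSG=1 C/D=1 I/O=1 */
    };
    return table[((msg != 0) << 2) | ((cd != 0) << 1) | (io != 0)];
}
```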
« Last Edit: March 22, 2016, 04:04:54 PM by The Old Rover »

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #212 on: March 22, 2016, 05:09:27 PM »
Quote from: nodtveidt
Yes, they work... but unfortunately, they explain nada. He never once uses eris_low_scsi_status() in his examples, although I am seeing that the function is being called in the assembly source in several other functions.

I just had a quick look, and the impression that I'm getting is that the "eris_low_*" functions really aren't supposed to be called by themselves from C.

It's hard to tell, because the assembly code is very hard to read: he hasn't bothered to use constants or sensible labels, or to add comments explaining what's going on.

I can see that there's going to have to be some major re-writing going on.

Arkhan

  • Hero Member
  • *****
  • Posts: 14142
  • Fuck Elmer.
    • Incessant Negativity Software
Re: PC-FX homebrew development.
« Reply #213 on: March 22, 2016, 10:16:31 PM »
This reminds me, that I really need to make some sort of effort to setup that PCFXGA crap.

lol.

[Fri 19:34]<nectarsis> been wanting to try that one for awhile now Ope
[Fri 19:33]<Opethian> l;ol huge dong

I'm a max level Forum Warrior.  I'm immortal.
If you're not ready to defend your claims, don't post em.

nodtveidt

  • Guest
Re: PC-FX homebrew development.
« Reply #214 on: March 23, 2016, 12:26:16 AM »
Quote from: elmer
I can see that there's going to have to be some major re-writing going on.
I'll take the plunge and attempt to learn V810 assembly better then... looks like we've got a massive project on our hands here and it's gonna take some combined brainpower.

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #215 on: March 23, 2016, 03:31:06 AM »
Quote from: Arkhan
This reminds me, that I really need to make some sort of effort to setup that PCFXGA crap.

Bueller? Bueller? Bueller?


Quote from: nodtveidt
I'll take the plunge and attempt to learn V810 assembly better then... looks like we've got a massive project on our hands here and it's gonna take some combined brainpower.

To be fair to Alex ... I suspect that at-least-some of the reason for his sparse assembly-code was just working with the old GAS assembler from binutils-2.10.

The GNU folks put a lot of work into making GAS more "programmer-friendly" over the last 15 years.

If you've programmed any CPU in assembler before, then I think that you'll find it really pleasant, especially if you've tried to read other early-RISC assembly, like MIPS or SH2.

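The big thing that initially seems "weird" to folks that are used to 6502/Z80/68000/x86 is that the arithmetic instructions never take memory operands. Everything is register-to-register (or register-and-immediate), and you have to load/save registers to memory explicitly, using a single base-register-plus-16-bit-displacement addressing mode (e.g. "ld.b 0[r7],r10").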

It makes up for that by having lots of registers (32 of them, with r0 hardwired to zero), so that you really don't need to load/save "temporary" stuff inside a function very often. And the "big win" is that most instructions run in 1 cycle (effective throughput, with a 5-stage pipeline).

With the CPU running at 21MHz, with mostly-single-cycle instructions, it's one-heck-of-a-lot faster than the 7MHz HuC6280 with its 2-to-7-cycle instructions.
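To put rough numbers on that, using the rounded clock rates above and ignoring pipeline stalls and memory wait-states (so these are peak figures, not measurements):

```latex
\begin{aligned}
\text{V810:}    &\quad \frac{21\,\text{MHz}}{1\ \text{cycle/instr}} = 21\ \text{M instr/s}\\
\text{HuC6280:} &\quad \frac{7\,\text{MHz}}{7\ \text{cycles/instr}} = 1\ \text{M instr/s}
  \quad\text{to}\quad \frac{7\,\text{MHz}}{2\ \text{cycles/instr}} = 3.5\ \text{M instr/s}\\
\text{ratio:}   &\quad \frac{21}{3.5} = 6 \quad\text{to}\quad \frac{21}{1} = 21
\end{aligned}
```

That is 6x to 21x more instructions per second, before counting that each V810 instruction works on 32 bits rather than 8.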

Well ... until you hit a pipeline-stall, anyway. The docs from the Nintendo Seminar that I sent you do a good job of explaining the basic theory, and the (few) pipeline-stall conditions.

I already posted an example of what my V810 assembly-code looks like, taking advantage of running the source-code through the C-preprocessor to provide "macro" capability to the GNU assembler ...

http://www.pcenginefx.com/forums/index.php?topic=19619.msg421022#msg421022

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #216 on: March 27, 2016, 05:44:35 AM »
FWIW, I'd forgotten what it was like to program in C on some of these old architectures.

GCC on the PC-FX really, really, really doesn't like working with 8-bit or 16-bit local/global/struct variables.

You're much, much, much better off declaring everything as an "int" or "unsigned", or "int32_t" and "uint32_t" if you care about the sizes.

This is one of those cases where "portable" C code ... isn't. It'll cripple your performance.  #-o
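One way to keep the memory savings of small types without paying for them in arithmetic is to use the narrow types only for storage (arrays and struct fields) and to do the work in int locals. A minimal sketch, assuming a hypothetical sprite record and arbitrary clamp bounds; the point is the pattern, and how much masking a given GCC version actually removes isn't guaranteed.

```c
#include <stdint.h>

/* Hypothetical sprite record: narrow fields keep big tables small. */
struct sprite {
    int16_t x, y;
    int16_t dx, dy;
};

/* Do the arithmetic in int locals, then narrow once when storing back. */
void move_sprites(struct sprite *s, unsigned count)
{
    unsigned i;

    for (i = 0; i < count; i++) {
        int x = s[i].x + s[i].dx;   /* 16-bit loads widen to int here */
        int y = s[i].y + s[i].dy;

        /* Clamp in full-width arithmetic (bounds are arbitrary examples). */
        if (x < 0)   x = 0;
        if (x > 255) x = 255;
        if (y < 0)   y = 0;
        if (y > 239) y = 239;

        s[i].x = (int16_t)x;        /* one narrowing store per field */
        s[i].y = (int16_t)y;
    }
}
```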

nodtveidt

  • Guest
Re: PC-FX homebrew development.
« Reply #217 on: March 27, 2016, 08:13:44 AM »
In cycle-critical applications, such as a really busy action game, a good coder knows that to get top performance, you use the most CPU-efficient variable type and you give not one shit about how much space it takes up in memory. If using ints is the fastest, then you use ints... bottom line. :)

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #218 on: March 30, 2016, 12:37:19 PM »
I took a quick look at the Virtual Boy's "libgccvb" source code, and was surprised to see so many uses of "u8" and "u16" in the code.

The V810 CPU was designed to handle 32-bit variables ... and it doesn't do any arithmetic operations on 16-bit or 8-bit values.
That means that the compiler needs to do a lot of masking/sign-extending when it's asked to deal with 16-bit or 8-bit variables, just so that it keeps the results correct within the limits of 16-bit or 8-bit rounding.
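The reason for the masking is that C requires a 16-bit (or 8-bit) unsigned variable to wrap exactly as a 16-bit (or 8-bit) value would, even though the V810 only has 32-bit registers. A small host-side illustration of the behaviour the compiler has to reproduce, with uint16_t standing in for libgccvb's u16 (the "andi 65535" instructions in the listings below are what implement it):

```c
#include <stdint.h>

/* Increment a 16-bit counter: C requires the result to wrap at 65536, so a
   32-bit CPU must mask the register (andi 65535) after the add. */
uint16_t inc_u16(uint16_t i)
{
    return (uint16_t)(i + 1);
}

/* The same operation on a full-width unsigned needs no extra masking,
   because on a 32-bit target the type is exactly as wide as the register. */
unsigned inc_unsigned(unsigned i)
{
    return i + 1;
}
```

So a u16 counter at 65535 must come back as 0, whereas an unsigned counter simply carries on to 65536.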

You really should be using "int" and "unsigned" as much as possible, and avoid "short" and "char" variables.

I thought that it would be interesting to see how the different GCC compiler versions compile a couple of simple C functions.

In each case, the original libgccvb version is first, and then 1 or 2 versions replacing the "u16" and "u8" variables with "unsigned" instead.

It seems strange to me that GCC 4.4.2 is doing such a relatively poor job compared to GCC 2.9.5 or GCC 4.7.4. I wonder what went wrong?

All examples are compiled with "-O2 -fomit-frame-pointer".


Code:
****************************************************************************************
****************************************************************************************

void copymem (u8* dest, const u8* src, u16 num)
{
  u16 i;
  for (i = 0; i < num; i++) {
    *dest++ = *src++;
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_copymem: andi 65535,r8,r8    _copymem: andi 65535,r8,r8    _copymem: andi 65535,r8,r8
          be .L1                        mov 0,r10                     be .L4
          addi -1,r8,r11                cmp r8,r10                    mov 0,r10
          andi 65535,r11,r11            bnl .L4             .L3:      mov r7,r11
          add 1,r11           .L6:      add 1,r10                     add r10,r11
          add r6,r11                    ld.b 0[r7],r11                ld.b 0[r11],r12
.L3:      ld.b 0[r7],r10                andi 65535,r10,r10            mov r6,r11
          add 1,r7                      add 1,r7                      add r10,r11
          st.b r10,0[r6]                st.b r11,0[r6]                add 1,r10
          add 1,r6                      add 1,r6                      st.b r12,0[r11]
          cmp r11,r6                    cmp r8,r10                    andi 65535,r10,r11
          bne .L3                       bl .L6                        cmp r11,r8
.L1:      jmp [r31]           .L4:      jmp [r31]                     bh .L3
                                                            .L4:      jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void copymem2 (u8* dest, const u8* src, unsigned num)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = *src++;
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_copymem2:mov r6,r11          _copymem2:mov 0,r11           _copymem2:cmp r0,r8
          add r8,r11                    cmp r8,r11                    be .L10
          cmp 0,r8                      bnl .L10                      mov 0,r10
          be .L7              .L12:     ld.b 0[r7],r10      .L9:      mov r7,r11
.L11:     ld.b 0[r7],r10                add 1,r11                     add r10,r11
          add 1,r7                      add 1,r7                      ld.b 0[r11],r12
          st.b r10,0[r6]                st.b r10,0[r6]                mov r6,r11
          add 1,r6                      add 1,r6                      add r10,r11
          cmp r11,r6                    cmp r8,r11                    st.b r12,0[r11]
          bne .L11                      bl .L12                       add 1,r10
.L7:      jmp [r31]           .L10:     jmp [r31]                     cmp r10,r8
                                                                      bh .L9
                                                            .L10:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem (u8* dest, const u8* src, u16 num, u8 offset)
{
  u16 i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem:  andi 65535,r8,r8    _addmem:  andi 65535,r8,r8    _addmem:  andi 65535,r8,r8
          andi 255,r9,r9                mov 0,r11                     andi 255,r9,r9
          cmp 0,r8                      andi 255,r9,r9                cmp r0,r8
          be .L13                       cmp r8,r11                    be .L20
          addi -1,r8,r11                bnl .L22                      mov 0,r10
          andi 65535,r11,r11  .L24:     mov r9,r10          .L19:     mov r7,r11
          add 1,r11                     add 1,r11                     add r10,r11
          add r6,r11                    ld.b 0[r7],r12                ld.b 0[r11],r12
.L15:     ld.b 0[r7],r10                andi 65535,r11,r11            mov r6,r11
          add 1,r7                      add r12,r10                   add r10,r11
          add r9,r10                    add 1,r7                      add r9,r12
          st.b r10,0[r6]                st.b r10,0[r6]                add 1,r10
          add 1,r6                      add 1,r6                      st.b r12,0[r11]
          cmp r11,r6                    cmp r8,r11                    andi 65535,r10,r11
          bne .L15                      bl .L24                       cmp r11,r8
.L13:     jmp [r31]           .L22:     jmp [r31]                     bh .L19
                                                            .L20:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem2 (u8* dest, const u8* src, unsigned num, u8 offset)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem2: mov r6,r11          _addmem2: mov 0,r12           _addmem2: andi 255,r9,r9
          andi 255,r9,r9                andi 255,r9,r9                cmp r0,r8
          add r8,r11                    cmp r8,r12                    be .L20
          cmp 0,r8                      bnl .L22                      mov 0,r10
          be .L18             .L24:     mov r9,r10          .L19:     mov r7,r11
.L22:     ld.b 0[r7],r10                ld.b 0[r7],r11                add r10,r11
          add 1,r7                      add 1,r12                     ld.b 0[r11],r12
          add r9,r10                    add r11,r10                   mov r6,r11
          st.b r10,0[r6]                add 1,r7                      add r10,r11
          add 1,r6                      st.b r10,0[r6]                add r9,r12
          cmp r11,r6                    add 1,r6                      st.b r12,0[r11]
          bne .L22                      cmp r8,r12                    add 1,r10
.L18:     jmp [r31]                     bl .L24                       cmp r10,r8
                              .L22:     jmp [r31]                     bh .L19
                                                            .L20:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem3 (u8* dest, const u8* src, unsigned num, unsigned offset)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem3: cmp 0,r8            _addmem3: mov 0,r12           _addmem3: cmp r0,r8
          be .L24                       cmp r8,r12                    be .L25
          andi 255,r9,r9                bnl .L28                      andi 255,r9,r9
          add r6,r8           .L30:     mov r9,r10                    mov 0,r10
.L26:     ld.b 0[r7],r10                ld.b 0[r7],r11      .L24:     mov r7,r11
          add 1,r7                      add 1,r12                     add r10,r11
          add r9,r10                    add r11,r10                   ld.b 0[r11],r12
          st.b r10,0[r6]                add 1,r7                      mov r6,r11
          add 1,r6                      st.b r10,0[r6]                add r10,r11
          cmp r8,r6                     add 1,r6                      add r9,r12
          bne .L26                      cmp r8,r12                    st.b r12,0[r11]
.L24:     jmp [r31]                     bl .L30                       add 1,r10
                              .L28:     jmp [r31]                     cmp r10,r8
                                                                      bh .L24
                                                            .L25:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************
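If you're stuck with a compiler that doesn't find the tight loop on its own (like GCC 4.4.2 above), one option is to write the loop in the shape GCC 4.7.4 produced: precompute an end pointer and compare pointers, so there is no separate counter to update or mask. A sketch with hypothetical names; whether a particular GCC version then emits the 6-instruction inner loop shown above isn't guaranteed.

```c
#include <stdint.h>

/* Copy num bytes using an end pointer instead of an index, matching the
   shape of the GCC 4.7.4 copymem2 loop above. */
void copymem_end(uint8_t *dest, const uint8_t *src, unsigned num)
{
    uint8_t *end = dest + num;

    while (dest != end)
        *dest++ = *src++;
}

/* The same idea with an offset added while copying (cf. addmem3). */
void addmem_end(uint8_t *dest, const uint8_t *src, unsigned num,
                unsigned offset)
{
    uint8_t *end = dest + num;

    while (dest != end)
        *dest++ = (uint8_t)(*src++ + offset);   /* wraps modulo 256 */
}
```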
« Last Edit: March 30, 2016, 12:40:36 PM by elmer »

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #219 on: April 01, 2016, 06:56:21 AM »
I think that I have figured out how to let GCC know that the "ld.b" and "ld.h" instructions already sign-extend the value they load into a full 32-bit int.

Here are a couple of examples of how it affects the code for newlib's "strlen" function, followed by some variations on it.

The variations change the comparison in "strlen" so that the compiler can't just short-cut the check for zero, which shows how the generated code changes when things get a little bit more complex.

The thing to pay particular attention to is the number of instructions in the inner loop.

It shows, again, that if you choose to use C on a processor like the V810, then there are definitely tricks to know that will improve the code-generation.
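For anyone who hasn't met it: the "shl 24 / sar 24" pair in the GCC 4.4.2 columns below is a manual sign-extension of the low byte, which is redundant because ld.b has already sign-extended the byte while loading it. The same operation in portable C, sketched with the xor-and-subtract idiom so that it doesn't rely on implementation-defined signed shifts:

```c
/* Sign-extend the low 8 bits of v to a full int; this is what the
   "shl 24,rX / sar 24,rX" pair computes on a 32-bit register. */
int sext8(unsigned v)
{
    return (int)((v & 0xFFu) ^ 0x80u) - 0x80;
}

/* The same for 16-bit values (the "shl 16 / sar 16" pair). */
int sext16(unsigned v)
{
    return (int)((v & 0xFFFFu) ^ 0x8000u) - 0x8000;
}
```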

Code:
****************************************************************************************
****************************************************************************************

ORIGINAL FUNCTION FROM NEWLIB 2.2.0

size_t strlen (const char *str)
{
  const char *start = str;
  while (*str)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen:  ld.b 0[r6],r10      _strlen:  ld.b 0[r6],r10      _strlen:  ld.b 0[r6],r10
          cmp 0,r10                     mov r6,r11                    shl 24,r10
          be .L42                       cmp r0,r10                    sar 24,r10
          mov r6,r10                    be .L46                       be .L39
.L41:     add 1,r10           .L47:     add 1,r6                      mov r6,r10
          ld.b 0[r10],r11               ld.b 0[r6],r10      .L40:     add 1,r10
          cmp 0,r11                     cmp r0,r10                    ld.b 0[r10],r11
          bne .L41                      bne .L47                      shl 24,r11
          sub r6,r10          .L46:     mov r6,r10                    bne .L40
          jmp [r31]                     sub r11,r10                   sub r6,r10
.L42:     mov 0,r10                     jmp [r31]           .L39:     jmp [r31]
          jmp [r31]


****************************************************************************************
****************************************************************************************

MARK THE END-OF-STRING WITH A NON-ZERO CONSTANT

size_t strlen2 (const char *str)
{
  const char *start = str;
  while (*str != 1)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen2: ld.b 0[r6],r10      _strlen2: ld.b 0[r6],r10      _strlen2: ld.b 0[r6],r11
          cmp 1,r10                     mov r6,r11                    shl 24,r11
          be .L47                       cmp 1,r10                     sar 24,r11
          mov r6,r10                    be .L51                       cmp 1,r11
.L46:     add 1,r10           .L52:     add 1,r6                      be .L49
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp 1,r11                     cmp 1,r10           .L46:     add 1,r10
          bne .L46                      bne .L52                      ld.b 0[r10],r11
          sub r6,r10          .L51:     mov r6,r10                    shl 24,r11
          jmp [r31]                     sub r11,r10                   sar 24,r11
.L47:     mov 0,r10                     jmp [r31]                     cmp 1,r11
          jmp [r31]                                                   bne .L46
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L49:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************

PASS THE END-OF-STRING MARKER IN AS A "char" PARAMETER

int strlen3 (const char *str, char eos)
{
  const char *start = str;
  while (*str != eos)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen3: shl 24,r7           _strlen3: shl 24,r7           _strlen3: ld.b 0[r6],r10
          sar 24,r7                     sar 24,r7                     shl 24,r7
          ld.b 0[r6],r10                ld.b 0[r6],r10                mov r7,r12
          cmp r7,r10                    mov r6,r11                    shl 24,r10
          be .L52                       cmp r7,r10                    sar 24,r12
          mov r6,r10                    be .L56                       cmp r7,r10
.L51:     add 1,r10           .L57:     add 1,r6                      be .L56
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp r7,r11                    cmp r7,r10          .L53:     add 1,r10
          bne .L51                      bne .L57                      ld.b 0[r10],r11
          sub r6,r10          .L56:     mov r6,r10                    shl 24,r11
          jmp [r31]                     sub r11,r10                   sar 24,r11
.L52:     mov 0,r10                     jmp [r31]                     cmp r12,r11
          jmp [r31]                                                   bne .L53
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L56:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************

PASS THE END-OF-STRING MARKER IN AS AN "int" PARAMETER

int strlen4 (const char *str, int eos)
{
  const char *start = str;
  while (*str != eos)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen4: ld.b 0[r6],r10      _strlen4: ld.b 0[r6],r10      _strlen4: ld.b 0[r6],r10
          cmp r7,r10                    mov r6,r12                    shl 24,r10
          be .L57                       cmp r7,r10                    sar 24,r10
          mov r6,r10                    be .L61                       cmp r7,r10
.L56:     add 1,r10           .L62:     add 1,r6                      be .L63
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp r7,r11                    mov r10,r11         .L60:     add 1,r10
          bne .L56                      cmp r7,r11                    ld.b 0[r10],r11
          sub r6,r10                    bne .L62                      shl 24,r11
          jmp [r31]           .L61:     mov r6,r10                    sar 24,r11
.L57:     mov 0,r10                     sub r12,r10                   cmp r7,r11
          jmp [r31]                     jmp [r31]                     bne .L60
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L63:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************
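One practical wrinkle if you follow the strlen4 approach of passing the terminator as an "int": with a signed plain char (which the sign-extending ld.b code above implies for this target), the int has to be the sign-extended value of the terminator byte, or a terminator above 0x7F will never match. A sketch with a hypothetical name that converts the terminator once, outside the loop, so the inner loop stays a plain full-width compare:

```c
#include <stddef.h>

/* Length of str up to (not including) the first byte equal to eos.
   eos is an int so the loop compares two full-width values.  Converting it
   through char once, before the loop, makes a terminator such as 0xFF match
   whether plain char is signed or unsigned (GCC defines the out-of-range
   conversion as modulo 256). */
size_t strlen_eos(const char *str, int eos)
{
    const char *start = str;
    const int target = (char)eos;

    while (*str != target)
        str++;
    return (size_t)(str - start);
}
```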

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #220 on: April 08, 2016, 06:54:14 AM »
Just a quick (technical) update on the PC-FX toolchain ...


The Good:

A new stack-frame layout is implemented, and R2 is now the permanent-frame-pointer instead of the compiler just using R29 whenever a frame-pointer is needed.

*****************************

GCC 1999-ABI V850 STACK FRAME (old PC-FX GCC 2.9.5 compiler)

CALLER
          incoming-arg4
ap->      16-bytes-reserved

CALLEE
          saved-lp
          saved-??
fp->      saved-fp
          local-variables
          outgoing-arg?
          outgoing-arg4
sp->      16-bytes-reserved

*****************************

GCC 2016-ABI V810 STACK FRAME (new PC-FX GCC 4.7.4 compiler)

CALLER
fp-> ap-> incoming-arg4

CALLEE
          saved-fp
          saved-lp
          saved-??
          local-variables
          outgoing-arg?
sp->      outgoing-arg4

*****************************


"-mprolog-function" is working, but I've stopped it from being automatically-enabled whenever any optimization is requested.

The new stack frame layout reduces the code-size of the prolog functions so that there's a good chance that they'll stay in the V810's instruction cache more often. Note: the new prolog functions always save the FP and the LP when they're used.

A stack backtrace is now possible when either "-fno-omit-frame-pointer" or "-mprolog-function" is used.

Any C "leaf" functions (i.e. functions that don't call other functions) will omit the prolog function if they don't destroy any callee-saved register, and so small-fast-utility-code will still run as-fast-as-possible.

The NEC-standard register conventions are still the same, except for R2 now being the FP.

Any assembly language code that reads arguments off the stack will need to subtract 16 from its offsets.


The Bad:

Any C "interrupt-handler" functions are probably broken at the moment, until I get around to fixing them.

Does anyone actually write interrupt-handlers in C???

The compiler generates some pretty slow register-saving code for them, so I sort-of assume that folks just write them in assembly. Am I wrong?


The Future: (long term - i.e. not until Xanadu is finished)

I'd like to add a few compiler intrinsics for some of the V810 opcodes, particularly the string opcodes and the in/out opcodes. That would allow the compiler to easily in-line some stuff that people have to drop into assembly to do.

It would also be worth contemplating changing the standard register usage so that R26-R29 are not callee-saved registers, which would save the compiler from having to preserve them on the stack whenever someone wants to use a string opcode. But doing so would break all current assembly-language code, and I suspect that people wouldn't want that. "Yes", the change in stack-offset in the new ABI also breaks things ... but that's an easy thing to find/fix. Changing the register conventions would be a much more complicated thing to fix.
« Last Edit: April 09, 2016, 04:38:17 AM by elmer »

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #221 on: April 10, 2016, 11:44:37 AM »
I fixed the "interrupt_handler" to where it's working again, although I'm not using the helper-functions anymore, because I really can't see the point.

I could make the code a tiny bit smarter ... but IMHO it's already a little bit better than GCC's V850 code, so any further work on it can wait.

Now ... this is it for me for a while on the PC-FX, or else I'll be in trouble!  :wink:

Code:
************************************

volatile int __attribute__ ((zda)) zda_frame_count = 0;

__attribute__ ((interrupt_handler)) void my_irq1 (void)
{
  for (int i = 0; i < 100; i++)
    zda_frame_count++;
}

_my_irq1: add -4,sp
          st.w r1,0[sp]
          add -8,sp
          st.w r10,0[sp]
          movea 100,r0,r10
          st.w r11,4[sp]
.L7:      ld.w zdaoff(_zda_frame_count)[r0],r11
          add -1,r10
          add 1,r11
          st.w r11,zdaoff(_zda_frame_count)[r0]
          cmp 0,r10
          bne .L7
          ld.w 0[sp],r10
          ld.w 4[sp],r11
          add 8,sp
          ld.w 0[sp],r1
          add 4,sp
          reti

************************************

volatile int sda_frame_count = 0;

__attribute__ ((noinline)) void increment_sda_frame_count (void)
{
  sda_frame_count++;
}

__attribute__ ((interrupt_handler)) void my_irq2 (void)
{
  for (int i = 0; i < 100; i++)
    increment_sda_frame_count();
}

_increment_sda_frame_count:
          ld.w sdaoff(_sda_frame_count)[gp],r10
          add 1,r10
          st.w r10,sdaoff(_sda_frame_count)[gp]
          jmp [r31]

_my_irq2: add -4,sp
          st.w r1,0[sp]
          mov sp,r1
          addi -72,sp,sp
          st.w r29,-12[r1]
          st.w fp,-4[r1]
          movea 100,r0,r29
          mov r1,fp
          st.w r6,-72[r1]
          st.w r7,-68[r1]
          st.w r8,-64[r1]
          st.w r9,-60[r1]
          st.w r10,-56[r1]
          st.w r11,-52[r1]
          st.w r12,-48[r1]
          st.w r13,-44[r1]
          st.w r14,-40[r1]
          st.w r15,-36[r1]
          st.w r16,-32[r1]
          st.w r17,-28[r1]
          st.w r18,-24[r1]
          st.w r19,-20[r1]
          st.w r30,-16[r1]
          st.w lp,-8[r1]
.L3:      add -1,r29
          jal _increment_sda_frame_count
          cmp 0,r29
          bne .L3
          ld.w -4[fp],r1
          ld.w -72[fp],r6
          ld.w -68[fp],r7
          ld.w -64[fp],r8
          ld.w -60[fp],r9
          ld.w -56[fp],r10
          ld.w -52[fp],r11
          ld.w -48[fp],r12
          ld.w -44[fp],r13
          ld.w -40[fp],r14
          ld.w -36[fp],r15
          ld.w -32[fp],r16
          ld.w -28[fp],r17
          ld.w -24[fp],r18
          ld.w -20[fp],r19
          ld.w -16[fp],r30
          ld.w -12[fp],r29
          ld.w -8[fp],lp
          mov fp,sp
          mov r1,fp
          ld.w 0[sp],r1
          add 4,sp
          reti

************************************
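The two listings make the cost of calling out from an interrupt handler very visible: my_irq1, a leaf, saves only r1, r10 and r11, while my_irq2 also has to save r6-r19, r30 and lp because it can't know what the called function will clobber. A common way to stay on the cheap side is to keep the handler a leaf that just records what happened, and do the real work in the main loop. A sketch of that pattern; PCFX_TARGET is a hypothetical macro you'd define yourself when building with this toolchain, so that the same file also compiles on a PC for testing, and acknowledging the interrupt source (which is hardware-specific) is omitted.

```c
#ifdef PCFX_TARGET      /* hypothetical: define this for PC-FX builds */
#define IRQ_HANDLER __attribute__ ((interrupt_handler))
#else
#define IRQ_HANDLER     /* host build: plain function, callable from tests */
#endif

volatile unsigned vblank_count = 0;     /* written by the IRQ, read by main */

/* Leaf handler: no calls, so only the registers it touches need saving.
   A real handler would also acknowledge the interrupt source here. */
IRQ_HANDLER void vblank_irq(void)
{
    vblank_count++;
}

/* Called from the main loop: how many IRQs arrived since the last call. */
unsigned service_vblanks(unsigned *last_seen)
{
    unsigned now = vblank_count;        /* single read of the shared counter */
    unsigned pending = now - *last_seen;

    *last_seen = now;
    return pending;
}
```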

NightWolve

  • Hero Member
  • *****
  • Posts: 5277
Re: PC-FX homebrew development.
« Reply #222 on: April 11, 2016, 04:31:20 PM »
Quote from: nodtveidt
To continue on the ADPCM kick, I figure I should add in some more details.
...
So... getting the ADPCM data into your source code... well, this is where having some coding knowledge worked out well for me. I coded a simple utility called any2arr which takes any file you give it and creates a header with the data of that file as a u16 array.

http://www.frozenutopia.com/pcfx/any2arr.7z
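Below is a minimal C sketch of what a tool like any2arr does. It is not nodtveidt's code; the real tool is at the link above. The function names and the `const unsigned short` output format are assumptions. So is the little-endian packing, where the first byte of each pair goes in the low half of the word and an odd trailing byte is padded with zero. Little-endian packing means that on a little-endian CPU such as the PC-FX's V810, the array sits in memory in the same byte order as the original file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Pack raw bytes into 16-bit words.
 * ASSUMPTION: little-endian packing (first byte in the low half of each
 * word) and a zero pad byte for odd-length input; the real any2arr may
 * differ. 'words' must hold (len + 1) / 2 entries. */
size_t pack_u16_le(const uint8_t *bytes, size_t len, uint16_t *words)
{
    assert(words != NULL && (bytes != NULL || len == 0));
    size_t nwords = (len + 1) / 2;
    for (size_t i = 0; i < nwords; i++) {
        unsigned lo = bytes[2 * i];
        unsigned hi = (2 * i + 1 < len) ? bytes[2 * i + 1] : 0u;
        words[i] = (uint16_t)(lo | (hi << 8));
    }
    return nwords;
}

/* Emit the words as a C array definition named 'name'. Returns 0 on success. */
int write_u16_header(FILE *out, const char *name,
                     const uint16_t *words, size_t nwords)
{
    if (nwords == 0)
        return -1;                      /* a zero-length array is not valid C */
    if (fprintf(out, "const unsigned short %s[%zu] = {", name, nwords) < 0)
        return -1;
    for (size_t i = 0; i < nwords; i++) {
        /* eight values per line for readability */
        if (fprintf(out, "%s0x%04X,", (i % 8 == 0) ? "\n    " : " ",
                    (unsigned)words[i]) < 0)
            return -1;
    }
    return fprintf(out, "\n};\n") < 0 ? -1 : 0;
}

/* Read all of 'in' (opened in binary mode) and write it out as a u16
 * array. Returns the word count, or -1 on error or empty input. */
long any2arr_sketch(FILE *in, FILE *out, const char *name)
{
    uint8_t *bytes = NULL;
    size_t len = 0, cap = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (len == cap) {
            size_t newcap = cap ? cap * 2 : 4096;
            uint8_t *grown = realloc(bytes, newcap);
            if (grown == NULL) {
                free(bytes);
                return -1;
            }
            bytes = grown;
            cap = newcap;
        }
        bytes[len++] = (uint8_t)c;
    }
    if (len == 0) {
        free(bytes);
        return -1;
    }
    uint16_t *words = malloc(((len + 1) / 2) * sizeof *words);
    if (words == NULL) {
        free(bytes);
        return -1;
    }
    size_t nwords = pack_u16_le(bytes, len, words);
    int rc = write_u16_header(out, name, words, nwords);
    free(words);
    free(bytes);
    return rc == 0 ? (long)nwords : -1;
}
```

Open boom.vox in binary mode as a `FILE *in`. Calling `any2arr_sketch(in, stdout, "boom_vox")` then prints a `boom_vox` array, which you could redirect into a header file.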

Making ADPCM files is also easy... just snag a copy of sox. To make the samples for Asteroid Challenge FX, I used sox like so:

Code:
sox -r 16000 boom.wav boom.vox
The -r 16000 tells it to use 16kHz, the .wav is obviously my source audio, and the .vox is the output file. sox knows to make an ADPCM file based on the .vox extension. So, just convert your file to a .vox with sox, run the .vox file through any2arr, and you've got your ADPCM data, ready to include into your program.

EDIT: Forgot to mention... since we're using words here, take the filesize of your .vox file and divide it in half to get the length. Take that divided value and add it to your starting address to get the ending address that you need. I think I did mention this briefly in the first post about this but it bears mentioning again, because reasons.
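Here is the EDIT's arithmetic as a small C sketch. The function names are hypothetical. Rounding an odd byte count up to a whole word is also an assumption, since it matches a converter that pads odd-length files; the post itself just says to halve the file size. The post also doesn't say whether the ADPCM hardware treats the resulting end address as inclusive or exclusive, so that still needs confirming.

```c
#include <assert.h>
#include <stdint.h>

/* Length in 16-bit words: half the .vox file size, rounded up.
 * ASSUMPTION: an odd trailing byte is padded to a full word. */
uint32_t adpcm_length_words(uint32_t file_size_bytes)
{
    return file_size_bytes / 2 + (file_size_bytes & 1u);
}

/* End address = start address + length in words, as described above.
 * Hypothetical helper; inclusive vs exclusive end is not established here. */
uint32_t adpcm_end_address(uint32_t start_address, uint32_t file_size_bytes)
{
    uint32_t words = adpcm_length_words(file_size_bytes);
    assert(start_address <= UINT32_MAX - words);   /* no wrap-around */
    return start_address + words;
}
```

For example, a 4096-byte .vox file is 2048 words long. Starting it at address 0x1000 gives an end address of 0x1800.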

I gained some experience when I switched to SOX (http://sox.sourceforge.net/) for the Ys IV dub work, so wanted to contribute some more to your tangent should it be useful.

We found that extracting ADPCM and converting to wave sometimes introduces a crazy DC offset to the wave which looks like this when you open it with Audacity:



It *should* instead look like this, properly centered:



To fix it, you could either 1) open each wave in Audacity, select it all (CTRL+A) and use the Normalize effect's "Remove DC offset" option with the Amplitude gain option turned off, or 2) add a switch to the SOX command to prevent its introduction right on the spot.
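Option 1 amounts to subtracting the average sample value from the whole clip. Here is a minimal C sketch of that idea on 16-bit PCM. It's an illustration, not Audacity's actual code. It also assumes the offset is constant across the clip; a drifting offset is better handled by a high-pass filter like the SOX switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Remove a constant DC offset from 16-bit PCM in place: compute the
 * integer mean (truncated toward zero), subtract it from every sample,
 * and clamp the result back into the int16_t range. */
void remove_dc_offset(int16_t *samples, size_t count)
{
    assert(samples != NULL || count == 0);
    if (count == 0)
        return;
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    int64_t mean = sum / (int64_t)count;
    for (size_t i = 0; i < count; i++) {
        int64_t v = samples[i] - mean;
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        samples[i] = (int16_t)v;
    }
}
```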

The Ys IV Dub kit which I made available way back demos SOX usage in the universal batch files.

http://www.ysutopia.net/downloads/ys4/YS4_DUB_KITv2.zip

Of all the batch files, the YS4_CONVERT_VOX_TO_WAVE.bat has the proper command line to eliminate the DC offset should you notice its appearance in the tracks of whatever game you're extracting.

Code:
@ECHO This Ys IV batch file needs 2000/XP/Vista/7++ to work.
@ECHO Won't work on old non-NT platforms: Win98/ME, etc.

FOR %%I IN (*.vox) DO sox.exe -r 16000 -e oki-adpcm "%%I" "%%~nI.wav"

REM Append below to the above commandline to eliminate dcshift
REM highpass 10
REM E.g. : sox.exe -r 16000 -e oki-adpcm "%%I" "%%~nI.wav" highpass 10

It's not used by default, and I recommend only using that highpass switch when you check the waves in Audacity and see a crazy DC shift like the one shown. So, the trick is you'd simply append "highpass 10" to the command:

Code:
FOR %%I IN (*.vox) DO sox.exe -r 16000 -e oki-adpcm "%%I" "%%~nI.wav" highpass 10
The above assumes you neatly extracted all the Japanese ADPCM clips to .VOX files.
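To get a feel for what a 10 Hz high-pass does to a DC offset, here's a simple first-order DC-blocking filter in C. It's a stand-in, not SOX's implementation. SOX's `highpass` is a two-pole filter by default (`highpass -1` selects single-pole). The coefficient below also uses the usual approximation r = 1 - 2*pi*fc/fs, which holds when the cutoff is far below the sample rate. The function name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* First-order DC blocker (a zero at DC, a pole just inside it):
 *   y[n] = x[n] - x[n-1] + r * y[n-1]
 * with r = 1 - 2*pi*cutoff/rate, an approximation that holds when the
 * cutoff is far below the sample rate (e.g. 10 Hz vs 16 kHz).
 * Filters 'samples' in place, starting from zero history. */
void dc_block(float *samples, size_t count, double cutoff_hz, double rate_hz)
{
    const double two_pi = 6.28318530717958647692;
    assert(rate_hz > 0.0 && cutoff_hz >= 0.0);
    double r = 1.0 - two_pi * cutoff_hz / rate_hz;
    double prev_x = 0.0, prev_y = 0.0;
    for (size_t i = 0; i < count; i++) {
        double x = samples[i];
        double y = x - prev_x + r * prev_y;
        prev_x = x;
        prev_y = y;
        samples[i] = (float)y;
    }
}
```

Fed a constant offset, the output starts at the offset value and then decays toward zero. At 16 kHz with a 10 Hz cutoff, it has essentially vanished within a quarter of a second.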

For completeness, a universal command to convert new English wave files to VOX would look like this (the counterpart batch command):

Code:
@ECHO This Ys IV batch file needs 2000/XP/Vista/7++ to work.
@ECHO Won't work on old non-NT platforms: Win98/ME, etc.

FOR %%I IN (*.wav) DO sox.exe -G "%%I" -r 16000 -e oki-adpcm "%%~nI.vox"

Well, that's what I wanted to share.



I also wanted to look into your SCSI issues, but that might get me more involved than I want. I could help given my experience building TurboRip, but my lack of experience with low-level assembly on consoles is the issue.

I'll try though. You were asking about a struct member where status info from the CD drive is returned to determine success/failure. That's the SENSE_DATA structure you want to look at.

Code:
struct SRB_ExecSCSICmd                   // Offset
{                                        // HX/DEC
    BYTE        SRB_Cmd;                 // 00/000 ASPI command code = SC_EXEC_SCSI_CMD
    BYTE        SRB_Status;              // 01/001 ASPI command status byte
    BYTE        SRB_HaId;                // 02/002 ASPI host adapter number
    BYTE        SRB_Flags;               // 03/003 ASPI request flags
    DWORD       SRB_Hdr_Rsvd;            // 04/004 Reserved
    BYTE        SRB_Target;              // 08/008 Target's SCSI ID
    BYTE        SRB_Lun;                 // 09/009 Target's LUN number
    WORD        SRB_Rsvd1;               // 0A/010 Reserved for Alignment
    DWORD       SRB_BufLen;              // 0C/012 Data Allocation Length
    LPBYTE      SRB_BufPointer;          // 10/016 Data Buffer Pointer
    BYTE        SRB_SenseLen;            // 14/020 Sense Allocation Length
    BYTE        SRB_CDBLen;              // 15/021 CDB Length
    BYTE        SRB_HaStat;              // 16/022 Host Adapter Status
    BYTE        SRB_TargStat;            // 17/023 Target Status
    VOID        FAR *SRB_PostProc;       // 18/024 Post routine
    BYTE        SRB_Rsvd2[20];           // 1C/028 Reserved, MUST = 0
    BYTE        CDBByte[16];             // 30/048 SCSI CDB
    SENSE_DATA_FMT  SenseArea;           // 40/064 Request Sense buffer
};

I don't know how it's defined in your PC-FX situation, so I can't help you there, but in ASPI it's normally defined as "SenseArea" after the CDB. Here are its members:

Code:
typedef struct _SENSE_DATA_FMT {

    BYTE    ErrorCode;          // Error Code (70H or 71H)
    BYTE    SegmentNum;         // Number of current segment descriptor
    BYTE    SenseKey;           // Sense Key(See bit definitions too)
    BYTE    InfoByte0;          // Information MSB
    BYTE    InfoByte1;          // Information MID
    BYTE    InfoByte2;          // Information MID
    BYTE    InfoByte3;          // Information LSB
    BYTE    AddSenLen;          // Additional Sense Length
    BYTE    ComSpecInf0;        // Command Specific Information MSB
    BYTE    ComSpecInf1;        // Command Specific Information MID
    BYTE    ComSpecInf2;        // Command Specific Information MID
    BYTE    ComSpecInf3;        // Command Specific Information LSB
    BYTE    AddSenseCode;       // Additional Sense Code
    BYTE    AddSenQual;         // Additional Sense Code Qualifier
    BYTE    FieldRepUCode;      // Field Replaceable Unit Code
    BYTE    SenKeySpec15;       // Sense Key Specific 15th byte
    BYTE    SenKeySpec16;       // Sense Key Specific 16th byte
    BYTE    SenKeySpec17;       // Sense Key Specific 17th byte
    BYTE    AddSenseBytes;      // Additional Sense Bytes
    BYTE    PaddByte;           // Make it an even DWORD-padded 20-byte structure

} SENSE_DATA_FMT;

For getting errors out of this thing, well, my post would get much longer... I'll wait for more input in case you really need help and haven't made further progress since your last post. I built a function that converts all the SCSI error/status codes from the MMC/SCSI-3 docs to readable strings. I don't want to paste it in here, but perhaps it's something you could use.
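NightWolve's converter isn't posted here. As a rough, hypothetical illustration of the idea, here's a small C decoder for the fixed-format sense data laid out in SENSE_DATA_FMT above. The sense-key names follow the SCSI standard. The ASC/ASCQ table lists only a handful of common codes, and all the function and type names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sense key names, indexed by the low 4 bits of byte 2 (SenseKey). */
static const char *const sense_key_names[16] = {
    "NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
    "HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION", "DATA PROTECT",
    "BLANK CHECK", "VENDOR SPECIFIC", "COPY ABORTED", "ABORTED COMMAND",
    "EQUAL", "VOLUME OVERFLOW", "MISCOMPARE", "RESERVED"
};

/* A few common Additional Sense Code / Qualifier pairs (bytes 12 and 13). */
static const struct { uint8_t asc, ascq; const char *text; } asc_table[] = {
    { 0x04, 0x01, "LOGICAL UNIT IS IN PROCESS OF BECOMING READY" },
    { 0x28, 0x00, "NOT READY TO READY CHANGE, MEDIUM MAY HAVE CHANGED" },
    { 0x29, 0x00, "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED" },
    { 0x3A, 0x00, "MEDIUM NOT PRESENT" },
};

typedef struct {
    int valid;          /* response code (ErrorCode & 0x7F) is 70h or 71h */
    int deferred;       /* 71h means a deferred error */
    uint8_t sense_key;  /* SenseKey & 0x0F */
    uint8_t asc;        /* AddSenseCode */
    uint8_t ascq;       /* AddSenQual */
} sense_summary;

/* Decode the raw bytes returned by REQUEST SENSE; 'len' is how many came back. */
sense_summary decode_sense(const uint8_t *buf, size_t len)
{
    sense_summary s = { 0, 0, 0, 0, 0 };
    assert(buf != NULL || len == 0);
    if (len < 3)
        return s;                       /* too short to hold a sense key */
    uint8_t code = buf[0] & 0x7F;       /* bit 7 flags valid Information bytes */
    s.valid = (code == 0x70 || code == 0x71);
    s.deferred = (code == 0x71);
    s.sense_key = buf[2] & 0x0F;
    if (len >= 14) {
        s.asc = buf[12];
        s.ascq = buf[13];
    }
    return s;
}

const char *sense_key_name(uint8_t key)
{
    return sense_key_names[key & 0x0F];
}

/* Returns NULL for codes not in the short table above. */
const char *asc_name(uint8_t asc, uint8_t ascq)
{
    for (size_t i = 0; i < sizeof asc_table / sizeof asc_table[0]; i++)
        if (asc_table[i].asc == asc && asc_table[i].ascq == ascq)
            return asc_table[i].text;
    return NULL;
}
```

With no disc inserted, a drive would typically report sense key 2 with ASC/ASCQ 3Ah/00h. This decodes that as "NOT READY" / "MEDIUM NOT PRESENT".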
« Last Edit: April 12, 2016, 05:46:28 PM by NightWolve »

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: PC-FX homebrew development.
« Reply #223 on: April 12, 2016, 07:11:53 AM »
Quote
I gained some experience when I switched to SOX (http://sox.sourceforge.net/) for the Ys IV dub work, so wanted to contribute some more to your tangent should it be useful.

We found that extracting ADPCM and converting to wave sometimes introduces a crazy DC offset to the wave which looks like this when you open it with Audacity:



Thanks!  :D

As pointed-out in the other thread, this is exactly the problem that I'm having with the Xanadu tracks.

My creaky old brain has finally figured out what's going on, and I don't believe that we should have the same problem on the PC-FX.

That's because the PC-FX is using a later generation of ADPCM chip that supports sample clipping/saturation.

The very-old OKI MSM5205 that the PCE uses didn't support that, and its math ends up wrapping around and causing nasty audio glitches ... there are warnings about it in the OKI manual where they recommend that you only use 80% of the dynamic range in order to avoid problems (i.e. +/-29191 instead of +/-32767).

AFAIK, that's almost-certainly the problem that you're hearing when you're using Dave's tools to convert your stuff to the PCE.

I expect that his stuff is working 100% correctly.

But SOX is written for newer OKI ADPCM chips which do support sample clipping/saturation (like the PC-FX), and so it gets the decoding/encoding math wrong when you're trying to convert sounds for the OKI MSM5205 in the PCE ... and those errors would look exactly like what we're seeing.

I've just checked the SOX source code ... it definitely shouldn't be used to encode/decode a .VOX file for the PCE.
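To make the wrap-versus-clip difference concrete, here's a C sketch of one step of the standard OKI/Dialogic ADPCM decoder, with a 12-bit predictor and a 49-entry step table, supporting both overflow behaviors. OKI_WRAP models the description of the MSM5205 above, and OKI_SATURATE models the clipping attributed to SOX. Neither is taken from SOX or from the PC-FX, and per Mednafen's reply the PC-FX uses a modified algorithm anyway. The step index depends only on the codes, so two decoders that differ only in overflow handling stay in step. A single wrap-versus-clip event therefore leaves their outputs a constant distance apart until the next overflow, which is consistent with the DC offset NightWolve described.

```c
#include <assert.h>
#include <stdint.h>

/* Standard OKI/Dialogic ADPCM step-size table (49 entries). */
static const int16_t oki_steps[49] = {
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66,
    73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411,
    1552
};

typedef struct {
    int signal;   /* 12-bit predictor, -2048..2047 */
    int index;    /* step-table index, 0..48 */
} oki_state;

typedef enum { OKI_SATURATE, OKI_WRAP } oki_overflow;   /* hypothetical names */

/* Decode one 4-bit code and return the new 12-bit sample. */
int oki_decode(oki_state *st, unsigned code, oki_overflow mode)
{
    static const int index_adjust[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };
    assert(code < 16);
    int step = oki_steps[st->index];
    int diff = step / 8;
    if (code & 1) diff += step / 4;
    if (code & 2) diff += step / 2;
    if (code & 4) diff += step;
    if (code & 8) diff = -diff;           /* bit 3 is the sign */

    int sig = st->signal + diff;
    if (mode == OKI_SATURATE) {
        if (sig > 2047) sig = 2047;       /* clip at the rails */
        if (sig < -2048) sig = -2048;
    } else {
        /* keep only the low 12 bits, reinterpreted as signed */
        sig = (int)(((unsigned)(sig + 2048)) & 0xFFFu) - 2048;
    }
    st->signal = sig;

    st->index += index_adjust[code & 7];
    if (st->index < 0) st->index = 0;
    if (st->index > 48) st->index = 48;
    return sig;
}
```

For example, from a predictor of 2000 at the largest step (1552), code 7 adds 2910. The saturating decoder then sits at 2047, while the wrapping one lands at 814.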


Quote
I also wanted to look into your problem with SCSI issues, but that might get me more involved than I want. I could help given my experience building up TurboRip, but the lack of experience in low level assembly for consoles is the issue.


You're absolutely right about the SCSI SENSE command ... the problem that we're having on the PC-FX is that the SCSI interface is extremely low-level ... it actually looks a lot like Hudson's "fast" CD routines that I've been disassembling and trying to understand.

We're not getting any data back from the SENSE command at all ... which I believe is a liberis problem that's just because Alex never handled anything other than DATA read commands.

It's just another thing to add to the huge list of things to fix.
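For reference, the REQUEST SENSE command itself is small. It's a 6-byte CDB with opcode 03h, and byte 4 holds the allocation length, i.e. how many sense bytes the drive may return. Below is a minimal C sketch of building it; the function name is hypothetical. This sketch doesn't cover how liberis should then handle the DATA IN, STATUS and MESSAGE IN phases that follow. Fixed-format sense data is usually 18 bytes, and NightWolve's SENSE_DATA_FMT pads it out to 20.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCSI_OP_REQUEST_SENSE 0x03

/* Build a 6-byte REQUEST SENSE CDB. Byte 4 is the allocation length: the
 * maximum number of sense bytes the target may return during DATA IN.
 * All other bytes (including the control byte) are left at zero. */
void build_request_sense_cdb(uint8_t cdb[6], uint8_t alloc_len)
{
    assert(cdb != NULL);
    memset(cdb, 0, 6);
    cdb[0] = SCSI_OP_REQUEST_SENSE;
    cdb[4] = alloc_len;
}
```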
« Last Edit: April 12, 2016, 07:15:28 AM by elmer »

Mednafen

  • Full Member
  • ***
  • Posts: 140
Re: PC-FX homebrew development.
« Reply #224 on: April 12, 2016, 08:58:31 AM »
PC-FX uses a slight modification of the OKI ADPCM algorithm, so if you were to encode audio as OKI ADPCM, it'll sound noisy and have weird clipping when played back on the PC-FX.  And IIRC, the ADPCM encoder in MPCONV2 is buggy and uses a slightly different encoding algorithm (including a typo'd LUT value) than the PC-FX, so even it will tend to produce noisy ADPCM, particularly on source audio that uses full dynamic range.