Topic: The new fork of HuC

elmer

Re: The new fork of HuC
« Reply #120 on: November 12, 2016, 06:34:00 AM »
Quote
But I think the 2k method should have been in there to begin with (such a brilliant method).

Dang it, but that's a *really* fast multiply routine!  :shock:

I wish that I'd known about that one back in the 1980s!  #-o

It's such a nice way of dealing with fixed-point numbers, too.

HuCard users wouldn't care about the 2KB of tables, but CD users might prefer the smaller 1.5KB version.
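A minimal C sketch of a table-of-squares ("quarter-square") multiply, assuming that this is the technique behind the "2k method". It relies on the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4), so an 8x8 multiply becomes two table lookups and a subtraction. The table name qsq and its single 16-bit layout are illustrative only. A 6502/6280 version would typically split the entries into separate low-byte and high-byte tables for indexed addressing, and that layout is presumably what decides whether the tables come to 1.5KB or 2KB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quarter-square table: qsq[n] = floor(n*n / 4).
   n goes up to 510, the largest possible a + b for two 8-bit operands. */
static uint16_t qsq[511];

static void init_quarter_squares(void)
{
    for (uint32_t n = 0; n < 511; n++)
        qsq[n] = (uint16_t)((n * n) / 4);   /* max is 65025, fits in 16 bits */
}

/* Unsigned 8x8 -> 16-bit multiply with two lookups and one subtraction:
   a*b = floor((a+b)^2/4) - floor((a-b)^2/4). The floors cancel because
   a+b and a-b are always both even or both odd. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    unsigned sum  = (unsigned)a + b;
    unsigned diff = (a >= b) ? (unsigned)(a - b) : (unsigned)(b - a);

    assert(qsq[510] != 0);                   /* init_quarter_squares() was called */
    return (uint16_t)(qsq[sum] - qsq[diff]);
}
```

Call init_quarter_squares() once at startup; after that, mul8x8(200, 123) gives 24600. Wider multiplies can be built from several of these 8x8 partial products.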

It's also interesting that the bottom 16 bits of a signed 16x16 multiply are *exactly* the same as those of an unsigned 16x16 multiply.

So there's no need for an "smul" routine if you only want a 16-bit result.
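The reason is two's-complement arithmetic: a signed 16-bit value and its unsigned reinterpretation are congruent modulo 2^16, so their products are congruent modulo 2^16 as well, and the low 16 bits can't differ. A small C check of that claim (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits of an unsigned 16x16 product. The operands are widened to
   uint32_t first so the multiply can't overflow a (signed) int. */
static uint16_t mul16_lo_unsigned(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* Low 16 bits of a signed 16x16 product: do the full 32-bit signed
   multiply (|a*b| <= 2^30, so it fits), then keep the bottom 16 bits.
   Converting to uint16_t is defined as reduction modulo 2^16. */
static uint16_t mul16_lo_signed(int16_t a, int16_t b)
{
    int32_t full = (int32_t)a * (int32_t)b;
    return (uint16_t)full;
}
```

For example, -2 * 3 = -6 and 0xFFFE * 3 = 0x2FFFA both leave 0xFFFA in the low 16 bits. Only the high half of a 32-bit result depends on signedness.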


Quote
As a noob programmer, would I even notice these being used up? I'm pretty sure HuC's target audience would benefit more from increased performance than they'd ever notice how you are changing these aspects.

Additionally, most HuC programmers don't control what goes in the zero page and what doesn't. I just declare my variables, and they go in RAM wherever they go. Maybe more advanced HuC users, such as Arkhan or Cabbage, have more input.

There's so much about *how* people are using HuC in practice that I don't know about.

AFAIK, people are avoiding using local variables and parameters to functions as much as they can, and just use global variables instead.

So I *think* that would put all the variables into main RAM and not in fast zero-page.

Actually ... I've just been looking for zero-page usage in HuC, but can't find it.

Does anyone know how you can specify variables in zero-page in HuC?

Bonknuts

Re: The new fork of HuC
« Reply #121 on: November 12, 2016, 06:54:35 AM »
Quote
Does anyone know how you can specify variables in zero-page in HuC?


Code:
#asm
        .zp
var1:       .ds 1
ptr1:       .ds 2
tinyarray:  .ds 8
#endasm

 No need to worry about addresses. Just use the label names. This is what I've always done in HuC for ZP variables. And of course .BSS for non ZP variables.

Sunray

Re: The new fork of HuC
« Reply #122 on: November 12, 2016, 08:11:18 PM »
Using local variables isn't as bad in this version if you compile with -fno-recursive -msmall.

Sunray

Re: The new fork of HuC
« Reply #123 on: November 12, 2016, 08:52:37 PM »
In my game I don't use any explicit multiplications at all, so personally I don't want the 2k table in my ROM. There are probably a bunch of implicit ones from array accesses, though.

I use "16bitvalue >> 4" a lot though, I haven't looked into optimizing the generated code for that. Is that worth doing?

elmer

Re: The new fork of HuC
« Reply #124 on: November 13, 2016, 04:04:22 AM »
Quote
Code:
#asm
        .zp
var1:       .ds 1
ptr1:       .ds 2
tinyarray:  .ds 8
#endasm

 No need to worry about addresses. Just use the label names. This is what I've always done in HuC for ZP variables. And of course .BSS for non ZP variables.

Hahaha ... Thanks, that makes sense!  :)

I keep on forgetting that HuC is really just a pre-processor for PCEAS, and that you can just drop down into assembly like that.


Quote
Using local variables isn't as bad in this version if you compile with -fno-recursive -msmall.

Yep, that's right, "-fno-recursive" just makes every local variable into a global variable, and so they go through the existing semi-fast processing for globals.

Then "-msmall" just drops the high-byte adjustment of the stack pointer, leading to shorter and faster stack code.

But stack usage is still *slow*, and it's used *constantly* for intermediate results and so it would benefit from being faster, even if you don't use true local variables or parameters.
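A rough C model of what dropping the high-byte adjustment means, assuming a downward-growing software stack pointer held as two bytes (soft_sp and the function names are hypothetical, not HuC's actual symbols). The short form matches the full 16-bit form only when subtracting from the low byte doesn't borrow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software stack pointer kept as two bytes, the way an
   8-bit CPU has to hold a 16-bit pointer. */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} soft_sp;

/* Full 16-bit adjustment: subtract from the low byte, then propagate
   the borrow into the high byte (extra instructions on a 6502/6280). */
static void sp_alloc_full(soft_sp *sp, uint8_t n)
{
    uint8_t old_lo = sp->lo;
    sp->lo = (uint8_t)(old_lo - n);
    if (old_lo < n)          /* borrow out of the low byte */
        sp->hi--;
}

/* "Small" adjustment: only the low byte changes, so the high byte is
   assumed never to need updating. */
static void sp_alloc_small(soft_sp *sp, uint8_t n)
{
    sp->lo = (uint8_t)(sp->lo - n);
}

static uint16_t sp_value(const soft_sp *sp)
{
    return (uint16_t)((sp->hi << 8) | sp->lo);
}
```

With the pointer at 0x2204, allocating 16 bytes gives 0x21F4 with the full adjustment but 0x22F4 with the low-byte-only one, so the short form presumably depends on the stack staying within a single 256-byte page.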


Quote
In my game I don't use any explicit multiplications at all, so personally I don't want the 2k table in my ROM. There are probably a bunch of implicit ones from array accesses, though.

It would be interesting to streamline the library so that some functions aren't included if they're not used.

You could easily do a search for "jsr muls" and "jsr mulu" in your output file to see if and where they're used.
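If you'd rather not eyeball the listing, a tiny C helper along these lines would count the matching lines. It's a sketch: the search strings are the ones suggested here, but the exact spacing between mnemonic and operand in HuC's assembler output (space vs. tab) is an assumption, so adjust the needle if it finds nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of a text file (e.g. HuC's assembler output) that
   contain `needle`. Assumes no line is longer than the buffer; a longer
   line would be read in pieces and could be counted more than once. */
static int count_lines_containing(FILE *f, const char *needle)
{
    char line[512];
    int count = 0;

    assert(f != NULL && needle != NULL);
    while (fgets(line, sizeof line, f) != NULL)
        if (strstr(line, needle) != NULL)
            count++;
    return count;
}
```

Open the generated assembly file with fopen(), then call count_lines_containing(f, "jsr muls") and, after rewind(f), count_lines_containing(f, "jsr mulu"). A result of zero for both suggests the multiply routines aren't being called.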

OTOH ... not sure why you're so worried about the 2KB table. Are you really bumping up against the 1MB ROM limit?


Quote
I use "16bitvalue >> 4" a lot though, I haven't looked into optimizing the generated code for that. Is that worth doing?

Uli's code in the new HuC is definitely faster than the old code for that, but a shift by 4 is still going through a function and a loop rather than being inlined as fast code, so you can definitely improve the speed if you want to.

Another big thing is whether you're shifting a signed or unsigned value ... the unsigned shift is faster.

Shifts by 4 could certainly be inlined if you do them a lot. The code will be a lot faster, but it'll cost you 11 bytes each time that you use it, so it's a tradeoff.

You can just modify the __asrwi and __lsrwi macros in huc.inc to try it and see how it works for you.
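To see why the unsigned shift is cheaper, here is a C model of a 16-bit shift right by 4 done one bit at a time on separate high and low bytes, mirroring the 6502/6280 "lsr/ror" pattern. The signed version also has to copy the sign bit back into the top of the high byte on every step. The "cmp #$80" trick mentioned in the comments is a common 6502 idiom, not necessarily the code that HuC's __asrwi actually emits, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned (logical) shift right by 4: each step is "lsr hi; ror lo". */
static uint16_t lsr16_by4(uint16_t v)
{
    uint8_t hi = (uint8_t)(v >> 8), lo = (uint8_t)v;
    for (int i = 0; i < 4; i++) {
        uint8_t carry = hi & 1;                    /* lsr hi: bit 0 -> carry, 0 -> bit 7 */
        hi >>= 1;
        lo = (uint8_t)((lo >> 1) | (carry << 7));  /* ror lo: carry -> bit 7 */
    }
    return (uint16_t)((hi << 8) | lo);
}

/* Signed (arithmetic) shift right by 4: the sign bit must be fed back
   into bit 7 of the high byte each step (e.g. "cmp #$80; ror a"), which
   is the extra work that makes the signed version slower. */
static uint16_t asr16_by4(uint16_t v)
{
    uint8_t hi = (uint8_t)(v >> 8), lo = (uint8_t)v;
    for (int i = 0; i < 4; i++) {
        uint8_t sign  = hi & 0x80;                 /* cmp #$80 puts the sign bit in carry */
        uint8_t carry = hi & 1;
        hi = (uint8_t)((hi >> 1) | sign);          /* ror hi, with carry = old sign */
        lo = (uint8_t)((lo >> 1) | (carry << 7));
    }
    return (uint16_t)((hi << 8) | lo);
}
```

In an inlined version the loop would be unrolled into four copies of the step, which is where the per-use byte cost comes from.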

Bonknuts

Re: The new fork of HuC
« Reply #125 on: November 13, 2016, 08:10:56 AM »
elmer: This was the very reason why I wanted to do a single inline #asm for HuC. So a macro could be used inside something like this:

 var = index[U_shift_left_int(idx,4)];

 With U_shift_left_int being a macro to inline asm code.

elmer

Re: The new fork of HuC
« Reply #126 on: November 13, 2016, 08:34:36 AM »
Quote
elmer: This was the very reason why I wanted to do a single inline #asm for HuC. So a macro could be used inside something like this:

 var = index[U_shift_left_int(idx,4)];

 With U_shift_left_int being a macro to inline asm code.

That kind of capability would be nice ... I just have no current idea of how it would be implemented in practice, or how you'd know the current state of the expression parsing so that you didn't get in the way.

Did you get it working? I'd be happy to add the patch for it into the current code.

The easiest way to *currently* get some control of the code generated for those shifts would just be to add a #pragma or something like that to enable/disable the fast-inlined shift-by-4 code.

The current __asrwi and __lsrwi macros could be easily modified to look at a global symbol to decide what to do.
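As a loose C analogue of "a macro that looks at a global symbol", the preprocessor can choose the expansion at compile time. INLINE_SHIFT4, lsr16_loop and LSR16_CONST are hypothetical names, not existing HuC or PCEAS options; in PCEAS the equivalent would presumably be a conditional inside the modified __asrwi/__lsrwi macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build switch standing in for the "global symbol" idea. */
#ifndef INLINE_SHIFT4
#define INLINE_SHIFT4 1
#endif

/* Generic path, like a library helper: shift one bit per loop iteration. */
static uint16_t lsr16_loop(uint16_t v, unsigned count)
{
    while (count--)
        v >>= 1;
    return v;
}

/* Constant shift. With INLINE_SHIFT4 set, a count of 4 expands to the
   direct expression; any other count (or INLINE_SHIFT4 == 0) uses the loop. */
#if INLINE_SHIFT4
#define LSR16_CONST(v, n) \
    ((n) == 4 ? (uint16_t)((uint16_t)(v) >> 4) : lsr16_loop((uint16_t)(v), (n)))
#else
#define LSR16_CONST(v, n) lsr16_loop((uint16_t)(v), (n))
#endif
```

Either setting produces the same values; the switch only trades code size against speed, which matches the tradeoff described for the inlined shift-by-4.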

<EDIT>

I'm also a bit peeved at the restrictions of trying to keep everything working for HuCard development.

The 6502/6280 can sometimes really benefit from the self-modifying code that the CD format allows for.  ](*,)
« Last Edit: November 13, 2016, 08:41:08 AM by elmer »

DarkKobold

Re: The new fork of HuC
« Reply #127 on: November 13, 2016, 11:05:25 AM »
; load_vram(0x7E00,DogHead+0x200,0x80);
;                                    ^
;******  can't get farptr  ******

"#incspr(DogHead,"spr/spr_dog_head.pcx",0,0,2,9)"


Getting my own farptr errors. These didn't occur before Cabbage's fix.

Also, I couldn't find IRQ_TIMER and IRQ_VYSNC in any other file in Squirrel.

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: The new fork of HuC
« Reply #128 on: November 13, 2016, 05:12:34 PM »
A nice little macro would fix that: get_far_ptr(label,index)

That aside, if HuC supported 24-bit primitives internally, all labels could be kept as linear addresses and then simply converted to a bank:local address on the fly whenever they're used as a far pointer. In the load_vram case, 0x200 would be added to the 24-bit linear address of DogHead, then an internal macro would convert that to a bank:local address. In this case it would be a compile-time calculation, but for something like DogHead+j it would still work as I described.
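A minimal C sketch of the linear-address idea, assuming 8KB banks (the HuC6280 MMU maps memory in 8KB pages) and a caller-chosen MPR slot for the window the bank gets mapped through. far_addr, linear_to_far and the example address 0x45F00 are made up for illustration; the point is that adding 0x200 can push an address into the next bank, which a plain 16-bit add on the local address would miss.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE 0x2000u   /* HuC6280 MMU pages are 8KB */

typedef struct {
    uint8_t  bank;   /* 8KB bank number */
    uint16_t local;  /* CPU address once that bank is mapped into `slot` */
} far_addr;

/* Hypothetical helper: split a linear address into bank + offset, then
   express the offset as an address inside the 8KB window selected by
   `slot` (0..7, i.e. which MPR the caller maps the bank through). */
static far_addr linear_to_far(uint32_t linear, unsigned slot)
{
    far_addr fa;

    assert(slot < 8);
    fa.bank  = (uint8_t)(linear / BANK_SIZE);
    fa.local = (uint16_t)(slot * BANK_SIZE + (linear % BANK_SIZE));
    return fa;
}
```

With a constant like DogHead+0x200 the whole conversion can be folded at assembly time; for DogHead+j the same division and remainder (really a shift and a mask) would have to run at run time.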

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: The new fork of HuC
« Reply #129 on: November 14, 2016, 03:13:31 AM »
Quote
Getting my own farptr errors. These didn't occur before Cabbage's fix.

Also, I couldn't find IRQ_TIMER and IRQ_VYSNC in any other file in Squirrel.

Thanks for testing it out!  :D

Congratulations, you've definitely found a problem.

It looks like the "fix" that I put in for the farptr wasn't good enough to deal with Uli's "symbol+offset" optimization.  #-o

That's the one that was causing a lot of problems with cabbage's code.

I thought that I'd found where Uli was doing that, and that his code was incorrectly setting the symbol type to '0'.

I was wrong ... I've finally found exactly where he's doing the optimization, and it's an "uninitialized-variable" problem.

That's why it's tripping up on different symbols randomly. It certainly shows that the optimization is getting used in a lot of places, which is good.

I've fixed it now, but it's not the most "elegant" fix, so I'm going to see if I can switch it to use Bonknuts's idea of the macro instead.


Quote
A nice little macro would fix that: get_far_ptr(label,index)

Yep, Uli is effectively creating a fake new symbol with "symbol+offset" as its name.

It's certainly one way of doing it, but the macro way would be cleaner, if it can be done.
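Roughly what a get_far_ptr(label,index) macro would have to compute, sketched in C under the assumption that a far pointer is a bank number plus an offset inside an 8KB bank (bank_ofs and far_add are hypothetical names): add the index to the offset and carry any whole banks into the bank number, instead of inventing a new "symbol+offset" name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bank:offset pair, with offset kept in 0..0x1FFF. */
typedef struct {
    uint8_t  bank;
    uint16_t offset;
} bank_ofs;

/* Add `index` to the label's offset and move any whole 8KB pages
   into the bank number. */
static bank_ofs far_add(bank_ofs base, uint16_t index)
{
    uint32_t ofs = (uint32_t)base.offset + index;
    bank_ofs r;

    assert(base.offset < 0x2000);
    r.bank   = (uint8_t)(base.bank + (ofs >> 13));
    r.offset = (uint16_t)(ofs & 0x1FFF);
    return r;
}
```

For a label at bank 0x22, offset 0x1F00, adding 0x200 lands at bank 0x23, offset 0x0100, which is exactly the bank-crossing case that a simple 16-bit add gets wrong.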


Quote
That aside, if HuC supported 24-bit primitives internally, all labels could be kept as linear addresses and then simply converted to a bank:local address on the fly whenever they're used as a far pointer. In the load_vram case, 0x200 would be added to the 24-bit linear address of DogHead, then an internal macro would convert that to a bank:local address. In this case it would be a compile-time calculation, but for something like DogHead+j it would still work as I described.

Ahhh ... but that would be a huge change.

HuC doesn't really seem to deal with symbol values, just strings of text.

All of the actual numerical stuff for symbols is deferred until PCEAS.

Bonknuts

Re: The new fork of HuC
« Reply #130 on: November 14, 2016, 05:43:20 AM »
Quote
Ahhh ... but that would be a huge change.

HuC doesn't really seem to deal with symbol values, just strings of text.

All of the actual numerical stuff for symbols is deferred until PCEAS.

Maybe not that big. If HuC isn't actually handling this, but passing it along to PCEAS, then why not add a linAddrOf() directive to PCEAS? Honestly, it would make dealing with Arcade Card defines much easier as well (of course we still need a .bssAC directive).

 But yeah, let PCEAS handle it with linear address support.
« Last Edit: November 14, 2016, 05:49:23 AM by Bonknuts »

elmer

Re: The new fork of HuC
« Reply #131 on: November 14, 2016, 06:34:24 AM »
Quote
Maybe not that big. If HuC isn't actually handling this, but passing it along to PCEAS, then why not add a linAddrOf() directive to PCEAS? Honestly, it would make dealing with Arcade Card defines much easier as well (of course we still need a .bssAC directive).

But yeah, let PCEAS handle it with linear address support.

I look forward to receiving your patch to add this capability.  :wink:

In the meantime ... I'm still trying to find a "clean" solution for the current farptr problem.

Using the macro to do it has potential issues that will need some thinking about.

Uli's solution was the "safe" one, if a bit wasteful. I might have to keep on using it.

Bonknuts

Re: The new fork of HuC
« Reply #132 on: November 14, 2016, 08:10:16 AM »
So if I do this for PCEAS, you'll handle the HuC side? Deal. Best deal ever.

elmer

Re: The new fork of HuC
« Reply #133 on: November 14, 2016, 11:17:09 AM »
Quote
So if I do this for PCEAS, you'll handle the HuC side? Deal. Best deal ever.

Hahaha!  :lol:

That would be an overly optimistic, and wildly inaccurate interpretation of what I wrote.  :wink:

nodtveidt

Re: The new fork of HuC
« Reply #134 on: November 14, 2016, 11:38:49 AM »
Quote
If anyone downloads this and gives it a try, even perhaps, you know, like the folks involved in a certain high-profile Kickstarter ... then I'd love to get some feedback on whether there are any problems (I hope not).  :-"
Said folks will give this all a spin soon enough. ;)