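#asm
.zp                     ; switch to the zero-page section
var1: .ds 1             ; reserve 1 byte
ptr1: .ds 2             ; reserve 2 bytes (a 16-bit pointer)
tinyarray: .ds 8        ; reserve 8 bytes
#endasm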
No need to worry about addresses. Just use the label names. This is what I've always done in HuC for ZP variables. And of course .BSS for non-ZP variables.
Hahaha ... Thanks, that makes sense!
I keep on forgetting that HuC is really just a pre-processor for PCEAS, and that you can just drop down into assembly like that.
Using local variables isn't as bad in this version if you compile with -fno-recursive -msmall.
Yep, that's right, "-fno-recursive" just makes every local variable into a global variable, and so they go through the existing semi-fast processing for globals.
Then "-msmall" just drops the high-byte adjustment of the stack pointer, leading to shorter and faster stack code.
But stack usage is still *slow*, and it's used *constantly* for intermediate results, so it would benefit from being faster even if you don't use true local variables or parameters.
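For anyone who wants to picture what "every local variable becomes a global" means, here is a rough sketch in plain standard C (written for a host compiler, not HuC-specific). The function names are made up for illustration, and the `static` locals are only an analogy for what "-fno-recursive" is described as doing above, not the compiler's actual output.

```c
#include <stdint.h>

/* Ordinary version: 'sum' and 'i' are true locals, so on a stack-based
   compiler every access to them goes through the C stack. */
uint16_t sum_stack(const uint8_t *data, uint8_t count)
{
    uint16_t sum = 0;
    uint8_t i;

    for (i = 0; i < count; i++)
        sum += data[i];
    return sum;
}

/* Analogy for the no-recursion model: the locals live at fixed addresses,
   just like globals. This is not re-entrant, so the function must never
   call itself, directly or indirectly. */
uint16_t sum_static(const uint8_t *data, uint8_t count)
{
    static uint16_t sum;
    static uint8_t i;

    sum = 0;
    for (i = 0; i < count; i++)
        sum += data[i];
    return sum;
}
```

Both versions return the same result; the only difference is where the working variables live, which is exactly why the trick stops working as soon as a function recurses.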
In my game I don't use any explicit multiplications at all, so personally I don't want the 2k table in my ROM. There are probably a bunch of implicit ones from array accesses, though.
It would be interesting to streamline the library so that some functions aren't included if they're not used.
You could easily do a search for "jsr muls" and "jsr mulu" in your output file to see if and where they're used.
OTOH ... not sure why you're so worried about the 2KB table. Are you really bumping up against the 1MB ROM limit?
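On the implicit multiplications from array accesses mentioned above: the general idea is that indexing needs a byte offset of index * element size. Here is a minimal sketch in plain C with made-up helper names. Whether HuC actually emits a multiply call for a given access is an assumption you would have to check in the output file; this only shows the arithmetic involved.

```c
#include <stddef.h>

/* General case: the byte offset of element 'index' is index * elem_size.
   With an odd element size (say 3 bytes) this needs a real multiply. */
size_t offset_mul(size_t index, size_t elem_size)
{
    return index * elem_size;
}

/* When the element size is a power of two (1 << log2_size), the same
   offset can be computed with a plain shift instead of a multiply. */
size_t offset_shift(size_t index, unsigned log2_size)
{
    return index << log2_size;
}
```

So an array of 2-byte ints at index 5 gives `offset_shift(5, 1)`, the same as `offset_mul(5, 2)`, while 3-byte elements have no shift-only equivalent.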
I use "16bitvalue >> 4" a lot, though. I haven't looked into optimizing the generated code for that. Is that worth doing?
Uli's code in the new HuC is definitely faster than the old code for that, but a shift by 4 is still going through a function and a loop rather than being inlined as fast code, so you can definitely improve the speed if you want to.
Another big thing is whether you're shifting a signed or unsigned value ... the unsigned shift is faster.
Shifts by 4 could certainly be inlined if you do them a lot. The code will be a lot faster, but it'll cost you 11 bytes each time that you use it, so it's a tradeoff.
You can just modify the __asrwi and __lsrwi macros in huc.inc to try it and see how it works for you.
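To illustrate the loop-versus-inline tradeoff and why the signed case costs more, here is a rough model in plain C that handles the 16-bit value as two separate bytes, the way an 8-bit CPU has to. The function names are hypothetical and this is not the code in huc.inc or the code that HuC generates; it is only a sketch of the technique.

```c
#include <stdint.h>

/* Loop form: shifts one bit per iteration, carrying the low bit of the
   high byte into the top of the low byte, like a generic shift helper. */
uint16_t lsr16_loop(uint16_t value, uint8_t count)
{
    uint8_t lo = (uint8_t)(value & 0xFF);
    uint8_t hi = (uint8_t)(value >> 8);

    while (count--) {
        uint8_t carry = (uint8_t)(hi & 1);   /* bit leaving the high byte */
        hi >>= 1;
        lo = (uint8_t)((lo >> 1) | (carry << 7));
    }
    return (uint16_t)((hi << 8) | lo);
}

/* Fixed shift by 4 with no loop: move whole nibbles between the bytes. */
uint16_t lsr16_by4(uint16_t value)
{
    uint8_t lo = (uint8_t)(value & 0xFF);
    uint8_t hi = (uint8_t)(value >> 8);

    lo = (uint8_t)((lo >> 4) | (hi << 4));   /* low nibble of hi moves down */
    hi >>= 4;
    return (uint16_t)((hi << 8) | lo);
}

/* Signed shift by 4: same work plus refilling the top nibble with the
   sign bit, which is the extra step the unsigned shift avoids.
   Assumes the usual two's complement conversion back to int16_t. */
int16_t asr16_by4(int16_t value)
{
    uint16_t bits = (uint16_t)value;
    uint16_t shifted = lsr16_by4(bits);

    if (bits & 0x8000)
        shifted |= 0xF000;
    return (int16_t)shifted;
}
```

The fixed version does a constant amount of work, where the loop version pays for a counter and a branch on every bit, and the signed version always has that extra sign fix-up on top.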