Said folks will give this all a spin soon enough.
Excellent!
I should have the current bug fixed soon ... I think that the quick-and-dirty fix is safer at this point than changing the macros to allow for an offset parameter. The behavior of those macros is relied upon in too much of the optimizer code.
The next question is ... are you open to a change in the HuC register usage?
HuC currently uses A:X for the 16-bit accumulator, and Y is used mostly for slow stack accesses (being set to the value 1).
Changing that to have the 16-bit accumulator in Y:A, and using X to be the data-stack-pointer offers a considerable performance and size benefit when matched with a zero-page stack.
That stack can be tiny (probably 8-bytes or less) if you're using all of the current HuC tricks to have global variables and avoid the stack ... but if you give it a 32-byte or 64-byte stack, then you can probably avoid needing to use most of those tricks and just rely upon the compiler to auto-magically use the stack for local variables, and have it run as fast as putting everything into zero-page.
Remember that on the PCE, the "zero-page,x" addressing mode is just as fast as the basic "zero-page" addressing mode.
It'll need some improvement to the HuC optimizer to take full advantage of the capability ... but the main assembly-language code generation and libraries are looking good so far.