Just to clarify about the slowness: is it the compilation with HuC, or macros that are not optimized?
Or maybe both...
It's honestly not the macros. It goes much deeper than that. Part of it is code generation (pointer, array, and far data access) and part of it is higher level design issues on this style of processor.
Part of it is HuC and how it handles pointers and array access, as well as binary shifts, more so than typical compiled C (which already tends to be slower than hand-written asm). The other part is the actual libs (sprites, tilemaps, etc). They're written to be general and flexible, but they are slow by design (compared to commercial games, where almost every game uses its own map routines). And then lastly, you're now supporting two VDCs.
If you want to get speed back, write your own tilemap routines in asm and optimize accordingly. Write your own sprite routines as well. You'll need to write new routines to access array data (even near array data is treated as far data, which is very slow). Same for shifts. Use global variables wherever you can because private vars are slow. I have another lib that allows full screen hsync effects: Y and X line scroll, BG color #0 change, sprite/bg layer on/off, etc. It's fairly quick, and interfaces like a display list on the user side. But since HuC is horrendously slow with array access, you could spend almost a whole frame's worth of cycles simply updating each entry.
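To put a rough number on "a whole frame's worth of cycles", here is a back-of-envelope sketch in plain C, meant to be run on a PC rather than compiled with HuC. The timing figures (455 CPU cycles per scanline, 263 scanlines per frame) and the display list shape (224 lines, 4 fields per line) are assumptions for illustration, not numbers taken from the lib described above.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed timing, for illustration only: CPU at about 7.16 MHz,
   455 CPU cycles per scanline, 263 scanlines per frame. */
#define CYCLES_PER_LINE 455L
#define LINES_PER_FRAME 263L

/* Hypothetical display list shape: one entry per visible line,
   four fields per entry (e.g. X scroll, Y scroll, color, layer flags). */
#define LIST_LINES      224L
#define FIELDS_PER_LINE 4L

/* Total CPU cycles available in one frame (long, since the result
   does not fit in a 16-bit int). */
long cycles_per_frame(void)
{
    return CYCLES_PER_LINE * LINES_PER_FRAME;
}

/* Cycles available per field if the whole frame were spent doing
   nothing but rewriting every field of the display list. */
long cycles_per_field(void)
{
    return cycles_per_frame() / (LIST_LINES * FIELDS_PER_LINE);
}

/* Prints the budget and sanity-checks the arithmetic. */
void print_budget(void)
{
    assert(cycles_per_frame() > 0);
    printf("cycles per frame: %ld\n", cycles_per_frame());
    printf("cycles per field: %ld\n", cycles_per_field());
}
```

Under these assumed figures a frame is about 119,665 cycles, which leaves roughly 133 cycles per field across 896 fields with nothing else running. A slow array write takes a large bite out of that, which is why per-entry updates from C add up so quickly.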
It's just that HuC isn't built for speed. It needs asm support to prop it up just for PCE stuff. Adding an extra VDC just compounds that issue. You also don't have access to the additional 24k of SGX RAM with HuC. And you end up writing a lot of assembly to fix these issues.
This is why I don't post links or do any more support lib stuff for HuC. It's a losing battle. Writing new routines to support a specific project is one thing, but writing a generalized lib for anyone to use is not optimal. And the people that use it (Ark, Rov, touko) already have it figured out with their specific fixes.
It comes down to the philosophy of 65x coding, which the HuC6280 is a branch of: every change or design usually requires re-optimization. The 65x is fast, but it isn't a strong general purpose processor like the 68k, where very little optimization needs to be applied for good results or for changes to code. If you understand this at the ISA level, then you can begin to see why something as generalized as graphic libraries loses speed for what it gains in flexibility. And in HuC, writing new libs means writing in ASM. And if you can write in ASM, why do you even need HuC? I struggled to find a solution to this dilemma for a while, until I finally gave up. It's not just HuC, or its libraries, but this higher level concept. On other 65x machines with better compilers, like CC65, coders run into the same issues eventually. And ASM is the fix, or you limit the scope and capability of your project.
If it's the macros, I can begin to optimize them. But if this is the HUC compilation, I can just call the macro in asm.
If you are experienced with assembly, and the 65x ISA, then you could begin to add or replace support in HuC for your specific project. The first thing I would start with is pointer/array/far_data access. HuC has pragma fastcall function support, which allows you to force argument passing via ZP or registers, instead of the slow internal stack. It lets you write functions in asm and call them from C like normal C functions. So instead of x=array[index]; , you could write something like x=NearDataArray(array,index); , i.e. for data that sits in a fixed RAM bank.
Pragma fastcall also allows argument overloading, so one function (depending on the number of arguments) can be used in different ways. I could also do NearDataArray(array,index,x); so the value of x is written into the array instead of vice versa. It'd be wise to do the same for shifts as well.
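As a rough sketch of what the C side of this could look like, here are plain C stand-ins for the accessors. In HuC the bodies would be asm routines bound with pragma fastcall, and the read and write forms could share the one NearDataArray name through argument overloading. Standard C has no overloading, so the names NearDataArrayGet, NearDataArrayPut and ShiftLeft16 below are hypothetical, and the bodies only model the behavior, not the speed.

```c
#include <assert.h>

/* Hypothetical stand-in for x = NearDataArray(array, index);
   In HuC the body would be an asm routine taking its arguments
   through ZP or registers instead of the C stack. */
unsigned char NearDataArrayGet(const unsigned char *array, unsigned char index)
{
    return array[index];
}

/* Hypothetical stand-in for NearDataArray(array, index, x);
   Writes value into the array instead of reading from it. */
void NearDataArrayPut(unsigned char *array, unsigned char index, unsigned char value)
{
    array[index] = value;
}

/* Hypothetical stand-in for an asm shift helper: 16-bit left shift.
   count is assumed to be in the range 0..15; the mask keeps the
   result 16 bits wide on hosts where unsigned int is larger. */
unsigned int ShiftLeft16(unsigned int value, unsigned char count)
{
    return (value << count) & 0xFFFFu;
}

/* Usage example: each call replaces the plain C expression in the comment. */
void near_array_demo(void)
{
    unsigned char table[4] = {10, 20, 30, 40};
    unsigned char x;

    x = NearDataArrayGet(table, 2);     /* replaces x = table[2]; */
    assert(x == 30);

    NearDataArrayPut(table, 1, x);      /* replaces table[1] = x; */
    assert(table[1] == 30);

    assert(ShiftLeft16(0x0081u, 4) == 0x0810u);  /* replaces 0x0081 << 4 */
}
```

The point of keeping the calls this small is that the C code stays readable while all of the actual work moves into routines you control. Swapping the body for asm later does not change any of the call sites.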
I would also write all sprite handling and map handling in asm, from scratch. Once you get to know the PCE graphic hardware, you begin to realize sprite and map routines don't need to be restricted to vblank, which can speed things up if done right. The VDC interrupt service is also terribly slow in HuC, not to mention limited in capability. I would completely remove/replace it.
But in my opinion, that's quite a bit of down time learning the internals of HuC and how to interface asm stuff with it, and you still have to deal with the forced mapping layout of how HuC manages banks and pages.
I don't want to discourage you, but you should know what you're getting into.