It's not how HuC generates the code; it basically generates macro calls to do things, IIRC.
It's in the macros themselves.
For example: if HuC generates a __phw call, it eventually resolves to something that checks the size
of the operand to generate different versions of the function. See huc_opt.inc for that macro, and others that do a similar thing.
Sure, that macro is one of the ones that looks at the variable size. It's in huc/include/pce/huc_opt.inc
But that's just another one of the load/store macros.
From what I'm seeing, all the
math macros like compare, add, subtract, shift, multiply, etc are all hard-coded for 16-bit values in X:A.
That's not horrible ... but it sure would be nice to save the cycles.
Off the top of my head, some quick answers to your questions (I didn't verify against source code):
Thanks Dave, that's very helpful.
HuC was based on a C compiler which assumed 'int' = 16-bits, even though 8-bits would be the 'native' unit for the 6502.
As I recall, 'C' came out when 16-bit processors were standard, on the way to 32-bit processors (1984-85ish).
C itself was created in 1972 for the PDP-7 & PDP-11 minicomputers, which were definitely 16-bit machines.
HuC is based on Ron Cain's free Small-C compiler that was published as a source-code listing in the May 1980 issue of the wonderful and sadly-missed "Dr Dobb's Journal" magazine.
That was back in the days before the Internet, and before the common use of modems and BBS's, back when folks actually typed in the listings in magazines by hand.
There's an interesting story about its creation on Ron Cain's web page ...
http://www.svipx.com/pcc/PCCminipages/zc9b6ec9e.html

Promoting chars to ints for expression-evaluation isn't required by C, but it sure makes the compiler itself smaller and easier!
It's true that if you know your values don't need to be larger than char-sized, using them would be substantially faster.
Yep, if (and again, it's still "if") I decide to spend some time trying to optimize one of the 'C' compilers for the PCE, I really, really, really want it to understand the difference between a byte and a word.
The classic Small-C based compilers like HuC just don't bother about it.
It was nice to see that CC65 actually does, and that its code-generation allows for different paths for 8-, 16- and 32-bit values.
It's one of the things pushing me towards hacking CC65 rather than HuC.
IMHO, it's not worth my time and energy to mess with one of the compilers unless I believe that I can get an end-result that dramatically improves things, and that gives me something that I'd be willing to use myself to make my homebrew coding faster.
As such ... I'm willing to limit things to a subset of C, and to have to write C code that lets the compiler do a good job (i.e. lots of global variables), but I would expect the end-result to be semi-respectable assembly code, or else it's just not worth it.
One obvious thing to do is to follow one of the suggestions on Dave Wheeler's page
http://www.dwheeler.com/6502/a-lang.txt

If we limit the parameter stack to 256 entries and refer to it as "__stack,X" instead of HuC's "(__stack),Y", then all local variable accesses become fast operations, including access to any stack-based argument or parameter.
If we choose to put the stack in zero-page, then C's local variables become as fast as hand-optimized code.
It's just a case of accepting that if we make that choice, then there will be consequences in other areas.
For instance ... it might be best to disallow arrays or structs as local variables in a function (but you'd still be able to allocate them on the heap and have pointers to them).
It would also simplify things if you weren't allowed to take the address of a local variable (just static and global variables).
I can live with a lot of restrictions like these if they make fast code possible.
Can everyone else?
Here's another idea ... if you no longer have a C data stack that grows through memory, then you can afford to have a really simple heap instead.
The "classic" simple heap scheme in games is to basically have an area of memory, and then push and pop memory allocations from both ends of that memory area.
"Temporary" allocations might start from top, and go downwards; and "Long-lived/Permanent" allocations might start at the bottom and go upwards.
That incredibly-simple scheme lets you do a lot of useful dynamic allocation with very little overhead.