It's not how HuC generates the code; it basically generates macro calls to do things, IIRC.
It's in the macros themselves.
For example: if HuC generates a __phw call, it eventually resolves to something that checks the size
of the operand to generate different versions of the function. See huc_opt.inc for that macro, and others that do a similar thing.
Sure, that macro is one of the ones that looks at the variable size. It's in huc/include/pce/huc_opt.inc
But that's just another one of the load/store macros.
From what I'm seeing, all the
math macros like compare, add, subtract, shift, multiply, etc are all hard-coded for 16-bit values in X:A.
That's not horrible ... but it sure would be nice to save the cycles.
Off the top of my head, some quick answers to your questions (I didn't verify against source code):
Thanks Dave, that's very helpful.
HuC was based on a C compiler which assumed 'int' = 16-bits, even though 8-bits would be the 'native' unit for the 6502.
As I recall, 'C' came out when 16-bit processors were standard, on the way to 32-bit processors (1984-85ish).
C itself was created in 1972 for the PDP-7 & PDP-11 minicomputers, which were definitely 16-bit machines.
HuC is based on Ron Cain's free Small-C compiler that was published as a source-code listing in the May 1980 issue of the wonderful and sadly-missed "Dr Dobb's Journal" magazine.
That was back in the days before the Internet, and before the common use of modems and BBS's, back when folks actually typed in the listings in magazines by hand.
There's an interesting story about its creation on Ron Cain's web page ...
http://www.svipx.com/pcc/PCCminipages/zc9b6ec9e.html

Promoting chars to ints for expression-evaluation isn't required by C, but it sure makes the compiler itself smaller and easier!
It's true that if you know your values don't need to be larger than char-sized, using them would be substantially faster.
Yep, if (and again, it's still "if") I decide to spend some time trying to optimize one of the 'C' compilers for the PCE, I really, really, really want it to understand the difference between a byte and a word.
The classic Small-C based compilers like HuC just don't bother about it.
It was nice to see that CC65 actually does, and that its code-generation allows for different paths for 8-, 16- and 32-bit values.
It's one of the things pushing me towards hacking CC65 rather than HuC.
IMHO, it's not worth my time and energy to mess with one of the compilers unless I believe that I can get an end-result that dramatically improves things, and that gives me something that I'd be willing to use myself to make my homebrew coding faster.
As such ... I'm willing to limit things to a subset of C, and to have to write C code that lets the compiler do a good job (i.e. lots of global variables), but I would expect the end-result to be semi-respectable assembly code, or else it's just not worth it.
One obvious thing to do is to follow one of the suggestions on Dave Wheeler's page
http://www.dwheeler.com/6502/a-lang.txt

If we limit the parameter stack to 256 entries and refer to it as "__stack,X" instead of HuC's "(__stack),Y", then all local variable accesses become fast operations, including access to any stack-based argument or parameter.
If we choose to put the stack in zero-page, then C's local variables become as fast as hand-optimized code.
It's just a case of accepting that if we make that choice, then there will be consequences in other areas.
For instance ... it might be best to disallow arrays or structs as local variables in a function (but you'd still be able to allocate them on the heap and have pointers to them).
It would also simplify things if you weren't allowed to take the address of a local variable (just static and global variables).
I can live with a lot of restrictions like these if they make fast code possible.
Can everyone else?
Here's another idea ... if you no longer have a C data stack that grows through memory, then you can afford to have a really simple heap instead.
The "classic" simple heap scheme in games is to basically have an area of memory, and then push and pop memory allocations from both ends of that memory area.
"Temporary" allocations might start from top, and go downwards; and "Long-lived/Permanent" allocations might start at the bottom and go upwards.
That incredibly-simple scheme lets you do a lot of useful dynamic allocation with very little overhead.