Well, it's not as cramped as that, but the system card does allocate from the bottom up, and the top down.
Okay. I just checked and there's an area between $90 and $DC that's not being used, afaik.
So yeah, maybe not too cramped for a stack area.
...
I'm not reading a manual. I'm looking at the system card code.
Granted, you probably could use most of the zero page for a stack...but you would lose access to the cd, since a lot of cd-related variables are stored from the bottom upwards (ie, $00+)
For example, you couldn't play a cd audio track, since the TOC information is loaded down there....
OK guys, you're scaring me here ... am I missing something crucial, or are we talking about different things?
ZP is $2000-$20FF. The Hu7 CD manual clearly documents that $2000-$20DB are User Area (i.e. free for use).
RAM is $2200-$3FFF. The Hu7 CD manual clearly documents that $2680-$3FFF are User Area (i.e. free for use).
AFAIK, any other usage of ZP that you're currently seeing is something to do with HuC, and not the CD System Card.
Am I missing something?
I would also try to avoid making my functions call other functions much, and avoid passing any sort of parameters in the 'C' stack, because stack-frame accounting itself wastes time and energy. Just like local object instantiation does in C++ (instantiation often = waste).
One thing HuC does well, is to try to pass a single 8-bit or 16-bit value via registers. Creating and dropping the stack frame is serious wasted effort on a machine of these capabilities.
Yep, CC65 also passes the last parameter in registers rather than on the stack.
And "yes", any stack handling is slower than none at all ... but when the stack is in ZP, then accessing it is just as fast as the fastest static variable, and stack handling becomes just "dex" ... which is about as fast as you can get.
It's like having the compiler automatically create ZP static variables for you without you having to think about it.
One thing that a 'C' compiler - through its mere existence - does, is to lull people into a false sense that programming habits on one machine will translate well to another machine. So, I would anticipate people passing 4 int variables in a function call. I would anticipate 8-deep call levels. And so I would anticipate corruption of variables due to exhausting all memory. The target code would fail without warning (because who's going to put bounds checks in there?), and the user would blame the compiler for his problems.
Well, 8 levels deep with 4 ints per level is 64 bytes. Well within a 128 byte stack.
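To make that arithmetic explicit, here is a minimal sketch in C. It assumes a 2-byte int (as on CC65 and HuC) and counts only the int arguments on the parameter stack; locals and the return addresses on the hardware stack are not included. The names `stack_bytes_needed`, `INT_BYTES` and `ZP_STACK_BYTES` are made up for this example.

```c
/* Assumed sizes: a 2-byte int and the 128-byte stack discussed above. */
enum { INT_BYTES = 2, ZP_STACK_BYTES = 128 };

/* Bytes of parameter stack used by `depth` nested calls, each passing
   `ints_per_call` int arguments. Locals and return addresses are
   deliberately left out of this estimate. */
static unsigned stack_bytes_needed(unsigned depth, unsigned ints_per_call)
{
    return depth * ints_per_call * INT_BYTES;
}
```

With these assumptions, `stack_bytes_needed(8, 4)` comes to 64, and the 128-byte limit is only reached at 16 levels of 4 ints each.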
There's no reason that there would be no warning. Stack checking on a PCE, if enabled, could be as simple as a "dex; bmi overflow".
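As a rough illustration of why "dex; bmi overflow" is enough, here is a small C model of a 128-byte stack indexed by an 8-bit X register. This is only a sketch: the `ZpStack` type and the `zp_init`/`zp_push` names are hypothetical, and it assumes the empty stack starts with X = $80 so that valid indices are $7F down to $00.

```c
#include <stdbool.h>
#include <stdint.h>

#define ZP_STACK_SIZE 128u

typedef struct {
    uint8_t mem[ZP_STACK_SIZE]; /* stands in for the ZP stack area */
    uint8_t x;                  /* models the X register; 0x80 = empty */
    bool    overflowed;         /* set when the "bmi" would be taken */
} ZpStack;

static void zp_init(ZpStack *s)
{
    s->x = ZP_STACK_SIZE;
    s->overflowed = false;
}

/* Models: dex ; bmi overflow ; sta stack,x */
static bool zp_push(ZpStack *s, uint8_t value)
{
    s->x = (uint8_t)(s->x - 1);   /* dex */
    if (s->x & 0x80u) {           /* bmi: bit 7 set means X wrapped below 0 */
        s->overflowed = true;
        return false;
    }
    s->mem[s->x] = value;         /* sta stack,x */
    return true;
}
```

The point is that once X wraps from $00 to $FF, bit 7 is set and the branch catches it, so the check costs one extra instruction per push.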
It would also be trivial to have an emulator, like Mednafen, specifically watch for a stack overflow (even without the overhead of an embedded "bmi overflow"), and break the program.
That's just a debugging improvement. Like adding symbol support to Mednafen, and even code profiling. None of those are particularly difficult.
And speaking of stack overflows without warning ... does HuC support stack checking? I can't see an option for it on the HuC command line.
+1. Calls 8 levels deep are not that unusual; 4 ints as parameters aren't either (consider Rovers 'example', where there are several variables, for setting up a sprite).
If you're doing that in HuC, then you're generating some pretty slow and ugly code ... unless everything is already declared as a static.
Or, given X is an offset, you could generate labels for the entire stack area, and access values as 'lda <stk06'. No indirection needed. Right?
Yes, I know that's not workable in reality.
If you're talking about accessing local variables without indirection, then "yes", that's what I'm already implementing in CC65, and it's easy because the compiler already knows the offset of any local variable relative to the current stack pointer.
So every local variable in a function is just "stack+offset,x" ... fast.
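A hedged C model of that addressing scheme, for anyone who wants to see it spelled out. The zero page is a 256-byte array, `STACK_BASE` is an arbitrary made-up address, and the helper names are hypothetical; none of this is taken from the CC65 sources.

```c
#include <stdint.h>

static uint8_t zero_page[256];  /* models the 256-byte zero page */

#define STACK_BASE 0x40u        /* hypothetical start of the ZP stack area */

/* Entering a function reserves `size` bytes of locals by lowering X;
   leaving the function raises X again. */
static uint8_t frame_enter(uint8_t x, uint8_t size) { return (uint8_t)(x - size); }
static uint8_t frame_leave(uint8_t x, uint8_t size) { return (uint8_t)(x + size); }

/* Models "lda <stack+offset,x": the offset is a compile-time constant,
   so it folds into the operand and only X is added at run time.
   The cast models zp,x addressing wrapping within the zero page. */
static uint8_t local_load(uint8_t x, uint8_t offset)
{
    return zero_page[(uint8_t)(STACK_BASE + offset + x)];
}

/* Models "sta <stack+offset,x". */
static void local_store(uint8_t x, uint8_t offset, uint8_t value)
{
    zero_page[(uint8_t)(STACK_BASE + offset + x)] = value;
}
```

Each nested call just works with a lower X, so the same "stack+offset,x" operand in the callee reaches the callee's own locals without touching the caller's.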
If you're talking about the step beyond that, where the compiler/linker actually analyzes the code at link time and gives every single parameter and local variable a static location in memory ... then that's also workable. SDCC implements that strategy.
Unfortunately, I don't think that I'm ready to add complete 65C02 processor support to SDCC, its assembler and its linker.
What I personally think would be useful is to blend the current C stack and the ZP area stack.
The ZP stack could hold the address of a parameter block. Since the parameters would be consecutive, you could place the base address (stack,x) into a temp, then use [temp],x to access them. Not as fast, but not as limiting either.
(just an idea )
I take it that you really mean "[temp],y" to access them. I guess that I'm missing something again. How is that an improvement over the current HuC "[stack],y"?
I still think a good advanced (ie, not peephole) optimization program could do wonders for even the lousy code HuC generates....
Improving either HuC or CC65's actual internal optimization would be great, but is beyond my interest level.
If someone really wants an optimized C on the 65C02, then I suggest that they look at getting SDCC processor support implemented ... it's not supposed to be totally horrible to do. That way you'd get a real modern C compiler with all the expected optimizations (like constant propagation, loop invariants, dead-code elimination, etc, etc).
Perhaps that could be Bonknuts' Degree/Masters project!