Ok... back.
So declaring an array is the same thing as a "static" pointer in C - a constant pointer, because that address never changes.
This means you can access small arrays in work ram with direct addressing plus indexing (as shown in the BYTE/BYTE example; there's also a small sketch of this after the pointer example below). I'm not entirely sure, but I think HuC, being Small C, doesn't allow the creation of pointers like you can in normal C. But that doesn't mean you can't do it in ASM.
char *pointer;
.
.
.
pointer = &label1;
equates to...
lda #low(_label1)
sta <_pointer
lda #high(_label1)
sta <_pointer+1
So now 'pointer' holds the address of 'label1'
*pointer = var;
pointer++;
lda _var
sta [_pointer]
inc <_pointer
bne .skip
inc <_pointer+1
.skip
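Going back to the small-array case mentioned at the top, here's a minimal sketch of what direct addressing plus indexing looks like when the array's base address is a constant. The array name, size and index variable are just made up for the example:
; char table[16]; somewhere in work ram (e.g. in a .bss section)
_table: .ds 16
ldx _i ; x = i;
lda _table,x ; table[i] &= 0x0f;
and #$0f
sta _table,x
No pointer setup at all - the base address is baked right into the instruction, and X picks the element.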
There are 128 possible 'address' vectors, or pointer slots, on the 65x (256 bytes of zero page = 128 two-byte vectors). In HuC and/or the system card, some of those are reserved, so you actually have fewer than that. But in reality, you don't usually need more than 20-30 pointers/address vectors. The cool thing about the 65x's hardware ZeroPage registers is that, since there are so many address vectors definable, it's not a problem to just leave a vector set up. I.e. you don't have to keep loading and unloading pointers every time you need to use them.
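Just to show what leaving a vector set up looks like, here's a rough PCEAS-style sketch (the pointer names and the data label are made up):
.zp
map_ptr: .ds 2 ; pointer into the current map data
sprite_ptr: .ds 2 ; pointer into sprite meta data
.code
; set the pointer up once at init...
lda #low(map_data)
sta <map_ptr
lda #high(map_data)
sta <map_ptr+1
; ...then just leave it resident and use it whenever you need it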
The 65x allows a few other things. First thing to note: all indexing on the HuC6280 is free - an indexed access like lda [ptr],y costs the same 7 cycles as a plain lda [ptr]. And you get both post-indexed ([ptr],y) and pre-indexed ([ptr,x]) indirect modes. Here are some examples:
pointer[x++] = var;
ldy _x
lda _var
sta [_pointer],y
inc _x
In that example, the pointer isn't destroyed or altered. This means you can randomly and quickly access up to 256 elements from the base pointer address. Very fast and flexible. The only downside is that the index is limited to 256 bytes, but there are clever and fast ways to extend this further.
Another example of pointer flexibility:
*(pointer_array[x])=var;
lda _var
ldx _x
sta [_pointer_array,x]
So you can index a pointer table array. One caveat: with [table,x] the pointer table itself has to live in zero page, and since each pointer is two bytes, X holds a byte offset (element index times 2). I hope my C syntax (C99) is correct on those, but if not, you should still be able to get the idea I'm trying to convey.
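For instance, a small zero-page pointer table and a store through it might look something like this (the table size and labels are made up):
.zp
_pointer_array: .ds 2*4 ; room for 4 pointers, 2 bytes each
.code
lda _x ; element index...
asl a ; ...times 2, since each pointer is a word
tax
lda _var
sta [_pointer_array,x] ; store through the x'th pointer in the table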
Now, this has all been about accessing near data. Accessing far data is exactly the same thing, except for one minor difference: far data needs a 24-bit address, and unfortunately Hudson didn't add a long addressing mode to the custom R65C02S. It's not too big of a problem, but it does require prioritizing and optimizing if you need specific access in relation to speed - basically, how you map out your logical cpu address range.
Anyway, for far data the process I've described above is exactly the same. It just requires mapping a far page/block of memory into the local address range. Once that's done, everything I've written applies exactly. Work ram you never want to map out, except in extreme conditions - mostly because the stack and the address vectors get replaced with whatever you map into that address range/page, and you'll definitely have problems with interrupts and such. So leave that page alone. The very last page and the very first page also normally shouldn't be changed. If you design your code from the ground up, it's not much of a problem. But if you plan to use HuC or any of the Mkit libs or setup, you're mostly restricted to leaving that 24k fixed in the local address range.
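To give a rough idea of what the mapping step looks like, here's a sketch that maps a data bank into MPR3 ($6000-$7FFF) and then treats it like any other near pointer. The bank/label names are made up, which MPR you use is up to your memory layout, and the data would need to be assembled for the same $6000-$7FFF range you map it into:
tma #$08 ; save whatever bank is currently at $6000-$7fff
pha
lda #bank(far_table) ; map the bank holding the far data into MPR3
tam #$08
lda #low(far_table) ; from here it's the same near-pointer access as above
sta <_pointer
lda #high(far_table)
sta <_pointer+1
lda [_pointer]
pla ; restore the old mapping when done
tam #$08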
Not sure if I'm forgetting anything
Edit: Oh yeah. Some closing information/suggestions.
Most array/pointer data is accessed in some sort of sequential manner. Take advantage of this. You can still use the free indexing even if the source arrays/buffers are longer than 256 bytes/128 words. Just use the index register as a counter, then increment the MSB of the pointer. Like so:
for (int x = 0; x < 384; x++)
{
    *pointer &= 0x03;
    pointer++;
}
clx
.loop_outer
cly
.loop_inner
lda [_pointer],y
and #$03
sta [_pointer],y
iny
cpy #128
bcc .loop_inner
inx
cpx #03
beq .out
tya
clc
adc <_pointer
sta <_pointer
bcc .loop_outer ; carry clear = still in the same page
inc <_pointer+1
bra .loop_outer
.out
384 is an easy multiple of 128. So here's an example of a variable length loop, but still using indexing for addressing and a counter:
// let's assume len is 521 ($0209)
for (int x = 0; x < len; x++)
{
    *pointer &= 0x03;
    pointer++;
}
clx ; x tracks the high byte of how many bytes we've done
cly
.loop
lda [_pointer],y ;7
and #$03 ;2
sta [_pointer],y ;7
iny ;2
beq .wrap ;2 (only taken once every 256 bytes)
.back
cpy _len ;5 low byte of len
bne .loop ;4
cpx _len+1 ; high byte of len
bne .loop
bra .out
.wrap
inc <_pointer+1 ; crossed a 256 byte page, bump the pointer MSB
inx
bra .back
.out
Compare that with the normal, non-indexed method:
.loop
lda [_pointer] ;7
and #$03 ;2
sta [_pointer] ;7
lda <_pointer ;4
clc ;2
adc #$01 ;2
sta <_pointer ;4
lda <_pointer+1 ;4
adc #$00 ;4
sta <_pointer+1 ;4
cmp _len+1 ;5 high byte (strictly this wants an end address, i.e. start+len)
bcc .loop ;4
lda <_pointer ;4
cmp _len ;5 low byte
bne .loop ;4
.out
The indexed version of the loop is 29 cycles per iteration of the for loop. You have a tiny amount of overhead when Y rolls over and X/the pointer MSB get bumped, which happens once every 256 increments of Y. That translates to less than a single cycle per iteration when averaged over the whole loop.
On the other hand, the normal method is 49 cycles per iteration. And once the pointer's MSB lines up with the end value's MSB, you tack another 13 cycles on top of that 49. So if 'len' were $1ff, the last $ff iterations would cost 62 cycles each, and the average works out to about 55.5 cycles.
55 cycles VS 29 cycles. The indexed method is more abstract, but clearly the winner - 1.9 times as fast. If you did the same thing in HuC, it'd take around 200 or more cycles per loop iteration.
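If you want to sanity check those numbers, here's the rough arithmetic, taking the per-iteration figures above at face value:
; len = $1ff = 511 elements
; indexed loop   : 511 * 29            = 14,819 cycles (plus a little rollover overhead)
; pointer method : 256 * 49 + 255 * 62 = 28,354 cycles -> 28,354 / 511 = ~55.5 per element
; ratio          : 55.5 / 29           = ~1.9x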