But what I posted was a "safe" version, so that you don't delay interrupts (in this case, since it's in vblank, the TIRQ routine), by using smaller transfers with a small/fast iteration overhead.
Good point!
I was trying to keep a similar interface to the original functions which read a single sample, and I was thinking that they'd be run after a vsync() in order to avoid snow on the screen.
But you're right, I should still limit the TAI size in order to avoid blocking the TIMER interrupt.
My bad. :oops:
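For a rough sense of scale, here is the arithmetic. I'm assuming the commonly documented HuC6280 block-transfer cost of 17 cycles plus 6 cycles per byte, and that an interrupt has to wait until the current instruction finishes. The src label is just a placeholder.

```asm
; A Txx instruction cannot be interrupted once it has started.
;
;   tia  src,color_data,$0020   ; 17 + 6*32   =  209 cycles (one 16-color palette)
;   tia  src,color_data,$0400   ; 17 + 6*1024 = 6161 cycles (all 512 colors at once)
```

So 32-byte chunks keep the worst-case interrupt delay to a couple of hundred cycles, rather than several thousand for a single big transfer.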
Plus, it's a chance to use the T flag... who doesn't like using the T flag???
Hahaha ... also a good point, but keeping the low-byte of the address in the A reg and doing ...
adc #$20 ;2
sta <low((.loop & 0xff)+2) ;4
... is a cycle faster, and it avoids messing with my stack-pointer-in-X.
Of course ... I throw away that cycle with the setup and the JSR, but that's a tradeoff for not needing a permanent routine in ZP.
There's certainly an argument for having a few of these Txx-32-byte subroutines in regular RAM to use for self-modifying code, and IIRC, HuC already has one somewhere.
Here's the corrected code, and of course, things are much cleaner in the CDROM version ...
; --------
; Alternate names when the parameter-passing area is used for
; a self-modifying Txx instruction.
;
__tc = $20F8
__ts = $20F9
__td = $20FB
__tl = $20FD
__tr = $20FF
; set_colors(int *pbuffer [__ts] )
; set_colors(int index [color_reg], int *pbuffer [__ts], unsigned char count [acc] )
; ----
; index: index in the palette (0-511)
; pbuffer: source buffer
; count: # of 16-color palettes, (1-32)
; ----
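_set_colors.1:  stz   color_reg_l       ; start at color index 0
                stz   color_reg_h
                lda   #32               ; all 32 palettes
_set_colors.3:  tay                     ; Y = palette count
        .if     (!CDROM)
                lda   #$E3              ; TIA
                sta   <__tc
                lda   #$60              ; RTS
                sta   <__tr
                lda   #$04              ; destination = $0404 (color_data)
                sta   <__td+0
                sta   <__td+1
                lda   #$20              ; length = $0020 (one palette)
                sta   <__tl+0
                stz   <__tl+1
                lda   <__ts+0           ; A = low byte of source address
                clc                     ; ADC below needs a known carry
.l1:            jsr   __tc              ; copy one 16-color palette
                adc   #$20
                sta   <__ts+0
                bcc   .l2
                inc   <__ts+1
                clc                     ; clear the carry for the next ADC
.l2:            dey
                bne   .l1
                rts
        .else
                lda   <__ts+1           ; patch the source address into the TIA
                sta   .l1+2
                lda   <__ts+0           ; A = low byte of source address
                sta   .l1+1
                clc                     ; ADC below needs a known carry
.l1:            tia   $0000,color_data,$0020
                adc   #$20
                sta   .l1+1
                bcc   .l2
                inc   .l1+2
                clc                     ; clear the carry for the next ADC
.l2:            dey
                bne   .l1
                rts
        .endif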
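A minimal sketch of calling the three-parameter entry point from assembly, following the register usage in the comment header above. The label my_palettes is hypothetical (64 bytes of palette data), and index 256 (the first sprite palette) is just an example value.

```asm
                lda   #low(my_palettes)     ; pbuffer -> __ts
                sta   <__ts+0
                lda   #high(my_palettes)
                sta   <__ts+1
                stz   color_reg_l           ; index = 256 ($0100)
                lda   #$01
                sta   color_reg_h
                lda   #2                    ; count = 2 palettes (64 bytes)
                jsr   _set_colors.3
```

Remember that the non-CDROM build overwrites the whole $20F8-$20FF parameter area while it runs.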