I figured I'd post this palette fade routine. I've written a few in the past, but I needed something faster and more efficient. Plus, I like the fade 'in' part much better than my older method.
;....................................................................
;
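FadePal:
        lda <D3                 ; D3 = fade level, 0-7 (blue step, bits 0-2)
        asl a
        asl a
        asl a
        sta <D4                 ; D4 = level << 3 (red step, bits 3-5)
        asl a
        asl a
        asl a                   ; carry = bit 8 of level << 6
        sta <D5                 ; D5 = low byte of level << 6 (green step)
        cla
        rol a                   ; pull that carry into bit 0
        sta <D5+1               ; D5+1 = high byte of level << 6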
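_int_Fade0:                     ; first 256 bytes (colors 0-127)
        clx
.loop
        lda PalBlock,x          ; blue: bits 0-2 of the low byte
        and #$07
        sec
        sbc <D3
        bpl .BlueOVF
        cla                     ; went negative, clamp to 0
.BlueOVF
        sta WorkBlock,x
        lda PalBlock,x          ; red: bits 3-5 of the low byte
        and #$38
        sec
        sbc <D4
        bpl .RedOVF
        cla                     ; went negative, clamp to 0
.RedOVF
        ora WorkBlock,x
        sta WorkBlock,x
        lda PalBlock,x          ; green: bits 6-7 of the low byte + bit 0 of the high byte
        and #$c0
        sec
        sbc <D5                 ; 16-bit subtract, low byte first
        tay                     ; hold the low result in Y
        lda PalBlock+1,x
        and #$01 ;needed because reading pal data from VCE leaves garbage in the top unused bits
        sbc <D5+1               ; high byte, using the borrow from the low byte
        bpl .GreenOVF
        cla                     ; went negative, clamp both bytes to 0
        cly
.GreenOVF
        sta WorkBlock+1,x
        tya
        ora WorkBlock,x
        sta WorkBlock,x
        inx
        inx
        bne .loop               ; X wraps to 0 after 128 colors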
_int_Fade1:                     ; second 256 bytes (colors 128-255), X is 0 again here
.loop
        lda PalBlock+256,x      ; blue
        and #$07
        sec
        sbc <D3
        bpl .BlueOVF
        cla
.BlueOVF
        sta WorkBlock+256,x
        lda PalBlock+256,x      ; red
        and #$38
        sec
        sbc <D4
        bpl .RedOVF
        cla
.RedOVF
        ora WorkBlock+256,x
        sta WorkBlock+256,x
        lda PalBlock+256,x      ; green, 16-bit subtract as above
        and #$c0
        sec
        sbc <D5
        tay
        lda PalBlock+256+1,x
        and #$01                ; mask off the unused upper bits again
        sbc <D5+1
        bpl .GreenOVF
        cla
        cly
.GreenOVF
        sta WorkBlock+256+1,x
        tya
        ora WorkBlock+256,x
        sta WorkBlock+256,x
        inx
        inx
        bne .loop
        rts
;endsub
The call argument is D3, a fade level from 00 to 07. The routine pre-shifts it into D4 (level << 3) and D5/D5+1 (level << 6, 16-bit), so the red and green fields can be subtracted in place without shifting each color down and back up. The two blocks of memory are whatever you define in local RAM; both are 512 bytes long. Because I need to fade the existing BG palette, which might be made up of dynamic subpalettes at any point in time, I read the BG half of CRAM (256 colors, 512 bytes) from the VCE into the "PalBlock" segment of RAM. "WorkBlock" is the temporary palette buffer that the faded colors are saved to, ready to be uploaded to VCE CRAM.
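Here's the code that uploads WorkBlock (local RAM) to VCE CRAM. Note that TIA copies from RAM to the VCE data port, so this is the write direction; reading CRAM into PalBlock is the same transfer the other way around, a TAI from $404 into PalBlock.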
;....................................................................
;
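UpdateWorkPal:
        stz $402                ; CRAM address low  = 0
        stz $403                ; CRAM address high = 0 (BG colors start at 0)
        tia WorkBlock,$404,$200 ; WorkBlock -> $404/$405 alternating, 512 bytes
        rts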
;endsub
Not many docs mention that you can actually read from VCE CRAM, but you can. IIRC, though, there's no guarantee that the upper unused bits will be cleared. Normally that's not a problem, especially if you upload with those bits set, but when you alter the color values the way the above fade routine does, make sure you AND/mask off those bits. The fade routine already does that, so no worries.
Reading from or writing to any of the VCE regs will stall the VCE and show distortion during active display. It's best to call the VCE CRAM read routine from vblank to avoid this. You only need to call the read routine once; one and done.
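For reference, here's a minimal sketch of the CRAM read, going by the same register setup as UpdateWorkPal above. ReadBGPal is just a name I made up for illustration, and it assumes the hardware page is mapped at $0000 like in the other routines. TAI alternates the source between $404 and $405 and increments the destination, which is the mirror image of the TIA used for the upload.

```asm
; Sketch only: ReadBGPal is a hypothetical name, not part of the routines above.
ReadBGPal:
        stz $402                ; CRAM address low  = 0
        stz $403                ; CRAM address high = 0 (BG colors start at 0)
        tai $404,PalBlock,$200  ; $404/$405 alternating -> PalBlock, 512 bytes
        rts
;endsub
```

Call it once from vblank before the first FadePal call, so PalBlock holds the unfaded colors.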
The above routines only fade/adjust the 256 BG colors. You'd have to modify them for the 256 sprite colors, which sit in the second half of CRAM (starting at color $100). If you want to fade to black or fade up from black, you'll need another routine that steps D3 and calls the FadePal routine each time.
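Something along these lines would do for the stepping routine. This is only a sketch: FadeOut and FadeIn are names I picked for illustration, and WaitVblank is a hypothetical routine standing in for whatever your own code uses to wait for the start of vblank. It also assumes PalBlock has already been filled with the unfaded palette.

```asm
; Sketch only: FadeOut/FadeIn are made-up names, WaitVblank is hypothetical.
FadeOut:
        stz <D3                 ; level 0 = palette unchanged
.step
        jsr FadePal             ; PalBlock minus level -> WorkBlock
        jsr WaitVblank          ; hypothetical: returns at the start of vblank
        jsr UpdateWorkPal       ; upload WorkBlock to CRAM
        inc <D3
        lda <D3
        cmp #$08
        bcc .step               ; run levels 0 through 7 (7 = all black)
        rts
;endsub

FadeIn:
        lda #$07
        sta <D3                 ; level 7 = all black
.step
        jsr FadePal
        jsr WaitVblank
        jsr UpdateWorkPal
        dec <D3
        bpl .step               ; run levels 7 down to 0, exit when D3 wraps to $FF
        rts
;endsub
```

This steps one level per frame, so the whole fade takes 8 frames. For a slower fade, wait a few vblanks between steps. Note that D3 is left at $08 after FadeOut and $FF after FadeIn, so reset it before calling FadePal directly.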