Sorry to go off topic, but speaking of SATB updating and SATB DMA: the SATB auto-update happens right after the active display ends on the VDC side. If you write to (or read from) VRAM at that point, the CPU will be stalled for the whole 3 scanlines of the SATB DMA (that's the length for low-res mode, IIRC). While that's only about 1% of CPU time in a single frame, it's 9% of vblank time. You might want to do some other work first, rather than a local-SATB-to-VRAM copy right at vsync(). Also remember that using vblank to do the local SATB transfer puts sprites out of phase by 1 frame. (I use an H-int line to generate an earlier pseudo vblank() call and do my local SATB update then, just for simplicity.)
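To put rough numbers on that stall, here is a small sketch of the arithmetic. The line counts are assumptions picked for illustration (a 263-line frame with 33 lines outside active display), not hardware-verified figures; the real vblank length depends on the display height you program into the VDC. `stall_share` is a hypothetical helper name.

```python
def stall_share(stall_lines, total_lines, vblank_lines):
    """Return (share of frame, share of vblank) lost to the stall, as fractions."""
    return stall_lines / total_lines, stall_lines / vblank_lines

# Assumed line counts, for illustration only.
TOTAL_LINES = 263   # assumed scanlines per frame
VBLANK_LINES = 33   # assumed scanlines outside active display
STALL_LINES = 3     # SATB DMA length quoted above for low-res mode

frame_share, vblank_share = stall_share(STALL_LINES, TOTAL_LINES, VBLANK_LINES)
print(f"{frame_share:.1%} of the frame, {vblank_share:.1%} of vblank")
```

With a taller active display the vblank gets shorter, so the same 3 lines eat a bigger share of it.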
Also, after working on the Genesis and its really nice dynamic sprite linked-list system - one could replicate that on PCE. Using self-modifying code, set up 64 blocks of ST1/ST2 opcodes. The operands of the instructions should have equates, which is no big deal. Each of the 64 ST1/ST2 blocks should end with an RTS (maybe pad with a NOP too, for quick address calculation when other code needs to access them, etc.). Then make a table with the starting address of each local SATB entry's block. Your overhead would be a JSR to a jmp [table,x] routine, where X corresponds to the SATB entry. You can easily use a 64-byte array to simulate the linked list. Instead of doing a TIA of the whole local block, you run through the link array and call the corresponding code block for each entry. ST1/ST2 is faster than TIA to VRAM, so the added overhead of the JSR and jump table is only slight. Plus, you have the added benefit of not stalling the TIMER interrupt, which is always a good thing.
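A rough model of that idea, in Python rather than 6280 assembly, just to show the control flow. Everything here is a hypothetical stand-in: `VramPort` plays the VDC data port with auto-increment, each closure from `make_block` plays one ST1/ST2 block whose operands get patched in place, and `upload` plays the JSR / jump-table walk. Treating a self-link as the terminator is an assumption for the sketch.

```python
WORDS_PER_ENTRY = 4  # one SATB entry is four 16-bit words


class VramPort:
    """Stand-in for the VDC data port: each write lands at the next VRAM word."""

    def __init__(self):
        self.words = []

    def write(self, word):
        self.words.append(word & 0xFFFF)


def make_block(operands):
    """Stand-in for one ST1/ST2 block; `operands` is the patchable part."""
    def block(port):
        for word in operands:
            port.write(word)
    return block


def upload(blocks, links, first, port, max_entries=64):
    """Walk the link array and call each entry's block in link order."""
    index, count = first, 0
    while count < max_entries:
        blocks[index](port)      # the JSR through the jump table
        count += 1
        nxt = links[index]
        if nxt == index:         # assumed terminator: entry links to itself
            break
        index = nxt
    return count


# Four local entries; game code would patch `operands` directly.
operands = [[y, 0x20, 0x100 + y, 0] for y in range(4)]
blocks = [make_block(ops) for ops in operands]
links = [2, 1, 1, 3]             # 0 -> 2 -> 1 (end); entry 3 is unused
port = VramPort()
sent = upload(blocks, links, first=0, port=port)
```

Because the VRAM pointer auto-increments, the entries land in the hardware SATB in link order, so re-prioritizing sprites only means rewriting bytes in `links`.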
There are other ways to simulate a linked-list system, but they're slower: multiple TIAs (the call overhead adds up, and it still stalls the TIMER INT), a manual copy with a loop (pretty slow in ASM, *extremely* slow if done in C with HuC), or just manually re-arranging all the local SATB entries with loop/copy code (the slowest method of them all, and add that on top of HuC for some real piss-poor performance).
If you're not familiar with a sprite linked-list system, here's the rundown: the video processor grabs the first sprite in the table and fetches all its attributes, but instead of going to the next physical entry, it jumps to the entry designated by the link value. It does this until it reaches the max number of sprite entries OR it links to a termination entry (which, IIRC, is an entry that links to itself). You can very easily re-order sprite priorities with this technique. Simple games might not need much re-ordering, but a game with a Z-depth style position system would (SF2, Final Fight, 3/4-view RPGs, even top-down RPGs, etc.). A linked list is extra useful for complex sprite clipping systems (many PCE games do this due to the lack of a second layer for complex clipping).
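Here's the traversal from the rundown above as a tiny sketch. `walk_links` is a hypothetical name, and the stop rule (an entry that links to itself) follows the recollection in the paragraph above, so treat it as an assumption; real hardware may use a different sentinel.

```python
def walk_links(links, first=0, max_entries=64):
    """Return sprite indices in the order the video processor would visit them."""
    order = []
    index = first
    while len(order) < max_entries:   # hard cap on sprites per frame
        order.append(index)
        nxt = links[index]
        if nxt == index:              # assumed terminator: links to itself
            break
        index = nxt
    return order


physical = walk_links([1, 2, 3, 3])   # plain physical order: 0, 1, 2, 3
swapped = walk_links([2, 3, 1, 3])    # same sprites, visited as 0, 2, 1, 3
```

Swapping the priority of sprites 1 and 2 took three byte writes to the link array; none of the attribute data had to move.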