I'd love to dig through your old HuC stuff, Bonknuts. It sounds like you wrote some really useful code.
HuC can do horizontal split-screen so easily, but I wish the screen could split into more than 4 segments, i.e. control individual scanlines, for making wavy raster effects.
Like this?
http://www.pcedev.net/HuC/Chsync_ver_1_1/chsync_ver1_1.zip
As I mentioned before, you need ASM to support it. It's not just that array (more specifically, pointer) access is slow, although it is; it's everything involved in modifying 224 scanlines of data. In that HuC demo, 224 scanlines (doh! I clipped the display to 216 scanlines, so I wasted some CPU cycles generating scanline data that never gets used) carry three effects: BG color #0 and the X and Y scroll registers. I didn't animate BG color #0, so it needs no maintenance in vblank; only the X and Y scroll register arrays do. Even then, I only update the LSB of the scroll registers (you usually don't need more than that for wavy effects or line scrolls).
The loop takes 11,200 cycles. Vblank (with the display at 216 scanlines) is 20,000 cycles, so the loop needs roughly half of vblank. However, if you double buffer or "chase" the beam, you can update during active display without artifacts and give more time back to vblank-sensitive code.
Equivalent C code would be:
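sin1_idx = old_idx1++;   /* starting phase of the X wave, advances each frame */
sin2_idx = old_idx2++;   /* starting phase of the Y wave */
for (j = 1; j < 224; j++)
{
    hs_x_l[j] = sin1[sin1_idx++];       /* X scroll LSB for scanline j */
    hs_y_l[j] = sin1[sin2_idx++] + j;   /* Y scroll LSB; + j offsets for line j */
}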
j is a char, although it wouldn't matter much here if it were an int. The above C code takes 217,729 CPU cycles to complete that loop! 11,200 cycles vs. 217,729 cycles. Hell, a single 1/60 s frame is only ~119,436 cycles, so the C version doesn't even fit in one frame.
Edit: Oops, sorry. I had the master-clock cycle count in there instead of the CPU cycle count for the C code. Fixed. Also fixed the link.