I'm not at a system that I can test this code on, but try this out:
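void clearVram(int cellOffset, int numOfCells)
{
/* vramAddress and VDCcells are global ints */
if (numOfCells <= 0) return; /* nothing to clear; also keeps the ASM counter from wrapping */
vramAddress = cellOffset<<4; /* tile offset -> VRAM word address (16 words per tile) */
VDCcells = numOfCells;
#asm
;// In ASM, global variables are accessed with a preceding underscore.
;// st0/st1/st2 only take immediate operands, so variable values are
;// loaded into A and stored to the VDC data port ($0002/$0003).
;// vdc_reg is the register-select shadow that HuC's stock interrupt
;// handler restores on exit (assumed here); keep it in sync with st0.
stz <vdc_reg
st0 #$00 ;// MAWR: VRAM write address
lda _vramAddress
sta $0002
lda _vramAddress+1
sta $0003
lda #$02
sta <vdc_reg
st0 #$02 ;// VWR: VRAM data write
st1 #$00 ;// low data byte stays $00 for every write
;// 16-bit tile count: A = low byte, X = high byte.
ldx _VDCcells+1
lda _VDCcells
beq .loop ;// low byte is 0: X already counts full 256-tile blocks
inx ;// otherwise add one pass for the partial block
.loop
;// Plane 0 & 1
st2 #$00
st2 #$00
st2 #$00
st2 #$00
st2 #$00
st2 #$00
st2 #$00
st2 #$00
;// Plane 2 & 3
st2 #$00
st2 #$00
st2 #$00
st2 #$00
st2 #$00
st2 #$00
st2 #$00
st2 #$00
sec
sbc #$01
bne .loop
dex
bne .loop
#endasm
}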
It doesn't stall interrupts. It clears VRAM by tile count, starting at a tile offset. VRAM addresses are word addresses running from $0000 to $7FFF, and a tile is 32 bytes (16 words), so in tile offsets the range is $000 to $7FF. Typically, HuC puts tiles at VRAM address $1000, which is tile offset $100. If you want to clear a single sprite cell, that's a group of 4 tiles (a tile is 32 bytes, a 16x16 sprite cell is 128 bytes). For example, to clear a single 16x16 sprite at VRAM address $5000: $5000 / $10 = $500, and a sprite cell is 4 tiles, so clearVram($500, 4). To clear all of VRAM: clearVram($000, $800). Etc.
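If you'd rather not do that arithmetic by hand, here's a rough sketch of two helpers that do it. The names tileOffsetFromVram and tilesForSpriteCells are made up for this example and aren't part of HuC; they just assume the sizes above (16 VRAM words per tile, 4 tiles per 16x16 sprite cell).

```c
/* Hypothetical helpers, not part of HuC: turn a VRAM word address and a
   sprite-cell count into the arguments clearVram() expects. */

/* A tile is 32 bytes = 16 VRAM words, so divide the word address by 16. */
int tileOffsetFromVram(int vramWordAddress)
{
    return vramWordAddress >> 4;
}

/* A 16x16 sprite cell is 128 bytes = 4 tiles. */
int tilesForSpriteCells(int spriteCells)
{
    return spriteCells << 2;
}
```

With those, clearVram(tileOffsetFromVram(0x5000), tilesForSpriteCells(1)) should be the same call as clearVram($500, 4).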
It's pretty rare that you'd want to "clear" a section of VRAM at anything finer than tile granularity (a 32-byte offset/segment). It's also faster this way: 32 bytes are cleared per loop iteration.