Well, the low-res mode has a scanline width of ~341 pixels. That doesn't fit nicely into any of the VDC's pixel register layouts, so that's what HSW is for. HSW is a timeout window: the VDC waits in it for the VCE's hsync, and as soon as that arrives, the VDC moves into the HDS phase (the HSW window ends early). This is how it works with external sync, and vertical sync works the same way.
But what happens if the VCE doesn't send hsync to the VDC while it's waiting for it? HSW times out and the VDC goes into the HDS phase anyway. But whenever the VCE does assert hsync to the VDC, regardless of what phase it's in, the VDC automatically starts processing the next line.
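To make the timeout behaviour concrete, here is a rough toy model in Python. It is only a sketch of the rules described above, not a description of the real chip: the function name `simulate_line`, the tick units, and the phase widths are all made up for illustration, and the phase order is passed in as a parameter because the real order is exactly what's in question.

```python
def simulate_line(order, widths, hsync_tick):
    """Walk one line's phases and report how each one ended.

    order      -- phase names in the order to walk them (an assumption)
    widths     -- dict of phase name -> programmed length in ticks (made up)
    hsync_tick -- tick at which the VCE asserts hsync, or None for never

    Returns a list of (phase, start_tick, end_tick, how_it_ended).
    """
    log = []
    t = 0
    for phase in order:
        end = t + widths[phase]
        if hsync_tick is not None and t <= hsync_tick < end:
            if phase == "HSW":
                # hsync arrived inside the window: HSW ends early.
                log.append((phase, t, hsync_tick, "hsync: HSW ends early"))
                t = hsync_tick
                hsync_tick = None  # this hsync has been consumed
                continue
            # hsync in any other phase cuts the line short.
            log.append((phase, t, hsync_tick, "hsync: line restarts"))
            return log
        # No hsync in this phase: it runs for its full programmed length.
        log.append((phase, t, end, "timeout" if phase == "HSW" else "done"))
        t = end
    return log


if __name__ == "__main__":
    order = ["HSW", "HDS", "HDW", "HDE"]  # register order, purely as an example
    widths = {"HSW": 24, "HDS": 16, "HDW": 256, "HDE": 32}  # hypothetical values
    for tick in (10, None, 100):
        print(tick, simulate_line(order, widths, tick))
```

With these made-up numbers, an hsync at tick 10 shortens HSW, no hsync at all lets HSW time out at 24, and an hsync at tick 100 cuts HDW short. Swapping the `order` list is an easy way to compare the two candidate phase orders.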
So this is what I meant: if HDE is a big part of fetching sprite pixel data, then the VDC line actually starts at HDE, followed by HSW, HDS, and then HDW, in that order, and not HSW, HDS, HDW, and finally HDE. Because if it were the latter, and HDE is part of that sprite pixel fetch process, then my 16x8 sprite cell trick wouldn't work at all. As in, it wouldn't show all 16 cells. But in fact it does. It's hard to make out those sprites, but those are eight 32-wide sprites, just half height. Not to mention other games that do a "4 window mode" via cheat codes (useless, but looks cool).
The 16x8 cell trick is different from the full scanline/sprite line skip method (with similar timings), but I discovered that the VDC preps for sprite lines and BG lines at different parts of the hblank area. If the sprite line prep starts (some internal counter), but the VCE sends hsync to the VDC before it begins BG line prep, then only the sprite line is skipped and the BG is shown normally. This has the effect of sprites showing with every other line missing. It also means that D0 of the Y position selects whether to show the even or odd lines of a sprite; basically, sprite data in VRAM is now interleaved.
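Here's a small Python sketch of what that interleaving means in practice. The helper names `interleave_rows` and `visible_rows` are hypothetical, rows are just placeholder strings rather than real pattern data, and which value of D0 picks the even rows is an assumption on my part (it could just as well be the other way around on hardware).

```python
def interleave_rows(even_image, odd_image):
    """Build one sprite pattern: even rows from even_image, odd rows from odd_image."""
    if len(even_image) != len(odd_image):
        raise ValueError("both images need the same number of rows")
    pattern = []
    for even_row, odd_row in zip(even_image, odd_image):
        pattern.append(even_row)
        pattern.append(odd_row)
    return pattern


def visible_rows(pattern, y_position):
    """Rows that show when every other sprite line is skipped.

    Assumption: D0 of Y == 0 shows the even rows, D0 == 1 the odd rows.
    """
    parity = y_position & 1
    return pattern[parity::2]


if __name__ == "__main__":
    pattern = interleave_rows(["A0", "A1"], ["B0", "B1"])
    print(pattern)                    # rows as stored in the pattern
    print(visible_rows(pattern, 64))  # even Y: image A's rows
    print(visible_rows(pattern, 65))  # odd Y: image B's rows
```

So one full-height pattern holds two half-height images, and moving the sprite by one line in Y flips between them.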
So, this raises the question: where in the VDC scanline does the VDC trigger the interrupt to the CPU? Because it's the VDC that sends the interrupt to the processor. In my example above, you could get two interrupts per scanline. I suspect the processor receives the interrupt when HDE starts. Visually though, the VCE delays the VDC's output (I think by 8 or 16 pixels), so visual tests probably won't work. It needs a dual-channel scope.