Almost all these effects are done just by shoving data into VRAM as fast as possible, for example by using a TIA instruction to transfer data from ROM (or RAM for CD games) to VRAM. The PCE can move a lot of data, so a brute-force approach like this is sufficient most of the time.
In terms of programming there are a lot of optimizations you can do. The big one is storing pre-scrolled copies of the graphics so you aren't wasting time doing costly rotations on the bitplane data every frame. You only need eight copies of your tileset, one per pixel offset within an 8-pixel tile, which isn't too bad (you may need more, depending on how far your image can scroll before repeating).
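To make the pre-scrolling idea concrete, here is a rough sketch of an offline build step that generates the eight shifted copies ahead of time. It's Python rather than 6280 code because it would run on the PC as part of the build, not on the console. It assumes the usual 32-byte background tile layout (planes 0/1 interleaved per row in the first 16 bytes, planes 2/3 in the last 16) and a strip of tiles that wraps around horizontally. The function names are made up for the example, not taken from any real tool.

```python
def plane_offset(row, plane):
    """Byte offset of one bitplane row inside a 32-byte tile (assumed layout)."""
    return (plane // 2) * 16 + row * 2 + (plane % 2)


def rotate_left(line, shift):
    """Rotate one bitplane row (one byte per tile) left by `shift` pixels, wrapping."""
    width = len(line) * 8
    value = int.from_bytes(line, "big")
    shift %= width
    mask = (1 << width) - 1
    rotated = ((value << shift) | (value >> (width - shift))) & mask
    return rotated.to_bytes(len(line), "big")


def prescroll_strip(tiles, shift):
    """Return a copy of a horizontal strip of tiles scrolled left by `shift` pixels."""
    out = [bytearray(32) for _ in tiles]
    for row in range(8):
        for plane in range(4):
            off = plane_offset(row, plane)
            # Gather this plane/row across the whole strip, rotate it as one unit.
            line = bytes(t[off] for t in tiles)
            for dest, value in zip(out, rotate_left(line, shift)):
                dest[off] = value
    return [bytes(t) for t in out]


def build_prescrolled(tiles):
    """Eight copies of the strip, one per pixel offset 0..7."""
    return [prescroll_strip(tiles, s) for s in range(8)]


if __name__ == "__main__":
    # Hypothetical two-tile strip with a single lit pixel at the far left.
    lit = bytearray(32)
    lit[plane_offset(0, 0)] = 0x80
    copies = build_prescrolled([bytes(lit), bytes(32)])
    print(len(copies), "pre-scrolled copies generated")
```

At runtime you'd then just pick the copy for `scroll_x & 7` and transfer it, with no shifting on the console at all.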
When I developed the "canyon demo" with Fragmare, I had to push a lot of data around at 60 FPS. Some of the techniques that helped included:
- Storing pre-scrolled copies of data in VRAM (not ROM/RAM), and using VRAM to VRAM DMA (faster than CPU transfer) to copy the scrolled data to a 'work buffer', which was the set of tiles that was actually displayed on the screen. It wastes some VRAM but is incredibly fast, and that's important for games where your VBlank cycles are precious.
- Making the copies twice as tall so I could start a copy at any row and let it 'wrap' without having to break it into two smaller copy operations. This was for vertical scrolling but the same concept applies horizontally. There are wait states from accessing the VDC and VCE, so accessing them as infrequently as possible really helps (another reason to use DMA over CPU transfers too).
- Similarly, arranging BAT tile numbers so data could be loaded sequentially in long bunches instead of having to do many small transfers.
- Switching to 512-pixel mode during VBlank, which runs the VDC twice as fast as 256-pixel mode and doubles the VRAM to VRAM DMA transfer rate.
- Packing graphics data into 2 bits so one set of tiles could store two different images, in half the space. This saved a tremendous amount of time and was really instrumental to doing full-screen column scrolling. You can do similar things with four 1-bit tiles in a single 4-bit tile. When you have a good artist (I did!) the color limitations are not a big deal and the end result still looks vivid and has that 16-bit feel.
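As a rough illustration of the 2-bit packing in the last item, here is one way it could work. This is a sketch under assumptions, not necessarily what the canyon demo did. It assumes the usual 32-byte tile layout (planes 0/1 in the first 16 bytes, planes 2/3 in the last 16), puts image A in the low two planes and image B in the high two, and picks which image is visible by loading a 16-entry palette that ignores the other image's bits. All names are hypothetical.

```python
def pack_two_2bpp(tile_a, tile_b):
    """Combine two 16-byte 2-bit tiles into one 32-byte 4-bit tile.

    Image A lands in planes 0/1, image B in planes 2/3 (assumed layout).
    """
    if len(tile_a) != 16 or len(tile_b) != 16:
        raise ValueError("each 2-bit tile must be 16 bytes")
    return bytes(tile_a) + bytes(tile_b)


def palette_for_image(colors, which):
    """Build a 16-entry palette that shows only one of the packed images.

    colors: the 4 colors of the image to show.
    which:  0 selects image A (low 2 bits), 1 selects image B (high 2 bits).
    """
    if which == 0:
        return [colors[i & 3] for i in range(16)]
    return [colors[i >> 2] for i in range(16)]


def pixel(tile, x, y):
    """Decode the 4-bit pixel value at (x, y) from a packed 32-byte tile."""
    value = 0
    for plane in range(4):
        byte = tile[(plane // 2) * 16 + y * 2 + (plane % 2)]
        value |= ((byte >> (7 - x)) & 1) << plane
    return value


if __name__ == "__main__":
    # Hypothetical data: image A has color 3 at the top-left pixel, image B has color 1.
    tile_a = bytes([0x80, 0x80] + [0] * 14)
    tile_b = bytes([0x80, 0x00] + [0] * 14)
    packed = pack_two_2bpp(tile_a, tile_b)
    index = pixel(packed, 0, 0)
    print("A shows", palette_for_image(["a0", "a1", "a2", "a3"], 0)[index])
    print("B shows", palette_for_image(["b0", "b1", "b2", "b3"], 1)[index])
```

The point is that one tile transfer updates both images at once, and switching between them is just a palette choice rather than another VRAM upload.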
On this subject, I think the one game that is a tad exceptional is Ninja Gaiden, which has terrible, choppy, crappy parallax scrolling, but it does it by updating the BAT frequently rather than the tile data in VRAM, which is unique. Kind of the wrong way to go about it... but it's sort of charming in its own weird way. So you can update either the BAT or the tile data in VRAM.
For the Bonk transparency demo I even used dynamic tiles for sprites, instead of the background. So these techniques can be applied in a lot of ways. But in the end it's always just transferring data to VRAM as fast as possible.
Download 2 looks pretty cool BTW, I hadn't seen it before. Nice use of the hardware.