Author Topic: Hardware tips and tricks  (Read 648 times)

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Hardware tips and tricks
« on: December 30, 2013, 09:49:45 AM »
I have a decent amount of tricks for the PCE, some known - some not.

 I figured maybe I should share some with the community.

 Here's one for a transparency effect on the PCE. Pretty easy. It's similar to what the Sonic games do on the Genesis. They flash interleaving sprites to hide the garbage pixels on that scanline when they update color ram. The PCE also gets garbage pixels, but they are in the form of short stretched pixel lines going across the screen. There's a way to hide this transition.

 The method works as such:

Say you have a water transition line. For the sake of setup, let's say that water below the line is going to be a blue hue-ish color/transparency. The first thing you do, is make sprite color #0 a nice blue color, relative to this under water scheme. Next, you reserve a single row in the tilemap with solid color tiles. These tiles are to make a solid line, and the same color as you did for sprite color #0. Now, for the transition line - you do an hsync interrupt, reposition the tilemap to that solid color area, then disable sprites. That'll make sure that nothing but this line will show.

 Now comes the tricky part. There are two ways to update CRAM (VCE). One is to use TIA to block copy the new colors. This will delay the TIMER interrupt, if you're playing samples. I doubt a delay this short will even be audible, but if you're that picky, you can use the alternate method: you reserve a chunk of ram and set up a string of code that repeats LDA #$nn, STA port, with an RTS at the end. It's slightly slower (14 cycles to copy a WORD vs 12 cycles for TIA). Ok, so it's not actually that tricky ;)
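A rough sketch of the alternate method, written in Python rather than assembly: it generates the byte string you would park in ram and JSR to. The opcode values are the standard 65C02 encodings for LDA immediate, STA absolute and RTS. The VCE data port addresses ($0404/$0405) and the helper name build_cram_writer are assumptions for illustration, and the routine assumes the VCE color index was already set before the call.

```python
LDA_IMM = 0xA9   # LDA #$nn
STA_ABS = 0x8D   # STA $nnnn
RTS = 0x60

# Assumed VCE color data port (low/high byte); adjust for your memory map.
VCE_DATA_LO = 0x0404
VCE_DATA_HI = 0x0405


def build_cram_writer(colors):
    """Return machine code that writes each 9-bit color word to the VCE."""
    code = bytearray()
    for color in colors:
        pairs = ((color & 0xFF, VCE_DATA_LO), ((color >> 8) & 0xFF, VCE_DATA_HI))
        for value, port in pairs:
            # LDA #value / STA port (address stored little-endian)
            code += bytes([LDA_IMM, value, STA_ABS, port & 0xFF, port >> 8])
    code.append(RTS)
    return bytes(code)


routine = build_cram_writer([0x1FF, 0x007])
print(len(routine), routine.hex())
```

To change a color later you only patch the immediate bytes inside the string, so the per-frame cost is just the patching plus the call.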

 You can easily fit ~64 color updates (4 subpalettes for sprites) in this time. Right at the end, you re-enable sprites and reposition the tilemap. All of this needs to be done fairly quickly, so do all of your calculations beforehand.
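To sanity-check the "~64 colors" figure, here is the arithmetic using the per-word costs quoted above. The 455 CPU cycles per scanline is an assumption on my part (roughly 7.16 MHz divided by a ~15.7 kHz line rate), the 17 cycle TIA setup cost is the figure mentioned later in this thread, and interrupt entry plus the scroll/sprite register writes are ignored, so treat it as a ballpark only.

```python
CYCLES_PER_SCANLINE = 455   # assumption: ~7.16 MHz CPU / ~15.7 kHz line rate
SCANLINES_USED = 2          # the trick spends two scanlines on the transition
COLORS = 64                 # 4 sprite subpalettes x 16 entries

code_string_cycles = COLORS * 14    # LDA/STA string: 14 cycles per word
tia_cycles = 17 + COLORS * 12       # TIA: setup cost + 12 cycles per word
budget = CYCLES_PER_SCANLINE * SCANLINES_USED

print("budget:", budget)
print("code string:", code_string_cycles, "fits" if code_string_cycles <= budget else "too slow")
print("TIA:", tia_cycles, "fits" if tia_cycles <= budget else "too slow")
```

Under those assumptions both methods fit, but the code string leaves very little slack, which is why everything else has to be precalculated.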

 What you don't need to do is update ~any~ colors in the BG subpalettes. The PCE has plenty of those and you can easily make duplicates with the alternate coloring for them. You don't need two tilemaps either. You just need a redundant copy of the tile row (of the tilemap) that the transition line is on - with the alternate water color. Just ~that~ row. Everything below that should have already been changed from the previous vblank call in preparation for this, if needed (i.e. your water line moves up and down on the screen).

 The reason why sprite color #0 needs to be the same color as the BG transition line is that updating color ram (VCE) stretches pixels. Since they are all the same color, you can't see this pixel stretching and all you see is the solid color line. The border outside of the BG area (including left/right sides) is filled with sprite color #0. You don't need to update this every frame, unless you're doing color cycling on the transparency (going through hues). And you don't need to do this during hsync; doing so during the previous vblank is good enough. That leaves one less color to update for the sprite palette anyway, during the active screen update. I hope that makes sense. Sure, you only have 4 sprite subpalettes for this design (for the main character and any enemies that transition this line). Of course, any objects or enemies that remain above or below this line don't need to fall into this 4 subpalette group; they can remain fixed in their subpalette. This trick uses two scanlines total, so two solid lines appear as the transition.


Possibly make this a sticky?
« Last Edit: December 30, 2013, 09:55:29 AM by Bonknuts »

fragmare

  • Hero Member
  • *****
  • Posts: 676
Re: Hardware tips and tricks
« Reply #1 on: December 30, 2013, 11:19:22 AM »
I'll contribute one.

Since the PCE's background tiles have a planar 4-bit per pixel (16 color) setup, you can instead draw a tile using only 2-bits per pixel (4 colors) and effectively store 2 2bpp tiles in the memory space 1 4bpp tile would normally take up.  Similarly, you can also fit four 1-bit per pixel (2 color) tiles in the same way.
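A small sketch of why this works, assuming the usual PCE BG tile layout: 32 bytes per tile, where the first 16 bytes interleave bitplanes 0 and 1 row by row and the last 16 bytes interleave bitplanes 2 and 3. Under that assumption a 4bpp tile is literally two 2bpp tiles back to back. The function names are mine, for illustration.

```python
def encode_2bpp(pixels):
    """8 rows of 8 values in 0..3 -> 16 bytes (low plane, high plane per row)."""
    out = bytearray()
    for row in pixels:
        p0 = p1 = 0
        for x, v in enumerate(row):
            bit = 7 - x                      # leftmost pixel is the top bit
            p0 |= (v & 1) << bit
            p1 |= ((v >> 1) & 1) << bit
        out += bytes([p0, p1])
    return bytes(out)


def decode_4bpp(tile):
    """32-byte tile -> 8 rows of 8 color indices in 0..15."""
    rows = []
    for y in range(8):
        planes = (tile[2 * y], tile[2 * y + 1],
                  tile[16 + 2 * y], tile[16 + 2 * y + 1])
        rows.append([
            sum(((planes[p] >> (7 - x)) & 1) << p for p in range(4))
            for x in range(8)
        ])
    return rows


tile_a = [[(x + y) & 3 for x in range(8)] for y in range(8)]
tile_b = [[(x ^ y) & 3 for x in range(8)] for y in range(8)]

# Two 2bpp tiles share the VRAM footprint of one 4bpp tile.
combined = encode_2bpp(tile_a) + encode_2bpp(tile_b)
indices = decode_4bpp(combined)
print(len(combined), indices[0])
```

Each displayed color index ends up as tile A's pixel in bits 0-1 and tile B's pixel in bits 2-3; which of the two you actually see is then decided by the subpalette.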

This comes in especially handy when animating tiles for parallax effects and cutting down on VRAM usage.  In the GIF below, this entire background plane is really scrolling at the same speed.  The "hotter" tiles appear to be scrolling slower and also wiggling because they're using the aforementioned trick.



Here is a demo that Charles MacDonald and I worked on showing the level 3 column scrolling effect from MUSHA on the Genesis, as done on the PC-Engine.  Use Mednafen or a flash card on real hardware to view it properly.  Ootake doesn't seem to like it.

http://fragmare.mindrec.com/canyon8.pce

Punch

  • Hero Member
  • *****
  • Posts: 3278
Re: Hardware tips and tricks
« Reply #2 on: December 30, 2013, 11:43:46 AM »
I don't understand, the displayed tiles on some portions of the screen are displaced in VRAM by writing 2~1 bpp tiles in VRAM (reducing PPU writing time) or it has something to do with palette swap? I'm really confused.

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Hardware tips and tricks
« Reply #3 on: December 30, 2013, 12:12:52 PM »
Quote
I don't understand, the displayed tiles on some portions of the screen are displaced in VRAM by writing 2~1 bpp tiles in VRAM (reducing PPU writing time) or it has something to do with palette swap? I'm really confused.

 It saves on cart space (or CD ~RAM~ if a cd project), because the tiles only need to be stored in 2bpp format. It saves on vram writing time, since you only need to write 2bpp and not the full 4bpp - cutting writing time/bytes in half. It saves on vram, because PCE 4bpp tiles are actually a composite group of two 2bpp tiles (that's because the VDC can be used to display just 2bpp tiles, like the SNES). Saving on vram requires the subpalette to be set up with a special ordering of colors. You will only get 8 colors total out of the 16 - four for each 2bpp tile - but all 16 entries in the subpalette are needed. You need one subpalette if the dynamic tiles will 'animate' behind the second one - allowing transitions that are not straight edged (the 2nd stage of Ninja Spirit does this, to get the pseudo BG layer to animate behind the leaves of the tree). The other benefit, if you don't want to do the Ninja Spirit method, requires two subpalettes - so that you can use each 2bpp 'layer' as a specific tile (you won't see the other tile's colors because of how the subpalette colors are ordered). That's the method Frag is talking about. It allows you to treat each composite tile as two different ones, thus allowing more vram tiles to 'animate'. All the while the VDC is still in 4bpp mode. And you can still have other parts of the BG (alongside either one of these methods) using normal 4bpp tiles.
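The "special order of colors" can be sketched like this. I'm assuming the composite color index carries one 2bpp tile in bits 0-1 and the other in bits 2-3; the color names are placeholders, and which pair of planes acts as the front layer in the Ninja Spirit style setup is my assumption, not something confirmed from the game.

```python
def palette_showing_low(colors):
    """16-entry subpalette that displays only the tile in bits 0-1."""
    return [colors[i & 3] for i in range(16)]


def palette_showing_high(colors):
    """16-entry subpalette that displays only the tile in bits 2-3."""
    return [colors[i >> 2] for i in range(16)]


def palette_layered(front, back):
    """Single subpalette: bits 2-3 tile in front, its color 0 is see-through."""
    return [front[i >> 2] if (i >> 2) else back[i & 3] for i in range(16)]


tile_a_colors = ["a0", "a1", "a2", "a3"]
tile_b_colors = ["b0", "b1", "b2", "b3"]

pal_a = palette_showing_low(tile_a_colors)
pal_b = palette_showing_high(tile_b_colors)
pal_layered = palette_layered(front=tile_b_colors, back=tile_a_colors)

index = 2 | (1 << 2)     # tile A pixel = 2, tile B pixel = 1
print(pal_a[index], pal_b[index], pal_layered[index])
```

With the two-subpalette setup the tilemap entry picks pal_a or pal_b, so the same VRAM tile shows up as two different tiles. With the layered subpalette, the front tile hides the back one everywhere except where the front pixel is 0.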

 You can do 1bpp stuff as well (mixing, in fact) by using a similar subpalette setup. 1bpp also saves on cart space and vram writes - because of an exploit where you write once to the LSB of the vram port, and everything else to the MSB port. This doesn't save vram usage (over using 2bpp tiles), but it does save on cart and vram write bandwidth. You can mix and match both methods for 1bpp+2bpp, as I described above.

 The benefit of all of this is that you still have 16 subpalettes to spread some color around for the BG areas. So you can hide the color limitations of this trick fairly well. And other parts of the BG can still use the full/normal 4bpp tiles.
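A toy model of the LSB/MSB exploit, assuming the VDC data port behaves as described: the low byte is only latched, and the write to the high byte commits the whole word and advances the write address by one. The class and function names are hypothetical, and the example clears planes 2-3 as well, which you may not need in a mixed setup.

```python
class VdcDataPort:
    """Very small stand-in for the VDC VRAM write port."""

    def __init__(self, size=0x8000):
        self.vram = [0] * size
        self.addr = 0
        self.latch = 0
        self.writes = 0          # number of CPU byte writes issued

    def write_lo(self, value):
        self.latch = value & 0xFF            # latched, nothing committed yet
        self.writes += 1

    def write_hi(self, value):
        self.vram[self.addr] = ((value & 0xFF) << 8) | self.latch
        self.addr += 1                       # assumed auto-increment of 1
        self.writes += 1


def upload_1bpp_tile(port, rows):
    """rows: 8 bytes of 1bpp data, stored into bitplane 1 of a 4bpp tile."""
    port.write_lo(0x00)                      # plane 0 (and 2) stay empty
    for row in rows:
        port.write_hi(row)                   # words 0-7: planes 0/1
    for _ in range(8):
        port.write_hi(0x00)                  # words 8-15: planes 2/3


port = VdcDataPort()
glyph = [0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x00]
upload_1bpp_tile(port, glyph)
print(port.writes, [hex(w) for w in port.vram[:8]])
```

That is 17 byte writes instead of 32 for a full tile, and only 8 bytes of source data on the cart.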
« Last Edit: December 30, 2013, 12:17:50 PM by Bonknuts »

Sadler

  • Hero Member
  • *****
  • Posts: 1065
Re: Hardware tips and tricks
« Reply #4 on: December 30, 2013, 01:24:26 PM »
Awesome thread, I really enjoy reading this kind of thing. Any tips for varying the background color per scan line for another layer? I'm pretty sure this is how MC does the farthest layer in stage one.

Probably obvious, but horizontal slabs can be used for another layer and can be done in huc. See Martial Champion.

I'd love to hear about more advanced techniques. Swapping palettes for transparency is a great example. How about efficient line drawing algorithms without direct pixel access? Or scaling/rotation? Am I too dumb to realize how LUTs might work for the PCE?

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Hardware tips and tricks
« Reply #5 on: December 30, 2013, 02:16:57 PM »
Quote
Awesome thread, I really enjoy reading this kind of thing. Any tips for varying the background color per scan line for another layer? I'm pretty sure this is how MC does the farthest layer in stage one.

 I'm pretty sure MC (in that video, first stage), is just swapping the BG position to a different point in the tilemap (unseen area). Possibly with sync effects to do some 'linescrolling' here and there.

 I think the effect you're talking about goes like this:
BG Color #0 -> sprite low priority -> BG tilemap -> Sprite high priority
That's the display order from back to front. Since BG color #0 is its own layer, if you will, you can modify it with an Hsync interrupt (just modify BG color #0). This will allow you to draw simple shapes, and you can 'scroll' those shapes independently of the tilemap. It's commonly used to make fixed gradients or bars. But if you sprinkle a few low priority sprites along with it, you can connect those 'bars' and make grid-looking BGs and such.
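As a sketch of the table an Hsync handler could read from: one BG color #0 value per scanline, plus a scroll offset so the 'shape' moves independently of the tilemap. The 9-bit packing (green in bits 8-6, red in bits 5-3, blue in bits 2-0) is what I understand the VCE color word to be; the 224 line count and the function names are assumptions.

```python
VISIBLE_LINES = 224      # assumption: visible height of the display


def pce_color(r, g, b):
    """Pack 3-bit R, G, B into a 9-bit VCE color word (G R B order)."""
    return ((g & 7) << 6) | ((r & 7) << 3) | (b & 7)


def build_gradient(lines=VISIBLE_LINES):
    """Black at the top fading to full blue at the bottom."""
    return [pce_color(0, 0, line * 8 // lines) for line in range(lines)]


def color_for_line(table, line, scroll=0):
    """What the Hsync handler would write to BG color #0 on this line."""
    return table[(line + scroll) % len(table)]


table = build_gradient()
print(hex(color_for_line(table, 0)), hex(color_for_line(table, 223)))
print(hex(color_for_line(table, 0, scroll=100)))
```

Changing scroll each frame moves the gradient or bars vertically without touching the BG scroll registers.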


Quote
How about efficient line drawing algorithms without direct pixel access?
That's tricky ;)

Sadler

  • Hero Member
  • *****
  • Posts: 1065
Re: Hardware tips and tricks
« Reply #6 on: December 30, 2013, 02:33:32 PM »
Quote
Quote
Awesome thread, I really enjoy reading this kind of thing. Any tips for varying the background color per scan line for another layer? I'm pretty sure this is how MC does the farthest layer in stage one.

 I'm pretty sure MC (in that video, first stage), is just swapping the BG position to a different point in the tilemap (unseen area). Possibly with sync effects to do some 'linescrolling' here and there.

 I think the effect you're talking about goes like this:
BG Color #0 -> sprite low priority -> BG tilemap -> Sprite high priority
That's the display from back to front. Since BG color #0 is its own layer, if you will, you can modify it with an Hsync interrupt (just modify BG color #0). This will allow you to draw simple shapes, but you can 'scroll' those shapes independently of the tilemap. The common effect used to make fixed gradients or bars. But if you sprinkle a few low priority sprites along with it, you can connect those 'bars' and make a grid looking BG's and such.

Just realized the ambiguity. :D Magical Chase does 3 layers in the first level and I'm pretty sure it's using the background color for the back layer. There's definitely some line scrolling going on there as well, including the Martial Champion style slab.

Quote
Quote
How about efficient line drawing algorithms without direct pixel access?
That's tricky ;)

I can't tell if that means "Sadler, you're retarded" or "I've got a few tricks up my sleeve". :) Either way, I'd love to read your thoughts.

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Hardware tips and tricks
« Reply #7 on: December 30, 2013, 03:13:10 PM »
Quote
Just realized the ambiguity. :D Magical Chase does 3 layers in the first level and I'm pretty sure it's using the background color for the back layer. There's definitely some line scrolling going on there as well, including the Martial Champion style slab.

 Ohh, Magical Chase. Yeah, that's the exact effect.

Quote
I can't tell if that means "Sadler, you're retarded" or "I've got a few tricks up my sleeve". :) Either way, I'd love to read your thoughts.

 "I've got a few tricks up my sleeve" ;) The trick is fairly new, so I don't want to spill the beans just yet.

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Hardware tips and tricks
« Reply #8 on: December 30, 2013, 09:52:06 PM »
Excellent thread, thanks for sharing.

I have one, but it's not a hardware trick.
When I wrote my SGX functions, especially for sprites, I was amazed by how many cycles a SATB copy to vram can take. The copy in vram is itself just a SATB; the true SAT is inside the VDC.
The standard procedure is to keep a SAT buffer in ram that is copied to VRAM each frame, and the VRAM SATB is in turn copied to the VDC's SAT each frame.

This process occurs every frame, even when no sprites need updating, and consumes at best 3584+ cycles on the PCE and 7000+ on the SGX.
Those cycle counts of course assume a TIA transfer; the HuC satb_update routine is slower.

My point is to write the sprite attributes directly to VRAM, and only update what has changed.
For sprites to update properly you need to disable the auto SAT update in each VDC, and launch it manually before a vsync.
This is what i did for my SGX shoot, and it works perfectly .
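A minimal sketch of the "only update what has changed" idea, assuming a SATB of 64 entries with 4 words each (Y, X, pattern, attributes) and a SATB base address of $7F00 in VRAM; both the base address and the helper name are assumptions. It just lists the word writes a frame would need.

```python
SATB_VRAM_BASE = 0x7F00      # assumption: where the SATB lives in VRAM
WORDS_PER_SPRITE = 4         # Y, X, pattern, attributes
SPRITES = 64


def satb_diff(previous, current):
    """Return (vram_address, word) pairs for words that changed."""
    return [
        (SATB_VRAM_BASE + i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


previous = [0] * (SPRITES * WORDS_PER_SPRITE)
current = list(previous)
current[3 * WORDS_PER_SPRITE + 1] = 100      # sprite 3 moved in X only

writes = satb_diff(previous, current)
full_copy_cycles = SPRITES * WORDS_PER_SPRITE * 2 * 7   # 512 bytes at 7 cycles

print(writes, "vs full copy:", full_copy_cycles, "cycles")
```

Keep in mind each non-contiguous write also needs the VRAM write address set first, so this pays off most when few sprites change per frame.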

ccovell

  • Hero Member
  • *****
  • Posts: 2245
Re: Hardware tips and tricks
« Reply #9 on: December 30, 2013, 11:32:32 PM »
Quote
This cycles are of course if you use a tia transfert, the huc satb_update routine is more slower .

Kinda related to this: you can always split up sprite update or VRAM-writing code between successive VBlanks if your game isn't a fast action game.

Also kinda related: in MagicKit (dunno about HuC) Hsync is set on by default and some program frameworks trigger an HSync each scanline.  If you turn Hsync routines off (and the Hsync flag in the VDC) you can gain much more CPU time for your own program during screen redraw.

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Hardware tips and tricks
« Reply #10 on: December 31, 2013, 05:05:27 AM »
In relation to what touko said, I've set up SATB in ram in a special way. After coding for the Genesis for a bit, I really liked the link-list feature it has for sprites. So I tried to emulate something along those lines on the PCE.

 I have all 64 SATB entries in local ram. Every entry is actually embedded opcodes: st1/st2. At the end of each single SATB entry is an RTS. I have a simple byte table that I use to reorder individual SATB entries, and a jump table. You have a little added overhead from the jump table and rts, but you also have a faster transfer to vram than TIA (TIA is 7 cycles per byte while the STx opcode is 5 cycles) and you don't have the 17 cycle call overhead of the Txx opcode. So basically, the value in the order table (assuming it's not a termination/skip code) is used as the index into the jump table.

 I also speed this up by having a code list table that I jump to, to clear the rest of the sprites. But I don't update every entry in the SATB in vram to clear them. I just update the Y position; that removes them from the screen and doesn't affect the scanline pixel limit. It's quick and saves cycles. So the embedded opcodes in local ram are fast for transfers, and only updating the Y position is fast for removing sprites. Plus, reordering sprites becomes fairly quick. The downside is the little over double SATB space needed in local ram, and of course the added complexity of accessing it in local ram (due to the st- opcodes and rts).
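Here is a rough Python model of that layout, not the actual code. Each SATB entry is stored as ST1 #lo / ST2 #hi pairs followed by an RTS (opcodes $13, $23 and $60), which gives 17 bytes per 8-byte entry. The END marker value, the helper names, and the tiny interpreter standing in for the CPU are all assumptions for illustration.

```python
ST1, ST2, RTS = 0x13, 0x23, 0x60
END = 0xFF                   # hypothetical termination code in the order table


def build_entry(y, x, pattern, attr):
    """One SATB entry as executable code: 4 x (ST1 #lo, ST2 #hi) + RTS."""
    code = bytearray()
    for word in (y, x, pattern, attr):
        code += bytes([ST1, word & 0xFF, ST2, (word >> 8) & 0xFF])
    code.append(RTS)
    return code


def set_y(entry, y):
    """Patch only the Y immediates, e.g. to move a sprite off-screen."""
    entry[1] = y & 0xFF
    entry[3] = (y >> 8) & 0xFF


def run_entry(entry, vram_out):
    """Stand-in for the CPU executing an entry; ST2 commits a word."""
    pc, low = 0, 0
    while entry[pc] != RTS:
        opcode, value = entry[pc], entry[pc + 1]
        if opcode == ST1:
            low = value
        elif opcode == ST2:
            vram_out.append((value << 8) | low)
        pc += 2


def flush(entries, order, vram_out):
    """Walk the order table; each value indexes the jump table of entries."""
    for index in order:
        if index == END:
            break
        run_entry(entries[index], vram_out)


entries = [build_entry(64 + i, 32 + i, 0x100 + i, 0x80) for i in range(4)]
set_y(entries[1], 0)         # assumed off-screen Y value
out = []
flush(entries, [2, 0, 1, END, 3], out)
print(len(entries[0]), [hex(w) for w in out])
```

Reordering sprites is then just rewriting a few bytes in the order table; the entries themselves never move.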

Quote
This process occur every frame, even sprites no need to be updated,and consume at best 3584+ cycles for PCE and 7000+ for SGX ..
This cycles are of course if you use a tia transfert, the huc satb_update routine is more slower .

 Yeah, they should have added a DMA on the VPC. /RDY is there on the board, to stall the processor. Shame really, considering the VPC could easily read two bytes from ram and write to either VDC port in about 5 cycles (its own cycles, dictated by the VCE). That would be twice as fast as TIA.
« Last Edit: December 31, 2013, 05:11:41 AM by Bonknuts »

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Hardware tips and tricks
« Reply #11 on: December 31, 2013, 05:56:00 AM »
Yes, a RAM -> VRAM DMA would be better, but writing directly to vram is easy and fast.
You can use the classic SATB in ram with its 64 entries and assign sprites to the desired SGX VDC.

fragmare

  • Hero Member
  • *****
  • Posts: 676
Re: Hardware tips and tricks
« Reply #12 on: January 25, 2014, 01:13:26 AM »
I really can't believe this thread has not been stickied yet.  Joe?  nat?  Keranu?

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Hardware tips and tricks
« Reply #13 on: January 25, 2014, 04:25:22 AM »
I'm curious as to how many of you guys set up your homebrew projects to run game logic from vblank. By that I mean, the game logic part gets called from a vblank hook, rather than just waiting for a vblank flag. Some games put the game logic call in vblank specifically so you can run another processing.. 'thread', for lack of a better word. Maybe 'process' is a better term. I.e. you can call a background process to build some sprites (composite them or scale them, etc) while the main code continues on without any impact (since it has priority over the background process). I've seen this done on NES games, as well as PCE games. Probably not uncommon for Genesis and SNES as well. The background process can do anything you want: refresh vram tiles/sprites, decompress data, composite/build animation frames, etc. The key is just knowing how long the background process is gonna take, so you can call it ahead of time.

 One example is Gate of Thunder. It calls a sprite decompression background process while you're playing the level, in preparation for upcoming enemies in the level. It doesn't cause any slowdown because it's a background process. It wouldn't be too hard to extend this idea to more than just one background process. Though you'd probably need something like the TIMER IRQ to time slice between different background processes.
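A loose Python model of the idea, with a generator standing in for the background process. On real hardware the vblank interrupt preempts the background code; here that is only approximated cooperatively by letting the background job run a fixed number of work units in whatever is left of each frame. All names and the 'decompression' itself are placeholders.

```python
def decompress_sprites(chunks):
    """Background process: does one small unit of work per step."""
    done = []
    for chunk in chunks:
        done.append(chunk.upper())       # stand-in for real decompression
        yield done


def run_frames(frames, background, units_per_frame):
    """Game logic runs first every frame; leftover time goes to background."""
    log = []
    result = None
    for frame in range(frames):
        log.append(f"logic {frame}")     # what the vblank hook would call
        for _ in range(units_per_frame):
            try:
                result = next(background)
            except StopIteration:
                break                    # background job already finished
    return log, result


job = decompress_sprites(["a", "b", "c", "d", "e"])
log, sprites = run_frames(frames=3, background=job, units_per_frame=2)
print(log, sprites)
```

The game logic never waits on the job; you just have to start the job enough frames ahead that its result is ready when the enemies show up.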

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Hardware tips and tricks
« Reply #14 on: January 25, 2014, 05:04:45 AM »
I'm classic: score, sound, HUD update, etc. for me.
Game logic is out of vblank.
« Last Edit: January 25, 2014, 05:06:20 AM by touko »