Author Topic: Faster fade out code?  (Read 3302 times)

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: Faster fade out code?
« Reply #45 on: December 01, 2016, 12:58:45 PM »
I'm thinking that this sort of stuff is so commonly needed, that it really should be built into the HuC library.

Fading down is easy ... but the fun comes when you're fading back in.  :wink:

There you need to know what the desired palette is, and have that stored in memory somewhere.

And you need the buffer where you're going to calculate what the next set of colors is that you're going to send to the VDC.

Has anyone managed to avoid needing *both* of these "target" and "current" buffers in memory at once?

The classic "cheap" fade-down looks OK, but the same thing run in reverse (increment color component if not at target) has the effect of greying everything out, and then having the stronger colors appear later on.

It's not horrible, and I've done that before, and I believe that Arkhan does it, too.

A more sophisticated fade is nearly 3 times slower ... but that's still fast enough to calculate all 512 colors in only 2/3 of a 60Hz frame, and nobody runs a fade at 60-steps-per-second.

Does anyone have any opinion about including a decent fade in HuC?

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Faster fade out code?
« Reply #46 on: December 01, 2016, 01:42:42 PM »
So.. what are the dynamics at play in this design? Speed and memory size? Is the routine going to be an automatic thing? As in, it only takes a rate argument for fade in/out? If so, does it have complete control of the main code until it's finished? Or is it a lighter process, that only does up to 64 sets of colors per call and is divided into prep and update functions, allowing game code to run at the same time (or at least same frame)? How much memory are you going to require (important for HuCard projects)? Is the work buffer user defined and passed along as a pointer (so it can be reused for something else in the project)? Or is it an internal static defined size, that takes away from RAM regardless?

I never really liked this trying-to-make-one-size-fit-all thing when designing libs/stuff for HuC. It'd be nice if it was something they directly included into the main source file (different small libs), rather than trying to attach it to the existing library (in startup). Though I think doing that would require restructuring the main lib bank, and having support for a bank directive directly in HuC.

DarkKobold

  • Hero Member
  • *****
  • Posts: 1200
Re: Faster fade out code?
« Reply #47 on: December 01, 2016, 02:14:08 PM »
As an update, of course my code didn't work (for the reasons Bonknuts illustrated). His did. No surprise there.

I don't think this needs to be in the core of HuC. It would serve better as example code, which someone could work to their needs.

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: Faster fade out code?
« Reply #48 on: December 01, 2016, 02:34:59 PM »
Quote
So.. what are the dynamics at play in this design? Speed and memory size? Is the routine going to be an automatic thing? As in, it only takes a rate argument for fade in/out? If so, does it have complete control of the main code until it's finished? Or is it a lighter process, that only does up to 64 sets of colors per call and is divided into prep and update functions, allowing game code to run at the same time (or at least same frame)? How much memory are you going to require (important for HuCard projects)? Is the work buffer user defined and passed along as a pointer (so it can be reused for something else in the project)? Or is it an internal static defined size, that takes away from RAM regardless?

Good questions!  :-k

Taking control of the system would be impolite.

The goal would be to provide function calls that the HuC user can use to provide fast alternatives to writing their own code.

For example ...

void __fastcall get_colors( int *pbuffer<__td> );
void __fastcall get_colors( int index<color_reg>, int *pbuffer<__td>, unsigned char count<__tl> );

void __fastcall set_colors( int *pbuffer<__ts> );
void __fastcall set_colors( int index<color_reg>, int *pbuffer<__ts>, unsigned char count<__tl> );

void __fastcall fade_colors( int *psource<__si>, int *pdestination<__di>, unsigned char count<__al>, unsigned char fade<acc> );

Those make up a simple set of functions that do everything that DK wanted in Catastrophy, and ended up writing in either slow C code, or fast inline-assembly.

They do it fast, and they keep things flexible enough that you can use as-much or as-little resources as you need.

The "get" and "set" functions use TAI & TIA instructions for fast processing.

"count" is limited to a maximum of 128 for fast indexing

"fade" is a value 0-7.

I *think* that's enough basic functionality for the end-user to build pretty much whatever they want.

Can you think of a better *practical* design?


Quote
I never really liked this trying-to-make-one-size-fit-all thing when designing libs/stuff for HuC. It'd be nice if it was something they directly included into the main source file (different small libs), rather than trying to attach it to the existing library (in startup). Though I think doing that would require restructuring the main lib bank, and having support for a bank directive directly in HuC.

Making the libraries modular would be great ... but it's going to take a significant time-investment from whoever wants to do it.

Since there's no linker phase or dead-code elimination, from what I'm seeing HuC is pretty much a behemoth right now.

But ... there is some argument for providing common functionality within the library itself, especially since the code that HuC generates to do the same stuff if you do it in C (like DK did for Catastrophy) is going to be much larger and slower than the same code hand-written in assembly.


Quote
I don't think this needs to be in the core of HuC. It would serve better as example code, which someone could work to their needs.

It's not like a "fade" routine is an uncommon requirement.

1) Your C code is big and slow, and generates a lot of Hu6280 code that a hand-written assembly function doesn't. That's not you ... that's just HuC.

2) Have you got a fade-up working, yet?  :wink:

ccovell

  • Hero Member
  • *****
  • Posts: 2245
Re: Faster fade out code?
« Reply #49 on: December 01, 2016, 05:14:21 PM »
"fade" is a value 0-7.

I have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now.  For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.

I did something similar (using lookup tables) for my HuZero game.

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Faster fade out code?
« Reply #50 on: December 01, 2016, 07:26:08 PM »
I made some fade out/in effects in this intro some time ago:
« Last Edit: December 01, 2016, 07:32:32 PM by touko »

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: Faster fade out code?
« Reply #51 on: December 02, 2016, 03:35:26 AM »
Quote
I made some fade out/in effects in this intro some time ago:

That looks nice!  :)

So what technique did you use for processing each step of the fade up/down?


Quote
I have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now.  For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.

I did something similar (using lookup tables) for my HuZero game.

I haven't heard of that stepping technique before, it sounds interesting.

Do you have any more details?

It's trivial to switch to a 0..15 range, even if I'm only processing 8 steps, so I've done that.

I definitely agree with using a table-based approach ... it gives you the flexibility to change the tables and get a fade-to-white, or a fade-to-sepia, or to correct for any gamma differences.

For HuC, I suspect that it's just a case of the tradeoff between quality and memory usage for the tables.

I'm also limited by trying to keep compatibility with HuCard usage rather than just using self-modifying code.

Here's an implementation that uses a single 64-byte table for a simple 8-step fade ... can anyone suggest improvements?

Code:
; fade_colors(int *psrc [__si], int *pdst [__di], char count, char level)
; ----
; fade down an array of colors
; ----
; psrc:  source buffer
; pdst:  destination buffer
; count: # of colors, (0-128)
; level: level of fading (0 = black, 15 = full; only 8 distinct steps)
; ----
; color: color value,   GREEN:  bit 6-8
;                       RED:    bit 3-5
;                       BLUE:   bit 0-2
; ----

_fade_colors.4: asl     a                       ; 2 fade level (0-15)
                asl     a                       ; 2
                and     #$38                    ; 2
                sta     <__ah                   ; 4

                lda     <__al                   ; 4 # of colors
                beq     .l2                     ; 2
                asl     a                       ; 2 x2: two bytes per color
                tay                             ; 2 Y = byte offset just past the last color

                phx                             ; 3

                ; 129 cycle inner loop.
                ; fade GREEN

.l1:            dey                             ; 2
                lda     [__si],y                ; 7 src color hi-byte
                dey                             ; 2
                lsr     a                       ; 2
                lda     [__si],y                ; 7 src color lo-byte
                iny                             ; 2
                sta     <__al                   ; 4
                rol     a                       ; 2
                rol     a                       ; 2
                rol     a                       ; 2
                and     #7                      ; 2
                ora     <__ah                   ; 4
                tax                             ; 2
                lda     fade_table,x            ; 5
                asl     a                       ; 2
                asl     a                       ; 2
                asl     a                       ; 2
                tax                             ; 2

                ; fade RED

                lda     <__al                   ; 4 src color lo-byte
                ror     a                       ; 2
                ror     a                       ; 2
                ror     a                       ; 2
                and     #7                      ; 2
                ora     <__ah                   ; 4
                sax                             ; 3
                ora     fade_table,x            ; 5
                asl     a                       ; 2
                asl     a                       ; 2
                asl     a                       ; 2
                tax                             ; 2

                cla                             ; 2
                rol     a                       ; 2
                sta     [__di],y                ; 7 dst color hi-byte
                dey                             ; 2

                ; fade BLUE

                lda     <__al                   ; 4 src color lo-byte
                and     #7                      ; 2
                ora     <__ah                   ; 4
                sax                             ; 3
                ora     fade_table,x            ; 5
                sta     [__di],y                ; 7 dst color lo-byte
                cpy     #0                      ; 2
                bne     .l1                     ; 4

                plx                             ; 4
.l2:            rts                             ; 7

fade_table:     .db     0, 0, 0, 0, 0, 0, 0, 0
                .db     0, 0, 0, 0, 1, 1, 1, 1
                .db     0, 0, 1, 1, 1, 1, 2, 2
                .db     0, 0, 1, 1, 2, 2, 3, 3
                .db     0, 1, 1, 2, 2, 3, 3, 4
                .db     0, 1, 1, 2, 3, 4, 4, 5
                .db     0, 1, 2, 3, 3, 4, 5, 6
                .db     0, 1, 2, 3, 4, 5, 6, 7
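
For anyone who wants to check table values without assembling anything, here is a minimal C sketch of what the routine above appears to compute. The names fade_color_ref and fade_colors_ref are made up for illustration and are not part of HuC. It assumes the color layout given in the header comment (GREEN bits 6-8, RED bits 3-5, BLUE bits 0-2) and a level of 0..15 that is reduced to one of the 8 table rows, as the asl/asl/and #$38 sequence does.

```c
#include <stdint.h>

/* Same 8x8 table as above: row = fade step (0..7), column = 3-bit component. */
static const uint8_t fade_table[64] = {
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 1, 1, 1,
    0, 0, 1, 1, 1, 1, 2, 2,
    0, 0, 1, 1, 2, 2, 3, 3,
    0, 1, 1, 2, 2, 3, 3, 4,
    0, 1, 1, 2, 3, 4, 4, 5,
    0, 1, 2, 3, 3, 4, 5, 6,
    0, 1, 2, 3, 4, 5, 6, 7
};

/* Hypothetical reference model, not HuC library code.
 * color: 9-bit color, GREEN in bits 6-8, RED in bits 3-5, BLUE in bits 0-2.
 * level: 0..15; levels 2n and 2n+1 select the same table row n. */
static uint16_t fade_color_ref(uint16_t color, uint8_t level)
{
    unsigned row = ((unsigned)level << 2) & 0x38;   /* asl / asl / and #$38 */
    unsigned g = fade_table[row | ((color >> 6) & 7)];
    unsigned r = fade_table[row | ((color >> 3) & 7)];
    unsigned b = fade_table[row | (color & 7)];
    return (uint16_t)((g << 6) | (r << 3) | b);
}

/* Fade a whole buffer; src and dst may point at the same array. */
static void fade_colors_ref(const uint16_t *src, uint16_t *dst,
                            unsigned count, uint8_t level)
{
    unsigned i;
    for (i = 0; i < count; i++)
        dst[i] = fade_color_ref(src[i], level);
}
```

Because every step is computed from the untouched source palette rather than from the previous step, the same call works for fading up and fading down; only the order in which the levels are visited changes.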

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Faster fade out code?
« Reply #52 on: December 02, 2016, 04:23:39 AM »
Quote
So what technique did you use for the processing each step of the fade up/down?
The simplest: add/sub 1 for each RGB component.

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: Faster fade out code?
« Reply #53 on: December 02, 2016, 05:11:12 AM »
Quote
So what technique did you use for the processing each step of the fade up/down?
The simplest: add/sub 1 for each RGB component.

Well, the fade-down is easy ... but what did you do for the fade-up?  :-k

Are you using the simple "add 1 if not at target" for each component?

That tends to grey things out a little during the fade-up, like this ...

Target GRB : 456

Step 0 GRB : 000
Step 1 GRB : 111
Step 2 GRB : 222
Step 3 GRB : 333
Step 4 GRB : 444
Step 5 GRB : 455
Step 6 GRB : 456


It's not a bad effect, and most people don't notice/care about it.

Just curious.
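
The step listing above can be reproduced with a few lines of C. This is only a sketch of the "add 1 if not at target" rule as described in this post; fade_up_step is a made-up name, and the packing (GREEN bits 6-8, RED bits 3-5, BLUE bits 0-2) is assumed from the assembly comments earlier in the thread.

```c
#include <stdint.h>

/* One step of the cheap fade-up: every 3-bit component that is still
 * below its target value moves up by one. Hypothetical helper name. */
static uint16_t fade_up_step(uint16_t current, uint16_t target)
{
    uint16_t out = 0;
    int shift;

    for (shift = 0; shift <= 6; shift += 3) {   /* BLUE, RED, GREEN */
        unsigned c = (current >> shift) & 7;
        unsigned t = (target >> shift) & 7;
        if (c < t)
            c++;
        out |= (uint16_t)(c << shift);
    }
    return out;
}
```

Starting from black with a target of GRB 456, the first four calls give equal components (111, 222, 333, 444), which is the grey phase, and only the last two steps bring in the difference between the channels.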


Quote
I have no stake in this, but thinking down the road it might be better to add more granularity (0..15 or more) right now.  For example, many Sega games have more levels of fading by fading out Red & Green at different speeds before finally doing Blue... and it looks fantastic and far smoother than 8 steps as on the PCE.

I did something similar (using lookup tables) for my HuZero game.

I haven't heard of that stepping technique before, it sounds interesting.

Do you have any more details?

I presume that you're talking about taking advantage of the human eye's perception of brightness.

The RGB to Y (brightness) formula is ...

Y = 0.299R + 0.587G + 0.114B

So, to reduce perceived brightness, you need to remove more of the green than you do of the blue.
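
To put numbers on that, here is a small C sketch that applies the formula to a 9-bit color using integer weights (the coefficients scaled by 1000). The function name luma_x1000 is made up, and the GRB bit layout is assumed from the fade_colors comments.

```c
#include <stdint.h>

/* Approximate brightness of a 9-bit GRB color, scaled by 1000.
 * Weights are 0.299 (R), 0.587 (G) and 0.114 (B) from the formula above. */
static unsigned luma_x1000(uint16_t color)
{
    unsigned g = (color >> 6) & 7;
    unsigned r = (color >> 3) & 7;
    unsigned b = color & 7;
    return 299u * r + 587u * g + 114u * b;
}
```

Dropping green by one level costs 587 units of brightness while dropping blue by one level costs only 114, so a fade that steps green first loses perceived brightness much faster than one that steps blue first.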

From a practical implementation POV, do you mean something like this?  :-k

It changes things to a (0..17) range instead of (0..15).

This would provide a sort-of-half-step in the color transition, and delay the blue and red component fades.

fade_table_g:   .db     0, 0, 0, 0, 0, 0, 0, 0
                .db     0, 0, 0, 0, 0, 0, 0, 0
fade_table_r:   .db     0, 0, 0, 0, 0, 0, 0, 0
fade_table_b:   .db     0, 0, 0, 0, 0, 0, 0, 0
                .db     0, 0, 0, 0, 1, 1, 1, 1
                .db     0, 0, 0, 1, 1, 1, 1, 1
                .db     0, 0, 1, 1, 1, 1, 2, 2
                .db     0, 0, 1, 1, 1, 2, 2, 2
                .db     0, 0, 1, 1, 2, 2, 2, 3
                .db     0, 0, 1, 1, 2, 2, 3, 3
                .db     0, 1, 1, 2, 2, 3, 3, 4
                .db     0, 1, 1, 2, 2, 3, 3, 4
                .db     0, 1, 1, 2, 3, 3, 4, 4
                .db     0, 1, 1, 2, 3, 3, 4, 5
                .db     0, 1, 2, 2, 3, 4, 5, 5
                .db     0, 1, 2, 2, 3, 4, 5, 6
                .db     0, 1, 2, 3, 4, 4, 5, 6
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7


Or just this easier-to-read version with step (0..9) ...


fade_table_g:   .db     0, 0, 0, 0, 0, 0, 0, 0
fade_table_r:   .db     0, 0, 0, 0, 0, 0, 0, 0
fade_table_b:   .db     0, 0, 0, 0, 0, 0, 0, 0
                .db     0, 0, 0, 0, 1, 1, 1, 1
                .db     0, 0, 1, 1, 1, 1, 2, 2
                .db     0, 0, 1, 1, 2, 2, 3, 3
                .db     0, 1, 1, 2, 2, 3, 3, 4
                .db     0, 1, 1, 2, 3, 4, 4, 5
                .db     0, 1, 2, 3, 3, 4, 5, 6
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7
                .db     0, 1, 2, 3, 4, 5, 6, 7
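
Here is a rough C model of how the (0..9) version would be indexed. It assumes each channel looks up row "step" relative to its own label, so green starts at row 0, red at row 1 and blue at row 2 of one shared 12-row table. That is my reading of the label placement, not tested library code, and fade_color_staggered is a made-up name.

```c
#include <stdint.h>

#define FADE_STEPS 10           /* step 0..9 */

/* The 12 rows listed above, shared by all three channels. */
static const uint8_t fade_rows[12][8] = {
    { 0, 0, 0, 0, 0, 0, 0, 0 }, /* fade_table_g starts here */
    { 0, 0, 0, 0, 0, 0, 0, 0 }, /* fade_table_r starts here */
    { 0, 0, 0, 0, 0, 0, 0, 0 }, /* fade_table_b starts here */
    { 0, 0, 0, 0, 1, 1, 1, 1 },
    { 0, 0, 1, 1, 1, 1, 2, 2 },
    { 0, 0, 1, 1, 2, 2, 3, 3 },
    { 0, 1, 1, 2, 2, 3, 3, 4 },
    { 0, 1, 1, 2, 3, 4, 4, 5 },
    { 0, 1, 2, 3, 3, 4, 5, 6 },
    { 0, 1, 2, 3, 4, 5, 6, 7 },
    { 0, 1, 2, 3, 4, 5, 6, 7 },
    { 0, 1, 2, 3, 4, 5, 6, 7 }
};

enum { ROW_G = 0, ROW_R = 1, ROW_B = 2 };   /* label offsets, in rows */

/* Hypothetical reference model: fade one 9-bit GRB color to step 0..9. */
static uint16_t fade_color_staggered(uint16_t color, unsigned step)
{
    unsigned g, r, b;

    if (step > FADE_STEPS - 1)
        step = FADE_STEPS - 1;              /* clamp to the last step */

    g = fade_rows[step + ROW_G][(color >> 6) & 7];
    r = fade_rows[step + ROW_R][(color >> 3) & 7];
    b = fade_rows[step + ROW_B][color & 7];
    return (uint16_t)((g << 6) | (r << 3) | b);
}
```

With this indexing, step 0 is black and step 9 is the unmodified color. In between, green is always one row behind red and two rows behind blue, so on the way down green leaves first and blue goes last.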

« Last Edit: December 02, 2016, 05:28:08 AM by elmer »

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Faster fade out code?
« Reply #54 on: December 02, 2016, 07:14:23 AM »
Quote
but what did you do for the fade-up?
You start with an entirely black palette, and you add 1 to each component until you reach the target palette.
In fact, for both fade in and fade out I have a reference palette to reach (black for a fade out, and the object's palette for a fade in). I read the corresponding colors directly from the VCE, apply the fade step, and store the result in a buffer (for sending later with TIA).
I have a 256-byte buffer for fading multiple palettes at the same time.

Quote
It's not a bad effect, and most people don't notice/care about it.
You're right, but it's not noticeable as long as your fade is fast enough.
I think for the best result, you must first normalise the RGB components of each color, to avoid a dominant color at the end of the fade.
E.g. reach 444 (or 222, 333) first, start the add/sub after that, and you end with 0 for each component.
« Last Edit: December 02, 2016, 07:23:59 AM by touko »

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Faster fade out code?
« Reply #55 on: December 02, 2016, 08:24:36 AM »
I personally like the idea of the fade table, for speed reasons. As in, fade is actually just a "brightness" (loosely termed) state of the palette, and fade is the transition from one level of brightness to another over time.

 But of course, I'd say make this not a built-in library function, but something the programmer can just include. I mean, there's no critical reason why it should be in the very main bank of the main lib, so having it as a function with ASM inside of it is no slower than the far call to the far end of the main lib. Speaking of which, there's probably a good amount of stuff that probably should be in the main lib bank to begin with. And some stuff could easily be moved to include-able functions. I'm going to look into this as soon as winter break starts.

ccovell

  • Hero Member
  • *****
  • Posts: 2245
Re: Faster fade out code?
« Reply #56 on: December 02, 2016, 09:34:06 AM »
My table was a wasteful 512 bytes, mapping all 512 colours to the next step down... so it's a fadeout routine only.  Anyway, the code (minus the table):
Code:
Fade_Down: ;Fades a specified palette down 1 step!
;A = Palette entry (0,$10,$20,$30...)
;X = 0/1 = BG or Sprite
;-------------------------------------------------------
;copy our specified palette from VCE to RAM
sta $0402       ;Point to colours
stx $0403
pha
phx
phy
TAI $0404,temp_pal,32
;----------
clx
.fade_loop:
lda temp_pal,X
tay
lda temp_pal+1,X ;(MSB is 0 or 1)
and #1 ;All other bits were set in VCE.
beq .lopal
;MSB was high; (leave it as-is...)
lda PALFADE1HiTblLSB,Y ;Get LSB
sta temp_pal,X
cpy #64 ;64th entry and up, MSB=1
bcs .next_entry
.zeromsb:
stz temp_pal+1,X
bra .next_entry
;----------
.lopal: ;MSB will always be zero anyway
lda PALFADE1LoTblLSB,Y ;Get LSB
sta temp_pal,X
.next_entry:
inx
inx
cpx #32
bne .fade_loop
;--------
;now copy RAM back to VCE
ply
plx
pla
sta $0402       ;Point to colours
stx $0403
TIA temp_pal,$0404,32
; A, Y, and X should be preserved here.
rts
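
Since the table itself was not posted, here is a C sketch of how a 512-byte "next step down" table of this shape could be generated. It assumes the simplest possible rule (each non-zero component drops by one), which may well differ from the table actually used in the game. The names lo_tbl, hi_tbl, build_tables and step_down are invented for the example.

```c
#include <stdint.h>

static uint8_t lo_tbl[256];     /* low byte of result when input bit 8 = 0 */
static uint8_t hi_tbl[256];     /* low byte of result when input bit 8 = 1 */

/* Assumed rule: subtract 1 from every non-zero 3-bit component. */
static uint16_t dec_components(uint16_t c)
{
    unsigned g = (c >> 6) & 7, r = (c >> 3) & 7, b = c & 7;
    if (g) g--;
    if (r) r--;
    if (b) b--;
    return (uint16_t)((g << 6) | (r << 3) | b);
}

static void build_tables(void)
{
    unsigned i;
    for (i = 0; i < 256; i++) {
        lo_tbl[i] = (uint8_t)(dec_components((uint16_t)i) & 0xFF);
        hi_tbl[i] = (uint8_t)(dec_components((uint16_t)(0x100 | i)) & 0xFF);
    }
}

/* Mirrors the structure of the loop above: the tables supply the low byte,
 * and bit 8 survives only if the old low byte was 64 or more. */
static uint16_t step_down(uint16_t color)
{
    uint8_t lsb = (uint8_t)(color & 0xFF);

    if (color & 0x100) {
        uint16_t msb = (lsb >= 64) ? 0x100 : 0;
        return (uint16_t)(msb | hi_tbl[lsb]);
    }
    return lo_tbl[lsb];
}
```

Under this rule the "cpy #64" test falls out naturally: bit 8 only clears when green goes from 4 to 3, and that is exactly the case where the old low byte is below 64.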

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Faster fade out code?
« Reply #57 on: December 02, 2016, 09:37:03 AM »
So.. maybe this would be helpful?

The call code...
Code:
      ldy iterations 
      ldx #low(xfer_source)
      clc
      jsr xfer_ZP

The self modifying code sitting in... Zeropage!
Code:
xfer_entry:
.loop
      tia source,dest,num
      set                               ;2
      adc #$nn                          ;5 (RMW+T)
    bcc .skip                           ;4:2
      inc <low((.loop & 0xff)+2)+1      ;6
      clc                               ;2
.skip
      dey                               ;2
      bne .loop                         ;4
    rts                                 ;7

xfer_source = (.loop & 0xff) + BASE_ZP + 1
xfer_dest = (.loop & 0xff) + BASE_ZP + 3
xfer_num = (.loop & 0xff) + BASE_ZP + 5     
xfer_ZP = (.loop & 0xff) + BASE_ZP 
Of course, it needs to be copied at least once to ZP buffer. That's what all the address translations are for via the equates.
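
In C terms, the loop above behaves roughly like the sketch below: one big transfer is split into fixed-size chunks, and the source address inside the instruction is bumped after each chunk. The port_log_t type and both function names are invented for the illustration; the "port" is just an array that records what would have been written to the fixed destination.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the destination port: records every byte written. */
typedef struct {
    uint8_t bytes[1024];
    size_t  count;
} port_log_t;

/* Models one "tia source,dest,num": n bytes from src go to the port. */
static void tia_model(port_log_t *port, const uint8_t *src, size_t n)
{
    size_t i;
    for (i = 0; i < n && port->count < sizeof port->bytes; i++)
        port->bytes[port->count++] = src[i];
}

/* Models the whole loop. Returns the number of gaps between chunks,
 * which is where a pending interrupt would get a chance to run. */
static size_t xfer_chunked(port_log_t *port, const uint8_t *src,
                           size_t chunk, unsigned iterations)
{
    size_t gaps = 0;

    while (iterations-- > 0) {
        tia_model(port, src, chunk);    /* the TIA itself */
        src += chunk;                   /* adc #$nn on the low byte, inc on carry */
        gaps++;                         /* dey / bne: back around the loop */
    }
    return gaps;
}
```

The point of the split is latency: a single 1024-byte transfer cannot be interrupted, while 32 transfers of 32 bytes each leave 32 places where the CPU can take an interrupt.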

elmer

  • Hero Member
  • *****
  • Posts: 2153
Re: Faster fade out code?
« Reply #58 on: December 03, 2016, 06:26:44 AM »
Quote
Of course, it needs to be copied at least once to ZP buffer. That's what all the address translations are for via the equates.

Yes, I do like using a self-modifying Txx instruction in ZP.  :wink:

This is what I've got, which uses HuC's __fastcall convention to have the compiler itself set up the ZP locations that it can ...

Code:
; --------
; Alternate names when the parameter-passing area is used for
; a self-modifying Txx instruction.
;

__tc    = $20F8
__ts    = $20F9
__td    = $20FB
__tl    = $20FD
__tr    = $20FF

; set_colors(int *pbuffer [__ts] )
; set_colors(int index [color_reg], int *pbuffer [__ts], int count [__tl] )
; ----
; index:   index in the palette (0-511)
; pbuffer: source buffer
; count:   # of colors, (1-512)
; ----

_set_colors.1:  stz     color_reg_l
                stz     color_reg_h

                stz     <__tl+0
                lda     #>512
                sta     <__tl+1

_set_colors.3:  lda     #$E3 ; TIA
                sta     <__tc
                lda     #$60 ; RTS
                sta     <__tr
                lda     #<color_data
                sta     <__td+0
                lda     #>color_data
                sta     <__td+1
                asl     <__tl+0
                rol     <__tl+1
                jmp     __tc
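
The byte layout that the routine builds at $20F8-$20FF can be pictured with this C sketch. build_tia is a made-up helper. The opcode values are the ones in the listing ($E3 for TIA, $60 for RTS), and the operands are stored low byte first in the order source, destination, length, matching the __ts/__td/__tl offsets.

```c
#include <stdint.h>

enum { OP_TIA = 0xE3, OP_RTS = 0x60 };

/* Fill an 8-byte buffer with "TIA src,dst,len" followed by "RTS".
 * Index 0 corresponds to __tc, 1 to __ts, 3 to __td, 5 to __tl, 7 to __tr. */
static void build_tia(uint8_t code[8], uint16_t src, uint16_t dst, uint16_t len)
{
    code[0] = OP_TIA;
    code[1] = (uint8_t)(src & 0xFF);
    code[2] = (uint8_t)(src >> 8);
    code[3] = (uint8_t)(dst & 0xFF);
    code[4] = (uint8_t)(dst >> 8);
    code[5] = (uint8_t)(len & 0xFF);
    code[6] = (uint8_t)(len >> 8);
    code[7] = OP_RTS;
}
```

For a full palette upload the length would be 512 colors times 2 bytes, which is what the asl/rol pair on __tl does, and the destination would be the color data port (written as $0404 in the Fade_Down listing earlier in the thread).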

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Faster fade out code?
« Reply #59 on: December 03, 2016, 08:46:41 AM »
Yeah, classic Txx in ram setup.
 
But what I posted was a "safe" version, so that you don't delay interrupts (in this case, since it's in vblank, the TIRQ routine), thanks to smaller transfers and a small/fast iteration overhead.

 It can also work for active video, with scanline interrupts and TIMER interrupts all firing like mad. The sample playback might have a little bit of jitter, but nothing Genesis cringe-worthy. And the VDC buffer for the next line should be able to absorb any delay (as long as the routine is tight). The best of all worlds: TIMER, H-int, and Txx availability. And the cost, if you did 32-byte transfers, is 8 cycles per byte instead of 7 cycles per byte. It makes Txx more usable IMO. Plus, it's a chance to use the T flag... who doesn't like using the T flag???