Topic: Faster fade out code?

Gredler

  • Guest
Re: Faster fade out code?
« Reply #30 on: October 06, 2016, 09:56:06 AM »
I feel the need to post something to make DK feel like less of an idiot.

Can't you just store all the palettes to an array, then create a loop that lerps from black to the array color or the array to black?

#Python
#C#
#Scripting

TheOldMan

  • Hero Member
  • *****
  • Posts: 958
Re: Faster fade out code?
« Reply #31 on: October 06, 2016, 10:33:34 AM »
Quote
First, why store zero in $402 and $403?

It selects color slot 0.

Quote
Second, if I want to split this into chunks of 64, how do I tell it $404 plus 64, etc.?

You don't use $404+anything. $404/$405 is the color.
You set $402/$403 to the slot number you want (0, 64, 128, etc.).

In general you set the starting slot # in $402/$403, and send the colors to $404/$405.
The slot increments after every write to the high byte (I think), so you can just loop through the
palette and pump them out. If you want to break it into 64-color chunks, write the first 64 colors,
then the next 64 colors, etc.

You can set the slot number to start at any slot, even in the middle of a palette, afaik.
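A minimal plain-C sketch (not HuC code; the helper names are hypothetical) of how a chunk's starting slot maps onto the two bytes written to $402 and $403. It assumes 512 color slots, so a slot number needs 9 bits and only bit 0 of the high byte carries address information.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of HuC: the VCE color RAM has 512
   slots, so a slot number needs 9 bits and is split across the two
   8-bit select ports. */
#define CHUNK_COLORS 64

/* First slot of 64-color chunk n (n = 0..7). */
uint16_t chunk_start_slot(uint8_t n)
{
    return (uint16_t)(n * CHUNK_COLORS);
}

/* Value to write to $402: the low 8 bits of the slot number. */
uint8_t slot_low_byte(uint16_t slot)
{
    return (uint8_t)(slot & 0xFF);
}

/* Value to write to $403: the high byte; for 512 slots only bit 0
   carries address information. */
uint8_t slot_high_byte(uint16_t slot)
{
    return (uint8_t)((slot >> 8) & 0x01);
}
```

For example, chunk 4 starts at slot 256, so you would write 0x00 to $402 and then 0x01 to $403. If the usual split applies (slots 0-255 for background palettes, 256-511 for sprite palettes), that is the first sprite palette.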


Quote
Can't you just store all the palettes to an array, then create a loop that lerps from black to the array color or the array to black?

Yes. But that won't update the palettes until you send them to the VCE.
If you really want to, you can read the color from the VCE, fade it, and write it back, without any intermediate array. It's just quicker to use an array. (Less overhead)
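Gredler's lerp idea can be sketched in plain C as below. This is a minimal sketch, not HuC code: uint16_t stands in for HuC's 16-bit int, the function names are hypothetical, and the GGGRRRBBB channel layout is assumed (the order does not matter here, since every channel is scaled the same way). The saved palette src is never modified; each frame writes a scaled copy into dst, which you would then send to the VCE.

```c
#include <stdint.h>

/* Scale one 3-bit channel by step/8 (step 0 = black, 8 = full color).
   Steps above 8 are out of range for this sketch. */
static uint16_t scale_channel(uint16_t c, uint8_t step)
{
    return (uint16_t)((c * step) >> 3);
}

/* Lerp one 9-bit color from black (step 0) to itself (step 8).
   Assumed layout: bits 0-2, 3-5 and 6-8 are the three channels. */
uint16_t lerp_from_black(uint16_t color, uint8_t step)
{
    uint16_t lo  = color & 7u;
    uint16_t mid = (color >> 3) & 7u;
    uint16_t hi  = (color >> 6) & 7u;
    return (uint16_t)(scale_channel(lo, step)
                    | (scale_channel(mid, step) << 3)
                    | (scale_channel(hi, step) << 6));
}

/* Fill dst with one frame of the fade, computed from the saved palette src. */
void lerp_palette(const uint16_t *src, uint16_t *dst, int count, uint8_t step)
{
    for (int i = 0; i < count; i++)
        dst[i] = lerp_from_black(src[i], step);
}
```

A fade-out runs step from 8 down to 0 and a fade-in runs it from 0 up to 8. Because the original colors are kept, the fade can be reversed exactly, which the in-place decrement approach used later in the thread cannot do.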

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Faster fade out code?
« Reply #32 on: October 06, 2016, 04:23:04 PM »
Quote
I think the best way to optimise your routine is to do the fade in a buffer for all palettes, then transfer the whole buffer after a vsync in asm with a TIA block transfer.

Like this:

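/* 512 colors x 2 bytes = 1024 bytes: enough for all 32 palettes
   (8 palettes would only need 256 bytes) */
int my_buffer[512];

#asm

   stz $402      ; select color slot 0 (low byte)
   stz $403      ; high byte; writing it latches the slot

   tia _my_buffer , $404 , 1024   ; length is in bytes: 512 colors x 2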

#endasm

My fade routine is close to yours and is very fast, but in ASM.




Quote
So, I'm really confused.

First, why store zero in $402 and $403?

Second, if I want to split this into chunks of 64, how do I tell it $404 plus 64, etc.?

This is why I hate assembly. I know, I'm definitely the idiot of the thread.

 It's not assembly, but direct hardware interfacing. You could access the ports directly in C.

 $402/$403 make up a single port for the VCE color # or slot. Because there are 512 color slots in the VCE, the slot number is larger than an 8-bit value a single port can hold, so the 16-bit port is spread across two 8-bit ports: 0x402 is the LSByte and 0x403 is the MSByte.

 VCE and VDC ports tend to have what is known as a "latch" system. This means that when you write the upper byte of a 16-bit port, it triggers the transfer of the contents of the two ports to wherever they need to go internally (be it the VCE or VDC).

 In the case of the VCE, 0x402 is the LSB, and 0x403 is the MSB and latch. Once 0x403 is written to, the contents are transferred to the corresponding register internal to the VCE, but not until that latch port is accessed, so the order of port pair access is very important.

 On the VCE, here are some ports:
 0x402/0x403 = the color slot you want to update.
 0x404/0x405 = the color value to update on the corresponding color slot.

 One other thing to note: while you can always tell the VCE exactly which color you want to update, it also has an internal "auto increment" mechanism that automatically advances to the next color slot after a successful write/update (i.e. a write to the latch port). The same goes for reading color data from the VCE.
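To make the ordering rules concrete, here is a hypothetical software model of the color ports in plain C. It follows the latch and auto-increment behaviour exactly as described in this post and is not an emulator; the struct and function names are made up, and wrapping from slot 511 back to 0 is an assumption.

```c
#include <stdint.h>

/* Hypothetical model of the VCE color ports, following the latch and
   auto-increment description above. Not cycle-accurate. */
typedef struct {
    uint16_t cram[512];  /* color RAM, 9-bit entries */
    uint16_t slot;       /* current slot pointer */
    uint8_t  slot_lo;    /* pending low byte written to $402 */
    uint8_t  data_lo;    /* pending low byte written to $404 */
} VceModel;

void vce_write(VceModel *v, uint16_t port, uint8_t value)
{
    switch (port) {
    case 0x402:                      /* low byte: held until $403 */
        v->slot_lo = value;
        break;
    case 0x403:                      /* high byte: latches the slot */
        v->slot = (uint16_t)(v->slot_lo | ((value & 1u) << 8));
        break;
    case 0x404:                      /* low byte: held until $405 */
        v->data_lo = value;
        break;
    case 0x405:                      /* high byte: commit, then advance */
        v->cram[v->slot] = (uint16_t)(v->data_lo | ((value & 1u) << 8));
        v->slot = (uint16_t)((v->slot + 1u) & 0x1FFu);  /* assumed wrap */
        break;
    default:
        break;
    }
}
```

Writing only $404 changes nothing visible; the color lands in color RAM when $405 is written, and the next pair of writes goes to the following slot without touching $402/$403 again.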

DarkKobold

  • Hero Member
  • *****
  • Posts: 1200
Re: Faster fade out code?
« Reply #33 on: November 01, 2016, 05:55:13 PM »
int my_buffer[64];

fade_out()
{
   
   int i, clr;
   char j,k;
   for (j = 0; j <8; j++)
   {
      
      #asm

            stz $402
            stz $403
         #endasm
      for (i=0;i<512;i+=64)
      {
         for (k=0; k<64; k++)
         {
            clr = get_color(i+k);
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
              if (clr&448) clr = clr - 64;
              my_buffer[k] = clr;                  
           }
          vsync();       
       #asm   
          tia _my_buffer , $404 , 64
       #endasm
       }
   }   
     cls();
     reset_satb();
     satb_update();
     vsync();
}   


So, here's my attempt to split it into 64-color chunks. It... fails. Miserably. I'd assume it's latching fine, and that the latches keep their current state across each vsync.
Hey, you.

TheOldMan

  • Hero Member
  • *****
  • Posts: 958
Re: Faster fade out code?
« Reply #34 on: November 01, 2016, 06:17:33 PM »
Quote
int my_buffer[64];
.
.
.
 tia _my_buffer , $404 , 64

Ints are 2 bytes, and the TIA length is a byte count, so you're only transferring 32 'colors'. Try using 128 as the length.

Also, be careful mixing ints and chars. HuC doesn't promote chars to ints.
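The length operand of TIA (and TAI) is a byte count, not an element count. A tiny plain-C sketch of the arithmetic, using uint16_t as a stand-in for HuC's 2-byte int (the function name is made up for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes a block transfer must move for a given number of colors.
   Each VCE color is one 16-bit word, so the byte count is colors * 2. */
size_t transfer_bytes(size_t colors)
{
    return colors * sizeof(uint16_t);
}
```

So int my_buffer[64] needs a length of 128; passing 64 moves only the first 32 colors. A buffer covering all 32 palettes (512 colors) needs 1024.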

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Faster fade out code?
« Reply #35 on: November 01, 2016, 10:11:18 PM »
And be careful: there is only one pointer into the VCE's hardware color table.
You must select the color entry for writing AND for reading (unless you want to read one palette and write the next one), like this:

         #asm
          ; Maybe not needed here, because get_color() selects the palette entry every time.
            stz $402
            stz $403

         #endasm
      for (i=0;i<512;i+=64)
      {
         for (k=0; k<64; k++)
         {
            clr = get_color(i+k);
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
              if (clr&448) clr = clr - 64;
              my_buffer[k] = clr;                 
           }
          vsync();       
       #asm
          stz $402
          stz $403

          tia _my_buffer , $404 , 128   ; 64 colors = 128 bytes
       #endasm

Otherwise you read from palette 0 and write to palette 2, as you did.
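A hypothetical plain-C model of touko's point: the VCE has a single slot pointer shared by reads and writes, so after reading 64 colors the pointer has already moved on, and a write that does not re-select the slot lands after them. The names are illustrative only, and wrapping at 512 is an assumption.

```c
#include <stdint.h>

/* Hypothetical model: one slot pointer shared by reads and writes,
   advancing after every completed 16-bit access. */
typedef struct {
    uint16_t cram[512];
    uint16_t slot;
} SharedPtrVce;

void vce_select(SharedPtrVce *v, uint16_t slot)
{
    v->slot = (uint16_t)(slot & 0x1FFu);
}

uint16_t vce_read(SharedPtrVce *v)
{
    uint16_t c = v->cram[v->slot];
    v->slot = (uint16_t)((v->slot + 1u) & 0x1FFu);  /* reads advance too */
    return c;
}

void vce_write_color(SharedPtrVce *v, uint16_t color)
{
    v->cram[v->slot] = (uint16_t)(color & 0x1FFu);
    v->slot = (uint16_t)((v->slot + 1u) & 0x1FFu);
}
```

Reading slots 0-63 leaves the pointer at 64, so the first write goes to slot 64 unless you select slot 0 again, which is exactly what the extra stz $402 / stz $403 before the tia does.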
« Last Edit: November 01, 2016, 10:21:15 PM by touko »

DarkKobold

  • Hero Member
  • *****
  • Posts: 1200
Re: Faster fade out code?
« Reply #36 on: November 02, 2016, 01:20:20 PM »
for posterity, here is the final function. I will be testing it on hardware shortly. Whoever sees this in the near or far future, feel free to use it.


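int my_buffer[64];   /* 64 colors = 128 bytes */

fade_out()
{
   int i, clr;
   char j, k;

   /* 8 passes: each pass takes every non-zero 3-bit channel down by one */
   for (j = 0; j < 8; j++)
   {
      /* process the 512 color slots in 64-color chunks */
      for (i = 0; i < 512; i += 64)
      {
         for (k = 0; k < 64; k++)
         {
            clr = get_color(i + k);
            if (clr & 7)   clr = clr - 1;    /* bits 0-2 */
            if (clr & 56)  clr = clr - 8;    /* bits 3-5 */
            if (clr & 448) clr = clr - 64;   /* bits 6-8 */
            my_buffer[k] = clr;
         }
         vsync();
         /* Read slot i-1 only for its side effect: the VCE's auto-increment
            leaves the slot pointer at i, so the TIA below writes to slots
            i..i+63. For i = 0 this presumably relies on the slot number
            wrapping from 511 back to 0. */
         clr = get_color(i - 1);
       #asm
          tia _my_buffer , $404 , 128
       #endasm
      }
   }
   cls();
   reset_satb();
   satb_update();
   vsync();
}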


EDIT: Oh, and a huge thanks to everyone for working with me. I'm glad that a simple fade didn't kill catastrophy.

Gredler

  • Guest
Re: Faster fade out code?
« Reply #37 on: November 02, 2016, 01:34:20 PM »
Quote
EDIT: Oh, and a huge thanks to everyone for working with me. I'm glad that a simple fade didn't kill catastrophy.

That would have been catastrophic

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Faster fade out code?
« Reply #38 on: November 02, 2016, 09:18:20 PM »
If you want your routine to be faster, you can translate this into assembly:

Quote
for (j = 0; j <8; j++)
   {         
      for (i=0;i<512;i+=64)
      {
         for (k=0; k<64; k++)
         {
            clr = get_color(i+k);
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
              if (clr&448) clr = clr - 64;
              my_buffer[k] = clr;               
           }
This loop is slow as hell.
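One C-level alternative to rewriting the loop in assembly (a swapped-in technique, not what touko suggested) is to precompute the fade step for all 512 possible colors once, so the inner loop does one table lookup instead of three tests. A minimal plain-C sketch with hypothetical names and uint16_t standing in for HuC's int; the masks 7, 56 and 448 select bits 0-2, 3-5 and 6-8 of the 9-bit color.

```c
#include <stdint.h>

/* One fade step, same logic as the loop above: decrement each non-zero
   3-bit channel. A channel is only decremented when it is non-zero, so
   no borrow ever crosses into the neighbouring channel. */
uint16_t fade_step(uint16_t clr)
{
    if (clr & 7)   clr -= 1;    /* bits 0-2 */
    if (clr & 56)  clr -= 8;    /* bits 3-5 */
    if (clr & 448) clr -= 64;   /* bits 6-8 */
    return clr;
}

/* Precomputed results for every possible 9-bit color. */
static uint16_t fade_lut[512];

void build_fade_lut(void)
{
    for (uint16_t c = 0; c < 512; c++)
        fade_lut[c] = fade_step(c);
}

/* Table-driven fade step: one indexed load per color. */
uint16_t fade_step_lut(uint16_t clr)
{
    return fade_lut[clr & 0x1FF];
}
```

The table costs 512 x 2 = 1024 bytes, so whether it is worth it depends on your memory budget, and it is worth measuring before assuming it beats the three tests.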
« Last Edit: November 03, 2016, 10:20:59 PM by touko »

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Faster fade out code?
« Reply #39 on: November 03, 2016, 07:07:21 AM »
Quote
for posterity, here is the final function. I will be testing it on hardware shortly.

It's going to cause snow on the real system if this takes too long (goes into active display), which I think it will.

 Use the Txx instruction to read the color data directly into your my_buffer during vblank. Then do your modifications on the array; when each iteration has completed (one iteration of j), wait for vsync and Txx the buffer back to the color RAM port. That should prevent any snow on screen.


 If you slightly modify your code like this:
Quote
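int temp;   /* global copy of i, so the asm blocks can reach it (as _temp) */

fade_out()
{
   int i, clr;
   char j,k;
   
   vsync();
   
   for (j = 0; j <8; j++)
   {         
      for (i=0;i<512;i+=64)
      {
 
       temp = i;
       #asm
          lda _temp      ; select slot i: low byte to $402...
          sta $402
          lda _temp+1    ; ...then high byte to $403, which latches it
          sta $403
          tai $404, _my_buffer, 128   ; read 64 colors into the buffer
        #endasm
 
         for (k=0; k<64; k++)
         {
            clr = my_buffer[k];
            if (clr&7) clr = clr - 1;
            if (clr&56) clr = clr - 8;
            if (clr&448) clr = clr - 64;
            my_buffer[k] = clr;               
           }
          vsync();       
       #asm
          lda _temp      ; re-select slot i: the read above moved the pointer
          sta $402
          lda _temp+1
          sta $403   
          tia _my_buffer , $404 , 128   ; write the 64 faded colors back
       #endasm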
       }
   }   
     cls();
     reset_satb();
     satb_update();
     vsync();
}   
It should get everything done during vblank and avoid snow on screen on the real system. That also includes reading in the next 64 colors (both transfers together only take about 1.5k CPU cycles). Note: you'll need a global variable "temp" or some such name, so that you can access the function's instance variable in asm (HuC exposes a global named temp to asm as _temp). Also, I think the read port is $404 and not $406. If not, then change it to $406.
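The ~1.5k cycle figure can be checked with quick arithmetic. This sketch assumes the commonly quoted HuC6280 block-transfer timing of 17 cycles of setup plus 6 cycles per byte; treat that as an assumption and check the CPU documentation. The function name is hypothetical.

```c
/* Approximate cost of one TIA/TAI block transfer, assuming
   17 setup cycles + 6 cycles per byte moved. */
long block_transfer_cycles(long bytes)
{
    return 17 + 6 * bytes;
}
```

Two 128-byte transfers come to 1,570 cycles, in line with the figure above; a single transfer of the whole 512-color palette (1024 bytes) is about four times that.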
« Last Edit: November 03, 2016, 07:28:53 AM by Bonknuts »

DarkKobold

  • Hero Member
  • *****
  • Posts: 1200
Re: Faster fade out code?
« Reply #40 on: November 03, 2016, 08:19:06 AM »


Quote
It's going to cause snow on the real system if this takes too long (goes into active display), which I think it will.


So, that is why I put the vsync right before the transfer: even if the code takes two frames, the transfer will still only occur right after a vblank. I'm actually concerned that if I make the code faster, I'll have to put in delays, as the fade is already pretty fast now. It's a minimum of 64 frames already.

Granted, I need to try my new code in hardware. I'll also try yours.

As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

TheOldMan

  • Hero Member
  • *****
  • Posts: 958
Re: Faster fade out code?
« Reply #41 on: November 03, 2016, 08:44:57 AM »
Quote
I'm actually concerned that if I make the code faster, i'll have to put in delays,  as the fade is already pretty fast now.
Suck up the need to add a delay. Use a down-counter so it's tunable. It's not too bad if you use a dedicated fade routine. You can use the wait time to do other things, like loading new gfx....
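TheOldMan's down-counter suggestion as a minimal plain-C sketch. The struct and function names are hypothetical, and it assumes you call fade_tick() once per frame (for example right after vsync()); it only decides when to fade, so the actual fade pass and the VCE upload stay wherever you already do them.

```c
#include <stdint.h>

/* Hypothetical per-frame fade driver using a down-counter. */
typedef struct {
    uint8_t frames_per_step;  /* tuning knob: higher = slower fade */
    uint8_t countdown;        /* frames left until the next step */
    uint8_t steps_left;       /* 8 steps, matching the j loop above */
} FadeState;

void fade_start(FadeState *f, uint8_t frames_per_step)
{
    /* guard against 0, which would wrap the uint8_t counter */
    f->frames_per_step = frames_per_step ? frames_per_step : 1;
    f->countdown = f->frames_per_step;
    f->steps_left = 8;
}

/* Call once per frame. Returns 1 on frames where one fade step
   should be applied, 0 otherwise. */
int fade_tick(FadeState *f)
{
    if (f->steps_left == 0)
        return 0;                       /* fade finished */
    if (--f->countdown != 0)
        return 0;                       /* still waiting */
    f->countdown = f->frames_per_step;  /* reload the down-counter */
    f->steps_left--;
    return 1;
}
```

With frames_per_step = 3, the 8 steps take 24 frames; on the frames in between, the game loop is free for other work such as loading new graphics.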
Just my opinion.

Quote
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

You can do that already. It's just a pain, as they have to be accessed via the HuC stack pointer, which is slow...

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Faster fade out code?
« Reply #42 on: November 03, 2016, 09:17:41 AM »


Quote
So, that is why I put the vsync right before the transfer: even if the code takes two frames, the transfer will still only occur right after a vblank.



 Hmm.. there's an issue you might not be aware of: every time you read or write any VCE reg (that includes $400 and $401), you'll cause the VCE to be unable to read from the pixel bus that the VDC is constantly outputting to. Since it can't read from the pixel bus, it outputs the last color (pixel) it read from the pixel bus. You get horizontal 'stretches' of color across the screen, i.e. snow. Not just at the borders, but anywhere on a scanline. This actually happens when you turn the display "off", but since the whole screen is one color, you can't see the pixel "stretching".

 This is different from the color-update interference on other systems, where if you update a color while the display is active, you see that color update as corruption on screen. The VCE doesn't do this, but reading or writing any VCE port gives the same stretching behavior regardless (read or write, color update regs or other VCE regs).

 Here's an example video where I purposely do it:

So any access to the VCE does this, not just reading or writing color data. If your routine does manage to read in and modify all 64 colors within vblank, and you update on the following frame, then you'll be fine. And if that's the case, then don't worry about the code changes I made (unless you want more resources during vblank to do something else, but it doesn't look like it. You'd have to make a completely different system/function for that).


Quote
Granted, I need to try my new code in hardware. I'll also try yours.

Test yours first, and if it's good then don't worry about mine. Just keep my code, i.e. the approach I took, in mind, as you might want more flexible fade routines in the future.

Quote
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

 What TheOldMan said. It's a pain, because you have to generate the .s file, look at what index represents that variable, then go back and write an indirect-indexed load from it. It's not just that the instance variable inside the function is a stack object; there's no clean way to access it in asm without knowing its index on the stack. Indeed, it would be nice for HuC to pass this on to the asm block. If the assembler had a way to make scoped equates, then HuC could generate a function-scope equate list for each function (the index into the stack). Globals are just easier to transfer stuff to.

touko

  • Hero Member
  • *****
  • Posts: 953
Re: Faster fade out code?
« Reply #43 on: November 03, 2016, 10:24:58 PM »
Quote
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

Actually as declared:
int i, clr;
char j,k;

These are treated as local variables. If you want locals in asm, you must use the stack ($100 -> $106 should be safe enough), or you can use the classic push/pop.
But in fact you already have a bunch of temporary global variables reserved (like __temp, <_al, <_bl, etc.).
« Last Edit: November 03, 2016, 10:35:22 PM by touko »

Arkhan

  • Hero Member
  • *****
  • Posts: 14142
  • Fuck Elmer.
    • Incessant Negativity Software
Re: Faster fade out code?
« Reply #44 on: November 14, 2016, 01:49:00 PM »
Quote
I'm actually concerned that if I make the code faster, i'll have to put in delays,  as the fade is already pretty fast now.
Suck up the need to add a delay. Use a down-counter so it's tunable. It's not too bad if you use a dedicated fade routine. You can use the wait time to do other things, like loading new gfx....
Just my opinion.

Quote
As a question - I keep needing to add globals to do ASM. Could a future version of HuC do locals?

You can do that already. It's just a pain, as they have to be accessed via the HuC stack pointer, which is slow...


The entirety of Atlantean is written with global variables.

Just saying.

lol

ASM doesn't have a concept of local, really. You push/pop things to make them "local", simply by saving the state of all of the registers so you can f*ck around with them again before popping the stack back to reset everything, but yeah

global = <3

you'll get faster code. 
[Fri 19:34]<nectarsis> been wanting to try that one for awhile now Ope
[Fri 19:33]<Opethian> l;ol huge dong

I'm a max level Forum Warrior.  I'm immortal.
If you're not ready to defend your claims, don't post em.