I wouldn't do this. If you're trying to silence the channel in DDA mode, simply don't write anything to it (writing the same sample value over and over also has the same effect as silence). Otherwise this is going to give you a pop/click on the DAC at the end of a sample on non-SGX systems. It might not be noticeable in loud samples (that have a large amplitude pattern at the end), but samples that end with a built-in fade (ending with a soft part) will make this pop/click more noticeable.
Yes, I know, but I have not tested on a real PCE, only on an SGX, and I'm hesitant about muting with the balance instead (I don't know whether the pop is also present with balance).
OK, I understand your compression scheme now. I hadn't seen that one before. It's pretty decent. A nice trade-off between size and speed.
Oh, I thought it was the obvious way to do it.

It's organised like this.
Classic byte organisation, you lose 3 bits per sample:
AAAAA000 ; sample 1 => byte 1
BBBBB000 ; sample 2 => byte 2
CCCCC000 ; sample 3 => byte 3
Packed organisation, you lose only 1 bit per 3 samples:
AAAAACCC ; byte 1
BBBBBCC0 ; byte 2
Samples A and B are sent without any processing; only sample C has to be shifted back into a proper 5-bit sample.
It's fast (30 cycles), with a compression of 33% (2 bytes instead of 3 for every 3 samples).
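For anyone writing the converter on the PC side, here is a rough C sketch of the packing and of what the player has to undo. The function names `pack3` and `unpack3` are made up for this example. The exact bit positions are an assumption taken from the shifts and masks in the routine posted further down (samples in the low 5 bits, C split across the spare upper bits); the diagram above only shows which fields share a byte.

```c
#include <stdint.h>

/* Pack three 5-bit samples (0..31) into two bytes.
 * Assumed layout, derived from the player routine in this thread:
 *   out[0]: bits 4..0 = sample A, bits 7..5 = low 3 bits of sample C
 *   out[1]: bits 4..0 = sample B, bits 6..5 = high 2 bits of sample C,
 *           bit 7 left at 0 (the one bit lost per 3 samples)
 */
static void pack3(uint8_t a, uint8_t b, uint8_t c, uint8_t out[2])
{
    a &= 0x1F;
    b &= 0x1F;
    c &= 0x1F;
    out[0] = (uint8_t)(a | ((c & 0x07) << 5));
    out[1] = (uint8_t)(b | ((c & 0x18) << 2));
}

/* Reverse of pack3. A and B need only a mask (the DAC ignores the upper
 * bits anyway); C is rebuilt from the two spare fields. */
static void unpack3(const uint8_t in[2], uint8_t out[3])
{
    out[0] = (uint8_t)(in[0] & 0x1F);
    out[1] = (uint8_t)(in[1] & 0x1F);
    out[2] = (uint8_t)(((in[1] & 0x60) >> 2) | (in[0] >> 5));
}
```

Calling `pack3(21, 7, 30, buf)` and then `unpack3(buf, out)` should give back 21, 7 and 30. The point of this layout is that only the third sample costs any shifting at play time.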
Mapping stuff into TAM #4 isn't needed, because Touko is already relying on his sample data being aligned on an even boundary, so I removed it, and got rid of the X and Y registers which were wasting cycles.
Hmm, I see the trick, but it makes it more complicated to detect that you have reached the end of your sample.
I didn't use any long double buffer system. Just buffered 6 samples instead of 3, every 6th call.
Of course, a long buffer is not my goal either and is out of the question; a 6-byte buffer can be decent, but no more than that.
Then I rearranged the data format a bit to pack the current-state flags into the sample cache for a bit more speed, and used an end-of-sample marker in bit 15 of the sample word, which could also be used for sample-looping ...
Clever, and your bank-overflow handling is clever too.

; Timing for sample 1 (122 cycles) if page-overflow
; Timing for sample 1 (110 cycles) if normal
; Timing for sample 2 ( 67 cycles)
; Timing for sample 3 ( 73 cycles)
Bonknut's results and yours confirm what I said:

A small buffer reduces how often you need to remap, and with it the average number of CPU cycles.

; Time (normal 1 channel): 122 *   1 calls +
;                          110 *  38 calls +
;                           67 *  39 calls +
;                           73 *  39 calls = 9762 cycles (8.2%)
8.2% is really good, and there are some good ideas here to improve sample playing.
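To double-check the figures quoted above, here is a small C sketch of the arithmetic. The per-call timings and call counts come straight from the quoted comment; the cycles-per-frame value is my assumption (a 7.15909 MHz CPU clock divided by roughly 60 frames per second, about 119318 cycles), and the helper names are made up.

```c
#include <stdint.h>

/* Cycles spent per frame by one channel, from the quoted timings:
 * 1 page-overflow call, 38 normal sample-1 calls, 39 sample-2 calls,
 * 39 sample-3 calls (117 timer calls in total). */
static uint32_t frame_cycles(void)
{
    return 122u * 1u + 110u * 38u + 67u * 39u + 73u * 39u;
}

/* CPU load in tenths of a percent, rounded to nearest.
 * per_frame is the number of CPU cycles in one frame (an assumption,
 * not a figure from the thread). */
static uint32_t load_per_mille(uint32_t used, uint32_t per_frame)
{
    return (used * 1000u + per_frame / 2u) / per_frame;
}
```

With `load_per_mille(frame_cycles(), 119318)` this comes out at 82, i.e. 8.2%, which matches the quoted total of 9762 cycles.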
EDIT: I modified my PCM routine a little: the end of a sample is now flagged by bit 7 of the second byte (when set to 1), the mapping now uses only MPR3, and the bank is remapped only when a page overflow occurs.
Thanks elmer, it's much faster now ;-)
User_Timer_Irq:
      stz      $1403                  ; // Reset (acknowledge) the timer
      pha
   ; // Avoid disabling the voice when no sample is playing
      bbs      #7 , <test_octet_voix1 , .fin_sample_voix1
      lda      #VOIX_DDA1             ; // Select DDA voice 1
      sta      $800
      bbs      #0 , <test_octet_voix1 , .prep_octets_voix1
   ; // Cache 3 samples (2 bytes) for voice 1 and play the first sample
     .fin_comp1:
      tma      #3
      pha
      lda      <sample_bank_voix1
      tam      #3
      lda      [ sample_base_voix1 ]
      sta      <cache_memory_voix1
      sta      $806                   ; // Play sample 1 on voice 1
      inc      <sample_base_voix1
      lda      [ sample_base_voix1 ]
      bmi      .voix_1_off            ; // If bit 7 = 1, end of sample: exit
      sta      <cache_memory_voix1 + 1
   ; // Restore the previous bank
      pla
      tam      #3
      lda      #3
      sta      <test_octet_voix1
      inc      <sample_base_voix1
      bne      .fin_sample_voix1
      inc      <sample_base_voix1 + 1
      bpl      .fin_sample_voix1
   ; // Pointer went past $7FFF: wrap to $6000 and move to the next bank
      lda      #$60
      sta      <sample_base_voix1 + 1
      inc      <sample_bank_voix1
      bra      .fin_sample_voix1
     .voix_1_off:
   ; // Restore the previous bank
      pla
      tam      #3
      stz      $804                   ; // Set voice 1 output to 0
      smb      #7 , <test_octet_voix1 ; // Disable sample playback for voice 1
      bra      .fin_sample_voix1
   ; // Read the second sample
     .prep_octets_voix1:
      lda      <cache_memory_voix1 + 1
   ; // Shift the sample counter for voice 1
      lsr      <test_octet_voix1
   ; // If the second sample has already been played, decompress sample 3
      bne      .transfert_data_sample_voix1
      and      #$60
      lsr      A
      lsr      A
      sta      <cache_memory_voix1 + 1
      lda      <cache_memory_voix1
      lsr      A
      lsr      A
      lsr      A
      lsr      A
      lsr      A
      ora      <cache_memory_voix1 + 1
     .transfert_data_sample_voix1:
      sta      $806
     .fin_sample_voix1:
      pla
      rti
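
For reference, this is roughly how I would model the pointer handling of the routine above in C, for example to test a converter on the PC side. It is only a sketch under assumptions: the struct and function names are invented, the window is taken to be the MPR3 range $6000-$7FFF, and the data is assumed to start on an even address so a 2-byte packet never straddles a bank.

```c
#include <stdint.h>

/* Hypothetical model of the voice 1 read pointer. */
typedef struct {
    uint16_t addr;  /* CPU address inside the MPR3 window, 0x6000..0x7FFF */
    uint8_t  bank;  /* bank mapped into that window while reading */
} SamplePtr;

/* The second byte of a packet flags the end of the sample in bit 7. */
static int is_end_packet(uint8_t second_byte)
{
    return (second_byte & 0x80) != 0;
}

/* Step over one 2-byte packet. When the address leaves the window,
 * wrap it back to 0x6000 and move on to the next bank, as the
 * inc / bpl / lda #$60 sequence does in the assembly. */
static void advance_packet(SamplePtr *p)
{
    p->addr = (uint16_t)(p->addr + 2);
    if (p->addr >= 0x8000) {
        p->addr = 0x6000;
        p->bank = (uint8_t)(p->bank + 1);
    }
}
```

Starting from address $7FFE in bank 5, one call to `advance_packet` should leave the pointer at $6000 in bank 6; everywhere else only the address changes.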