So here's a basic demo of 4 PCM samples playing at the same time: a long stream, followed by 3 other "FX" type samples that start shortly into it (voice one, a Gate of Thunder explosion, and a Gate of Thunder voice).
So this is how I'm handling this:
The 4 channels are actually a set of paired soft mixed channels. Each pair reads in two 8bit samples, uses a volume table to adjust each sample, and then adds them together. But.. I don't store this as 9bit. I store it as 8bit, which means I saturate on overflow. I do the same for the second pair. The two 8bit pair results are then added together, and this time the output is kept as 9bit. That 9bit value goes through a table, precalculated as all samples multiplied by 2 (so 9bit becomes 10bit) and divided into upper 5 bits and lower 5 bits (it's a split table), and the two halves are stored in a set of buffers.
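For anyone who finds it easier to follow as code, here is a rough Python model of that chain. It is only a sketch of the arithmetic, not the actual 6280 mixer: the function and table names are made up, and the linear vol/15 scaling in the volume table is an assumption.

```python
# Rough model of the chain: 8:8->8, 8:8->8, 8:8->9, then 9*2 split into 5:5.
# All names here are hypothetical; this is not the driver's real code.

def make_volume_table(vol, max_vol=15):
    """256-entry lookup: sample scaled by vol/max_vol (linear scale assumed)."""
    return [s * vol // max_vol for s in range(256)]

def mix_pair(a, b):
    """Add two volume-adjusted 8-bit samples, saturating at $FF."""
    return min(a + b, 0xFF)

# Split table indexed by the 9-bit sum: value * 2, cut into two 5-bit halves.
SPLIT_HI = [((s * 2) >> 5) & 0x1F for s in range(512)]
SPLIT_LO = [(s * 2) & 0x1F for s in range(512)]

def mix_frame(channels, volumes):
    """channels: four 256-byte blocks; volumes: four values 0..15.
    Returns (hi, lo) buffers holding 256 five-bit values each."""
    tables = [make_volume_table(v) for v in volumes]
    hi, lo = bytearray(256), bytearray(256)
    for i in range(256):
        s = [tables[c][channels[c][i]] for c in range(4)]
        total = mix_pair(s[0], s[1]) + mix_pair(s[2], s[3])  # 9-bit, 0..510
        hi[i] = SPLIT_HI[total]
        lo[i] = SPLIT_LO[total]
    return hi, lo
```

The point of the split table is that the per-sample work after the adds is just two lookups and two stores, with no shifting at run time.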
So while the H-int PCM driver is always outputting 10bit audio (in this 15.3khz version), the "mixer" can do all kinds of things, in all kinds of configurations (each with whatever resource cost it takes). You can use different mixers as long as they output to that specific buffer.
Ok, so the buffer: 256 bytes each for high and low. The display has 262 or 263 scanlines depending on the mode you choose, but I only output 256 samples regardless. So every so many scanlines, a sample is not output (I think this is something like every 30 or 40 lines, I forget). Since the rate is so high, you won't hear this.
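To put numbers on the skipped scanlines, here is a small Python sketch that spreads 256 samples evenly over a frame with an error accumulator. This is just one way to do it and an assumption on my part; the thread doesn't say how the driver actually picks which lines to skip, and `output_lines` is a made-up name. Assuming roughly 60 frames a second, 256 samples a frame works out to about 15.3khz.

```python
def output_lines(scanlines, samples=256):
    """Return the scanline numbers on which a sample is output,
    spreading `samples` evenly over `scanlines` (accumulator method)."""
    lines = []
    acc = 0
    for line in range(scanlines):
        acc += samples
        if acc >= scanlines:      # enough "credit" to output a sample
            acc -= scanlines
            lines.append(line)
    return lines

def skipped_lines(scanlines, samples=256):
    """Scanlines on which no sample is output."""
    used = set(output_lines(scanlines, samples))
    return [n for n in range(scanlines) if n not in used]
```

With 262 scanlines that leaves 6 lines a frame without a sample, and with 263 it leaves 7, so the gap lands around once every 40 lines or so.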
There's a couple of reasons I did this: on both the mixer side and the Hint PCM driver side, it makes things much easier. I also make sure, with my conversion util, that all samples have a length that is a multiple of 256 bytes. Silence is appended to the sample if the original ends short of a 256 byte block boundary. This isn't an issue, because you don't need to start another sample MID frame.. only on frame boundaries. It makes mixing and playback so much faster, and at the MOST your sample will be 255 bytes longer than normal. What this translates into on the mixer side is that you don't need to check EOF for every byte that you read. Multiply that check by 256, and then multiply that by 4, and you'll see that it quickly adds up.
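A minimal Python sketch of the padding step and the check count, in case it helps. The names are made up, and the fill byte is an assumption (the thread doesn't say which value the conversion util writes as silence), so it is left as a parameter.

```python
BLOCK = 256  # one frame's worth of samples per channel

def pad_to_block(data, fill=0x00):
    """Pad a sample up to the next multiple of BLOCK bytes.
    `fill` is an assumed silence value, not taken from the real util."""
    pad = -len(data) % BLOCK          # 0..255 extra bytes
    return bytes(data) + bytes([fill]) * pad

def eof_checks_per_frame(channels=4, per_byte=True):
    """How many end-of-sample checks the mixer needs in one frame."""
    return channels * (BLOCK if per_byte else 1)
```

Checking once per block instead of once per byte takes the mixer from 1024 end-of-sample checks a frame down to 4.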
As for the mixer, why don't I just mix 8:8 to 9bit, 8:8 to 9bit, and then 9:9 to 10bit? I could, and that's all down to whichever mixer module I choose to use with the Hint PCM driver. But for now, I'm looking for speed.
In the current driver that I'm using (8:8->8, 8:8->8, 8:8->9, 9*2), you might notice that I'm mixing beyond the resolution I can hold, because two 8bit samples added together really need 9 bits to represent the result. So in cases where it overflows, I saturate it at #$ff. You might be thinking that's got to sound horrible. It can, if your samples are near max amplitude on average. In this next example, I boosted the amplitude of all samples to be really loud. But to avoid distortion, I set the volume of all channels using 8bit samples to 11 out of 15. This is roughly equivalent to 7.5 bit resolution. There is less "clipping", they sound loud enough, and the resolution is pretty good.
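The 7.5 bit figure can be checked with a quick Python sketch. It assumes the volume table is a plain linear vol/15 scale, and the function names are made up, so treat it as a back-of-the-envelope model rather than the driver's actual table.

```python
import math

def volume_table(vol, max_vol=15):
    """256-entry lookup: sample scaled by vol/max_vol (linear scale assumed)."""
    return [s * vol // max_vol for s in range(256)]

def effective_bits(vol, max_vol=15):
    """log2 of the number of output levels left after volume scaling."""
    levels = volume_table(vol, max_vol)[255] + 1
    return math.log2(levels)

def clipped_fraction(a, b, vol):
    """Fraction of positions where a mixed pair would saturate at $FF."""
    t = volume_table(vol)
    return sum(t[x] + t[y] > 0xFF for x, y in zip(a, b)) / len(a)
```

At volume 11 the loudest sample comes out at 187, so there are 188 levels, which is a bit over 7.5 bits. Two full scale samples can still overflow when added, but it takes a much louder pair to get there than at volume 15.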
One more thing to add: the mixer works on unsigned samples, and the current samples are stored that way. This is done for speed (the clipping thing). It sounds fine, but there's a catch for any samples that start with a stretch of silence (though they shouldn't, that's wasteful) or end with silence: there's going to be a pop. The way this works is that the lowest sample amplitude is 00. An unsigned 8bit sample has its center line at $80, so a string of $80 is silence. If a sample trails off with this and is then removed, you go from $80 down to $00 for that channel. That's going to result in a pop. So a small ramp to 0 is required to remove the pop. The same could be said of samples that have large parts of silence (ramp down to 00, then ramp back up from 00 at the end of the section), but this isn't about popping, it's about giving the other mixed-in channel of the pair its resolution back.
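As a sketch of what that ramp could look like on the conversion side, here is a short Python helper that generates a linear ramp from the sample's last value down to 00. The step size and the function name are assumptions; the thread doesn't say how long the ramp needs to be.

```python
def ramp_to_zero(last, step=4):
    """Bytes that step linearly from `last` down to 0 (0 is the final byte).
    `step` is an assumed value; a smaller step gives a longer, softer ramp."""
    out = bytearray()
    v = last
    while v > 0:
        v = max(v - step, 0)
        out.append(v)
    return bytes(out)

def add_tail_ramp(data, step=4):
    """Append a ramp to 0 after the last byte of an unsigned 8-bit sample."""
    if not data:
        return bytes(data)
    return bytes(data) + ramp_to_zero(data[-1], step)
```

A sample ending on the $80 center line picks up 32 extra bytes with a step of 4, which is a tiny fraction of a frame at this rate.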
If this unsigned mixing sounds convoluted in design, it kinda is. But it's surprisingly easy to work with, and it sounds great. The issue with higher rate playback is that it affects a lot of things. The 65x is a fast processor for tight data sets, but once you start to move outside that range, performance really starts to drop off. There are ways to handle it, such as subdividing the data set into smaller chunks, but that means lots of separate code paths, which results in code bloat and some complex code that can be difficult to follow (debug or understand). Anyway, my point is that something has to give, and I chose to change the mixing approach.
Here's an example of the tail end of a sample ramping down to avoid popping the output:
Pretty simple stuff.
And here's an example output; all 8bit samples played at volume 11 (out of 15; linear volume scale). So you can judge the results for yourself:
http://www.pcedev.net/HuPCMDriver/8bitmixer_test2.zip
http://www.pcedev.net/HuPCMDriver/8bitmixer_test1.zip
That's just two 8bit samples mixed at a time. Anyone have any good 4 sample mix set they can think of to demo this driver?
Cool, I really look forward to studying this!
But there's no way that I'd give a music driver 25% or more of the frame time ... that's for graphics!
I understand. There's always a trade off for something. Honestly, shmups tend to be the most active in the sound FX department IMO. That was the main idea behind this mixer; when the big explosion samples happen in Blazing Lazers, the drum samples and some other samples immediately drop out. When playing something like GOT or LOT, which have CD audio and loud FX, it's not as noticeable. But even then, you can't have a loud creative death scream and explosion sounds at the same time with single channel ADPCM. What I envisioned was something along those lines. Of course, it works with chip music too: 2 channels reserved for drum kit and other music related samples, and two channels for awesome FX. I would give up 25% resource for that in a shmup no sweat.
Now ... if I can cut it down to 2 channels of 8-bit sound, then that seems good to me.
If the sample channels can't be tuned, then they're limited (in practice) to percussion and speech/sound-effects anyway.
Pretty much. There's no frequency scaling here. It's your basic PCE sample playback, but with more channels without using more hardware channels, and with greater bit resolution. The 7khz version is less modular at the moment, so you can't just change out mixers. I might change that and make it like the 15.3khz version, but still using the TIRQ. I'll have to play around with the numbers. If I did the modular version, then the PCM driver wouldn't care how many mixed channels there are, because it always outputs a buffer to a paired hardware channel set. The downside of the modular version is that you need two sets of paired buffers (4 x 117 bytes total). Most flexible, but it eats up some ram.
BTW ... do you have an estimate of the CPU time taken for 2 5-bit channels with volume control?
At 7khz? At fixed frequency? Uncompressed? It looks pretty much like this:
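;call
__skip_PCM:
        rti
PCM:
        stz $1403                           ; acknowledge the timer interrupt
        BBS0 <PCM_In_Progress, __skip_PCM   ; already inside the handler? bail out
        inc <PCM_In_Progress                ; flag the handler as busy
        cli                                 ; let other interrupts (Hsync) through
        pha
        tma                                 ; save the current bank mapping
        pha
.ch0.on                                     ; patched to BRA $nn when the channel is off
        stz $800                            ; select PSG channel 0
.ch0.bank
        lda #00                             ; self modified: bank of the channel 0 sample
        tam #nn
.ch0
        lda $0000                           ; self modified: current sample address
        bmi .ch0_control                    ; bit 7 set: control byte, not sample data
        sta $806                            ; write the 5bit sample to the channel
        inc .ch0+1                          ; bump the low byte of the address
        beq .msb_ch0                        ; wrapped: go fix up the high byte
.ch1.on                                     ; patched to BRA $nn when the channel is off
        lda #01
        sta $800                            ; select PSG channel 1
.ch1.bank
        lda #00                             ; self modified: bank of the channel 1 sample
        tam #nn
.ch1
        lda $0000                           ; self modified: current sample address
        bmi .ch1_control                    ; bit 7 set: control byte, not sample data
        sta $806                            ; write the 5bit sample to the channel
        inc .ch1+1                          ; bump the low byte of the address
        beq .msb_ch1                        ; wrapped: go fix up the high byte
        pla
        tam                                 ; restore the saved bank mapping
        pla
        stz <PCM_In_Progress                ; handler is free again
        rti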
Sits in ram. Self modifying code. Plays nice with the Hsync interrupts. If the Hsync interrupts for whatever reason take too long, this interrupt has protection so it can't be entered more than once at a time. The opcodes at the self modifying labels like ".ch1.on" would be replaced with BRA $nn if the channel was disabled. This lets you use a channel for both samples and regular use; the channel isn't just reserved for sample use. Since these are independent hardware channels, volume only needs to be handled on a 60hz or less basis and not in the driver itself. And since it's hardware, no soft volume translation is needed. You'll have to count those cycles to see what it comes out to, ignoring bank adjustment cases. I tend to either do 116 samples a frame by resyncing the TIRQ in the Vblank int, or 117, which is the same as 116 + sync except that I make a fake INT call to the routine inside vblank. Though I really can't tell the difference between 7000hz and 6960hz.
As far as a "new" driver goes ... I'm curious about using the ADPCM hardware for a drum channel.
I can give you the source to a PCE soft ADPCM player. It handles saturation in the player itself, though; it would be faster to handle those cases outside the player.