Author Topic: PCE PCM (Read 10669 times)

Bonknuts · « **on:** November 25, 2016, 09:54:45 AM »

I was digging through my PCM code from over the summer and looked at an unfinished project: 15.3 kHz playback of four 8-bit PCM channels using just 2 hardware channels, for a total of 8 channels (the 4 PCM channels plus the 4 remaining hardware channels).

It has no frequency scaling or "mod"-type sample-based synth, just regular sample playback. I had problems mixing in the last of the 4 channels, so yesterday I fixed it and it's all working. There's software volume for each mixed channel too. I use the video scanline interrupt to play back a buffer of mixed samples through a paired-channel 10-bit output. I added up all the numbers, and it comes to 47% CPU usage. That's probably not an attractive enough number for some projects, but it's a working proof of concept.

So I started modifying it for 7 kHz output (no video interrupt required). I'm looking at ~24% CPU usage for the same thing, but at 7 kHz. So that's 8 channels on the PCE, 4 of them PCM at 8-bit resolution. I think someone was interested in this (touko maybe?). I dunno, but considering Air Zonk eats up that same amount for its music engine with just one channel, I thought that was pretty good. It's possible to mix more software channels in, but it doesn't seem worth it. I mean, what are you going to do with something like 8 PCM channels anyway???

I might be able to shave a couple of percentage points off the 7 kHz 4-channel version with some optimization. I'll have to see what I can do.

Update:
Here's my batch of 7 kHz vs 14 kHz sample-scaling demo ROMs. On the real system, the 14 kHz version performs better than on emulators, thanks to the analog filtering. The difference still isn't as big as I expected from doubling the frequency, but there is more 'punch' to some of the samples, at least on my stereo system. http://www.pcedev.net/HuPCMDriver/7khz_and_14khz.zip <- try them out on the real system (not an emulator).

Bonknuts · « **Reply #1 on:** November 25, 2016, 05:18:26 PM »

http://www.pcedev.net/HuPCMDriver/8bitmixer_test1.zip
That's just two 8-bit samples mixed at a time. Does anyone have a good set of 4 samples to mix that I could use to demo this driver?

touko · « **Reply #2 on:** November 25, 2016, 09:11:39 PM »

Quote

So 8channels on PCE, 4 are PCM at 8bit res. I think someone was interested in this (touko maybe?)

Yes, of course, even though I already have a 2-channel PCM driver with compression.

elmer · « **Reply #3 on:** November 26, 2016, 04:19:29 AM »

Quote from: Bonknuts on November 25, 2016, 05:18:26 PM

http://www.pcedev.net/HuPCMDriver/8bitmixer_test1.zip
That's just two 8bit samples mixed at a time. Anyone have any good 4 sample mix set they can think of to demo this driver?

Cool, I really look forward to studying this!

But there's no way that I'd give a music driver 25% or more of the frame time ... that's for graphics!

Now ... if I can cut it down to 2 channels of 8-bit sound, then that seems good to me.

If the sample channels can't be tuned, then they're limited (in practice) to percussion and speech/sound-effects anyway.

BTW ... do you have an estimate of the CPU time taken for 2 5-bit channels with volume control?

As far as a "new" driver goes ... I'm curious about using the ADPCM hardware for a drum channel.

<EDIT>

Whoops ... I thought that you'd released source code rather than a ROM demo. :oops:

Sure ... it sounds good!

There's a little audible crackling in mednafen, but that could easily just be mednafen.

Bonknuts · « **Reply #4 on:** November 26, 2016, 05:51:17 AM »

So here's a basic demo of 4 PCM channels playing at the same time: a long stream, followed shortly after by 3 other "FX"-type samples (a voice, a Gate of Thunder explosion, and a Gate of Thunder voice).

So this is how I'm handling this:
The 4 channels are actually two pairs of soft-mixed channels. Each channel in a pair reads in an 8-bit sample and adjusts it with a volume table, and the two results are added together. But.. I don't store this as 9-bit. I store it as 8-bit, which means I saturate on overflow. I do the same for the next pair, and then the two pair results are added together, this time keeping the output as 9-bit. That goes through a precalculated table that multiplies every value by 2 (so 9-bit becomes 10-bit) and divides it into upper 5 bits and lower 5 bits (it's a split table), to be stored in a set of buffers.

So while the H-int PCM driver always outputs 10-bit audio (in this 15.3 kHz version), the "mixer" can do all kinds of things, in all kinds of configurations (each with its own CPU cost). You can use different mixers as long as they output to that specific buffer.

Ok, so the buffer: 256 bytes each for high and low. The display has 262 or 263 scanlines depending on which mode you choose, but I only output 256 samples regardless. So every so many scanlines, a sample is not output (I think it's something like every 30 or 40 lines, I forget). Since the rate is so high, you won't hear this.

There are a couple of reasons I did this: on both the mixer side and the H-int PCM driver side, it makes things much easier. My conversion utility also makes sure every sample's length is a multiple of 256-byte blocks: if the original sample ends before a 256-byte block boundary, silence is appended. This isn't an issue, because you never need to start another sample mid-frame, only on frame boundaries. It makes mixing and playback much faster, and at most your sample will be 255 bytes longer than normal. On the mixer side, this means you don't need to check for EOF on every byte that you read. Multiply that check by 256 bytes per frame, then by 4 channels, and you'll see how quickly it adds up.

As for the mixer: why don't I just mix 8:8 to 9-bit, 8:8 to 9-bit, and then 9:9 to 10-bit? I could, and that's all up to whichever mixer module I choose to use with the H-int PCM driver. But for now, I'm going for speed.

In the current driver that I'm using (8:8->8, 8:8->8, 8:8->9, 9*2), you might notice that I'm mixing beyond the resolution I keep: two 8-bit samples added together really need 9 bits to represent the result. So in cases where it overflows, I saturate it at #$ff. You might be thinking that's got to sound horrible. It can, if your samples are near max amplitude on average. In this next example, I boosted the amplitude of all samples to be really loud, but to avoid distortion, I set the volume of all channels using 8-bit samples to 11 out of 15. This is roughly equivalent to 7.5-bit resolution. There are fewer occurrences of "clipping", the samples sound loud enough, and the resolution is pretty good.

One more thing to add: the mixer, with the current samples, mixes unsigned samples. This is done for speed (the clipping thing). It sounds fine, but there's a catch for any samples that start with a stretch of silence (though there shouldn't be any; that's wasteful) and end with silence: there's going to be a pop. The way this works is that the lowest sample amplitude is $00, while an unsigned 8-bit sample is centered at $80, so a string of $80 is silence. If a sample trails off with this and is then removed, that channel goes from $80 down to $00, which results in a pop. So a small ramp down to $00 is required to remove the pop. The same goes for samples that have large stretches of silence: ramp down to $00, then ramp back up from $00 at the end of the section. That isn't about popping, though; it gives the other mixed-in channel of the pair its resolution back.

If this unsigned mixing sounds convoluted in design, it kinda is. But it's surprisingly easy to work with, and it sounds great. The issue with higher-rate playback is that it affects a lot of things. The 65x is a fast processor for tight data sets; once you start to move outside that range, performance really starts to drop off. There are ways to handle it, such as subdividing the data set into smaller chunks, but with lots of code paths that results in code bloat and complex code that can be difficult to follow (to debug or understand). Anyway, my point is that something has to give, and I chose to change the mixing approach.

Here's an example of the tail end of a sample ramp down to avoid popping the output:

Pretty simple stuff.

And here's an example output, with all 8-bit samples played at volume 11 (out of 15, on a linear volume scale), so you can judge the results for yourself:

http://www.pcedev.net/HuPCMDriver/8bitmixer_test2.zip

Quote from: elmer on November 26, 2016, 04:19:29 AM

Quote from: Bonknuts on November 25, 2016, 05:18:26 PM
http://www.pcedev.net/HuPCMDriver/8bitmixer_test1.zip
That's just two 8bit samples mixed at a time. Anyone have any good 4 sample mix set they can think of to demo this driver?

Cool, I really look forward to studying this!

But there's no way that I'd give a music driver 25% or more of the frame time ... that's for graphics!

I understand. There's always a trade-off somewhere. Honestly, shmups tend to be the most active in the sound FX department, IMO. That was my main idea for this mixer: when the big explosion samples happen in Blazing Lazers, the drum samples and some other samples immediately drop out. In something like GOT or LOT, which have CD audio and loud FX, it's not as noticeable. But even then, you can't have a loud, creative death scream and explosion sounds at the same time with single-channel ADPCM. What I envisioned was something along those lines. Of course, it works with chip music too: 2 channels reserved for the drum kit and other music-related samples, and two channels for awesome FX. I would give up 25% of the CPU for that in a shmup, no sweat.

Quote

Now ... if I can cut it down to 2 channels of 8-bit sound, then that seems good to me.

If the sample channels can't be tuned, then they're limited (in practice) to percussion and speech/sound-effects anyway.

Pretty much. There's no frequency scaling here. It's your basic PCE sample playback, but with more channels without using more hardware channels, and with greater bit resolution. The 7 kHz version is less modular at the moment, so you can't just swap out mixers. I might change that and make it like the 15.3 kHz version, but still using the TIRQ. I'll have to play around with the numbers. If I did the modular version, the PCM driver wouldn't care how many mixed channels there are, because it always outputs a buffer to a paired hardware channel set. The downside of the modular version is that you need two sets of paired buffers (4 x 117 = 468 bytes total). It's the most flexible, but it eats up some RAM.

Quote

BTW ... do you have an estimate of the CPU time taken for 2 5-bit channels with volume control?

At 7khz? At fixed frequency? Uncompressed? It looks pretty much like this:

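Code:

	;call
__skip_PCM
	rti					; busy flag was set: return immediately


PCM:
			stz $1403			; acknowledge the timer IRQ
			BBS0 <PCM_In_Progress, __skip_PCM	; already running? don't re-enter
			inc <PCM_In_Progress		; set the busy flag
			cli				; let H-sync IRQs in while we work

			pha

			tma #nn				; save the MPR used for sample banks (nn = placeholder)
			pha

.ch0.on							; patched to BRA .ch1.on when ch0 is disabled
			stz $800			; select PSG channel 0
.ch0.bank
			lda #00				; current sample bank (self-modified)
			tam #nn

.ch0
			lda $0000			; sample address (self-modified)
			bmi .ch0_control		; bit 7 set = control code (handler not shown)
			sta $806			; write the 5-bit sample to the DDA port
			inc .ch0+1			; advance the pointer's low byte
			beq .msb_ch0			; page wrap: bump the high byte (handler not shown)


.ch1.on							; patched to BRA past ch1 when it's disabled
			lda #01
			sta $800			; select PSG channel 1
.ch1.bank
			lda #00
			tam #nn

.ch1
			lda $0000
			bmi .ch1_control
			sta $806
			inc .ch1+1
			beq .msb_ch1

			pla
			tam #nn				; restore the saved MPR
			pla
			stz <PCM_In_Progress		; clear the busy flag
	rti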

It sits in RAM and is self-modifying code. It plays nice with the H-sync interrupts: if an H-sync interrupt takes too long for whatever reason, this interrupt has protection so it can't be re-entered. The self-modifying labels like ".ch1.on" get their opcodes replaced with BRA $nn if the channel is disabled. This lets you use a channel for both samples and regular use; it isn't reserved just for sample playback. Since these are independent hardware channels, volume only needs to be handled at 60 Hz or less, not in the driver itself, and since it's hardware volume, no soft volume translation is needed. You'll have to count the cycles to see what it comes out to, ignoring the bank adjustment cases. I tend to either do 116 samples a frame by resyncing the TIRQ in the VBlank interrupt, or 117 by doing the same 116 + resync and then making a fake INT call to the routine inside VBlank. Honestly, though, I can't tell the difference between 7000 Hz and 6960 Hz.

Quote

As far as a "new" driver goes ... I'm curious about using the ADPCM hardware for a drum channel.

I can give you the source to a PCE soft ADPCM player. It handles saturation in the player itself, though it would be faster to handle those cases outside the player.

Bonknuts · « **Reply #5 on:** November 26, 2016, 05:53:22 AM »

Quote from: elmer on November 26, 2016, 04:19:29 AM

There's a little audible crackling in mednafen, but that could easily just be mednafen.

The clicking, if you play the second example, is just me not initializing the driver before sending anything to it (I just haven't got around to it).

Bonknuts · « **Reply #6 on:** November 26, 2016, 06:27:34 AM »

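Code:

.loop

.ch0.a    ldx $0000,y       ; channel 0 sample byte (self-modified address)
.ch0v.a   lda $0000,x       ;vol: scale it through channel 0's volume table

.ch1.a    ldx $0000,y       ; channel 1 sample byte
.ch1v.a   adc $0000,x       ;vol: add channel 1's volume-scaled sample

        bcc .skip00       ;4:6 cycles (no overflow : overflow)
        lda #$ff          ; unsigned overflow: saturate at $ff
        clc               ; leave carry clear for the next adc
.skip00
        sta <D0.l         ; store the 8-bit pair result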

I just realized something.. this is the mixing code for paired channels.. unsigned. Specifically the lda #$ff and clc are needed for overflow.

But.. if I did signed 2's complement method with clamping..

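Code:

        bvc .skip00       ; no signed overflow: keep the sum
        lda #$7f          ; overflow: load $7f (carry is left untouched)
        adc #$00          ; C=0 (+ overflow) -> $7f, C=1 (- overflow) -> $80
.skip00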

It's like the 65x was made for this.. lol! I can't believe I missed this. Same amount of cycles, and it handles both overflow cases for signed addition.

Ok.. I'm gonna have to change this whole thing over to signed mixing. Dammit.. I'll have to re-order my 10bit conversion tables.

elmer · « **Reply #7 on:** November 26, 2016, 10:57:15 AM »

Quote from: Bonknuts on November 26, 2016, 05:51:17 AM

So here's a basic 4 PCM playing at the same time; a long stream followed by 3 other "FX" type samples playing shortly into (voice one, gate of thunder explosion, and a gate of thunder voice).

So this is how I'm handling this:

Thanks for the detailed explanation ... that's really nice and clever!

You've put a lot of thought into the implementation details.

Quote from: Bonknuts on November 26, 2016, 05:51:17 AM

Quote from: elmer on November 26, 2016, 04:19:29 AM
But there's no way that I'd give a music driver 25% or more of the frame time ... that's for graphics!

I understand. There's always a trade off for something. ...
I would give up 25% resource for that in a shmup no sweat.

Absolutely ... the 4th-gen is all about finding creative solutions to problems, and to designing your game to fit within the bounds of the hardware.

For me, in that example, I'd have the CD music, 1 channel of ADPCM (12-bit samples), and either 1 channel of 7KHz 8-bit samples on the PSG, or 2 channels of 7KHz 5-bit samples (probably the latter).

That would keep the CPU cost low, and the memory cost low, and give decent results without compromising the rest of the game.

But that's just me!

Quote

At 7khz? At fixed frequency? Uncompressed? It looks pretty much like this:

Thanks, that's nice and simple and fast ... I really like the "fast" part.

Quote

Though I really can't tell the difference between 7000hz and 6960hz.

Yeah, it's not worth the bother of resyncing; you won't notice a playback-rate difference of less than 1%.

Even then, you can just resample to 6960 Hz in SoX instead of 7000 Hz.

Quote

I can give you the source to a PCE soft ADPCM player. Though it handles saturation in the player itself, but it would be faster to handle those cases outside the player.

Hahaha ... you misunderstand me!

I don't want to do realtime ADPCM conversion, either to read or write, it's way too slow for game use.

What I'm talking about would be to incorporate the PCE CD ADPCM playback into the sound driver as another channel.

That way it could be used for high-quality drums/percussion/voice whenever sound effects aren't using it.

It would just be another tool in the sound designer's arsenal, rather than a separate programmer-controlled feature, the way that it is now.

Bonknuts · « **Reply #8 on:** November 26, 2016, 01:53:56 PM »

Quote from: elmer on November 26, 2016, 10:57:15 AM

What I'm talking about would be to incorporate the PCE CD ADPCM playback into the sound driver as another channel.

That way it could be used for high-quality drums/percussion/voice whenever sound effects aren't using it.

It would just be another tool in the sound designer's arsenal, rather than a separate programmer-controlled feature, the way that it is now.

What sound driver, though? The PSG player? It's been some years since I looked at it, but I suspect (and remember) that you'd need a piece of code that monitors some of the PSG player's attributes as it parses the other track data, syncs with it, and then makes its own calls to the ADPCM hardware to play samples in sync. That also means either an outside channel parser that reads the MML-derived bytecode of the System Card player but parses it itself (to keep the compiler happy), or modifying whichever compiler you use to handle a special ADPCM track (MML or not, whatever). The easiest way might be to write your own VBlank or TIRQ routine, so you can add a hook that executes before the call to the PSG player.

I dunno. Interfacing it with the unmodified PSG player of the sys card is going to be hack-y. Probably doable, but still hack-y. And probably ugly hack-y too.

elmer · « **Reply #9 on:** November 26, 2016, 03:07:40 PM »

Quote from: Bonknuts on November 26, 2016, 01:53:56 PM

What sound driver though? The PSG player?

God, no!

We've already identified that the SquirrelPlayer is a disassembled version of the System Card PSG player ... and therefore "tainted" in copyright terms.

Easier to just replace it with a new command-stream-per-channel player.

It could even accept the same bytecodes as the PSG player, if that made any kind of sense.

As Arkhan found out with the SquirrelCompiler ... the System Card PSG player is just a processed form of MML, with some extra bells-and-whistles.

FYI, the sound driver that I wrote back in the 1980s is also based on similar ideas ... it was custom-specified (by the musician) to replace the driver that he'd written for the C64 ... which was MML-based.

The biggest differences are in the byte-coding, and not in the background theory.

Processing a MIDI file, or possibly a Deflemask file, into some bytestream format isn't exactly rocket-science.

Arkhan was concerned with 100%-compatibility with the System Card PSG Player.

I don't see that as a useful/interesting/desirable goal for a new sound driver.

The most-important criterion, if a new driver is to be made at all, would be to make sure that there are usable tools surrounding it.

It's mostly a question of desire, and priorities.

Remember ... Arkhan wrote Squirrel for his own use, because he needed something.

TailChao wrote HuSound for similar reasons.

I'm perfectly-capable of doing the same if I decide that it's in my own self-interest.

Oh ... and we need a sound driver for the PC-FX, anyway.

elmer · « **Reply #10 on:** November 27, 2016, 04:27:36 AM »

Quote from: Bonknuts on November 26, 2016, 01:53:56 PM

I dunno. Interfacing it with the unmodified PSG player of the sys card is going to be hack-y. Probably doable, but still hack-y. And probably ugly hack-y too.

BTW ... If someone actually wanted to make changes to the PSG player, then you wouldn't hack it at a binary level, just modify the source-code and assemble a new version.

After all, the player is available in source-code form, as the "SquirrelPlayer", with TheOldMan's excellent commented disassembly.

Heck, there's the old 2001 disassembly by zeograd and whoever-else that could be cleaned up back into fully-assemblable source.

But ... then you've still got the issue of modifying the toolchain surrounding it in order to support your new features, and at this point, that means modifying Squirrel.

It's all about the toolchain, and less about the driver.

Bonknuts · « **Reply #11 on:** December 04, 2016, 01:52:24 PM »

In the 7 kHz resampler driver (XM/MOD-style frequency scaling), I was curious about how to handle samples getting crushed when they're played at higher playback rates through a 7 kHz output. My initial approach, not included in any of the demos I released, was that some samples would need to be resampled to a lower base frequency (with something better than the nearest-neighbor method) and then transposed.

But I got to thinking: what would it sound like if the 7 kHz XM driver did 14 kHz instead? How would it sound for crushed samples? So I redid the driver, and the CPU cost surprisingly isn't that bad. The original one, if you played all 4 XM channels AND 2 regular sample channels (6 total) at the same time, would take 35.7% of the CPU. I decided to move just the 4 XM channels to double frequency and keep the 2 fixed channels at 7 kHz (I was thinking of sound FX for them). So 4 XM channels at 14 kHz plus 2 fixed channels at 7 kHz = 61.3% CPU. That's pretty good. Again, that's with all 6 channels playing samples. I could reduce that further by moving one of the XM channels to the 7 kHz domain (3x 14 kHz XM channels, 1x 7 kHz XM channel, 2x fixed 7 kHz channels) = 55.9% CPU.

Bonknuts · « **Reply #12 on:** December 05, 2016, 02:36:27 PM »

I haven't tested the 14 kHz version on the real system yet, but under mednafen there really isn't a huge difference. The crushing of some samples is only somewhat alleviated. Drums/snares/etc. are crisper, but I was expecting a much bigger difference for twice the output frequency. I'm beginning to think the issue may be the 5-bit resolution paired with the higher frequency (sample skipping).

I'm gonna do a 7 kHz 4-channel 8-bit version and see what that sounds like. I have a feeling it's going to sound better than the 14 kHz one..

elmer · « **Reply #13 on:** December 12, 2016, 05:15:44 PM »

Continued on from the MML thread in order to stop Arkhan from getting unhappy ...

http://www.pcenginefx.com/forums/index.php?topic=21677.msg479278#msg479278

Quote from: Bonknuts on December 12, 2016, 04:01:25 PM

On your 2nd channel one. I would add a cli, nop, sei right after .channel4. Your worst case scenario for each channel is 68 cycle delay for H-int, which would probably be fine, but I wouldn't push it with twice that in a worse case scenario.

Why? It's not needed.

You have 455 cycles per scanline.

In your hsync-interrupt planning you already have to make allowances for a 32-byte TIA instruction that disables interrupts for 241 cycles.

I allow any already-triggered hsync interrupts to run at the start of the timer IRQ with the "cli" instruction.

The max time taken in the timer IRQ after interrupts are disabled again is far less than 241 cycles.

Where is the problem?

Quote

Also, by not having a busy flag system (have interrupts open for the whole thing)- you're setup is going to be little less friendly with code using small Txx in 32byte segments during active display - and worse case scenarios in all settings (H-int and TIRQ). Just something to note. Might want to recommend or write block transfers with 16byte or 8byte segments with Txx.

Errr ... now I could be missing something ... but "nope".

My maximum IRQ-disable is far-shorter than a 32-byte TIA.

If another hsync/vsync IRQ happens during the 3-instructions that IRQs are enabled, then there is no problem.

If it's another timer-IRQ, then it means that I should have output 2 samples during this time-period ... and I do! I've eliminated the "jitter"!

As-far-as-I-can-see, there is no way that something "bad" can happen here in a single-processor system.

Now, if you go multi-core ... then this whole construct falls apart.

But that's not a problem on the PC Engine.

Quote

Something I'm curious about: Why channel's 3 and 4? Why not channels 0 and 1? Channel 0 saves you 2 cycles. And leaving channels 4 and 5 free allow noise mode for both of those while samples are playing.

No massively earth-shaking reason, just practicality.

Channel 3 for a sampled sound-effect, and channel 4 for a sampled drum.

That leaves channel 5 for a regular drum, and the channels 0,1,2 for regular tune data.

It's just the most-likely usage from my POV. I could be wrong.

Bonknuts · « **Reply #14 on:** December 12, 2016, 06:03:37 PM »

Hmm.. let me think about this for a sec..

Txx happens and delays all interrupts by 241 cycles. Let's assume H-int and TIRQ fire at nearly the exact same time, right at the start of the Txx. So both are delayed by 241 cycles. Actually, I have no idea which IRQ has higher priority, VDC or TIRQ. Anyway, whatever. Let's assume TIRQ has higher priority on the CPU and gets called first. Your TIRQ routine re-enables interrupts 15 cycles after the call. So in the worst case, H-int is only stalled by 241+15 cycles, and the 15 comes only from the TIRQ routine.

Yeah, that works out pretty decent. Nevermind.

But what I was going to tell you, is that you can leave the channel in waveform mode and write to it like it was DDA. If you set the channel frequency really close to your TIRQ output, you can get weird overlay type of effects. Depending on the sample itself, it can sound interesting or not. I don't have a video to show though.

Or, if you did an 8-bit phase accumulator (overflow increments the pointer) on a 32-byte waveform in memory, you could use rough timing to 'distort' a single channel's instrument waveform over time, giving somewhat predictable timbre-changing effects.
Like this:
But better, because this one updates a single waveform sample at 60 Hz, not at 6960 Hz (once per tick). You could make up for the phase accumulator overhead by not having any bank, EOF, or MSB checks. Why have a phase accumulator? So you can roughly scale the frequency of the distortion with the note being played (based on octave and note).
