Author Topic: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development (Read 23205 times)

Bonknuts · « **Reply #75 on:** November 09, 2015, 06:41:07 AM »

Hmm. So I have a direct-channel, 4-channel XM software player running at 29% CPU, or 35% if you also use the last two channels to stream fixed-frequency PCM. That's 4 XM channels with volume and looping support plus 2 fixed-frequency channels with volume control, all 6 in stereo at 5-bit PCM with a 7 kHz rate... 35% CPU. No regular PCE channels are left if the 2 auxiliary PCM channels are used. 6 channels total.

So here goes the other one: 8 PCM channels: 4 XM-style with volume and looping support, 2 fixed-frequency with volume control, and 2 fixed-frequency at full volume. All played back at 7 kHz, 6-bit PCM, mono... 31% CPU. The 4 regular PCE stereo channels are still left over for full use. 12 channels total.

The drawbacks of the second engine: the PCM is mono, and the XM driver can only play samples at 0%-100% of the 7 kHz output rate. The first engine can play back at up to 8 times the 7 kHz driver rate by skipping samples. The second one would instead require a couple of re-octaved copies of the same instruments. That isn't too terrible if you consider that the average instrument sample is somewhere between 1 and 2 seconds long. At 7 kHz, three octaves of one 2-second sample come to 14k*3 = 42k. All the XM channels on both players support looping, so an instrument can play longer than the sample itself.
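To make the "skipping samples" difference concrete, here is a minimal sketch of frequency scaling with a fixed-point phase accumulator. It assumes a 16.16 fixed-point step and nearest-neighbour playback; the function name and loop handling are illustrative assumptions, not the actual driver. A step above 1.0 skips samples (the first engine's trick), while a driver capped at 1.0 (like the second engine's 0%-100% range) needs pre-pitched copies for the higher octaves.

```python
def resample(sample, step_fp, out_len, loop_start=None):
    """Nearest-neighbour playback of `sample` at a 16.16 fixed-point step.

    step_fp = 0x10000 plays at the output rate, 0x20000 at twice the rate
    (every other sample skipped), 0x8000 at half the rate (each sample
    repeated). If loop_start is given, playback wraps back to it instead
    of stopping at the end of the sample.
    """
    out = []
    phase = 0                      # 16.16 fixed-point position in `sample`
    end = len(sample) << 16
    for _ in range(out_len):
        if phase >= end:
            if loop_start is None:
                break              # one-shot sample: stop at the end
            # wrap into the loop region, keeping the fractional overshoot
            loop_len = (len(sample) - loop_start) << 16
            phase = (loop_start << 16) + (phase - end) % loop_len
        out.append(sample[phase >> 16])
        phase += step_fp
    return out
```

For storage: at 7 kHz and one byte per sample, a 2-second instrument is 14,000 bytes, so three pre-pitched octaves cost 42,000 bytes, matching the estimate above.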

So, I have the first engine complete. I'm just picking a quick MOD song that doesn't do anything fancy to play back on it (because I really don't want to write a full music engine).

And to back up my claims, I need to finish the tables for the second engine, plus some sort of quick demo to show it off. Maybe a song composed entirely of 8 channels of glorious pan flutes.

I feel like audio, or rather PCM, is the theme of November...

touko · « **Reply #76 on:** November 09, 2015, 07:45:57 PM »

Wow, that's proof that the PCE audio is really good and flexible.
Of course, your second engine is the more impressive one.

Good work.

Bonknuts · « **Reply #77 on:** November 10, 2015, 04:53:43 AM »

Touko, you were saying that you keep your samples bit-packed? Have you looked into RLE? As in, $8n would mean "repeat the last sample n times". Not RLE on the bit-packed bytes, but on the 5-bit samples stored in 8-bit format.

It really depends on the sample itself, but when values get compressed down to a 5-bit range, you tend to get runs of identical samples. I just noticed that quite a few samples I was converting for the XM players fit into this easily. One sample I was ripping from a MOD file was actually double-sampled (every value stored twice). If I hadn't looked, I would have wasted double the ROM space on it. Now I'm going to add analysis to my conversion tool to look for this, as well as for waveforms that might benefit from halving the frequency, and check the error rate: if it's barely noticeable, then it's worth the savings. You could even do mixed-mode samples (sections at half frequency, where you just repeat every sample twice).
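A minimal sketch of the two ideas above, for a conversion tool rather than the playback side. It assumes one 5-bit sample ($00-$1F) per byte, so any byte with bit 7 set is free to act as a control code; here $80+n with n = 1-15 means "repeat the last sample n more times". The run limit, function names, and error metric are all assumptions for illustration.

```python
MAX_RUN = 15  # assumed: run length lives in the low nibble of the $8n code

def rle_encode(samples):
    """Encode 5-bit samples (0-31, one per byte); repeats of the previous
    sample collapse into a single $80|n byte."""
    out, prev, run = [], None, 0
    for s in samples:
        if not 0 <= s <= 31:
            raise ValueError("sample out of 5-bit range")
        if s == prev:
            run += 1
            if run == MAX_RUN:          # run code is full: flush it
                out.append(0x80 | run)
                run = 0
            continue
        if run:
            out.append(0x80 | run)
            run = 0
        out.append(s)
        prev = s
    if run:
        out.append(0x80 | run)
    return out

def rle_decode(data):
    out = []
    for b in data:
        if b & 0x80:                    # control code: repeat last sample
            out.extend([out[-1]] * (b & 0x0F))
        else:
            out.append(b)
    return out

def is_double_sampled(samples):
    """True if every sample is stored twice in a row."""
    return len(samples) % 2 == 0 and all(
        samples[i] == samples[i + 1] for i in range(0, len(samples), 2))

def halving_error(samples):
    """Keep every other sample, replay each kept one twice, and return the
    (mean, max) absolute error against the original."""
    rebuilt = []
    for s in samples[::2]:
        rebuilt += [s, s]
    rebuilt = rebuilt[:len(samples)]
    diffs = [abs(a - b) for a, b in zip(samples, rebuilt)]
    return sum(diffs) / len(diffs), max(diffs)
```

For mixed-mode samples, one option is to run `halving_error` on each section separately and only halve the sections whose error stays under whatever threshold sounds acceptable.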

touko · « **Reply #78 on:** November 10, 2015, 09:36:50 PM »

I use this simple technique:

1 sample in the first 5 bits of the 1st byte
1 sample in the first 5 bits of the 2nd byte
and the 3rd sample packed into the last 3 bits of the 1st byte, with its remaining 2 bits in the 2nd byte. In fact, only the 3rd sample needs to be shifted.
Of course you waste 1 bit, but your sample data shrinks by 33%.


To summarise:

11111333 2222233X

1 = bits of the 1st sample
2 = bits of the 2nd sample
3 = bits of the 3rd sample
X = unused bit

It's very fast.
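A sketch of the 3-samples-in-2-bytes packing. The layout is my reading of the diagram: it assumes "the first 5 bits" means bits 0-4, so samples 1 and 2 come out with just a mask, while sample 3 keeps its low 3 bits in bits 5-7 of byte 1 and its high 2 bits in bits 5-6 of byte 2 (bit 7 of byte 2 is the unused X). Which half of sample 3 goes where is also an assumption.

```python
def pack3(s1, s2, s3):
    """Pack three 5-bit samples into two bytes (bit 7 of byte 2 unused)."""
    for s in (s1, s2, s3):
        if not 0 <= s <= 31:
            raise ValueError("sample out of 5-bit range")
    b1 = s1 | ((s3 & 0b111) << 5)   # s3 bits 0-2 -> b1 bits 5-7
    b2 = s2 | ((s3 >> 3) << 5)      # s3 bits 3-4 -> b2 bits 5-6
    return b1, b2

def unpack3(b1, b2):
    s1 = b1 & 0x1F                  # no shift needed
    s2 = b2 & 0x1F                  # no shift needed
    s3 = (b1 >> 5) | (((b2 >> 5) & 0b11) << 3)   # only s3 is shifted
    return s1, s2, s3
```

Three bytes of samples become two, which is where the 33% saving comes from.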

Bonknuts · « **Reply #79 on:** November 25, 2015, 04:18:02 PM »

I wrote up a small research blog thingy about how one could go about doing a Wolfenstein 3D style game on the PC-Engine, with decent results.

Dicer · « **Reply #80 on:** November 25, 2015, 05:10:40 PM »

Quote from: Bonknuts on November 25, 2015, 04:18:02 PM

I wrote up a small research blog thingy about how one could go about doing a Wolfenstein 3D style game on the PC-Engine, with decent results.

I'd love to see something, even if it's just a proof of concept...

And better than FACEBALL.

Bonknuts · « **Reply #81 on:** November 26, 2015, 05:06:16 AM »

Faceball is generally unoptimized. I know it has to contend with planar graphics, but it doesn't even use the sprite planar format. I mean, the solid-color walls are symmetrical top to bottom (mirrored across the horizon line). They could have simply rendered half the screen and used the SAT to show it flipped/mirrored for the lower half. Instant rendering speed-up. The largest viewable window is only 128 real pixels wide (64 internally rendered pixels, each doubled). They could have used a second layer of sprites on top of that for rendering the objects into that window.

I have no idea how optimized their internal rendering engine is, but the mirror trick alone would have sped it up regardless (purely in terms of pixel fill rate).
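A toy model of the mirror trick, assuming solid-colour walls centred on the horizon and one wall height per column. The function names are hypothetical, and on the PCE the lower half would come from the vertical-flip bit in the sprite attribute table rather than a copy; the point is only that the top half fully determines the bottom half.

```python
def render_top_half(heights, half_rows, wall=1, empty=0):
    """Render only the upper half of a solid-wall view.

    heights[c] is the wall height in pixels for column c, centred on the
    horizon, so a row is wall if it is within heights[c] // 2 of it.
    """
    rows = []
    for r in range(half_rows):
        dist = half_rows - r            # 1 for the row just above the horizon
        rows.append([wall if h // 2 >= dist else empty for h in heights])
    return rows

def render_full(heights, rows_total, wall=1, empty=0):
    """Reference renderer that fills every row of the view."""
    half = rows_total // 2
    out = []
    for r in range(rows_total):
        dist = half - r if r < half else r - half + 1
        out.append([wall if h // 2 >= dist else empty for h in heights])
    return out

def mirror(top_half):
    """Lower half = upper half flipped vertically (what a sprite Y-flip shows)."""
    return top_half + top_half[::-1]
```

Only the top half is ever filled, so the CPU writes half the wall pixels for the same picture.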

A closer inspection: in two-window mode each window is 128x64 real pixels. Since the buffer is kept in local RAM, it takes 49k CPU cycles to clear each window for a new render even using the Txx block-transfer instruction, about 98k cycles for both. That's almost a whole frame (82%) just to clear the buffers. But from what I've looked at, the game uses a very slow ORA method to clear them (read, modify, write back).

Here's part of the buffer clear code:

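```
; 26 cycles per byte written (HuC6280 timings)
.loop
        txa             ; A = X (always $FF)            2 cycles
        ora [$5D],y     ; read the buffer byte, OR $FF  7 cycles
        sta [$5D],y     ; ...and write it back          7 cycles
        iny             ; step 2 bytes: skip the other  2 cycles
        iny             ; plane's byte in the tile      2 cycles
        cpy #$10        ; 8 bytes done?                 2 cycles
        bcc .loop       ; no: loop                      4 cycles (taken)
```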

The value in the accumulator is always $FF. That's 26 cycles a byte, for 8 bytes per pass (it's skipping the second plane in the composite tile). I've watched a full playthrough of the game and nowhere have I seen an OR pattern other than #$FF. There's also an AND routine built the same way. The pixel-drawing routines are similar, but the compare is against a memory variable rather than a constant, because they actually draw variable-length vertical runs of pixels. So right off the bat you're looking at 425,984 cycles for the ORA method vs 98,304 cycles with Txx.
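A quick check of the arithmetic, assuming the usual HuC6280 cycle counts listed below, Txx at 6 cycles per byte (ignoring its fixed setup overhead), and a 7.16 MHz CPU at 60 frames per second. The 16,384-byte figure is derived from the totals in the post (425,984 / 26 = 98,304 / 6 = 16,384), not measured.

```python
# HuC6280 cycles for one pass of the ORA clear loop (one byte written)
ORA_LOOP = [
    ("txa", 2),
    ("ora [$5D],y", 7),
    ("sta [$5D],y", 7),
    ("iny", 2),
    ("iny", 2),
    ("cpy #$10", 2),
    ("bcc .loop (taken)", 4),
]
CYCLES_PER_BYTE_ORA = sum(cycles for _, cycles in ORA_LOOP)
CYCLES_PER_BYTE_TXX = 6          # block transfer, per byte, setup ignored
BYTES = 16_384                   # implied by the post's totals (both windows)
CPU_HZ = 7_159_090               # 7.16 MHz mode
FRAME_CYCLES = CPU_HZ / 60

ora_total = BYTES * CYCLES_PER_BYTE_ORA
txx_total = BYTES * CYCLES_PER_BYTE_TXX
print(ora_total, txx_total, f"{txx_total / FRAME_CYCLES:.0%} of a frame")
```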

The game's rendering engine is unoptimized in both high level design and lower level design.

Black Tiger · « **Reply #82 on:** November 26, 2015, 06:07:58 AM »

Faceball may not be efficient, but the 4-player split-screen is noteworthy for console games.

ABlackFalcon made a big deal of the claim that Mario Kart 64 invented the now-standard 4-player split-screen for 3D games. If that would have been revolutionary for an N64 game (had it actually been true), then it should be all the more mind-blowing for a 16-bit game.

spenoza · « **Reply #83 on:** November 28, 2015, 04:46:35 AM »

What I really, really want to see is a demo of this 2-engine sound hybrid beast. I'd love to witness the PCE putting out that many channels of sound.

Bonknuts · « **Reply #84 on:** December 01, 2015, 03:37:28 AM »

spenoza:

The sound engine demos? They'll be out soon. The first engine (well, driver) is completely done (the 4 XM channels, or MOD thingy): six octave ranges (which is way overkill), all 12 notes per octave, and 32 steps between notes. Frequency sliding works perfectly, as does finetune. Looping works. EOF markers work. My wave conversion tool for the special waveform format is done (it outputs 5-bit and special two's complement 6-bit and 7-bit formats). All the interfacing for the driver is done, and it's buffered so that everything stays synced (and so that it updates during a "safe" window of time): the interrupt driver needs full control of all the PCE audio registers, so any other code is subordinate and has to be buffered for a windowed update.
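The octave/note/fine-step layout maps naturally onto a lookup table of playback steps. A minimal sketch, assuming the six octaves run from 1/8x to just under 8x the output rate (consistent with the "up to 8 times" figure earlier, but still an assumption) and a 16.16 fixed-point step fed to a phase accumulator; the names and format are illustrative, not the driver's actual table.

```python
OCTAVES = 6
NOTES_PER_OCTAVE = 12
FINE_STEPS = 32                                   # steps between adjacent notes
STEPS_PER_OCTAVE = NOTES_PER_OCTAVE * FINE_STEPS  # 384
LOWEST_OCTAVE = -3                                # assumed: table starts at 1/8x

def build_step_table():
    """Return 16.16 fixed-point playback steps, one per fine-tuned note."""
    table = []
    for i in range(OCTAVES * STEPS_PER_OCTAVE):
        ratio = 2 ** (i / STEPS_PER_OCTAVE + LOWEST_OCTAVE)
        table.append(round(ratio * 0x10000))
    return table

def step_for(octave, note, fine, table):
    """Look up the step for octave 0-5, note 0-11, fine step 0-31."""
    return table[(octave * NOTES_PER_OCTAVE + note) * FINE_STEPS + fine]
```

With this layout, a frequency slide is just walking the fine index up or down, and finetune is a fixed offset added to it. Sixteen fractional bits keep neighbouring entries distinct even in the lowest octave.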

I also did some tests. The Nyquist theorem says you need to sample at (at least) twice the highest frequency you want to reproduce. Frequency scaling is technically resampling. When playing a sample back at twice the output rate, the artifacts are surprisingly low. At three times the output rate, they are still surprisingly decent, or at least bearable. Anything above that, depending on the sample, and the artifacts become predominant. If you listen to this sample, the horns sound gritty. Those are the artifacts I'm talking about. Of course, the samples in that video haven't been preprocessed. But to be honest, those "horns" are less gritty than the ones in Bloody Wolf. I had this idea of combining sample synthesis with the normal PCE channels for a paired sound.
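One back-of-the-envelope way to see why higher playback ratios get gritty (a sketch, not a model of the driver): playing a sample r times faster multiplies every frequency in it by r, and anything pushed above half the 7 kHz output rate (3.5 kHz) folds back down as an alias. The sample-skipping itself adds further distortion that this does not capture. The function names are hypothetical.

```python
OUTPUT_RATE = 7000  # Hz, the driver's output rate

def scaled_frequency(f, ratio):
    """Frequency of a component after playing the sample `ratio` times faster."""
    return f * ratio

def alias(f, fs=OUTPUT_RATE):
    """Frequency a component at f Hz actually lands on at output rate fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f

def max_clean_frequency(ratio, fs=OUTPUT_RATE):
    """Highest source frequency that stays below Nyquist at this ratio."""
    return fs / 2 / ratio
```

At 3x, for example, a 1.5 kHz component in the source would land at 4.5 kHz and fold back to 2.5 kHz, and only content below about 1.17 kHz survives cleanly.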

Another example of the artifacts from resampling too far above the driver output rate is https://youtu.be/m6_HvykkFKI?t=3m14s at 3:14. The main instrument sounds somewhat screechy and distorted.

Not every sampled instrument is going to sound great on a 7 kHz 5-bit output driver, but some sound exceptionally decent. Some will need to be resampled by an external app with proper filtering to remove the artifacts, and if an instrument uses a wide octave range, it might need to be resampled into 2 octave ranges (two samples). Maximizing the sample amplitude with a preprocessing app is also important, even if that means some clipping. Resolution noise (static, hiss, etc.) is much harder for the human ear to detect when things are loud. The closer the sample gets to the edges of the amplitude limits, the more values (steps) it has available to represent the waveform. This is especially true for the quieter parts of a sample: the more steps you can throw at it, the better it will sound. Hardware volume can then be turned down to compensate for the louder sample, without losing resolution depth. At 5-bit resolution, you take whatever you can get out of a sample.
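A small sketch of why maximizing matters at 5 bits: a quiet waveform quantized as-is only touches a handful of the 32 levels, while the same waveform normalized to full scale uses all of them. It assumes float input in -1.0..1.0 and a simple unsigned 0-31 mapping; the `gain` parameter (pushing past full scale with hard clipping) and all names are illustrative.

```python
import math

def maximize(samples, gain=1.0):
    """Scale samples so the peak hits full scale; gain > 1 pushes further
    and hard-clips, trading some clipping for resolution."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    k = gain / peak
    return [max(-1.0, min(1.0, s * k)) for s in samples]

def to_5bit(samples):
    """Quantize -1.0..1.0 floats to unsigned 5-bit values 0..31."""
    return [max(0, min(31, round((s + 1.0) * 15.5))) for s in samples]

def sine(amplitude, n=1000):
    """One cycle of a sine wave, n samples long (test signal)."""
    return [amplitude * math.sin(2 * math.pi * i / n) for i in range(n)]
```

A sine at 10% amplitude lands on only 4 distinct levels here, while the maximized version uses all 32; hardware volume would then bring the loudness back down.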

This isn't my engine; it's a MOD player written years ago for the PCE (never made public). It gives an idea of which instruments sound good and which sound gritty at 7 kHz 5-bit output. Keep in mind, though, that none of these samples have been preprocessed to smooth out frequency artifacts or bit-depth issues.

Note: these are played on an overclocked PCE (emulator) because the original frequency-scaling code was so slow that it would often end up skipping samples and had some extreme jitter. I overclocked it to give an idea of the more solid frequency-scaling sound that can be achieved; the CPU is overclocked, but the output is still 7 kHz 5-bit samples. It's one of the reasons I'm not sharing the source to this particular player (that, and I never heard back about permission to do so).

But anyway, this should give a rough, or decent, idea of the first engine's capability: 7 kHz, 5-bit waveforms, hardware volume, and 4 channels of frequency scaling.

Kind of off/on topic, but it's rare for MODs and similar formats (XM, IT) to emulate timbre bending of instruments through multiple samples. I experimented with this back in the day with Fast Tracker 2, and it works pretty decently. If you keep the sample short, with a loop point, you have more room in ROM/RAM for multiple versions of that sample with different preprocessed changes over time: a set of samples, with each sample loop representing a specific point in the sound's evolution. In the driver, you can easily switch which sample of the set is being played back, even in the middle of it (though I would do that on a per-frame basis, every 1/60 sec). That's basically wavetable synthesis (as opposed to the sample-based synthesis used in MODs, XMs, and on the SNES and Amiga). Of course it wouldn't be on the level of a PPG synth, but the flexibility of building and controlling sounds, instead of just playing back a sample at different frequencies, is pretty damn cool IMO. This is where a 15 kHz soft-mixed engine would dominate: 2 channels capable of both wavetable and sample-based synthesis (6- or 7-bit PCM), with the rest of the soft-mixed channels at fixed frequency. And you'd still have 4 regular PCE channels to give it a mix of that distinct PCE sound.
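A minimal sketch of the per-frame switching described above. To keep it short it assumes every wave in the set is a looped waveform of the same length, and a 16.16 fixed-point step; `schedule` (which wave to use on each frame) and the function name are hypothetical. At 7 kHz one frame is roughly 117 output samples. The phase keeps running across switches, so the timbre changes without the waveform restarting.

```python
def play_wavetable(wave_set, schedule, samples_per_frame, step_fp):
    """Play looped waves from `wave_set`, switching waves once per frame.

    schedule[f] is the wave index used during frame f; the 16.16 phase
    carries over from frame to frame.
    """
    wave_len = len(wave_set[0])
    if any(len(w) != wave_len for w in wave_set):
        raise ValueError("all waves in the set must share a length")
    out = []
    phase = 0                                  # 16.16 fixed point
    for wave_index in schedule:
        wave = wave_set[wave_index]
        for _ in range(samples_per_frame):
            out.append(wave[(phase >> 16) % wave_len])
            phase += step_fp
    return out
```

A "timbre bend" is then just a schedule that steps through the set, for example `[0, 0, 1, 1, 2, 3, 3, 3]`.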

Anyway, the first engine is done. I just need to whip up a small music engine to show it off. Then finish the second engine, which is much more exciting.

Bonknuts · « **Reply #85 on:** December 01, 2015, 02:53:07 PM »

FYI, you can use y/sin(a) instead of sqrt[x^2+y^2], where a is the angle of the ray. Not only is the table much smaller, but the table access is much faster (adding two squares is expensive just to build the index into the sqrt table). In trig, and in calculus, you always tend to resort to finding the root of r^2 just because it's easy and the calculator is fast. I remember almost all my trig, or at least the basic stuff, off the top of my head. But it's the people with the right set of eyes who see the connection and the easier way around. Those are the brilliant computer science math geeks.
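A sketch of the y/sin(a) idea as a raycaster might use it, where the ray angle is already known and (x, y) is a point on that ray. It assumes 256 binary-angle units per turn and float reciprocal tables (a PCE version would use fixed-point tables and a table-driven multiply). Near the x axis sin(a) goes to 0, so the sketch switches to x/cos(a) there; the table size and names are assumptions.

```python
import math

ANGLES = 256  # assumed binary-angle units: 256 steps per full turn

def _recip(v):
    return 0.0 if abs(v) < 1e-12 else 1.0 / v

INV_SIN = [_recip(math.sin(2 * math.pi * i / ANGLES)) for i in range(ANGLES)]
INV_COS = [_recip(math.cos(2 * math.pi * i / ANGLES)) for i in range(ANGLES)]

def ray_distance(x, y, angle):
    """Distance to (x, y) along a ray at binary angle `angle`: one table
    lookup and one multiply instead of sqrt(x*x + y*y)."""
    angle %= ANGLES
    if 32 <= angle % 128 <= 96:   # within 45 degrees of the y axis
        return y * INV_SIN[angle]  # r = y / sin(a)
    return x * INV_COS[angle]      # r = x / cos(a), avoids sin(a) near 0
```

Choosing the axis by octant means the table entry used is never smaller than 1/sqrt(2) in magnitude, so the zero entries are never hit.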

Anyway, back on topic...

elmer · « **Reply #86 on:** December 02, 2015, 04:41:31 AM »

Quote from: Bonknuts on December 01, 2015, 02:53:07 PM

Anyway, back on topic...

Can I stay on the math side for a moment?

Back in the days of limited hardware (like the PCE), we just used an approximation when we needed the distance (i.e. sqrt[x^2+y^2]).

It was fast, and "accurate-enough" for most tasks.

The classic one was ...

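```python
def approx_distance(x0, y0, x1, y1):
    dx = abs(x1 - x0)
    dy = abs(y1 - y0)
    # longer side + half the shorter side; the shift avoids a divide
    if dx > dy:
        return dx + (dy >> 1)
    return dy + (dx >> 1)
```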

On the PCE, that's a couple of compares and branches, a bit-shift, and an add. Nice and fast.
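A quick way to see how "accurate-enough" the classic version is: sweep points on a unit circle and compare. A float version is used here for clarity (names are illustrative). max + min/2 never underestimates, and its worst overestimate is about 11.8%, when the shorter side is half the longer one (1.25/sqrt(1.25) = sqrt(1.25), about 1.118).

```python
import math

def approx_dist(dx, dy):
    """max + min/2 approximation of sqrt(dx*dx + dy*dy)."""
    dx, dy = abs(dx), abs(dy)
    return dx + dy / 2 if dx > dy else dy + dx / 2

def error_range(fn, steps=900):
    """Min and max of fn over unit vectors from 0 to 90 degrees
    (true distance is 1, so these are the error ratios)."""
    ratios = []
    for k in range(steps + 1):
        a = math.radians(k * 90 / steps)
        ratios.append(fn(math.cos(a), math.sin(a)))
    return min(ratios), max(ratios)
```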

Here's a nice plot of the function (stolen from another site just to add a pretty pic) ...

On the PC-FX, where you've got a fast integer multiply, you can improve the function with ...

dist = 1007/1024 * max(dx, dy) + 441/1024 * min(dx, dy)
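An integer sketch of the weighted version, applying 1007/1024 to the longer side and 441/1024 to the shorter one, with the divide by 1024 done as a shift. The +512 rounding term is my addition. Worked out by hand, this weighting is about 1.7% low along the axes and up to roughly 7.4% high around 23-24 degrees, which is tighter than max + min/2.

```python
def approx_dist_fx(dx, dy):
    """Integer 1007/1024 * max + 441/1024 * min distance approximation."""
    dx, dy = abs(dx), abs(dy)
    hi, lo = (dx, dy) if dx > dy else (dy, dx)
    return (hi * 1007 + lo * 441 + 512) >> 10   # +512 rounds the shift
```

For example, `approx_dist_fx(1000, 1000)` gives 1414 against a true 1414.2.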

I just dug up these modern references to the same old trick ...

http://www.flipcode.com/archives/Fast_Approximate_Distance_Functions.shtml

http://gamedev.stackexchange.com/questions/69241/how-to-optimize-the-distance-function

Bonknuts · « **Reply #87 on:** December 03, 2015, 01:10:21 AM »

Hey, that's pretty decent! I've never seen that one before.

touko · « **Reply #88 on:** December 31, 2015, 05:32:54 AM »

Quote

Yep, Xanadu 2 is using the TSB/TRB trick for writing the font data ... together with self-modifying code to switch between TSB and TRB opcodes in order to set the color.

I thought about it too: I have a routine that uses sprites for some HUD information like score/hi-score, etc. With 8-pixel tiles I use 2 indexed arrays (I write 2 numbers into one 16-pixel sprite).
I think storing the tile data in VRAM and using TSB with acc=0 should be a faster VRAM-to-VRAM copy than 2 arrays in RAM.
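A rough model of how switching between TSB and TRB can select a font colour, based on my reading of the quoted Xanadu 2 description rather than the game's code. TSB ORs A into memory (mem |= A) and TRB clears A's bits (mem &= ~A), so a 1-bit glyph row is set into every bitplane whose colour bit is 1 and cleared from every plane whose colour bit is 0, leaving background pixels untouched. Python lists stand in for the bitplanes and the names are hypothetical; the real routine patches the opcode byte instead of picking a function.

```python
def tsb(mem, addr, a):
    """TSB: set the bits of A in memory (mem |= A)."""
    mem[addr] |= a

def trb(mem, addr, a):
    """TRB: clear the bits of A in memory (mem &= ~A)."""
    mem[addr] &= ~a & 0xFF

def draw_glyph_row(planes, addr, glyph_bits, color):
    """Draw one 8-pixel glyph row in `color` across 4 bitplanes."""
    for p, plane in enumerate(planes):
        op = tsb if (color >> p) & 1 else trb   # the "self-modified" opcode
        op(plane, addr, glyph_bits)

def pixel_color(planes, addr, bit):
    """Read back the colour index of one pixel from the bitplanes."""
    return sum(((plane[addr] >> bit) & 1) << p for p, plane in enumerate(planes))
```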

Bonknuts · « **Reply #89 on:** January 03, 2016, 04:19:01 AM »

Just beware that TRB/TSB on $0003 takes longer than normal. Even LDA $0003/STA $0003 takes longer than normal. TRB/TSB $0003 takes ~15.6 cycles on the real console; TRB/TSB $0002 takes ~8.9 cycles.

I did some VDMA tests and can confirm that it's roughly 81 words per line in 5.37 MHz mode, so definitely ~324 bytes per line in 10.74 MHz mode. I was able to transfer 17.6k with a clipped 209-line display at 10.74 MHz. That's more than enough for my other bitmap-mode display, and a few other things too.
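A rough budget estimate that reproduces the order of magnitude above. It assumes 263 lines per frame and that every line outside the displayed area is usable for VRAM DMA at the measured rate; real timing has extra overhead, so treat it as an estimate. With 209 displayed lines that leaves 54 lines, or 54 x 324 = 17,496 bytes, close to the ~17.6k measured.

```python
WORDS_PER_LINE_5MHZ = 81                             # measured in the post
BYTES_PER_LINE_10MHZ = WORDS_PER_LINE_5MHZ * 2 * 2   # 2x the words, 2 bytes each
TOTAL_LINES = 263                                    # assumption: lines per frame

def vblank_dma_bytes(display_lines, bytes_per_line=BYTES_PER_LINE_10MHZ):
    """Approximate bytes of VRAM DMA per frame from the non-displayed lines."""
    return (TOTAL_LINES - display_lines) * bytes_per_line
```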
