At the 39C3 conference, the Usual Suspects talked about how they reverse engineered the Toshiba DSP chip from the JP80x0. In itself an incredible feat, and a super exciting talk, but the one thing that REALLY caught my interest, was what they claim to be the code for the original super saw.
https://www.youtube.com/watch?v=XM_q5T7wTpQ&t=1804s
They describe it as simply 7 saw waves, high pass filtered, with detuning, running at 88.2kHz to prevent aliasing within the audible range (?).
The code even shows the detuning coefficients, and they state that it's integer maths, making the coefficients a bit hard to get right.
Now, of course I had to see if I understand the code. Here is a screenshot:
Not much. I assume next is run once per DAC update, i.e. 88200 times per second.
The saw oscillators are simply 24bit signed integers used as accumulators. For every round, "pitch" is added to the accumulator. Once the value exceeds the max value that can be stored in a 24bit int, it overflows and wraps around to the most negative value. This way, by continuously adding to the accumulator, we end up with a saw wave.
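To make the wrapping concrete, here is a minimal Python sketch of one such accumulator. Python integers never overflow, so the 24bit wraparound is emulated by hand. The names wrap24 and saw_samples are mine, not from the slide, and the increment is made unrealistically large so the wrap shows up within a few steps.

```python
INT24_MIN = -(1 << 23)          # -8388608
INT24_MAX = (1 << 23) - 1       #  8388607

def wrap24(value):
    """Wrap an arbitrary integer into the signed 24-bit range."""
    return ((value - INT24_MIN) % (1 << 24)) + INT24_MIN

def saw_samples(pitch, count, start=0):
    """Return `count` successive accumulator values for a fixed increment."""
    samples = []
    acc = start
    for _ in range(count):
        acc = wrap24(acc + pitch)   # add pitch, let the value wrap
        samples.append(acc)
    return samples

# Hypothetical, very large increment: a quarter of the full range per step.
samples = saw_samples(pitch=1 << 22, count=5)
print(samples)  # [4194304, -8388608, -4194304, 0, 4194304]
```

The second sample is where the accumulator steps past INT24_MAX and lands on INT24_MIN, which is the falling edge of the saw.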
Oh, btw - looking at the code, the initialization of the array is a bit strange. This being a global array, it should automatically be initialized to all 0s. {0} explicitly sets the first element to 0, why is that needed?
Let's for a start ignore detuning. If we set detune to 0, the whole voice_detune parameter goes away and saw[i] is just incremented by pitch for every cycle.
Also, let's set spread to 1, so all oscillators have the same amplitude.
Assuming the saw waves are in perfect phase, summing them would give us a saw wave that increases 7 times faster than a single wave. But then there is something weird.
sum is also defined as an int24. My only way of understanding this is that it will overflow too, just like the saw wave accumulators. And that would lead to a saw wave with the same amplitude as the individual waves, but with a frequency seven times higher!
Let's reduce the number of oscillators to 2 and introduce a phase shift of 25%. Without overflow, the sum would have some peaks above max, some troughs below min, and some cycles that reach neither min nor max. But with overflow, the parts above max and below min fill in the gaps, and once again we're back to having a single waveform with a 2x frequency but the same amplitude:
Red horizontal lines are where the sum accumulator overflows.
Now, the PHASE of the output wave is different from the initial wave.
Here is a way of thinking about this.
For every step, each saw wave contributes "pitch" to the sum. The saw waves wrap, but whatever part of pitch is left over after the wrap is added at the bottom. This is similar to having a single saw wave that increases by 2*pitch for every step.
Now consider different pitch values for the two saw waves (=different frequencies). Each wave still contributes its pitch to the sum, creating a single saw wave with pitch equal to the sum of the two other saw waves.
This extends to the rest of the saw waves, adding another saw wave just adds its pitch to the sum. In the end, the seven waves end up as a single wave with its pitch being the sum of all the pitches.
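The argument above can be checked numerically. The sketch below is my own test harness, not the code from the talk: it runs any number of wrapped 24bit accumulators with different increments and starting phases, and compares their wrapped sum with a single accumulator whose increment is the sum of all increments. The pitch and phase values are made up for illustration.

```python
MOD = 1 << 24
HALF = 1 << 23

def wrap24(value):
    """Wrap an arbitrary integer into the signed 24-bit range."""
    return ((value + HALF) % MOD) - HALF

def summed_matches_single(pitches, phases, steps):
    """True if the wrapped sum of all saws always equals one saw whose
    increment is sum(pitches) and whose start phase is sum(phases)."""
    accs = [wrap24(p) for p in phases]
    single = wrap24(sum(phases))
    total_pitch = sum(pitches)
    for _ in range(steps):
        accs = [wrap24(a + p) for a, p in zip(accs, pitches)]
        single = wrap24(single + total_pitch)
        if wrap24(sum(accs)) != single:
            return False
    return True

# Seven slightly detuned increments and arbitrary start phases (hypothetical).
pitches = [41000, 41350, 40720, 41810, 40210, 42400, 39650]
phases = [0, 1 << 21, -(1 << 22), 123456, -654321, 7, 1 << 20]
print(summed_matches_single(pitches, phases, steps=20000))  # True
```

This holds for any inputs, because wrapping is just arithmetic modulo 2^24, and taking the modulo before or after the addition gives the same result.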
Here is an example. The grey line is all saws, with slight detuning, summed up without overflow (and plotted in a chart where y is at most 8 times that of a single saw wave). The blue line is the same waves summed with overflowing.
The horizontal lines divide the range into 8 parts, each corresponding to "one overflow".
If you look carefully, you can see that at every discontinuity, the part protruding above a grey line is exactly the same as the part missing at the bottom, down to the previous grey line. When using overflow (or modulo), the top wraps and is added to the bottom. Any of the divisions that are empty simply go away in the wrapping, and we end up with the blue line.
Ok, that was a convoluted way of saying - I don't understand how the sum code is supposed to work. Saw waves of any frequency will always combine to a single saw wave of higher frequency if the sum also overflows. As neither the frequency nor the detune of a wave changes, the sum wave will stay unchanged.
If sum was a 32bit int, this would work fine and we would get an ever changing combination of the waves.
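As a rough sketch of the difference, the code below sums the same seven wrapped saws twice: once into a wide accumulator (Python's unbounded integers stand in for a 32bit int) and once wrapped back to 24 bits. The function name and the pitch values are my own assumptions.

```python
MOD = 1 << 24
HALF = 1 << 23

def wrap24(value):
    """Wrap an arbitrary integer into the signed 24-bit range."""
    return ((value + HALF) % MOD) - HALF

def wide_and_wrapped_sums(pitches, steps):
    """Return the per-step sum of all saws, both unwrapped and wrapped."""
    accs = [0] * len(pitches)
    wide, wrapped = [], []
    for _ in range(steps):
        accs = [wrap24(a + p) for a, p in zip(accs, pitches)]
        total = sum(accs)            # no wrap: what a wider sum would hold
        wide.append(total)
        wrapped.append(wrap24(total))  # what an int24 sum would hold
    return wide, wrapped

# Seven slightly different increments (hypothetical values).
pitches = [100000 + 37 * i for i in range(7)]
wide, wrapped = wide_and_wrapped_sums(pitches, steps=2000)

differs = any(w != r for w, r in zip(wide, wrapped))
peak = max(abs(w) for w in wide)
print(differs, peak < (1 << 26))  # True True
```

Seven 24bit values can never add up to more than 7 * 2^23 in magnitude, so the wide sum needs 27 bits at most and fits comfortably in 32.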
Detune and detune coefficients
Now, as for the other parts of the code, they have me confused as well, but maybe they use overflow as part of a trick?
In other parts of the presentation, a comparison between Adam Szabo's coefficients and the "real" ones is done. The coefficients are fractions, small ones too. To calculate a detune frequency, one uses
basefrequency * (1 + coefficient)
or
basefrequency + basefrequency * coefficient.
In the code above,
saw[i] = pitch + voice_detune
or
saw[i] = pitch + ( detune_table[i] * ( pitch * detune )) >> 7
Now, I presume the parentheses are placed the way they are for a reason, perhaps the parts inside the parentheses overflow in a certain way that makes things work out, but substituting /128 for >> 7 and reordering gives us
saw[i] = pitch * (1 + detune_table[i] * detune / 128)
The lowest coefficients are 128, and the lowest non-zero integer value for detune is 1. Following that, we end up with
saw[i] = pitch * 2
This is clearly wrong. Perhaps the overflow inside the parentheses, and the values chosen for detune, will lead to something that, when shifted right by 7 (divided by 128), is always much less than (and proportional to) pitch?
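The arithmetic is easy to check with a few lines of Python. This is only my reading of the formula, with the shift applied to the detune term alone, and it does not model any intermediate overflow since Python integers are unbounded. The function name and the pitch value of 1000 are made up.

```python
def detuned_increment(pitch, coefficient, detune):
    """Per-sample increment as I read the formula: the detune term is
    scaled by the table coefficient and then shifted right by 7."""
    voice_detune = (coefficient * (pitch * detune)) >> 7
    return pitch + voice_detune

# Smallest coefficient mentioned above (128) and the smallest non-zero detune.
print(detuned_increment(pitch=1000, coefficient=128, detune=1))  # 2000
# With detune at 0 the increment is just pitch.
print(detuned_increment(pitch=1000, coefficient=128, detune=0))  # 1000
```

So without some overflow trick or a differently scaled detune value, the smallest possible detune step already doubles the increment, which is an octave up.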
As for the coefficients themselves, the individual proportions are not the same as for the fractional coefficients, so something strange is going on there as well.
Spread
Finally, we have "spread"
The outer saw waves are multiplied by spread before adding them to the sum. Presumably, this is the same as "mix" on the JP8000.
But again, being integers, spread can only INCREASE the amplitude of the saw wave (or perhaps rather the pitch, since the product of saw[i] * spread will overflow).
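Here is a small sketch of why an integer multiply plus overflow behaves like a pitch change rather than a gain change. It assumes the product is truncated back to 24 bits with plain wraparound, which I have not seen confirmed for this DSP. The function name and the values are mine.

```python
MOD = 1 << 24
HALF = 1 << 23

def wrap24(value):
    """Wrap an arbitrary integer into the signed 24-bit range."""
    return ((value + HALF) % MOD) - HALF

def spread_matches_faster_saw(pitch, spread, steps):
    """True if wrap24(saw * spread) always equals a saw that is
    incremented by spread * pitch on every step."""
    acc = 0
    fast = 0
    for _ in range(steps):
        acc = wrap24(acc + pitch)
        fast = wrap24(fast + spread * pitch)
        if wrap24(acc * spread) != fast:
            return False
    return True

# Hypothetical increment and an integer spread of 3.
print(spread_matches_faster_saw(pitch=12345, spread=3, steps=5000))  # True
```

Under that assumption, a spread of 3 does not make the saw three times louder, it makes it run three times faster at the same amplitude.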
In Adam Szabo's paper, the center oscillator amount is reduced linearly, while the outer oscillators are increased by a curve:
Perhaps there is some kind of normalization going on, where, by increasing the outer oscillators, the relative contribution from the center one is decreased?
Again, I'm confused.
I have a feeling that at least one trick is used here. Since division is probably extremely expensive on a DSP which is built for Multiply and Accumulate, perhaps one instead uses multiply + overflow? (bitshift >> 7 is used to divide by 128, but this only works for powers of two).
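For reference, this is what the shift does to a few values, sketched in Python. Python's >> on negative integers rounds towards minus infinity, so it matches floor division by 128. In C the result of right-shifting a negative signed value is implementation-defined, so the DSP may differ. The helper name is mine.

```python
def shift_divide(value, bits=7):
    """Divide by 2**bits using an arithmetic right shift."""
    return value >> bits

print(shift_divide(1000))   # 7   (1000 / 128 = 7.8125, fraction dropped)
print(shift_divide(-129))   # -2  (rounds towards minus infinity)
print(int(-129 / 128))      # -1  (truncating division, for comparison)
```

For positive values the two agree, the difference only shows up on negative numbers that are not exact multiples of 128.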
I really wish someone could confirm a couple of things.
First of all, is the code completely correct - while I don't understand it at the moment, at least that would give me more confidence in looking for the solution
and
confirmation that the output of this code is indeed "samples" that, after filtering, will be output to a DAC (or the next DSP in the case of the JP8000).
My analog super saw
Years ago I built an analog 7 saw oscillator with control circuitry that emulated the control curves seen in Adam Szabo's paper. I had the curve for the detune pot using a three leg approximation, and something that looked close to the mix curves.
I actually built the whole thing before I realised
1) Mixing the saw waves would lead to clipping if the headroom was not high enough and
2) Part of what makes the supersaw sound the way it does, is that it's digital (D'oh).
Adam wrote a few things too, in the paper or on a forum, I can't quite remember. Quoted from memory:
- The naive approach of generating multiple saw waves would not work as the JP8000 was not powerful enough
- He had discovered some kind of trick that Roland would not tell the world about.
Not sure if that was all smoke and mirrors, but I was hoping that this last trick was somehow related to how to prevent the overflow while still keeping gain high.
Oh well. Time to go to bed.