I've spent the last couple of weeks trying to get to the bottom of the supersaw code example, after it became clear that it can't simply be implemented the way it's written. Among other things, it uses a DSP multiply-high, and there may be other simplifications in there too.
In general, these things didn't make sense:
- The detune table didn't match the fractional detune coefficients we know and love.
- Pitch * detune didn't make sense, as it would lead to detuning of more than twice the base frequency.
- Summing of the saw waves would make the sum accumulator overflow, turning the output into a single saw wave of higher frequency than the individual saws.
- The summing doesn't follow the curves suggested by Adam Szabo.
After thorough study of the emulator and the running code, I've managed to reproduce an accurate version of the code that runs on the DSP, in a form that can be used on a normal processor. It's very close to the suggested version, but with some crucial differences:
- Coefficients 4 to 7 are half of what they were presented as
- Pitch * detune is a multiply-high, which is common in DSPs.
- The individual saw waves, including the center wave, are attenuated (divided) before summing to prevent overflow.
- The summing curves are indeed different from what is expected. Specifically, the center oscillator is never attenuated when the others are increased. The curves are likely the effect of normalization or something similar later in the code.
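The modified code looks like this. It relies on variables rolling over, so it is crucial to use 24bit integers. Also, allow multiplication results to be 48 bits wide before shifting right. (int24_t and int48_t stand for 24- and 48-bit integer types; they are not standard C types.)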
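int24_t saw[7] = {0, 0, 0, 0, 0, 0, 0};  // phase accumulators, one per voice
const int24_t detune_table[7] = {0, 128, -128, 408, -412, 704, -720};

int24_t next(int24_t pitch, int24_t mix, int24_t detune) {
    int24_t sum = 0;  // 24-bit accumulator for the mixed output
    for (int i = 0; i < 7; i++) {
        // multiply-high: keep the upper part of the 48-bit product
        int24_t detunePitch = ((int48_t) pitch * detune) >> 23;
        int24_t voice_detune = ((int48_t) detune_table[i] * detunePitch) >> 7;
        saw[i] += pitch + voice_detune;  // relies on 24-bit wrap-around
        if (i == 0) {
            sum += ((int48_t) saw[i] * 25) >> 7;  // center voice, fixed gain 25/128
        } else {
            sum += ((int48_t) saw[i] * (mix >> 16)) >> 7;  // detuned voices, gain from the 8 MSBs of mix
        }
    }
    return high_pass(sum);  // high-pass filter, not shown here
}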
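To try this on an ordinary CPU, here is a minimal C sketch of the same routine. It is not the DSP code itself: it assumes an `int32_t` masked to 24 bits (the hypothetical helper `wrap24`) is a fair stand-in for `int24_t`, that `int64_t` covers the 48-bit products, and that `>>` on negative values is an arithmetic shift (as on GCC and Clang). The detune base is computed once per sample instead of once per voice, which gives the same result, and `high_pass()` is left out because its implementation isn't given here.

```c
#include <stdint.h>

/* Sign-extend the low 24 bits of x, emulating a wrapping 24-bit register. */
static int32_t wrap24(int64_t x)
{
    uint32_t u = (uint32_t)(x & 0xFFFFFF);
    return (u & 0x800000u) ? (int32_t)u - 0x1000000 : (int32_t)u;
}

static int32_t saw[7];  /* phase accumulators, zero at start */
static const int32_t detune_table[7] = {0, 128, -128, 408, -412, 704, -720};

/* One supersaw sample, before the (omitted) high-pass stage.
   Assumes >> on negative values is arithmetic, as on GCC and Clang. */
int32_t supersaw_next(int32_t pitch, int32_t mix, int32_t detune)
{
    /* multiply-high: upper part of the 48-bit product pitch * detune */
    int32_t detune_base = (int32_t)(((int64_t)pitch * detune) >> 23);
    int32_t mix_gain = mix >> 16;  /* 8 MSBs of the 24-bit mix value */
    int64_t sum = 0;

    for (int i = 0; i < 7; i++) {
        int32_t voice_detune =
            (int32_t)(((int64_t)detune_table[i] * detune_base) >> 7);
        saw[i] = wrap24((int64_t)saw[i] + pitch + voice_detune);
        int32_t gain = (i == 0) ? 25 : mix_gain;  /* center voice: 25/128 */
        sum += ((int64_t)saw[i] * gain) >> 7;
    }
    return wrap24(sum);  /* wraps like the int24_t sum in the original */
}
```

Called as `supersaw_next(421800, 0, 0)` on a fresh state, it returns 82382: with detune 0 every voice gets the same increment, and a mix value below 65536 gives the detuned voices zero gain, leaving only the center saw scaled by 25/128.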
Explanation of the code
Generating saw waves
Oscillators are calculated by summing the current value with a new increment and letting the variable overflow/wrap around.
The center oscillator is simply the previous value plus pitch. Pitch is nothing fancier than the increment needed to make the variable overflow the correct number of times per second.
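The wrap-around is easy to check in isolation. This sketch (the name `count_wraps` is made up for illustration) adds `pitch` to a 24-bit accumulator and counts roll-overs. It treats the phase as unsigned, which wraps at the same rate as the signed view.

```c
#include <stdint.h>

/* Count how often a 24-bit phase accumulator rolls over when `pitch`
   (0 < pitch < 2^24) is added `steps` times, starting from zero. */
uint32_t count_wraps(uint32_t pitch, uint32_t steps)
{
    uint32_t phase = 0, wraps = 0;
    for (uint32_t n = 0; n < steps; n++) {
        uint32_t next = (phase + pitch) & 0xFFFFFFu;  /* keep 24 bits */
        if (next < phase)
            wraps++;  /* passed 2^24 and wrapped around */
        phase = next;
    }
    return wraps;
}
```

With pitch 421800 and 88200 steps (one second at 88.2kHz) it counts 2217 roll-overs, in line with the 2217.46Hz worked out in the Pitch section below.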
Pitch/detuning
For all the other oscillators, a detune base is calculated. This is the number by which the coefficients are multiplied.
The multiplication is a multiply-high, i.e. it multiplies the two numbers but keeps only the upper part of the result:
int24_t detuneBase = (pitch * (detune >> 16)) >> 7;
detuneBase += ((pitch >> 7) * ((detune >> 9) & 0x7f)) >> 7;
or
int24_t detuneBase = ((int48_t) pitch * detune) >> 23;
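The two forms are not bit-identical: the split version truncates twice, so it can come out slightly lower. This sketch (hypothetical function names) computes both, assuming detune is MSB-aligned so its lowest 9 bits are zero. Under that assumption the full form is never lower than the split form and exceeds it by at most 2.

```c
#include <stdint.h>

/* Full-precision multiply-high: bits 23 and up of the 48-bit product. */
int32_t detune_base_full(int32_t pitch, int32_t detune)
{
    return (int32_t)(((int64_t)pitch * detune) >> 23);
}

/* Split form: the 8 MSBs of detune, then the next 7 bits, each product >> 7. */
int32_t detune_base_split(int32_t pitch, int32_t detune)
{
    int32_t hi  = detune >> 16;          /* bits 23..16 */
    int32_t mid = (detune >> 9) & 0x7f;  /* bits 15..9  */
    return ((pitch * hi) >> 7) + (((pitch >> 7) * mid) >> 7);
}
```

For pitch 421800 and the maximum detune 164352, the full form gives 8264 and the split form 8263.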
Now, the oscillator increment values can be written as follows (the inner parentheses matter: in C, + binds tighter than >>):
int24_t osc2Inc = pitch + detuneBase;
int24_t osc3Inc = pitch + ((detuneBase * -128) >> 7);
int24_t osc4Inc = pitch + ((detuneBase * 102) >> 5);
int24_t osc5Inc = pitch + ((detuneBase * -103) >> 5);
int24_t osc6Inc = pitch + ((detuneBase * 44) >> 3);
int24_t osc7Inc = pitch + ((detuneBase * -45) >> 3);
And if we scale all of them to use the same >> 7 shift:
int24_t osc2Inc = pitch + ((detuneBase * 128) >> 7);
int24_t osc3Inc = pitch + ((detuneBase * -128) >> 7);
int24_t osc4Inc = pitch + ((detuneBase * 408) >> 7);
int24_t osc5Inc = pitch + ((detuneBase * -412) >> 7);
int24_t osc6Inc = pitch + ((detuneBase * 704) >> 7);
int24_t osc7Inc = pitch + ((detuneBase * -720) >> 7);
From this, we get the correct coefficients:
{0, 128, -128, 408, -412, 704, -720}
And the general formula:
saw[i] += pitch + ((detuneBase * coefficient) >> 7);
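The general formula can be sketched as a small table-driven function. `voice_increment` and `voice_ratio` are illustrative names, not taken from the DSP code, and negative products again assume an arithmetic `>>`.

```c
#include <stdint.h>

static const int32_t coeff[7] = {0, 128, -128, 408, -412, 704, -720};

/* Increment for voice i: pitch + ((detuneBase * coefficient) >> 7).
   Negative products assume an arithmetic >> (GCC/Clang behaviour). */
int32_t voice_increment(int32_t pitch, int32_t detune_base, int i)
{
    return pitch + (int32_t)(((int64_t)detune_base * coeff[i]) >> 7);
}

/* Frequency of voice i relative to the center voice. */
double voice_ratio(int32_t pitch, int32_t detune_base, int i)
{
    return (double)voice_increment(pitch, detune_base, i) / pitch;
}
```

With pitch 421800 and a detune base of 8264 (what the full multiply-high gives at maximum detune), the outermost voice gets an increment of 375315, about 11% below the center voice.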
Summing, multiplication by mix
Mixing is very simple:
Osc 1: (saw[0] * 25) >> 7 // scale by 25/128 = 0.1953 to prevent overflow
Osc n: (saw[n] * (mix >> 16)) >> 7 // uses the 8 MSBs of mix
Inputs
The code above, while understandable, is of little use without the proper input values, so let's have a quick look at what they mean. I've written a bit about where they can be found in the DSP code in a different post.
Pitch
Pitch input, without any modulation, ranges from 1555 for midi note 0 to 1338944 for midi note 117, which is the last note that has a unique value (i.e. the highest playable note).
Pitch is simply the number that must be added to the 24bit accumulator every step to make it overflow f times per second.
For example, triggering note 97 sets pitch to 421800. With that increment, 16777216 / 421800 = 39.7753 steps are needed for the variable to roll over. At 88.2kHz that means f = 88200 / 39.7753 = 2217.46Hz.
Looking at the midi table, that's the exact frequency represented by midi note 97.
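Going the other way, from a pitch value to a frequency, is just the example above rearranged: f = pitch * 88200 / 2^24. A minimal sketch (`pitch_to_hz` is a made-up name):

```c
#include <stdint.h>

/* Frequency in Hz of a saw whose 24-bit accumulator advances by `pitch`
   per sample at 88.2 kHz: f = pitch * 88200 / 2^24. */
double pitch_to_hz(int32_t pitch)
{
    return pitch * 88200.0 / 16777216.0;
}
```

`pitch_to_hz(421800)` comes out at about 2217.46Hz.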
Mix
Mix ranges from 102400 to 2183168, and follows a straight line. 128 discrete steps are available.
Ex: Midi value 127 arrives as 2183168 (MSB-aligned, sign + 14bit precision 24bit int). This is the value used throughout the code. As the value is transmitted as two 8bit coefficients, it can also be thought of as 4264 internally in the MCU*
* The value is transmitted as the 8 MSBs, including the sign bit, followed by 7 more bits (the second byte's sign bit is not used). These are joined into a 15 bit number and zero-padded to a 24bit signed int.
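Here is a sketch of that byte handling, assuming the first byte carries the sign bit and the second byte's top bit is ignored. The function names are hypothetical, and the byte values in the usage note are derived from 4264 and 321 rather than captured from the hardware.

```c
#include <stdint.h>

/* Join the first byte (sign bit + 7 bits) and the low 7 bits of the
   second byte into a signed 15-bit value. */
int32_t join15(uint8_t msb, uint8_t lsb)
{
    int32_t hi = (msb & 0x80) ? (int32_t)msb - 256 : (int32_t)msb;
    return hi * 128 + (lsb & 0x7f);
}

/* Split a 15-bit value back into the two transmitted bytes
   (assumes arithmetic >> for negative values). */
void split15(int32_t v15, uint8_t *msb, uint8_t *lsb)
{
    *msb = (uint8_t)((v15 >> 7) & 0xff);
    *lsb = (uint8_t)(v15 & 0x7f);
}

/* Zero-pad to the MSB-aligned 24-bit value the DSP sees (times 512). */
int32_t to_dsp24(int32_t v15)
{
    return v15 * 512;
}
```

`join15(33, 40)` gives 4264 and `to_dsp24(4264)` gives 2183168; for detune, `join15(2, 65)` gives 321 and `to_dsp24(321)` gives 164352. The first byte of the mix value (33) is exactly the `mix >> 16` gain used by the mixer.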
An important thing to note is that the mix control signal is completely linear, and it only affects the detuned oscillators, not the center one. For those who have read Adam Szabo's paper: he states that the outer oscillators follow a curved response and that the center oscillator is attenuated as the others are turned up.
This effect is absolutely real, but it does not stem from the supersaw generation code. In fact, the output of the DSP that creates the supersaw matches what the code above produces. However, once the signal reaches the DAC, at the output of DSP 4, it does indeed behave as Szabo measured. Somewhere along the line, the level of each frequency component is changed, perhaps in some form of total-energy or normalization process.
Detune
Detune ranges from 512 to 164352 and follows an exponential-ish curve (more on the details later). 128 discrete steps are available.
Ex: Midi value 127 arrives as 164352 (MSB-aligned, sign + 14bit precision 24bit int). This is the value used throughout the code. As the value is transmitted as two 8bit coefficients, it can also be thought of as 321 internally in the MCU.
Smoothing
While not shown in the code above, mix and detune are smoothed, i.e. changes are not immediate. Instead, they are applied gradually over a few steps after being set. This happens in the DSP code, not the MCU.
Input values
Here is how to get the correct input values for pitch, mix and detune
Pitch
The formula for pitch, given midi note n is
frequency f = 440 * 2^((n - 69) / 12)
pitch = round(f * 2^24 / 88200)
Now, this won't give the exact values the JP8000 uses as those are a bit imprecise, but the difference is very small.
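The pitch formula as a C sketch, using 440Hz for note 69 (A4), which reproduces the 2217.46Hz example for note 97. It steps through semitones with a constant instead of calling `pow()`, so no math library is needed; `note_to_pitch` is a hypothetical name.

```c
#include <stdint.h>

#define SEMITONE 1.0594630943592953  /* 2^(1/12) */

/* Ideal 24-bit phase increment for midi note n at 88.2 kHz. */
int32_t note_to_pitch(int n)
{
    double f = 440.0;  /* note 69 */
    for (int k = 69; k < n; k++)
        f *= SEMITONE;
    for (int k = 69; k > n; k--)
        f /= SEMITONE;
    return (int32_t)(f * 16777216.0 / 88200.0 + 0.5);  /* round to nearest */
}
```

It returns 1555 for note 0 and 421801 for note 97, against 421800 in the JP8000's own values; note 117 comes out at 1339134 versus 1338944.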
Mix
The mix control signal is linear and follows these rules:
0 = 102400
1 to 127: += 16384
When it arrives at the oscillator mixing code, only the upper 8 bits of mix are used (mix >> 16), which means the control curve internally is
0, 1 = 1
2 to 127: increase by 1 for every four steps.
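The mix rules above, plus the `>> 16` applied in the mixer, can be sketched like this (function names hypothetical):

```c
#include <stdint.h>

/* 24-bit, MSB-aligned mix value for midi value m (0..127). */
int32_t mix_from_midi(int m)
{
    return 102400 + 16384 * m;
}

/* Gain actually used for the detuned voices: the 8 MSBs of mix. */
int32_t mix_gain(int m)
{
    return mix_from_midi(m) >> 16;
}
```

`mix_gain()` gives 1 for midi values 0 and 1, reaches 2 at midi value 2, and tops out at 33 at 127.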
Detune
As mentioned, detune follows a sort of exponential curve. In reality, it's made up of linear segments. The value is transmitted from the MCU to the DSPs as two 8bit numbers that together make up a 15 bit number (sign bit + 14 data bits), so it can be thought of as a 15 bit number inside the MCU.
If doing so, the curve follows these rules:
0 = 1
0 to 63: increment by 1 every second step
64 to 80: increment by 1 every step
81 to 120: increment by 2 every step
121 to 123: increment by 8 every step
124: increment by 16
125: increment by 32
126: increment by 96
To get the value as seen by the DSP, multiply by 512.
There is a special case at 103: it is loaded as 40448 (79 on the 15bit scale). I'm not sure why; it looks like a bug. However, it IS present at the point where pitch * detune is calculated in the emulator, and it does affect the calculation.
[Figure: detune curve. X: Midi values, Y: Detune values (15bit)]