Monday, January 19, 2026

Roland/Toshiba TC170C140 ESP multiplication

The Roland/Toshiba TC170C140 ESP is a 24bit DSP, presumably fixed point. 24 bit fixed point DSPs store numbers as Q1.23, meaning one sign bit and 23 "value bits". The numbers, while strictly large integers, are treated as numbers from -1.0 to +0.9999999.
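To make the Q1.23 idea a bit more concrete, here is a minimal sketch of converting between a float and its 24-bit integer representation. The helper names (to_q23, from_q23) are my own and not anything taken from the ESP or the emulator.

```python
Q23_ONE = 1 << 23  # 2^23 steps per unit; +1.0 itself is not representable


def to_q23(x: float) -> int:
    """Convert a float in [-1.0, 1.0) to a signed Q1.23 integer (hypothetical helper)."""
    n = int(round(x * Q23_ONE))
    # Clamp to the signed 24-bit range
    return max(-Q23_ONE, min(Q23_ONE - 1, n))


def from_q23(n: int) -> float:
    """Interpret a signed 24-bit integer as a Q1.23 fraction."""
    return n / Q23_ONE


print(to_q23(0.5))             # 4194304
print(from_q23(-Q23_ONE))      # -1.0
print(from_q23(Q23_ONE - 1))   # largest positive value, just below 1.0
```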

The ESP has an internal 24 x 8 multiplier, and does multiply-high, meaning it keeps a full multiplication result and then returns the upper 24 bits of the result (excluding any extended sign bits) by doing a bit shift right after the multiplication.

It is common for DSPs to do multiply high, as it makes the second factor behave like it's in the range -1 to 1 (which, I guess, is fixed point). It makes the multiplier a scaler, scaling the multiplicand between plus/minus the original value. 

Multiply high in a 24bit fixed point DSP would multiply two 24bit numbers, then shift >> 23 and store the result as a 24bit integer (keeping the sign bit and 23 MSB of the multiplication).

The fact that the ESP multiplier is 24 x 8 does not mean that the second factor of the multiplication, as seen from the system side, is 8bit. It is actually 24 bits, but only the upper 8 bits (sign + 7 bits) are used during multiplication. The rest of the bits are discarded beforehand using >> 16, to make it fit within an 8 bit variable. 

After the multiplication, the result is shifted a further 7 bits for a total of 23, making it a normal 24bit multiply high. However, the precision of the multiplier is only 7 bits + sign, i.e. only the 256 steps from -128 to +127 are available. This, of course, will often not be good enough.
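A small sketch of the single 24 x 8 multiply as I understand it from the description above. The function name mac_single is my own, and this is only a model of the arithmetic, not the emulator's code.

```python
def mac_single(a: int, b: int, shift: int = 7) -> int:
    """Model of one 24 x 8 multiply: only the top 8 bits of b take part.

    a and b are signed 24-bit integers (Q1.23). The name is hypothetical.
    """
    b_top = b >> 16              # sign + 7 bits, range -128..127
    return (a * b_top) >> shift  # remaining 7 of the 23 shifts


half = 1 << 22                           # 0.5 in Q1.23
print(mac_single(half, half))            # 2097152, i.e. 0.25
print(mac_single(half, half | 0xFFFF))   # 2097152, the low 16 bits of b are ignored
```

The second call shows the precision loss: changing any of the lower 16 bits of the second factor makes no difference to the result.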

 

Multiplication "by hand"

Let's take a step back and consider how multiplication is done by hand, to get a more intuitive feeling of what precision loss means:

67 * 52

First multiply 6 by 52 and align with the 10-place

6 * 52 = 312

Then multiply by the 7 and align with the 1-place, below 

7 * 52 = 364

Finally, take the sum:

  312_

+ _364  

= 3484

Now let's pretend we're doing a multiply-high in a decimal system. We only have two places available to store the result, so we're dividing the multiplication with 100 afterwards. The result would be 34.

Then consider a multiplication where the precision of the first factor is a single place only - the resolution is 10s, not 1s. 

In this case, the result of the multiplication would be 3120, and the final result 31. We've lost precision.

This is similar to what happens with the 24 x 8 >> 7 multiplication in the ESP.

But there is something interesting here. The full multiplication is done in two steps, one for the 1's and one for the 10's of the first factor, and the ESP is only doing one of them in its normal multiply and accumulate (MAC).
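The decimal example can be checked with a few lines of Python. This is only the 67 * 52 example from above written out, with variable names of my own choosing.

```python
a, b = 67, 52

full = (a * b) // 100                     # decimal "multiply-high", keep two places
tens_only = ((a // 10) * 10 * b) // 100   # only the 10s digit of the first factor
ones_part = ((a % 10) * b) // 100         # the step that was left out

print(full, tens_only, ones_part)   # 34 31 3
print(tens_only + ones_part)        # 34
```

Adding the left-out step brings the result back to 34 in this case.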


Double precision multiplication

Fortunately, the ESP provides a second multiplication instruction, called  DMAC (Double precision multiply and accumulate). This lets us calculate another part of the multiplication using the bits below the 8 MSB.

To see exactly how this works, let's go back to multiply-high:

With A and B being 24 bit, we want to do A * B >> 23.

But as B cannot be more than sign + 7 bits, we need to divide up the multiplication, just like how we did separate multiplications for 10s and 1s in the decimal example above.

Here are the bits of B grouped in chunks of 7 (X, Y, Z), preceded by a sign bit S and followed by the 2 remaining bits R:

SXXXXXXX YYYYYYYZ ZZZZZZRR 

Now we can write the expression as:

Product P = A * B >> 23 

P = (A * ((B[23:16] << 16) + (B[15:9] << 9) + (B[8:2] << 2) + B[1:0])) >> 23 

 

Substituting the values for the B-bits (b1 keeps the sign bit, so it is a signed 8 bit value):

B[23:16] = B >> 16 = b1

B[15:9]  = (B >> 9) & 0x7F = b2

B[8:2]   = (B >> 2) & 0x7F = b3

B[1:0]   = (B & 0x03) = b4

 

P = (A * ((b1 << 16) + (b2 << 9) + (b3 << 2) + b4)) >> 23 

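P = ((A * (b1 << 16)) >> 23) + ((A * (b2 << 9)) >> 23) + ((A * (b3 << 2)) >> 23) + ((A * b4) >> 23)

and finally

P = ((A * b1) >> 7) + ((A * b2) >> 14) + ((A * b3) >> 21) + ((A * b4) >> 23)

(Shifting each term separately truncates four times instead of once, so the sum can end up a few LSBs below the exact result.)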

We now have four separate 24 x 8 bit multiplications.
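Here is a sketch that checks the split, assuming signed 24-bit inputs and Python's arithmetic right shift. The function names are mine. It verifies that the four chunks add back up to B, and that the sum of the four truncated partial products stays within a few LSBs of the exact (A * B) >> 23.

```python
def split_b(b: int):
    """Split a signed 24-bit b into the four chunks b1..b4 used above."""
    b1 = b >> 16            # sign + 7 bits (signed)
    b2 = (b >> 9) & 0x7F    # next 7 bits
    b3 = (b >> 2) & 0x7F    # next 7 bits
    b4 = b & 0x03           # last 2 bits
    return b1, b2, b3, b4


def mul_high_exact(a: int, b: int) -> int:
    return (a * b) >> 23


def mul_high_four_parts(a: int, b: int) -> int:
    b1, b2, b3, b4 = split_b(b)
    return ((a * b1) >> 7) + ((a * b2) >> 14) + ((a * b3) >> 21) + ((a * b4) >> 23)


a, b = 0x400000, -0x123456
b1, b2, b3, b4 = split_b(b)
print((b1 << 16) + (b2 << 9) + (b3 << 2) + b4 == b)        # True
print(mul_high_exact(a, b) - mul_high_four_parts(a, b))    # between 0 and 3
```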

The ESP does not have support for doing all the four parts in hardware, but it can do b1 using MAC and then b2 using DMAC. This increases the precision to 15 bits (sign + 2 * 7 bits), at the cost of an additional cycle. 

  

Substituting b1 back to B shows us how:

MAC: 

acc = (A * (B >> 16)) >> 7

During MAC, both A and B are stored in their full 24bit representation so they are available to the next instruction. Then a flag is set that tells the next instruction that the previous one was MAC.

DMAC: 

Now, A and B are still available, and we can substitute b2 to get

acc += (A * ((B >> 9) & 0x7F)) >> 14

We know that the ESP implicitly does a >> 7 after every multiplication*, so we have to keep that separate. That leaves us with 7 more shift rights. We cannot shift b2 7 bits - it is 7 bits only so the result would be 0. Instead, we have to shift A, leaving us with:

acc += ((A >> 7) * ((B >> 9) & 0x7F)) >> 7  

In other words, running MAC and DMAC on A and B means

acc = ((A * (B >> 16)) >> 7) + (((A >> 7) * ((B >> 9) & 0x7F)) >> 7)

The precision has increased from 7 to 14 bits (+sign). It is still not a completely correct 24 x 24 bit multiplication, but it lets us work with a precision that is good enough for rock'n roll.
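To get a feel for how much DMAC helps, here is a sketch that models MAC followed by DMAC and compares it with the exact 24 x 24 multiply-high. It follows the formulas above with the default shift of 7. The function names are my own and this is not the emulator's implementation.

```python
def mac_only(a: int, b: int, shift: int = 7) -> int:
    return (a * (b >> 16)) >> shift


def mac_dmac(a: int, b: int, shift: int = 7) -> int:
    acc = (a * (b >> 16)) >> shift                    # MAC: top 8 bits of b
    acc += ((a >> 7) * ((b >> 9) & 0x7F)) >> shift    # DMAC: next 7 bits of b
    return acc


def mul_high_exact(a: int, b: int) -> int:
    return (a * b) >> 23


a, b = 1 << 22, 0x123456           # a is 0.5 in Q1.23
print(mul_high_exact(a, b))        # 596523
print(mac_only(a, b))              # 589824 (off by 6699)
print(mac_dmac(a, b))              # 596480 (off by 43)
```

With the lowest 9 bits of B ignored, the remaining error should stay below roughly 2^9 LSBs of the 24-bit result, which matches the "14 bits + sign" estimate.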

(* it can do 3,5,6,7 but default is 7)


PS: the accumulator in the ESP may be cleared between every instruction - the flag that decides this is set either in the instruction itself or in a memory location. For MAC + DMAC to work it has to be set to clear = true before MAC and clear = false before DMAC.

Friday, January 16, 2026

Mix and oscillator amplitudes, more research

I just can't let this one go. The code appears to add a linear amount of the side oscillators to the sum, but Adam Szabo says differently - the center oscillator is attenuated and the outer ones follow a curved gain.

I just had a happy accident. I am trying to find the contribution of each oscillator to the total if one normalizes the sum - saying the total should always be 1.

However, I only added one single outer oscillator - but the output was very interesting:

Here it is compared to the graph in the article:

 

Here, the value of oscillator 1 is 1 / (1 + mix), whereas the plot for the others is mix / (1 + mix). The plots are quite similar! It really makes me want to understand this even more!
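The two curves are easy to reproduce. This is just the two expressions above in a small function (the name normalized_gains is mine), assuming mix goes from 0 to 1 and only a single outer oscillator is included in the normalization.

```python
def normalized_gains(mix: float):
    """Center and outer gain when center + one outer oscillator must sum to 1."""
    center = 1.0 / (1.0 + mix)
    outer = mix / (1.0 + mix)
    return center, outer


for step in range(0, 11):
    mix = step / 10
    center, outer = normalized_gains(mix)
    print(f"mix={mix:.1f}  center={center:.3f}  outer={outer:.3f}")
```

The center gain drops from 1 to 0.5 while the outer gain rises from 0 to 0.5 along a curve that flattens towards the end.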

But - if I assume that ALL 6 outer oscillators should be included in the normalization, everything breaks down, so clearly that's not correct.

Going back to the article, we have this graph (Figure 9):

Note that the amplitudes of 5, 6 and 7 are higher than 1, 2, 3.

In the text, Szabo says that 2, 3, 5, 6 and 7 are removed, leaving 1 and 4, as illustrated in the first graph.

I don't know if he DID measure those too, but there is a chance that they don't follow the exact same curve. We'll see if we can figure that one out.

Also, let's renumber the spikes in the plot to match the order in the detune_table:

[0, 318, -318, 1020, -1029, 1760, -1800], let's call them A-G to keep them separated

That gives us:

A = 4, B = 5, C = 3, D = 6, E = 2, F = 7, G = 1
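The renumbering can be derived from the table itself. The sketch below assumes that the spikes in the plot are numbered from lowest to highest frequency, so the most negative detune value is spike 1.

```python
detune_table = [0, 318, -318, 1020, -1029, 1760, -1800]
letters = "ABCDEFG"  # A-G in detune_table order

# Indices sorted from lowest to highest detune, i.e. left to right in the spectrum
order = sorted(range(len(detune_table)), key=lambda i: detune_table[i])
spike_number = {letters[i]: rank + 1 for rank, i in enumerate(order)}

print(dict(sorted(spike_number.items())))
# {'A': 4, 'B': 5, 'C': 3, 'D': 6, 'E': 2, 'F': 7, 'G': 1}
```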

In case it matters, what is called 1 here is actually the last element added in the summing in the ESP code. 

 

Now, I'm not entirely sure how to interpret Figure 9 in terms of "max wave amplitude". The spectrum has peaks of a certain width, not just a single frequency, and there are no units on the Y axis. If the scale is linear and one assumes that the amplitude of each wave in the result is actually proportional to the max value of the center oscillator, we get the numbers in the table (each pure frequency is a sine wave, so the highest peak would correspond to the root frequency of the saw waves, wouldn't it?).

It looks like that's what Szabo means that they are, so let's accept that.

If so, the sum of all amplitudes is definitely not 1. Could the perceived total "loudness" be equal if one uses dB instead of a linear scale? The total energy or something?  

And in any case, how does one go from the sum += saw[i] * mix to this thing? 

My own measurements

I wanted to confirm my understanding of Szabo's graph, so I fired up JE-8086 and coded a spectrum analyzer in web audio. I fed the audio from JE-8086 back to the input of my mac using the virtual microphone/input "VB-cable".

I used a linear Y-axis and a logarithmic X-axis. Then, with detuning at max, for every line on the mix pot (11 in total) I screenshot'ed the spectrum. Finally, I went through every graph, measuring the height in pixels.

Here are the first and last spectrum plots:



The measurements were of course wildly inaccurate, and the spacing between the mix values not quite even as I couldn't see the slider value in the display (and also, the slider resolution wasn't high enough). 

Here are the values:



And, more importantly, the plot of the values relative to max value of the center oscillator:


 



This is indeed very cool. It does confirm most of what Szabo described - the center oscillator is fairly linear and the others are definitely curved, and the center oscillator ends at an amplitude lower than the outer ones. There are some small differences though:

- I don't get the feeling that the outer oscillators actually go DOWN in amplitude at the end. The 7th oscillator appears to go slightly  down again, but I think it's more likely a measuring error. 

- The higher pitch / right hand oscillators have a higher amplitude than the lower ones. This matches what can be seen in Figure 9 in Szabo. The top oscillator is even higher though, and that does not match. I am still not sure if this is an artifact of the spectrum analyzer or if it is real. It could be an artifact of approximate multiplication or running average or something. 

- Something else to note: Both in mine and Szabo's spectrum analyzers, the minimum amplitudes for the outer oscillators are not zero. If the summing is actually saw[i] * spread, there should be no trace of the oscillator if spread is 0. Very strange. Also - why do they call it spread and not mix?

Tuesday, January 13, 2026

A closer look at the super saw code

In this post I try to understand most of the super saw code found by the Usual Suspects when reverse engineering the Roland/Toshiba TC170C140 ESP chip. Since I have no real DSP programming experience, it took some time to realize what is actually going on.
 
Presumably, this is the code for the ESP emulator itself: https://github.com/dsp56300/gearmulator/tree/main/source/ronaldo/esp 


Coefficients - floats 

The coefficients listed in the usual suspects' presentation, when represented as floats, are:

[0, 0.01953125, -0.01953125, 0.06225585, -0.0628662, 0.107421875, -0.10986328125]

if we multiply by 8192 we get

[0, 160,  -160, 509.9999, -514.99999, 880, -900]

and if we presume that the strings of nines are the result of a division somewhere, we get

[0, 160,  -160, 510, -515, 880, -900]

Pretty neat!

8192 is 2 to the power of 13. The real values may be that, or perhaps either 14 or 16 bits, so either  

[0, 320,  -320, 1020, -1030, 1760, -1800] 

or

[0, 1280, -1280, 4080, -4120, 7040, -7200] 
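The scaling is quick to verify. This sketch multiplies the float coefficients by 2^13, 2^14 and 2^16 and rounds to the nearest integer, on the assumption that the strings of nines are rounding noise.

```python
float_coeffs = [0, 0.01953125, -0.01953125, 0.06225585, -0.0628662,
                0.107421875, -0.10986328125]

scaled_13 = [round(c * 2**13) for c in float_coeffs]
scaled_14 = [round(c * 2**14) for c in float_coeffs]
scaled_16 = [round(c * 2**16) for c in float_coeffs]

print(scaled_13)  # [0, 160, -160, 510, -515, 880, -900]
print(scaled_14)  # [0, 320, -320, 1020, -1030, 1760, -1800]
print(scaled_16)  # [0, 1280, -1280, 4080, -4120, 7040, -7200]
```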

 

Coefficients - integers

The code example lists these integer coefficients

[0, 318, -318, 1020, -1029, 1760, -1800]

Now that's really cool. Those are almost a perfect match for the 14 bit representation of the coefficients. 

There are some strange mismatches though - 318, -318 and -1029 instead of 320, -320 and -1030. 

I'm not sure WHY this is yet.

ChatGPT suggests:

The small mismatches (318 vs 320 etc) come from:

  • rounding
  • deliberate asymmetry to reduce beating regularity
  • truncation after multiplication
  • and accumulator overflow behavior

But the mismatches are there in the originals, not the calculated values, and if they had the originals, they would have used a shared factor of 16384 when dividing down to the floats (weirdly, even 1020 / 16384 is listed as 0.06225585, when the real value - without any rounding error - is 0.06225586).
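Going the other way, dividing the integer table by 16384, shows which entries match the listed floats and which do not. Nothing here is new data, it is just the two lists from above compared.

```python
int_coeffs = [0, 318, -318, 1020, -1029, 1760, -1800]
float_coeffs = [0, 0.01953125, -0.01953125, 0.06225585, -0.0628662,
                0.107421875, -0.10986328125]

for i, f in zip(int_coeffs, float_coeffs):
    # Integer coefficient as a fraction of 2^14, next to the listed float
    print(f"{i:6d} / 16384 = {i / 2**14:.8f}   listed: {f}")

print(f"{1020 / 2**14:.8f}")   # 0.06225586
```

The two largest coefficients are exact matches, 1020 differs only in the last printed digit, and 318 and -1029 are clearly different values.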


Pitch value

The pitch value is how much we need to increase the saw wave for every sample. We now know that the JP8000 uses a sample rate of 88.2kHz, and 24bit accumulators that have a range of 16,777,216 (bipolar)

The exact pitch range of the JP8000 is not known, but let's for a start consider the midi standard.

Midi note 0 (C-1) has a frequency of approximately 8.18 Hz, while the highest note, MIDI 127 (G9), is around 12,544 Hz 

Running at 88200Hz, every cycle of

8.18Hz is 10782.396 samples long

12544Hz is 7.03125 samples long.

With a 24bit accumulator, each increment will be

for 8.18Hz:  1555.98, i.e. 1556

for 12544Hz: 2386092.942, i.e. 2386093.

 

From this, we can assume that, approximated

Pitch range is 1556 to 2,386,093 
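The increments above come from a simple formula: frequency times accumulator range divided by sample rate. A minimal sketch, assuming the 88.2kHz sample rate and a full 24-bit accumulator range (the function name is my own):

```python
SAMPLE_RATE = 88200
ACC_RANGE = 1 << 24   # 16,777,216 steps per saw cycle


def phase_increment(freq_hz: float) -> float:
    """Amount to add to a 24-bit saw accumulator per sample for a given frequency."""
    return freq_hz * ACC_RANGE / SAMPLE_RATE


print(round(phase_increment(8.18)))    # 1556
print(round(phase_increment(12544)))   # 2386093
```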

Knowing this, the only unknown in the detuning equation, is the int24_t detune parameter.

 

Detuning

Given the center oscillator frequency F0,

When using floats, the outer oscillators should have a frequency Fn:

Fn = F0 * (1 + floatCoefficient[n])

When using the integer coefficients, we instead get

Fn = F0 * (1 + integerCoefficient[n] / 2^14)

Or 

Fn = F0 * (1 + (integerCoefficient[n] >> 14))

Now, doing the bitshift on the coefficient alone would lead to a massive loss of precision (or rather, all the coefficients would become 0), so at very least, we have to do the bitshift after multiplying with F0:

Fn = F0 + ((F0 * integerCoefficient[n]) >> 14)
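A small sketch of the integer version, using ordinary Python integers (so no 24-bit overflow yet). Note that in Python, as in C, + binds tighter than >>, so the shift needs its own parentheses. The function name is hypothetical.

```python
detune_table = [0, 318, -318, 1020, -1029, 1760, -1800]


def detuned_increment(f0: int, coeff: int) -> int:
    """F0 plus F0 scaled by coeff / 2^14, shifting only after the multiplication."""
    return f0 + ((f0 * coeff) >> 14)


f0 = 1_000_000
print(detuned_increment(f0, 1020))   # 1062255
print(f0 + (f0 * 1020) >> 14)        # 62316, the shift is applied to the whole sum
print([detuned_increment(f0, c) for c in detune_table])
```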

Ah, this is starting to look like the code, exciting!  

 

Detune amount

There is a third number in the detune calculation. The code calls it "detune" but in reality it's detune amount. It says how much of the detune coefficient to apply. 

According to Szabo, pitch * coefficient is the _maximum_ amount to apply. That means the detune amount should be a ratio between 0 and 1. This, of course, is not possible using an integer, without a division following the multiplication. 

Let's take a look at the original equation:

int24_t voice_detune = (detune_table[i] * (pitch * detune)) >> 7 

We know that detune somehow should give us a scaling between 0 and 1, so let's ignore how we get there for a second and remove it. That leaves us with detune_table[i] * pitch. 

We also know that we should divide this result with 2^14, though that is not included in the equation. Let's include it still, it has to be there somehow.

A quick check of the multiplication and following bitshift in the detune frequency calculation:

The lowest possible value is (1556 * 318) >> 14, which is 30. (lowest frequency, lowest detune without detune amount scaling).

The highest possible value is  (2386093 * 1800) >> 14, which is 262144. (highest frequency, highest detune amount).

The results are within a 24bit int range. However, the intermediate value from the multiplication is 4,294,967,400, which is much higher than what can be stored in a 24 bit int. Under normal conditions, this would make everything overflow. To be able to properly store the multiplication result, we need a division, and one that happens before the result is stored. Something strange is clearly going on.
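The numbers can be checked directly, using the pitch range estimated earlier (1556 to 2,386,093) and the integer detune table:

```python
INT24_MAX = (1 << 23) - 1      # 8,388,607
UINT32_MAX = (1 << 32) - 1     # 4,294,967,295

lowest = (1556 * 318) >> 14
product = 2386093 * 1800       # intermediate value before the shift
highest = product >> 14

print(lowest)                  # 30
print(product)                 # 4294967400
print(highest)                 # 262144
print(product > INT24_MAX)     # True, far outside a 24-bit int
print(product - UINT32_MAX)    # 105, just above the uint32 range
```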

Side note: The max result is even slightly higher than what can be stored in a uint32 (4,294,967,295). At the same time, it's so amazingly close that it's hard to believe that it's just random? And actually, if we go back to the highest frequency, it's really 2386092.942. With that fractional result, the product is 4,294,967,295.6 - a mere 0.6 above. This is too weird to be a coincidence? Also, it turns out that the max frequency isn't 12544, it's slightly lower. That would keep the detune within a uint32 range. Interesting.


Now, lets go back to the equation and reintroduce detune:

int24_t voice_detune = (detune_table[i] * (pitch * detune)) >> 7

pitch * detune, let's call it pitchWithDetuneAmount will be calculated first. Max pitch is 2386093, so detune may be up to 3 without overflowing. That doesn't make much sense.

Next,  detune_table[i] * pitchWithDetuneAmount is calculated. detune_table[i] is at most 1800, so it will overflow if pitchWithDetuneAmount is larger than 4660.

Finally, everything is divided by 128 

And all of this, with detune at max, should not be larger than 0.10986328125 * pitch.
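The two overflow limits mentioned above follow from the signed 24-bit maximum. A quick check, assuming plain integer multiplication with no implicit shifts:

```python
INT24_MAX = (1 << 23) - 1   # 8,388,607

max_pitch = 2386093
max_coeff = 1800

# Largest detune that keeps pitch * detune inside a signed 24-bit int
print(INT24_MAX // max_pitch)   # 3

# Largest pitchWithDetuneAmount that keeps the product with 1800 inside 24 bits
print(INT24_MAX // max_coeff)   # 4660
```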

In a normal situation, we should divide detune table by 2^14, and detune by a maxDetune to make it into a ratio between 0 and 1. It is highly probable that maxDetune would be a power of 2 as well, to make division a bitshift here too.  

Side note: maxDetune needs to be high enough to represent the curve shown by Szabo, at least if there are 128 different values not linearly spaced apart.

But the mystery remains. Where have the remaining divisions gone? We see some division (>>7 is the same as / 128) but that's not enough and it's not in the right place to prevent overflow.

 

DSP magic

I admit it, I had to ask ChatGPT about this one. At first it was reluctant to admit that there is something that makes division/bitshifts superfluous, but then everything dropped into place!

Enter multiply-high

DSPs have two major tasks - summing and scaling. Summing is +, and scaling is multiplication followed by a division. 

In fact, scaling is so important that most DSPs do multiplications in a slightly different way. They multiply the two numbers into an accumulator with twice the bit count of the factors, but then only return _the high order_ bits. E.g. if it multiplies two unsigned uint16_t variables a and b, it would store the intermediate result in a uint32_t, but then only return the 16 MSB. This is equal to bit shifting >> 16 or dividing by 65536. If we let a be our signal value, and b a scaling factor, this turns b into a scaling between 0 and 1!
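The 16-bit example is easy to model. This is a generic sketch of an unsigned multiply-high, not something taken from the ESP:

```python
def mulhi_u16(a: int, b: int) -> int:
    """Unsigned 16 x 16 multiply that returns only the upper 16 bits of the product."""
    assert 0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF
    return (a * b) >> 16


signal = 40000
print(mulhi_u16(signal, 0))        # 0      (scale 0.0)
print(mulhi_u16(signal, 32768))    # 20000  (scale 0.5)
print(mulhi_u16(signal, 65535))    # 39999  (scale just below 1.0)
```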

So there you go, we get free bit shifts, invisible in the code.

In other words, the code the Usual Suspects are showing is not normal C code, it's DSP code (of course...), meaning the * does not do what it normally does, it also bit shifts.

 

Implicit bit shifts 

Lets go back to the code again and see how this works out

int24_t voice_detune = (detune_table[i] * (pitch * detune)) >> 7

 

First we do 

pitch * detune

which we called pitchWithDetuneAmount earlier.

By making detune max equal to the bitshift included in *, it becomes a ratio between 0 and 1. Hooray! 

Then we do 

detune_table[i] * pitchWithDetuneAmount

Again, * will introduce bitshifting.


If we go back to the paragraph about multiply-high, an int24_t * int24_t would result in a 48bit intermediate, of which the upper 24 are returned (it may be slight differences working with signed ints but the principle is the same).

Shifting >> 24 is fine for detune amount. It would mean we could use the full 24 bits as a factor, getting any detune amount curve we could possibly want.

But shifting detune_table * detuneAmount by 24 is too much. It should only be 14, and even then 7 of the shifts are done outside the parenthesis.

Now, it's not important exactly how the bit shifts are done to understand the code. We can just accept that the code

int24_t voice_detune = (detune_table[i] * (pitch * detune)) >> 7

is equal to

pitch * (detune / maxDetune)* detune_table[i] / 2^14

without overflows etc. It is only important if we want to run the exact code and use the same range for detune amount.
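As a reference model, the expression can be written with floats, where no overflow can occur. maxDetune is unknown, so the value 127 used here is purely an assumption for illustration, and the function name is my own.

```python
def voice_detune_ref(pitch: float, detune: int, max_detune: int, coeff: int) -> float:
    """Float reference: pitch * (detune / maxDetune) * detune_table[i] / 2^14."""
    return pitch * (detune / max_detune) * coeff / 2**14


MAX_DETUNE = 127   # assumption, the real range of detune is not known
pitch = 1_000_000

print(voice_detune_ref(pitch, MAX_DETUNE, MAX_DETUNE, -1800))   # -109863.28125
print(voice_detune_ref(pitch, 0, MAX_DETUNE, -1800))            # zero detune
```

With detune at max, the result is 0.10986328125 * pitch in magnitude, which is the largest float coefficient from the presentation.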


Different bit shift?

There is however a possibility that the DSP doesn't actually shift by 24. Maybe it shifts by 7? That would make  

(detune_table[i] * pitchWithDetuneAmount ) >> 7

the same as

detune_table[i] * pitchWithDetuneAmount >> 14 

in a normal system.

It would mean that detune amount has to be 7 bit if it uses the same multiplication-high, leaving 128 steps of detune amount. That does not work well with the Szabo curve, but it is possible. Or maybe the C code is just inaccurately translated from assembly and that they use different multiplication operations.


The DSP (ESP) emulator code for the JE-8086 is on github, so we can peek at what it actually does. I have not studied it in detail, but there are traces of a configurable bit shift multiplication.

Studying https://github.com/dsp56300/gearmulator/blob/main/source/ronaldo/esp/esp.hpp may shine some light on this.

multResult in esp.hpp shifts the result after multiplication, 5, 6 or 7 places. It uses two bits of the instruction to select the shift, and 0,0 will default to 7 bit shifts. That sounds exactly like what we are looking for.

So, while not proving that 7 is the correct answer here, at least it makes it plausible that the ESP does in fact use a different shift than 24. 

UPDATE: The multiplication (kMAC in the code) does two bitshifts. First, it shifts the second term by >> 16, meaning it only uses the 8 MSB. THEN it does an up-to 7 bit right shift. In other words, detune and spread are not 0-127, they are the full 24bit range but only the 8 MSB are used. There is also a double precision multiplication available, so it is possible that it is used for higher resolution.

Something that may support this theory is that on a slide about the ESP, it says that it has a 24 x 8 bit multiplier. This seems to indicate that it does NOT have a 24 x 24 bit multiplier, and that, combined with signed arithmetic where one bit of the 8 bit variable is used as sign, would make a shift of >> 7 quite plausible and 7 bit (positive value) detune the way to go. 

It does however leave a question as to how the multiplication of detune_table and pitch works since both are definitely > 128. Perhaps it does multiple 24 x 8 multiplications?

--> It looks like it is possible to do that. The final bitshift will always be >> 7 which makes the last >> 7 explainable. This would make the detune_table[i] * pitch >> 14 multiplication possible. Detune and spread would still have to be max 127.

Example: 24 × 24 multiply-high using three 24×8 blocks

According to ChatGPT. Test this!

Let the 24-bit multiplier be split into bytes:

B = b2·2^16 + b1·2^8 + b0

You compute:

P0 = (A·b0) >> 7

P1 = (A·b1) >> 7

P2 = (A·b2) >> 7

Then re-align:

Result = P0 + (P1 << 8) + (P2 << 16)

Substituting:

= A·(b0 + b1·2^8 + b2·2^16) >> 7 = (A·B) >> 7 (approximately)
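Since this came with a "test this" note, here is a quick check of the three-block suggestion. It treats B as a non-negative 24-bit value split into unsigned bytes, which is my assumption, and it tests the claim exactly as written, i.e. against (A·B) >> 7 and not against a 24-bit multiply-high.

```python
def three_block(a: int, b: int) -> int:
    """Three 24 x 8 partial products, each shifted >> 7, then re-aligned."""
    b0 = b & 0xFF
    b1 = (b >> 8) & 0xFF
    b2 = (b >> 16) & 0xFF
    p0 = (a * b0) >> 7
    p1 = (a * b1) >> 7
    p2 = (a * b2) >> 7
    return p0 + (p1 << 8) + (p2 << 16)


a, b = 1000, 0x010203
print(three_block(a, b))   # 462615
print((a * b) >> 7)        # 516023
```

In this example the two results differ by 53408, so "approximately" is doing a lot of work: the bits truncated from P1 and P2 are scaled up by 2^8 and 2^16 when re-aligning. When A is a multiple of 128 nothing is truncated and the two results are equal.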

Update: Double precision multiplication

The ESP supports "double precision" multiplication. I have not yet fully understood the result but essentially it does what is suggested above - it first multiplies A with the 8 MSB of B. It then multiplies A >> 7 with (B >> 9) & 0x7F, i.e. bits 15 to 9 (0 indexed) of B.

first time: acc += ((mulInputA_24 * (mulInputB_24 >> 16)) >> shift)        

second time: acc += (((mulInputA_24 >> 7) * ((mulInputB_24 >> 9) & 0x7f)) >> shift)

... I am missing something here, TBC. 

Spread 

There is a third multiplication in the code, saw[i] * spread. Just as with detune, this looked very strange and would lead to an overflow in a normal system. But with multiply-high this too becomes a scaling from 0 to 1, just as we needed. It could, as detune, be a value between 0 and 127 and work fine with >> 7. Again, it's not important to the understanding of the code, we can just accept that it's a scaling factor.

 

Conclusion

There are some questions left unanswered. Shifting by 7 on * means that detune_table[i] * pitch may still overflow (that 32bit thing above, remember), and the curves of detune and spread can't be explained properly. And finally, as mentioned in the previous post, summing of the seven saws will overflow.

In general, however, the code looks like it could do exactly what we think. If we were to reimplement it we would just take care of these issues - increasing sum to 32bit and doing the appropriate bits shifts manually, and selecting whatever resolution for detune and spread that we want. 

The only thing I cannot explain at the moment is how Szabo could see an attenuation of the center oscillator when doing mix (spread), as that is not part of the code. Perhaps it is some kind of normalization effect, that the center oscillator contributes less to the total. Guess that one just has to remain a mystery for the time being. 

Thursday, January 8, 2026

The super saw code from the Usual Suspects

At the 39C3 conference, the Usual Suspects talked about how they reverse engineered the Toshiba DSP chip from the JP80x0. In itself an incredible feat, and a super exciting talk, but the one thing that REALLY caught my interest, was what they claim to be the code for the original super saw.

https://www.youtube.com/watch?v=XM_q5T7wTpQ&t=1804s 

They describe it as simply 7 saw waves, high pass filtered, with detuning, running at 88.2kHz to prevent aliasing within the audible range (?).

The code even shows the detuning coefficients, and they state that it's integer maths, making them a bit hard to get right.

 

Now, of course I had to see if I understand the code. Here is a screenshot:

 

Not much. I assume next is run once per DAC update, i.e. 88200 times per second.

The saw oscillators are simply 24bit signed integers used as accumulators. For every round, "pitch" is added to the accumulator. Once the value exceeds the max value that can be stored in a 24bit int, it overflows and wraps around to the most negative value. This way, by continuously adding to the accumulator, we end up with a saw wave. 
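Python integers do not overflow, so to play with this the 24-bit wrap has to be done by hand. A minimal sketch (wrap24 and saw_samples are names I made up):

```python
def wrap24(x: int) -> int:
    """Wrap an integer into the signed 24-bit range, like a hardware overflow would."""
    return ((x + (1 << 23)) & 0xFFFFFF) - (1 << 23)


def saw_samples(pitch: int, count: int):
    """Generate saw samples by repeatedly adding pitch to a wrapping accumulator."""
    acc = 0
    out = []
    for _ in range(count):
        out.append(acc)
        acc = wrap24(acc + pitch)
    return out


print(wrap24(0x7FFFFF + 1))          # -8388608, wraps to the most negative value
print(saw_samples(3_000_000, 5))     # rises, then wraps to a negative value
```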

Oh, btw - looking at the code, the initialization of the array is a bit strange. This being a global array, it should automatically be initialized to all 0s. {0} explicitly sets the first element to 0, why is that needed?

Let's for a start ignore detuning. If we set detune to 0, the whole voice_detune parameter goes away and saw[i] is just incremented by pitch for every cycle.

Also, let's set spread to 1, so all oscillators have the same amplitude.

 

Assuming the saw waves are in perfect phase, summing them would give us a saw wave that increases 7 times faster than a single wave. But then there is something weird. 

sum is also defined as an int24. My only way of understanding this is that it will overflow too, just like the saw wave accumulators. And that would lead to a saw wave with the same amplitude as the individual waves, but with a frequency seven times higher!

Lets reduce the number of oscillators to 2 and introduce a phase shift of 25%. Without overflowing, this would lead to some tops higher than max, some lower than min and some cycles where the amplitude is less than min and max. But with overflowing, the parts above and below max/min fills in the gaps, and once again we're back to having a single waveform with a 2x frequency but the same amplitude:

Red horizontal lines are where the sum accumulator overflows.

 Now, the PHASE of the output wave is different from the initial wave. 

Here is a way of thinking about this. 

For every step, each saw wave contributes "pitch" to the sum. The saw waves wrap, but the rest of pitch will be added to the bottom. This is similar to having a single saw wave with 2*pitch increase for every step.

Now consider different pitch values for the two saw waves (=different frequencies). Each wave still contributes its pitch to the sum, creating a single saw wave with pitch equal to the sum of the two other saw waves.

This extends to the rest of the saw waves, adding another saw wave just adds its pitch to the sum. In the end, the seven waves end up as a single wave with its pitch being the sum of all the pitches. 
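This is really just modular arithmetic, and it can be checked numerically. The sketch below assumes all seven saws start at phase 0 and are summed with a gain of 1. The pitch values are arbitrary numbers I picked, not the real detuned pitches.

```python
def wrap24(x: int) -> int:
    """Wrap an integer into the signed 24-bit range."""
    return ((x + (1 << 23)) & 0xFFFFFF) - (1 << 23)


pitches = [100_000, 101_941, 98_059, 106_226, 93_719, 110_742, 89_014]

accs = [0] * len(pitches)   # seven wrapping saw accumulators
single = 0                  # one saw whose pitch is the sum of all pitches
all_equal = True
for _ in range(10_000):
    wrapped_sum = wrap24(sum(accs))           # int24 sum that also overflows
    all_equal = all_equal and (wrapped_sum == single)
    accs = [wrap24(a + p) for a, p in zip(accs, pitches)]
    single = wrap24(single + sum(pitches))

print(all_equal)   # True
```

With different starting phases the only change should be a constant offset on the single saw.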

Here is an example. The grey line is all saws, with slight detuning, summed up without overflow (and plotted in a chart where y is at most 8 times that of a single saw wave). The blue line is the same waves summed with overflowing.

The horizontal lines divide the range into 8 parts, each corresponding to "one overflow".

 

 

If you look carefully, you can see that at every discontinuity, the part protruding above a grey line, is exactly the same as the part missing from the bottom and down to the previous grey line. When using overflow (or modulo), the top will wrap and be added to the bottom. Any of the divides that are empty, simply goes away in the wrapping, and we end up with the blue line.

 

Ok, that was a convoluted way of saying -  I don't understand how the sum code is supposed to work. Saw waves of any frequency will always combine to a single saw wave of higher frequency if the sum also overflows. As neither the frequency nor the detune of a wave changes, the sum wave will stay unchanged.

If sum was a 32bit int, this would work fine and we would get an ever changing combination of the waves. 

Detune and detune coefficients

Now, as for the other parts of the code, they have me confused as well, but maybe they use overflow as part of a trick? 

In other parts of the presentation, a comparison between Adam Szabo's coefficients and the "real" ones is done. The coefficients are fractions, small ones too. To calculate a detune frequency, one uses 

basefrequency * (1 + coefficient)

or 

basefrequency + basefrequency * coefficient.

In the code above, 

saw[i] = pitch + voice_detune

or 

saw[i] = pitch + ( detune_table[i] * ( pitch *  detune )) >> 7

Now, I presume the parentheses are placed the way they are for a reason, perhaps the parts inside the parentheses overflow in a certain way that makes things work out, but substituting /128 for >> 7 and reordering gives us

saw[i] = pitch * (1 + detune_table[i] * detune / 128) 

The lowest coefficients are 128, and the lowest integer value for detune is 1. Following that, we end up with 

saw[i] = pitch * 2

This is clearly wrong. Perhaps the overflow inside the parentheses, and the values chosen for detune, will lead to something that, when shifted right by 7, is always much less than (and proportional to) pitch?

As for the coefficients themselves, the individual proportions are not the same as for the fractional coefficients, so something strange is going on there as well.


Spread

Finally, we have "spread"

The outer saw waves are multiplied by spread before adding them to the sum. Presumably, this is the same as "mix" on the JP8000. 

But again, being integers, spread can only INCREASE the amplitude of the saw wave (or perhaps rather the pitch, since the product of saw[i] * spread will overflow). 

In Adam Szabo's paper, the center oscillator amount is reduced linearly, while the outer oscillators are increased by a curve:
 

Perhaps there is some kind of normalization going on, where, by increasing the outer oscillators, the relative contribution from the center one is decreased? 

Again, I'm confused. 

I have a feeling that at least one trick is used here. Since division is probably extremely expensive on a DSP which is built for Multiply and Accumulate, perhaps one instead uses multiply + overflow? (bitshift >> 7 is used to divide by 128, but this only works for powers of two).

 

I really wish someone could confirm a couple of things.

First of all, is the code completely correct - while I don't understand it at the moment, at least that would give me more confidence in looking for the solution

and

confirmation that the output of this code is indeed "samples" that, after filtering, will be output to a DAC (or the next DSP in the case of the JP8000).


My analog super saw

Years ago I built an analog 7 saw oscillator with control circuitry that emulated the control curves seen in Adam Szabo's paper. I had the curve for the detune pot using a three leg approximation, and something that looked close to the mix curves. 

I actually built the whole thing before I realised

1) Mixing the saw waves would lead to clipping if the headroom was not high enough and

2) Part of what makes the supersaw sound the way it does, is that it's digital (D'oh).

 

Adam wrote a few things too, in the paper or on a forum, I can't quite remember. Quoted from memory:

- The naive approach of generating multiple saw waves would not work as the JP8000 was not powerful enough

- He had discovered some kind of trick that Roland would not tell the world about.

Not sure if that was all smoke and mirrors, but I was hoping that this last trick was somehow related to how to prevent the overflow while still keeping gain high.

 

Oh well. Time to go to bed.  

Saturday, January 3, 2026

Power-up sequencing

As there is a significant amount of power needed for the synth, it may be a good idea to not turn on everything at once.

Fortunately, the DC-to-DC converters have CTRL-pins on them. Setting this to GND disables the output, and setting a logic high turns on power.

I have not found the exact voltage needed for logic high on the converters I use (YLPTEC URA2412YMD-20WR3), but elsewhere I've found 3.5V for similar converters. Testing with direct connection to the 3v3 output pins on the Teensy was successful, but I'm still a bit skeptical about doing this "in production", in case it doesn't work.

The solution is to level shift the signals. At the same time, I don't want to invert the signal, so a simple transistor level shifter would require multiple parts.

My solution: Use TTL-compatible CMOS, the 74HCT series. By connecting the supply to 5V, the output logic will be 5V. At the same time, the input is 3v3 compatible. I've tested this with a 74HCT08 quad AND gate and it works perfectly.

Adding 4.7k resistors to the output makes sure the output stays at GND as long as the power is off. (These are not tristate outputs, so I'm not sure this is a good or even necessary solution. Perhaps a pulldown on the input, instead or in addition, is better.)

I've ordered two CMOS chips for testing:

74HCT164, an octal output shift register - this lets me control all voice card power using two pins of the microcontroller. There is no inhibit though, so the outputs have to be turned on in sequence

CD4504 - hex level shifter, with dual supply pins. 

We'll see what feels best later. I guess a power distribution board with some logic on it will be worth considering.
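To make the "turned on in sequence" point concrete, here is a small behavioural model of a serial-in, parallel-out shift register driven from a data pin and a clock pin. It is a simulation only, not firmware; the class and function names are invented for the example, and it assumes the register starts with all outputs low (for instance after a reset).

```python
class ShiftRegister164:
    """Behavioural model of an 8-bit serial-in, parallel-out shift register."""

    def __init__(self):
        self.outputs = [0] * 8  # Q0..Q7, assumed low after reset

    def clock(self, data_bit):
        # On each clock edge every bit moves one place and Q0 takes the data pin.
        self.outputs = [data_bit] + self.outputs[:-1]
        return list(self.outputs)

def power_up_sequence(card_count):
    """Hold the data pin high and clock once per card to enable them one by one."""
    register = ShiftRegister164()
    return [register.clock(1) for _ in range(card_count)]

for step in power_up_sequence(3):
    print(step)
```

Real firmware would wait between clock pulses so each converter can settle before the next one starts.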

Monday, December 29, 2025

Secondary SMPS testing

I ended up ordering a batch of five DC-to-DC converter boards from JLCPCB. They take 9-36V in and give +/-12V out, at 833mA per rail max. The voice cards draw 500mA per rail so that's about perfect.

 

To test everything, I found a 19V 2.4A power adapter from an old laptop. It measures 19.8V out which is fine. 

 

To keep the output within range of my Logic 2, I used a resistor voltage divider with 100k on top and 200k on bottom, which should give us 8V out for 12V in.

The DC-to-DC converter has no trimming option, so we're stuck with whatever it outputs. My multimeter says 12.05 and 11.98V, which I think is good enough for Rock'n Roll. 

I tried measuring the output, and while it's hard to know everything that affects the signal, here are my initial findings:

The output fluctuates between 7.797 and 7.807V, or 10mV. (or two steps on the Logic 2. The real fluctuation may be a bit higher).
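The divider arithmetic can be sketched as below. It assumes an ideal, unloaded divider (the input impedance of the analyzer is ignored), and the function names are made up for the example.

```python
def divider_output(v_in, r_top, r_bottom):
    """Voltage at the tap of an unloaded resistive divider."""
    return v_in * r_bottom / (r_top + r_bottom)

def rail_voltage(v_measured, r_top, r_bottom):
    """Scale a reading taken at the tap back up to the rail."""
    return v_measured * (r_top + r_bottom) / r_bottom

R_TOP, R_BOTTOM = 100e3, 200e3

print(divider_output(12.0, R_TOP, R_BOTTOM))
ripple = rail_voltage(7.807, R_TOP, R_BOTTOM) - rail_voltage(7.797, R_TOP, R_BOTTOM)
print(f"{ripple * 1000:.1f} mV at the rail")
```

With these resistor values every reading is scaled by 1.5, so a 10mV swing at the divider corresponds to about 15mV on the rail itself.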

 


 

It looks like the noise has a period of around 8us, which means about 125kHz. Again, it's hard to tell if this is real and if it is caused by the converter itself, but this is what I got.

10 cycles take 81us
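Converting those two readings to a frequency is a one-liner. The helper name is made up for the example; counting ten cycles simply averages out some of the reading error.

```python
def frequency_hz(duration_s, cycles=1):
    """Frequency of a periodic signal from the time taken by `cycles` periods."""
    return cycles / duration_s

single = frequency_hz(8e-6)          # one period read off the trace
averaged = frequency_hz(81e-6, 10)   # ten periods
print(f"{single / 1e3:.1f} kHz, {averaged / 1e3:.1f} kHz")
```

The ten-cycle reading lands slightly lower, at roughly 123.5kHz.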

 

I don't think I have any way of measuring ripple more closely than this, so next up is seeing if this is audible. 

Sunday, December 28, 2025

Dell adapters - are they useable?

I found several Dell laptop chargers on Finn ("Craigslist"), rated at 180W and 240W, 19.5V. 

I have no idea if they're any good when it comes to switching noise, but I've bought a 180W one for testing.

The adapters have a 7.4mm/5.0mm barrel connector, with an inner pin of < 1.0mm. I've found a 7.5mm/5.0mm connector from JLCPCB rated at 15A/30V, which should be ok even for 240W:

https://jlcpcb.com/partdetail/XUNPU-DC_505ACL060/C30607513

Also, I've found second hand wire harnesses from Dell laptops, that convert the barrel into 6 x 19.5V, 6 x GND and a single wire for signaling. It is hard to figure out what the ratings of these are. I've tried searching for the ones used in the top-end laptops, the ones most likely to use the 240W PSU. So far I've landed on the "0J60G1 / J60G1 / DC301015A00", which is used in some of the Dell Alienware PCs. 

I've ordered a cheap version off Ali Express, along with a 7.4mm/5.0mm PCB connector and a 7.4 to 4.5mm converter with 20AWG wires inside - these are too thin for my fully spec'ed synth, but I can cut the cable open to get access to the wires, letting me measure ripple etc. 

I may also end up cutting the plug off the 180W charger if I get impatient, it was only about $15 on Finn. 

As for the center signaling pin, this is quite interesting - it lets me read the type of charger plugged in - I could for example check if the wattage rating is high enough for the synth. Not a big deal but a nice addition if I want to do proper power management - after all, I think I have full control of powering up the secondary SMPS'es - if the PSU is too weak I can leave the voice card unpowered.

So - Dell adapters, are they usable for what I want to do? Time will tell...
 

Making the circuit tolerate switching noise

This is a big one. Even the secondary SMPS'es will have some switching noise, perhaps as much as 80mV. 80mV on the VCO pitch CV will be a disaster. 

Let's assume for a second that the ground rail will stay somewhat clean and the noise is only present on the 12V/-12V power rails (not sure if this is true though).

As far as I understand, most op amps will not care much about this noise, they go about their business regardless, so I will have to look at other places the supply voltages are used. One place immediately springs to mind - the base voltage for the 1V/oct input on the VCO. This is connected via a resistor to the power rail. I may be able to get around this by using a voltage reference instead.

Most other stuff in the circuit is hopefully more resilient to noise, as the noise will mostly affect the amplitude of the signal. We will probably have the same issue around filter frequency and resonance controls. 

In short, I have to go through the voice card and modules to look at where the power rails are used directly. 

I suspected this would come back to bite me - I've read several times that one should not use the power rails as reference voltages, but it's so darn convenient, they're already there. Oh well, you live, you learn... 

Sunday, December 14, 2025

It needs more power!

Each voice card draws approximately 500mA from +12V and another 500mA from -12V.

My current Doepfer PSU supplies 1200mA per rail, so I can drive two cards from that.

Also, I see that the rails sag a bit, to 11.6V when fully loaded. That may be trimmed but who knows.

 

Anyway, time to start looking for other solutions. Here are some quick google results:

DIY High Power Supply

https://aisynthesis.com/eurorack-high-power-supply-build-guide/ 

dual 2.5A 12V and single 2.5A -12V. Uses an off-the-shelf regulator module, URB2412LD-30WR3 from Hi-Link: https://www.hlktech.net/index.php?id=165, which is able to step down up to 24W to 12V. It's powered from a 19.5V 6.92Amp PSU

 

 

100W and 70W PSUs with some of the same style but with additional circuitry

https://konstantlab.audio/shop/hammerpwr-100w-eurorack-modular-power-supply/ 

https://konstantlab.audio/shop/seventypwr-70w-eurorack-modular-power-supply/ 

 

Another 5A variant:

https://www.exploding-shed.com/befaco-trolley-bus/100453

The issue with all these is that they don't give me 5A on the -12V rail.

 

A nifty breadboard version:

https://www.exploding-shed.com/clacktronics-proto-psu/100597

 

Another breadboard version that looks like it can be used for my dual ground standard (outer power rails are +/-12V, inner are ground)

https://www.exploding-shed.com/transient-modules-breadboard-supply/100407

 

----

I'm considering using separate regulators near every voice card.  

Perhaps an URA2412LD-20WR3 can be used, it's supposedly +/-12V, +/-833mA, which is plenty for each channel.

There is also a similar URB24_YMD-20WR3 which has half the footprint. I can't see what else is different.

 

---

DIY:

https://nozoid.com/diy-eurorack-power-supply/

https://sdiy.info/wiki/Comparison_of_Eurorack_DIY_PSUs 

https://metatronicmods.weebly.com/store/p3/PSU-linear-old.html

https://www.my-adaptor.com/a135a022p-rev03-m-84108.html - adapter 

 

Tuesday, November 4, 2025

Midi proxy/Voice assigner

A tiny but exciting win today - I wrote the first line for the XM8 main controller, more specifically a midi proxy/distributor that copies all incoming midi to separate outs. This will let me support multiple voice cards without rewriting the interface (yet).

More importantly though, the proxy does a round robin voice assignment, so instead of forwarding note on/off messages directly, it sends the next note on to the least recently used voice (and makes sure to send note off first, at least for now). This paves the way for polyphony, which is something I've literally been waiting a decade for!
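The assignment logic described above can be sketched roughly like this. It is a simplified model and not the actual controller code; the class name and the tuple format of the returned messages are invented for the example.

```python
from collections import deque

class RoundRobinAssigner:
    """Sends each new note to the least recently used voice."""

    def __init__(self, voice_count):
        # Leftmost entry is the least recently used voice.
        self.order = deque(range(voice_count))
        self.playing = {}  # voice index -> note currently sounding

    def note_on(self, note, velocity):
        voice = self.order.popleft()
        self.order.append(voice)  # now the most recently used voice
        messages = []
        if voice in self.playing:
            # Release the old note first so a voice never holds two notes.
            messages.append((voice, "note_off", self.playing[voice], 0))
        self.playing[voice] = note
        messages.append((voice, "note_on", note, velocity))
        return messages

    def note_off(self, note):
        for voice, sounding in list(self.playing.items()):
            if sounding == note:
                del self.playing[voice]
                return [(voice, "note_off", note, 0)]
        return []  # the note was already stolen or never assigned

assigner = RoundRobinAssigner(voice_count=2)
for n in (60, 64, 67):
    print(assigner.note_on(n, 100))
```

With two voices, the third note steals voice 0, so that voice gets a note off for the first note before the new note on.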

As for the communication, I use the serial TX and RX pins directly, from a Teensy 4.1 TX to a Teensy 4.0 RX. It worked immediately, no external components necessary.

It should also be possible to send MIDI at a higher bitrate, right now it runs at the standard 31250bps, I expect that at least 115200 should be problem free.

Don't cut the red wire! 
I'm actually reusing the original voice card controller prototype board for the MIDI proxy, as it has everything needed for receiving MIDI onboard and exposes pins for all the teensy pins. 

Next up is connecting two full voice cards. I can't wait, this is sooo cool!
 

Sunday, November 2, 2025

A bit about tuning the VCO

I've created a tuning algorithm that measures the VCO at 11 points one octave apart, and calculates the correct DAC voltage and steps between octaves. This is used to generate a lookup table that serves two purposes:

- It maps the 10.67 octaves available through midi to the 12.82 octaves available from the VCO

- It corrects tuning
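A rough sketch of how such a lookup table could be built from the octave measurements: linear interpolation between neighbouring octave points, one entry per MIDI note. The function name and the DAC values are invented for the example, and it assumes the first measured point corresponds to MIDI note 0. Notes above the last measured octave are extrapolated from the top segment.

```python
def build_note_table(octave_dac_values, notes=128):
    """octave_dac_values[k] is the DAC code measured for octave k above note 0."""
    table = []
    last_segment = len(octave_dac_values) - 2
    for note in range(notes):
        segment = min(note // 12, last_segment)
        low = octave_dac_values[segment]
        high = octave_dac_values[segment + 1]
        # A fraction above 1.0 extrapolates past the highest measured point.
        fraction = (note - 12 * segment) / 12
        table.append(round(low + (high - low) * fraction))
    return table

# Hypothetical, perfectly linear measurements: 5000 DAC steps per octave.
measured = [k * 5000 for k in range(11)]
table = build_note_table(measured)
print(table[0], table[6], table[12], table[120])
```

Interpolating linearly in DAC steps is reasonable here because a 1V/oct response spaces semitones evenly in voltage; any deviation between octave points is absorbed by the measured values.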

I've also made a looping quick measurement of the real frequencies of the oscillator using the expected DAC-voltage-to-frequency ratio. This allows me to very quickly tune the VCO response to as close as possible to a perfect 2x ratio between octave voltages - a full 11 octave measurement takes around 1s, so I get continuous feedback across the whole range while turning the trimmer pot.

 

There were a few pitfalls along the way. 

First of all, when the VCO hardware trimmer is tuned to the correct response, the VCO doesn't reach all the way down to the needed 8.18Hz. This has been fixed by adding a 1.5MOhm resistor from the secondary 1V/oct input jumper to -12V.

Second, I had a hard time getting a proper base frequency. I set the base note by measuring the frequency at 0V and then again at what should correspond to 1 octave up. This makes it possible to calculate the actual response (steps/octave), and based on this I guess the correct voltage for the base note.

It seems however, that there are some linearity issues at the very bottom of the response curve. By increasing the initial guess-voltage for the lowest frequency to slightly above 0V, I was able to get a consistent lowest-frequency guess, which is crucial for the rest of the tuning.
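The steps/octave calculation from two measurements, and the resulting guess for the base note, can be sketched as follows. The function names and the example numbers are invented; the sketch assumes an ideal exponential response between the two measured points.

```python
import math

def steps_per_octave(dac_low, freq_low, dac_high, freq_high):
    """Actual response in DAC steps per octave from two measured points."""
    octaves = math.log2(freq_high / freq_low)
    return (dac_high - dac_low) / octaves

def dac_for_frequency(target_hz, dac_ref, freq_ref, steps_oct):
    """Guess the DAC code for a target frequency from one reference point."""
    return dac_ref + steps_oct * math.log2(target_hz / freq_ref)

# Hypothetical readings: 9.0 Hz at DAC code 100, 18.9 Hz at DAC code 5100.
response = steps_per_octave(100, 9.0, 5100, 18.9)
base_guess = dac_for_frequency(8.1758, 100, 9.0, response)
print(round(response), round(base_guess))
```

Using a reference point slightly above 0V, as described above, keeps the guess away from the non-linear bottom of the response curve.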


At the moment, tuning takes around 6 seconds. Whenever tuning is started, I use the default untuned/expected volts per octave. It is possible that using the current values instead would be better and could lead to a progressively more correct tuning, though at the moment this seems unnecessary. 

Monday, October 13, 2025

Testing new cards

I'm testing the new cards that arrived this summer - in particular the digital voice card controller

Errors found

VCO: 

- Square wave is probably inverted, just like on the previous version of the waveshaper

- When properly tuned for 1V/oct, the lowest frequency is > 8.1758Hz, which means the VCO cannot be both calibrated to 1V/Oct using trimmers AND reach all MIDI notes. A fix is to add a 1.5MOhm resistor from the summing point (or before the secondary 1V/Oct input jumper) to -12V.
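For reference, the 8.1758Hz figure is the frequency of MIDI note 0 in standard equal temperament with A4 (note 69) at 440Hz. A short sketch, with a made-up function name:

```python
def midi_note_hz(note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number, with A4 = note 69."""
    return a4_hz * 2 ** ((note - 69) / 12)

print(f"{midi_note_hz(0):.4f} Hz")    # lowest MIDI note
print(f"{midi_note_hz(127):.1f} Hz")  # highest MIDI note
print(f"{128 / 12:.2f} octaves")      # span of the 128 MIDI notes
```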

To do

- Tune VCO

- Trim WS 

To test

- VCO frequency vs waveshaper and square wave generator, is the phase inverted?

- Lowest possible frequency for VCO, looks like trimming down won't go as low as DCOs?

- Recon-filter and wave output from digital board, including switching on the analog board

- Control pins on the digital board, though since midi works we know we can control everything. 

What works

- Distortion! :-D 

- Noise

- VCO waves in general

- WS wave mixing is now perfect 

- Midi input on digital card

- DCO tuning

- CV DAC control with 5V and 50MHz signal (3v3 also works but only at 40MHz)

- Rear soldered header on the Teensy 4.0 

- I2C port expanders 

- Winbond W25Q128 flash chip integration from Teensy. Not tested with DCO but uses same circuit. Was tested with W25Q128_test.ino (adapted from https://github.com/msnbrest/W25Q128)

- PCB output /recon filter on digital board. However, output is 0-2V, i.e. not bipolar and not a big enough range.

Sunday, October 5, 2025

UI Progress

I've cleaned up most of the UI now, though it's far from finished. The output and global fx sections will not be completed until the hardware is developed, the same goes for the center console. Still, it's starting to look pretty good.

Right now, the design is strongly inspired by (some would say ripped off from) the Waldorf Quantum Mk I and II, Matrixbrute and Polybrute, with a dash of Moog One and Alesis Andromeda - incidentally, all those synths are designed by Axel Hartmann...

Colors are hard though, and even harder when various degrees of shininess are not visible in my current model. 

Here are some variations:

 

A Polybrute blue version, with a large black acrylic sheet in the center

 
A more Quantum-esque center display





Slightly darker module color

  
A black with lighter modules version


A version closer to the original Quantum MkI or Matrixbrute, with black modules on a lighter gray background


 
A more color coded version, dark modules are "audio", mid gray are modulation and light gray are global. Keyboard stuff is now directly on the background

Same as above but more Quantum-esque display again


 

Introduced vertical separation lines. They don't line up with the clock/route divider which annoys me. It can be fixed by switching osc modulators and note/detune, but that feels wrong too. I like the divider between noise and ring mod though




Added waveform icons


Wednesday, September 24, 2025

Cleaning up the UI

I'm about to make a prototype of the UI, and have been cleaning up the grid a bit. This is what it looks like right now:

 

Inspired by the Matrixbrute, I started adding boxes around the related parts. The spacings are way off, but it should give some idea as to how things are grouped:


 

I do have some problems with this though. The vertical slits don't line up, in fact, they make the whole thing look a bit messy. 

I did also make a ready-for-lasercutting mode, that adjusts all line thicknesses and changes pot and button sizes to fit the shafts, ready for prototyping. Here is the left panel again:


 Now, to see if I could do any better, and after looking at the Waldorf Quantum, I came up with the following (crude) version


 It follows a slightly more logical structure, with the sources top left, then the mixer and post-mixer effects. It also groups the LFO and Arp in two separate two-row areas which gives a nice balance. There is not quite enough room for everything, especially the Arp controls, and I've decided to remove the bit crusher level pot as it isn't particularly useful. I do like this quite a bit. 

Then I decided to color the current version too, and to my surprise, I think it looks cleaner than it did with the "real" colors:

I think I can make both versions work. I will prototype the finished version now, and perhaps I'll have time to revisit the second version in the future.

As for the right side - it also needs a bit of cleanup, there is no room for the frame here, and I guess I should redo the filter to work better visually on the 8 row grid if I choose to go with the modified left side. 

 

I also think I'll probably ditch the post-filter bit crusher as that can be done by a general DSP, and maybe I'll make a smaller group of pots and no screen (or small oled screens?) for the DSP part. I have even been thinking about doing per-voice FX, or at least chorus, though that would mean not using the FX send lines. We'll see.

Monday, September 22, 2025

Pot and button boards tested

I finally got around to testing the potentiometer and button breakout boards yesterday. They seem to work just fine.

Button board with integrated pullups and diodes on top, potentiometer board (six pots per 10p connector) on bottom

Breakout board for buttons, making it easier to connect 8 dupont connector buttons to a single 10p connector

While the buttons worked straight away, the pots gave me a bit of a headache. First time around, nothing worked. Turns out, the connection between the potentiometer and the 10p IDC connector is not particularly good. Pushing on the pot made it work, but releasing it made it lose connection again.

After comparing with the unmounted connectors I have in my drawer, I realised there are two types. One of them has the sprung contacts further out into the hole:


 

Top: Connectors from the drawer, the contact fills half the hole

Bottom: Connector that failed. The spring fills less of the hole - also, it looks like the edges of the pot legs, being slightly too wide, have dug themselves grooves at the side of the connector, meaning they may not be able to touch the contacts.

Pot inserted into the working connector

Pot inserted into the non-working connector. When pushing the pot down, the legs are pushed up slightly inside the connector, making contact.

pot inserted into the non working connector (slightly more rounded than the working one).


Friday, August 29, 2025

Adding overdrive to the ladder filter

If I need to add overdrive to the ladder filter, I can free up one CV channel already routed to the board by using the same CV for trimming 2 and 4 pole output. I just need to store the trimmer value for each and switch between them when switching mode.

Another thought is - the little phatty does not use trimmers on any of the OTAs, perhaps one can get away without them? 

Thursday, August 28, 2025

Little Phatty overdrive

I can't remember if I've written about this before, but I've certainly looked at it.

EDIT: Turns out I did, in great detail! https://atosynth.blogspot.com/2024/01/moog-overload-circuit.html

I'm trying to retrofit overdrive to my ladder filter, so I had a closer look at how the little (and slim) phatty does it. The slim phatty schematics are available online but the circuit is spread across multiple pages so it's a bit harder to see what is going on.

Here is a simplified schematic:

 

There are a few surprises here:

- there is no feedback from the filter output to input, which is the way the minimoog achieves overdrive.

- it uses a pretty standard soft clipping circuit with CV-controllable clipping - the overdrive of the minimoog filter, if I recall correctly, happens in the transistors of the differential amplifier at the end of the filter.

- the output of the overdrive is fed back to the oscillator mixer and returned to the soft clipping circuit

- the mixer is not a normal inverting summer op amp - it's just an op amp buffer, with all inputs connected to the positive input. This is possible because the OTA outputs are current, not voltage outputs

- the differential amp of the filter is realised using op amps, which won't give soft clipping when overdriven. 

- the filter has an additional output gain OTA.

It's a pretty neat circuit, and all three OTAs - feedback, distortion and post-filter gain - are driven from the same CV. 

All CVs are biased in various ways, to +5 or -5; I haven't studied exactly how they work in conjunction. Neither is it clear to me if the post filter OTA contributes to the overdrive in any way or if it just makes up for lost gain during overdrive.

 

Major take-away

Disregarding the post filter gain, all distortion happens before the filter, just as I'm currently doing in my synth. I don't have the additional feedback OTA though, whatever that does. 

 

Monday, August 25, 2025

JX-8P cross modulation

I've visited this before, but I got a bit confused when looking at the circuit, so here are some details of what is going on.

JX-8P cross modulation is in fact amplitude modulation of DCO-1 by DCO-2. 

Here is the circuit:

 

The strange four-triangle symbols are switches. The three NPN transistors work as VCAs. Two of them are controlled by the CV/sample-and-hold circuits and control the output of each DCO to the mixer. The third, AM VCA, is controlled by the output of DCO-2.

As a side note, noise is injected through the 4.7k resistor in the top center, instead of DCO-1 (presumably; they could be mixed, but that would make the whole setup a bit strange).

Standard mode

When switches D and C are closed (they receive the same control signal), the output of DCO-1 is routed to the DCO-1 VCA through D. C grounds the base of the AM VCA so it shuts off. In practice it works out something like this:


Cross modulation

When D and C are open, the DCO signal is routed through the AM VCA, like this:

 

Note that in both cases, the output of DCO-2 can still be added to the mix - but I'm not sure if the synth actually allows this or if it would have any use in practice.

DCOs

I was confused about the DCO saw generator too, for a little while - I couldn't see how it could have a variable charge current when the negative input was connected through a 100k resistor to the negative supply. That's of course wrong - the downwards arrow points to IC53 (and IC48 on the output) which are the "DCO Self Adj" (IC48) and the "Analog DMUX" (IC53, i.e. CV), so the charging current is CV controllable.

Further reading 

After studying the circuit I found a great, more detailed explanation of what is going on over at Electric Druid - most interestingly, the saw wave is rounded after the VCAs. It also states that the output of the AM VCA is turned off when the input is negative, and speculates that this means that the saw wave is in fact never below GND. This is highly likely, as the control voltages from IC48 and IC53 probably never go below 0.

https://electricdruid.net/roland-cross-mod-metal-sync/