The Roland/Toshiba TC170C140 ESP is a 24-bit DSP, presumably fixed point. 24-bit fixed-point DSPs typically store numbers as Q1.23, meaning one sign bit and 23 "value bits". The numbers, while strictly speaking integers, are treated as fractions from -1.0 to just under +1.0 (about +0.9999999).
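To make the Q1.23 idea concrete, here is a small Python sketch of the conversion between ordinary fractions and the 24-bit integers. The helper names `to_q23` and `from_q23` are mine, and the saturation at the limits is an assumption for illustration, not something known about the ESP.

```python
Q = 23  # number of fractional ("value") bits in Q1.23

def to_q23(x: float) -> int:
    """Convert a fraction to a Q1.23 integer, saturating at the 24-bit limits."""
    n = int(round(x * (1 << Q)))
    return max(-(1 << Q), min((1 << Q) - 1, n))

def from_q23(n: int) -> float:
    """Interpret a 24-bit signed integer as a Q1.23 fraction."""
    return n / (1 << Q)

half = to_q23(0.5)            # 0x400000
lowest = to_q23(-1.0)         # -0x800000, the most negative value
highest = from_q23(0x7FFFFF)  # 1 - 2**-23, just under +1.0
```

Note that +1.0 itself cannot be represented, only the value one step below it.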
The ESP has an internal 24 x 8 multiplier, and does multiply-high, meaning it keeps a full multiplication result and then returns the upper 24 bits of the result (excluding any extended sign bits) by doing a bit shift right after the multiplication.
It is common for DSPs to do multiply high, as it makes the second factor behave like a number in the range -1 to 1 (which is exactly the fixed-point interpretation). This turns the multiplier into a scaler: the result always lies between plus and minus the multiplicand.
Multiply high in a 24bit fixed point DSP would multiply two 24bit numbers, then shift >> 23 and store the result as a 24bit integer (keeping the sign bit and 23 MSB of the multiplication).
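As a sketch of what a full-precision multiply high looks like, here it is in Python, where `>>` on negative integers is an arithmetic shift. The function name `mul_high_q23` is made up for this example.

```python
def mul_high_q23(a: int, b: int) -> int:
    """Multiply two Q1.23 integers and keep the sign plus the 23 MSBs."""
    return (a * b) >> 23

half = 1 << 22      # 0.5 in Q1.23
quarter = 1 << 21   # 0.25 in Q1.23

# 0.5 * 0.5 = 0.25, and scaling by a negative factor flips the sign
product = mul_high_q23(half, half)
negated = mul_high_q23(half, -half)
```

The second factor acts as a scaler here: `product` can never be larger in magnitude than `half`.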
The fact that the ESP multiplier is 24 x 8 does not mean that the second factor of the multiplication, as seen from the system side, is 8bit. It is actually 24 bits, but only the upper 8 bits (sign + 7 bits) are used during multiplication. The rest of the bits are discarded beforehand using >> 16, to make it fit within an 8 bit variable.
After the multiplication, the result is shifted a further 7 bits for a total of 23, making it a normal 24bit multiply high. However, the precision of the multiplier is only 7 bits + sign, i.e. only 256 steps, from -128 to +127, are available. This, of course, will often not be good enough.
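The effect of throwing away the lower 16 bits of the second factor can be sketched like this. It is a plain emulation of the description above, not verified against real hardware, and `mac_24x8` is a name I made up.

```python
def mac_24x8(a: int, b: int) -> int:
    """Emulate the 24 x 8 multiply: only sign + 7 MSBs of b take part."""
    b_hi = b >> 16          # signed 8 bit value, -128..127
    return (a * b_hi) >> 7  # remaining 7 bits of the total 23 bit shift

a = 1 << 22                 # 0.5 in Q1.23
b = 0x123456                # roughly 0.142 in Q1.23

full = (a * b) >> 23        # full 24 x 24 multiply high: 596523
coarse = mac_24x8(a, b)     # 24 x 8 version: 589824
error = full - coarse       # 6699 LSBs lost to the discarded bits
```

Every B from 0x120000 to 0x12FFFF gives the same `coarse` result, which is exactly the 256-step resolution described above.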
Multiplication "by hand"
Let's take a step back and consider how multiplication is done by hand, to get a more intuitive feeling of what precision loss means:
67 * 52
First multiply 6 by 52 and align with the 10-place
6 * 52 = 312
Then multiply by the 7 and align with the 1-place, below
7 * 52 = 364
Finally, take the sum:
  312_
+ _364
= 3484
Now let's pretend we're doing a multiply-high in a decimal system. We only have two places available to store the result, so we divide the product by 100 afterwards and drop the remainder. The result would be 34.
Then consider a multiplication where the precision of the first factor is a single place only - the resolution is 10s, not 1s - so 67 is seen as 60.
In this case, the result of the multiplication would be 3120, and the final result 31. We've lost precision.
This is similar to what happens with the 24 x 8 >> 7 multiplication in the ESP.
But there is something interesting here. The full multiplication is done in two steps, one for the 1's and one for the 10's of the first factor, and the ESP is only doing one of them in its normal multiply and accumulate (MAC).
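The decimal example can be replayed in a few lines of Python, which makes it easy to see that the "missing" part is just the second partial product. Variable names are only for illustration.

```python
a, b = 67, 52

tens_part = (a // 10) * 10 * b   # 60 * 52 = 3120, the "10s" step
ones_part = (a % 10) * b         #  7 * 52 =  364, the "1s" step

full_high = (a * b) // 100       # 3484 // 100 = 34
tens_only_high = tens_part // 100  # 3120 // 100 = 31, the lower digit is ignored

# Adding the second step back restores the full product
restored_high = (tens_part + ones_part) // 100  # 34 again
```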
Double precision multiplication
Fortunately, the ESP provides a second multiplication instruction, called DMAC (Double precision multiply and accumulate). This lets us calculate another part of the multiplication using the bits below the 8 MSB.
To see exactly how this works, let's go back to multiply-high:
With A and B being 24 bit, we want to do A * B >> 23.
But as B cannot be more than sign + 7 bits, we need to divide up the multiplication, just like how we did separate multiplications for 10s and 1s in the decimal example above.
Here are the bits of B grouped in chunks of 7 (X, Y, Z), followed by the 2 remaining bits (R) and preceded by a sign bit S:
SXXXXXXX YYYYYYYZ ZZZZZZRR
Now we can write the expression as:
Product P = A * B >> 23
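P = (A * ((B[23:16] << 16) + (B[15:9] << 9) + (B[8:2] << 2) + B[1:0])) >> 23
Substituting the values for the B-bits (b1 includes the sign bit, so it is a signed 8 bit value; the others are unsigned):
B[23:16] = B >> 16 = b1
B[15:9] = (B >> 9) & 0x7F = b2
B[8:2] = (B >> 2) & 0x7F = b3
B[1:0] = (B & 0x03) = b4
P = (A * ((b1 << 16) + (b2 << 9) + (b3 << 2) + b4)) >> 23
Shifting each term separately (the result can end up a few LSBs below the exact one, since each shift rounds down on its own):
P ≈ ((A * (b1 << 16)) >> 23) + ((A * (b2 << 9)) >> 23) + ((A * (b3 << 2)) >> 23) + ((A * b4) >> 23)
and finally
P ≈ ((A * b1) >> 7) + ((A * b2) >> 14) + ((A * b3) >> 21) + ((A * b4) >> 23)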
We now have four separate 24 x 8 bit multiplications.
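To check the decomposition, here is a Python sketch that splits B into the four chunks and sums the four partial products. `split_b` and `partial_products` are hypothetical helper names, and this is only a model of the arithmetic, not of anything the ESP does in hardware.

```python
def split_b(b: int):
    """Split a 24 bit signed B into b1 (signed, 8 bit), b2, b3 (7 bit) and b4 (2 bit)."""
    b1 = b >> 16            # arithmetic shift keeps the sign
    b2 = (b >> 9) & 0x7F
    b3 = (b >> 2) & 0x7F
    b4 = b & 0x03
    return b1, b2, b3, b4

def partial_products(a: int, b: int) -> int:
    """Sum of the four separately shifted 24 x 8 products."""
    b1, b2, b3, b4 = split_b(b)
    return ((a * b1) >> 7) + ((a * b2) >> 14) + ((a * b3) >> 21) + ((a * b4) >> 23)

a, b = 5000000, -1234567
exact = (a * b) >> 23
approx = partial_products(a, b)
# Each of the four shifts rounds down on its own, so approx may be
# up to 3 LSBs below exact, but never above it.
```

The chunks always recombine to the original B, including for negative values, because the sign travels with b1.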
The ESP does not have support for doing all the four parts in hardware, but it can do b1 using MAC and then b2 using DMAC. This increases the precision to 15 bits (sign + 2 * 7 bits), at the cost of an additional cycle.
Substituting b1 back to B shows us how:
MAC:
acc = (A * (B >> 16)) >> 7
During MAC, both A and B are stored in their full 24bit representation so they are available to the next instruction. Then a flag is set that tells the next instruction that the previous one was MAC.
DMAC:
Now, A and B are still available, and we can substitute b2 to get
acc += (A * ((B >> 9) & 0x7F)) >> 14
We know that the ESP implicitly does a >> 7 after every multiplication*, so we have to keep that separate. That leaves us with 7 more shift rights. We cannot shift b2 7 bits - it is 7 bits only so the result would be 0. Instead, we have to shift A, leaving us with:
acc += ((A >> 7) * ((B >> 9) & 0x7F)) >> 7
In other words, running MAC and DMAC on A and B means
acc = ((A * (B >> 16)) >> 7) + (((A >> 7) * ((B >> 9) & 0x7F)) >> 7)
The precision has increased from 7 to 14 bits (+sign). It is still not a completely correct 24 x 24 bit multiplication, but it lets us work with a precision that is good enough for rock'n roll.
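Here is a Python sketch of the MAC + DMAC pair as derived above, compared against a full 24 x 24 multiply high. It follows the formulas in this post only; the function names are mine, the shift is fixed at the default of 7, and the real chip may differ in rounding or other details.

```python
def mac(a: int, b: int) -> int:
    """First partial product: sign + upper 7 bits of b."""
    return (a * (b >> 16)) >> 7

def dmac(a: int, b: int) -> int:
    """Second partial product: the next 7 bits of b, with a pre-shifted by 7."""
    return ((a >> 7) * ((b >> 9) & 0x7F)) >> 7

def mac_dmac(a: int, b: int) -> int:
    acc = mac(a, b)    # accumulator cleared before MAC
    acc += dmac(a, b)  # accumulator kept for DMAC
    return acc

a = 1 << 22            # 0.5 in Q1.23
b = 0x123456

full = (a * b) >> 23       # 596523
single = mac(a, b)         # 589824, off by 6699
double = mac_dmac(a, b)    # 596480, off by 43
```

For this input the error drops from 6699 to 43 LSBs, which matches the idea of going from 7 to 14 bits of precision in the second factor.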
(* The shift can be set to 3, 5, 6 or 7 bits, but the default is 7.)
PS: the accumulator in the ESP may be cleared between every instruction - the flag that decides this is set either in the instruction itself or in a memory location. For MAC + DMAC to work it has to be set to clear = true before MAC and clear = false before DMAC.