Saturday, December 31, 2022

This year's x-mas gift from Mikroelektronika: A compiler bug

I have spent the last month chasing down an extremely strange bug. When I added a new global variable to my DCO, calibration stopped working.

I first spent a long time trying to figure out what was going on, and after a couple of weeks I ended up posting a rather vague description to the MikroC forums. 

What I was seeing was very weird - I had a double loop, and the inner loop ran fine the first time around, but the second time it ran it skipped the first part. The outer loop was never re-executed, but sometimes the program continued in the main function afterwards.

I did a lot of testing but the results were always similar: some code was skipped and sometimes everything crashed. It felt like the program counter moved to a different address. I started suspecting some kind of memory leak.

After the post in the forums I was asked to provide a minimal example - I normally do this anyway, but this time I had a hard time doing so, as every time I changed the program it started working.

Finally, I resorted to placing all variables and functions at absolute positions, as I suspected that some kind of memory collision was the problem. It took a looong time to get this right, and it wasn't always possible, but at last I managed to get a fully absolute-position program that still bugged in a similar way.

Now I gradually stripped the program down, re-running the code for every small change to make sure it still had the bug and that it could be brought back to working by moving something. I placed empty variables at absolute positions to prevent other, used variables from moving around, and also made sure that the empty spacers were not called upon anywhere in the code that was executed BEFORE the spot where the bug showed up.

This took me every spare moment from the 17th of December and the following 13 days. Finally, yesterday, I had a minimal program with only a single function call inside the loop. Removing a single spacer variable made a single library variable (FLASH_write.savedINTCON) move too, and the program would start working.

I was totally exhausted and posted a new question to the forums.

As I waited for answers, I used a diff tool to see if the two generated asm outputs from the working and non-working programs were different. They were not. I then did the same for the machine code listing file (.lst). Now I could see a single changed line! In the non-working version, a single instruction - MOVLB 0 - was missing. I took a screenshot and prepared to write a follow-up question.

When I logged in to the forum again, the user Janni had just posted an answer - and it was the exact same thing that I had found. Janni confirmed that this was actually the error: the instruction changes RAM banks, and when it is missing the function returns to the wrong part of the program, exactly what I was seeing.

Adding 'asm movlb 0;' after the call to FLASH_write fixes everything.
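If the call is made from more than one place, it may be worth wrapping it so the extra instruction cannot be forgotten. The following is only a sketch under assumptions, not something I have run on the PIC: TARGET_MIKROC is a hypothetical define you would have to add to the project yourself, write_calibration and flash_write_stub are made-up names, and the real FLASH_write call with its arguments has to be put in where the stub is. Without the define, the macro just counts how often the bank would have been restored, so the control flow can be checked on a PC.

```c
#ifdef TARGET_MIKROC
/* On the PIC: re-select RAM bank 0, the instruction missing from the listing.
   Assumption: the compiler accepts the asm statement inside a macro. */
#define RESTORE_BANK0() asm movlb 0
#else
/* Off-target stand-in: count how often the bank would have been restored. */
static int bank0_restores = 0;
#define RESTORE_BANK0() (bank0_restores++)
#endif

/* Hypothetical stand-in for the library call. On the PIC, replace the body
   with the real FLASH_write call and the arguments your device expects. */
static int flash_writes = 0;
static void flash_write_stub(void)
{
    flash_writes++;
}

/* Hypothetical wrapper: the flash write is always followed by the bank restore. */
static void write_calibration(void)
{
    flash_write_stub();
    RESTORE_BANK0();
}
```

Whether the compiler generates the same code around a wrapper as around a direct call is something to check in the .lst file, since that is where the missing instruction showed up in the first place.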

So here we are - it turned out to be a compiler bug after all. I am completely drained but incredibly happy that the error was found as I can't have any errors in the DCO.

Unfortunately, it seems Mikroelektronika stopped developing the MikroC compiler in 2019 so it is unlikely that the bug will ever be fixed. At least now I know what to look for if this ever happens again!

Happy new year everybody!

MikroC: Absolute positioning

I've spent the last weeks debugging a strange problem in MikroC, and to do that I had to place every function and variable at an absolute address, to make sure nothing moved around in RAM and ROM when I changed the program.

Here is how to do that:

Global and local variables

Add absolute 0xXXXX to place at address 0xXXXX. If you have an initialiser it must be placed BEFORE the absolute. E.g.:

    unsigned short a absolute 0x0123;

    unsigned short a = 1 absolute 0x0123;

    unsigned short a[32] absolute 0x0123;

Variables need to be used somewhere to prevent the compiler from removing them. Also, the compiled code is different if the assignment is done on a separate line vs if it is done inline, which changes the size of functions.

Global variables may be used in main simply by assigning to them. For arrays it's enough to assign something to the first position. E.g.:

    a = 1;

    a[0] = 1;

Function parameters

I have not been able to place these at absolute addresses.

Internal library variables and function parameters

I have not been able to place these at absolute addresses.

Functions

These are postfixed with org 0xXXXX:

    void myFunc() org 0x0123 { ... }

You may also put the address after an extern declaration in an h file:

    extern void myFunc() org 0x0123;

Library functions

These may also be positioned, just skip the function body:

    void libraryFunction() org 0x0123;

Collisions

Variable collisions are common: the compiler will place several variables at the same address when it has decided that they won't conflict. In those cases it is not possible to make only one of the colliding variables absolute, as the compiler won't place anything else at an address occupied by an absolute variable.

That means that you have to make all the variables at that particular address absolute - and if one of them is a function parameter you're out of luck, it seems, as those cannot be placed absolutely (or at least I don't know how).
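Once everything is placed by hand, it is easy to lose track of which addresses are shared on purpose. A small helper run on the PC (not on the PIC) could list the overlaps in a hand-written placement table. This is a minimal sketch: the struct, the function names and the idea of keeping such a table at all are my assumptions, and the sizes have to be typed in by hand.

```c
#include <stddef.h>

/* One planned placement: a label, the start address and the size in bytes. */
struct placement {
    const char *name;
    unsigned start;
    unsigned size;
};

/* Returns 1 if the two address ranges share at least one byte, else 0. */
static int placements_overlap(const struct placement *a,
                              const struct placement *b)
{
    return a->start < b->start + b->size &&
           b->start < a->start + a->size;
}

/* Counts the pairs in the table whose address ranges overlap. */
static size_t count_overlaps(const struct placement *table, size_t n)
{
    size_t i, j, hits = 0;

    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            if (placements_overlap(&table[i], &table[j])) {
                hits++;
            }
        }
    }
    return hits;
}
```

If the count is higher than the number of collisions you placed on purpose, two absolute addresses have ended up on top of each other by accident.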