crossdev AVR target ... can't get away from gcc-3.4.6 ...

eccerr0r

I guess this is not exactly a portage problem, but it is a programming problem...then again it's not Gentoo specific so it may be off topic -- except as a plea to never remove sys-devel/gcc-3.4.6 at least just for AVR! (Or has someone found an even better, older version that generates smaller code?!)

Anyway, what have people been doing to mitigate the bloat as versions go up?

I initially targeted a few of my AVR projects for the old at90s2313 (similar to attiny2313) and at90s4433 as I have a pile of these chips. (also a few others like atmega8515, atmega8, atmega168, atmega328; but haven't really tried targeting them). However my beef is that every version bump of gcc...

THE HEX IMAGE SIZE GETS LARGER AND LARGER!

I had a project that I developed with gcc-3.something initially, but eventually got to 3.4.6. I had squeezed a PID controller, with LCD/pushbutton local control and serial-UART remote control, into my 4K 90s4433. I got it packed down to just a few bytes to spare.

I could *not* get gcc-4 to get down to 4K. It exceeds it by 30-odd bytes.

Now I tried gcc-7 since it is now the default system compiler. DANGIT NOW ~60 bytes overflow!

No change in code, other than some compilation error fixes. I suspect avr-libc also changed, but I used its functions sparingly. In fact, using the same avr-libc and a different avr-gcc, I can get smaller code with 3.4.6.

Now I *really* hope I'm not getting retpolines in my AVR code... yeah, good luck trying to exploit the cache in my AVR (hint: there is none) but this bloating is prohibiting me from upgrading gcc.

Has anyone reasonably dealt with the bloat? I hope to stay within a 0.1% code size increase across gcc upgrades.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Hu · Moderator Joined: 06 Mar 2007 Posts: 21624

What do new versions do differently that causes more bloat? Are they using wider padding between useful content? Adding padding to places that previously had none? Making different inlining decisions? Choosing alternative space-inefficient opcodes? A side-by-side disassembly of the good and bad versions might be instructive, if you can read the resulting assembly.

Ant P. · Watchman Joined: 18 Apr 2009 Posts: 6920

The output of avr-objdump -x on the compiled binaries might be informative. Look for differences in the section list, that's where the size probably comes from.

eccerr0r · Posted: Wed Jul 18, 2018 3:29 am Post subject:

All right, I think I figured it out, at least for gcc-7.3 vs gcc-3.4.6.

Staring at the list files I noticed that 7.3 was duplicating the epilogue for whatever reason. Duplication? Unrolling? It seemed like an optimization of some sort, so I looked at my CFLAGS. Lo and behold, I'm using -O2 for both compilers.

Apparently -O2 causes tremendous bloat with 7.3, whereas it did not generate nearly as much bloat with 3.4.6. Using -Os seems to improve code size significantly, though there are portions of the PID controller for which I need -O2.

For my new project I don't need speed optimization, but size would be good -- so -Os (or -O1) is the fix.

Hu · Moderator Joined: 06 Mar 2007 Posts: 21624

-O2 prefers speed over minimizing size, so yes, some duplication is normal in some cases. Newer versions likely identified cases where duplicating the epilogue was projected to produce a faster program than not duplicating it. -Os is the right solution if you need minimal size.
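
One way to combine the two, sketched here as an assumption rather than as what was done in this thread: build everything with -Os and mark only the speed-critical routine with GCC's `optimize` function attribute (available in GCC 4.4 and later as far as I know, so not in 3.4.6). The function `pi_step`, its gains, and its types are hypothetical stand-ins for the PID code, which was not posted.

```c
#include <stdint.h>

/* Hypothetical fixed-point PI step; the name and gains are illustrative only.
 * The file is meant to be built with -Os; the attribute asks GCC to optimize
 * just this one function as if -O2 had been given. */
__attribute__((optimize("O2")))
int16_t pi_step(int16_t setpoint, int16_t measured, int32_t *integral)
{
    int32_t error = (int32_t)setpoint - (int32_t)measured;

    /* Accumulate the integral term in 32 bits to avoid early overflow. */
    *integral += error;

    /* Kp = 3 and Ki = 1/8 are arbitrary values chosen for this sketch. */
    int32_t out = 3 * error + (*integral / 8);

    /* Clamp to the 16-bit output range. */
    if (out > INT16_MAX) out = INT16_MAX;
    if (out < INT16_MIN) out = INT16_MIN;
    return (int16_t)out;
}
```

The GCC manual describes this attribute as a debugging aid rather than something for production code, so putting the hot routines in their own source file and compiling only that file with -O2 is the more conservative route.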

steveL · Posted: Thu Jul 19, 2018 3:33 pm Post subject:

eccerr0r · Posted: Sat Jul 21, 2018 1:14 pm Post subject:

Oh this is sad: WTF:

Unoptimized C source code:

Hu · Moderator Joined: 06 Mar 2007 Posts: 21624

Assuming div is an integer type, and that all the math is done as integers, the refactor you proposed can change the results. Suppose div had value 800. As written, we compute (floor(800/1000)) * 5 = floor(.8) * 5 = 0 * 5 = 0. With your proposed reordering, we compute floor(800 / 200) = floor(4) = 4. Maybe the program is written in such a way that (div / 1000) is always an integer, in which case the reorder would be safe. From the fragment shown, the reorder appears to be unsafe.
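
The difference is easy to reproduce on any host compiler. This is a minimal sketch that assumes div is a 16-bit unsigned integer; the function names are made up for illustration.

```c
#include <stdint.h>

/* Divide first, then scale: the division truncates before the multiply. */
uint16_t scale_divide_first(uint16_t div)
{
    return (uint16_t)((div / 1000u) * 5u);
}

/* Single division by 200, i.e. the proposed reordering. */
uint16_t scale_by_200(uint16_t div)
{
    return (uint16_t)(div / 200u);
}
```

With div = 800, `scale_divide_first` returns 0 and `scale_by_200` returns 4, matching the arithmetic above. For an exact multiple of 1000 such as 2000, both return 10.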

eccerr0r · Posted: Sat Jul 21, 2018 7:42 pm Post subject:

Ah yes that would make sense.

Alas, in this case gcc is being too careful. My program actually prefers the "proposed reordering" because I did not want to multiply by 5 first -- to prevent an overflow. (I want to compute 5*number/1000. If I multiply first, it would overflow, so I divide first. Doing the mathematically equivalent thing and simply dividing by 200 solves both problems.)

The thing is that gcc has no way of understanding that div cannot be lower than 1000 (actually the current code would only be wrong from 200 to 999), else it will break the analog portion of the circuit -- I don't have a wideband VCO :)

Not sure what's the best way to tell gcc to not be so careful. I would write in 200, but I will forget why I chose 200 -- writing it as 5*div/1000 would overflow, and writing it as (div/1000)*5 would be the best way to self-document this code...

Hu · Moderator Joined: 06 Mar 2007 Posts: 21624

There are other cases where it would get an incorrect result, not just in [200, 999]. Consider div=1500:
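
For div=1500 the two forms give (1500/1000)*5 = 1*5 = 5 and 1500/200 = 7. The sketch below, which assumes div is a 16-bit unsigned integer and uses made-up helper names, checks where the forms disagree. Writing div = 1000*q + r, they match only when r < 200.

```c
#include <stdint.h>

/* Returns 1 when (div/1000)*5 and div/200 give the same integer result. */
int forms_agree(uint16_t div)
{
    return (div / 1000u) * 5u == div / 200u;
}

/* Counts the inputs in [lo, hi] for which the two forms disagree.
 * The loop variable is 32 bits wide so that hi = 65535 cannot wrap. */
uint32_t count_disagreements(uint16_t lo, uint16_t hi)
{
    uint32_t count = 0;
    for (uint32_t d = lo; d <= hi; d++) {
        if (!forms_agree((uint16_t)d)) {
            count++;
        }
    }
    return count;
}
```

Over the range 1000 to 1999 this counts 800 disagreements (every value from 1200 to 1999), so restricting div to 1000 or more does not make the reordering safe.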

eccerr0r · Posted: Sat Jul 21, 2018 8:53 pm Post subject:

Unfortunately, dividing by the quantity (1000/5) makes no sense as self-documentation, though it is also mathematically correct.

And back to the original issue, the quantity x/200 is always correct because if I had 32-bit integers, multiplying by 5 would then be possible -- (div*5) is the actual number I want to work with, not (div). Thanks for the warning that gcc is doing the wrong thing for me completely and the only solution is to do x/200, which both saves memory AND is mathematically correct. Fortunately the actual range of x is very limited due to the analog hardware limitations (I wish I could do an 8MHz to 9MHz DDS with a microcontroller with 0.777Hz resolution; I can deal with a square wave, but a fractional-hertz-resolution sine wave would be best).

Ultimately I want to pretty print (5*div) -- separate out digits every 3 digits -- easiest way to explain the function of the code. Wasting 4 bytes on a MCU with only 256 bytes of RAM (plus all the downstream bloat to deal with 32-bit ints) is the problem.

...and wow, I should have gotten a warning on precision loss here... need to find that flag to warn me ... So much for gcc assuming that I want to floor everything.
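
The digit grouping of 5*div mentioned above can be done without 32-bit math, because 5*div = 1000*(div/200) + 5*(div%200) holds for every integer div. The following is only a sketch of that identity, not the code from this project; the type and function names are hypothetical, and div is assumed to be a 16-bit unsigned integer.

```c
#include <stdint.h>

/* The two three-digit groups of 5*div, computed in 16-bit arithmetic. */
typedef struct {
    uint16_t thousands;  /* (5*div) / 1000, at most 327 for 16-bit div */
    uint16_t remainder;  /* (5*div) % 1000, always in 0..995 */
} split5_t;

split5_t split_5x(uint16_t div)
{
    split5_t s;
    s.thousands = (uint16_t)(div / 200u);          /* no intermediate overflow */
    s.remainder = (uint16_t)((div % 200u) * 5u);   /* 199 * 5 = 995 fits easily */
    return s;
}
```

For div = 1234, where 5*div = 6170, this gives thousands = 6 and remainder = 170. The compiler can usually share the work between the division and the modulo since both use the same operands.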

Hu · Moderator Joined: 06 Mar 2007 Posts: 21624

Precision loss with integers is normal when dividing and then multiplying back up. I would expect gcc not to warn about that by default, because it is so common and usually is intended. If div were a float, there would be no implicit floor.

eccerr0r · Posted: Sun Jul 22, 2018 12:34 am Post subject:

Oh gawd thanks a lot (sarcastically) :)

Actually, after reading my code again, I was taking the precision loss into account but forgot about it... because later in the code I added the lost precision back :) Then you confused me and made me forget that I had already taken it into account. But then div/200 would actually be wrong.

Oh well. I opted to rewrite it as the smallest mathematically "correct" code, hopefully with no overflow losses and without the need for compensation. It is the least self-documenting way to write it, but it is probably more readable than the precision loss compensation anyway. Fortunately this now compiles to the smallest code of all, so all the better.