Gentoo Forums
crossdev AVR target ... can't get away from gcc-3.4.6 ...
eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Wed Jul 18, 2018 12:46 am    Post subject: crossdev AVR target ... can't get away from gcc-3.4.6 ...

I guess this is not exactly a portage problem, but it is a programming problem...then again it's not Gentoo specific so it may be off topic -- except as a plea to never remove sys-devel/gcc-3.4.6 at least just for AVR! (Or has someone found an even better, older version that generates smaller code?!)

Anyway, what have people been doing to mitigate the bloat as versions go up?

I initially targeted a few of my AVR projects for the old at90s2313 (similar to attiny2313) and at90s4433 as I have a pile of these chips. (also a few others like atmega8515, atmega8, atmega168, atmega328; but haven't really tried targeting them). However my beef is that every version bump of gcc...

THE HEX IMAGE SIZE GETS LARGER AND LARGER!

I had a project that I initially developed with gcc-3.something, and which eventually got to 3.4.6. I had squeezed a PID controller, with LCD/pushbutton local control and serial-UART remote control, into my 4K 90s4433. I got it packed down with just a few bytes to spare.

I could *not* get gcc-4 to get down to 4K. It exceeds it by 30-odd bytes.

Now I tried gcc-7 since it is now the default system compiler. DANGIT NOW ~60 bytes overflow!

No change in code, other than some compilation error fixes. I suspect avr-libc changed as well, but I used its functions sparingly. In fact, using the same avr-libc and a different avr-gcc, I can get smaller code with 3.4.6.

Now I *really* hope I'm not getting retpolines in my AVR code... yeah, good luck trying to exploit the cache in my AVR (hint: there is none) but this bloating is prohibiting me from upgrading gcc.

Has anyone reasonably dealt with the bloat? I would hope to stay within a 0.1% code size increase across gcc upgrades.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Hu
Moderator
Joined: 06 Mar 2007
Posts: 21624

Posted: Wed Jul 18, 2018 1:33 am

What do new versions do differently that causes more bloat? Are they using wider padding between useful content? Adding padding to places that previously had none? Making different inlining decisions? Choosing alternative space-inefficient opcodes? A side-by-side disassembly of the good and bad versions might be instructive, if you can read the resulting assembly.

Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Wed Jul 18, 2018 1:52 am

The output of avr-objdump -x on the compiled binaries might be informative. Look for differences in the section list, that's where the size probably comes from.

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Wed Jul 18, 2018 3:29 am

All right, I think I figured it out, at least for gcc-7.3 vs gcc-3.4.6.

Staring at the list files I noticed that 7.3 was duplicating the epilogue for whatever reason. Duplication? Unrolling? It seemed like an optimization of some sort, so I looked at my CFLAGS. Lo and behold, I'm using -O2 for both compilers.

Apparently -O2 in the newer compiler causes tremendous bloat, whereas it did not generate nearly as much in the old one. Using -Os seems to improve code size significantly, though there are portions of the PID controller where I need -O2.

For my new project I don't need speed optimization, but size would be good -- so -Os (or -O1) is the fix.

Hu
Moderator
Joined: 06 Mar 2007
Posts: 21624

Posted: Thu Jul 19, 2018 1:51 am

-O2 prefers speed over minimizing size, so yes, some duplication is normal in some cases. Newer versions likely identified cases where duplicating the epilogue was projected to produce a faster program than not duplicating it. -Os is the right solution if you need minimal size.

steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Thu Jul 19, 2018 3:33 pm

eccerr0r wrote:
Apparently the -O2 optimization causes tremendous bloat whereas it did not generate as much bloat as before. Using -Os seems to improve code size significantly, though there are portions of the PID controller I need -O2.
You can do that on a function- or file-specific basis with GCC-specific macros (which expand to nothing when not using gcc, or when the gcc version is too old).

Hmm, I'm not at my workstation, so I don't have files to hand. There are pragmas to push options, then you tell it what you want to switch, and pop when the code segment in question is done.
We have a GCC_OPTIMIZE one that allows -fflags, vs PUSH_GCC_DIAG and GCC_DIAG_IGNORE/ERROR for -Wblah, then POP_GCC_DIAG. The former is under (PUSH|POP)_GCC_OPTS.

Take a look in info gcc for Pragma, iirc under an "Implementation-specific" heading.
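
A rough sketch of the push/optimize/pop pattern described above. GCC 4.4 and later document `#pragma GCC push_options`, `#pragma GCC optimize` and `#pragma GCC pop_options`; the wrapper macro names are modelled on the ones mentioned in this post, but their definitions here are an assumption, and `pi_step` is a hypothetical stand-in for the speed-critical PID code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions: the real GCC pragmas wrapped in macros that expand to
 * nothing on other compilers (clang also defines __GNUC__, so exclude it). */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
#  define DO_PRAGMA(x)      _Pragma(#x)
#  define PUSH_GCC_OPTS     DO_PRAGMA(GCC push_options)
#  define GCC_OPTIMIZE(opt) DO_PRAGMA(GCC optimize(opt))
#  define POP_GCC_OPTS      DO_PRAGMA(GCC pop_options)
#else
#  define PUSH_GCC_OPTS
#  define GCC_OPTIMIZE(opt)
#  define POP_GCC_OPTS
#endif

/* Build the file with -Os; only the code between push and pop gets -O2. */
PUSH_GCC_OPTS
GCC_OPTIMIZE("O2")

static int32_t clamp32(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical speed-critical routine: one step of a PI controller. */
int16_t pi_step(int16_t error, int16_t *integral)
{
    /* Accumulate in 32 bits, then clamp (anti-windup). */
    int32_t acc = clamp32((int32_t)*integral + error, -1000, 1000);
    *integral = (int16_t)acc;
    /* Output = 2*error + integral/4, clamped to the int16_t range. */
    return (int16_t)clamp32(2 * (int32_t)error + acc / 4, INT16_MIN, INT16_MAX);
}

POP_GCC_OPTS
/* Code from here on is compiled with the command-line options again. */

void pi_step_demo(void)
{
    int16_t integral = 0;
    assert(pi_step(100, &integral) == 225);  /* 2*100 + 100/4 */
    assert(integral == 100);
}
```

When the macros expand to nothing, the function is simply compiled with whatever the command line says, which is the "take care that things work as expected when they're not available" case.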

The warnings were the first compiler-specific thing we really had to tackle for builds, and that led to the -foptimize set, which can be very handy -- so long as you take care that things work as expected when they're not available.

Incidentally, clang is supposed to accept some gcc-specifics; but I haven't tested the pragmas (nor indeed other stuff; but we know that works based on __has_attribute; or might be "have", IDR.)

HTH.

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Sat Jul 21, 2018 1:14 pm

Oh this is sad: WTF:

Unoptimized C source code:
Code:
uint8_t updiv = (div/1000)*5;

Assembly in listing:
Code:
 00d8 392F                  mov r19,r25
 00da 282F                  mov r18,r24
 00dc E8EE                  ldi r30,lo8(-24)
 00de F3E0                  ldi r31,lo8(3)
 00e0 7F2F                  mov r23,r31
 00e2 6E2F                  mov r22,r30
 00e4 00D0                  rcall __udivmodhi4
 00e6 862F                  mov r24,r22
 00e8 65E0                  ldi r22,lo8(5)
 00ea 00D0                  rcall __mulqi3

Why didn't it optimize that code with -Os so that it's both smaller AND faster (i.e. divide by 200 and not break it up into two steps)? :( I guess I will have to hand tweak the code to shrink the code even smaller though for this instance, speed isn't a factor...
(The hand optimization saved 4 instructions/8 bytes with -Os, and -O1 is larger than -Os.)

Hu
Moderator
Joined: 06 Mar 2007
Posts: 21624

Posted: Sat Jul 21, 2018 4:38 pm

Assuming div is an integer type, and that all the math is done as integers, the refactor you proposed can change the results. Suppose div had value 800. As written, we compute (floor(800/1000)) * 5 = floor(.8) * 5 = 0 * 5 = 0. With your proposed reordering, we compute floor(800 / 200) = floor(4) = 4. Maybe the program is written in such a way that (div / 1000) is always an integer, in which case the reorder would be safe. From the fragment shown, the reorder appears to be unsafe.
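
The difference can be checked directly. This is a minimal sketch that assumes `div` is a 16-bit unsigned value (consistent with the `__udivmodhi4` call in the listing); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* As written in the source: truncating division first, then scaling. */
uint16_t divide_then_scale(uint16_t div)
{
    return (uint16_t)((div / 1000) * 5);
}

/* The proposed rewrite: one truncating division by 200. */
uint16_t divide_by_200(uint16_t div)
{
    return (uint16_t)(div / 200);
}

void division_demo(void)
{
    /* div = 800: the two forms disagree, so the compiler may not swap them. */
    assert(divide_then_scale(800) == 0);
    assert(divide_by_200(800) == 4);
    /* div = 1000: an exact multiple of 1000, so both forms agree. */
    assert(divide_then_scale(1000) == 5);
    assert(divide_by_200(1000) == 5);
}
```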

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Sat Jul 21, 2018 7:42 pm

Ah yes that would make sense.

Alas, in this case gcc is being too careful. My program actually prefers the "proposed reordering", because I did not want to multiply by 5 first -- to prevent an overflow. (I want to compute 5*number/1000. If I multiply first, it would overflow, so I divide first. Doing the mathematically equivalent division by 200 solves both problems.)

The thing gcc has no way of understanding is that div cannot be lower than 1000 (actually the current code would only be wrong from 200 to 999), else it will break the analog portion of the circuit -- I don't have a wideband VCO :)

Not sure what's the best way to tell gcc to not be so careful, I would write in 200 but I will forget why I chose 200 -- writing it as 5*div/1000 would overflow and writing it as (div/1000)*5 would be the best way to self document this code...
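
To illustrate the overflow concern, here is a small sketch comparing multiply-first in 16 bits against multiply-first in 32 bits. It assumes `div` is a `uint16_t`; the explicit cast to `uint16_t` mimics AVR, where `unsigned int` is 16 bits wide, so the example behaves the same on a desktop compiler. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 5*div truncated to 16 bits: wraps modulo 65536 once div exceeds 13107. */
uint16_t mul_first_16(uint16_t div)
{
    uint16_t prod = (uint16_t)(5u * div);
    return (uint16_t)(prod / 1000);
}

/* Same computation with a 32-bit intermediate: correct, but costs the
 * 32-bit multiply and divide support that the small MCU has no room for. */
uint16_t mul_first_32(uint16_t div)
{
    return (uint16_t)(((uint32_t)div * 5u) / 1000u);
}

void overflow_demo(void)
{
    /* 5 * 20000 = 100000, which wraps to 34464 in 16 bits. */
    assert(mul_first_16(20000) == 34);
    assert(mul_first_32(20000) == 100);
    /* 5 * 13107 = 65535 still fits, so both agree. */
    assert(mul_first_16(13107) == 65);
    assert(mul_first_32(13107) == 65);
}
```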

Hu
Moderator
Joined: 06 Mar 2007
Posts: 21624

Posted: Sat Jul 21, 2018 8:38 pm

There are cases beyond just [200, 999] where it would get a different result. Consider div=1500:
Code:
floor(1500 / 1000) * 5
floor(1.5) * 5
1 * 5
5
Code:
floor(1500 / 200)
floor(7.5)
7
I suppose that means we need to better understand which result you want it to compute. I was going to suggest that you just parenthesize it as div / (1000 / 5), with a comment if needed. gcc can solve constant equations like that and derive the / 200 term for you. However, writing up this new counterexample, I now wonder whether switching to an implementation that does floor(div / 200) (regardless of how it is written in the source) has changed the result in a way that changes the correctness of the code.
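
A sketch of the `div / (1000 / 5)` spelling, together with a check of exactly where the two forms agree. Writing div = 1000*q + r with 0 <= r < 1000, the original form yields 5*q and the single division yields 5*q + r/200, so they match only when r < 200. The macro and function names are illustrative assumptions, and `div` is again assumed to be a `uint16_t`.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_NUM 5
#define SCALE_DEN 1000

/* (SCALE_DEN / SCALE_NUM) is a constant expression that folds to 200. */
uint16_t divide_by_ratio(uint16_t div)
{
    return (uint16_t)(div / (SCALE_DEN / SCALE_NUM));
}

/* The original form: truncate to whole thousands, then scale by 5. */
uint16_t scale_after_divide(uint16_t div)
{
    return (uint16_t)((div / SCALE_DEN) * SCALE_NUM);
}

/* Nonzero when both forms give the same result for this input. */
int forms_agree(uint16_t div)
{
    return (div % 1000) < 200;
}

void ratio_demo(void)
{
    assert(scale_after_divide(1500) == 5);
    assert(divide_by_ratio(1500) == 7);

    /* Brute-force check of the agreement rule over every 16-bit input. */
    for (uint32_t d = 0; d <= UINT16_MAX; d++) {
        uint16_t v = (uint16_t)d;
        assert((scale_after_divide(v) == divide_by_ratio(v)) == forms_agree(v));
    }
}
```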

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Sat Jul 21, 2018 8:53 pm

Unfortunately, dividing by the quantity (1000/5) makes no sense as self-documentation, though it is also mathematically correct.

And back to the original issue, the quantity x/200 is always correct because if I had 32-bit integers, multiplying by 5 would then be possible -- (div*5) is the actual number I want to work with, not (div). Thanks for the warning that gcc is doing the wrong thing for me completely and the only solution is to do x/200, which both saves memory AND is mathematically correct. Fortunately the actual range of x is very limited due to the analog hardware limitations (I wish I could do an 8MHz to 9MHz DDS with a microcontroller with 0.777Hz resolution; I can deal with a square wave but a fractional hertz resolution sine wave is best).

Ultimately I want to pretty print (5*div) -- separate out digits every 3 digits -- easiest way to explain the function of the code. Wasting 4 bytes on a MCU with only 256 bytes of RAM (plus all the downstream bloat to deal with 32-bit ints) is the problem.

...and wow, I should have gotten a warning on precision loss here... need to find that flag to warn me ... So much for gcc assuming that I want to floor everything.

Hu
Moderator
Joined: 06 Mar 2007
Posts: 21624

Posted: Sat Jul 21, 2018 9:57 pm

Precision loss with integers is normal when dividing and then multiplying back up. I would expect gcc not to warn about that by default, because it is so common and usually is intended. If div were a float, there would be no implicit floor.

eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Sun Jul 22, 2018 12:34 am

Oh gawd thanks a lot (sarcastically) :)

Actually, after reading my code again, I was taking the precision loss into account but forgot about it... because later in the code I added the lost precision back :) Then you confused me and made me forget that I had already taken it into account. But then div/200 by itself would actually be wrong.

Oh well. I opted to rewrite with the smallest mathematically "correct" code with hopefully no overflow losses and without the need for compensation - in the least self-documenting method of code, which probably is more readable than the precision loss compensation anyway. Now this fortunately compiles to the smallest code of all, so all the better.
Code:
uint16_t updiv = div/200; // most significant 3 digits of the quantity (5*div)
uint16_t dndiv = 5*(div-updiv*200); // least significant 3 digits of the quantity (5*div)
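
As a worked check of the two lines above: 5*div = 1000*updiv + dndiv, with dndiv always below 1000, so the pair can be printed with a separator without ever forming the full product. The sketch below wraps this in a hypothetical `format_scaled` helper and uses `snprintf` purely for demonstration on a desktop build; the helper name and the single-group case for small inputs are assumptions, not part of the original program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write 5*div into buf with a thousands separator, 16-bit math only. */
int format_scaled(uint16_t div, char *buf, size_t len)
{
    uint16_t updiv = div / 200;                 /* upper digits of 5*div */
    uint16_t dndiv = 5 * (div - updiv * 200);   /* lower 3 digits, 0..995 */

    if (updiv == 0)                             /* value below 1000 */
        return snprintf(buf, len, "%u", (unsigned)dndiv);
    return snprintf(buf, len, "%u,%03u", (unsigned)updiv, (unsigned)dndiv);
}

void format_demo(void)
{
    char buf[16];

    format_scaled(8500, buf, sizeof buf);       /* 5 * 8500 = 42500 */
    assert(strcmp(buf, "42,500") == 0);

    format_scaled(1201, buf, sizeof buf);       /* 5 * 1201 = 6005 */
    assert(strcmp(buf, "6,005") == 0);
}
```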

Now if I could just printf("%'ld\n", 5L*div), this would have been just a memory... lots of it...

All times are GMT