Gentoo Forums
CFLAGS for various Athlons?
ligne
n00b


Joined: 23 Jun 2004
Posts: 4
Location: where am i, and what the fsck is going on? (bristol, europe)

Posted: Thu Oct 13, 2005 3:27 pm

playfool wrote:
I know several people have built Athlon-MP setups using Athlon-XPs and some clever hardware hacking. To the best of my knowledge the chips are very similar, and since not many MP chips are required and they are built to the same specs, I assume the odds that a given set of XPs would work as MPs with the correct gates connected are quite high. (I'm not encouraging this: it will void your warranty and can make your system go boom. If you break it, you get to keep both pieces.)


when i last reseated my heatsink, a couple of years back, it magically changed from being an XP (1900+ to be precise) to an MP.

it doesn't seem to have done anything beyond change the name my BIOS reports on boot though ;)

just thought i'd add to the confusion...
_________________
['Dial' in Welsh means 'revenge'...If a welsh speaker enters an English phone booth, having paid the coin/phone card the next command the l.e.d screen throws at the customer is 'revenge!' thus setting the tone for a confrontational conversation]
iTux
Guru


Joined: 07 Sep 2004
Posts: 586
Location: Toronto

Posted: Sat Oct 15, 2005 6:04 pm

dannysauer wrote:
yngwin wrote:
I don't understand why you're so concerned about space and memory, unless you use an old box with very limited hard disk space and limited RAM. I use -O2 -finline-functions (the only extra one in -O3 that's worth using on x86, as I understand it).

I don't have any documentation ready, but I do read up on what knowledgeable users, both on this forum and elsewhere, say and recommend, and I follow up on their links, as well as the GCC documentation. I don't claim to be an expert, but following the advice of experts and reading on the subject whenever I come across it has led me to use these flags, with good results. I have a stable and speedy system that I am content with.


The size of the executable isn't due to a concern with drive space (the difference is negligible at that level) or really with RAM. I use -Os on my modern SMP systems rather than -O2, because the main difference is that -Os doesn't use the alignment optimizations. Those alignment optimizations do things like inserting padding before function entry points, loop and jump targets, etc. While that can result in a very small performance increase (aligned targets are cheaper for the processor to fetch), it also spreads code out. That spreading out of code increases the chance of cache misses. On a uniprocessor system with a modern processor that has lots of on-chip cache, that's not a huge deal. However, on an SMP system with separate chips - and thus separate on-chip caches - the performance hits of cache misses can be pretty significant - significant enough to outweigh the minor alignment benefit. Having to fetch stuff from system memory is *way* slower than fetching from processor cache - it's like the difference between swapping and fitting into physical memory. It's this space concern that's the main reason why it's a bad idea to build everything with -O3 (well, that and -funroll-loops results in slower code when the number of iterations isn't known in advance, which is the case in lots of loops that I write - and presumably in other coders' stuff).

I guess I'm technically worrying about memory, after all, but I'm really worrying about L1 and L2 cache usage, rather than system memory. And there's not much you can do to increase the size of the on-chip cache, which is typically on the order of 128-512K. My pretty current Athlon MP system, for example, has 1.5GB RAM, but the chips only have 256K of cache. When you're trying to fit as much code as possible into 256K, it's worthwhile to worry about space. Not to mention that compilation time is slightly improved over -O2, and significantly improved over -O3. Referring to it as optimizing for size is deceptive, though, since people do usually think of system memory and drive space first - forgetting about the cache, which is arguably more important. :)

That said, I was able to find more information on the SSE thing which agreed - basically the SSE implementation on the Athlons isn't all that awesome - it's more for compatibility. The Athlons do, however, have a kick-ass 387 unit (which is what's used in place of SSE).
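
As a rough illustration of the setup described in the quote (the exact line is an assumption, not something dannysauer posted), an /etc/make.conf fragment for an Athlon MP box that optimizes for size and keeps floating point on the 387 unit might look like this. -mfpmath=387 is already GCC's default on 32-bit x86, so it is spelled out here only to make the choice visible.

```shell
# /etc/make.conf fragment (sketch; values are assumptions, adjust to your CPU)
# -Os: like -O2 but without the code-growing alignment padding
# -march=athlon-mp: generate code for the Athlon MP
# -mfpmath=387: use the x87 FPU for floating point instead of SSE
CFLAGS="-Os -march=athlon-mp -mfpmath=387 -pipe"
CXXFLAGS="${CFLAGS}"
```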


hi there,

What if you have a small loop that if correctly aligned will fit entirely in a cache line and in the fetch buffer?

You want hot loops aligned for performance. The key word is "hot": if the compiler inlines everything, then yes, it increases code size and can increase cache misses.

I don't know much about the AMD architecture implementations... But aligning hot loops on a POWER4/POWER5 does make a significant difference in performance.
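
One way to try for both (small code overall, aligned loops) might be to keep -Os and switch loop alignment back on explicitly. This is only a sketch: the value 16 is an assumption rather than a tuned number, and whether an explicit -falign-loops is honored after -Os is worth checking against the documentation for your GCC version.

```shell
# /etc/make.conf fragment (sketch, not benchmarked)
# -Os drops the -falign-* padding; -falign-loops=16 asks GCC to align
# loop starts to a 16-byte boundary again (16 is an assumed value).
CFLAGS="-Os -falign-loops=16 -march=athlon-mp -pipe"
CXXFLAGS="${CFLAGS}"
```

Note that this aligns every loop, not just the hot ones; without profile data GCC has no way of knowing which loops are hot.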


iTux
tg90nor
n00b


Joined: 18 Oct 2004
Posts: 9
Location: Norway

Posted: Sun Oct 16, 2005 11:49 am

Gentree wrote:
As for the kernel, I think it is probably safe to try: you will either get a kernel that will boot or not. Since it is basically a hamstrung Hammer core it should be safe, but don't say I advised you to do it :wink:


I have a mobile Sempron 2800+ (S754) @ 1600MHz with a k8 kernel; works like a charm. :wink:
I have the following CFLAGS: -O2 -march=i686 -pipe -fomit-frame-pointer. They give generally good performance, but I am planning to experiment with them to get a faster system. :D
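
For reference, the flags from this post written out as an /etc/make.conf fragment. The commented-out alternative is an assumption for experimenting, not something the poster reported using: since the chip runs a k8 kernel, -march=k8 is a plausible candidate, provided the installed GCC supports it.

```shell
# /etc/make.conf fragment
# Flags quoted in the post:
CFLAGS="-O2 -march=i686 -pipe -fomit-frame-pointer"
# Assumed alternative to try on a K8-based Sempron (uncomment to experiment):
# CFLAGS="-O2 -march=k8 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
```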
extraketchup
n00b


Joined: 21 Jun 2004
Posts: 29
Location: Maine

Posted: Sun Oct 16, 2005 1:13 pm    Post subject: Os vs O2

Howdy Folks,

I thought I'd throw 2 pennies into the Os vs O2 debate. I've been using Os not so much because of a lack of memory, drive space, or even cache (primarily because I didn't think about the cache), but because for me the bottleneck isn't the speed of the program once it is running in RAM, but loading it from the real bottleneck: storage. Once an application is loaded, unless it is a 3D game, I notice no difference between optimizations. Not that there are no differences, as benchmarks will show, but since the computer is already faster than I am, I can't notice those differences. What I do notice is the wait from when I click on an icon until the program actually opens. Consider the behemoth OpenOffice.org, which can take some time to load. If compiled with Os instead of O2 (which, last I checked, required some fiddling with the ebuild), it should load faster, which is what interests me.

I suppose if I had an application where a few % of speed matters (like transcode), I'd be more conscious of optimizing for the CPU vs size.

Anyone else consider this aspect of size vs loop optimization?
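
To put a number on the size difference rather than guess, something like the following could be used. It is a minimal sketch: `compare_opt_sizes` is a hypothetical helper name, and it assumes gcc and binutils' `size` are installed. It compiles one C file at -O2 and at -Os and prints the section sizes of each object file.

```shell
# Hypothetical helper: build one C file at -O2 and at -Os and print
# the text/data/bss sizes of each resulting object file.
compare_opt_sizes() {
    src="$1"
    for level in O2 Os; do
        out="${src%.c}_${level}.o"
        # -c compiles to an object file only, so no main() is required
        gcc "-${level}" -c "$src" -o "$out" || return 1
        size "$out"
    done
}
```

Called as `compare_opt_sizes foo.c`, it leaves foo_O2.o and foo_Os.o behind for comparison. It says nothing about load time by itself; that would still need timing on a cold disk cache.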

EK
_________________
There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.
nixnut
Bodhisattva


Joined: 09 Apr 2004
Posts: 10974
Location: the dutch mountains

Posted: Sun Oct 16, 2005 2:49 pm

Have you tried prelinking?
_________________
Please add [solved] to the initial post's subject line if you feel your problem is resolved. Help answer the unanswered

talk is cheap. supply exceeds demand
extraketchup
n00b


Joined: 21 Jun 2004
Posts: 29
Location: Maine

Posted: Sun Oct 16, 2005 4:01 pm

Quote:
Have you tried prelinking?


Yes, I actually use it (I've never benchmarked it, but I've read it works well). However, my understanding is that prelink deals with reducing the overhead of dynamically linking libraries into specific memory addresses. Os, on the other hand, reduces the overall file size, which reduces the amount of data flowing over the IDE cable (or SATA these days), thus speeding up application launch time. I would also think it would make Linux's RAM disk cache more effective (by allowing more files to be stored in the cache at one time).

In other words, I would think combining prelink and Os would reduce load times of applications, which to me is more important than how fast my word processor or browser is when it is running (as long as it is faster than I am :) )
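
A sketch of the prelink half of that combination, assuming the prelink package is installed and the command is run as root after packages have been rebuilt with -Os. `prelink_all` is a hypothetical wrapper name; the -amR flag set is the one usually suggested for a system-wide run, as far as I recall.

```shell
# Hypothetical wrapper around a system-wide prelink run (needs root).
prelink_all() {
    # -a: process all binaries and libraries in the configured paths
    # -m: conserve virtual address space
    # -R: randomize the base addresses assigned to libraries
    prelink -amR
}
```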

EK
DntKnwHw
n00b


Joined: 14 Aug 2003
Posts: 74
Location: Philippines

Posted: Thu Oct 27, 2005 8:59 am

Having read the thread about alignment and cache misses:

I don't know if this fits here, but from what I know the Athlon XP has a very good TLB, so it is not very likely to have many cache misses.

So in my CFLAGS I don't use

-fprefetch-loop-arrays

or other prefetch flags, because the hardware does that job natively.

From what I remember of the AMD white papers, they promote the use of

-funroll-loops

because of the large cache size.

Just my 2c, hope this helps.
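
Written out as an /etc/make.conf fragment, the choice described in this post might look like the sketch below. GCC spells the two flags -fprefetch-loop-arrays and -funroll-loops; the -O2 and -march=athlon-xp parts are assumptions, since the post does not give the full CFLAGS line.

```shell
# /etc/make.conf fragment (sketch; -O2 and -march are assumed, not from the post)
# -funroll-loops is added; -fprefetch-loop-arrays is deliberately left out.
CFLAGS="-O2 -march=athlon-xp -pipe -funroll-loops"
CXXFLAGS="${CFLAGS}"
```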
All times are GMT