Gentoo Forums
Making full use of cpu registers in CFLAGS
Author Message
ghetto
Guru
Guru


Joined: 10 Jul 2002
Posts: 369
Location: BC, Canada

PostPosted: Mon May 05, 2003 7:54 pm    Post subject: Reply with quote

OK thanks, that's good to know. It's pretty painful to recompile an entire system on a 1GHz CPU. :evil:
_________________
Blizzard you suck.
taskara
Advocate
Advocate


Joined: 10 Apr 2002
Posts: 3763
Location: Australia

PostPosted: Mon May 05, 2003 10:08 pm    Post subject: Reply with quote

thanks gnufsh :)

I think I read on the other post that someone showed how adding -mmmx and -msse etc. actually caused them to become -mno-mmx and -mno-sse because they were already enabled in -march=athlon-xp.

so leaving them in won't hurt.. sweet as. ta..
_________________
Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer!
Gnufsh
Guru
Guru


Joined: 28 Dec 2002
Posts: 400
Location: Portland, OR

PostPosted: Tue May 06, 2003 3:51 pm    Post subject: Reply with quote

-mno-mmx, -mno-sse, and -mno-3dnow do show up in the "options enabled" part, but the macros that actually implement the MMX, SSE, and 3DNow! support are still defined, so SSE, MMX, and 3DNow! are still enabled.
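
One way to check this yourself, rather than trusting the "options enabled" listing, is to ask gcc which preprocessor macros it predefines for a given set of flags. The following is only a rough sketch in Python under several assumptions: gcc is on the PATH, it is an x86 build that accepts -march=athlon-xp (a 64-bit gcc will probably also need -m32), and the feature support shows up as the __MMX__, __SSE__ and __3dNOW__ macros. The helper names are made up for this illustration.

```python
import subprocess

# Macros gcc is assumed to predefine when the matching instruction set
# is enabled for the compilation.
FEATURE_MACROS = ("__MMX__", "__SSE__", "__3dNOW__")


def parse_defined_macros(dump):
    """Return the set of macro names found in `gcc -dM -E` style output."""
    names = set()
    for line in dump.splitlines():
        parts = line.split()
        # Each relevant line looks like: #define NAME VALUE
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def enabled_features(dump):
    """Map each feature macro to True/False depending on the dump."""
    defined = parse_defined_macros(dump)
    return {macro: macro in defined for macro in FEATURE_MACROS}


def dump_macros(flags):
    """Run gcc on empty input and return its predefined-macro dump."""
    result = subprocess.run(
        ["gcc", *flags, "-dM", "-E", "-x", "c", "/dev/null"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Comparing enabled_features(dump_macros(["-march=athlon-xp"])) with the same call plus "-mmmx", "-msse" and "-m3dnow" appended would show whether the extra flags change anything on your particular gcc version.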
taskara
Advocate
Advocate


Joined: 10 Apr 2002
Posts: 3763
Location: Australia

PostPosted: Tue May 06, 2003 10:37 pm    Post subject: Reply with quote

hmm ok..

so adding -mmmx, etc to -march=athlon-xp makes them look disabled, but they are in fact enabled still ?

crazy

we need to get someone from gnu to clarify optimum settings! ;)
_________________
Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer!
wrc1944
Advocate
Advocate


Joined: 15 Aug 2002
Posts: 3435
Location: Gainesville, Florida

PostPosted: Wed May 07, 2003 12:43 am    Post subject: Reply with quote

I agree- if some clarification from THE definitive source was forthcoming, that would be great, although it's hard to recognize what that source is. I guess this is what makes Linux interesting, but it would be nice if what seemingly was a straightforward procedure wasn't so ambiguous. Realizing different hardware is involved in each case, it would seem one could find out exactly what gcc optimizations actually do, or how they interact- but that knowledge is so far, elusive. Maybe it's just that nobody really knows, or the gcc manual has a few errors itself, and leads us astray? We can get conflicting reports, all from sources who obviously know more than we do, but what are we to make of it?

As one who has spent much time trying to optimize to the max, I now realize I don't know what to take as gospel, though I surely appreciate all the help I've gotten. At this point, I'm realizing I simply have to educate myself, and then do it myself, and see what happens. Apparently, when you're on the edge with your own hardware, there is no other way.

wrc1944
_________________
Main box- AsRock x370 Gaming K4
Ryzen 7 3700x, 3.6GHz, 16GB GSkill Flare DDR4 3200mhz
Samsung SATA 1000GB, Radeon HD R7 350 2GB DDR5
OpenRC Gentoo ~amd64 plasma, glibc-2.36-r7, gcc-13.2.1_p20230304
kernel-6.8.4 USE=experimental python3_11
drdabbles
n00b
n00b


Joined: 08 May 2003
Posts: 31
Location: NH, USA

PostPosted: Thu May 08, 2003 3:51 am    Post subject: ~*smacks head*~ Reply with quote

Optimizations are like...well...I can't think of anything they are like. I can only say that optimizations are like recursion. "To understand recursion, you must first understand recursion".

One suggestion I would give to anyone with about 10 minutes of free time would be to grab the GCC source from GNU, unpack it, and grep the resulting directory for "i586", or some other -march value. Heck, you could even grep for "march" if you wanted to.

This will allow you to find out which file contains the code that interprets the optimization flags. You can then open it in your favorite viewer/editor and peruse to your heart's content. It is pretty straightforward. You will be able to see what flags are enabled for which CPUs, etc.
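
For anyone who would rather script the search than run grep by hand, here is a minimal sketch of the same idea in Python. It is only an illustration: the function name is made up, and the directory you pass in is wherever you happened to unpack the GCC source.

```python
import os


def find_mentions(root, needle):
    """Yield (path, line_number, line) for each line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going over non-text files.
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Skip files that cannot be opened or read.
                continue
```

Something like `for path, number, line in find_mentions("path/to/gcc-source/gcc/config/i386", "athlon"): print(path, number, line)` should point at the files worth opening; the path here is a placeholder for your own unpacked tree.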

With regard to getting anything from developers...most of them are just as confused as we are. There is no clear, concise documentation that discusses the effects of optimization flags...anywhere. There are people's opinions, and people that look at the macros, etc. I choose to believe the latter of the two, but again, they are just opinions as to the effects.

Also, I would like to say that it might be difficult to get an official response from the developers that work(ed) on the GCC compiler. They are an open source community, and many times it is difficult to pin down the specific person or group responsible for a particular segment of code. However, perhaps someone could try anyway.

Good luck all,
Thomas Cameron
CEI Systems, Inc.
taskara
Advocate
Advocate


Joined: 10 Apr 2002
Posts: 3763
Location: Australia

PostPosted: Thu May 08, 2003 3:59 am    Post subject: Reply with quote

this is a pretty simple issue tho, surely someone who knows programming can test it and work it out ?

we aren't asking what optimisations do, we simply want to know if "-march=athlon-xp" enables "-mmmx, -m3dnow, and -msse" by default.

if it doesn't then we need to put them in our CFLAGS.

if it does, then we don't need to put them in our CFLAGS.

some people have said, put them in anyway. but others have said if you do that and -march DOES automatically put them in, then you will actually DISable them ...

it's all too confusing a situation, but surely there is a clear answer, either -march=athlon-xp DOES automatically incorporate those flags or it doesn't.

surely someone knows that?
_________________
Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer!
ghetto
Guru
Guru


Joined: 10 Jul 2002
Posts: 369
Location: BC, Canada

PostPosted: Thu May 08, 2003 6:22 am    Post subject: Reply with quote

/me goes and writes stallman a letter..

Dear Mr Stallman,
I know you're a pretty busy guy with your.. uh.. hmm.. well whatever it is that
you're doing, but could you please explain to us how gcc works?
Particularly I would like to know how I can make gcc read my email.. oh wait.. that's Emacs, well then, uh.. could you please explain how to make gcc do proper optimizations? It's very important to me that I run the absolutely fastest binary code imaginable and if I have even the slightest doubt that my code is not completely optimized I break out in a horrid rash. I'm sure you understand where I'm coming from.
Any help would be appreciated.

Sincerely
Gnu/Ghetto
_________________
Blizzard you suck.
taskara
Advocate
Advocate


Joined: 10 Apr 2002
Posts: 3763
Location: Australia

PostPosted: Thu May 08, 2003 6:47 am    Post subject: Reply with quote

ghetto wrote:
/me goes and writes stallman a letter..

Dear Mr Stallman,
I know you're a pretty busy guy with your.. uh.. hmm.. well whatever it is that
you're doing, but could you please explain to us how gcc works?
Particularly I would like to know how I can make gcc read my email.. oh wait.. that's Emacs, well then, uh.. could you please explain how to make gcc do proper optimizations? It's very important to me that I run the absolutely fastest binary code imaginable and if I have even the slightest doubt that my code is not completely optimized I break out in a horrid rash. I'm sure you understand where I'm coming from.
Any help would be appreciated.

Sincerely
Gnu/Ghetto


Dear Mr Clever,

yeeeees...

funny thing is he probably _would_ understand :P

on a serious note, I think this is a handy thing to know.

You obviously don't, but that's ok.

Sincerely,

a guy trying to help other linux users who asked the question.
_________________
Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer!
ghetto
Guru
Guru


Joined: 10 Jul 2002
Posts: 369
Location: BC, Canada

PostPosted: Thu May 08, 2003 8:06 am    Post subject: Reply with quote

ok so maybe i was a touch on the sarcastic side with that last comment.

i was only having fun, I honestly think this is pretty important as well.. but I just couldn't help but try to joke because in a way it's kind of funny and my little imaginary email is true, some people really do break out in rashes if they think their code might not be optimized to the absolute degree..

:)
_________________
Blizzard you suck.
taskara
Advocate
Advocate


Joined: 10 Apr 2002
Posts: 3763
Location: Australia

PostPosted: Thu May 08, 2003 8:15 am    Post subject: Reply with quote

gr00vy man... I didn't wanna come down on you harsh ;)

seriously tho.. an answer to the problem would be cool! ;) hehe
_________________
Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer!
ghetto
Guru
Guru


Joined: 10 Jul 2002
Posts: 369
Location: BC, Canada

PostPosted: Thu May 08, 2003 8:19 am    Post subject: Reply with quote

Why don't we send some real email then?
_________________
Blizzard you suck.
taskara
Advocate
Advocate


Joined: 10 Apr 2002
Posts: 3763
Location: Australia

PostPosted: Thu May 08, 2003 11:00 am    Post subject: Reply with quote

cause don't wanna bother them ;) they have more important things to do ehe :)
_________________
Kororaa install method - have Gentoo up and running quickly and easily, fully automated with an installer!
nico--
n00b
n00b


Joined: 29 Jul 2002
Posts: 59

PostPosted: Fri May 09, 2003 5:15 pm    Post subject: Reply with quote

taskara wrote:
cause don't wanna bother them ;) they have more important things to do ehe :)


Obviously, good documentation isn't important.

Look at gentoo... advanced command line installation but the good documentation makes it _much_ easier.
_________________
Quidquid latine dictum sit, altum viditur.
m00dawg
Tux's lil' helper
Tux's lil' helper


Joined: 27 Jan 2003
Posts: 145
Location: Texas

PostPosted: Fri May 09, 2003 8:10 pm    Post subject: mfpmath results Reply with quote

I tried -mfpmath=sse,387 and then ran Pov-Ray. There was no real change in the results; two of the three results were even slightly slower. I don't know how this might impact other benchmarks, but for Pov-Ray it seems to make no difference. This is unfortunate, since you would think the additional registers would help: if nothing else, temporary data could be placed there.

Malakin wrote:

I doubt using -mfpmath=sse,387 makes any actual performance difference with anything, someone please prove me wrong.
m00dawg
Tux's lil' helper
Tux's lil' helper


Joined: 27 Jan 2003
Posts: 145
Location: Texas

PostPosted: Fri May 09, 2003 8:21 pm    Post subject: A solution? Reply with quote

Perhaps it would be a good idea if someone were to post some benchmarks using different flags? I have already commented on Pov-Ray benchmarks with -mfpmath=sse,387, but it would be interesting to see others.

An easy benchmark to run is timing a kernel build. It's easy, reasonably fast, and the kernel is already there for you to play with :)
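
To make such timings comparable between people, it may help to time the build the same way each time. Below is a small, hedged sketch in Python; the helper names are invented for this post, and the command you time (for instance a kernel build under /usr/src/linux) is your own choice, not something the script knows about.

```python
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so terminal speed does not skew timing.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (best, mean) of a list of timings in seconds."""
    return min(timings), sum(timings) / len(timings)
```

A possible use would be `summarize(time_command(["make", "-C", "/usr/src/linux", "bzImage"], runs=1))`, run from a clean tree each time so that every measurement does the same amount of work. The path and make target are assumptions about a typical setup.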
_Edulix
n00b
n00b


Joined: 09 Dec 2002
Posts: 68

PostPosted: Tue Jun 10, 2003 9:48 am    Post subject: Some questions Reply with quote

Hi all!

I've read the whole thread, and I have some things to say.

Some people use very long CFLAGS without really knowing what their options do. I was one of them yesterday when I compiled my new Gentoo system, but now I'm going to change my flags for a few reasons.

Thank you defconfoo, you have been very helpful to me! I've read in http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html about the options that are not enabled by -Ox, and there are some of them we should read about before adding them to our CFLAGS:

Quote:

-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.

Enabled at levels -O, -O2, -O3, -Os. (So we don't need to add this, because it's already enabled if we use any of the -Ox options, right?)

-ffast-math
Sets -fno-math-errno, -funsafe-math-optimizations,
-fno-trapping-math, -ffinite-math-only and
-fno-signaling-nans.

This option causes the preprocessor macro __FAST_MATH__ to be defined.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

(So should we take care not to use this option if we use -Ox, or not?)



Finally, these are the CFLAGS I've selected for my two Gentoo machines:

Machine 1, with an Athlon XP 2000+, specifically:
Quote:

# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) XP 2000+
stepping : 2
cpu MHz : 1673.825
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3342.33

CFLAGS="-march=athlon-xp -O3 -fomit-frame-pointer -pipe -falign-functions=64 -mfpmath=sse,387 -m3dnow -msse -mmmx -ffast-math"

Machine 2, with a Celeron (Coppermine) 900 Mhz,
Quote:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Celeron (Coppermine)
stepping : 10
cpu MHz : 902.160
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse <<<<<---A celeron with sse :?
bogomips : 1765.37

CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer -falign-functions=16 -ffast-math -m3dnow -msse -mmmx -mfpmath=sse,387"

What do you think about adding -mfpmath=sse,387 for a Celeron Coppermine? Do we actually need -fomit-frame-pointer if we already have -Ox? Would you change (add, delete...) any option in my CFLAGS?


Thank you all,
Edulix.
wrc1944
Advocate
Advocate


Joined: 15 Aug 2002
Posts: 3435
Location: Gainesville, Florida

PostPosted: Tue Jun 10, 2003 10:52 am    Post subject: Reply with quote

After reading man gcc many times, and going over all the info on the Gentoo forum's Cflags Central thread (about 20 pages, and very informative), and every other forum and info source I could find for months, I finally settled on what I thought was the best set of optflags for athlon-xp platforms. Here they are:

optflags: athlon -O3 -fomit-frame-pointer -pipe -march=athlon-xp -mmmx -msse -m3dnow -falign-functions=16 -falign-labels=1 -falign-loops=16 -falign-jumps=16 -fprefetch-loop-arrays -mfpmath=sse,387 -ffast-math -fforce-addr

I won't go into details about why I included, or removed specific flags here. I did try these on XFree86, and they built and installed fine, with only a few warnings and no errors, and after five days, no apparent problems have surfaced, and fine performance. Curiously, the XFree86 compile dropped the -ffast-math flag, but other packages keep it.

For what it's worth, this was done on Mandrake 9.1, as It's almost impossible to really utilize Gentoo's advantages on a 56k dialup connection without at least using the wvdial "resuming downloads" function (I must share the one phone line with others, so I never get more than 1-2 hours at a time). By the time I downloaded the equivalent of emerge world on dialup without the option of leaving my system on overnight, it would be obsolete.

wrc1944
_________________
Main box- AsRock x370 Gaming K4
Ryzen 7 3700x, 3.6GHz, 16GB GSkill Flare DDR4 3200mhz
Samsung SATA 1000GB, Radeon HD R7 350 2GB DDR5
OpenRC Gentoo ~amd64 plasma, glibc-2.36-r7, gcc-13.2.1_p20230304
kernel-6.8.4 USE=experimental python3_11
xedx
Tux's lil' helper
Tux's lil' helper


Joined: 23 May 2003
Posts: 93

PostPosted: Tue Jun 10, 2003 10:59 am    Post subject: Reply with quote

i read it is advised to add -falign-functions=64 to your CFLAGS if you have an athlon/duron, does it make any difference on a pentium4?

btw how 'bout hyperthreading. I have a pentium4 with [ht] and another one without. ICC does have some optimizations on [ht] IIRC
Any thoughts
_________________
--+//+
TheCoop
Veteran
Veteran


Joined: 15 Jun 2002
Posts: 1814
Location: Where you least expect it

PostPosted: Wed Jun 11, 2003 8:34 am    Post subject: Reply with quote

wrc1944 wrote:
optflags: athlon -O3 -fomit-frame-pointer -pipe -march=athlon-xp -mmmx -msse -m3dnow -falign-functions=16 -falign-labels=1 -falign-loops=16 -falign-jumps=16 -fprefetch-loop-arrays -mfpmath=sse,387 -ffast-math -fforce-addr
Just recompiled the world with these CFLAGS, nothing broke and it seems slightly faster... (except you dont need -m3dnow, -mmmx or -msse as -march=athlon-xp enables those anyway)
_________________
95% of all computer errors occur between chair and keyboard (TM)

"One World, One web, One program" - Microsoft Promo ad.
"Ein Volk, Ein Reich, Ein Führer" - Adolf Hitler

Change the world - move a rock
wrc1944
Advocate
Advocate


Joined: 15 Aug 2002
Posts: 3435
Location: Gainesville, Florida

PostPosted: Wed Jun 11, 2003 12:56 pm    Post subject: Reply with quote

TheCoop,

Glad to hear those flags work on Gentoo also. In all my reading, I ran across many conflicting opinions about whether to include -mmmx, -m3dnow, and -msse. Some implied -march=athlon-xp did not in all cases automatically activate those opts. I decided to err on the side of caution and add them, even if they aren't really needed with -march=athlon-xp.

I'd sure like to know one way or the other, but many persons who obviously knew more than I did felt you should include them.

wrc1944
_________________
Main box- AsRock x370 Gaming K4
Ryzen 7 3700x, 3.6GHz, 16GB GSkill Flare DDR4 3200mhz
Samsung SATA 1000GB, Radeon HD R7 350 2GB DDR5
OpenRC Gentoo ~amd64 plasma, glibc-2.36-r7, gcc-13.2.1_p20230304
kernel-6.8.4 USE=experimental python3_11
bkeating
Tux's lil' helper
Tux's lil' helper


Joined: 22 Apr 2003
Posts: 77
Location: San Francisco, CA

PostPosted: Thu Jun 12, 2003 4:06 am    Post subject: Reply with quote

I don't quite understand the construction of this line... Im running a Pentium 4 (3.06Ghz) and these are the flags it gives me;

Code:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm


I see a lot of guys here running P4's as well, where do they come up with "-fomit-frame-pointer" n such? Am I missing the format structure?

Would this be correct for me;

Code:
 

-march=pentium4 -03 -pipe -fpu -vme -de -pse -tsc -msr -pae -mce -cx8 -apic -sep -mtrr -pge -mca -cmov -pat -pse36 -clflush -dts -acpi -mmx -fxsr -sse -sse2 -ss -ht -tm



or can only a few be used?
puddpunk
l33t
l33t


Joined: 20 Jul 2002
Posts: 681
Location: New Zealand

PostPosted: Thu Jun 12, 2003 9:48 am    Post subject: Reply with quote

bkeating wrote:
I don't quite understand the construction of this line... Im running a Pentium 4 (3.06Ghz) and these are the flags it gives me;

Code:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm


I see a lot of guys here running P4's as well, where do they come up with "-fomit-frame-pointer" n such? Am I missing the format structure?

Would this be correct for me;

Code:
 

-march=pentium4 -03 -pipe -fpu -vme -de -pse -tsc -msr -pae -mce -cx8 -apic -sep -mtrr -pge -mca -cmov -pat -pse36 -clflush -dts -acpi -mmx -fxsr -sse -sse2 -ss -ht -tm



or can only a few be used?


Hi there bkeating.

You don't really have the grasp of it, but I'll be glad to point you in the right direction ;)

The first set of "flags" you pasted were the CPU flags. It is a list of all the features your CPU supports, performance enhancing or not; it's basically everything that exists on your processor (e.g. mmx, sse, sse2, ht (hyper-threading)).

The second lot of "flags" are Compiler flags. They are instructions to the compiler to tell it to compile code a certain way. Most flags start with a letter (i.e. -m, -f or -O).

-O means optimise (1,2,3 or s (small)). These flags turn on a whole lot of other flags (such as -fomit-frame-pointer (telling the compiler to omit the frame pointer, used for debugging only, so it frees up an extra CPU register).

-f options, such as -fomit-frame-pointer or -ffast-math, tell the compiler to compile code a certain way. Most of the time it is to optimise.

-m options can be thought of as "feature options". For example, -mmmx turns on MMX support, and -msse or -msse2 turns on SSE(2) support in the compiled code.

You have to take the flags that /proc/cpuinfo gives you and "translate" them into flags that gcc understands, so it can tailor its code to your CPU. That's what this thread is about. Also, there is another thread in the Portage & Programming forum about CFLAGS, called CFLAGS Central, which would probably help you a lot.
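
As a rough illustration of that "translation", here is a small Python sketch. The lookup table is hypothetical and deliberately tiny: it only covers the gcc options actually named in this thread, and most /proc/cpuinfo entries (fpu, vme, ht and so on) have no gcc counterpart at all.

```python
# Hypothetical lookup table: /proc/cpuinfo flag -> gcc option.
# Only the options mentioned in this thread are listed.
CPU_TO_GCC = {
    "mmx": "-mmmx",
    "sse": "-msse",
    "sse2": "-msse2",
    "3dnow": "-m3dnow",
}


def cpu_flags(cpuinfo_text):
    """Return the flags of the first processor in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return value.split()
    return []


def gcc_feature_options(cpuinfo_text):
    """Translate known CPU flags into gcc -m options, keeping CPU order."""
    return [CPU_TO_GCC[flag] for flag in cpu_flags(cpuinfo_text)
            if flag in CPU_TO_GCC]
```

On Linux, `gcc_feature_options(open("/proc/cpuinfo").read())` would give the short list that applies to your own CPU. Whether you need to spell those options out at all, or whether a matching -march already implies them, is exactly what the rest of this thread is arguing about.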

Hope this helps,
Chris.
bsolar
Bodhisattva
Bodhisattva


Joined: 12 Jan 2003
Posts: 2764

PostPosted: Thu Jun 12, 2003 10:25 am    Post subject: Reply with quote

puddpunk wrote:
-O means optimise (1,2,3 or s (small)). These flags turn on a whole lot of other flags (such as -fomit-frame-pointer (telling the compiler to omit the frame pointer, used for debugging only, so it frees up an extra CPU register).

It's enabled only if the arch supports debugging without it, so on x86 -fomit-frame-pointer is not enabled with any -O (you must specify it if you want it).
_________________
I may not agree with what you say, but I'll defend to the death your right to say it.
dgrant
Apprentice
Apprentice


Joined: 28 May 2003
Posts: 158
Location: Vancouver, BC, Canada

PostPosted: Fri Jun 13, 2003 2:46 pm    Post subject: Reply with quote

from freehackers.org:

Quote:

Athlon-tbird, aka K7 (AMD)

CFLAGS="-march=athlon-tbird -O3 -pipe -fforce-addr -fomit-frame-pointer
-funroll-loops -falign-functions=4 -maccumulate-outgoing-args"
CXXFLAGS="${CFLAGS}"

note : -m3dnow and -mmmx optimisations are implied by march=athlon-tbird


It says that 3dnow and mmx optimization are implied by athlon-tbird?
Reply to topic    Gentoo Forums Forum Index Documentation, Tips & Tricks All times are GMT