TheCoop Veteran
Joined: 15 Jun 2002 Posts: 1814 Location: Where you least expect it
|
Posted: Tue Mar 25, 2003 12:20 pm Post subject: Making full use of cpu registers in CFLAGS |
|
|
It may surprise you, but -march=<cpu> does not turn on support for 3dnow, mmx, sse or sse2, even if your CPU supports them.
First, to check which of these your CPU supports, run 'cat /proc/cpuinfo' and look at the 'flags:' line; anything your CPU supports will be listed there, including mmx, 3dnow and sse/sse2.
Next, to alter your CFLAGS to use those registers:
Code: | -mmmx -m3dnow -msse -msse2 | (delete any your CPU doesn't support)
If you've got sse support you can also add '-mfpmath=sse,387' so the maths uses both the sse and normal coprocessor registers, effectively doubling your math throughput.
This will result in much faster programs as well as more effective use of the cpu. |
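A minimal sketch of the check described above, assuming a Linux /proc/cpuinfo with a 'flags' line. The function name suggest_mflags is made up for illustration; it only maps the four feature names mentioned in this post to their GCC switches and knows nothing about other extensions.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): read cpuinfo-style text on
# stdin and print the matching GCC switches for mmx, 3dnow, sse and sse2.
# Only the first "flags" line is used; output follows the order in that line.
suggest_mflags() {
    awk -F: '
        /^flags/ {
            n = split($2, f, " ")
            out = ""
            for (i = 1; i <= n; i++) {
                if (f[i] == "mmx")   out = out " -mmmx"
                if (f[i] == "3dnow") out = out " -m3dnow"
                if (f[i] == "sse")   out = out " -msse"
                if (f[i] == "sse2")  out = out " -msse2"
            }
            sub(/^ /, "", out)   # drop the leading separator
            print out
            exit
        }'
}

# Usage on a Linux box:
#   suggest_mflags < /proc/cpuinfo
```

Reading from stdin rather than opening /proc/cpuinfo directly makes it easy to try the function on a flags line pasted from another machine.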
|
|
|
charlieg Advocate
Joined: 30 Jul 2002 Posts: 2149 Location: Manchester UK
|
Posted: Tue Mar 25, 2003 1:08 pm Post subject: Hmm |
|
|
Can anybody testify as to the stability of this?
Plus, is this true of all flags that appear under cat /proc/cpuinfo?
You can use mine as an example.
Flags only:
Code: | # cat /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse |
Full output:
Code: | # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Celeron (Coppermine)
stepping : 6
cpu MHz : 728.292
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1433.60 |
|
|
|
charlieg Advocate
Joined: 30 Jul 2002 Posts: 2149 Location: Manchester UK
|
Posted: Tue Mar 25, 2003 1:15 pm Post subject: Re: Making full use of cpu registers in CFLAGS |
|
|
TheCoop wrote: | It may surprise you, but -march=<cpu> does not turn on support for 3dnow, mmx, sse or sse2 even if your cpu supports it |
The freehackers.org gccflags faq has more info, although I can't work out whether it agrees or disagrees with TheCoop's statement. |
|
|
|
ZuNBiD n00b
Joined: 17 Oct 2002 Posts: 22 Location: Portugal
|
Posted: Tue Mar 25, 2003 1:20 pm Post subject: |
|
|
... and where do I alter my CFLAGS ?? |
|
|
|
Beekster Apprentice
Joined: 26 Nov 2002 Posts: 268 Location: Sydney
|
Posted: Tue Mar 25, 2003 2:07 pm Post subject: |
|
|
You can set your CFLAGS in /etc/make.conf
from my /etc/make.conf
CFLAGS="-march=pentium4 -O3 -pipe -mmmx -msse -msse2 -fomit-frame-pointer" |
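For reference, a sketch of how that looks inside /etc/make.conf. The CFLAGS value is copied from the post above; reusing it for CXXFLAGS is the usual Gentoo convention rather than something stated in this thread, so treat that line as an assumption.

```shell
# /etc/make.conf fragment (sketch; the file uses shell variable syntax)
# CFLAGS as posted above, for a Pentium 4:
CFLAGS="-march=pentium4 -O3 -pipe -mmmx -msse -msse2 -fomit-frame-pointer"
# Assumption: C++ packages should get the same flags as C packages.
CXXFLAGS="${CFLAGS}"
```

Packages already installed keep whatever flags they were built with; the new values only apply to what is compiled afterwards.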
|
|
|
TheCoop Veteran
Joined: 15 Jun 2002 Posts: 1814 Location: Where you least expect it
|
Posted: Tue Mar 25, 2003 2:43 pm Post subject: |
|
|
You can check for yourself by verbosely compiling a very small empty C program, which lists all the flags used. It has the -march but it doesn't list -mmmx, -m3dnow or -msse.
I know it's somewhere in the forums; try searching for 'gcc flags' or something. I'll have a more thorough look when I get home.
My CFLAGS are Code: | -march=athlon-xp -O3 -pipe -fomit-frame-pointer -ffast-math -mmmx -m3dnow -msse -mfpmath=sse,387 | and I've had no problems at all, rock solid stable even with a Barton overclocked 2500+ -> 2800+
Last edited by TheCoop on Tue Mar 25, 2003 5:07 pm; edited 1 time in total |
|
|
|
Beekster Apprentice
Joined: 26 Nov 2002 Posts: 268 Location: Sydney
|
Posted: Tue Mar 25, 2003 3:18 pm Post subject: |
|
|
Just a warning that "man gcc" states:
Quote: | sse,387
Attempt to utilize both instruction sets at once. This effectivly double the amount of available registers and on chips with separate execution units for 387 and SSE the execution resources too. Use this option with care, as it is still experimental, because gcc register allocator does not model separate functional units well resulting in instable performance. |
I'm not saying it's unstable, just letting people know it is not claimed to be stable... Sounds like it could give good gains. Anyone else running with "-mfpmath=sse,387"? |
|
|
|
chadders Tux's lil' helper
Joined: 21 Jan 2003 Posts: 113
|
Posted: Wed Mar 26, 2003 6:46 pm Post subject: |
|
|
I compile with these CFLAGS and it works great for me:
CFLAGS="-march=pentium4 -O3 -pipe -mmmx -msse -msse2 -mfpmath=sse -pipe -fomit-frame-pointer -fthread-jumps -fforce-addr -frerun-cse-after-loop -frerun-loop-opt -fexpensive-optimizations -falign-functions=4 -falign-jumps=4"
Chad |
|
|
|
lotusvale Guru
Joined: 06 Mar 2003 Posts: 339 Location: Canada
|
Posted: Thu Mar 27, 2003 12:18 am Post subject: |
|
|
Does the Athlon T-bird 1.4 support sse or sse2?
(I know it supports 3dnow and mmx.)
Thanks |
|
|
|
Malakin Veteran
Joined: 14 Apr 2002 Posts: 1692 Location: Victoria BC Canada
|
Posted: Thu Mar 27, 2003 12:32 am Post subject: |
|
|
Quote: | does athlon tbird 1.4 support sse, sse2? | Run "cat /proc/cpuinfo" to see what's supported. AMD CPUs don't support sse until the Athlon XP / Duron Morgan (1 GHz+). No AMD CPUs currently available support sse2. |
|
|
|
kappax Apprentice
Joined: 30 Aug 2002 Posts: 273 Location: The Moon
|
Posted: Thu Mar 27, 2003 1:19 am Post subject: |
|
|
Malakin wrote: | Quote: | does athlon tbird 1.4 support sse, sse2? | Run "cat /proc/cpuinfo" to see what's supported. AMD CPUs don't support sse until the Athlon XP / Duron Morgan (1 GHz+). No AMD CPUs currently available support sse2. |
Last I checked, XP 2400+s support sse2. |
|
|
|
lotusvale Guru
Joined: 06 Mar 2003 Posts: 339 Location: Canada
|
Posted: Thu Mar 27, 2003 1:51 am Post subject: |
|
|
thx.
Quote: | shadowrider@localhost shadowrider $ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) processor
stepping : 4
cpu MHz : 1201.492
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 2398.61
|
So for my T-bird 1.4, would this setup be the fastest and best?
(I've read that -O3 makes the binaries bigger, and that it ends up slower, especially when loading programs.)
Code: | CFLAGS="-march=athlon-tbird -O2 -pipe -mmmx -m3dnow -fomit-frame-pointer -frerun-cse-after-loop -frerun-loop-opt -fexpensive-optimizations -falign-functions=4 -ffast-math -mfpmath=sse,387" |
Or does anybody have a better setup? |
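Note that the flags line in this post does not list sse, and -mfpmath=sse only makes sense on a CPU that reports it. A small sketch for testing a single feature before adding a switch, assuming cpuinfo-style text on stdin; has_cpu_flag is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the first "flags" line on stdin
# contains exactly the given feature name, fail otherwise.
# -x matches whole lines only, so "3dnow" does not match "3dnowext".
has_cpu_flag() {
    grep '^flags' | head -n 1 | tr -s ' \t' '\n\n' | grep -Fqx -- "$1"
}

# Usage on a Linux box:
#   if has_cpu_flag sse < /proc/cpuinfo; then echo "sse present"; fi
```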
|
|
|
Malakin Veteran
Joined: 14 Apr 2002 Posts: 1692 Location: Victoria BC Canada
|
|
|
|
kappax Apprentice
Joined: 30 Aug 2002 Posts: 273 Location: The Moon
|
Posted: Thu Mar 27, 2003 3:46 am Post subject: |
|
|
I am not too sure where Mr. Tom gets his info from, but you can get some XP 2400+s with sse2 support. I know several people that have them enabled, and have the BIOSes that support it on the K7S5A.
Check out the SiS section and ocworkbench.com |
|
|
|
Malakin Veteran
Joined: 14 Apr 2002 Posts: 1692 Location: Victoria BC Canada
|
Posted: Thu Mar 27, 2003 3:58 am Post subject: |
|
|
Quote: | I am not too sure where Mr. Tom gets his info from, but you can get some XP 2400+s with sse2 support. I know several people that have them enabled, and have the BIOSes that support it on the K7S5A. | You're likely confusing this with plain sse support. Do a search for "sse2 barton" on Google and you can read lots of reviews etc. that mention the Barton doesn't have sse2 support. There have been no significant instruction set changes to the Athlon since the Palomino. sse2 support from AMD won't be seen until Hammer. |
|
|
|
kappax Apprentice
Joined: 30 Aug 2002 Posts: 273 Location: The Moon
|
Posted: Thu Mar 27, 2003 4:11 am Post subject: |
|
|
Malakin wrote: | Quote: | I am not too sure where Mr. Tom gets his info from, but you can get some XP 2400+s with sse2 support. I know several people that have them enabled, and have the BIOSes that support it on the K7S5A. | You're likely confusing this with plain sse support. Do a search for "sse2 barton" on Google and you can read lots of reviews etc. that mention the Barton doesn't have sse2 support. There have been no significant instruction set changes to the Athlon since the Palomino. sse2 support from AMD won't be seen until Hammer. |
Gaahh, I am getting confused (I was at school so I could not look in my bookmarks). Oh well. |
|
|
|
Vazagi n00b
Joined: 07 Jan 2003 Posts: 43 Location: Denmark
|
Posted: Thu Mar 27, 2003 4:56 pm Post subject: |
|
|
Quote: | It may surprise you, but -march=<cpu> does not turn on support for 3dnow, mmx, sse or sse2 even if your cpu supports it. |
I'm getting curious. How can it be that the '-mno-sse2' flag fixes problems with '-march=pentium4' using GCC 3.2.2, if '-march=<cpu-type>' doesn't enable these flags in the first place? The '-mno-sse2' fix seems to indicate that '-march=<cpu-type>' does enable these flags. =/ |
|
|
|
jesterspet Apprentice
Joined: 05 Feb 2003 Posts: 215 Location: Atlanta
|
Posted: Fri Mar 28, 2003 4:32 am Post subject: |
|
|
Vazagi wrote: | I'm getting curious. How can it be that the '-mno-sse2' flag fixes problems with '-march=pentium4' using GCC 3.2.2, if '-march=<cpu-type>' doesn't enable these flags in the first place? The '-mno-sse2' fix seems to indicate that '-march=<cpu-type>' does enable these flags. =/ |
Even though '-march=<cpu-type>' does not enable the sse2 flags, other optimisations can enable them when used in conjunction with '-march=<cpu-type>'. Exactly what those 'other optimisations' are is beyond me. I think you would have to play around with them to figure out what they are.
I used these flags and have had no problems* with GCC 3.2.2:
Code: | CFLAGS="-s -march=pentium4 -mmmx -msse -msse2 -Os -fomit-frame-pointer -pipe -fexpensive-optimizations -fpic -frerun-cse-after-loop -frerun-loop-opt -foptimize-register-move -masm=intel" |
* This includes the bug addressed here |
|
|
|
vikwiz n00b
Joined: 01 Mar 2003 Posts: 50 Location: Budapest
|
Posted: Fri Mar 28, 2003 10:13 am Post subject: |
|
|
jesterspet wrote: | I used these flags and have had no problems* with GCC 3.2.2:
Code: | CFLAGS="-s -march=pentium4 -mmmx -msse -msse2 -Os -fomit-frame-pointer -pipe -fexpensive-optimizations -fpic -frerun-cse-after-loop -frerun-loop-opt -foptimize-register-move -masm=intel" |
* This includes the bug addressed here |
Are you *very* sure of this? Did you try that python int conversion stuff also?
I compiled a Gentoo tree with -march=pentium4, and when it finished I saw that thread, tried the 'int' code, and it really fails. I haven't tried booting this tree yet. What makes the difference for you? All the other flags? I used only '-march=pentium4 -O2 -pipe'. |
|
|
|
cerri Bodhisattva
Joined: 05 Mar 2003 Posts: 2957 Location: # init S
|
Posted: Sat Mar 29, 2003 1:02 pm Post subject: |
|
|
There are a lot of people asking "which is the best setup for XYZ cpu?". So, why not post the best setup indexed by CPU?
E.g. Pentium III (M) =
CFLAGS="-march=pentium3 -O3 -pipe -fomit-frame-pointer -fforce-addr -falign-functions=4 -fprefetch-loop-arrays"
Anyway, as reported by freehackers.org, -march=pentium3 implies -mmmx -msse... |
|
|
|
tempy n00b
Joined: 21 Mar 2003 Posts: 15
|
Posted: Sun Mar 30, 2003 7:42 pm Post subject: |
|
|
Okay, the source of the confusion seems to be that -march enables some CPU defines (e.g. -D__SSE__ -D__MMX__ -D__3dNOW__ -D__3dNOW_A__) but does not tell GCC to actually generate its own SSE, MMX or 3DNow assembly.
What's odd is that when you specify -mmmx or any of the other CPU features, gcc -v -Q decides you are actually requesting "-mmmx -mno-mmx" at the same time. Neither is present without it, and I don't know which takes precedence. :/
-mfpmath=sse doesn't override itself, but it may require -mno-80387 to actually work.
My CPU is an Athlon XP 1700+ and my default cflags are "-march=athlon-xp -O2 -ggdb -pipe". Short and to the point. |
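One way to see the defines mentioned above without wading through the full -v output is to ask the preprocessor for its predefined macros with -dM -E. A sketch, assuming a gcc that accepts the chosen -march value (a 64-bit compiler may additionally need -m32 for the Athlon targets); simd_macros is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the SIMD-related #defines from a
# preprocessor macro dump read on stdin, sorted for easy comparison.
simd_macros() {
    grep -E '^#define __(MMX|SSE|SSE2|3dNOW|3dNOW_A)__ ' | LC_ALL=C sort
}

# Usage, assuming gcc is installed:
#   gcc -march=athlon-xp -dM -E - < /dev/null | simd_macros
#   gcc -dM -E - < /dev/null | simd_macros      # baseline for comparison
```

This only shows which macros the compiler defines for a given -march; it says nothing about which instructions the code generator will actually choose.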
|
|
|
link97381 n00b
Joined: 29 Mar 2003 Posts: 34 Location: Silverton Oregon
|
Posted: Mon Mar 31, 2003 4:16 pm Post subject: |
|
|
Ahhh man, now I'm gonna have to recompile EVERYTHING!!! I have dual Athlon XPs, so should I set mine to Code: | -march=athlon-xp -O3 -pipe -fomit-frame-pointer -ffast-math -mmmx -m3dnow -msse -mfpmath=sse,387 | or do you have any other suggestions? |
|
|
|
TheCoop Veteran
Joined: 15 Jun 2002 Posts: 1814 Location: Where you least expect it
|
Posted: Mon Mar 31, 2003 4:28 pm Post subject: |
|
|
At last!
To check which CFLAGS are implied by certain options, create an empty.c and run:
Code: | gcc -v -Q empty.c <options> | and all the CFLAGS passed are in the output.
With no options you get:
Code: |
*snip*
GNU C version 3.2.2 20030322 (Gentoo Linux 1.4 3.2.2-r2) (i686-pc-linux-gnu)
compiled by GNU C version 3.2.2 20030322 (Gentoo Linux 1.4 3.2.2-r2).
options passed: -lang-c -v -D__GNUC__=3 -D__GNUC_MINOR__=2
-D__GNUC_PATCHLEVEL__=2 -D__GXX_ABI_VERSION=102 -D__ELF__ -Dunix
-D__gnu_linux__ -Dlinux -D__ELF__ -D__unix__ -D__gnu_linux__ -D__linux__
-D__unix -D__linux -Asystem=posix -D__NO_INLINE__ -D__STDC_HOSTED__=1
-Acpu=i386 -Amachine=i386 -Di386 -D__i386 -D__i386__ -D__tune_i686__
-D__tune_pentiumpro__
options enabled: -fpeephole -ffunction-cse -fkeep-static-consts
-fpcc-struct-return -fgcse-lm -fgcse-sm -fsched-interblock -fsched-spec
-fbranch-count-reg -fcommon -fgnu-linker -fargument-alias -fident
-fmath-errno -ftrapping-math -m80387 -mhard-float -mno-soft-float
-mieee-fp -mfp-ret-in-387 -mcpu=pentiumpro -march=i386
*snip*
|
and with -march=athlon-xp added you get:
Code: |
*snip*
GNU C version 3.2.2 20030322 (Gentoo Linux 1.4 3.2.2-r2) (i686-pc-linux-gnu)
compiled by GNU C version 3.2.2 20030322 (Gentoo Linux 1.4 3.2.2-r2).
options passed: -lang-c -v -D__GNUC__=3 -D__GNUC_MINOR__=2
-D__GNUC_PATCHLEVEL__=2 -D__GXX_ABI_VERSION=102 -D__ELF__ -Dunix
-D__gnu_linux__ -Dlinux -D__ELF__ -D__unix__ -D__gnu_linux__ -D__linux__
-D__unix -D__linux -Asystem=posix -D__NO_INLINE__ -D__STDC_HOSTED__=1
-Acpu=i386 -Amachine=i386 -Di386 -D__i386 -D__i386__ -D__athlon
-D__athlon__ -D__athlon_sse__ -D__tune_athlon__ -D__tune_athlon_sse__
-D__SSE__ -D__MMX__ -D__3dNOW__ -D__3dNOW_A__ -march=athlon-xp
options enabled: -fpeephole -ffunction-cse -fkeep-static-consts
-fpcc-struct-return -fgcse-lm -fgcse-sm -fsched-interblock -fsched-spec
-fbranch-count-reg -fcommon -fgnu-linker -fargument-alias -fident
-fmath-errno -ftrapping-math -m80387 -mhard-float -mno-soft-float
-mieee-fp -mfp-ret-in-387 -mcpu=athlon-xp -march=athlon-xp
*snip*
|
Note the lack of any -mmmx, -m3dnow or -msse |
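To compare two such dumps without reading them by eye, the -m switches can be pulled out and sorted. A rough sketch, assuming the verbose output is piped in on stdin; list_m_options is a hypothetical helper name. It only filters what gcc prints, so it cannot tell you what the backend enables internally.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct -m options found in compiler
# output read on stdin, one per line, in a stable sorted order.
list_m_options() {
    tr -s ' ' '\n' | grep -- '^-m' | LC_ALL=C sort -u
}

# Usage sketch (-c is added here so gcc does not try to link an empty file;
# the verbose output goes to stderr, hence 2>&1):
#   : > empty.c
#   gcc -v -Q -c empty.c 2>&1 | list_m_options > base.txt
#   gcc -v -Q -c empty.c -march=athlon-xp 2>&1 | list_m_options > xp.txt
#   diff base.txt xp.txt
```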
|
|
|
wharper n00b
Joined: 25 Mar 2003 Posts: 2 Location: Salt Lake City
|
Posted: Tue Apr 01, 2003 3:16 am Post subject: |
|
|
Hmm, it seems that gcc sets the optimizations correctly within the compiler?
This is a snippet from: http://www.freehackers.org/gentoo/gccflags/faq.html
I have not looked at the source but this seems correct. This is also why you did not see the compiler flags on the command line when looking at the compiled binary.
In the gcc source, have a look at the file gcc-3.2/gcc/config/i386/i386.c Here's an excerpt :
Options implied by -march=
Code: | const processor_alias_table[] =
{
  {"i386", PROCESSOR_I386, 0},
  {"i486", PROCESSOR_I486, 0},
  {"i586", PROCESSOR_PENTIUM, 0},
  {"pentium", PROCESSOR_PENTIUM, 0},
  {"pentium-mmx", PROCESSOR_PENTIUM, PTA_MMX},
  {"i686", PROCESSOR_PENTIUMPRO, 0},
  {"pentiumpro", PROCESSOR_PENTIUMPRO, 0},
  {"pentium2", PROCESSOR_PENTIUMPRO, PTA_MMX},
  {"pentium3", PROCESSOR_PENTIUMPRO, PTA_MMX | PTA_SSE | PTA_PREFETCH_SSE},
  {"pentium4", PROCESSOR_PENTIUM4, PTA_SSE | PTA_SSE2 |
                                   PTA_MMX | PTA_PREFETCH_SSE},
  {"k6", PROCESSOR_K6, PTA_MMX},
  {"k6-2", PROCESSOR_K6, PTA_MMX | PTA_3DNOW},
  {"k6-3", PROCESSOR_K6, PTA_MMX | PTA_3DNOW},
  {"athlon", PROCESSOR_ATHLON, PTA_MMX | PTA_PREFETCH_SSE | PTA_3DNOW
                               | PTA_3DNOW_A},
  {"athlon-tbird", PROCESSOR_ATHLON, PTA_MMX | PTA_PREFETCH_SSE
                                     | PTA_3DNOW | PTA_3DNOW_A},
  {"athlon-4", PROCESSOR_ATHLON, PTA_MMX | PTA_PREFETCH_SSE | PTA_3DNOW
                                 | PTA_3DNOW_A | PTA_SSE},
  {"athlon-xp", PROCESSOR_ATHLON, PTA_MMX | PTA_PREFETCH_SSE | PTA_3DNOW
                                  | PTA_3DNOW_A | PTA_SSE},
  {"athlon-mp", PROCESSOR_ATHLON, PTA_MMX | PTA_PREFETCH_SSE | PTA_3DNOW
                                  | PTA_3DNOW_A | PTA_SSE},
}; |
It also seems that -O3 does all of the other nifty flags as well?
Code: | if (optimize >= 1)
  {
    flag_defer_pop = 1;
    flag_thread_jumps = 1;
#ifdef DELAY_SLOTS
    flag_delayed_branch = 1;
#endif
#ifdef CAN_DEBUG_WITHOUT_FP
    flag_omit_frame_pointer = 1;
#endif
    flag_guess_branch_prob = 1;
    flag_cprop_registers = 1;
  }
if (optimize >= 2)
  {
    flag_optimize_sibling_calls = 1;
    flag_cse_follow_jumps = 1;
    flag_cse_skip_blocks = 1;
    flag_gcse = 1;
    flag_expensive_optimizations = 1;
    flag_strength_reduce = 1;
    flag_rerun_cse_after_loop = 1;
    flag_rerun_loop_opt = 1;
    flag_caller_saves = 1;
    flag_force_mem = 1;
    flag_peephole2 = 1;
#ifdef INSN_SCHEDULING
    flag_schedule_insns = 1;
    flag_schedule_insns_after_reload = 1;
#endif
    flag_regmove = 1;
    flag_strict_aliasing = 1;
    flag_delete_null_pointer_checks = 1;
    flag_reorder_blocks = 1;
  }
if (optimize >= 3)
  {
    flag_inline_functions = 1;
    flag_rename_registers = 1;
  } |
my .02 |
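The table above can be read as a simple lookup from -march value to implied instruction-set features. A sketch that transcribes the quoted gcc-3.2 excerpt by hand, so it reflects that excerpt only and not any other GCC release; march_implies is a hypothetical name, and the PTA_PREFETCH_SSE and PTA_3DNOW_A bits are left out for brevity.

```shell
#!/bin/sh
# Hypothetical lookup, hand-copied from the processor_alias_table excerpt
# quoted above (gcc-3.2). Prints the features implied by a given -march.
march_implies() {
    case "$1" in
        i386|i486|i586|pentium|i686|pentiumpro) echo "" ;;
        pentium-mmx|pentium2|k6)                echo "mmx" ;;
        pentium3)                               echo "mmx sse" ;;
        pentium4)                               echo "mmx sse sse2" ;;
        k6-2|k6-3|athlon|athlon-tbird)          echo "mmx 3dnow" ;;
        athlon-4|athlon-xp|athlon-mp)           echo "mmx 3dnow sse" ;;
        *)                                      echo "unknown" ;;
    esac
}

# Usage:
#   march_implies athlon-xp
#   march_implies pentium4
```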
|
|
|
janderson n00b
Joined: 01 Apr 2003 Posts: 1
|
Posted: Tue Apr 01, 2003 6:12 am Post subject: |
|
|
jesterspet wrote: |
I used these flags and have had no problems* with GCC 3.2.2:
Code: | CFLAGS="-s -march=pentium4 -mmmx -msse -msse2 -Os -fomit-frame-pointer -pipe -fexpensive-optimizations -fpic -frerun-cse-after-loop -frerun-loop-opt -foptimize-register-move -masm=intel" |
|
Seems you've gone to some trouble to optimize for speed, but you may be incurring some not-so-good penalties by using -Os. I believe -Os causes certain data types, functions and perhaps some other things to be misaligned to make the binary smaller. The misalignment will mean that you take a significant performance hit when fetching misaligned data. Since you probably have a good chunk of memory and a fast CPU, you probably don't want to use -Os.
Cheers,
jon |
|