Author |
Message |
Lloeki Guru


Joined: 14 Jun 2006 Posts: 437 Location: France
|
Posted: Sun Sep 24, 2006 2:26 pm Post subject: |
|
|
concerning the family to choose, the same discussion as before should apply (that is, the ideal case is -march=conroe, which doesn't exist yet), though if you want a 64-bit kernel I can't tell you exactly what to do.
concerning frequency scaling:
this shows Conroe seems to support EIST (Enhanced Intel SpeedStep Technology); see the *note about availability further down the page, though.
p4 clockmod is a no-go, but you could try the enhanced-speedstep module. once loaded, you should have the usual
/sys/devices/system/cpu/cpu*/cpufreq/ available. _________________ Moved to using Arch Linux
Life is meant to be lived, not given up...
HOLY COW I'M TOTALLY GOING SO FAST OH F***  |
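A minimal sketch for checking what actually shows up under that cpufreq directory once a driver is loaded. `show_cpufreq` is a hypothetical helper name, and the optional root argument is only an assumption added so the function can be pointed at a test directory; `scaling_driver` and `scaling_governor` are the usual cpufreq sysfs files.

```shell
# Print the cpufreq driver and governor for each CPU found under a sysfs root.
# show_cpufreq is a hypothetical helper; the root defaults to the path quoted
# in the post above.
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    found=0
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # If the glob matched nothing, the literal pattern is not a directory.
        [ -d "$dir" ] || continue
        found=1
        cpu=$(basename "$(dirname "$dir")")
        driver=$(cat "$dir/scaling_driver" 2>/dev/null || echo unknown)
        governor=$(cat "$dir/scaling_governor" 2>/dev/null || echo unknown)
        echo "$cpu driver=$driver governor=$governor"
    done
    [ "$found" -eq 1 ] || echo "no cpufreq directories under $root (driver not loaded?)"
}

show_cpufreq
```

If nothing is listed, the frequency scaling driver is probably not loaded or not built into the kernel.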
|
|
 |
lcj Tux's lil' helper


Joined: 25 Apr 2004 Posts: 82 Location: Opole, Poland
|
Posted: Sun Sep 24, 2006 6:55 pm Post subject: |
|
|
Choose Intel Enhanced SpeedStep. Works fine on both cores, with either cpufreqd or the GNOME applet. _________________ --
Lukasz C. Jokiel via web |
|
|
 |
rhill Retired Dev


Joined: 22 Oct 2004 Posts: 1629 Location: sk.ca
|
Posted: Mon Sep 25, 2006 12:14 am Post subject: |
|
|
ok, i did one simple c++ benchmark using TraMP3d-v4. keep in mind it's just one benchmark.
the system used was a Toshiba Satellite A100 laptop with a Core Duo T2300 @ 1.66GHz (Yonah), 2MiB shared L2 cache, and 1GiB of memory. the GCC version used was 4.1-branch svn built yesterday.
-O2 -march=prescott -fomit-frame-pointer -pipe
Code: | dirtyepic@tycho ~/tmp $ /usr/bin/time /usr/bin/g++-4.1.2-pre20060923 -O2 -march=prescott -fomit-frame-pointer -pipe -Dleafify=flatten tramp3d-v4.cpp -o tramp3d-v4-prescott
95.45user 0.84system 1:35.69elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+202080minor)pagefaults 0swaps
dirtyepic@tycho ~/tmp $ ./tramp3d-v4-prescott -n 25 --cartvis 1.0 0.0 --rhomin 1e-8
Using
using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
solving eeq
time increments from [0, 1.79769e+308], cfl 0.5
starting at t = 0, i = 1
cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
face physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
periodic boundaries in X Y Z
i = 1 t = 0.00209225 dt = 0.00209225 (0.07124s/it)
i = 2 t = 0.00410537 dt = 0.00201312 (0.946142s/it)
i = 3 t = 0.00603889 dt = 0.00193352 (0.966466s/it)
i = 4 t = 0.00794139 dt = 0.00190251 (0.975241s/it)
i = 5 t = 0.00984636 dt = 0.00190497 (0.97465s/it)
i = 6 t = 0.0117508 dt = 0.00190449 (0.985882s/it)
i = 7 t = 0.013681 dt = 0.00193011 (1.0047s/it)
i = 8 t = 0.0156598 dt = 0.0019788 (1.00467s/it)
i = 9 t = 0.0176706 dt = 0.00201081 (1.00171s/it)
i = 10 t = 0.0197364 dt = 0.0020658 (1.0184s/it)
i = 11 t = 0.0218716 dt = 0.0021352 (1.01445s/it)
i = 12 t = 0.0240721 dt = 0.00220057 (1.00954s/it)
i = 13 t = 0.0263471 dt = 0.002275 (1.01139s/it)
i = 14 t = 0.0287159 dt = 0.00236875 (1.01714s/it)
i = 15 t = 0.0311533 dt = 0.00243738 (1.01269s/it)
i = 16 t = 0.0336768 dt = 0.0025235 (1.01118s/it)
i = 17 t = 0.0362863 dt = 0.00260952 (1.00748s/it)
i = 18 t = 0.0389715 dt = 0.00268521 (1.00433s/it)
i = 19 t = 0.0417381 dt = 0.00276665 (1.00053s/it)
i = 20 t = 0.0445873 dt = 0.00284919 (1.00177s/it)
i = 21 t = 0.0475216 dt = 0.0029343 (0.989871s/it)
i = 22 t = 0.0505258 dt = 0.00300413 (0.997915s/it)
i = 23 t = 0.0535938 dt = 0.00306807 (0.98717s/it)
i = 24 t = 0.0567043 dt = 0.0031105 (0.989589s/it)
i = 25 t = 0.0598233 dt = 0.00311892 (0.987146s/it)
Time spent in iteration: 23.9913
Correctness:
sum(rh) difference = 1.45519e-11
sum(vx) = -0.242582
sum(vy) = -0.295116
sum(vz) = -0.335474
sum(rh*T) difference = -297.099
dirtyepic@tycho ~/tmp $ analyze-x86 tramp3d-v4-prescott
Checking vendor_id string... GenuineIntel
Disassembling tramp3d-v4-prescott, please wait...
i486: 0 i586: 0 ppro: 130 mmx: 0 sse: 0 sse2: 0 sse3: 2
tramp3d-v4-prescott will run on Pentium IV (pentium4) w/ SSE3 or higher processor. |
-O2 -march=pentium-m -fomit-frame-pointer -pipe
Code: | dirtyepic@tycho ~/tmp $ /usr/bin/time /usr/bin/g++-4.1.2-pre20060923 -O2 -march=pentium-m -fomit-frame-pointer -pipe -Dleafify=flatten tramp3d-v4.cpp -o tramp3d-v4-pentiumm-plain
97.74user 0.74system 1:38.47elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (11major+200253minor)pagefaults 0swaps
dirtyepic@tycho ~/tmp $ ./tramp3d-v4-pentiumm-plain -n 25 --cartvis 1.0 0.0 --rhomin 1e-8
Using
using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
solving eeq
time increments from [0, 1.79769e+308], cfl 0.5
starting at t = 0, i = 1
cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
face physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
periodic boundaries in X Y Z
i = 1 t = 0.00209225 dt = 0.00209225 (0.0692961s/it)
i = 2 t = 0.00410537 dt = 0.00201312 (0.992859s/it)
i = 3 t = 0.00603889 dt = 0.00193352 (1.0033s/it)
i = 4 t = 0.00794139 dt = 0.00190251 (0.975363s/it)
i = 5 t = 0.00984636 dt = 0.00190497 (0.98926s/it)
i = 6 t = 0.0117508 dt = 0.00190449 (0.986304s/it)
i = 7 t = 0.013681 dt = 0.00193011 (0.997433s/it)
i = 8 t = 0.0156598 dt = 0.0019788 (0.99804s/it)
i = 9 t = 0.0176706 dt = 0.00201081 (1.00585s/it)
i = 10 t = 0.0197364 dt = 0.0020658 (1.00463s/it)
i = 11 t = 0.0218716 dt = 0.0021352 (1.01035s/it)
i = 12 t = 0.0240721 dt = 0.00220057 (1.00643s/it)
i = 13 t = 0.0263471 dt = 0.002275 (1.00908s/it)
i = 14 t = 0.0287159 dt = 0.00236875 (1.00359s/it)
i = 15 t = 0.0311533 dt = 0.00243738 (1.00683s/it)
i = 16 t = 0.0336768 dt = 0.0025235 (1.0018s/it)
i = 17 t = 0.0362863 dt = 0.00260952 (1.00395s/it)
i = 18 t = 0.0389715 dt = 0.00268521 (0.994894s/it)
i = 19 t = 0.0417381 dt = 0.00276665 (0.995252s/it)
i = 20 t = 0.0445873 dt = 0.00284919 (0.992024s/it)
i = 21 t = 0.0475216 dt = 0.0029343 (0.989914s/it)
i = 22 t = 0.0505258 dt = 0.00300413 (0.984155s/it)
i = 23 t = 0.0535938 dt = 0.00306807 (0.986609s/it)
i = 24 t = 0.0567043 dt = 0.0031105 (0.981239s/it)
i = 25 t = 0.0598233 dt = 0.00311892 (0.986686s/it)
Time spent in iteration: 23.9751
Correctness:
sum(rh) difference = 1.45519e-11
sum(vx) = -0.242582
sum(vy) = -0.295116
sum(vz) = -0.335474
sum(rh*T) difference = -297.099
dirtyepic@tycho ~/tmp $ analyze-x86 tramp3d-v4-pentiumm-plain
Checking vendor_id string... GenuineIntel
Disassembling tramp3d-v4-pentiumm-plain, please wait...
i486: 0 i586: 0 ppro: 135 mmx: 0 sse: 0 sse2: 4 sse3: 0
tramp3d-v4-pentiumm-plain will run on Pentium IV (pentium4) or higher processor. |
-O2 -march=pentium-m -msse3 -fomit-frame-pointer -pipe
Code: | dirtyepic@tycho ~/tmp $ /usr/bin/time /usr/bin/g++-4.1.2-pre20060923 -O2 -march=pentium-m -msse3 -fomit-frame-pointer -pipe -Dleafify=flatten tramp3d-v4.cpp -o tramp3d-v4-pentiumm
97.73user 1.01system 1:38.05elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+197280minor)pagefaults 0swaps
dirtyepic@tycho ~/tmp $ ./tramp3d-v4-pentiumm -n 25 --cartvis 1.0 0.0 --rhomin 1e-8
Using
using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
solving eeq
time increments from [0, 1.79769e+308], cfl 0.5
starting at t = 0, i = 1
cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
face physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
periodic boundaries in X Y Z
i = 1 t = 0.00209225 dt = 0.00209225 (0.069342s/it)
i = 2 t = 0.00410537 dt = 0.00201312 (0.968165s/it)
i = 3 t = 0.00603889 dt = 0.00193352 (0.985737s/it)
i = 4 t = 0.00794139 dt = 0.00190251 (0.999364s/it)
i = 5 t = 0.00984636 dt = 0.00190497 (1.01105s/it)
i = 6 t = 0.0117508 dt = 0.00190449 (1.01161s/it)
i = 7 t = 0.013681 dt = 0.00193011 (1.02449s/it)
i = 8 t = 0.0156598 dt = 0.0019788 (1.02412s/it)
i = 9 t = 0.0176706 dt = 0.00201081 (1.02851s/it)
i = 10 t = 0.0197364 dt = 0.0020658 (1.02592s/it)
i = 11 t = 0.0218716 dt = 0.0021352 (1.03424s/it)
i = 12 t = 0.0240721 dt = 0.00220057 (1.0353s/it)
i = 13 t = 0.0263471 dt = 0.002275 (1.03373s/it)
i = 14 t = 0.0287159 dt = 0.00236875 (1.03266s/it)
i = 15 t = 0.0311533 dt = 0.00243738 (1.03526s/it)
i = 16 t = 0.0336768 dt = 0.0025235 (1.02011s/it)
i = 17 t = 0.0362863 dt = 0.00260952 (1.0232s/it)
i = 18 t = 0.0389715 dt = 0.00268521 (1.02476s/it)
i = 19 t = 0.0417381 dt = 0.00276665 (1.0153s/it)
i = 20 t = 0.0445873 dt = 0.00284919 (1.00431s/it)
i = 21 t = 0.0475216 dt = 0.0029343 (1.00313s/it)
i = 22 t = 0.0505258 dt = 0.00300413 (0.989761s/it)
i = 23 t = 0.0535938 dt = 0.00306807 (0.99909s/it)
i = 24 t = 0.0567043 dt = 0.0031105 (0.989536s/it)
i = 25 t = 0.0598233 dt = 0.00311892 (0.996134s/it)
Time spent in iteration: 24.3848
Correctness:
sum(rh) difference = 1.45519e-11
sum(vx) = -0.242582
sum(vy) = -0.295116
sum(vz) = -0.335474
sum(rh*T) difference = -297.099
dirtyepic@tycho ~/tmp $ analyze-x86 tramp3d-v4-pentiumm
Checking vendor_id string... GenuineIntel
Disassembling tramp3d-v4-pentiumm, please wait...
i486: 0 i586: 0 ppro: 135 mmx: 0 sse: 0 sse2: 0 sse3: 2
tramp3d-v4-pentiumm will run on Pentium IV (pentium4) w/ SSE3 or higher processor. |
-O2 -march=pentium-m -msse3 -mfpmath=sse -fomit-frame-pointer -pipe
Code: | dirtyepic@tycho ~/tmp $ /usr/bin/time /usr/bin/g++-4.1.2-pre20060923 -O2 -march=pentium-m -msse3 -mfpmath=sse -fomit-frame-pointer -pipe -Dleafify=flatten tramp3d-v4.cpp -o tramp3d-v4-pentiumm-sse
98.40user 0.94system 1:39.15elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (3major+198438minor)pagefaults 0swaps
dirtyepic@tycho ~/tmp $ ./tramp3d-v4-pentiumm-sse -n 25 --cartvis 1.0 0.0 --rhomin 1e-8
Using
using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
solving eeq
time increments from [0, 1.79769e+308], cfl 0.5
starting at t = 0, i = 1
cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
face physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
periodic boundaries in X Y Z
i = 1 t = 0.00209225 dt = 0.00209225 (0.0617449s/it)
i = 2 t = 0.00410537 dt = 0.00201312 (0.897831s/it)
i = 3 t = 0.00603889 dt = 0.00193352 (0.964484s/it)
i = 4 t = 0.00794139 dt = 0.00190251 (0.94189s/it)
i = 5 t = 0.00984636 dt = 0.00190497 (0.972172s/it)
i = 6 t = 0.0117508 dt = 0.00190449 (0.973818s/it)
i = 7 t = 0.013681 dt = 0.00193011 (0.984364s/it)
i = 8 t = 0.0156598 dt = 0.0019788 (0.988743s/it)
i = 9 t = 0.0176706 dt = 0.00201081 (0.996885s/it)
i = 10 t = 0.0197364 dt = 0.0020658 (0.997118s/it)
i = 11 t = 0.0218716 dt = 0.0021352 (1.00016s/it)
i = 12 t = 0.0240721 dt = 0.00220057 (0.99685s/it)
i = 13 t = 0.0263471 dt = 0.002275 (0.998231s/it)
i = 14 t = 0.0287159 dt = 0.00236875 (1.00025s/it)
i = 15 t = 0.0311533 dt = 0.00243738 (0.987068s/it)
i = 16 t = 0.0336768 dt = 0.0025235 (0.981898s/it)
i = 17 t = 0.0362863 dt = 0.00260952 (0.990963s/it)
i = 18 t = 0.0389715 dt = 0.00268521 (0.986071s/it)
i = 19 t = 0.0417381 dt = 0.00276665 (0.980461s/it)
i = 20 t = 0.0445873 dt = 0.00284919 (0.982345s/it)
i = 21 t = 0.0475216 dt = 0.0029343 (1.00055s/it)
i = 22 t = 0.0505258 dt = 0.00300413 (0.995297s/it)
i = 23 t = 0.0535938 dt = 0.00306807 (1.00189s/it)
i = 24 t = 0.0567043 dt = 0.0031105 (1.00527s/it)
i = 25 t = 0.0598233 dt = 0.00311892 (1.01299s/it)
Time spent in iteration: 23.6994
Correctness:
sum(rh) difference = 1.28966e-08
sum(vx) = -0.242582
sum(vy) = -0.295116
sum(vz) = -0.335474
sum(rh*T) difference = -297.099
dirtyepic@tycho ~/tmp $ analyze-x86 tramp3d-v4-pentiumm-sse
Checking vendor_id string... GenuineIntel
Disassembling tramp3d-v4-pentiumm-sse, please wait...
i486: 0 i586: 0 ppro: 84 mmx: 44 sse: 0 sse2: 3089 sse3: 0
tramp3d-v4-pentiumm-sse will run on Pentium IV (pentium4) or higher processor. |
Keep in mind that anything that uses strip-flags (e.g. GCC, glibc, the kernel) will remove both -msse3 and -mfpmath from your C[XX]FLAGS.
Very little difference in runtimes, maybe half a second, and next to no difference in compile time. Surprisingly,
-O2 -march=pentium-m -msse3 -fomit-frame-pointer -pipe was the slowest. I reran the test to be sure and it was slightly worse (24.5397s) than the original run.
It also appears -mfpmath=sse does not generate sse3 instructions. _________________ by design, by neglect
for a fact or just for effect |
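To put the four "Time spent in iteration" figures from the runs above side by side, here is a small sketch that expresses each one as a percentage of the -march=prescott run. `rel_diff` is a hypothetical helper name; the numbers are copied from the benchmark output in this post.

```shell
# Percentage difference of a run time against a baseline run time.
# Negative means faster than the baseline. rel_diff is a hypothetical helper.
rel_diff() {
    LC_ALL=C awk -v base="$1" -v t="$2" \
        'BEGIN { printf "%.2f\n", (t - base) / base * 100 }'
}

base=23.9913   # -march=prescott
for entry in "pentium-m:23.9751" \
             "pentium-m+sse3:24.3848" \
             "pentium-m+sse3+fpmath=sse:23.6994"; do
    name=${entry%%:*}    # text before the first colon
    secs=${entry##*:}    # text after the last colon
    echo "$name: $(rel_diff "$base" "$secs")% vs prescott"
done
```

All three variants land within about 2% of the prescott run, which matches the "very little difference" reading.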
|
|
 |
Lloeki Guru


Joined: 14 Jun 2006 Posts: 437 Location: France
|
Posted: Mon Sep 25, 2006 2:26 pm Post subject: |
|
|
hmm, some data
as you mentioned before, there is some interest for Core Duo (yonah) to add -mfpmath=sse, as -march=pentium-m alone will by default favor x87 for performance:
Quote: | -march=pentium-m prefers x87 over sse scalar code, because pentium-m can
decode sse at only half the rate of x87. You should see the speed
advantage clearly on pentium-m, presumably not on Core Duo. |
so using -mfpmath=sse takes advantage of a full-rate sse on Core Duo, which is observed.
Quote: | It also appears -mfpmath=sse does not generate sse3 instructions. |
this one is really interesting, too.
I have received my merom, but unfortunately at work a project is nearing completion (or should I say its deadline), so I don't even know if I'll have time to set it up, let alone benchmark it. _________________ Moved to using Arch Linux
Life is meant to be lived, not given up...
HOLY COW I'M TOTALLY GOING SO FAST OH F***  |
|
|
 |
wu-s n00b

Joined: 22 May 2006 Posts: 41
|
Posted: Mon Oct 02, 2006 5:16 pm Post subject: |
|
|
Lloeki wrote: | as a side note, mtune is redundant, as march implies enhancements of mtune, plus specifics. I don't know what precedence gcc gives to each one, but it may as well disable your glorious march optimisations, in favor of safer mtune ones. |
Maybe just to stress that point. I think one has to distinguish between those two options. Carefully read the third paragraph for the "generic" cpu-type:
The gcc-manpage tells us:
Quote: | Intel 386 and AMD x86-64 Options
These -m options are defined for the i386 and x86-64 family of computers:
-mtune=cpu-type
Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for cpu-type are:
generic
Produce code optimized for the most common IA32/AMD64/EM64T processors. If you know the CPU on which your code will run, then you should use the corresponding -mtune option instead of -mtune=generic. But, if you do not know exactly what CPU users of your application will have, then you should use this option.
As new processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, the code generated option will change to reflect the processors that were most common when that version of GCC was released.
There is no -march=generic option because -march indicates the instruction set the compiler can use, and there is no generic instruction set applicable to all processors. In contrast, -mtune indicates the processor (or, in this case, collection of processors) for which the code is optimized.
...
pentium-m
Low power version of Intel Pentium3 CPU with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks.
pentium4, pentium4m
Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support.
prescott
Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction set support.
nocona
Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support.
...
-march=cpu-type
Generate instructions for the machine type cpu-type. The choices for cpu-type are the same as for -mtune. Moreover, specifying -march=cpu-type implies -mtune=cpu-type.
|
So here is my interpretation of the options' semantics: "-march" selects the instructions available to the compiler, e.g. gcc _can_ use SSE3 instructions if you select nocona. In contrast, "-mtune" optimizes the generated code for the given cpu-type, using only the instructions allowed by "-march".
It's trivial that "-march" implies "-mtune" if not otherwise stated. _However_, you can optimize for a different cpu-type within the instruction set allowed by "-march", can't you?
So
Quote: | CFLAGS="-march=nocona -mtune=pentium-m -O2 -pipe" |
as stated in http://gentoo-wiki.com/Safe_Cflags#Intel_Core_2_Solo.2FDuo_.28Allendale.2C_Conroe.2C_Merom.29 seems to be promising, right?
-dirtyepic: Maybe you can run some benchmarks with that setting?
Cheers,
Sven |
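The rule from the manpage excerpt (an explicit -mtune wins, otherwise -march implies the same -mtune) can be sketched as a toy shell function. `effective_tuning` is a hypothetical helper that only parses a flag string; it does not ask GCC anything, so treat it as an illustration of the rule rather than of GCC's internals.

```shell
# Report which -march and -mtune values a CFLAGS string would end up with,
# following the manpage rule that -march=X implies -mtune=X unless an
# explicit -mtune is given. effective_tuning is a hypothetical helper.
effective_tuning() {
    march=""
    mtune=""
    for flag in $1; do   # intentionally unquoted: split the string into flags
        case "$flag" in
            -march=*) march=${flag#-march=} ;;
            -mtune=*) mtune=${flag#-mtune=} ;;
        esac
    done
    # No explicit -mtune: fall back to the -march value.
    echo "march=${march:-unset} mtune=${mtune:-${march:-unset}}"
}

effective_tuning "-march=nocona -mtune=pentium-m -O2 -pipe"
effective_tuning "-march=prescott -O2 -pipe"
```

The first call reports nocona instructions with pentium-m tuning, the second reports prescott for both.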
|
|
 |
magoscuro n00b

Joined: 18 Mar 2005 Posts: 32 Location: Santiago de Chile
|
Posted: Mon Oct 02, 2006 6:59 pm Post subject: |
|
|
any benchmark? |
|
|
 |
wu-s n00b

Joined: 22 May 2006 Posts: 41
|
Posted: Mon Oct 02, 2006 11:39 pm Post subject: |
|
|
The benchmarks by dirtyepic couldn't disprove the common thesis that CFLAGS are not that crucial for overall system performance. I will receive my Conroe box at the end of the week. Compared to my current Athlon Thunderbird 1.3GHz, the performance increase will be amazing whatever gcc options are set.
A much more fundamental decision is between Gentoo/x86 and Gentoo/AMD64, which has also been addressed in this thread. Maybe http://www.linuxhardware.org/article.pl?sid=06/08/22/0415251 is a first source of information. It's a comparison between "-march=pentium-m -msse3 -O2 -pipe" on 32-bit Linux and "-march=nocona -O2 -pipe" on 64-bit.
To sum it up, Conroe performs pretty well under 64-bit Linux. I think you should give it a try. The issues with 32-bit browser plugins and video codecs seem to be manageable.
wu |
|
|
 |
amattas n00b

Joined: 25 Sep 2006 Posts: 25 Location: Kalamazoo, MI
|
Posted: Tue Oct 03, 2006 6:17 pm Post subject: |
|
|
I agree with the previous poster, it does run nicely under 64 bit mode. Now the 965G motherboard on the other hand is a treat in itself to get running. It took much patching of the kernel, and ~AMD64 drivers to get all the hardware working |
|
|
 |
rmh3093 Advocate


Joined: 06 Aug 2003 Posts: 2138 Location: Albany, NY
|
Posted: Wed Oct 04, 2006 12:51 am Post subject: |
|
|
xentric wrote: | I have the E6300 Core2 Duo (Allendale) in my system.
What's best to be used as "Processor Family" when configuring my kernel, Pentium-M or Pentium-4?
And does this processor support "CPU frequency scaling" with Intel Enhanced Speedstep or Intel Pentium-4 clock modulation? |
enhanced speedstep is the best choice, but ACPI P-states would also work; not pentium-4 clock modulation... p4 clock mod changes frequency only, whereas enhanced speedstep lowers the voltage along with the frequency, resulting in better power saving _________________ Do not meddle in the affairs of wizards, for they are subtle and quick to anger. |
|
|
 |
irondog l33t


Joined: 07 Jul 2003 Posts: 715 Location: Voor mijn TV. Achter mijn pc.
|
Posted: Wed Oct 25, 2006 2:09 pm Post subject: |
|
|
I also did some benches of my CORE 2 Duo machine (Gentoo x86).
I did this in a tmpfs:
rm -f rand.gz && time gzip -c9 rand > rand.gz
-O2 -march=i686 -fomit-frame-pointer -pipe
Code: |
real 0m22.760s
user 0m22.311s
sys 0m0.443s
|
-O2 -march=nocona -fomit-frame-pointer -pipe
Code: |
real 0m26.353s
user 0m25.703s
sys 0m0.611s
|
-O2 -march=pentium-m -fomit-frame-pointer -pipe
Code: |
real 0m22.796s
user 0m22.332s
sys 0m0.459s
|
-O2 -march=athlon-xp -fomit-frame-pointer -pipe
Code: |
real 0m22.676s
user 0m22.205s
sys 0m0.473s
|
The only relevant thing to say is, you definitely don't want to use nocona on Core 2 Duo Gentoo x86. Besides that, I think it will hurt users on x86_64 also.
I discovered "by accident" that athlon-xp is the fastest (OK, the difference is about "nothing"). I moved from athlon-xp to Core 2 Duo and I'm compiling for two boxes on my Core 2 Duo system. I'll keep optimizing for my older computer (= athlon-xp) and for the processor I sold recently. So, I'm not a fool playing around with -march=athlon-xp on an Intel system.
For the ricers interested (I don't know if it's safe):
-O3 -march=athlon-xp -fomit-frame-pointer -pipe
Code: |
real 0m20.949s
user 0m20.453s
sys 0m0.489s
|
-mfpmath=sse and -msse3 don't seem to influence the results very much for me. Maybe because of the -march=athlon-xp. _________________ Alle dingen moeten onzin zijn. |
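For anyone wanting to repeat the gzip test more than once per set of flags, here is a rough sketch. `make_rand` and `bench_gzip` are hypothetical helper names, and the file size is up to you, since the post does not say how large "rand" was. Note that this times the installed gzip binary, so gzip itself has to be rebuilt with each set of CFLAGS between runs.

```shell
# make_rand and bench_gzip are hypothetical helpers, not part of any package.

# Fill a file with the given number of random bytes, which gzip can barely
# compress, so each run does a comparable amount of work.
make_rand() {
    head -c "$2" /dev/urandom > "$1"
}

# Compress the file N times (default 3), discarding the output, and print the
# wall-clock seconds of each run. date +%s only has one-second resolution, so
# use a file large enough that a run takes several seconds.
bench_gzip() {
    file=$1
    runs=${2:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        gzip -c9 "$file" > /dev/null
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}
```

For example, `make_rand /tmp/rand $((256 * 1024 * 1024))` followed by `bench_gzip /tmp/rand 5` gives five timings to compare instead of one.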
|
|
 |
rhill Retired Dev


Joined: 22 Oct 2004 Posts: 1629 Location: sk.ca
|
Posted: Sun Dec 03, 2006 6:13 am Post subject: |
|
|
I spoke to someone at Intel who works on GCC and he confirmed that for Core Solo/Duo, -march=prescott is the correct microarchitecture. Core 2 Solo/Duo (and, if you're lucky enough to have one, the quad-core Core 2 Extreme QX6700) should use -march=nocona with GCC 4.1.
With GCC 4.2 you can use the new -march=core2, which also enables the new -mssse3 (say that three times fast) instruction set.
Lloeki: i was wrong about -mfpmath=sse not generating sse3 instructions. it just didn't with that particular code, which is weird, but so is GCC sometimes.
irondog: you can't make general claims like that based on one benchmark, especially an I/O-based one like gzip. and SSE won't affect it because you're not doing anything that requires floating-point calculations. _________________ by design, by neglect
for a fact or just for effect |
|
|
 |
ECantona n00b

Joined: 26 Apr 2005 Posts: 65
|
Posted: Sun Dec 03, 2006 12:14 pm Post subject: |
|
|
dirtyepic: what about mobile processors like core 2 duo merom? |
|
|
 |
Lloeki Guru


Joined: 14 Jun 2006 Posts: 437 Location: France
|
Posted: Thu Dec 07, 2006 2:58 pm Post subject: |
|
|
dirtyepic,
indeed, that's great and interesting news and so, I was wrong...
I'll put prescott for now on my gf's core duo and my own core 2 duo (remember, i'm going 32bits for now ). but won't go as far as rebuilding world (no ricer mode here).
I guess we'll now have to wait for gcc 4.2 for some time, since I'm running stable...
anyway, thanks a lot for the research  _________________ Moved to using Arch Linux
Life is meant to be lived, not given up...
HOLY COW I'M TOTALLY GOING SO FAST OH F***  |
|
|
 |
rhill Retired Dev


Joined: 22 Oct 2004 Posts: 1629 Location: sk.ca
|
Posted: Sun Dec 10, 2006 11:56 pm Post subject: |
|
|
oops, make that GCC 4.3.
ECantona: yeah, this is for all Core CPUs from Yonah to Merom to Kentsfield. _________________ by design, by neglect
for a fact or just for effect |
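The advice in this thread boils down to a small lookup: prescott for Core Solo/Duo, nocona for Core 2 until the compiler knows -march=core2 (GCC 4.3, per the correction above). A sketch of that lookup follows; `pick_march` is a hypothetical helper that only encodes what was said here and is not an official GCC or Gentoo table.

```shell
# Suggest a -march value from the CPU family and the GCC version, following
# the advice given in this thread. pick_march is a hypothetical helper.
# usage: pick_march <core|core2> <gcc version, e.g. 4.1.2>
pick_march() {
    cpu=$1
    major=${2%%.*}      # "4.1.2" -> "4"
    rest=${2#*.}        # "4.1.2" -> "1.2"
    minor=${rest%%.*}   # "1.2"   -> "1"
    case "$cpu" in
        core)
            echo prescott
            ;;
        core2)
            if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 3 ]; }; then
                echo core2
            else
                echo nocona
            fi
            ;;
        *)
            echo "unknown CPU family: $cpu" >&2
            return 1
            ;;
    esac
}

pick_march core2 4.1.2
pick_march core2 4.3.0
```

With the two calls shown, the first prints nocona and the second prints core2.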
|
|
 |
alphamaennchen n00b


Joined: 05 Sep 2005 Posts: 40
|
Posted: Wed Dec 13, 2006 11:18 am Post subject: T7200 |
|
|
I have a T7200 (Core 2 Duo 2.0 GHz, mobile version).
It is built into an Asus A8jp notebook, together with an ATI X1700.
Here is my make.conf:
# These settings were set by the catalyst build script that automatically built this stage
# Please consult /etc/make.conf.example for a more detailed example
CHOST="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CXXFLAGS="${CFLAGS}"
#
USE="aac acpi alsa apache2 arts avi beagle bzlib cdr dbus dmix directfb dvdcss dvd
dvdread encode fam firefox ffmpeg fortran gif gpm gtk gtk2 hal jpeg kde
linguas_de mad math motif mmx mp3 mpeg mysql nls ntpl ntplonly nsplugin
ogg openal opengl oss pcre pdf pdflib php pnf png pstricks qt qt3 rtsp samba sdl slang
sockets sse sse2 sqlite threads truetype udev unicode usb userlocales utf8 vcd vhosts
win32codecs xine xml xscreensaver xv xvid zlib X
-esd -fPIC"
#
ACCEPT_KEYWORDS="amd64"
MAKEOPTS="-j3"
GENTOO_MIRRORS="ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo "
#
LINGUAS="de en"
FEATURES="parallel-fetch"
#
ALSA_CARDS="hda-intel"
#
VIDEO_CARDS="fglrx"
INPUT_DEVICES="keyboard synaptics mouse"
#
PORTDIR_OVERLAY="/usr/local/portage"
source /usr/portage/local/layman/make.conf
Everything is fine!
And: ati-drivers-8.29.6 works fine for me, later versions don't!
No compile errors, speed is good... only some problems with snd_hda, but that is another topic... _________________ linux is a wigwam: no windows, no gates, apache inside!
Desktop: AMD64 3400+, GeForce 7800GS, Gentoo
Notebook (Asus A8jp): Core 2 Duo 2,0 (T7200), ATI X1700, Kubuntu
PDA: Zaurus SL-6000
linux is user friendly! however, it is not idiot friendly.... |
|
|
 |
Dirk.R.Gently Guru


Joined: 29 Jan 2007 Posts: 546 Location: Titan
|
Posted: Wed Mar 07, 2007 8:22 am Post subject: |
|
|
Any luck resolving this? The MacBook Wiki actually recommends nocona for 32-bit. _________________ • Helpful Linux Tidbits |
|
|
 |
llavalle n00b

Joined: 28 Nov 2004 Posts: 38 Location: Montréal, Quebec, Canada
|
|
|
 |
alphamaennchen n00b


Joined: 05 Sep 2005 Posts: 40
|
Posted: Wed Apr 11, 2007 6:57 pm Post subject: I have Linux on my Merom Laptop |
|
|
And it works flawlessly.
Let me know if you need make.conf or else... _________________ linux is a wigwam: no windows, no gates, apache inside!
Desktop: AMD64 3400+, GeForce 7800GS, Gentoo
Notebook (Asus A8jp): Core 2 Duo 2,0 (T7200), ATI X1700, Kubuntu
PDA: Zaurus SL-6000
linux is user friendly! however, it is not idiot friendly.... |
|
|
 |
crisandbea Veteran

Joined: 03 Jul 2005 Posts: 1778 Location: BOSCO (SA) ... ma domiciliato a Bologna....
|
Posted: Fri May 25, 2007 10:21 am Post subject: |
|
|
I have just bought a Dell D620 notebook with a Core 2 Duo T7200 (Merom).
I'd like to ask for some advice:
1) should I use the x86 or the amd64 minimal CD?
2) which CFLAGS should I set?
thanks |
|
|
 |
michel7 Guru

Joined: 04 May 2006 Posts: 461 Location: localhost
|
Posted: Fri May 25, 2007 10:47 am Post subject: |
|
|
crisandbea wrote: | I have just bought a Dell D620 notebook with a Core 2 Duo T7200 (Merom).
I'd like to ask for some advice:
1) should I use the x86 or the amd64 minimal CD?
2) which CFLAGS should I set?
thanks |
1) I would suggest x86 because it's safer
2) my CFLAGS on my T7200 (Merom) are: CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"
and my system is very stable, with no compilation issues or other complaints ... _________________ Software is like sex. It's better when it's free |
|
|
 |
progman32 n00b

Joined: 16 Aug 2006 Posts: 50
|
Posted: Mon Jul 02, 2007 9:36 pm Post subject: |
|
|
Here is an article on LinuxHardware.org showing some data (benchmarks). It's really a comparison between different CPUs, but it has some hard data on 64 vs 32 bit for those wondering about the performance differences, and, most importantly, the GCC flags that were used in said benchmarks.
Notice, however, that they didn't use -msse3 or -mfpmath=sse. Would be interesting to know what difference it makes.
Also, anyone know what CPU setting to use in the kernel, as asked above?
I'm waiting for my new core 2 duo, once I have Gentoo installed I will post some benchmarks. Anyone know a good way of benchmarking kernel performance in various tasks? |
|
|
 |
lodewj n00b

Joined: 03 Aug 2004 Posts: 12
|
Posted: Wed Aug 22, 2007 2:10 pm Post subject: |
|
|
I just wanted to try gcc-4.3.0_alpha20070817 on my test server.
Intel Pentium Dual-Core E2160 (a Conroe with 1 MB of L2 cache).
Code: |
Portage 2.1.2.12 (default-linux/amd64/2007.0/server, gcc-4.1.2, glibc-2.5-r4, 2.6.22-gentoo-r2 x86_64)
=================================================================
System uname: 2.6.22-gentoo-r2 x86_64 Genuine Intel(R) CPU 2160 @ 1.80GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Wed, 22 Aug 2007 11:50:01 +0000
dev-lang/python: 2.4.4-r4
dev-python/pycrypto: 2.0.1-r6
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.61
sys-devel/automake: 1.7.9-r1, 1.9.6-r2, 1.10
sys-devel/binutils: 2.17
sys-devel/gcc-config: 1.3.16
sys-devel/libtool: 1.5.24
virtual/os-headers: 2.6.21
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -msse3 -mfpmath=sse -O2 -fomit-frame-pointer -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-march=nocona -msse3 -mfpmath=sse -O2 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp.belnet.be/linux/gentoo ftp.snt.utwente.nl/pub/linux/gentoo"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="/ 3dnow 3dnowext acl acpi amd64 apache2 berkdb bitmap-fonts bzip2 cgi cli cracklib crypt dri fortran gdbm glibc-omitfp gpm hash iconv ipv6 isdnlog kerberos midi mmx mmxext mudflap ncurses nls nptl nptlonly openmp openntpd pam pcre perl php posix postgres pppd python readline reflection samba session spl sse sse2 ssl tcpd truetype truetype-fonts type1-fonts unicode xml xorg zip zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64 mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
|
but compiling gcc fails
these are the last lines:
Code: |
/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h: In static member function 'static $
/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:141: error: 'EOF' was not declared $
/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h: In static member function 'static $
/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:293: error: 'EOF' was not declared $
make[4]: *** [codecvt.lo] Error 1
make[4]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/src'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3'
make[1]: *** [all-target-libstdc++-v3] Error 2
make[1]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build'
make: *** [profiledbootstrap] Error 2
!!! ERROR: sys-devel/gcc-4.3.0_alpha20070817 failed.
Call stack:
ebuild.sh, line 1638: Called dyn_compile
ebuild.sh, line 985: Called qa_call 'src_compile'
ebuild.sh, line 44: Called src_compile
ebuild.sh, line 1328: Called toolchain_src_compile
toolchain.eclass, line 26: Called gcc_src_compile
toolchain.eclass, line 1546: Called gcc_do_make
toolchain.eclass, line 1420: Called die
!!! emake failed with profiledbootstrap
!!! If you need support, post the topmost build error, and the call stack if relevant.
!!! A complete build log is located at '/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/temp/build.log'.
|
you can find the full output of build.log here.
any ideas? |
|
|
 |