Gentoo Forums
Core 2 Duo - Merom
Lloeki
Guru

Joined: 14 Jun 2006
Posts: 437
Location: France

Posted: Sun Sep 24, 2006 2:26 pm

concerning the family to choose, the same discussion as before should apply (that is, the ideal case is -march=conroe, which doesn't exist yet), though if you want a 64-bit kernel I can't tell you exactly what to do.

concerning frequency scaling:
this shows Conroe seems to support EIST (Enhanced Intel SpeedStep Technology), see *note about availability down the page though.
p4 clockmod is a no-go, but you could try the enhanced-speedstep module. once loaded you should have the usual
/sys/devices/system/cpu/cpu*/cpufreq/ available.
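
To check that the cpufreq directory mentioned above is really populated, a small script can walk it and print what each core reports. This is only a sketch: the function name `read_cpufreq` and the `root` parameter are made up for illustration, and it assumes the usual sysfs file names (`scaling_driver`, `scaling_governor`, `scaling_cur_freq`), silently skipping any that are missing.

```python
from pathlib import Path


def read_cpufreq(root="/sys/devices/system/cpu"):
    """Return {cpu_name: {file_name: value}} for every CPU exposing cpufreq.

    `root` is a parameter only so the function can be pointed at a test tree.
    """
    wanted = ("scaling_driver", "scaling_governor", "scaling_cur_freq")
    result = {}
    for cpufreq_dir in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        info = {}
        for name in wanted:
            entry = cpufreq_dir / name
            if entry.is_file():
                # sysfs values end with a newline; strip it
                info[name] = entry.read_text().strip()
        result[cpufreq_dir.parent.name] = info
    return result
```

Calling `read_cpufreq()` with no arguments reads the live system; an empty dictionary suggests that no frequency scaling driver is loaded.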
_________________
Moved to using Arch Linux
Life is meant to be lived, not given up...
HOLY COW I'M TOTALLY GOING SO FAST OH F*** ;)
lcj
Tux's lil' helper

Joined: 25 Apr 2004
Posts: 82
Location: Opole, Poland

Posted: Sun Sep 24, 2006 6:55 pm

Choose Intel Enhanced SpeedStep. Works fine on both cores, with either cpufreqd or the GNOME applet.
_________________
--
Lukasz C. Jokiel via web
rhill
Retired Dev

Joined: 22 Oct 2004
Posts: 1629
Location: sk.ca

Posted: Mon Sep 25, 2006 12:14 am

ok, i did one simple c++ benchmark using TraMP3d-v4. keep in mind it's just one benchmark.

the system used was a Toshiba Satellite A100 laptop with a Core Duo T2300 @ 1.66GHz (Yonah), 2MiB shared L2 cache, and 1GiB of memory. the GCC version used was 4.1-branch svn built yesterday.

-O2 -march=prescott -fomit-frame-pointer -pipe
Code:
dirtyepic@tycho ~/tmp $ /usr/bin/time /usr/bin/g++-4.1.2-pre20060923 -O2 -march=prescott -fomit-frame-pointer -pipe -Dleafify=flatten tramp3d-v4.cpp  -o tramp3d-v4-prescott
95.45user 0.84system 1:35.69elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+202080minor)pagefaults 0swaps

dirtyepic@tycho ~/tmp $ ./tramp3d-v4-prescott -n 25 --cartvis 1.0 0.0 --rhomin 1e-8
Using
  using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
  solving eeq
  time increments from [0, 1.79769e+308], cfl 0.5
  starting at t = 0, i = 1
  cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  face  physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  periodic boundaries in X Y Z
i = 1    t = 0.00209225  dt = 0.00209225 (0.07124s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.946142s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.966466s/it)
i = 4    t = 0.00794139  dt = 0.00190251 (0.975241s/it)
i = 5    t = 0.00984636  dt = 0.00190497 (0.97465s/it)
i = 6    t = 0.0117508   dt = 0.00190449 (0.985882s/it)
i = 7    t = 0.013681    dt = 0.00193011 (1.0047s/it)
i = 8    t = 0.0156598   dt = 0.0019788 (1.00467s/it)
i = 9    t = 0.0176706   dt = 0.00201081 (1.00171s/it)
i = 10   t = 0.0197364   dt = 0.0020658 (1.0184s/it)
i = 11   t = 0.0218716   dt = 0.0021352 (1.01445s/it)
i = 12   t = 0.0240721   dt = 0.00220057 (1.00954s/it)
i = 13   t = 0.0263471   dt = 0.002275 (1.01139s/it)
i = 14   t = 0.0287159   dt = 0.00236875 (1.01714s/it)
i = 15   t = 0.0311533   dt = 0.00243738 (1.01269s/it)
i = 16   t = 0.0336768   dt = 0.0025235 (1.01118s/it)
i = 17   t = 0.0362863   dt = 0.00260952 (1.00748s/it)
i = 18   t = 0.0389715   dt = 0.00268521 (1.00433s/it)
i = 19   t = 0.0417381   dt = 0.00276665 (1.00053s/it)
i = 20   t = 0.0445873   dt = 0.00284919 (1.00177s/it)
i = 21   t = 0.0475216   dt = 0.0029343 (0.989871s/it)
i = 22   t = 0.0505258   dt = 0.00300413 (0.997915s/it)
i = 23   t = 0.0535938   dt = 0.00306807 (0.98717s/it)
i = 24   t = 0.0567043   dt = 0.0031105 (0.989589s/it)
i = 25   t = 0.0598233   dt = 0.00311892 (0.987146s/it)
Time spent in iteration: 23.9913
Correctness:
        sum(rh) difference = 1.45519e-11
        sum(vx) = -0.242582
        sum(vy) = -0.295116
        sum(vz) = -0.335474
        sum(rh*T) difference = -297.099

dirtyepic@tycho ~/tmp $ analyze-x86 tramp3d-v4-prescott
Checking vendor_id string... GenuineIntel
Disassembling tramp3d-v4-prescott, please wait...
i486:    0 i586:    0 ppro:  130 mmx:    0 sse:    0 sse2:    0 sse3:    2
tramp3d-v4-prescott will run on Pentium IV (pentium4) w/ SSE3 or higher processor.


-O2 -march=pentium-m -fomit-frame-pointer -pipe
Code:
dirtyepic@tycho ~/tmp $ /usr/bin/time /usr/bin/g++-4.1.2-pre20060923 -O2 -march=pentium-m -fomit-frame-pointer -pipe -Dleafify=flatten tramp3d-v4.cpp  -o tramp3d-v4-pentiumm-plain
97.74user 0.74system 1:38.47elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (11major+200253minor)pagefaults 0swaps

dirtyepic@tycho ~/tmp $ ./tramp3d-v4-pentiumm-plain -n 25 --cartvis 1.0 0.0 --rhomin 1e-8
Using
  using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
  solving eeq
  time increments from [0, 1.79769e+308], cfl 0.5
  starting at t = 0, i = 1
  cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  face  physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  periodic boundaries in X Y Z
i = 1    t = 0.00209225  dt = 0.00209225 (0.0692961s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.992859s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (1.0033s/it)
i = 4    t = 0.00794139  dt = 0.00190251 (0.975363s/it)
i = 5    t = 0.00984636  dt = 0.00190497 (0.98926s/it)
i = 6    t = 0.0117508   dt = 0.00190449 (0.986304s/it)
i = 7    t = 0.013681    dt = 0.00193011 (0.997433s/it)
i = 8    t = 0.0156598   dt = 0.0019788 (0.99804s/it)
i = 9    t = 0.0176706   dt = 0.00201081 (1.00585s/it)
i = 10   t = 0.0197364   dt = 0.0020658 (1.00463s/it)
i = 11   t = 0.0218716   dt = 0.0021352 (1.01035s/it)
i = 12   t = 0.0240721   dt = 0.00220057 (1.00643s/it)
i = 13   t = 0.0263471   dt = 0.002275 (1.00908s/it)
i = 14   t = 0.0287159   dt = 0.00236875 (1.00359s/it)
i = 15   t = 0.0311533   dt = 0.00243738 (1.00683s/it)
i = 16   t = 0.0336768   dt = 0.0025235 (1.0018s/it)
i = 17   t = 0.0362863   dt = 0.00260952 (1.00395s/it)
i = 18   t = 0.0389715   dt = 0.00268521 (0.994894s/it)
i = 19   t = 0.0417381   dt = 0.00276665 (0.995252s/it)
i = 20   t = 0.0445873   dt = 0.00284919 (0.992024s/it)
i = 21   t = 0.0475216   dt = 0.0029343 (0.989914s/it)
i = 22   t = 0.0505258   dt = 0.00300413 (0.984155s/it)
i = 23   t = 0.0535938   dt = 0.00306807 (0.986609s/it)
i = 24   t = 0.0567043   dt = 0.0031105 (0.981239s/it)
i = 25   t = 0.0598233   dt = 0.00311892 (0.986686s/it)
Time spent in iteration: 23.9751
Correctness:
        sum(rh) difference = 1.45519e-11
        sum(vx) = -0.242582
        sum(vy) = -0.295116
        sum(vz) = -0.335474
        sum(rh*T) difference = -297.099

dirtyepic@tycho ~/tmp $ analyze-x86 tramp3d-v4-pentiumm-plain
Checking vendor_id string... GenuineIntel
Disassembling tramp3d-v4-pentiumm-plain, please wait...
i486:    0 i586:    0 ppro:  135 mmx:    0 sse:    0 sse2:    4 sse3:    0
tramp3d-v4-pentiumm-plain will run on Pentium IV (pentium4) or higher processor.


-O2 -march=pentium-m -msse3 -fomit-frame-pointer -pipe
Code:
dirtyepic@tycho ~/tmp $ /usr/bin/time /usr/bin/g++-4.1.2-pre20060923 -O2 -march=pentium-m -msse3 -fomit-frame-pointer -pipe -Dleafify=flatten tramp3d-v4.cpp  -o tramp3d-v4-pentiumm
97.73user 1.01system 1:38.05elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+197280minor)pagefaults 0swaps

dirtyepic@tycho ~/tmp $ ./tramp3d-v4-pentiumm -n 25 --cartvis 1.0 0.0 --rhomin 1e-8
Using
  using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
  solving eeq
  time increments from [0, 1.79769e+308], cfl 0.5
  starting at t = 0, i = 1
  cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  face  physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  periodic boundaries in X Y Z
i = 1    t = 0.00209225  dt = 0.00209225 (0.069342s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.968165s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.985737s/it)
i = 4    t = 0.00794139  dt = 0.00190251 (0.999364s/it)
i = 5    t = 0.00984636  dt = 0.00190497 (1.01105s/it)
i = 6    t = 0.0117508   dt = 0.00190449 (1.01161s/it)
i = 7    t = 0.013681    dt = 0.00193011 (1.02449s/it)
i = 8    t = 0.0156598   dt = 0.0019788 (1.02412s/it)
i = 9    t = 0.0176706   dt = 0.00201081 (1.02851s/it)
i = 10   t = 0.0197364   dt = 0.0020658 (1.02592s/it)
i = 11   t = 0.0218716   dt = 0.0021352 (1.03424s/it)
i = 12   t = 0.0240721   dt = 0.00220057 (1.0353s/it)
i = 13   t = 0.0263471   dt = 0.002275 (1.03373s/it)
i = 14   t = 0.0287159   dt = 0.00236875 (1.03266s/it)
i = 15   t = 0.0311533   dt = 0.00243738 (1.03526s/it)
i = 16   t = 0.0336768   dt = 0.0025235 (1.02011s/it)
i = 17   t = 0.0362863   dt = 0.00260952 (1.0232s/it)
i = 18   t = 0.0389715   dt = 0.00268521 (1.02476s/it)
i = 19   t = 0.0417381   dt = 0.00276665 (1.0153s/it)
i = 20   t = 0.0445873   dt = 0.00284919 (1.00431s/it)
i = 21   t = 0.0475216   dt = 0.0029343 (1.00313s/it)
i = 22   t = 0.0505258   dt = 0.00300413 (0.989761s/it)
i = 23   t = 0.0535938   dt = 0.00306807 (0.99909s/it)
i = 24   t = 0.0567043   dt = 0.0031105 (0.989536s/it)
i = 25   t = 0.0598233   dt = 0.00311892 (0.996134s/it)
Time spent in iteration: 24.3848
Correctness:
        sum(rh) difference = 1.45519e-11
        sum(vx) = -0.242582
        sum(vy) = -0.295116
        sum(vz) = -0.335474
        sum(rh*T) difference = -297.099

dirtyepic@tycho ~/tmp $ analyze-x86 tramp3d-v4-pentiumm
Checking vendor_id string... GenuineIntel
Disassembling tramp3d-v4-pentiumm, please wait...
i486:    0 i586:    0 ppro:  135 mmx:    0 sse:    0 sse2:    0 sse3:    2
tramp3d-v4-pentiumm will run on Pentium IV (pentium4) w/ SSE3 or higher processor.


-O2 -march=pentium-m -msse3 -mfpmath=sse -fomit-frame-pointer -pipe
Code:
dirtyepic@tycho ~/tmp $ /usr/bin/time /usr/bin/g++-4.1.2-pre20060923 -O2 -march=pentium-m -msse3 -mfpmath=sse -fomit-frame-pointer -pipe -Dleafify=flatten tramp3d-v4.cpp  -o tramp3d-v4-pentiumm-sse
98.40user 0.94system 1:39.15elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (3major+198438minor)pagefaults 0swaps

dirtyepic@tycho ~/tmp $ ./tramp3d-v4-pentiumm-sse -n 25 --cartvis 1.0 0.0 --rhomin 1e-8
Using
  using [1,1,1] block setup for computation on domain [0:63:1,0:63:1,0:63:1]
  solving eeq
  time increments from [0, 1.79769e+308], cfl 0.5
  starting at t = 0, i = 1
  cell physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  face  physical/total domain [0:62:1,0:62:1,0:62:1], [-2:64:1,-2:64:1,-2:64:1]
  periodic boundaries in X Y Z
i = 1    t = 0.00209225  dt = 0.00209225 (0.0617449s/it)
i = 2    t = 0.00410537  dt = 0.00201312 (0.897831s/it)
i = 3    t = 0.00603889  dt = 0.00193352 (0.964484s/it)
i = 4    t = 0.00794139  dt = 0.00190251 (0.94189s/it)
i = 5    t = 0.00984636  dt = 0.00190497 (0.972172s/it)
i = 6    t = 0.0117508   dt = 0.00190449 (0.973818s/it)
i = 7    t = 0.013681    dt = 0.00193011 (0.984364s/it)
i = 8    t = 0.0156598   dt = 0.0019788 (0.988743s/it)
i = 9    t = 0.0176706   dt = 0.00201081 (0.996885s/it)
i = 10   t = 0.0197364   dt = 0.0020658 (0.997118s/it)
i = 11   t = 0.0218716   dt = 0.0021352 (1.00016s/it)
i = 12   t = 0.0240721   dt = 0.00220057 (0.99685s/it)
i = 13   t = 0.0263471   dt = 0.002275 (0.998231s/it)
i = 14   t = 0.0287159   dt = 0.00236875 (1.00025s/it)
i = 15   t = 0.0311533   dt = 0.00243738 (0.987068s/it)
i = 16   t = 0.0336768   dt = 0.0025235 (0.981898s/it)
i = 17   t = 0.0362863   dt = 0.00260952 (0.990963s/it)
i = 18   t = 0.0389715   dt = 0.00268521 (0.986071s/it)
i = 19   t = 0.0417381   dt = 0.00276665 (0.980461s/it)
i = 20   t = 0.0445873   dt = 0.00284919 (0.982345s/it)
i = 21   t = 0.0475216   dt = 0.0029343 (1.00055s/it)
i = 22   t = 0.0505258   dt = 0.00300413 (0.995297s/it)
i = 23   t = 0.0535938   dt = 0.00306807 (1.00189s/it)
i = 24   t = 0.0567043   dt = 0.0031105 (1.00527s/it)
i = 25   t = 0.0598233   dt = 0.00311892 (1.01299s/it)
Time spent in iteration: 23.6994
Correctness:
        sum(rh) difference = 1.28966e-08
        sum(vx) = -0.242582
        sum(vy) = -0.295116
        sum(vz) = -0.335474
        sum(rh*T) difference = -297.099

dirtyepic@tycho ~/tmp $ analyze-x86 tramp3d-v4-pentiumm-sse
Checking vendor_id string... GenuineIntel
Disassembling tramp3d-v4-pentiumm-sse, please wait...
i486:    0 i586:    0 ppro:   84 mmx:   44 sse:    0 sse2: 3089 sse3:    0
tramp3d-v4-pentiumm-sse will run on Pentium IV (pentium4) or higher processor.


Keep in mind that anything that does strip-flags (e.g. GCC, glibc, kernel, etc.) will remove both -msse3 and -mfpmath from your C[XX]FLAGS.
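
The effect of strip-flags can be pictured as an allowlist filter over the flag string. The sketch below is an illustration only: `ALLOWED_FLAGS` is a made-up subset chosen so that the two flags named above get dropped, and `strip_flags` is a hypothetical Python stand-in, not the actual function from Gentoo's flag-o-matic eclass, whose real list differs between versions.

```python
import fnmatch

# Hypothetical allowlist for illustration; NOT the real flag-o-matic list.
ALLOWED_FLAGS = [
    "-O[0-3s]",
    "-march=*",
    "-mtune=*",
    "-fomit-frame-pointer",
    "-pipe",
    "-g",
]


def strip_flags(cflags, allowed=ALLOWED_FLAGS):
    """Keep only the flags that match one of the allowlist patterns."""
    kept = [
        flag
        for flag in cflags.split()
        if any(fnmatch.fnmatchcase(flag, pattern) for pattern in allowed)
    ]
    return " ".join(kept)


if __name__ == "__main__":
    before = "-O2 -march=pentium-m -msse3 -mfpmath=sse -fomit-frame-pointer -pipe"
    print(strip_flags(before))
```

With this assumed list, `-msse3` and `-mfpmath=sse` are filtered out while `-march` survives, which matches the behaviour described above for packages that call strip-flags.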

Very little difference in runtimes, maybe half a second, and next to no difference in compile time. Surprisingly,
-O2 -march=pentium-m -msse3 -fomit-frame-pointer -pipe was the slowest. I reran the test to be sure and it was slightly worse (24.5397s) than the original run.

It also appears -mfpmath=sse does not generate sse3 instructions.
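
To put the "very little difference" into numbers, the four "Time spent in iteration" values from the runs above can be compared against the fastest one. This is just arithmetic on the figures already posted; the helper name `relative_to_fastest` is made up for the example.

```python
# "Time spent in iteration" values copied from the four runs above (seconds).
runs = {
    "-march=prescott": 23.9913,
    "-march=pentium-m": 23.9751,
    "-march=pentium-m -msse3": 24.3848,
    "-march=pentium-m -msse3 -mfpmath=sse": 23.6994,
}


def relative_to_fastest(times):
    """Return how much slower each run is than the fastest one, in percent."""
    best = min(times.values())
    return {name: (t / best - 1.0) * 100.0 for name, t in times.items()}


if __name__ == "__main__":
    ranked = sorted(relative_to_fastest(runs).items(), key=lambda kv: kv[1])
    for flags, pct in ranked:
        print(f"{flags:40s} {runs[flags]:8.4f}s  +{pct:.2f}%")
```

The gap between the fastest and slowest build stays under 3%, and since each configuration was run only once or twice, part of that may be run-to-run noise rather than an effect of the flags.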
_________________
by design, by neglect
for a fact or just for effect
Lloeki
Guru

Joined: 14 Jun 2006
Posts: 437
Location: France

Posted: Mon Sep 25, 2006 2:26 pm

hmm, some data :)

as you mentioned before, there is some interest for Core Duo (yonah) to add -mfpmath=sse, as -march=pentium-m alone will by default favor x87 for performance:
Quote:
-march=pentium-m prefers x87 over sse scalar code, because pentium-m can
decode sse at only half the rate of x87. You should see the speed
advantage clearly on pentium-m, presumably not on Core Duo.

so using -mfpmath=sse takes advantage of a full-rate sse on Core Duo, which is observed.

Quote:
It also appears -mfpmath=sse does not generate sse3 instructions.

this one is really interesting, too.

I have received my merom, but unfortunately at work a project is nearing completion (or should I say its deadline), so I don't even know if I'll have time to set it up, let alone benchmark it.
_________________
Moved to using Arch Linux
Life is meant to be lived, not given up...
HOLY COW I'M TOTALLY GOING SO FAST OH F*** ;)
wu-s
n00b

Joined: 22 May 2006
Posts: 41

Posted: Mon Oct 02, 2006 5:16 pm

Lloeki wrote:
as a side note, mtune is redundant, as march implies enhancements of mtune, plus specifics. I don't know what precedence gcc gives to each one, but it may as well disable your glorious march optimisations, in favor of safer mtune ones.


Maybe just to stress that point. I think one has to distinguish between those two options. Carefully read the third paragraph for the "generic" cpu-type:

The gcc-manpage tells us:
Quote:
Intel 386 and AMD x86-64 Options

These -m options are defined for the i386 and x86-64 family of computers:

-mtune=cpu-type
Tune to cpu-type everything applicable about the generated code,
except for the ABI and the set of available instructions. The
choices for cpu-type are:

generic
Produce code optimized for the most common IA32/AMD64/EM64T
processors. If you know the CPU on which your code will run,
then you should use the corresponding -mtune option instead of
-mtune=generic. But, if you do not know exactly what CPU users
of your application will have, then you should use this option.

As new processors are deployed in the marketplace, the behavior
of this option will change. Therefore, if you upgrade to a
newer version of GCC, the code generated option will change to
reflect the processors that were most common when that version
of GCC was released.

There is no -march=generic option because -march indicates the
instruction set the compiler can use, and there is no generic
instruction set applicable to all processors. In contrast,
-mtune indicates the processor (or, in this case, collection of
processors) for which the code is optimized.

...

pentium-m
Low power version of Intel Pentium3 CPU with MMX, SSE and SSE2
instruction set support. Used by Centrino notebooks.

pentium4, pentium4m
Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support.

prescott
Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and
SSE3 instruction set support.

nocona
Improved version of Intel Pentium4 CPU with 64-bit extensions,
MMX, SSE, SSE2 and SSE3 instruction set support.
...

-march=cpu-type
Generate instructions for the machine type cpu-type. The choices
for cpu-type are the same as for -mtune. Moreover, specifying
-march=cpu-type implies -mtune=cpu-type.


So here is my interpretation of the options' semantics: "-march" selects the assembler-instructions available for compiling, e.g. gcc _can_ use sse3 instructions if you select nocona. In contrast, "-mtune" is used to optimize the assembly for the given cpu-type under the allowed instructions from "-march".

It's trivial that "-march" implies "-mtune" if not otherwise stated. _However_, you can optimize for a different cpu-type with the allowed instruction set from "-march", can't you?

So
Quote:
CFLAGS="-march=nocona -mtune=pentium-m -O2 -pipe"

as stated in http://gentoo-wiki.com/Safe_Cflags#Intel_Core_2_Solo.2FDuo_.28Allendale.2C_Conroe.2C_Merom.29 seems to be promising, right?
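
The rule quoted from the man page (-march=X implies -mtune=X unless -mtune is given separately) can be modelled in a few lines. This sketch only parses a flag string; it does not call GCC, the name `effective_targets` is invented for the example, and it assumes that the last occurrence of a flag wins.

```python
def effective_targets(cflags):
    """Return (march, mtune) as implied by a CFLAGS string.

    Models only the documented rule that -march=X implies -mtune=X
    when no explicit -mtune is present. Assumes the last occurrence
    of each flag takes effect.
    """
    march = None
    mtune = None
    for flag in cflags.split():
        if flag.startswith("-march="):
            march = flag[len("-march="):]
        elif flag.startswith("-mtune="):
            mtune = flag[len("-mtune="):]
    if mtune is None:
        mtune = march
    return march, mtune


if __name__ == "__main__":
    print(effective_targets("-march=nocona -O2 -pipe"))
    print(effective_targets("-march=nocona -mtune=pentium-m -O2 -pipe"))
```

In the second call the instruction set is still the one nocona allows (including SSE3), while scheduling decisions are tuned for pentium-m, which is the split the wiki setting relies on.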

-dirtyepic: Maybe you can run some benchmarks with that setting?

Cheers,

Sven
magoscuro
n00b

Joined: 18 Mar 2005
Posts: 32
Location: Santiago de Chile

Posted: Mon Oct 02, 2006 6:59 pm

any benchmark?
wu-s
n00b

Joined: 22 May 2006
Posts: 41

Posted: Mon Oct 02, 2006 11:39 pm

The benchmarks by dirtyepic couldn't disprove the common thesis that CFLAGS are not that crucial for overall system performance. I will receive my Conroe box at the end of the week. Compared to my current Athlon Thunderbird 1.3GHz, the performance increase will be amazing whatever GCC options are set.

A much more fundamental decision is between Gentoo/x86 and Gentoo/AMD64, which has also been raised in this thread. Maybe http://www.linuxhardware.org/article.pl?sid=06/08/22/0415251 is a first source of information. It's a comparison between "-march=pentium-m -msse3 -O2 -pipe" on 32-bit Linux and "-march=nocona -O2 -pipe" on 64-bit.

To sum it up, Conroe performs pretty well under 64-bit Linux. I think you should give it a try. The issues with 32-bit browser plugins and video codecs seem to be manageable.

wu
amattas
n00b

Joined: 25 Sep 2006
Posts: 25
Location: Kalamazoo, MI

Posted: Tue Oct 03, 2006 6:17 pm

I agree with the previous poster, it does run nicely in 64-bit mode. The 965G motherboard, on the other hand, is a treat in itself to get running. It took a lot of kernel patching and ~amd64 drivers to get all the hardware working.
rmh3093
Advocate

Joined: 06 Aug 2003
Posts: 2138
Location: Albany, NY

Posted: Wed Oct 04, 2006 12:51 am

xentric wrote:
I have the E6300 Core2 Duo (Allendale) in my system.
What's best to be used as "Processor Family" when configuring my kernel, Pentium-M or Pentium-4?

And does this processor support "CPU frequency scaling" with Intel Enhanced Speedstep or Intel Pentium-4 clock modulation?


enhanced speedstep is the best choice, but ACPI P-states would also work. not pentium-4 clock modulation: p4 clock mod only throttles the clock, while enhanced speedstep lowers the voltage along with the frequency, which gives better power saving
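
Whether a given CPU advertises Enhanced SpeedStep at all can be read from the `flags` line of /proc/cpuinfo, where it appears as `est`. The sketch below parses that line from a text blob; the helper names and the short list of flags it reports (`est`, `pni` for SSE3, `ssse3`, `lm` for 64-bit long mode) are choices made for this example.

```python
# cpuinfo flag -> human-readable meaning (a small, hand-picked subset)
INTERESTING = {
    "est": "Enhanced SpeedStep",
    "pni": "SSE3",
    "ssse3": "SSSE3",
    "lm": "64-bit long mode",
}


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def describe(cpuinfo_text):
    """Map each feature name in INTERESTING to True/False."""
    flags = cpu_flags(cpuinfo_text)
    return {name: (flag in flags) for flag, name in INTERESTING.items()}
```

On a running system the input would come from `open("/proc/cpuinfo").read()`. The flag only says the CPU supports the feature; the kernel still needs a matching cpufreq driver to use it.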
_________________
Do not meddle in the affairs of wizards, for they are subtle and quick to anger.
irondog
l33t

Joined: 07 Jul 2003
Posts: 715
Location: In front of my TV. Behind my PC.

Posted: Wed Oct 25, 2006 2:09 pm

I also did some benches of my CORE 2 Duo machine (Gentoo x86).

I did this in a tmpfs:
rm -f rand.gz && time gzip -c9 rand > rand.gz

-O2 -march=i686 -fomit-frame-pointer -pipe
Code:

real    0m22.760s
user    0m22.311s
sys     0m0.443s


-O2 -march=nocona -fomit-frame-pointer -pipe
Code:

real    0m26.353s
user    0m25.703s
sys     0m0.611s


-O2 -march=pentium-m -fomit-frame-pointer -pipe
Code:

real    0m22.796s
user    0m22.332s
sys     0m0.459s


-O2 -march=athlon-xp -fomit-frame-pointer -pipe
Code:

real    0m22.676s
user    0m22.205s
sys     0m0.473s


The only relevant thing to say is, you definitely don't want to use nocona on Core 2 Duo Gentoo x86. Besides that, I think it will hurt users on x86_64 also.

I discovered "by accident" that athlon-xp is the fastest (OK, the difference is about "nothing"). I moved from athlon-xp to Core 2 Duo and I'm compiling for two boxes on my Core 2 Duo system. I'll keep optimizing for my older computer (=athlon-xp) and the processor I sold recently :). So, I'm not a fool playing around with -march=athlon-xp on an Intel system.

For the ricers interested (I don't know if it's safe):
-O3 -march=athlon-xp -fomit-frame-pointer -pipe
Code:

real    0m20.949s
user    0m20.453s
sys     0m0.489s


-mfpmath=sse and -msse3 don't seem to influence the results very much for me. Maybe because of the -march=athlon-xp.
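
The `real` lines above can be turned into seconds and compared against the -march=i686 build to see how large the nocona penalty is. This is a sketch over the numbers already posted; `parse_real_seconds` and `percent_vs` are names invented for the example, and the regular expression assumes the `NmS.SSSs` format shown in the output above.

```python
import re

# "real" lines copied from the gzip runs above.
results = {
    "-O2 -march=i686": "real    0m22.760s",
    "-O2 -march=nocona": "real    0m26.353s",
    "-O2 -march=pentium-m": "real    0m22.796s",
    "-O2 -march=athlon-xp": "real    0m22.676s",
    "-O3 -march=athlon-xp": "real    0m20.949s",
}


def parse_real_seconds(text):
    """Extract the wall-clock time in seconds from `time` output."""
    match = re.search(r"^real\s+(\d+)m([\d.]+)s", text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_vs(baseline, name, table=results):
    """Return how much slower (+) or faster (-) `name` is than `baseline`."""
    base = parse_real_seconds(table[baseline])
    other = parse_real_seconds(table[name])
    return (other / base - 1.0) * 100.0


if __name__ == "__main__":
    for flags in results:
        print(f"{flags:24s} {percent_vs('-O2 -march=i686', flags):+.1f}%")
```

On these figures nocona comes out roughly 16% slower than i686 for this gzip run, while the other -O2 builds are within a fraction of a percent of each other.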
_________________
All things must be nonsense.
rhill
Retired Dev

Joined: 22 Oct 2004
Posts: 1629
Location: sk.ca

Posted: Sun Dec 03, 2006 6:13 am

I spoke to someone at Intel who works on GCC and he confirmed that for Core Solo/Duo, -march=prescott is the correct microarchitecture. Core 2 Solo/Duo (and if you're lucky enough to have one, the quad-core Core 2 Extreme QX6700) should use -march=nocona with GCC 4.1.

With GCC 4.2 you can use the new -march=core2 which enables the also new -mssse3 (say that three times fast) instruction set.

Lloeki: i was wrong about -mfpmath=sse not generating sse3 instructions. it just didn't with that particular code, which is weird but so is GCC sometimes :wink:

irondog: you can't make general claims like that based on one benchmark, especially an I/O-based one like gzip. :P and SSE won't affect it because you're not doing anything that requires floating point calculations.
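
One way to make a single-benchmark result less fragile is to repeat the run several times and look at the median and the spread. The harness below is a minimal sketch of that idea; the function names are invented for the example, and the command to time is whatever benchmark is being compared.

```python
import statistics
import subprocess
import time


def time_command(argv, repeats=5):
    """Run `argv` several times and return the wall-clock time of each run."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True aborts the measurement if the command fails
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce a list of timings to median, fastest and slowest run."""
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

For example, `summarize(time_command(["gzip", "-c9", "rand"], repeats=5))` would time the compression above five times (with the output discarded). If the min-to-max spread of one build overlaps the other build's spread, the difference between them is probably not meaningful.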
_________________
by design, by neglect
for a fact or just for effect
ECantona
n00b

Joined: 26 Apr 2005
Posts: 65

Posted: Sun Dec 03, 2006 12:14 pm

dirtyepic: what about mobile processors like core 2 duo merom?
Lloeki
Guru

Joined: 14 Jun 2006
Posts: 437
Location: France

Posted: Thu Dec 07, 2006 2:58 pm

dirtyepic,

indeed, that's great and interesting news :) and so, I was wrong...
I'll put prescott for now on my gf's core duo and my own core 2 duo (remember, i'm going 32bits for now ;) ). but won't go as far as rebuilding world (no ricer mode here).

I guess we'll now have to wait for gcc 4.2 for some time, since I'm running stable...

anyway, thanks a lot for the research :)
_________________
Moved to using Arch Linux
Life is meant to be lived, not given up...
HOLY COW I'M TOTALLY GOING SO FAST OH F*** ;)
rhill
Retired Dev

Joined: 22 Oct 2004
Posts: 1629
Location: sk.ca

Posted: Sun Dec 10, 2006 11:56 pm

oops, make that GCC 4.3. :(

ECantona: yeah, this is for all Core CPUs from Yonah to Merom to Kentsfield.
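
The advice collected in this thread (prescott for Core Solo/Duo, nocona for Core 2 until -march=core2 arrives with GCC 4.3) can be written down as a small lookup. This is a sketch of the thread's recommendation only; the function name and the `"core"` / `"core2"` labels are invented for the example.

```python
def recommend_march(cpu_family, gcc_version):
    """Pick a -march value following the recommendations in this thread.

    cpu_family:  "core"  for Core Solo/Duo (Yonah), or
                 "core2" for Core 2 (Merom, Conroe, Allendale, Kentsfield).
    gcc_version: tuple such as (4, 1) or (4, 3).
    """
    if cpu_family == "core":
        return "prescott"
    if cpu_family == "core2":
        # -march=core2 only exists from GCC 4.3 onwards
        return "core2" if gcc_version >= (4, 3) else "nocona"
    raise ValueError(f"unknown cpu_family: {cpu_family!r}")


if __name__ == "__main__":
    print(recommend_march("core", (4, 1)))
    print(recommend_march("core2", (4, 1)))
    print(recommend_march("core2", (4, 3)))
```

The same table applies to mobile parts such as Merom, since the recommendation is stated for all Core CPUs rather than for desktop models only.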
_________________
by design, by neglect
for a fact or just for effect
alphamaennchen
n00b

Joined: 05 Sep 2005
Posts: 40

Posted: Wed Dec 13, 2006 11:18 am    Post subject: T7200

I have a T7200 (Core 2 Duo 2.0 GHz, mobile version).
It is built into an Asus A8jp notebook, together with an ATI X1700.

Here is my make.conf:

# These settings were set by the catalyst build script that automatically built this stage
# Please consult /etc/make.conf.example for a more detailed example
CHOST="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CXXFLAGS="${CFLAGS}"
#
USE="aac acpi alsa apache2 arts avi beagle bzlib cdr dbus dmix directfb dvdcss dvd
dvdread encode fam firefox ffmpeg fortran gif gpm gtk gtk2 hal jpeg kde
linguas_de mad math motif mmx mp3 mpeg mysql nls ntpl ntplonly nsplugin
ogg openal opengl oss pcre pdf pdflib php pnf png pstricks qt qt3 rtsp samba sdl slang
sockets sse sse2 sqlite threads truetype udev unicode usb userlocales utf8 vcd vhosts
win32codecs xine xml xscreensaver xv xvid zlib X
-esd -fPIC"
#
ACCEPT_KEYWORDS="amd64"
MAKEOPTS="-j3"
GENTOO_MIRRORS="ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo "
#
LINGUAS="de en"
FEATURES="parallel-fetch"
#
ALSA_CARDS="hda-intel"
#
VIDEO_CARDS="fglrx"
INPUT_DEVICES="keyboard synaptics mouse"
#
PORTDIR_OVERLAY="/usr/local/portage"
source /usr/portage/local/layman/make.conf

Everything is fine!

And: using ati-drivers-8.29.6 works fine for me, later versions don't!
No compile errors, speed is good... only some problems with snd_hda, but that is another topic...
_________________
linux is a wigwam: no windows, no gates, apache inside!

Desktop: AMD64 3400+, GeForce 7800GS, Gentoo
Notebook (Asus A8jp): Core 2 Duo 2,0 (T7200), ATI X1700, Kubuntu
PDA: Zaurus SL-6000

linux is user friendly! however, it is not idiot friendly....
Dirk.R.Gently
Guru

Joined: 29 Jan 2007
Posts: 546
Location: Titan

Posted: Wed Mar 07, 2007 8:22 am

Any luck resolving this? The MacBook wiki actually recommends nocona for 32-bit.
_________________
Helpful Linux Tidbits
llavalle
n00b

Joined: 28 Nov 2004
Posts: 38
Location: Montréal, Quebec, Canada

Posted: Wed Apr 04, 2007 3:16 am

fyi, take a look at this page :

http://gcc.gnu.org/gcc-4.3/changes.html

Quote:

New Targets and Target Specific Improvements

IA-32/x86-64

* Tuning for Intel Core 2 processors is available via -mtune=core2 and -march=core2.

_________________
Dual P2-450 Server running from a Stage1
alphamaennchen
n00b

Joined: 05 Sep 2005
Posts: 40

Posted: Wed Apr 11, 2007 6:57 pm    Post subject: I have Linux on my Merom Laptop

And it works flawlessly.
Let me know if you need make.conf or else...
_________________
linux is a wigwam: no windows, no gates, apache inside!

Desktop: AMD64 3400+, GeForce 7800GS, Gentoo
Notebook (Asus A8jp): Core 2 Duo 2,0 (T7200), ATI X1700, Kubuntu
PDA: Zaurus SL-6000

linux is user friendly! however, it is not idiot friendly....
crisandbea
Veteran

Joined: 03 Jul 2005
Posts: 1778
Location: BOSCO (SA) ... but residing in Bologna....

Posted: Fri May 25, 2007 10:21 am

I have just bought a Dell D620 notebook with a Core2-Duo T7200 (Merom).
I'd like to ask for some advice:
1) should I use the x86 or the amd64 minimal CD?
2) which CFLAGS should I set?
thanks
michel7
Guru

Joined: 04 May 2006
Posts: 461
Location: localhost

Posted: Fri May 25, 2007 10:47 am

crisandbea wrote:
I have just bought a Dell D620 notebook with a Core2-Duo T7200 (Merom).
I'd like to ask for some advice:
1) should I use the x86 or the amd64 minimal CD?
2) which CFLAGS should I set?
thanks


1) I would suggest x86 because it's safer
2) my CFLAGS on my T7200 (Merom) are: CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"

and my system is very stable, with no compilation issues or other complaints ...
_________________
Software is like sex. It's better when it's free
progman32
n00b

Joined: 16 Aug 2006
Posts: 50

Posted: Mon Jul 02, 2007 9:36 pm

Here is an article on LinuxHardware.org showing some data (benchmarks). It's really a comparison between different CPUs, but it has some hard data on 64 vs 32 bit for those wondering about the performance differences, and, most importantly, the GCC flags that were used in said benchmarks.

Notice, however, that they didn't use -msse3 or -mfpmath=sse. Would be interesting to know what difference it makes.

Also, anyone know what CPU setting to use in the kernel, as asked above?

I'm waiting for my new core 2 duo, once I have Gentoo installed I will post some benchmarks. Anyone know a good way of benchmarking kernel performance in various tasks?
lodewj
n00b

Joined: 03 Aug 2004
Posts: 12

Posted: Wed Aug 22, 2007 2:10 pm

I just wanted to try gcc-4.3.0_alpha20070817 on my testserver.

Intel Pentium Dual-Core E2160 (a Conroe with 1 MB of L2 cache).

Code:

Portage 2.1.2.12 (default-linux/amd64/2007.0/server, gcc-4.1.2, glibc-2.5-r4, 2.6.22-gentoo-r2 x86_64)
=================================================================
System uname: 2.6.22-gentoo-r2 x86_64 Genuine Intel(R) CPU 2160 @ 1.80GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Wed, 22 Aug 2007 11:50:01 +0000
dev-lang/python:     2.4.4-r4
dev-python/pycrypto: 2.0.1-r6
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.7.9-r1, 1.9.6-r2, 1.10
sys-devel/binutils:  2.17
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.21
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -msse3 -mfpmath=sse -O2 -fomit-frame-pointer -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-march=nocona -msse3 -mfpmath=sse -O2 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp.belnet.be/linux/gentoo ftp.snt.utwente.nl/pub/linux/gentoo"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="/ 3dnow 3dnowext acl acpi amd64 apache2 berkdb bitmap-fonts bzip2 cgi cli cracklib crypt dri fortran gdbm glibc-omitfp gpm hash iconv ipv6 isdnlog kerberos midi mmx mmxext mudflap ncurses nls nptl nptlonly openmp openntpd pam pcre perl php posix postgres pppd python readline reflection samba session spl sse sse2 ssl tcpd truetype truetype-fonts type1-fonts unicode xml xorg zip zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64 mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS



but compiling gcc fails :(

these are the last lines:

Code:

/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h: In static member function 'static $
/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:141: error: 'EOF' was not declared $
/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h: In static member function 'static $
/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:293: error: 'EOF' was not declared $
make[4]: *** [codecvt.lo] Error 1
make[4]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3/src'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build/x86_64-pc-linux-gnu/libstdc++-v3'
make[1]: *** [all-target-libstdc++-v3] Error 2
make[1]: Leaving directory `/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/work/build'
make: *** [profiledbootstrap] Error 2

!!! ERROR: sys-devel/gcc-4.3.0_alpha20070817 failed.
Call stack:
  ebuild.sh, line 1638:   Called dyn_compile
  ebuild.sh, line 985:   Called qa_call 'src_compile'
  ebuild.sh, line 44:   Called src_compile
  ebuild.sh, line 1328:   Called toolchain_src_compile
  toolchain.eclass, line 26:   Called gcc_src_compile
  toolchain.eclass, line 1546:   Called gcc_do_make
  toolchain.eclass, line 1420:   Called die

!!! emake failed with profiledbootstrap
!!! If you need support, post the topmost build error, and the call stack if relevant.
!!! A complete build log is located at '/var/tmp/portage/sys-devel/gcc-4.3.0_alpha20070817/temp/build.log'.


you can find the full output of build.log here.

any ideas?