Gentoo Forums
CFlags for Intel Atom?
rufnut
Apprentice
Joined: 16 May 2005
Posts: 185

Posted: Thu May 28, 2009 8:26 am

gringo wrote:


i'm not sure i get what you mean; that bug explicitly states that distcc will be disabled if -march=native is used, and that's how it should work IMO.



My sentiments are pretty much the same as this guy here in his last paragraph:

https://bugs.launchpad.net/distcc/+bug/188813

When you look at the bug report:

http://bugs.gentoo.org/223159

Looks like they tried to implement the patch and due to problems it may have been pulled out in later versions of distcc, or maybe it was just modified to fail if "march=native" is detected.

gringo
Advocate
Joined: 27 Apr 2003
Posts: 3747

Posted: Thu May 28, 2009 8:49 am

Quote:
My sentiments are pretty much the same as this guy here in his last paragraph:


don't know exactly what you mean, the guy in that bug explains the problem pretty well and a workaround is available.
If you want to disable distcc for a few packages "manually", there are a few bash hacks available.

Quote:
Looks like they tried to implement the patch and due to problems it may have been pulled out in later versions of distcc, or maybe it was just modified to fail if "march=native" is detected.


don't know if something has changed in the latest version of distcc; i use distcc quite a lot, and the last time i tried -march=native with distcc (which was with the first distcc-3.x release) all jobs were processed locally, which is how it should work IMO.

it's quite easy to test if this is still the case, right ? ;)

cheers
_________________
Error: Failing not supported by current locale

rufnut
Apprentice
Joined: 16 May 2005
Posts: 185

Posted: Fri May 29, 2009 5:05 am

From :

https://bugs.launchpad.net/distcc/+bug/188813

Quote:
or
(preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes


I reckon this should be the way it is done.

It creates a bit of work for distcc: if the "arch" is unknown to, say, stable gcc and/or distcc (-march=atom), then maybe it could drop to prescott, or whatever the consensus is, until gcc 4.5.x is stable.

The reason for the above statement is that I am not real keen on upgrading all nodes to gcc 4.5.x.

There is nothing stopping me manually setting e.g. -march=prescott for a machine, but I would have preferred some automation, which I guess is the reason -march=native was introduced.

:)
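
One way to get some of that automation without leaving -march=native in CFLAGS is to ask the local gcc what -march=native expands to, and then pin those flags explicitly so every distcc node sees the same target. A minimal sketch, assuming bash and GNU grep; the function name extract_target_flags is made up for this example, and it only filters text, so whether the gcc on the other nodes accepts the pinned flags still has to be checked by hand:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gcc -v` output on stdin and print the
# -march/-mtune/--param options found on the first cc1 command line.
extract_target_flags() {
    grep -m1 'cc1' \
        | grep -oE -e '-march=[^ ]+|-mtune=[^ ]+|--param [^ ]+' \
        | paste -sd ' ' -
}

# Typical use on the machine whose CPU should be targeted:
#   gcc -march=native -E -v - </dev/null 2>&1 | extract_target_flags
```

Extra instruction-set switches that gcc may add next to -march (for example -mcx16 or -msahf) are deliberately not picked up by this filter and would have to be copied over separately.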

gringo
Advocate
Joined: 27 Apr 2003
Posts: 3747

Posted: Fri May 29, 2009 8:25 am

Quote:
or (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes


do you really want an app like distcc to rewrite your -march setting ? Why don't you set the correct one in the first place ?
And how is that supposed to work if you are cross-compiling, for example ?
That doesn't make any sense to me, and in any case i don't think rewriting compiler parameters is distcc's job.

That said, i would like to have a better workaround too and just set -march=native everywhere, but it really isn't that easy.

cheers :)

Rony
n00b
Joined: 12 Oct 2003
Posts: 20
Location: Hong Kong, China

Posted: Fri May 29, 2009 10:16 am

GCC-optimized: adding the suggested GCC compiler flags for Intel® Atom™
Code:
-Wall -O1 -msse3 -march=core2 -mfpmath=sse -pedantic -pipe -fstrength-reduce -fexpensive-optimizations -finline-functions -funroll-loops -foptimize-register-move


I am testing with the above on Intel's D945GCLF2D (Atom 330).

Regards.

gringo
Advocate
Joined: 27 Apr 2003
Posts: 3747

Posted: Fri May 29, 2009 11:21 am

if those numbers are correct, that isn't that bad i would say; i was expecting way more difference between icc and gcc.
Would be great to see the same benchmark with the new atom target included.

Some time ago i found a discussion about the best gcc options when building for an Atom, and Arjan van de Ven (Intel kernel hacker) suggested -march=core2 -mtune=generic. Note that this was before the atom target was even in development IIRC.

http://lkml.indiana.edu/hypermail/linux/kernel/0810.1/2015.html
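
For reference, that suggestion would look roughly like this in /etc/make.conf. This is only a sketch: the -O2 -pipe part is borrowed from the flags other posters use in this thread, not from the linked mail.

```shell
# /etc/make.conf (fragment)
# -march=core2 selects the instruction set; -mtune=generic keeps the
# scheduling generic instead of tuning for the Core 2 pipeline.
CFLAGS="-march=core2 -mtune=generic -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```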

cheers guys

rufnut
Apprentice
Joined: 16 May 2005
Posts: 185

Posted: Fri May 29, 2009 4:01 pm

gringo wrote:
Quote:
or (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes


do you really want an app like distcc to rewrite your -march setting ? Why don't you set the correct one in the first place ?


He does say "read -march=arch-of-the-build-machine" not rewrite ?

:?

gringo
Advocate
Joined: 27 Apr 2003
Posts: 3747

Posted: Fri May 29, 2009 4:08 pm

Quote:
He does say "read -march=arch-of-the-build-machine" not rewrite ?


no, he says "rewrite them to read".
English is not my main language but i get that as "rewriting".

and this is starting to get a bit pointless and completely OT.

cheers

rufnut
Apprentice
Joined: 16 May 2005
Posts: 185

Posted: Sat May 30, 2009 12:26 am

gringo wrote:
Quote:
He does say "read -march=arch-of-the-build-machine" not rewrite ?


no, he says "rewrite them to read".
English is not my main language but i get that as "rewriting".



He is talking about rewriting distcc.


:)

Mr_Maniac
Guru
Joined: 10 Jun 2004
Posts: 534

Posted: Mon Jun 01, 2009 1:19 pm

Rony wrote:
GCC-optimized: adding the suggested GCC compiler flags for Intel® Atom™
Code:
-Wall -O1 -msse3 -march=core2 -mfpmath=sse -pedantic -pipe -fstrength-reduce -fexpensive-optimizations -finline-functions -funroll-loops -foptimize-register-move


I am testing with the above on Intel's D945GCLF2D (Atom 330).

Regards.


I have an Intel D945GCLF2, too. System compiled with
Code:
CFLAGS="-march=nocona -O2 -pipe"

GCC: gcc (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) 4.3.3
GLIBC: glibc-2.10.1-r0
Kernel: 2.6.29-r5 - CONFIG_MCORE2=y
64bit-System

With the CFLAGS mentioned by you I get the following results:
Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          193.76  :       4.97  :       1.63
STRING SORT         :          44.542  :      19.90  :       3.08
BITFIELD            :      5.6065e+07  :       9.62  :       2.01
FP EMULATION        :          17.793  :       8.54  :       1.97
FOURIER             :            6628  :       7.54  :       4.23
ASSIGNMENT          :           2.757  :      10.49  :       2.72
IDEA                :           739.7  :      11.31  :       3.36
HUFFMAN             :          354.47  :       9.83  :       3.14
NEURAL NET          :          1.9554  :       3.14  :       1.32
LU DECOMPOSITION    :          62.884  :       3.26  :       2.35
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 9.923
FLOATING-POINT INDEX: 4.257
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.29-gentoo-r5
C compiler          : x86_64-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 2.563
INTEGER INDEX       : 2.413
FLOATING-POINT INDEX: 2.361
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


With my standard-CFLAGS
CFLAGS="-march=nocona -O2 -pipe"
I have
Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          497.52  :      12.76  :       4.19
STRING SORT         :          62.335  :      27.85  :       4.31
BITFIELD            :      2.0232e+08  :      34.71  :       7.25
FP EMULATION        :          52.817  :      25.34  :       5.85
FOURIER             :          6763.3  :       7.69  :       4.32
ASSIGNMENT          :          9.4219  :      35.85  :       9.30
IDEA                :          2106.5  :      32.22  :       9.57
HUFFMAN             :          913.79  :      25.34  :       8.09
NEURAL NET          :           8.498  :      13.65  :       5.74
LU DECOMPOSITION    :          311.92  :      16.16  :      11.67
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 26.488
FLOATING-POINT INDEX: 11.927
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.29-gentoo-r5
C compiler          : x86_64-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.624
INTEGER INDEX       : 6.599
FLOATING-POINT INDEX: 6.615
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


With CFLAGS="-march=core2 -O2 -pipe":
Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          510.72  :      13.10  :       4.30
STRING SORT         :          61.406  :      27.44  :       4.25
BITFIELD            :      2.3122e+08  :      39.66  :       8.28
FP EMULATION        :            54.4  :      26.10  :       6.02
FOURIER             :          6757.9  :       7.69  :       4.32
ASSIGNMENT          :          8.7198  :      33.18  :       8.61
IDEA                :          2157.4  :      33.00  :       9.80
HUFFMAN             :          908.02  :      25.18  :       8.04
NEURAL NET          :          10.043  :      16.13  :       6.79
LU DECOMPOSITION    :          408.24  :      21.15  :      15.27
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 26.924
FLOATING-POINT INDEX: 13.790
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.29-gentoo-r5
C compiler          : x86_64-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.715
INTEGER INDEX       : 6.721
FLOATING-POINT INDEX: 7.648
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


And the best results so far
CFLAGS="-march=native -O2 -pipe"
Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          508.56  :      13.04  :       4.28
STRING SORT         :          60.862  :      27.20  :       4.21
BITFIELD            :       2.314e+08  :      39.69  :       8.29
FP EMULATION        :          54.498  :      26.15  :       6.03
FOURIER             :          6778.9  :       7.71  :       4.33
ASSIGNMENT          :          9.5709  :      36.42  :       9.45
IDEA                :          2164.4  :      33.10  :       9.83
HUFFMAN             :          911.25  :      25.27  :       8.07
NEURAL NET          :          10.083  :      16.20  :       6.81
LU DECOMPOSITION    :          412.64  :      21.38  :      15.44
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 27.270
FLOATING-POINT INDEX: 13.872
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.29-gentoo-r5
C compiler          : x86_64-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.908
INTEGER INDEX       : 6.729
FLOATING-POINT INDEX: 7.694
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


Can someone post results with gcc-4.5 and "-march=atom"? My System is in use (Router/Server), so i don't want to make too big changes...
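
To put the Linux-baseline indices from the four runs above side by side, the relative change against the -march=nocona run can be computed with a few lines of awk. A small sketch, assuming bash and any POSIX awk; the function name relative_gain is made up for this example, and the numbers in the usage comments are the index values posted above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the percentage change of VALUE relative to BASELINE.
# usage: relative_gain BASELINE VALUE
relative_gain() {
    LC_ALL=C awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%+.1f%%\n", (val - base) / base * 100 }'
}

# FLOATING-POINT INDEX, -march=nocona (6.615) vs. -march=native (7.694):
#   relative_gain 6.615 7.694     # prints +16.3%
# INTEGER INDEX, -march=nocona (6.599) vs. the -O1 flag set (2.413):
#   relative_gain 6.599 2.413     # prints -63.4%
```

These are single runs without any error estimate, so the small gaps (core2 vs. native) should not be over-interpreted.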
_________________
Intel Core2Quad Q9450
MSI P35 Neo2-FR
4GB DDR2-800 RAM - Dual Channel
SB X-Fi Fatality Gamer
GeForce GTX 460
Gentoo Linux (kernel 2.6.x (whatever is currently stable in Portage ;) ), amd64, Stage 1)
Windows 7 x64
Windows XP x64 SP2

Bircoph
Apprentice
Joined: 27 Jun 2008
Posts: 260
Location: Moscow

Posted: Wed Jul 08, 2009 2:40 am

I use the following for my Atom N270 (on Asus Eee PC 1000H):
Code:

CFLAGS="-march=core2 -m32 --param l1-cache-line-size=64
--param l1-cache-size=32 --param l2-cache-size=512
-O2 -funswitch-loops -fpredictive-commoning
-fgcse-after-reload -ftree-vectorize -fomit-frame-pointer
-mfpmath=sse -pipe"

Some explanation of why exactly these flags are used. (I use gcc-4.3.3-r2 ATM: the latest unmasked gcc for Gentoo.)

1) Why not "-march=native"?
That's obvious: a) because current gcc doesn't understand atom properly and will fail to detect it in the best way; b) this will make distcc unusable.

2) Why "-march=core2 -m32"?
Just look at this CPU's instruction set: it is actually equal to core2 except for the x86_64 instructions (also, -m32 is required for distcc cross-compilation on amd64):
Code:

% x86info -f
x86info v1.24.  Dave Jones 2001-2009
Feedback to <davej@redhat.com>.

Found 2 CPUs
--------------------------------------------------------------------------
CPU #1
EFamily: 0 EModel: 1 Family: 6 Model: 28 Stepping: 2
CPU Model: Unknown model.
Processor name string: Intel(R) Atom(TM) CPU N270   @ 1.60GHz
Type: 0 (Original OEM)  Brand: 0 (Unsupported)
Number of cores per physical package=1
Number of logical processors per socket=2
Number of logical processors per core=2
APIC ID: 0x0    Package: 0  Core: 0   SMT ID 0
Feature flags:
 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe
Extended feature flags:
 sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]

--------------------------------------------------------------------------
CPU #2
EFamily: 0 EModel: 1 Family: 6 Model: 28 Stepping: 2
CPU Model: Unknown model.
Processor name string: Intel(R) Atom(TM) CPU N270   @ 1.60GHz
Type: 0 (Original OEM)  Brand: 0 (Unsupported)
Number of cores per physical package=1
Number of logical processors per socket=2
Number of logical processors per core=2
APIC ID: 0x1    Package: 0  Core: 0   SMT ID 1
Feature flags:
 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe
Extended feature flags:
 sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]

--------------------------------------------------------------------------


3) Why "--param l1-cache-line-size=64 --param l1-cache-size=32 --param l2-cache-size=512"?
Because the N270 isn't a core2: it has smaller L1/L2 caches, so code generated for core2 will not be as efficient on Atom because of improper cache use: data/code blocks may be too long, etc.
Specifying the CPU cache is also always important for distcc: the compiler on the other host doesn't know what CPU you actually use.

4) Why "-O2 -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize"?
This is actually -O3 -fno-inline-functions. The Atom CPU provides relatively small L1/L2 caches, so extra inlining would decrease their efficiency dramatically; the CPU cache should be used for better purposes.

5) Why "-fomit-frame-pointer"?
Because it frees up an extra register; this is extremely important because on x86 you have only 4 free-to-use general registers. (JFYI: access to a register is 3 times faster than even to L1 cache.) If you really want to debug something, you'll need to recompile it with -g/-g3 anyway.
Isn't it enabled by default? No, it isn't, because it interferes with debugging; read the gcc manual.

6) Why -mfpmath=sse?
The SSE unit is significantly more efficient than the i387 unit used by default on x86, mostly due to enhanced instructions. The only problem is that the i387 unit allows 80-bit wide floats, while SSE allows a maximum width of 64 bits. In theory this may be a problem for applications that rely on 80-bit floats without specifying this explicitly to gcc. In practice I have used tons of scientific software (such as root, maxima, R, octave,...) compiled with -mfpmath=sse (in make.conf CFLAGS) for years without any problems.
Ideally one would use -mfpmath=sse,387, as it effectively doubles the number of available registers (the i387 and SSE units are implemented separately by Intel), but the gcc register allocator can't model the use of both units at once, so from a performance POV it is quite risky to use -mfpmath=sse,387 everywhere; you would have to write the appropriate assembly by hand.

7) Why "-pipe"?
This speeds up compilation by using pipes instead of temporary files. It doesn't affect the generated code itself.
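
The claim in 4) can be checked against the installed compiler: the gcc manual describes dumping the optimizer settings with -Q --help=optimizers and comparing two levels. A minimal sketch, assuming bash (for process substitution) and two dump files created as shown in the comments; the function names enabled_opts and newly_enabled are invented for this example:

```shell
#!/usr/bin/env bash
# Create the two dumps first, e.g.:
#   gcc -Q -O2 --help=optimizers > /tmp/O2-opts
#   gcc -Q -O3 --help=optimizers > /tmp/O3-opts

# Print the options marked [enabled] in a -Q --help=... dump, sorted.
enabled_opts() {
    LC_ALL=C awk '$2 == "[enabled]" { print $1 }' "$1" | LC_ALL=C sort
}

# Print the options enabled in the second dump but not in the first.
newly_enabled() {
    LC_ALL=C comm -13 <(enabled_opts "$1") <(enabled_opts "$2")
}

# Usage:
#   newly_enabled /tmp/O2-opts /tmp/O3-opts
```

The exact list depends on the gcc version, so the output should be read as what your own compiler adds at -O3, not as a fixed set.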
_________________
Per aspera ad astra!

Mr_Maniac
Guru
Joined: 10 Jun 2004
Posts: 534

Posted: Wed Jul 08, 2009 5:43 am

Code:

~ # CFLAGS="-march=core2 --param l1-cache-line-size=64 --param l1-cache-size=32 --param l2-cache-size=512 -O2 -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fomit-frame-pointer -mfpmath=sse -pipe" emerge nbench

~ # nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          516.24  :      13.24  :       4.35
STRING SORT         :           62.19  :      27.79  :       4.30
BITFIELD            :      2.3393e+08  :      40.13  :       8.38
FP EMULATION        :          54.894  :      26.34  :       6.08
FOURIER             :          6778.9  :       7.71  :       4.33
ASSIGNMENT          :          9.7533  :      37.11  :       9.63
IDEA                :          2172.2  :      33.22  :       9.86
HUFFMAN             :          921.62  :      25.56  :       8.16
NEURAL NET          :          9.8775  :      15.87  :       6.67
LU DECOMPOSITION    :           418.4  :      21.68  :      15.65
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 27.617
FLOATING-POINT INDEX: 13.841
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.29-gentoo-r5
C compiler          : x86_64-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 7.027
INTEGER INDEX       : 6.791
FLOATING-POINT INDEX: 7.676
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


Okay... It really is a bit faster, but really only a bit ;)
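
When comparing several runs like this, it helps to pull just the three Linux-baseline indices out of the nbench output instead of eyeballing the whole table. A minimal sketch, assuming the output format shown above and any POSIX awk; the function name linux_indices is made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nbench output on stdin and print the MEMORY,
# INTEGER and FLOATING-POINT indices from the "LINUX DATA BELOW" section
# as NAME=VALUE lines.
linux_indices() {
    awk -F':' '
        /LINUX DATA BELOW/ { in_linux = 1; next }
        in_linux && /INDEX/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $1)   # trim the label
            gsub(/[ \t]/, "", $2)             # strip spaces around the value
            print $1 "=" $2
        }'
}

# Usage:
#   nbench | linux_indices
```

The "ORIGINAL BYTEMARK RESULTS" indices are skipped on purpose, because they use a different baseline than the Linux ones.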

s4e8
Apprentice
Joined: 29 Jul 2006
Posts: 214

Posted: Thu Jul 09, 2009 1:38 am

gcc 4.5.0 snapshot 20090702, -march=atom -O3 -mfpmath=sse -fomit-frame-pointer, Atom N270.
nbench scores (in test order, NUMERIC SORT through HUFFMAN): 539.71 59.888 2.2706e8 87.08 7800.8 13.017 2276.4 979.22; NEURAL NET crashed.
Compared to Bircoph's CFLAGS, it wins by: 0.4% 1.4% 28.8% 65.4% 1.7% 8.4% 20% 4.2%

Bircoph
Apprentice
Joined: 27 Jun 2008
Posts: 260
Location: Moscow

Posted: Wed Jul 15, 2009 7:56 am

s4e8 wrote:
gcc 4.5.0 snapshot 20090702, -march=atom -O3 -mfpmath=sse -fomit-frame-pointer, Atom N270.
nbench scores (in test order, NUMERIC SORT through HUFFMAN): 539.71 59.888 2.2706e8 87.08 7800.8 13.017 2276.4 979.22; NEURAL NET crashed.
Compared to Bircoph's CFLAGS, it wins by: 0.4% 1.4% 28.8% 65.4% 1.7% 8.4% 20% 4.2%

This result is very interesting. Could you please post
Code:

gcc -Q --help=target -march=atom

?

And be aware of two important aspects:

1) All measurement data should be provided with errors (either absolute errors with a confidence level, or errors in terms of standard deviation); otherwise your gains may be just a game of statistics, nothing more. Of course, you should run the tests several times to be able to calculate errors. For this reason I can't say that my options are better than Mr_Maniac's: in my case the statistical error is larger than the difference between the tests.

2) nbench is a very, eh, specific benchmark: it covers only some aspects of real-world tasks, so you should be critical of its results. A small example.
I have two boxes:
a) Athlon-XP 3200+ (2205 MHZ), 64KB L1 512KB L2, 32bit.
b) Celeron D (2533 MHz), 16KB L1, 256KB L2, 64bit.

Here are nbench results (memory/integer/floating indices) with errors in standard deviations:
a) 12.187 \pm 0.021; 14.068 \pm 0.014; 23.135 \pm 0.025
b) 10.36 \pm 0.18; 8.84 \pm 0.05; 13.75 \pm 0.04

As you can see, with nbench host (b) is significantly worse than host (a), beyond any errors.
But wait! Try generating a 16 Kbit RSA key on both hosts. Host (b) turns out to be ~8x faster: thanks to 64-bit mode and 3x more general-purpose registers it shines in long-arithmetic tasks, particularly in anything related to asymmetric cryptography.

So be very careful when estimating performance from benchmarks alone: it takes really hard work to be able to say (a) is better than (b), since performance varies greatly depending on the task in question.
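
Point 1) can be made concrete with a few lines of awk: collect the same index from several nbench runs and report the mean together with the sample standard deviation. A minimal sketch, assuming one value per line on stdin; the function name mean_stddev and the sample values in the usage comment are invented for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one measurement per line and print
# "mean +/- stddev", using the sample standard deviation (n - 1).
mean_stddev() {
    LC_ALL=C awk '
        NF { n++; x[n] = $1; sum += $1 }
        END {
            if (n < 2) { print "need at least two runs" > "/dev/stderr"; exit 1 }
            mean = sum / n
            for (i = 1; i <= n; i++) ss += (x[i] - mean) * (x[i] - mean)
            printf "%.3f +/- %.3f\n", mean, sqrt(ss / (n - 1))
        }'
}

# Example with made-up values:
#   printf "%s\n" 6.7 6.8 6.9 | mean_stddev     # prints 6.800 +/- 0.100
```

Two sets of CFLAGS can only be called different if the gap between their means is clearly larger than these error bars.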

s4e8
Apprentice
Joined: 29 Jul 2006
Posts: 214

Posted: Wed Jul 15, 2009 8:22 am

here are the results.
Code:

bin # ./gcc -Q --help=target -march=atom 
The following options are target specific:
  -m128bit-long-double                  [disabled]
  -m32                                  [enabled]
  -m3dnow                               [disabled]
  -m3dnowa                              [disabled]
  -m64                                  [disabled]
  -m80387                               [enabled]
  -m96bit-long-double                   [enabled]
  -mabi=                     
  -mabm                                 [disabled]
  -maccumulate-outgoing-args            [disabled]
  -maes                                 [disabled]
  -malign-double                        [disabled]
  -malign-functions=         
  -malign-jumps=             
  -malign-loops=             
  -malign-stringops                     [enabled]
  -march=                               atom
  -masm=                     
  -mavx                                 [disabled]
  -mbranch-cost=             
  -mcld                                 [disabled]
  -mcmodel=                   
  -mcrc32                               [disabled]
  -mcx16                                [disabled]
  -mfancy-math-387                      [enabled]
  -mfma                                 [disabled]
  -mforce-drap                          [disabled]
  -mfp-ret-in-387                       [enabled]
  -mfpmath=                   
  -mfused-madd                          [enabled]
  -mglibc                               [enabled]
  -mhard-float                          [enabled]
  -mieee-fp                             [enabled]
  -mincoming-stack-boundary= 
  -minline-all-stringops                [disabled]
  -minline-stringops-dynamically        [disabled]
  -mintel-syntax                        [disabled]
  -mlarge-data-threshold=     
  -mmmx                                 [disabled]
  -mmovbe                               [disabled]
  -mms-bitfields                        [disabled]
  -mno-align-stringops                  [disabled]
  -mno-fancy-math-387                   [disabled]
  -mno-fused-madd                       [disabled]
  -mno-push-args                        [disabled]
  -mno-red-zone                         [disabled]
  -mno-sse4                             [enabled]
  -momit-leaf-frame-pointer             [disabled]
  -mpc                       
  -mpclmul                              [disabled]
  -mpopcnt                              [disabled]
  -mpreferred-stack-boundary=
  -mpush-args                           [enabled]
  -mrecip                               [disabled]
  -mred-zone                            [enabled]
  -mregparm=                 
  -mrtd                                 [disabled]
  -msahf                                [disabled]
  -msoft-float                          [disabled]
  -msse                                 [disabled]
  -msse2                                [disabled]
  -msse2avx                             [disabled]
  -msse3                                [disabled]
  -msse4                                [disabled]
  -msse4.1                              [disabled]
  -msse4.2                              [disabled]
  -msse4a                               [disabled]
  -msse5                                [disabled]
  -msseregparm                          [disabled]
  -mssse3                               [disabled]
  -mstack-arg-probe                     [disabled]
  -mstackrealign                        [enabled]
  -mstringop-strategy=       
  -mtls-dialect=             
  -mtls-direct-seg-refs                 [enabled]
  -mtune=                     
  -muclibc                              [disabled]
  -mveclibabi=               

Bircoph
Apprentice
Joined: 27 Jun 2008
Posts: 260
Location: Moscow

Posted: Wed Jul 15, 2009 8:46 am

This is odd, I can't see any significant difference.
I wonder what they've done...

s4e8
Apprentice
Joined: 29 Jul 2006
Posts: 214

Posted: Wed Jul 15, 2009 9:33 am

Bircoph wrote:
This is odd, I can't see any significant difference.
I wonder what they've done...

There's a new file, atom.md, that defines some Atom-specific behavior.
Code:

......
;; Atom is an in-order core with two integer pipelines.


(define_attr "atom_unit" "sishuf,simul,jeu,complex,other"
  (const_string "other"))

(define_attr "atom_sse_attr" "rcp,movdup,lfence,fence,prefetch,sqrt,mxcsr,other"
  (const_string "other"))

(define_automaton "atom")

;;  Atom has two ports: port 0 and port 1 connecting to all execution units
(define_cpu_unit "atom-port-0,atom-port-1" "atom")

;;  EU: Execution Unit
;;  Atom EUs are connected by port 0 or port 1.
......

hielvc
Advocate
Joined: 19 Apr 2002
Posts: 2801
Location: Oceanside, Ca

Posted: Wed Jul 15, 2009 7:39 pm

s4e8, I ran your command on my AMD Athlon(tm) X2 Dual Core Processor BE-2300. No matter what I put in for the target, I got the same output:
Quote:
gcc -Q --help=target -march=k8 |awk '/enabled/ {print $1}'
-m64
-m80387
-m96bit-long-double
-malign-stringops
-mfancy-math-387
-mfp-ret-in-387
-mfused-madd
-mglibc
-mhard-float
-mieee-fp
-mno-sse4
-mpush-args
-mred-zone
-mtls-direct-seg-refs


Using this code
Code:
echo 'int main(){return 0;}' > test.c && gcc -v -Q -march=native -O2   test.c -o test && rm test.c test
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.3.3-r2/work/gcc-4.3.3/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.3 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --disable-nls --with-system-zlib --disable-checking --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-libgomp --enable-cld --disable-libgcj --enable-languages=c,c++,treelang --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.3-r2 p1.2, pie-10.1.5'
Thread model: posix
gcc version 4.3.3 (Gentoo 4.3.3-r2 p1.2, pie-10.1.5)
COLLECT_GCC_OPTIONS='-v' '-Q'  '-O2' '-o' 'test'
 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/cc1 -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8 -dumpbase test.c -auxbase test -O2 -version -o /tmp/ccoIvqwu.s
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include
 /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include-fixed
 /usr/include
End of search list.
GNU C (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) version 4.3.3 (x86_64-pc-linux-gnu)
   compiled by GNU C version 4.3.3, GMP version 4.3.1, MPFR version 2.4.1-p5.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
options passed:  -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16
 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8
 -O2
options enabled:  -falign-labels -falign-loops -fargument-alias
 -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg
 -fcaller-saves -fcommon -fcprop-registers -fcrossjumping
 -fcse-follow-jumps -fdefer-pop -fdelete-null-pointer-checks
 -fearly-inlining -feliminate-unused-debug-types -fexpensive-optimizations
 -fforward-propagate -ffunction-cse -fgcse -fgcse-lm
 -fguess-branch-probability -fident -fif-conversion -fif-conversion2
 -finline-functions-called-once -finline-small-functions -fipa-pure-const
 -fipa-reference -fivopts -fkeep-static-consts -fleading-underscore
 -fmath-errno -fmerge-constants -fmerge-debug-strings
 -fmove-loop-invariants -fomit-frame-pointer -foptimize-register-move
 -foptimize-sibling-calls -fpeephole -fpeephole2 -freg-struct-return
 -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop
 -fsched-interblock -fsched-spec -fsched-stalled-insns-dep
 -fschedule-insns2 -fsigned-zeros -fsplit-ivs-in-unroller
 -fsplit-wide-types -fstrict-aliasing -fstrict-overflow -fthread-jumps
 -ftoplevel-reorder -ftrapping-math -ftree-ccp -ftree-ch -ftree-copy-prop
 -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
 -ftree-dse -ftree-fre -ftree-loop-im -ftree-loop-ivcanon
 -ftree-loop-optimize -ftree-parallelize-loops= -ftree-pre -ftree-reassoc
 -ftree-salias -ftree-scev-cprop -ftree-sink -ftree-sra -ftree-store-ccp
 -ftree-ter -ftree-vect-loop-version -ftree-vrp -funit-at-a-time
 -funwind-tables -fvar-tracking -fvect-cost-model -fzero-initialized-in-bss
 -m128bit-long-double -m3dnow -m64 -m80387 -maccumulate-outgoing-args
 -malign-stringops -mcx16 -mfancy-math-387 -mfp-ret-in-387 -mfused-madd
 -mglibc -mieee-fp -mmmx -mno-sse4 -mpush-args -mred-zone -msahf -msse
 -msse2 -msse3 -mtls-direct-seg-refs
Compiler executable checksum: f6e169a902c79329927a6921bcb422f4
 main
Analyzing compilation unit
Performing interprocedural optimizations
 <visibility> <early_local_cleanups> <inline> <static-var> <pure-const>Assembling functions:
 main
Execution times (seconds)
 parser                :   0.01 (100%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      76 kB ( 7%) ggc
 global alloc          :   0.00 ( 0%) usr   0.01 (100%) sys   0.01 (33%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.01             0.01             0.03               1118 kB
Internal checks disabled; compiler is not suited for release.
Configure with --enable-checking=release to enable checks.
COLLECT_GCC_OPTIONS='-v' '-Q'  '-O2' '-o' 'test'
 /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/bin/as -V -Qy -o /tmp/ccW4B8bR.o /tmp/ccoIvqwu.s
GNU assembler version 2.19.1 (x86_64-pc-linux-gnu) using BFD version (GNU Binutils) 2.19.1
COMPILER_PATH=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/bin/
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-Q'  '-O2' '-o' 'test'
 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtbegin.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../.. /tmp/ccW4B8bR.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtend.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crtn.o


As you can see, it's an x86_64-pc-linux-gnu box running gcc-4.3.3 with -march=native, which resolves to -march=k8-sse3 -mtune=k8 here. 3dnow and company, plus a bunch more, are actually enabled. I like my output :wink:
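If you want those resolved flags as explicit CFLAGS (say, for distcc helpers that don't share this CPU), you can pull them straight out of the cc1 line. A minimal Python sketch, assuming the cc1 line from the `gcc -v` output is pasted in as a string; the helper name `resolved_flags` is made up for illustration and is not part of gcc or any library:

```python
import shlex

# The cc1 line copied from the `gcc -v -Q -march=native` output above.
CC1_LINE = (
    "/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/cc1 -v test.c "
    "-D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16 -msahf "
    "--param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8 "
    "-dumpbase test.c -auxbase test -O2 -version -o /tmp/ccoIvqwu.s"
)


def resolved_flags(cc1_line):
    """Return (march, mtune, other -m flags, --param settings) from a cc1 line."""
    tokens = shlex.split(cc1_line)
    march = mtune = None
    extra = []
    params = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-march="):
            march = tok.split("=", 1)[1]
        elif tok.startswith("-mtune="):
            mtune = tok.split("=", 1)[1]
        elif tok.startswith("-m"):
            # Any other machine flag that -march=native added, e.g. -mcx16.
            extra.append(tok)
        elif tok == "--param" and i + 1 < len(tokens):
            # --param takes its "key=value" setting as the next token.
            key, _, value = tokens[i + 1].partition("=")
            params[key] = value
            i += 1
        i += 1
    return march, mtune, extra, params


march, mtune, extra, params = resolved_flags(CC1_LINE)
print("march: ", march)
print("mtune: ", mtune)
print("extra: ", " ".join(extra))
print("params:", params)
```

Whether the --param cache sizes are worth carrying over into make.conf is a judgment call; the sketch only reports what gcc chose on this machine.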
_________________
An A-Z Index of the Linux BASH command line
s4e8
Apprentice


Joined: 29 Jul 2006
Posts: 214

Posted: Thu Jul 16, 2009 1:39 am

hielvc wrote:
s4e8 I ran your code on my AMD Athlon(tm) X2 Dual Core Processor BE-2300. No matter what I put in for "target" I got the same output.

OK, here's the -Q -v output:
Code:

GNU C (GCC) version 4.5.0 20090702 (experimental) (i686-pc-linux-gnu)
        compiled by GNU C version 4.5.0 20090702 (experimental), GMP version 4.2.4, MPFR version 2.4.1-p1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
options passed:  -v a.c -march=atom -mfpmath=sse -O3 -fomit-frame-pointer
options enabled:  -falign-labels -falign-loops -fargument-alias
 -fauto-inc-dec -fbranch-count-reg -fcaller-saves -fcommon
 -fcprop-registers -fcrossjumping -fcse-follow-jumps -fdefer-pop
 -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
 -feliminate-unused-debug-types -fexpensive-optimizations
 -fforward-propagate -ffunction-cse -fgcse -fgcse-after-reload -fgcse-lm
 -fguess-branch-probability -fident -fif-conversion -fif-conversion2
 -findirect-inlining -finline -finline-functions
 -finline-functions-called-once -finline-small-functions -fipa-cp
 -fipa-cp-clone -fipa-pure-const -fipa-reference -fira-share-save-slots
 -fira-share-spill-slots -fivopts -fkeep-static-consts -fleading-underscore
 -fmath-errno -fmerge-constants -fmerge-debug-strings
 -fmove-loop-invariants -fomit-frame-pointer -foptimize-register-move
 -foptimize-sibling-calls -fpcc-struct-return -fpeephole -fpeephole2
 -fpredictive-commoning -fregmove -freorder-blocks -freorder-functions
 -frerun-cse-after-loop -fsched-interblock -fsched-spec
 -fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fsigned-zeros
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-aliasing
 -fstrict-overflow -fthread-jumps -ftoplevel-reorder -ftrapping-math
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop
 -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
 -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-im -ftree-loop-ivcanon
 -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre
 -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink
 -ftree-slp-vectorize -ftree-sra -ftree-switch-conversion -ftree-ter
 -ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time
 -funswitch-loops -fvar-tracking -fvect-cost-model
 -fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double
 -maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
 -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mmovbe -mno-red-zone
 -mno-sse4 -mpush-args -msahf -msse -msse2 -msse3 -mssse3
 -mtls-direct-seg-refs
Compiler executable checksum: f142bf44665c008856fda3c64386a6ca
 main
Analyzing compilation unit
Performing interprocedural optimizations
 <visibility> <early_local_cleanups> <summary generate> <cp> <inline> <static-var> <pure-const>Assembling functions:
 main
Execution times (seconds)
 callgraph construction:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (11%) wall       0 kB ( 0%) ggc
 parser                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.08 (30%) wall     192 kB (23%) ggc
 tree gimplify         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 7%) wall       0 kB ( 0%) ggc
 tree CFG cleanup      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc
 tree SSA other        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc
 tree CCP              :   0.00 ( 0%) usr   0.01 (100%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc
 expand                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (11%) wall       3 kB ( 0%) ggc
 combiner              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 7%) wall       0 kB ( 0%) ggc
 scheduling 2          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (11%) wall       0 kB ( 0%) ggc
 machine dep reorg     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.01             0.01             0.27                847 kB
Extra diagnostic checks enabled; compiler may run slowly
BillyBoy
Tux's lil' helper


Joined: 26 Nov 2003
Posts: 101
Location: USA

Posted: Thu Jul 30, 2009 9:14 pm    Post subject: My recent results

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          537.24  :      13.78  :       4.52
STRING SORT         :          58.753  :      26.25  :       4.06
BITFIELD            :      1.7623e+08  :      30.23  :       6.31
FP EMULATION        :          54.418  :      26.11  :       6.03
FOURIER             :          7294.8  :       8.30  :       4.66
ASSIGNMENT          :          11.767  :      44.78  :      11.61
IDEA                :          2044.5  :      31.27  :       9.28
HUFFMAN             :           978.9  :      27.14  :       8.67
NEURAL NET          :          7.4568  :      11.98  :       5.04
LU DECOMPOSITION    :           396.2  :      20.53  :      14.82
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 27.142
FLOATING-POINT INDEX: 12.682
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.29-gentoo-r5
C compiler          : i686-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.679
INTEGER INDEX       : 6.844
FLOATING-POINT INDEX: 7.034
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


My CFLAGS:
Code:
CFLAGS="-O2 -march=prescott -mtune=core2 -fomit-frame-pointer -pipe"


My uname:
Code:
Linux atom 2.6.29-gentoo-r5 #3 SMP Wed Jul 29 22:40:06 PDT 2009 i686 Intel(R) Atom(TM) CPU 330 @ 1.60GHz GenuineIntel GNU/Linux


My portage:
Code:
Portage 2.1.6.13 (default/linux/x86/2008.0, gcc-4.3.2, glibc-2.9_p20081201-r2, 2.6.29-gentoo-r5 i686)
=================================================================
System uname: Linux-2.6.29-gentoo-r5-i686-Intel-R-_Atom-TM-_CPU_330_@_1.60GHz-with-glibc2.0
Timestamp of tree: Mon, 27 Jul 2009 10:45:02 +0000


My kit (from dmidecode):
Code:
Base Board Information
        Manufacturer: Intel Corporation
        Product Name: D945GCLF2
        Version: AAE46416-106


I have one stick of DDR2 800 but it only runs at 533 (despite the box saying it can do 667!). I'm actually pretty happy with this. For a hundred bucks, I have a completely usable system. Gotta love Gentoo....
djtreble
n00b


Joined: 09 Jan 2006
Posts: 39
Location: Brisbane, Australia

Posted: Sat Jan 16, 2010 11:41 am

Comparing -march=atom to -march=core2 (core2 run first):

Code:
CFLAGS="-O2 -march=core2 -pipe"


Code:
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          479.28  :      12.29  :       4.04
STRING SORT         :          56.235  :      25.13  :       3.89
BITFIELD            :      1.3752e+08  :      23.59  :       4.93
FP EMULATION        :          46.123  :      22.13  :       5.11
FOURIER             :          7237.1  :       8.23  :       4.62
ASSIGNMENT          :          11.877  :      45.19  :      11.72
IDEA                :          1840.9  :      28.16  :       8.36
HUFFMAN             :          849.82  :      23.57  :       7.53
NEURAL NET          :          6.9442  :      11.16  :       4.69
LU DECOMPOSITION    :          399.44  :      20.69  :      14.94
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 24.182
FLOATING-POINT INDEX: 12.385
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 1600MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.31-gentoo-r6
C compiler          : i686-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.079
INTEGER INDEX       : 6.001
FLOATING-POINT INDEX: 6.869
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


Code:
CFLAGS="-O2 -march=atom -pipe"


Code:
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          512.16  :      13.13  :       4.31
STRING SORT         :          56.093  :      25.06  :       3.88
BITFIELD            :      1.3813e+08  :      23.69  :       4.95
FP EMULATION        :          51.637  :      24.78  :       5.72
FOURIER             :          7118.5  :       8.10  :       4.55
ASSIGNMENT          :          12.773  :      48.60  :      12.61
IDEA                :          1531.4  :      23.42  :       6.95
HUFFMAN             :           868.2  :      24.08  :       7.69
NEURAL NET          :          7.0021  :      11.25  :       4.73
LU DECOMPOSITION    :          379.56  :      19.66  :      14.20
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 24.499
FLOATING-POINT INDEX: 12.143
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 1600MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.31-gentoo-r6
C compiler          : i686-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.232
INTEGER INDEX       : 6.026
FLOATING-POINT INDEX: 6.735
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


Code:
gcc version 4.5.0-alpha20091224 20091224 (experimental) (Gentoo 4.5.0_alpha20091224)


Shows nothing really :-( I ran nbench again and it gave differing results, so I don't really trust it!
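To see where the two runs actually differ, it helps to look at the per-test change rather than the summary indices. A quick Python sketch using the iterations/sec columns copied from the two tables above; the 5% noise threshold is an arbitrary assumption for illustration (nbench does not define one), so adjust it after repeating the runs a few times:

```python
# Iterations/sec copied from the two nbench runs above (Atom N270, gcc 4.5.0-alpha).
CORE2 = {
    "NUMERIC SORT": 479.28,
    "STRING SORT": 56.235,
    "BITFIELD": 1.3752e+08,
    "FP EMULATION": 46.123,
    "FOURIER": 7237.1,
    "ASSIGNMENT": 11.877,
    "IDEA": 1840.9,
    "HUFFMAN": 849.82,
    "NEURAL NET": 6.9442,
    "LU DECOMPOSITION": 399.44,
}
ATOM = {
    "NUMERIC SORT": 512.16,
    "STRING SORT": 56.093,
    "BITFIELD": 1.3813e+08,
    "FP EMULATION": 51.637,
    "FOURIER": 7118.5,
    "ASSIGNMENT": 12.773,
    "IDEA": 1531.4,
    "HUFFMAN": 868.2,
    "NEURAL NET": 7.0021,
    "LU DECOMPOSITION": 379.56,
}

NOISE_PCT = 5.0  # assumed run-to-run noise, not an nbench figure


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Positive means -march=atom was faster than -march=core2 on that test.
change = {test: percent_change(CORE2[test], ATOM[test]) for test in CORE2}

for test, pct in change.items():
    verdict = "within noise" if abs(pct) < NOISE_PCT else "worth a rerun"
    print(f"{test:<18}{pct:+7.1f}%  {verdict}")
```

With single runs like these, anything under the threshold should not be read as a win for either flag.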
b0nafide
Apprentice


Joined: 17 Feb 2008
Posts: 153
Location: ~/

Posted: Sat Jan 16, 2010 4:45 pm

Acer Aspire One D150...

Code:
gcc version 4.3.4 (Gentoo 4.3.4 p1.0, pie-10.1.5)
CFLAGS="-O2 -march=core2 -mtune=generic -fomit-frame-pointer -pipe"

# nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          525.72  :      13.48  :       4.43
STRING SORT         :          57.211  :      25.56  :       3.96
BITFIELD            :      1.7151e+08  :      29.42  :       6.15
FP EMULATION        :          56.795  :      27.25  :       6.29
FOURIER             :          7329.5  :       8.34  :       4.68
ASSIGNMENT          :          11.688  :      44.48  :      11.54
IDEA                :          2050.2  :      31.36  :       9.31
HUFFMAN             :          964.26  :      26.74  :       8.54
NEURAL NET          :          7.1714  :      11.52  :       4.85
LU DECOMPOSITION    :          405.76  :      21.02  :      15.18
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 26.942
FLOATING-POINT INDEX: 12.638
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 1600MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.31-gentoo-r6
C compiler          : i686-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.546
INTEGER INDEX       : 6.859
FLOATING-POINT INDEX: 7.009
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
djselbeck
n00b


Joined: 10 Oct 2005
Posts: 31
Location: Germany

Posted: Mon Jan 18, 2010 8:47 pm

On an HP Mini 5101:

Code:
CFLAGS="-O2 -march=core2 -mtune=generic -fomit-frame-pointer  -pipe"
gcc 4.3.4


Code:
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           553.8  :      14.20  :       4.66
STRING SORT         :           60.52  :      27.04  :       4.19
BITFIELD            :      1.7867e+08  :      30.65  :       6.40
FP EMULATION        :           59.08  :      28.35  :       6.54
FOURIER             :          7646.5  :       8.70  :       4.88
ASSIGNMENT          :          12.227  :      46.53  :      12.07
IDEA                :          2147.4  :      32.84  :       9.75
HUFFMAN             :          1035.4  :      28.71  :       9.17
NEURAL NET          :          7.5818  :      12.18  :       5.12
LU DECOMPOSITION    :          429.08  :      22.23  :      16.05
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 28.329
FLOATING-POINT INDEX: 13.303
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N280   @ 1.66GHz 1667MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.31-gentoo-r6
C compiler          : i686-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.864
INTEGER INDEX       : 7.227
FLOATING-POINT INDEX: 7.378
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
Nuteater
Apprentice


Joined: 25 Sep 2003
Posts: 193
Location: Jyväskylä, Finland

Posted: Tue Apr 13, 2010 7:07 pm

I recently upgraded my EEE 901 to a gcc 4.5 prerelease to try -march=atom (and because my system hasn’t been properly broken for a long time :wink:). Here are the results.

With gcc-4.4.1 and
Code:
CFLAGS="-march=prescott -O2 -fomit-frame-pointer -pipe"


Code:
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           527.2  :      13.52  :       4.44
STRING SORT         :          57.857  :      25.85  :       4.00
BITFIELD            :      2.0284e+08  :      34.79  :       7.27
FP EMULATION        :          56.235  :      26.98  :       6.23
FOURIER             :          7325.3  :       8.33  :       4.68
ASSIGNMENT          :          11.777  :      44.81  :      11.62
IDEA                :          1991.2  :      30.46  :       9.04
HUFFMAN             :          869.22  :      24.10  :       7.70
NEURAL NET          :          6.5974  :      10.60  :       4.46
LU DECOMPOSITION    :          310.24  :      16.07  :      11.61
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 27.122
FLOATING-POINT INDEX: 11.237
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 1600MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.32.8
C compiler          : i686-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.966
INTEGER INDEX       : 6.623
FLOATING-POINT INDEX: 6.232
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


With gcc-4.5.0-alpha20100408 and
Code:
CFLAGS="-march=atom -O2 -mssse3 -mfpmath=sse -fexcess-precision=fast -fomit-frame-pointer -pipe"


Code:
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          523.92  :      13.44  :       4.41
STRING SORT         :          59.896  :      26.76  :       4.14
BITFIELD            :      1.4147e+08  :      24.27  :       5.07
FP EMULATION        :          54.872  :      26.33  :       6.08
FOURIER             :          7708.9  :       8.77  :       4.92
ASSIGNMENT          :          13.934  :      53.02  :      13.75
IDEA                :          1939.2  :      29.66  :       8.81
HUFFMAN             :          1017.2  :      28.21  :       9.01
NEURAL NET          :          9.6915  :      15.57  :       6.55
LU DECOMPOSITION    :          451.44  :      23.39  :      16.89
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 26.900
FLOATING-POINT INDEX: 14.724
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 800MHz
L2 Cache            : 512 KB
OS                  : Linux 2.6.32.8
C compiler          : i686-pc-linux-gnu-gcc
libc                :
MEMORY INDEX        : 6.610
INTEGER INDEX       : 6.791
FLOATING-POINT INDEX: 8.166
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


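Of course, an artificial benchmark like this doesn’t tell you much, but floating-point performance seems to have improved significantly: the Linux floating-point index went from 6.232 to 8.166, roughly 31%, while the integer index barely moved (6.623 to 6.791). That gain may be just down to the other changes, though, such as -mfpmath=sse or the newer gcc itself.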
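For what it's worth, the floating-point index looks like the geometric mean of the three FP tests (FOURIER, NEURAL NET, LU DECOMPOSITION). That is a reading of the numbers rather than something stated in the output, but recomputing it from the "New Index" columns above reproduces the reported values to within rounding. A small Python sketch that does this and shows which tests moved:

```python
from math import prod

# "New Index" (AMD K6/233 baseline) values for the floating-point tests,
# copied from the two nbench runs above.
PRESCOTT = {"FOURIER": 4.68, "NEURAL NET": 4.46, "LU DECOMPOSITION": 11.61}
ATOM = {"FOURIER": 4.92, "NEURAL NET": 6.55, "LU DECOMPOSITION": 16.89}


def geometric_mean(values):
    """n-th root of the product of n values."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


fp_prescott = geometric_mean(PRESCOTT.values())
fp_atom = geometric_mean(ATOM.values())

# Per-test speedup of the gcc 4.5 / -march=atom run over the gcc 4.4.1 run.
ratios = {test: ATOM[test] / PRESCOTT[test] for test in PRESCOTT}

print(f"FP index, prescott run: {fp_prescott:.3f} (reported 6.232)")
print(f"FP index, atom run:     {fp_atom:.3f} (reported 8.166)")
for test, ratio in ratios.items():
    print(f"{test:<18}x{ratio:.2f}")
```

On these numbers FOURIER barely changes, while NEURAL NET and LU DECOMPOSITION account for nearly all of the gain.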
_________________
I am Nuteater, hear me roar.