Gentoo Forums :: View topic - CFlags for Intel Atom?

CFlags for Intel Atom?

rufnut
Apprentice

Joined: 16 May 2005
Posts: 247

Posted: Thu May 28, 2009 8:26 am Post subject:

gringo wrote:

I'm not sure I get what you mean; that bug explicitly states that distcc will be disabled if -march=native is used, and that's how it should work IMO.

My sentiments are pretty much the same as this guy here in his last paragraph:

https://bugs.launchpad.net/distcc/+bug/188813

When you look at the bug report:

https://bugs.gentoo.org/223159

Looks like they tried to implement the patch and due to problems it may have been pulled out in later versions of distcc, or maybe it was just modified to fail if "march=native" is detected.

gringo
Advocate

Joined: 27 Apr 2003
Posts: 3793

Posted: Thu May 28, 2009 8:49 am Post subject:

Quote:

My sentiments are pretty much the same as this guy here in his last paragraph:

I don't know exactly what you mean; the guy in that bug explains the problem pretty well, and a workaround is available.
If you want to disable distcc for a few packages "manually", there are a few bash hacks available.

Quote:

Looks like they tried to implement the patch and due to problems it may have been pulled out in later versions of distcc, or maybe it was just modified to fail if "march=native" is detected.

I don't know if something has changed in the latest version of distcc. I use distcc quite a lot, and the last time I tried -march=native with distcc (which was with the first distcc-3.x release), all jobs were processed locally, which is how it should work IMO.

It's quite easy to test whether this is still the case, right?

cheers
_________________
Error: Failing not supported by current locale

rufnut
Apprentice

Joined: 16 May 2005
Posts: 247

Posted: Fri May 29, 2009 5:05 am Post subject:

From :

https://bugs.launchpad.net/distcc/+bug/188813

Quote:

or
(preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes

I reckon this should be the way it is done.

It creates a bit of work for distcc: if the "arch" is unknown to, say, stable gcc and/or distcc (-march=atom),
then maybe it could drop to prescott, or whatever the consensus is, until gcc 4.5.x is stable.

The reason for the above statement is that I am not really keen on upgrading all nodes to gcc 4.5.x.

There is nothing stopping me from manually setting e.g. -march=prescott for a machine, but I would have preferred some automation, which I guess is the reason -march=native was introduced.

gringo
Advocate

Joined: 27 Apr 2003
Posts: 3793

Posted: Fri May 29, 2009 8:25 am Post subject:

Quote:

or (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes

Do you really want an app like distcc to rewrite your -march setting? Why don't you set the correct one in the first place?
And how is that supposed to work if you are cross-compiling, for example?
That doesn't make any sense to me, and in any case I don't think rewriting compiler parameters is distcc's job.

That said, I would like to have a better workaround too and just set -march=native everywhere, but it really isn't that easy.

cheers


Rony
n00b

Joined: 12 Oct 2003
Posts: 20
Location: Hong Kong, China

Posted: Fri May 29, 2009 10:16 am Post subject:

GCC-optimized: adding the suggested GCC compiler flags for Intel® Atom™

Code:

-Wall -O1 -msse3 -march=core2 -mfpmath=sse -pedantic -pipe -fstrength-reduce -fexpensive-optimizations -finline-functions -funroll-loops -foptimize-register-move

I am testing with the above on Intel's D945GCLF2D (Atom 330).

Regards.

gringo
Advocate

Joined: 27 Apr 2003
Posts: 3793

Posted: Fri May 29, 2009 11:21 am Post subject:

If those numbers are correct, that isn't bad at all, I would say; I was expecting a much bigger difference between icc and gcc. It would be great to see the same benchmark with the new atom target included.

Some time ago I found a discussion about the best gcc options when building for an Atom, and Arjan van de Ven (Intel kernel hacker) suggested -march=core2 -mtune=generic. Note that this was before the atom target was even in development, IIRC.

http://lkml.indiana.edu/hypermail/linux/kernel/0810.1/2015.html

cheers guys

rufnut
Apprentice

Joined: 16 May 2005
Posts: 247

Posted: Fri May 29, 2009 4:01 pm Post subject:

gringo wrote:

Quote:

or (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes

Do you really want an app like distcc to rewrite your -march setting? Why don't you set the correct one in the first place?

He does say "read -march=arch-of-the-build-machine" not rewrite ?

gringo
Advocate

Joined: 27 Apr 2003
Posts: 3793

Posted: Fri May 29, 2009 4:08 pm Post subject:

Quote:

He does say "read -march=arch-of-the-build-machine" not rewrite ?

No, he says "rewrite them to read".
English is not my main language, but I read that as "rewriting".

And this is starting to get a bit pointless and completely OT.

cheers

rufnut
Apprentice

Joined: 16 May 2005
Posts: 247

Posted: Sat May 30, 2009 12:26 am Post subject:

gringo wrote:

Quote:

He does say "read -march=arch-of-the-build-machine" not rewrite ?

No, he says "rewrite them to read".
English is not my main language, but I read that as "rewriting".

He is talking about rewriting distcc.

Mr_Maniac
Guru

Joined: 10 Jun 2004
Posts: 543

Posted: Mon Jun 01, 2009 1:19 pm Post subject:

Rony wrote:

GCC-optimized: adding the suggested GCC compiler flags for Intel® Atom™

Code:

-Wall -O1 -msse3 -march=core2 -mfpmath=sse -pedantic -pipe -fstrength-reduce -fexpensive-optimizations -finline-functions -funroll-loops -foptimize-register-move

I am testing with the above on Intel's D945GCLF2D (Atom 330).

Regards.

I have an Intel D945GCLF2, too. The system is compiled with

Code:

CFLAGS="-march=nocona -O2 -pipe"

GCC: gcc (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) 4.3.3
GLIBC: glibc-2.10.1-r0
Kernel: 2.6.29-r5 - CONFIG_MCORE2=y
64bit-System

With the CFLAGS you mentioned I get the following results:

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 193.76 : 4.97 : 1.63
STRING SORT : 44.542 : 19.90 : 3.08
BITFIELD : 5.6065e+07 : 9.62 : 2.01
FP EMULATION : 17.793 : 8.54 : 1.97
FOURIER : 6628 : 7.54 : 4.23
ASSIGNMENT : 2.757 : 10.49 : 2.72
IDEA : 739.7 : 11.31 : 3.36
HUFFMAN : 354.47 : 9.83 : 3.14
NEURAL NET : 1.9554 : 3.14 : 1.32
LU DECOMPOSITION : 62.884 : 3.26 : 2.35
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 9.923
FLOATING-POINT INDEX: 4.257
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 2.563
INTEGER INDEX : 2.413
FLOATING-POINT INDEX: 2.361
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

With my standard-CFLAGS
CFLAGS="-march=nocona -O2 -pipe"
I have

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 497.52 : 12.76 : 4.19
STRING SORT : 62.335 : 27.85 : 4.31
BITFIELD : 2.0232e+08 : 34.71 : 7.25
FP EMULATION : 52.817 : 25.34 : 5.85
FOURIER : 6763.3 : 7.69 : 4.32
ASSIGNMENT : 9.4219 : 35.85 : 9.30
IDEA : 2106.5 : 32.22 : 9.57
HUFFMAN : 913.79 : 25.34 : 8.09
NEURAL NET : 8.498 : 13.65 : 5.74
LU DECOMPOSITION : 311.92 : 16.16 : 11.67
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.488
FLOATING-POINT INDEX: 11.927
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.624
INTEGER INDEX : 6.599
FLOATING-POINT INDEX: 6.615
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

With CFLAGS="-march=core2 -O2 -pipe":

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 510.72 : 13.10 : 4.30
STRING SORT : 61.406 : 27.44 : 4.25
BITFIELD : 2.3122e+08 : 39.66 : 8.28
FP EMULATION : 54.4 : 26.10 : 6.02
FOURIER : 6757.9 : 7.69 : 4.32
ASSIGNMENT : 8.7198 : 33.18 : 8.61
IDEA : 2157.4 : 33.00 : 9.80
HUFFMAN : 908.02 : 25.18 : 8.04
NEURAL NET : 10.043 : 16.13 : 6.79
LU DECOMPOSITION : 408.24 : 21.15 : 15.27
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.924
FLOATING-POINT INDEX: 13.790
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.715
INTEGER INDEX : 6.721
FLOATING-POINT INDEX: 7.648
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

And the best results so far
CFLAGS="-march=native -O2 -pipe"

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 508.56 : 13.04 : 4.28
STRING SORT : 60.862 : 27.20 : 4.21
BITFIELD : 2.314e+08 : 39.69 : 8.29
FP EMULATION : 54.498 : 26.15 : 6.03
FOURIER : 6778.9 : 7.71 : 4.33
ASSIGNMENT : 9.5709 : 36.42 : 9.45
IDEA : 2164.4 : 33.10 : 9.83
HUFFMAN : 911.25 : 25.27 : 8.07
NEURAL NET : 10.083 : 16.20 : 6.81
LU DECOMPOSITION : 412.64 : 21.38 : 15.44
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 27.270
FLOATING-POINT INDEX: 13.872
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.908
INTEGER INDEX : 6.729
FLOATING-POINT INDEX: 7.694
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

Can someone post results with gcc-4.5 and "-march=atom"? My system is in use (router/server), so I don't want to make big changes...
_________________
AMD Ryzen 5900X
64 GB DDR4 RAM
GeForce RTX 3080
Gentoo Linux (most recent stable kernel - amd64)
Windows 11 x64

Bircoph
Developer

Joined: 27 Jun 2008
Posts: 261
Location: Moscow

Posted: Wed Jul 08, 2009 2:40 am Post subject:

I use the following for my Atom N270 (on Asus Eee PC 1000H):

Code:

CFLAGS="-march=core2 -m32 --param l1-cache-line-size=64
--param l1-cache-size=32 --param l2-cache-size=512
-O2 -funswitch-loops -fpredictive-commoning
-fgcse-after-reload -ftree-vectorize -fomit-frame-pointer
-mfpmath=sse -pipe"

Some explanation of why exactly these flags are used. (I use gcc-4.3.3-r2 ATM: the latest unmasked gcc for Gentoo.)

1) Why not "-march=native"?
That's obvious: a) current gcc doesn't know about Atom and will fail to detect it properly; b) it makes distcc unusable.

2) Why "-march=core2 -m32"?
Just look at this CPU's instruction set: it matches core2 except for the x86_64 instructions (-m32 is also required for distcc cross-compilation on amd64):

Code:

% x86info -f
x86info v1.24. Dave Jones 2001-2009
Feedback to <davej@redhat.com>.

Found 2 CPUs
--------------------------------------------------------------------------
CPU #1
EFamily: 0 EModel: 1 Family: 6 Model: 28 Stepping: 2
CPU Model: Unknown model.
Processor name string: Intel(R) Atom(TM) CPU N270 @ 1.60GHz
Type: 0 (Original OEM) Brand: 0 (Unsupported)
Number of cores per physical package=1
Number of logical processors per socket=2
Number of logical processors per core=2
APIC ID: 0x0 Package: 0 Core: 0 SMT ID 0
Feature flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe
Extended feature flags:
sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]

--------------------------------------------------------------------------
CPU #2
EFamily: 0 EModel: 1 Family: 6 Model: 28 Stepping: 2
CPU Model: Unknown model.
Processor name string: Intel(R) Atom(TM) CPU N270 @ 1.60GHz
Type: 0 (Original OEM) Brand: 0 (Unsupported)
Number of cores per physical package=1
Number of logical processors per socket=2
Number of logical processors per core=2
APIC ID: 0x1 Package: 0 Core: 0 SMT ID 1
Feature flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe
Extended feature flags:
sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]

--------------------------------------------------------------------------

3) Why "--param l1-cache-line-size=64 --param l1-cache-size=32 --param l2-cache-size=512"?
Because the N270 isn't a core2: it has smaller L1/L2 caches, so code generated for core2 will not be as efficient on Atom because of improper cache use: data/code blocks may be too long, etc.
Specifying the CPU cache is also always important for distcc: the compiler on the other host doesn't know what CPU you actually use.

4) Why "-O2 -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize"?
This is actually -O3 -fno-inline-functions. The Atom CPU provides relatively small L1/L2 caches, so extra inlining would decrease its efficiency dramatically; the CPU cache should be used for better purposes.

5) Why "-fomit-frame-pointer"?
Because it frees up an extra register, which is extremely important because on x86 you have only 4 free-to-use general-purpose registers. (JFYI: access to a register is 3 times faster than even access to L1 cache.) If you really want to debug something, you'll need to recompile it with -g/-g3 anyway.
Isn't it enabled by default? No, it isn't, because it interferes with debugging, read gcc manual.

6) Why -mfpmath=sse?
The SSE unit is significantly more efficient than the i387 unit used by default on x86, mostly due to enhanced instructions. The only problem is that the i387 unit allows 80-bit floats, while SSE allows a maximum width of 64 bits. In theory this may be a problem for applications that rely on 80-bit floats without specifying this explicitly to gcc. In practice I have used tons of scientific software (such as root, maxima, R, octave, ...) compiled with -mfpmath=sse (in make.conf CFLAGS) for years without any problems.
Ideally one would use -mfpmath=sse,387, as it actually doubles the number of available registers (the i387 and SSE units are implemented separately by Intel), but the gcc register allocator can't model simultaneous use of separate units, so from the performance POV it is quite risky to use -mfpmath=sse,387 everywhere; you would have to write the appropriate assembly by hand.

7) Why "-pipe"?
This speeds up compilation by using pipes instead of temporary files. It doesn't affect the generated code itself.
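
The claim in point 4 can be checked with a bit of set arithmetic. This is only a sketch in Python: the list of flags that -O3 adds on top of -O2 is taken from the GCC 4.3 manual as I remember it and is an assumption for any other gcc release; the names `O3_EXTRAS_GCC43`, `POST_EXTRAS` and `missing_from_o3` are made up for illustration.

```python
# Flags that -O3 adds on top of -O2 according to the GCC 4.3 manual.
# Assumption: this list is specific to the 4.3 series and differs elsewhere.
O3_EXTRAS_GCC43 = {
    "-finline-functions",
    "-funswitch-loops",
    "-fpredictive-commoning",
    "-fgcse-after-reload",
    "-ftree-vectorize",
}

# Optimization flags used next to -O2 in the CFLAGS of this post.
POST_EXTRAS = {
    "-funswitch-loops",
    "-fpredictive-commoning",
    "-fgcse-after-reload",
    "-ftree-vectorize",
}


def missing_from_o3(chosen, o3_extras=O3_EXTRAS_GCC43):
    """Return the -O3 additions that are absent from the chosen flags, sorted."""
    return sorted(o3_extras - set(chosen))


print(missing_from_o3(POST_EXTRAS))  # ['-finline-functions']
```

The only flag left over is -finline-functions, which matches the "-O3 -fno-inline-functions" description.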
_________________
Per aspera ad astra!

Mr_Maniac
Guru

Joined: 10 Jun 2004
Posts: 543

Posted: Wed Jul 08, 2009 5:43 am Post subject:

Code:

~ # CFLAGS="-march=core2 --param l1-cache-line-size=64 --param l1-cache-size=32 --param l2-cache-size=512 -O2 -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fomit-frame-pointer -mfpmath=sse -pipe" emerge nbench

~ # nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 516.24 : 13.24 : 4.35
STRING SORT : 62.19 : 27.79 : 4.30
BITFIELD : 2.3393e+08 : 40.13 : 8.38
FP EMULATION : 54.894 : 26.34 : 6.08
FOURIER : 6778.9 : 7.71 : 4.33
ASSIGNMENT : 9.7533 : 37.11 : 9.63
IDEA : 2172.2 : 33.22 : 9.86
HUFFMAN : 921.62 : 25.56 : 8.16
NEURAL NET : 9.8775 : 15.87 : 6.67
LU DECOMPOSITION : 418.4 : 21.68 : 15.65
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 27.617
FLOATING-POINT INDEX: 13.841
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 7.027
INTEGER INDEX : 6.791
FLOATING-POINT INDEX: 7.676
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

Okay... It really is a bit faster, but really only a bit
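
To put a number on "a bit", here is a rough Python sketch that compares the three Linux indices of this run with the "-march=native -O2 -pipe" run posted earlier. The helper name `percent_change` is made up for illustration, and both sets of numbers come from single runs without error estimates.

```python
def percent_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


# MEMORY / INTEGER / FLOATING-POINT indices (Linux baseline) from the two runs.
native = {"memory": 6.908, "integer": 6.729, "float": 7.694}
tuned = {"memory": 7.027, "integer": 6.791, "float": 7.676}

for name in native:
    print(f"{name:8s} {percent_change(tuned[name], native[name]):+.2f}%")
```

That works out to roughly +1.7% (memory), +0.9% (integer) and -0.2% (floating point).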


s4e8
Guru

Joined: 29 Jul 2006
Posts: 311

Posted: Thu Jul 09, 2009 1:38 am Post subject:

gcc 4.5.0 snapshot 20090702. -march=atom -O3 -mfpmath=sse -fomit-frame-pointer, ATOM N270
nbench score: 539.71 59.888 2.2706e8 87.08 7800.8 13.017 2276.4 979.22, NEURAL NET crashed
compare to Bircoph's CFLAGS, it win: 0.4% 1.4% 28.8% 65.4% 1.7% 8.4% 20% 4.2%

Bircoph
Developer

Joined: 27 Jun 2008
Posts: 261
Location: Moscow

Posted: Wed Jul 15, 2009 7:56 am Post subject:

s4e8 wrote:

gcc 4.5.0 snapshot 20090702. -march=atom -O3 -mfpmath=sse -fomit-frame-pointer, ATOM N270
nbench score: 539.71 59.888 2.2706e8 87.08 7800.8 13.017 2276.4 979.22, NEURAL NET crashed
compare to Bircoph's CFLAGS, it win: 0.4% 1.4% 28.8% 65.4% 1.7% 8.4% 20% 4.2%

This result is very interesting. Could you please post

Code:

gcc -Q --help=target -march=atom

?

And be aware of two important aspects:

1) All measurement data should be provided with errors (either absolute with a confidence probability, or in terms of standard deviation); otherwise your benefits may be just a game of statistics, nothing more. Of course, you should run the tests several times to be able to calculate errors. For this reason I can't claim that my options are better than Mr_Maniac's: the statistical error is higher than the test delta in my case.

2) nbench is a very, eh, specific benchmark: it covers only some aspects of real-world tasks, so you should be critical of its results. A small example.
I have two boxes:
a) Athlon-XP 3200+ (2205 MHZ), 64KB L1 512KB L2, 32bit.
b) Celeron D (2533 MHz), 16KB L1, 256KB L2, 64bit.

Here are nbench results (memory/integer/floating indices) with errors in standard deviations:
a) 12.187 ± 0.021; 14.068 ± 0.014; 23.135 ± 0.025
b) 10.36 ± 0.18; 8.84 ± 0.05; 13.75 ± 0.04

As you can see, with nbench host (b) is significantly worse than host (a), beyond any errors.
But wait! Try to generate a 16 Kbit RSA key on both hosts. Host (b) turns out to be ~8 times faster: thanks to 64-bit mode and 3x more general-purpose registers, it excels at long arithmetic, particularly anything related to asymmetric cryptography.

Thus be very careful when estimating performance only via tests: it takes really hard work to be able to say that (a) is better than (b), because performance varies greatly depending on the task in question.
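
For point 1, a minimal Python sketch of the bookkeeping might look like this. It assumes the quoted errors can be treated as independent uncertainties on each mean; the function names `summarize` and `separation_in_sigmas` and the list of repeated runs are made up for illustration, not real measurements.

```python
import math
import statistics


def summarize(runs):
    """Return (mean, sample standard deviation) for a list of benchmark scores."""
    return statistics.mean(runs), statistics.stdev(runs)


def separation_in_sigmas(mean_a, err_a, mean_b, err_b):
    """Difference of two results divided by their combined error."""
    return abs(mean_a - mean_b) / math.hypot(err_a, err_b)


# Memory index of hosts (a) and (b) as quoted above.
print(round(separation_in_sigmas(12.187, 0.021, 10.36, 0.18), 1))

# Hypothetical repeated runs of one index, made up for illustration only.
mean, err = summarize([6.90, 6.95, 6.88, 6.93, 6.91])
print(f"{mean:.3f} +/- {err:.3f}")
```

For the memory index of hosts (a) and (b) the separation is about ten combined errors, which is what "beyond any errors" means here. A delta of one or two errors would not justify any conclusion.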

s4e8
Guru

Joined: 29 Jul 2006
Posts: 311

Posted: Wed Jul 15, 2009 8:22 am Post subject:

Here are the results.

Code:

bin # ./gcc -Q --help=target -march=atom
The following options are target specific:
-m128bit-long-double [disabled]
-m32 [enabled]
-m3dnow [disabled]
-m3dnowa [disabled]
-m64 [disabled]
-m80387 [enabled]
-m96bit-long-double [enabled]
-mabi=
-mabm [disabled]
-maccumulate-outgoing-args [disabled]
-maes [disabled]
-malign-double [disabled]
-malign-functions=
-malign-jumps=
-malign-loops=
-malign-stringops [enabled]
-march= atom
-masm=
-mavx [disabled]
-mbranch-cost=
-mcld [disabled]
-mcmodel=
-mcrc32 [disabled]
-mcx16 [disabled]
-mfancy-math-387 [enabled]
-mfma [disabled]
-mforce-drap [disabled]
-mfp-ret-in-387 [enabled]
-mfpmath=
-mfused-madd [enabled]
-mglibc [enabled]
-mhard-float [enabled]
-mieee-fp [enabled]
-mincoming-stack-boundary=
-minline-all-stringops [disabled]
-minline-stringops-dynamically [disabled]
-mintel-syntax [disabled]
-mlarge-data-threshold=
-mmmx [disabled]
-mmovbe [disabled]
-mms-bitfields [disabled]
-mno-align-stringops [disabled]
-mno-fancy-math-387 [disabled]
-mno-fused-madd [disabled]
-mno-push-args [disabled]
-mno-red-zone [disabled]
-mno-sse4 [enabled]
-momit-leaf-frame-pointer [disabled]
-mpc
-mpclmul [disabled]
-mpopcnt [disabled]
-mpreferred-stack-boundary=
-mpush-args [enabled]
-mrecip [disabled]
-mred-zone [enabled]
-mregparm=
-mrtd [disabled]
-msahf [disabled]
-msoft-float [disabled]
-msse [disabled]
-msse2 [disabled]
-msse2avx [disabled]
-msse3 [disabled]
-msse4 [disabled]
-msse4.1 [disabled]
-msse4.2 [disabled]
-msse4a [disabled]
-msse5 [disabled]
-msseregparm [disabled]
-mssse3 [disabled]
-mstack-arg-probe [disabled]
-mstackrealign [enabled]
-mstringop-strategy=
-mtls-dialect=
-mtls-direct-seg-refs [enabled]
-mtune=
-muclibc [disabled]
-mveclibabi=

Bircoph
Developer

Joined: 27 Jun 2008
Posts: 261
Location: Moscow

Posted: Wed Jul 15, 2009 8:46 am Post subject:

This is odd, I can't see any significant difference.
I wonder what they've done...
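
Comparing such listings by eye is error-prone, so two saved outputs could be diffed mechanically. A minimal Python sketch, assuming the "-mfoo [enabled]" / "-mfoo [disabled]" line format shown above; the function names and the two sample strings are made up for illustration and are not real compiler output.

```python
import re

# Matches lines such as "-mssse3 [disabled]" from `gcc -Q --help=target`.
_OPTION_RE = re.compile(r"^\s*(-m\S+)\s+\[(enabled|disabled)\]\s*$")


def parse_target_options(text):
    """Map each on/off target option to True (enabled) or False (disabled)."""
    options = {}
    for line in text.splitlines():
        match = _OPTION_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2) == "enabled"
    return options


def changed_options(text_a, text_b):
    """Return {option: (state_a, state_b)} for options whose state differs."""
    a, b = parse_target_options(text_a), parse_target_options(text_b)
    common = sorted(a.keys() & b.keys())
    return {opt: (a[opt], b[opt]) for opt in common if a[opt] != b[opt]}


# Tiny made-up excerpts; real input would be two saved outputs of
# `gcc -Q --help=target -march=...`.
sample_a = "-mssse3 [disabled]\n-m32 [enabled]\n-march= core2\n"
sample_b = "-mssse3 [enabled]\n-m32 [enabled]\n-march= atom\n"
print(changed_options(sample_a, sample_b))  # {'-mssse3': (False, True)}
```

Value options such as "-march= atom" carry no [enabled]/[disabled] marker and are skipped on purpose. An empty result only means the on/off target switches agree, not that the generated code is the same.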

s4e8
Guru

Joined: 29 Jul 2006
Posts: 311

Posted: Wed Jul 15, 2009 9:33 am Post subject:

Bircoph wrote:

This is odd, I can't see any significant difference.
I wonder what they've done...

There's a new file, atom.md, that defines some Atom-specific behavior.

Code:

......
;; Atom is an in-order core with two integer pipelines.

(define_attr "atom_unit" "sishuf,simul,jeu,complex,other"
(const_string "other"))

(define_attr "atom_sse_attr" "rcp,movdup,lfence,fence,prefetch,sqrt,mxcsr,other"
(const_string "other"))

(define_automaton "atom")

;; Atom has two ports: port 0 and port 1 connecting to all execution units
(define_cpu_unit "atom-port-0,atom-port-1" "atom")

;; EU: Execution Unit
;; Atom EUs are connected by port 0 or port 1.
......

hielvc
Advocate

Joined: 19 Apr 2002
Posts: 2805
Location: Oceanside, Ca

Posted: Wed Jul 15, 2009 7:39 pm Post subject:

s4e8, I ran your code on my AMD Athlon(tm) X2 Dual Core Processor BE-2300. No matter what I put in for "target", I got the same output:

Quote:

gcc -Q --help=target -march=k8 |awk '/enabled/ {print $1}'
-m64
-m80387
-m96bit-long-double
-malign-stringops
-mfancy-math-387
-mfp-ret-in-387
-mfused-madd
-mglibc
-mhard-float
-mieee-fp
-mno-sse4
-mpush-args
-mred-zone
-mtls-direct-seg-refs

Using this code

Code:

echo 'int main(){return 0;}' > test.c && gcc -v -Q -march=native -O2 test.c -o test && rm test.c test
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.3.3-r2/work/gcc-4.3.3/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.3 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --disable-nls --with-system-zlib --disable-checking --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-libgomp --enable-cld --disable-libgcj --enable-languages=c,c++,treelang --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.3-r2 p1.2, pie-10.1.5'
Thread model: posix
gcc version 4.3.3 (Gentoo 4.3.3-r2 p1.2, pie-10.1.5)
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '-o' 'test'
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/cc1 -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8 -dumpbase test.c -auxbase test -O2 -version -o /tmp/ccoIvqwu.s
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include-fixed
/usr/include
End of search list.
GNU C (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) version 4.3.3 (x86_64-pc-linux-gnu)
compiled by GNU C version 4.3.3, GMP version 4.3.1, MPFR version 2.4.1-p5.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
options passed: -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16
-msahf --param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8
-O2
options enabled: -falign-labels -falign-loops -fargument-alias
-fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg
-fcaller-saves -fcommon -fcprop-registers -fcrossjumping
-fcse-follow-jumps -fdefer-pop -fdelete-null-pointer-checks
-fearly-inlining -feliminate-unused-debug-types -fexpensive-optimizations
-fforward-propagate -ffunction-cse -fgcse -fgcse-lm
-fguess-branch-probability -fident -fif-conversion -fif-conversion2
-finline-functions-called-once -finline-small-functions -fipa-pure-const
-fipa-reference -fivopts -fkeep-static-consts -fleading-underscore
-fmath-errno -fmerge-constants -fmerge-debug-strings
-fmove-loop-invariants -fomit-frame-pointer -foptimize-register-move
-foptimize-sibling-calls -fpeephole -fpeephole2 -freg-struct-return
-fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop
-fsched-interblock -fsched-spec -fsched-stalled-insns-dep
-fschedule-insns2 -fsigned-zeros -fsplit-ivs-in-unroller
-fsplit-wide-types -fstrict-aliasing -fstrict-overflow -fthread-jumps
-ftoplevel-reorder -ftrapping-math -ftree-ccp -ftree-ch -ftree-copy-prop
-ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
-ftree-dse -ftree-fre -ftree-loop-im -ftree-loop-ivcanon
-ftree-loop-optimize -ftree-parallelize-loops= -ftree-pre -ftree-reassoc
-ftree-salias -ftree-scev-cprop -ftree-sink -ftree-sra -ftree-store-ccp
-ftree-ter -ftree-vect-loop-version -ftree-vrp -funit-at-a-time
-funwind-tables -fvar-tracking -fvect-cost-model -fzero-initialized-in-bss
-m128bit-long-double -m3dnow -m64 -m80387 -maccumulate-outgoing-args
-malign-stringops -mcx16 -mfancy-math-387 -mfp-ret-in-387 -mfused-madd
-mglibc -mieee-fp -mmmx -mno-sse4 -mpush-args -mred-zone -msahf -msse
-msse2 -msse3 -mtls-direct-seg-refs
Compiler executable checksum: f6e169a902c79329927a6921bcb422f4
main
Analyzing compilation unit
Performing interprocedural optimizations
<visibility> <early_local_cleanups> <inline> <static-var> <pure-const>Assembling functions:
main
Execution times (seconds)
parser : 0.01 (100%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 76 kB ( 7%) ggc
global alloc : 0.00 ( 0%) usr 0.01 (100%) sys 0.01 (33%) wall 0 kB ( 0%) ggc
TOTAL : 0.01 0.01 0.03 1118 kB
Internal checks disabled; compiler is not suited for release.
Configure with --enable-checking=release to enable checks.
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '-o' 'test'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/bin/as -V -Qy -o /tmp/ccW4B8bR.o /tmp/ccoIvqwu.s
GNU assembler version 2.19.1 (x86_64-pc-linux-gnu) using BFD version (GNU Binutils) 2.19.1
COMPILER_PATH=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/bin/
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '-o' 'test'
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtbegin.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../.. /tmp/ccW4B8bR.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtend.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crtn.o

As you can see, it's an x86_64-pc-linux-gnu system running gcc-4.3.3 with -march=native, which resolves to k8-sse3. As you can see, 3dnow and company, plus a bunch more, are actually enabled. I like my output ;)
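
If you want the resolved flags without reading the whole dump, the cc1 line can be picked apart with a few lines of Python. This is only a sketch: the function name `resolved_target_flags` is made up, and it simply keeps every token that starts with -m plus every --param pair.

```python
import shlex


def resolved_target_flags(cc1_line):
    """Extract -march/-mtune/other -m* options and --param pairs from a cc1 command line."""
    tokens = shlex.split(cc1_line)
    flags = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "--param" and i + 1 < len(tokens):
            # --param takes its value as the next token.
            flags.append(f"--param {tokens[i + 1]}")
            i += 2
            continue
        if tok.startswith("-m"):
            flags.append(tok)
        i += 1
    return flags


# The cc1 invocation from the -v output above, shortened to the relevant part.
line = ("cc1 -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16 -msahf "
        "--param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8 -O2")
print(" ".join(resolved_target_flags(line)))
```

The explicit list it prints could then go into CFLAGS in place of -march=native, so that every distcc host compiles for the same target.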

_________________
An A-Z Index of the Linux BASH command line

s4e8
Guru

Joined: 29 Jul 2006
Posts: 311

Posted: Thu Jul 16, 2009 1:39 am Post subject:

hielvc wrote:

s4e8, I ran your code on my AMD Athlon(tm) X2 Dual Core Processor BE-2300. No matter what I put in for "target", I got the same output:

OK, here's the -Q -v output:

Code:

GNU C (GCC) version 4.5.0 20090702 (experimental) (i686-pc-linux-gnu)
compiled by GNU C version 4.5.0 20090702 (experimental), GMP version 4.2.4, MPFR version 2.4.1-p1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
options passed: -v a.c -march=atom -mfpmath=sse -O3 -fomit-frame-pointer
options enabled: -falign-labels -falign-loops -fargument-alias
-fauto-inc-dec -fbranch-count-reg -fcaller-saves -fcommon
-fcprop-registers -fcrossjumping -fcse-follow-jumps -fdefer-pop
-fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
-feliminate-unused-debug-types -fexpensive-optimizations
-fforward-propagate -ffunction-cse -fgcse -fgcse-after-reload -fgcse-lm
-fguess-branch-probability -fident -fif-conversion -fif-conversion2
-findirect-inlining -finline -finline-functions
-finline-functions-called-once -finline-small-functions -fipa-cp
-fipa-cp-clone -fipa-pure-const -fipa-reference -fira-share-save-slots
-fira-share-spill-slots -fivopts -fkeep-static-consts -fleading-underscore
-fmath-errno -fmerge-constants -fmerge-debug-strings
-fmove-loop-invariants -fomit-frame-pointer -foptimize-register-move
-foptimize-sibling-calls -fpcc-struct-return -fpeephole -fpeephole2
-fpredictive-commoning -fregmove -freorder-blocks -freorder-functions
-frerun-cse-after-loop -fsched-interblock -fsched-spec
-fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fsigned-zeros
-fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-aliasing
-fstrict-overflow -fthread-jumps -ftoplevel-reorder -ftrapping-math
-ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop
-ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
-ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-im -ftree-loop-ivcanon
-ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre
-ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink
-ftree-slp-vectorize -ftree-sra -ftree-switch-conversion -ftree-ter
-ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time
-funswitch-loops -fvar-tracking -fvect-cost-model
-fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double
-maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
-mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mmovbe -mno-red-zone
-mno-sse4 -mpush-args -msahf -msse -msse2 -msse3 -mssse3
-mtls-direct-seg-refs
Compiler executable checksum: f142bf44665c008856fda3c64386a6ca
main
Analyzing compilation unit
Performing interprocedural optimizations
<visibility> <early_local_cleanups> <summary generate> <cp> <inline> <static-var> <pure-const>Assembling functions:
main
Execution times (seconds)
callgraph construction: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 (11%) wall 0 kB ( 0%) ggc
parser : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.08 (30%) wall 192 kB (23%) ggc
tree gimplify : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 7%) wall 0 kB ( 0%) ggc
tree CFG cleanup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
tree SSA other : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
tree CCP : 0.00 ( 0%) usr 0.01 (100%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
expand : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 (11%) wall 3 kB ( 0%) ggc
combiner : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 7%) wall 0 kB ( 0%) ggc
scheduling 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 (11%) wall 0 kB ( 0%) ggc
machine dep reorg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
TOTAL : 0.01 0.01 0.27 847 kB
Extra diagnostic checks enabled; compiler may run slowly
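
Picking the instruction-set switches out of a dump like the one above by eye is tedious. Here is a minimal Python sketch that filters the MMX/SSE-family flags. The `SAMPLE` text is just the tail of the "options enabled" list pasted from above, and `simd_flags` is a made-up helper name, not part of any GCC tooling.

```python
import re

# Tail of the "options enabled" list, pasted from the -Q -v output above.
SAMPLE = """\
-fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double
-maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
-mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mmovbe -mno-red-zone
-mno-sse4 -mpush-args -msse -msse2 -msse3 -mssse3
-mtls-direct-seg-refs
"""

def simd_flags(text):
    """Return the enabled MMX/SSE-family -m flags, in order of appearance."""
    # Every whitespace-separated token that starts with "-m".
    flags = re.findall(r"(?<!\S)-m\S+", text)
    # Keep -mmmx, -msse, -msse2, ..., -mssse3; negated -mno-* flags do not match.
    return [f for f in flags if re.fullmatch(r"-m(mmx|s?sse\d*(\.\d)?)", f)]

if __name__ == "__main__":
    print(" ".join(simd_flags(SAMPLE)))
```

For this dump it keeps -mmmx, -msse, -msse2, -msse3 and -mssse3, and skips -mno-sse4 because that one is a negated flag.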

BillyBoy
Tux's lil' helper

Joined: 26 Nov 2003
Posts: 101
Location: USA

Posted: Thu Jul 30, 2009 9:14 pm Post subject: My recent results

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 537.24 : 13.78 : 4.52
STRING SORT : 58.753 : 26.25 : 4.06
BITFIELD : 1.7623e+08 : 30.23 : 6.31
FP EMULATION : 54.418 : 26.11 : 6.03
FOURIER : 7294.8 : 8.30 : 4.66
ASSIGNMENT : 11.767 : 44.78 : 11.61
IDEA : 2044.5 : 31.27 : 9.28
HUFFMAN : 978.9 : 27.14 : 8.67
NEURAL NET : 7.4568 : 11.98 : 5.04
LU DECOMPOSITION : 396.2 : 20.53 : 14.82
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 27.142
FLOATING-POINT INDEX: 12.682
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.679
INTEGER INDEX : 6.844
FLOATING-POINT INDEX: 7.034
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

My CFLAGS:

Code:

CFLAGS="-O2 -march=prescott -mtune=core2 -fomit-frame-pointer -pipe"

My uname:

Code:

Linux atom 2.6.29-gentoo-r5 #3 SMP Wed Jul 29 22:40:06 PDT 2009 i686 Intel(R) Atom(TM) CPU 330 @ 1.60GHz GenuineIntel GNU/Linux

My portage:

Code:

Portage 2.1.6.13 (default/linux/x86/2008.0, gcc-4.3.2, glibc-2.9_p20081201-r2, 2.6.29-gentoo-r5 i686)
=================================================================
System uname: Linux-2.6.29-gentoo-r5-i686-Intel-R-_Atom-TM-_CPU_330_@_1.60GHz-with-glibc2.0
Timestamp of tree: Mon, 27 Jul 2009 10:45:02 +0000

My kit (from dmidecode):

Code:

Base Board Information
Manufacturer: Intel Corporation
Product Name: D945GCLF2
Version: AAE46416-106

I have one stick of DDR2 800 but it only runs at 533 (despite the box saying it can do 667!). I'm actually pretty happy with this. For a hundred bucks, I have a completely usable system. Gotta love Gentoo....

djtreble
n00b

Joined: 09 Jan 2006
Posts: 39
Location: Brisbane, Australia

Posted: Sat Jan 16, 2010 11:41 am Post subject:

Comparing march=atom to march=core2

Code:

CFLAGS="-O2 -march=core2 -pipe"

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 479.28 : 12.29 : 4.04
STRING SORT : 56.235 : 25.13 : 3.89
BITFIELD : 1.3752e+08 : 23.59 : 4.93
FP EMULATION : 46.123 : 22.13 : 5.11
FOURIER : 7237.1 : 8.23 : 4.62
ASSIGNMENT : 11.877 : 45.19 : 11.72
IDEA : 1840.9 : 28.16 : 8.36
HUFFMAN : 849.82 : 23.57 : 7.53
NEURAL NET : 6.9442 : 11.16 : 4.69
LU DECOMPOSITION : 399.44 : 20.69 : 14.94
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 24.182
FLOATING-POINT INDEX: 12.385
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 1600MHz
L2 Cache : 512 KB
OS : Linux 2.6.31-gentoo-r6
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.079
INTEGER INDEX : 6.001
FLOATING-POINT INDEX: 6.869
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

Code:

CFLAGS="-O2 -march=atom -pipe"

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 512.16 : 13.13 : 4.31
STRING SORT : 56.093 : 25.06 : 3.88
BITFIELD : 1.3813e+08 : 23.69 : 4.95
FP EMULATION : 51.637 : 24.78 : 5.72
FOURIER : 7118.5 : 8.10 : 4.55
ASSIGNMENT : 12.773 : 48.60 : 12.61
IDEA : 1531.4 : 23.42 : 6.95
HUFFMAN : 868.2 : 24.08 : 7.69
NEURAL NET : 7.0021 : 11.25 : 4.73
LU DECOMPOSITION : 379.56 : 19.66 : 14.20
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 24.499
FLOATING-POINT INDEX: 12.143
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 1600MHz
L2 Cache : 512 KB
OS : Linux 2.6.31-gentoo-r6
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.232
INTEGER INDEX : 6.026
FLOATING-POINT INDEX: 6.735
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

Code:

gcc version 4.5.0-alpha20091224 20091224 (experimental) (Gentoo 4.5.0_alpha20091224)

Shows nothing really :-(

I ran nbench again and it gave differing results, so I don't really trust it!
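
For anyone who wants the per-test deltas between those two runs rather than eyeballing the tables, here is a rough Python sketch. The numbers are the Iterations/sec columns copied by hand from the core2 and atom tables above; the names `CORE2`, `ATOM` and `percent_change` are only for illustration. Since repeated nbench runs differ, small deltas are probably just noise.

```python
# Iterations/sec copied by hand from the two nbench tables above.
CORE2 = {
    "NUMERIC SORT": 479.28, "STRING SORT": 56.235, "BITFIELD": 1.3752e+08,
    "FP EMULATION": 46.123, "FOURIER": 7237.1, "ASSIGNMENT": 11.877,
    "IDEA": 1840.9, "HUFFMAN": 849.82, "NEURAL NET": 6.9442,
    "LU DECOMPOSITION": 399.44,
}
ATOM = {
    "NUMERIC SORT": 512.16, "STRING SORT": 56.093, "BITFIELD": 1.3813e+08,
    "FP EMULATION": 51.637, "FOURIER": 7118.5, "ASSIGNMENT": 12.773,
    "IDEA": 1531.4, "HUFFMAN": 868.2, "NEURAL NET": 7.0021,
    "LU DECOMPOSITION": 379.56,
}

def percent_change(before, after):
    """Relative change of each test, in percent, going from before to after."""
    return {name: (after[name] - before[name]) / before[name] * 100.0
            for name in before}

changes = percent_change(CORE2, ATOM)

if __name__ == "__main__":
    # Print the tests sorted from biggest loss to biggest gain.
    for name, pct in sorted(changes.items(), key=lambda kv: kv[1]):
        print(f"{name:<17} {pct:+6.1f}%")
```

A single pair of runs cannot separate a real effect from run-to-run variation, so it would take several runs per flag set before reading much into any one line.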

b0nafide
Apprentice

Joined: 17 Feb 2008
Posts: 171
Location: ~/

Posted: Sat Jan 16, 2010 4:45 pm Post subject:

Acer Aspire One D150...

Code:

gcc version 4.3.4 (Gentoo 4.3.4 p1.0, pie-10.1.5)
CFLAGS="-O2 -march=core2 -mtune=generic -fomit-frame-pointer -pipe"

# nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 525.72 : 13.48 : 4.43
STRING SORT : 57.211 : 25.56 : 3.96
BITFIELD : 1.7151e+08 : 29.42 : 6.15
FP EMULATION : 56.795 : 27.25 : 6.29
FOURIER : 7329.5 : 8.34 : 4.68
ASSIGNMENT : 11.688 : 44.48 : 11.54
IDEA : 2050.2 : 31.36 : 9.31
HUFFMAN : 964.26 : 26.74 : 8.54
NEURAL NET : 7.1714 : 11.52 : 4.85
LU DECOMPOSITION : 405.76 : 21.02 : 15.18
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.942
FLOATING-POINT INDEX: 12.638
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 1600MHz
L2 Cache : 512 KB
OS : Linux 2.6.31-gentoo-r6
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.546
INTEGER INDEX : 6.859
FLOATING-POINT INDEX: 7.009
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

djselbeck
n00b

Joined: 10 Oct 2005
Posts: 32
Location: Germany

Posted: Mon Jan 18, 2010 8:47 pm Post subject:

on HP Mini 5101:

Code:

CFLAGS="-O2 -march=core2 -mtune=generic -fomit-frame-pointer -pipe"
gcc 4.3.4

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 553.8 : 14.20 : 4.66
STRING SORT : 60.52 : 27.04 : 4.19
BITFIELD : 1.7867e+08 : 30.65 : 6.40
FP EMULATION : 59.08 : 28.35 : 6.54
FOURIER : 7646.5 : 8.70 : 4.88
ASSIGNMENT : 12.227 : 46.53 : 12.07
IDEA : 2147.4 : 32.84 : 9.75
HUFFMAN : 1035.4 : 28.71 : 9.17
NEURAL NET : 7.5818 : 12.18 : 5.12
LU DECOMPOSITION : 429.08 : 22.23 : 16.05
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 28.329
FLOATING-POINT INDEX: 13.303
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N280 @ 1.66GHz 1667MHz
L2 Cache : 512 KB
OS : Linux 2.6.31-gentoo-r6
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.864
INTEGER INDEX : 7.227
FLOATING-POINT INDEX: 7.378
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

Nuteater
Apprentice

Joined: 25 Sep 2003
Posts: 193
Location: Jyväskylä, Finland

Posted: Tue Apr 13, 2010 7:07 pm Post subject:

I recently upgraded my EEE 901 to a GCC 4.5 prerelease to try -march=atom (and because my system hasn’t been properly broken for a long time :wink: ). Here are the results.

With gcc-4.4.1 and

Code:

CFLAGS="-march=prescott -O2 -fomit-frame-pointer -pipe"

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 527.2 : 13.52 : 4.44
STRING SORT : 57.857 : 25.85 : 4.00
BITFIELD : 2.0284e+08 : 34.79 : 7.27
FP EMULATION : 56.235 : 26.98 : 6.23
FOURIER : 7325.3 : 8.33 : 4.68
ASSIGNMENT : 11.777 : 44.81 : 11.62
IDEA : 1991.2 : 30.46 : 9.04
HUFFMAN : 869.22 : 24.10 : 7.70
NEURAL NET : 6.5974 : 10.60 : 4.46
LU DECOMPOSITION : 310.24 : 16.07 : 11.61
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 27.122
FLOATING-POINT INDEX: 11.237
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 1600MHz
L2 Cache : 512 KB
OS : Linux 2.6.32.8
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.966
INTEGER INDEX : 6.623
FLOATING-POINT INDEX: 6.232
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

With gcc-4.5.0-alpha20100408 and

Code:

CFLAGS="-march=atom -O2 -mssse3 -mfpmath=sse -fexcess-precision=fast -fomit-frame-pointer -pipe"

Code:

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 523.92 : 13.44 : 4.41
STRING SORT : 59.896 : 26.76 : 4.14
BITFIELD : 1.4147e+08 : 24.27 : 5.07
FP EMULATION : 54.872 : 26.33 : 6.08
FOURIER : 7708.9 : 8.77 : 4.92
ASSIGNMENT : 13.934 : 53.02 : 13.75
IDEA : 1939.2 : 29.66 : 8.81
HUFFMAN : 1017.2 : 28.21 : 9.01
NEURAL NET : 9.6915 : 15.57 : 6.55
LU DECOMPOSITION : 451.44 : 23.39 : 16.89
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.900
FLOATING-POINT INDEX: 14.724
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 800MHz
L2 Cache : 512 KB
OS : Linux 2.6.32.8
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.610
INTEGER INDEX : 6.791
FLOATING-POINT INDEX: 8.166
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

Of course, an artificial benchmark such as this doesn’t tell much, but floating-point performance seems to have improved significantly (FLOATING-POINT INDEX 6.232 with gcc-4.4.1, 8.166 with the 4.5 prerelease). This may simply be due to the other optimizations, such as -mfpmath=sse, rather than -march=atom itself.
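
To see where the floating-point gain comes from, here is a small Python sketch using the "Old Index" column of the three floating-point tests from the two tables above. The reported FLOATING-POINT INDEX values (11.237 and 14.724) are consistent with a geometric mean of those three tests, so the sketch assumes that. The dict and function names are made up for illustration.

```python
import math

# "Old Index" (Pentium 90) values of the three floating-point tests,
# copied by hand from the two tables above.
PRESCOTT_GCC44 = {"FOURIER": 8.33, "NEURAL NET": 10.60, "LU DECOMPOSITION": 16.07}
ATOM_GCC45 = {"FOURIER": 8.77, "NEURAL NET": 15.57, "LU DECOMPOSITION": 23.39}

def geometric_mean(values):
    """Geometric mean of a non-empty collection of positive numbers."""
    values = list(values)
    return math.prod(values) ** (1.0 / len(values))

fp_old = geometric_mean(PRESCOTT_GCC44.values())
fp_new = geometric_mean(ATOM_GCC45.values())

# Per-test ratio of the new run to the old run.
ratios = {name: ATOM_GCC45[name] / PRESCOTT_GCC44[name] for name in PRESCOTT_GCC44}

if __name__ == "__main__":
    print(f"FP index, gcc-4.4.1 run: {fp_old:.3f}")
    print(f"FP index, gcc-4.5 run:   {fp_new:.3f}")
    for name, ratio in ratios.items():
        print(f"{name:<17} x{ratio:.2f}")
```

On these numbers, NEURAL NET and LU DECOMPOSITION account for nearly all of the change, while FOURIER barely moves.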
_________________
I am Nuteater, hear me roar.
