Author |
Message |
rufnut Apprentice
Joined: 16 May 2005 Posts: 247
|
Posted: Thu May 28, 2009 8:26 am Post subject: |
|
|
gringo wrote: |
i'm not sure i get what you mean, that bug explicitly states that distcc will be disabled if -march=native is used, and that's how it should work IMO.
|
My sentiments are pretty much the same as this guy here in his last paragraph:
https://bugs.launchpad.net/distcc/+bug/188813
When you look at the bug report:
https://bugs.gentoo.org/223159
Looks like they tried to implement the patch and due to problems it may have been pulled out in later versions of distcc, or maybe it was just modified to fail if "march=native" is detected. |
|
|
|
gringo Advocate
Joined: 27 Apr 2003 Posts: 3793
|
Posted: Thu May 28, 2009 8:49 am Post subject: |
|
|
Quote: | My sentiments are pretty much the same as this guy here in his last paragraph: |
don't know exactly what you mean, the guy in that bug explains the problem pretty well and a workaround is available.
If you want to disable distcc for a few packages "manually", there are a few bash hacks available.
Quote: | Looks like they tried to implement the patch and due to problems it may have been pulled out in later versions of distcc, or maybe it was just modified to fail if "march=native" is detected. |
don't know if something has changed in the latest version of distcc. i use distcc quite a lot, and last time i tried -march=native with distcc (which was with the first distcc-3.x release) all jobs were processed locally, which is how it should work IMO.
it's quite easy to test if this is still the case, right?
cheers _________________ Error: Failing not supported by current locale |
|
|
|
rufnut Apprentice
Joined: 16 May 2005 Posts: 247
|
Posted: Fri May 29, 2009 5:05 am Post subject: |
|
|
From :
https://bugs.launchpad.net/distcc/+bug/188813
Quote: | or
(preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes |
I reckon this should be the way it is done.
It creates a bit of work for distcc: if the "arch" is unknown to, say, stable gcc and/or distcc (e.g. -march=atom), then maybe it could drop to prescott or whatever the consensus is until gcc 4.5.x is stable.
The reason for the above is that I am not really keen on upgrading all nodes to gcc 4.5.x.
There is nothing stopping me from manually setting e.g. -march=prescott for a machine, but I would have preferred some automation, which I guess is the reason -march=native was introduced.
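The fallback idea above could be sketched roughly like this. It is only an illustration, not anything distcc does: `choose_march`, `MIN_GCC` and `FALLBACKS` are made-up names, and the version table is an assumption (the thread only states that -march=atom needs gcc 4.5).

```python
# Hypothetical helper: pick a -march value that every build node's gcc knows.
# The tables below are illustrative assumptions, not distcc behaviour.

# Minimum gcc (major, minor) assumed to understand each -march name.
MIN_GCC = {
    "atom": (4, 5),      # per this thread, -march=atom arrives with gcc 4.5
    "core2": (4, 3),     # assumed
    "prescott": (3, 4),  # assumed
}

# Fallback order to try when the preferred name is too new for some node.
FALLBACKS = {"atom": ["core2", "prescott"]}


def choose_march(preferred, node_gcc_versions):
    """Return the first candidate supported by the oldest gcc among the nodes."""
    oldest = min(node_gcc_versions)
    for candidate in [preferred] + FALLBACKS.get(preferred, []):
        if oldest >= MIN_GCC[candidate]:
            return candidate
    raise ValueError("no common -march found for gcc %s" % (oldest,))


if __name__ == "__main__":
    # One node is still on gcc 4.3, so "atom" cannot be used everywhere.
    print(choose_march("atom", [(4, 5), (4, 3)]))
```

The point of taking the oldest compiler in the pool is that a distributed job may land on any node, so the chosen name has to be valid on all of them.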
|
|
|
|
gringo Advocate
Joined: 27 Apr 2003 Posts: 3793
|
Posted: Fri May 29, 2009 8:25 am Post subject: |
|
|
Quote: | or (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes |
do you really want an app like distcc to rewrite your -march setting? Why don't you set the correct one in the first place?
And how is that supposed to work if you are cross-compiling, for example?
That doesn't make any sense to me, and in any case i don't think rewriting compiler parameters is distcc's job.
That said, i would like to have a better workaround too and just set -march=native everywhere, but it really isn't that easy.
cheers _________________ Error: Failing not supported by current locale |
|
|
|
Rony n00b
Joined: 12 Oct 2003 Posts: 20 Location: Hong Kong, China
|
Posted: Fri May 29, 2009 10:16 am Post subject: |
|
|
GCC-optimized: adding the suggested GCC compiler flags for Intel® Atom™
Code: | -Wall -O1 -msse3 -march=core2 -mfpmath=sse -pedantic -pipe -fstrength-reduce -fexpensive-optimizations -finline-functions -funroll-loops -foptimize-register-move |
I am testing with the above on Intel's D945GCLF2D (Atom 330).
Regards. |
|
|
|
gringo Advocate
Joined: 27 Apr 2003 Posts: 3793
|
Posted: Fri May 29, 2009 11:21 am Post subject: |
|
|
if those numbers are correct, that isn't that bad i would say, i was expecting a much bigger difference between icc and gcc.
It would be great to see the same benchmark with the new atom target included.
Some time ago i found a discussion about the best gcc options when building for an Atom, and Arjan van de Ven (Intel kernel hacker) suggested -march=core2 -mtune=generic. Note that this was before the atom target was even in development, IIRC.
http://lkml.indiana.edu/hypermail/linux/kernel/0810.1/2015.html
cheers guys _________________ Error: Failing not supported by current locale |
|
|
|
rufnut Apprentice
Joined: 16 May 2005 Posts: 247
|
Posted: Fri May 29, 2009 4:01 pm Post subject: |
|
|
gringo wrote: | Quote: | or (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
target architecture is the same on all build nodes |
do you really want an app like distcc to rewrite your -march setting ? Why don´t you set the correct one in first place ?
|
He does say "read -march=arch-of-the-build-machine" not rewrite ?
|
|
|
|
gringo Advocate
Joined: 27 Apr 2003 Posts: 3793
|
Posted: Fri May 29, 2009 4:08 pm Post subject: |
|
|
Quote: | He does say "read -march=arch-of-the-build-machine" not rewrite ? |
no, he says "rewrite them to read".
English is not my main language but i get that as "rewriting".
and this is starting to get a bit pointless and completely OT.
cheers _________________ Error: Failing not supported by current locale |
|
|
|
rufnut Apprentice
Joined: 16 May 2005 Posts: 247
|
Posted: Sat May 30, 2009 12:26 am Post subject: |
|
|
gringo wrote: | Quote: | He does say "read -march=arch-of-the-build-machine" not rewrite ? |
no, he says "rewrite them to read".
English is not my main language but i get that as "rewriting".
|
He is talking about rewriting distcc.
|
|
|
|
Mr_Maniac Guru
Joined: 10 Jun 2004 Posts: 543
|
Posted: Mon Jun 01, 2009 1:19 pm Post subject: |
|
|
I have an Intel D945GCLF2, too. System compiled with
Code: | CFLAGS="-march=nocona -O2 -pipe" |
GCC: gcc (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) 4.3.3
GLIBC: glibc-2.10.1-r0
Kernel: 2.6.29-r5 - CONFIG_MCORE2=y
64bit-System
With the CFLAGS you mentioned I get the following results:
Code: |
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 193.76 : 4.97 : 1.63
STRING SORT : 44.542 : 19.90 : 3.08
BITFIELD : 5.6065e+07 : 9.62 : 2.01
FP EMULATION : 17.793 : 8.54 : 1.97
FOURIER : 6628 : 7.54 : 4.23
ASSIGNMENT : 2.757 : 10.49 : 2.72
IDEA : 739.7 : 11.31 : 3.36
HUFFMAN : 354.47 : 9.83 : 3.14
NEURAL NET : 1.9554 : 3.14 : 1.32
LU DECOMPOSITION : 62.884 : 3.26 : 2.35
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 9.923
FLOATING-POINT INDEX: 4.257
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 2.563
INTEGER INDEX : 2.413
FLOATING-POINT INDEX: 2.361
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
|
With my standard-CFLAGS
CFLAGS="-march=nocona -O2 -pipe"
I have
Code: |
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 497.52 : 12.76 : 4.19
STRING SORT : 62.335 : 27.85 : 4.31
BITFIELD : 2.0232e+08 : 34.71 : 7.25
FP EMULATION : 52.817 : 25.34 : 5.85
FOURIER : 6763.3 : 7.69 : 4.32
ASSIGNMENT : 9.4219 : 35.85 : 9.30
IDEA : 2106.5 : 32.22 : 9.57
HUFFMAN : 913.79 : 25.34 : 8.09
NEURAL NET : 8.498 : 13.65 : 5.74
LU DECOMPOSITION : 311.92 : 16.16 : 11.67
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.488
FLOATING-POINT INDEX: 11.927
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.624
INTEGER INDEX : 6.599
FLOATING-POINT INDEX: 6.615
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
|
CFLAGS="-march=core2 -O2 -pipe"
Code: |
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 510.72 : 13.10 : 4.30
STRING SORT : 61.406 : 27.44 : 4.25
BITFIELD : 2.3122e+08 : 39.66 : 8.28
FP EMULATION : 54.4 : 26.10 : 6.02
FOURIER : 6757.9 : 7.69 : 4.32
ASSIGNMENT : 8.7198 : 33.18 : 8.61
IDEA : 2157.4 : 33.00 : 9.80
HUFFMAN : 908.02 : 25.18 : 8.04
NEURAL NET : 10.043 : 16.13 : 6.79
LU DECOMPOSITION : 408.24 : 21.15 : 15.27
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.924
FLOATING-POINT INDEX: 13.790
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.715
INTEGER INDEX : 6.721
FLOATING-POINT INDEX: 7.648
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
|
And the best results so far
CFLAGS="-march=native -O2 -pipe"
Code: |
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 508.56 : 13.04 : 4.28
STRING SORT : 60.862 : 27.20 : 4.21
BITFIELD : 2.314e+08 : 39.69 : 8.29
FP EMULATION : 54.498 : 26.15 : 6.03
FOURIER : 6778.9 : 7.71 : 4.33
ASSIGNMENT : 9.5709 : 36.42 : 9.45
IDEA : 2164.4 : 33.10 : 9.83
HUFFMAN : 911.25 : 25.27 : 8.07
NEURAL NET : 10.083 : 16.20 : 6.81
LU DECOMPOSITION : 412.64 : 21.38 : 15.44
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 27.270
FLOATING-POINT INDEX: 13.872
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.908
INTEGER INDEX : 6.729
FLOATING-POINT INDEX: 7.694
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
|
Can someone post results with gcc-4.5 and "-march=atom"? My system is in use (router/server), so I don't want to make big changes... _________________ AMD Ryzen 5900X
64 GB DDR4 RAM
GeForce RTX 3080
Gentoo Linux (most recent stable kernel - amd64)
Windows 11 x64 |
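To put the four runs above side by side, the three Linux index values can be turned into percent changes against the -march=nocona build. This is just a small sketch; the numbers are copied from the nbench output above, while `percent_change` and `compare` are names invented for this example. Keep in mind these are single runs, so small differences may be noise.

```python
# (memory, integer, floating-point) index from the nbench runs above
# (Atom 330, gcc 4.3.3), keyed by the CFLAGS used.
RESULTS = {
    "suggested Atom flags (-O1 ...)": (2.563, 2.413, 2.361),
    "-march=nocona -O2 -pipe": (6.624, 6.599, 6.615),
    "-march=core2 -O2 -pipe": (6.715, 6.721, 7.648),
    "-march=native -O2 -pipe": (6.908, 6.729, 7.694),
}
BASELINE = "-march=nocona -O2 -pipe"


def percent_change(value, reference):
    """Relative change of value against reference, in percent."""
    return (value / reference - 1.0) * 100.0


def compare(results, baseline):
    """Return {flags: (memory %, integer %, float %)} relative to baseline."""
    ref = results[baseline]
    return {
        flags: tuple(percent_change(v, r) for v, r in zip(values, ref))
        for flags, values in results.items()
    }


if __name__ == "__main__":
    for flags, deltas in compare(RESULTS, BASELINE).items():
        print("%-32s mem %+6.1f%%  int %+6.1f%%  fp %+6.1f%%" % ((flags,) + deltas))
```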
|
|
|
Bircoph Developer
Joined: 27 Jun 2008 Posts: 261 Location: Moscow
|
Posted: Wed Jul 08, 2009 2:40 am Post subject: |
|
|
I use the following for my Atom N270 (on Asus Eee PC 1000H):
Code: |
CFLAGS="-march=core2 -m32 --param l1-cache-line-size=64
--param l1-cache-size=32 --param l2-cache-size=512
-O2 -funswitch-loops -fpredictive-commoning
-fgcse-after-reload -ftree-vectorize -fomit-frame-pointer
-mfpmath=sse -pipe"
|
Some explanation of why exactly these flags are used. (I use gcc-4.3.3-r2 ATM: the latest unmasked gcc for Gentoo.)
1) Why not "-march=native"?
That's obvious: a) current gcc doesn't know about Atom and won't detect it optimally; b) it makes distcc unusable.
2) Why "-march=core2 -m32"?
Just look at this CPU's instruction set: it matches core2 except for the x86_64 instructions (-m32 is also required for distcc cross-compilation on amd64 hosts):
Code: |
% x86info -f
x86info v1.24. Dave Jones 2001-2009
Feedback to <davej@redhat.com>.
Found 2 CPUs
--------------------------------------------------------------------------
CPU #1
EFamily: 0 EModel: 1 Family: 6 Model: 28 Stepping: 2
CPU Model: Unknown model.
Processor name string: Intel(R) Atom(TM) CPU N270 @ 1.60GHz
Type: 0 (Original OEM) Brand: 0 (Unsupported)
Number of cores per physical package=1
Number of logical processors per socket=2
Number of logical processors per core=2
APIC ID: 0x0 Package: 0 Core: 0 SMT ID 0
Feature flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe
Extended feature flags:
sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]
--------------------------------------------------------------------------
CPU #2
EFamily: 0 EModel: 1 Family: 6 Model: 28 Stepping: 2
CPU Model: Unknown model.
Processor name string: Intel(R) Atom(TM) CPU N270 @ 1.60GHz
Type: 0 (Original OEM) Brand: 0 (Unsupported)
Number of cores per physical package=1
Number of logical processors per socket=2
Number of logical processors per core=2
APIC ID: 0x1 Package: 0 Core: 0 SMT ID 1
Feature flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe
Extended feature flags:
sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]
--------------------------------------------------------------------------
|
3) Why "--param l1-cache-line-size=64 --param l1-cache-size=32 --param l2-cache-size=512"?
Because the N270 isn't a core2: it has smaller L1/L2 caches, so code generated for core2 will be less efficient on Atom due to improper cache use: data/code blocks may be too long, etc.
Specifying the CPU cache sizes is also important for distcc: the compiler on the other host doesn't know which CPU you actually use.
4) Why "-O2 -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize"?
This is actually -O3 -fno-inline-functions (with gcc 4.3). The Atom provides relatively small L1/L2 caches, so extra inlining would reduce their efficiency dramatically; the CPU cache should be used for better purposes.
5) Why "-fomit-frame-pointer"?
Because it frees up an extra register, which is extremely important because on x86 you have only 4 free-to-use general-purpose registers. (JFYI: register access is 3 times faster than even L1 cache access.) If you really want to debug something, you'll need to recompile it with -g/-g3 anyway.
Isn't it enabled by default? No, it isn't, because it interferes with debugging; read the gcc manual.
6) Why -mfpmath=sse?
The SSE unit is significantly more efficient than the i387 unit used by default on x86, mostly due to its enhanced instructions. The only problem is that the i387 unit allows 80-bit floats, while SSE allows a maximum width of 64 bits. In theory this may be a problem for applications that rely on 80-bit floats without telling gcc so explicitly. In practice I have used tons of scientific software (such as root, maxima, R, octave, ...) compiled with -mfpmath=sse (in make.conf CFLAGS) for years without any problems.
Ideally one would use -mfpmath=sse,387, as it effectively doubles the number of available registers (the i387 and SSE units are implemented separately by Intel), but gcc's register allocator can't model the use of both units at once, so from a performance point of view it is quite risky to use -mfpmath=sse,387 everywhere; you would have to write the appropriate assembly by hand.
7) Why "-pipe"?
This speeds up compilation by using pipes instead of temporary files. It doesn't affect the generated code itself. _________________ Per aspera ad astra! |
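The claim in point 2 can be checked mechanically against the x86info output. A rough sketch, assuming the SIMD extensions the gcc manual lists for core2 (MMX, SSE, SSE2, SSE3, SSSE3); `missing_features` is a name made up for this example, and the check says nothing about scheduling or 64-bit support.

```python
# SIMD extensions assumed to be implied by -march=core2 (per the gcc manual).
CORE2_REQUIRED = {"mmx", "sse", "sse2", "sse3", "ssse3"}

# The two flag lines copied from the x86info output for the Atom N270 above.
X86INFO_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe
sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]
"""


def missing_features(flag_text, required):
    """Return the required features that do not appear in flag_text, sorted."""
    present = {token.lower() for token in flag_text.split()}
    return sorted(required - present)


if __name__ == "__main__":
    missing = missing_features(X86INFO_FLAGS, CORE2_REQUIRED)
    print("missing:", missing or "none")
```

An empty result only means the CPU will not hit an illegal instruction from these extensions; whether core2 tuning suits an in-order Atom is a separate question.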
|
|
|
Mr_Maniac Guru
Joined: 10 Jun 2004 Posts: 543
|
Posted: Wed Jul 08, 2009 5:43 am Post subject: |
|
|
Code: |
~ # CFLAGS="-march=core2 --param l1-cache-line-size=64 --param l1-cache-size=32 --param l2-cache-size=512 -O2 -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fomit-frame-pointer -mfpmath=sse -pipe" emerge nbench
~ # nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 516.24 : 13.24 : 4.35
STRING SORT : 62.19 : 27.79 : 4.30
BITFIELD : 2.3393e+08 : 40.13 : 8.38
FP EMULATION : 54.894 : 26.34 : 6.08
FOURIER : 6778.9 : 7.71 : 4.33
ASSIGNMENT : 9.7533 : 37.11 : 9.63
IDEA : 2172.2 : 33.22 : 9.86
HUFFMAN : 921.62 : 25.56 : 8.16
NEURAL NET : 9.8775 : 15.87 : 6.67
LU DECOMPOSITION : 418.4 : 21.68 : 15.65
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 27.617
FLOATING-POINT INDEX: 13.841
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 7.027
INTEGER INDEX : 6.791
FLOATING-POINT INDEX: 7.676
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
|
Okay... It really is a bit faster, but really only a bit _________________ AMD Ryzen 5900X
64 GB DDR4 RAM
GeForce RTX 3080
Gentoo Linux (most recent stable kernel - amd64)
Windows 11 x64 |
|
|
|
s4e8 Guru
Joined: 29 Jul 2006 Posts: 311
|
Posted: Thu Jul 09, 2009 1:38 am Post subject: |
|
|
gcc 4.5.0 snapshot 20090702. -march=atom -O3 -mfpmath=sse -fomit-frame-pointer, ATOM N270
nbench score: 539.71 59.888 2.2706e8 87.08 7800.8 13.017 2276.4 979.22, NEURAL NET crashed
Compared to Bircoph's CFLAGS, it wins by: 0.4% 1.4% 28.8% 65.4% 1.7% 8.4% 20% 4.2% |
|
|
|
Bircoph Developer
Joined: 27 Jun 2008 Posts: 261 Location: Moscow
|
Posted: Wed Jul 15, 2009 7:56 am Post subject: |
|
|
s4e8 wrote: | gcc 4.5.0 snapshot 20090702. -march=atom -O3 -mfpmath=sse -fomit-frame-pointer, ATOM N270
nbench score: 539.71 59.888 2.2706e8 87.08 7800.8 13.017 2276.4 979.22, NEURAL NET crashed
compare to Bircoph's CFLAGS, it win: 0.4% 1.4% 28.8% 65.4% 1.7% 8.4% 20% 4.2% |
This result is very interesting. Could you please post
Code: |
gcc -Q --help=target -march=atom
|
?
And be aware of two important aspects:
1) All measurement data should come with errors (either absolute errors with a confidence level, or errors in terms of standard deviation); otherwise your gains may be just a statistical fluke, nothing more. Of course, you have to run the tests several times to be able to calculate errors. For this reason I can't claim that my options are better than Mr_Maniac's: the statistical error is larger than the difference between the tests in my case.
2) nbench is a very, eh, specific benchmark: it covers only some aspects of real-world tasks, so you should be critical of its results. A small example:
I have two boxes:
a) Athlon-XP 3200+ (2205 MHZ), 64KB L1 512KB L2, 32bit.
b) Celeron D (2533 MHz), 16KB L1, 256KB L2, 64bit.
Here are the nbench results (memory/integer/floating-point indices) with errors given as standard deviations:
a) 12.187 ± 0.021; 14.068 ± 0.014; 23.135 ± 0.025
b) 10.36 ± 0.18; 8.84 ± 0.05; 13.75 ± 0.04
As you can see, with nbench host (b) is significantly worse than host (a), well beyond any error.
But wait! Try generating a 16 Kbit RSA key on both hosts. Host (b) turns out to be ~8x faster: thanks to 64-bit mode and 3x more general-purpose registers it shines in long-arithmetic tasks, particularly anything related to asymmetric cryptography.
So be very careful when estimating performance from benchmarks alone: it takes real work to say that (a) is better than (b), because performance varies greatly depending on the task in question. _________________ Per aspera ad astra! |
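The first point (report errors, then compare) could look something like the sketch below. It is only one simple way to do it: `summarize` and `clearly_different` are names invented here, the threshold k=3 is an arbitrary choice, and the list of runs in the demo is made up, not measured.

```python
import math
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for repeated benchmark runs."""
    return statistics.mean(samples), statistics.stdev(samples)


def clearly_different(mean_a, sd_a, mean_b, sd_b, k=3.0):
    """True if the means differ by more than k combined standard deviations.

    k=3 is an arbitrary, fairly strict threshold chosen for this sketch.
    """
    combined = math.sqrt(sd_a ** 2 + sd_b ** 2)
    return abs(mean_a - mean_b) > k * combined


if __name__ == "__main__":
    # Memory index of hosts (a) and (b) as quoted above.
    print(clearly_different(12.187, 0.021, 10.36, 0.18))
    # Made-up runs, only to show summarize(); not measured data.
    print(summarize([6.90, 6.95, 6.85]))
```

With the memory indices quoted above, the gap between the two hosts is far larger than the combined error, which matches the "beyond any errors" remark for nbench.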
|
|
|
s4e8 Guru
Joined: 29 Jul 2006 Posts: 311
|
Posted: Wed Jul 15, 2009 8:22 am Post subject: |
|
|
Here are the results.
Code: |
bin # ./gcc -Q --help=target -march=atom
The following options are target specific:
-m128bit-long-double [disabled]
-m32 [enabled]
-m3dnow [disabled]
-m3dnowa [disabled]
-m64 [disabled]
-m80387 [enabled]
-m96bit-long-double [enabled]
-mabi=
-mabm [disabled]
-maccumulate-outgoing-args [disabled]
-maes [disabled]
-malign-double [disabled]
-malign-functions=
-malign-jumps=
-malign-loops=
-malign-stringops [enabled]
-march= atom
-masm=
-mavx [disabled]
-mbranch-cost=
-mcld [disabled]
-mcmodel=
-mcrc32 [disabled]
-mcx16 [disabled]
-mfancy-math-387 [enabled]
-mfma [disabled]
-mforce-drap [disabled]
-mfp-ret-in-387 [enabled]
-mfpmath=
-mfused-madd [enabled]
-mglibc [enabled]
-mhard-float [enabled]
-mieee-fp [enabled]
-mincoming-stack-boundary=
-minline-all-stringops [disabled]
-minline-stringops-dynamically [disabled]
-mintel-syntax [disabled]
-mlarge-data-threshold=
-mmmx [disabled]
-mmovbe [disabled]
-mms-bitfields [disabled]
-mno-align-stringops [disabled]
-mno-fancy-math-387 [disabled]
-mno-fused-madd [disabled]
-mno-push-args [disabled]
-mno-red-zone [disabled]
-mno-sse4 [enabled]
-momit-leaf-frame-pointer [disabled]
-mpc
-mpclmul [disabled]
-mpopcnt [disabled]
-mpreferred-stack-boundary=
-mpush-args [enabled]
-mrecip [disabled]
-mred-zone [enabled]
-mregparm=
-mrtd [disabled]
-msahf [disabled]
-msoft-float [disabled]
-msse [disabled]
-msse2 [disabled]
-msse2avx [disabled]
-msse3 [disabled]
-msse4 [disabled]
-msse4.1 [disabled]
-msse4.2 [disabled]
-msse4a [disabled]
-msse5 [disabled]
-msseregparm [disabled]
-mssse3 [disabled]
-mstack-arg-probe [disabled]
-mstackrealign [enabled]
-mstringop-strategy=
-mtls-dialect=
-mtls-direct-seg-refs [enabled]
-mtune=
-muclibc [disabled]
-mveclibabi=
|
|
|
|
|
Bircoph Developer
Joined: 27 Jun 2008 Posts: 261 Location: Moscow
|
Posted: Wed Jul 15, 2009 8:46 am Post subject: |
|
|
This is odd, I can't see any significant difference.
I wonder what they've done... _________________ Per aspera ad astra! |
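Rather than comparing two `gcc -Q --help=target` listings by eye, they can be parsed and diffed. A minimal sketch; `parse_target_options` and `diff_options` are names made up here, and the second listing in the demo is invented purely for illustration.

```python
def parse_target_options(text):
    """Map option name -> state for lines like '-msse3   [disabled]'."""
    options = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip value-style lines such as '-march= atom' or '-mabi='.
        if len(parts) == 2 and parts[0].startswith("-m") \
                and parts[1] in ("[enabled]", "[disabled]"):
            options[parts[0]] = parts[1].strip("[]")
    return options


def diff_options(old, new):
    """Return {option: (old_state, new_state)} for options whose state differs."""
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


if __name__ == "__main__":
    listing_a = "-m32 [enabled]\n-msse3 [disabled]\n-march= atom\n"
    # Hypothetical second listing, not taken from any real gcc run.
    listing_b = "-m32 [enabled]\n-msse3 [enabled]\n"
    print(diff_options(parse_target_options(listing_a),
                       parse_target_options(listing_b)))
```

An empty diff only says the option states match; it does not show that the generated code is the same.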
|
|
|
s4e8 Guru
Joined: 29 Jul 2006 Posts: 311
|
Posted: Wed Jul 15, 2009 9:33 am Post subject: |
|
|
Bircoph wrote: | This is odd, I can't see any significant difference.
I wonder what the've done... |
There's a new file, atom.md, which defines some Atom-specific behavior.
Code: |
......
;; Atom is an in-order core with two integer pipelines.
(define_attr "atom_unit" "sishuf,simul,jeu,complex,other"
(const_string "other"))
(define_attr "atom_sse_attr" "rcp,movdup,lfence,fence,prefetch,sqrt,mxcsr,other"
(const_string "other"))
(define_automaton "atom")
;; Atom has two ports: port 0 and port 1 connecting to all execution units
(define_cpu_unit "atom-port-0,atom-port-1" "atom")
;; EU: Execution Unit
;; Atom EUs are connected by port 0 or port 1.
......
|
|
|
|
|
hielvc Advocate
Joined: 19 Apr 2002 Posts: 2805 Location: Oceanside, Ca
|
Posted: Wed Jul 15, 2009 7:39 pm Post subject: |
|
|
s4e8, I ran your command on my AMD Athlon(tm) X2 Dual Core Processor BE-2300. No matter what I put in for the target, I got the same output: Quote: | gcc -Q --help=target -march=k8 |awk '/enabled/ {print $1}'
-m64
-m80387
-m96bit-long-double
-malign-stringops
-mfancy-math-387
-mfp-ret-in-387
-mfused-madd
-mglibc
-mhard-float
-mieee-fp
-mno-sse4
-mpush-args
-mred-zone
-mtls-direct-seg-refs |
Using this code Code: | echo 'int main(){return 0;}' > test.c && gcc -v -Q -march=native -O2 test.c -o test && rm test.c test
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.3.3-r2/work/gcc-4.3.3/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.3 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --disable-nls --with-system-zlib --disable-checking --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-libgomp --enable-cld --disable-libgcj --enable-languages=c,c++,treelang --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.3-r2 p1.2, pie-10.1.5'
Thread model: posix
gcc version 4.3.3 (Gentoo 4.3.3-r2 p1.2, pie-10.1.5)
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '-o' 'test'
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/cc1 -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8 -dumpbase test.c -auxbase test -O2 -version -o /tmp/ccoIvqwu.s
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include-fixed
/usr/include
End of search list.
GNU C (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) version 4.3.3 (x86_64-pc-linux-gnu)
compiled by GNU C version 4.3.3, GMP version 4.3.1, MPFR version 2.4.1-p5.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
options passed: -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16
-msahf --param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8
-O2
options enabled: -falign-labels -falign-loops -fargument-alias
-fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg
-fcaller-saves -fcommon -fcprop-registers -fcrossjumping
-fcse-follow-jumps -fdefer-pop -fdelete-null-pointer-checks
-fearly-inlining -feliminate-unused-debug-types -fexpensive-optimizations
-fforward-propagate -ffunction-cse -fgcse -fgcse-lm
-fguess-branch-probability -fident -fif-conversion -fif-conversion2
-finline-functions-called-once -finline-small-functions -fipa-pure-const
-fipa-reference -fivopts -fkeep-static-consts -fleading-underscore
-fmath-errno -fmerge-constants -fmerge-debug-strings
-fmove-loop-invariants -fomit-frame-pointer -foptimize-register-move
-foptimize-sibling-calls -fpeephole -fpeephole2 -freg-struct-return
-fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop
-fsched-interblock -fsched-spec -fsched-stalled-insns-dep
-fschedule-insns2 -fsigned-zeros -fsplit-ivs-in-unroller
-fsplit-wide-types -fstrict-aliasing -fstrict-overflow -fthread-jumps
-ftoplevel-reorder -ftrapping-math -ftree-ccp -ftree-ch -ftree-copy-prop
-ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
-ftree-dse -ftree-fre -ftree-loop-im -ftree-loop-ivcanon
-ftree-loop-optimize -ftree-parallelize-loops= -ftree-pre -ftree-reassoc
-ftree-salias -ftree-scev-cprop -ftree-sink -ftree-sra -ftree-store-ccp
-ftree-ter -ftree-vect-loop-version -ftree-vrp -funit-at-a-time
-funwind-tables -fvar-tracking -fvect-cost-model -fzero-initialized-in-bss
-m128bit-long-double -m3dnow -m64 -m80387 -maccumulate-outgoing-args
-malign-stringops -mcx16 -mfancy-math-387 -mfp-ret-in-387 -mfused-madd
-mglibc -mieee-fp -mmmx -mno-sse4 -mpush-args -mred-zone -msahf -msse
-msse2 -msse3 -mtls-direct-seg-refs
Compiler executable checksum: f6e169a902c79329927a6921bcb422f4
main
Analyzing compilation unit
Performing interprocedural optimizations
<visibility> <early_local_cleanups> <inline> <static-var> <pure-const>Assembling functions:
main
Execution times (seconds)
parser : 0.01 (100%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 76 kB ( 7%) ggc
global alloc : 0.00 ( 0%) usr 0.01 (100%) sys 0.01 (33%) wall 0 kB ( 0%) ggc
TOTAL : 0.01 0.01 0.03 1118 kB
Internal checks disabled; compiler is not suited for release.
Configure with --enable-checking=release to enable checks.
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '-o' 'test'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/bin/as -V -Qy -o /tmp/ccW4B8bR.o /tmp/ccoIvqwu.s
GNU assembler version 2.19.1 (x86_64-pc-linux-gnu) using BFD version (GNU Binutils) 2.19.1
COMPILER_PATH=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/bin/
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '-o' 'test'
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtbegin.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../.. /tmp/ccW4B8bR.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtend.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crtn.o |
As you can see, it's an x86_64-pc-linux-gnu system running gcc-4.3.3 with -march=native, which resolves to k8-sse3. As you can also see, 3dnow and company plus a bunch more are actually enabled. I like my output _________________ An A-Z Index of the Linux BASH command line |
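The interesting part of that output is the cc1 line, which shows what -march=native was expanded to. A small sketch for pulling those flags out of saved `gcc -v` output, for instance to paste explicit flags into CFLAGS instead of -march=native so that distcc nodes all compile for the same target; `resolved_target_flags` is a hypothetical name, and only a driver line whose first word ends in `cc1` is matched (so not cc1plus).

```python
def resolved_target_flags(verbose_output):
    """Extract -m* flags and --param pairs from the cc1 line of `gcc -v` output."""
    for line in verbose_output.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].endswith("cc1"):
            continue
        flags = []
        i = 1
        while i < len(tokens):
            tok = tokens[i]
            if tok == "--param" and i + 1 < len(tokens):
                # Keep the parameter together with its value.
                flags.append("--param " + tokens[i + 1])
                i += 2
                continue
            if tok.startswith("-m"):
                flags.append(tok)
            i += 1
        return flags
    return []


# The cc1 line copied from the output above.
SAMPLE = (
    "/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/cc1 -v test.c "
    "-D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16 -msahf "
    "--param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8 "
    "-dumpbase test.c -auxbase test -O2 -version -o /tmp/ccoIvqwu.s"
)

if __name__ == "__main__":
    print(" ".join(resolved_target_flags(SAMPLE)))
```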
|
|
|
s4e8 Guru
Joined: 29 Jul 2006 Posts: 311
|
Posted: Thu Jul 16, 2009 1:39 am Post subject: |
|
|
hielvc wrote: | s4e8 I ran your code on my AMD Athlon(tm) X2 Dual Core Processor BE-2300. No matter what I put in for "target I got the same output |
OK, here's the -Q -v output:
Code: |
GNU C (GCC) version 4.5.0 20090702 (experimental) (i686-pc-linux-gnu)
compiled by GNU C version 4.5.0 20090702 (experimental), GMP version 4.2.4, MPFR version 2.4.1-p1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
options passed: -v a.c -march=atom -mfpmath=sse -O3 -fomit-frame-pointer
options enabled: -falign-labels -falign-loops -fargument-alias
-fauto-inc-dec -fbranch-count-reg -fcaller-saves -fcommon
-fcprop-registers -fcrossjumping -fcse-follow-jumps -fdefer-pop
-fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
-feliminate-unused-debug-types -fexpensive-optimizations
-fforward-propagate -ffunction-cse -fgcse -fgcse-after-reload -fgcse-lm
-fguess-branch-probability -fident -fif-conversion -fif-conversion2
-findirect-inlining -finline -finline-functions
-finline-functions-called-once -finline-small-functions -fipa-cp
-fipa-cp-clone -fipa-pure-const -fipa-reference -fira-share-save-slots
-fira-share-spill-slots -fivopts -fkeep-static-consts -fleading-underscore
-fmath-errno -fmerge-constants -fmerge-debug-strings
-fmove-loop-invariants -fomit-frame-pointer -foptimize-register-move
-foptimize-sibling-calls -fpcc-struct-return -fpeephole -fpeephole2
-fpredictive-commoning -fregmove -freorder-blocks -freorder-functions
-frerun-cse-after-loop -fsched-interblock -fsched-spec
-fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fsigned-zeros
-fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-aliasing
-fstrict-overflow -fthread-jumps -ftoplevel-reorder -ftrapping-math
-ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop
-ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
-ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-im -ftree-loop-ivcanon
-ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre
-ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink
-ftree-slp-vectorize -ftree-sra -ftree-switch-conversion -ftree-ter
-ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time
-funswitch-loops -fvar-tracking -fvect-cost-model
-fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double
-maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
-mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mmovbe -mno-red-zone
-mno-sse4 -mpush-args -msahf -msse -msse2 -msse3 -mssse3
-mtls-direct-seg-refs
Compiler executable checksum: f142bf44665c008856fda3c64386a6ca
main
Analyzing compilation unit
Performing interprocedural optimizations
<visibility> <early_local_cleanups> <summary generate> <cp> <inline> <static-var> <pure-const>Assembling functions:
main
Execution times (seconds)
callgraph construction: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 (11%) wall 0 kB ( 0%) ggc
parser : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.08 (30%) wall 192 kB (23%) ggc
tree gimplify : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 7%) wall 0 kB ( 0%) ggc
tree CFG cleanup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
tree SSA other : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
tree CCP : 0.00 ( 0%) usr 0.01 (100%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
expand : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 (11%) wall 3 kB ( 0%) ggc
combiner : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 7%) wall 0 kB ( 0%) ggc
scheduling 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 (11%) wall 0 kB ( 0%) ggc
machine dep reorg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 4%) wall 0 kB ( 0%) ggc
TOTAL : 0.01 0.01 0.27 847 kB
Extra diagnostic checks enabled; compiler may run slowly
|
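For anyone who wants just the machine-specific part of a dump like the one above, here is a minimal sketch that pulls the -m flags out of the "options enabled:" block. The function name `enabled_machine_flags` is made up for illustration, and the sample text is an excerpt of the output above with the middle lines left out.

```python
def enabled_machine_flags(verbose_output):
    """Collect the -m... flags listed under 'options enabled:' in gcc -Q -v output."""
    flags = []
    in_enabled = False
    for line in verbose_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("options enabled:"):
            in_enabled = True
            stripped = stripped[len("options enabled:"):]
        elif in_enabled and not stripped.startswith("-"):
            break  # first line that is not a list of options ends the block
        if in_enabled:
            flags.extend(tok for tok in stripped.split() if tok.startswith("-m"))
    return flags


# Excerpt of the dump above (the -f... lines in the middle are omitted).
sample = """\
options passed: -v a.c -march=atom -mfpmath=sse -O3 -fomit-frame-pointer
options enabled: -falign-labels -falign-loops -fargument-alias
-fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double
-maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
-mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mmovbe -mno-red-zone
-mno-sse4 -mpush-args -msahf -msse -msse2 -msse3 -mssse3
-mtls-direct-seg-refs
Compiler executable checksum: f142bf44665c008856fda3c64386a6ca
"""

flags = enabled_machine_flags(sample)
print(" ".join(flags))
```

Only the "options enabled:" block is read, so -march=atom and -mfpmath=sse from the "options passed:" line are deliberately not included.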
|
|
|
|
BillyBoy Tux's lil' helper
Joined: 26 Nov 2003 Posts: 101 Location: USA
|
Posted: Thu Jul 30, 2009 9:14 pm Post subject: My recent results |
|
|
Code: |
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 537.24 : 13.78 : 4.52
STRING SORT : 58.753 : 26.25 : 4.06
BITFIELD : 1.7623e+08 : 30.23 : 6.31
FP EMULATION : 54.418 : 26.11 : 6.03
FOURIER : 7294.8 : 8.30 : 4.66
ASSIGNMENT : 11.767 : 44.78 : 11.61
IDEA : 2044.5 : 31.27 : 9.28
HUFFMAN : 978.9 : 27.14 : 8.67
NEURAL NET : 7.4568 : 11.98 : 5.04
LU DECOMPOSITION : 396.2 : 20.53 : 14.82
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 27.142
FLOATING-POINT INDEX: 12.682
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz 1596MHz
L2 Cache : 512 KB
OS : Linux 2.6.29-gentoo-r5
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.679
INTEGER INDEX : 6.844
FLOATING-POINT INDEX: 7.034
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
|
My CFLAGS:
Code: | CFLAGS="-O2 -march=prescott -mtune=core2 -fomit-frame-pointer -pipe" |
My uname:
Code: | Linux atom 2.6.29-gentoo-r5 #3 SMP Wed Jul 29 22:40:06 PDT 2009 i686 Intel(R) Atom(TM) CPU 330 @ 1.60GHz GenuineIntel GNU/Linux |
My portage:
Code: | Portage 2.1.6.13 (default/linux/x86/2008.0, gcc-4.3.2, glibc-2.9_p20081201-r2, 2.6.29-gentoo-r5 i686)
=================================================================
System uname: Linux-2.6.29-gentoo-r5-i686-Intel-R-_Atom-TM-_CPU_330_@_1.60GHz-with-glibc2.0
Timestamp of tree: Mon, 27 Jul 2009 10:45:02 +0000 |
My kit (from dmidecode):
Code: | Base Board Information
Manufacturer: Intel Corporation
Product Name: D945GCLF2
Version: AAE46416-106 |
I have one stick of DDR2 800 but it only runs at 533 (despite the box saying it can do 667!). I'm actually pretty happy with this. For a hundred bucks, I have a completely usable system. Gotta love Gentoo.... |
|
|
|
djtreble n00b
Joined: 09 Jan 2006 Posts: 39 Location: Brisbane, Australia
|
Posted: Sat Jan 16, 2010 11:41 am Post subject: |
|
|
Comparing march=atom to march=core2
Code: | CFLAGS="-O2 -march=core2 -pipe" |
Code: | BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 479.28 : 12.29 : 4.04
STRING SORT : 56.235 : 25.13 : 3.89
BITFIELD : 1.3752e+08 : 23.59 : 4.93
FP EMULATION : 46.123 : 22.13 : 5.11
FOURIER : 7237.1 : 8.23 : 4.62
ASSIGNMENT : 11.877 : 45.19 : 11.72
IDEA : 1840.9 : 28.16 : 8.36
HUFFMAN : 849.82 : 23.57 : 7.53
NEURAL NET : 6.9442 : 11.16 : 4.69
LU DECOMPOSITION : 399.44 : 20.69 : 14.94
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 24.182
FLOATING-POINT INDEX: 12.385
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 1600MHz
L2 Cache : 512 KB
OS : Linux 2.6.31-gentoo-r6
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.079
INTEGER INDEX : 6.001
FLOATING-POINT INDEX: 6.869
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder. |
Code: | CFLAGS="-O2 -march=atom -pipe" |
Code: | BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 512.16 : 13.13 : 4.31
STRING SORT : 56.093 : 25.06 : 3.88
BITFIELD : 1.3813e+08 : 23.69 : 4.95
FP EMULATION : 51.637 : 24.78 : 5.72
FOURIER : 7118.5 : 8.10 : 4.55
ASSIGNMENT : 12.773 : 48.60 : 12.61
IDEA : 1531.4 : 23.42 : 6.95
HUFFMAN : 868.2 : 24.08 : 7.69
NEURAL NET : 7.0021 : 11.25 : 4.73
LU DECOMPOSITION : 379.56 : 19.66 : 14.20
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 24.499
FLOATING-POINT INDEX: 12.143
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 1600MHz
L2 Cache : 512 KB
OS : Linux 2.6.31-gentoo-r6
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.232
INTEGER INDEX : 6.026
FLOATING-POINT INDEX: 6.735
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder. |
Code: | gcc version 4.5.0-alpha20091224 20091224 (experimental) (Gentoo 4.5.0_alpha20091224) |
This shows nothing, really. I ran nbench again and it gave differing results, so I don't really trust it! |
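To put numbers on how far apart the two runs above are, here is a rough sketch that computes the per-test change from -march=core2 to -march=atom. The iterations/sec values are copied from the two tables; the 5% noise band (`NOISE_PCT`) is purely an assumption, since nobody in the thread measured the run-to-run variation.

```python
# Iterations/sec copied from the two nbench runs above.
core2 = {
    "NUMERIC SORT": 479.28, "STRING SORT": 56.235, "BITFIELD": 1.3752e+08,
    "FP EMULATION": 46.123, "FOURIER": 7237.1, "ASSIGNMENT": 11.877,
    "IDEA": 1840.9, "HUFFMAN": 849.82, "NEURAL NET": 6.9442,
    "LU DECOMPOSITION": 399.44,
}
atom = {
    "NUMERIC SORT": 512.16, "STRING SORT": 56.093, "BITFIELD": 1.3813e+08,
    "FP EMULATION": 51.637, "FOURIER": 7118.5, "ASSIGNMENT": 12.773,
    "IDEA": 1531.4, "HUFFMAN": 868.2, "NEURAL NET": 7.0021,
    "LU DECOMPOSITION": 379.56,
}

NOISE_PCT = 5.0  # assumed noise band, NOT measured in this thread


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {test: percent_change(core2[test], atom[test]) for test in core2}

# Print the tests from biggest loss to biggest gain.
for test, pct in sorted(changes.items(), key=lambda kv: kv[1]):
    verdict = "within assumed noise" if abs(pct) < NOISE_PCT else "worth re-running"
    print(f"{test:<17}{pct:+7.2f}%  {verdict}")
```

A more trustworthy comparison would repeat each configuration several times and use the observed spread instead of a guessed threshold.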
|
|
|
b0nafide Apprentice
Joined: 17 Feb 2008 Posts: 171 Location: ~/
|
Posted: Sat Jan 16, 2010 4:45 pm Post subject: |
|
|
Acer Aspire One D150...
Code: | gcc version 4.3.4 (Gentoo 4.3.4 p1.0, pie-10.1.5)
CFLAGS="-O2 -march=core2 -mtune=generic -fomit-frame-pointer -pipe"
# nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 525.72 : 13.48 : 4.43
STRING SORT : 57.211 : 25.56 : 3.96
BITFIELD : 1.7151e+08 : 29.42 : 6.15
FP EMULATION : 56.795 : 27.25 : 6.29
FOURIER : 7329.5 : 8.34 : 4.68
ASSIGNMENT : 11.688 : 44.48 : 11.54
IDEA : 2050.2 : 31.36 : 9.31
HUFFMAN : 964.26 : 26.74 : 8.54
NEURAL NET : 7.1714 : 11.52 : 4.85
LU DECOMPOSITION : 405.76 : 21.02 : 15.18
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.942
FLOATING-POINT INDEX: 12.638
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 1600MHz
L2 Cache : 512 KB
OS : Linux 2.6.31-gentoo-r6
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.546
INTEGER INDEX : 6.859
FLOATING-POINT INDEX: 7.009
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
|
|
|
|
|
djselbeck n00b
Joined: 10 Oct 2005 Posts: 32 Location: Germany
|
Posted: Mon Jan 18, 2010 8:47 pm Post subject: |
|
|
on HP Mini 5101:
Code: | CFLAGS="-O2 -march=core2 -mtune=generic -fomit-frame-pointer -pipe"
gcc 4.3.4 |
Code: | BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 553.8 : 14.20 : 4.66
STRING SORT : 60.52 : 27.04 : 4.19
BITFIELD : 1.7867e+08 : 30.65 : 6.40
FP EMULATION : 59.08 : 28.35 : 6.54
FOURIER : 7646.5 : 8.70 : 4.88
ASSIGNMENT : 12.227 : 46.53 : 12.07
IDEA : 2147.4 : 32.84 : 9.75
HUFFMAN : 1035.4 : 28.71 : 9.17
NEURAL NET : 7.5818 : 12.18 : 5.12
LU DECOMPOSITION : 429.08 : 22.23 : 16.05
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 28.329
FLOATING-POINT INDEX: 13.303
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N280 @ 1.66GHz 1667MHz
L2 Cache : 512 KB
OS : Linux 2.6.31-gentoo-r6
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.864
INTEGER INDEX : 7.227
FLOATING-POINT INDEX: 7.378
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
|
|
|
|
|
Nuteater Apprentice
Joined: 25 Sep 2003 Posts: 193 Location: Jyväskylä, Finland
|
Posted: Tue Apr 13, 2010 7:07 pm Post subject: |
|
|
I recently upgraded my EEE 901 to a 4.5 prerelease to try -march=atom (and because
my system hasn’t been properly broken for a long time). Here are the results.
With gcc-4.4.1 and
Code: | CFLAGS="-march=prescott -O2 -fomit-frame-pointer -pipe" |
Code: | BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 527.2 : 13.52 : 4.44
STRING SORT : 57.857 : 25.85 : 4.00
BITFIELD : 2.0284e+08 : 34.79 : 7.27
FP EMULATION : 56.235 : 26.98 : 6.23
FOURIER : 7325.3 : 8.33 : 4.68
ASSIGNMENT : 11.777 : 44.81 : 11.62
IDEA : 1991.2 : 30.46 : 9.04
HUFFMAN : 869.22 : 24.10 : 7.70
NEURAL NET : 6.5974 : 10.60 : 4.46
LU DECOMPOSITION : 310.24 : 16.07 : 11.61
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 27.122
FLOATING-POINT INDEX: 11.237
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 1600MHz
L2 Cache : 512 KB
OS : Linux 2.6.32.8
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.966
INTEGER INDEX : 6.623
FLOATING-POINT INDEX: 6.232
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder. |
With gcc-4.5.0-alpha20100408 and
Code: | CFLAGS="-march=atom -O2 -mssse3 -mfpmath=sse -fexcess-precision=fast -fomit-frame-pointer -pipe" |
Code: | BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 523.92 : 13.44 : 4.41
STRING SORT : 59.896 : 26.76 : 4.14
BITFIELD : 1.4147e+08 : 24.27 : 5.07
FP EMULATION : 54.872 : 26.33 : 6.08
FOURIER : 7708.9 : 8.77 : 4.92
ASSIGNMENT : 13.934 : 53.02 : 13.75
IDEA : 1939.2 : 29.66 : 8.81
HUFFMAN : 1017.2 : 28.21 : 9.01
NEURAL NET : 9.6915 : 15.57 : 6.55
LU DECOMPOSITION : 451.44 : 23.39 : 16.89
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.900
FLOATING-POINT INDEX: 14.724
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz 800MHz
L2 Cache : 512 KB
OS : Linux 2.6.32.8
C compiler : i686-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 6.610
INTEGER INDEX : 6.791
FLOATING-POINT INDEX: 8.166
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder. |
Of course an artificial benchmark such as this doesn’t tell much, but floating-point performance seems to have improved significantly. That may, however, be due to the other options, such as -mfpmath=sse, rather than -march=atom itself. _________________ I am Nuteater, hear me roar. |
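As a rough check on where the floating-point gain comes from: the FLOATING-POINT INDEX figures above look consistent with a geometric mean of the three floating-point tests' "New Index" values (FOURIER, NEURAL NET, LU DECOMPOSITION). That is an inference from the numbers in this post, not something stated in the thread, so treat this as a sketch under that assumption.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))


# "New Index" columns for FOURIER, NEURAL NET, LU DECOMPOSITION from the two runs above.
fp_gcc441_prescott = [4.68, 4.46, 11.61]
fp_gcc450_atom = [4.92, 6.55, 16.89]

before = geometric_mean(fp_gcc441_prescott)
after = geometric_mean(fp_gcc450_atom)

# Compare against the reported Linux FLOATING-POINT INDEX values (6.232 and 8.166).
print(f"gcc-4.4.1 / prescott: {before:.3f}")
print(f"gcc-4.5.0 / atom:     {after:.3f}")
print(f"change: {(after - before) / before * 100.0:+.1f}%")
```

If the assumption holds, nearly all of the gain comes from NEURAL NET and LU DECOMPOSITION, while FOURIER barely moves.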
|
|
|
|
|
|
|