| Author |
Message |
jig n00b

Joined: 15 Feb 2003 Posts: 4
|
Posted: Sun Oct 19, 2003 10:24 am Post subject: Gentoo/ GNU-Linux benchmark |
|
|
Hi all!
I have an idea, but for lack of time I won't be able to make it work myself, so I hope someone else will if it turns out to be a good one.
A long time ago I was thinking that one of the most important things was missing from the developer's world: "How much faster will my program run with this gcc flag enabled?"
Not so long ago we had a comparison between Gentoo and Mandrake 9.1 (http://www.gentoo.org/main/en/performance.xml) which, IMHO, could have given even faster results for Gentoo if the author had used other flags.
So, you are now asking, "What is the idea anyway?"
It's simple.
Each Gentoo user runs a series of benchmarks (problem 1) and uploads the results, plus the CFLAGS used, to a site.
The results would then be sorted, and soon we would have an enlightening set of results for each arch; each user could compare his results with those of similar machines and/or different flags.
In a second (later) step we could benchmark the same programs Jose Lopez did.
(problem 1) My only problem would be choosing the right benchmarks to use... Probably some new benchmark tools would be needed to test specific parameters.
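A minimal sketch of how the uploaded results could be organized, assuming each upload is a record holding the arch, the benchmark name, the CFLAGS and a wall-clock time. The names `Result` and `rank_by_arch` and the sample numbers are hypothetical and only illustrate the grouping; nothing like this exists yet.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One uploaded benchmark run (all fields are illustrative)."""
    arch: str        # e.g. "athlon-xp"
    benchmark: str   # e.g. "povray-skyvase"
    cflags: str      # the CFLAGS the benchmark was built with
    seconds: float   # wall-clock time, lower is better


def rank_by_arch(results):
    """Group results by (arch, benchmark) and sort each group fastest first."""
    groups = defaultdict(list)
    for r in results:
        groups[(r.arch, r.benchmark)].append(r)
    return {key: sorted(runs, key=lambda r: r.seconds)
            for key, runs in groups.items()}


if __name__ == "__main__":
    # Made-up sample uploads, not measurements.
    uploads = [
        Result("athlon-xp", "povray-skyvase", "-O3 -march=athlon-xp", 3.2),
        Result("athlon-xp", "povray-skyvase", "-O2 -march=athlon-xp", 3.0),
        Result("pentium3", "povray-skyvase", "-O2 -march=pentium3", 5.1),
    ]
    for (arch, bench), runs in rank_by_arch(uploads).items():
        print(arch, bench)
        for r in runs:
            print(f"  {r.seconds:.3f}s  {r.cflags}")
```

Grouping on the pair (arch, benchmark) keeps results from different machines apart, so a user only compares his numbers with runs of the same test on similar hardware.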
P.S.: Sorry about the English |
 |
HelloWorld82 n00b


Joined: 05 Oct 2003 Posts: 46 Location: Germany
|
Posted: Sun Oct 19, 2003 10:59 am Post subject: Fair test ? |
|
|
I don't think the perf test there is fair.
http://www.gentoo.org/main/en/performance.xml
I'm not trying to flame Gentoo; I use it too, and I like it a lot. The best way to get a Linux distribution that works right for you is "do it yourself". But I think Mandrake isn't that slow. 9.1 is an older distribution; you should compare Gentoo with Mandrake 9.2. Mozilla was also older in Mandrake 9.1, so it's normal that it starts more slowly. And there is no indication of which flags Mozilla was compiled with. I like Mandrake, because I discovered Linux through them |
 |
The_Paya Developer


Joined: 29 Aug 2003 Posts: 23 Location: Argentina
|
Posted: Wed Oct 29, 2003 2:40 am Post subject: |
|
|
As you asked me on the mailing list: I like your idea, and I had "a sort of that" idea myself, but I don't think we should use "benchmark" programs for such a thing, because they don't always behave like real-world programs. So my idea is to use real-world programs (maybe configured in some particular way) for this kind of benchmark.
The benchmark I'll post here was done using povray (http://www.povray.org), following their benchmarking specs (using the benchmark.ini configuration and rendering the skyvase.pov file).
I chose povray because it is a rendering app that does lots of the three big things: memory I/O, integer operations and floating-point operations. It doesn't use the video card to render anything, and this configuration (benchmark.ini) doesn't use any kind of time-consuming output either (just the screen, for statistics).
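A small sketch of a timing harness for this kind of real-program benchmark, assuming the program is started from a command line and only wall-clock time matters. The function name `time_command` and the script name `bench.py` are made up for illustration; repeating the run and reporting the median is an addition here, not something the measurements in this post did.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Pass the real benchmark on the command line, for example:
    #   python bench.py povray skyvase.pov
    # With no arguments, a trivial Python child is timed so the sketch still runs.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    timings = time_command(argv)
    print(f"median {statistics.median(timings):.3f}s, "
          f"spread {max(timings) - min(timings):.3f}s over {len(timings)} runs")
```

Printing the spread next to the median gives a rough idea of how large a difference between two sets of CFLAGS has to be before it means anything.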
Another thing to keep in mind when choosing a program to benchmark: most of you know that every program in plain C/C++ (which is what we want to optimize with our CFLAGS) uses the system libc no matter what, so you may ask, "yeah, but don't you have to recompile the whole glibc again and again between your CFLAGS compilations?" That's not so important if you choose the right program for your bench. Povray does *a lot* of operations between variables and calls *a lot* of functions, and subfunctions of those functions, while rendering (look at a trace of the DNoise function), so -here- the compiler has plenty of room for big optimizations even if your glibc is built for i386 (of course it will run slower, but you will still notice the differences between each mix of CFLAGS).
One last thing about choosing a program: if your program of choice uses a shared library other than libc (povray uses libpng and libtiff, but only for output, and here I produce no output through either of them) and you still want to benchmark it, remember that it would be good to have every lib the program needs compiled with the same CFLAGS, so you will see much bigger differences between your tests.
Here it is (sorry, it's very long):
| Code: |
Commandline: "time nice -n -20 povray skyvase.pov" (using benchmark.ini)
CFLAGS= -O3 -march=athlon-xp -fomit-frame-pointer
real 0m3.156s
user 0m2.996s
sys 0m0.161s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -fomit-frame-pointer
real 0m3.002s
user 0m2.846s
sys 0m0.157s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -finline-functions -fomit-frame-pointer <- -O3 added
real 0m3.197s
user 0m3.039s
sys 0m0.158s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -fomit-frame-pointer <- -O3 added ! this is the fast one !
real 0m2.993s
user 0m2.834s
sys 0m0.159s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -mpreferred-stack-boundary=2 \ <- slower ?
-fomit-frame-pointer
real 0m3.326s
user 0m3.158s
sys 0m0.168s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -mpreferred-stack-boundary=4 \ <- RTFM, implied default
-fomit-frame-pointer
real 0m2.996s
user 0m2.834s
sys 0m0.162s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -mpreferred-stack-boundary=8 \ <- I already RTFM, slower, ok.
-fomit-frame-pointer
real 0m3.021s
user 0m2.860s
sys 0m0.162s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- I didn't add -mpreferred... because it's implied
-fomit-frame-pointer <- Now -malign-double FASTER!
real 0m2.959s
user 0m2.802s
sys 0m0.158s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- almost same as before, new flag implied
-m96bit-long-double -fomit-frame-pointer
real 0m2.982s
user 0m2.802s
sys 0m0.181s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- 128bit long double slower.
-m128bit-long-double -fomit-frame-pointer
real 0m3.018s
user 0m2.858s
sys 0m0.161s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- almost the same as without -mmmx, implied?
-mmmx -fomit-frame-pointer
real 0m2.969s
user 0m2.802s
sys 0m0.167s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- again, maybe implied?
-mmmx -msse -fomit-frame-pointer
real 0m2.965s
user 0m2.803s
sys 0m0.162s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- no noticeable effect yet,
-mmmx -msse -m3dnow -fomit-frame-pointer <- maybe implied?
real 0m2.962s
user 0m2.803s
sys 0m0.159s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- what happens without mmx?
-msse -m3dnow -fomit-frame-pointer <- nothing :+/
real 0m2.964s
user 0m2.802s
sys 0m0.162s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- and without sse?
-m3dnow -fomit-frame-pointer <- bah, nothing :+/
real 0m2.974s
user 0m2.805s
sys 0m0.169s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- i was reading the info...
-mno-push-args -fomit-frame-pointer <- and I found this... not too much, and I don't like it :+P.
real 0m2.972s
user 0m2.804s
sys 0m0.168s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- this implies the last one, well, see what happens..
-maccumulate-outgoing-args -fomit-frame-pointer <- faster, but bigger code size. (not a lot of space here)
real 0m2.969s
user 0m2.799s
sys 0m0.170s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- Huh, faster, huh.
-maccumulate-outgoing-args -mno-align-stringops \
-fomit-frame-pointer
real 0m2.948s
user 0m2.781s
sys 0m0.168s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- again i'm reading the info...
-maccumulate-outgoing-args -mno-align-stringops \ <- 17ms slower. bah.
-minline-all-stringops -fomit-frame-pointer
real 0m2.968s
user 0m2.798s
sys 0m0.170s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- -fforce-mem in -O2...
-maccumulate-outgoing-args -mno-align-stringops \ <- what about -fforce-addr?
-fforce-addr -fomit-frame-pointer <- mbu. slower.
real 0m3.132s
user 0m2.970s
sys 0m0.162s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- -fbranch-count-reg is enabled with -O2
-maccumulate-outgoing-args -mno-align-stringops \ <- what happens disabling this?
-fno-branch-count-reg -fomit-frame-pointer <- uhm, it's enabled for a good reason (:+P)
real 0m2.958s
user 0m2.794s
sys 0m0.164s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- slow like hell.
-maccumulate-outgoing-args -mno-align-stringops \
-fmove-all-movables -freduce-all-givs -freduce-all-givs -fomit-frame-pointer
real 0m3.198s
user 0m3.038s
sys 0m0.160s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- this one generates imprecise math code
-maccumulate-outgoing-args -mno-align-stringops \ <- but not so imprecise ;+P
-ffast-math -fomit-frame-pointer
real 0m3.043s
user 0m2.881s
sys 0m0.162s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- let's play with -mfpmath
-maccumulate-outgoing-args -mno-align-stringops \ <- sse: slower
-mfpmath=sse -fomit-frame-pointer
real 0m3.048s
user 0m2.890s
sys 0m0.158s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- 387:
-maccumulate-outgoing-args -mno-align-stringops \ <- mmm, better...
-mfpmath=387 -fomit-frame-pointer
real 0m2.952s
user 0m2.788s
sys 0m0.164s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- sse,387:
-maccumulate-outgoing-args -mno-align-stringops \ <- b00, slower...
-mfpmath=sse,387 -fomit-frame-pointer
real 0m3.104s
user 0m2.941s
sys 0m0.163s
*********************************************************************
************************branch probabilities*****************************
*********************************************************************
This is the end of the CFLAGS that Gentoo can take. The following works this way:
You first compile a program with -fprofile-arcs, then run the program for a while. When
you do this the program runs slow as hell, but don't worry: it's recording
information about branch probabilities alongside your already compiled code.
(Without this, GCC can only guess branch probabilities from heuristics. With it, the
program writes its branch flow to a .da file with the same name as the .c/.o file being
executed, so DON'T delete your directory with the source code.)
After compiling with -fprofile-arcs and running the program, you recompile it
with -fbranch-probabilities; the compiler reads the branch data from the already
generated .da files and lays the code out to favor the most commonly taken,
and most time-consuming, paths. Just look at what happens:
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- now, the real part. profiling.
-maccumulate-outgoing-args -mno-align-stringops \ <- first we compile with -fprofile-arcs
-mfpmath=387 -fprofile-arcs -fomit-frame-pointer <- (compile with -p and use gprof to see nice stats)
real 0m4.048s
user 0m3.882s
sys 0m0.166s
*********************************************************************
CFLAGS= -O2 -march=athlon-xp -frename-registers -malign-double \ <- now, gcc is using the profiled data
-maccumulate-outgoing-args -mno-align-stringops \ <- what can be faster than this?? :+)
-mfpmath=387 -fbranch-probabilities -fomit-frame-pointer
real 0m2.900s
user 0m2.733s
sys 0m0.167s
*********************************************************************
|
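To compare the runs above more easily, here is a hedged sketch that converts the `real` values printed by `time` into seconds and expresses each one relative to a baseline. The helper names `parse_time` and `percent_vs_baseline` are invented for this example, and only a few of the posted timings are copied in. Many of the differences in the listing are a few hundredths of a second, which may be within run-to-run noise if each configuration was timed only once.

```python
import re

_TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def parse_time(text):
    """Convert a shell `time` value such as '0m3.156s' to seconds."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_vs_baseline(baseline, other):
    """Return how much slower (+) or faster (-) `other` is, in percent."""
    return (other - baseline) / baseline * 100.0


if __name__ == "__main__":
    # "real" times copied from the listing above; labels are shortened.
    baseline = parse_time("0m3.002s")  # -O2 -march=athlon-xp -fomit-frame-pointer
    runs = {
        "-O3": "0m3.156s",
        "-O2 -frename-registers": "0m2.993s",
        "... -fbranch-probabilities": "0m2.900s",
    }
    for label, value in runs.items():
        delta = percent_vs_baseline(baseline, parse_time(value))
        print(f"{label:30s} {delta:+.1f}%")
```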
_________________ wherever you go, there you are. |
 |
fsck! n00b


Joined: 24 Oct 2003 Posts: 29
|
 |
The_Paya Developer


Joined: 29 Aug 2003 Posts: 23 Location: Argentina
|
Posted: Thu Oct 30, 2003 9:41 pm Post subject: |
|
|
Hi, I looked at every benchmark app in portage and, just as before I was using Gentoo, I don't like benchmarking programs. They aren't designed around "real world" situations (like povray rendering); they always run each test in a single function, so the compiler never gets a chance to do much with function alignment, branch-probability guessing, etc. That's why I prefer real-world applications for these benchmarks. You could even test with The Gimp; the problem is that there is no way to run it without depending on X output. Other candidates are "sed" (which is used a lot in a Unix environment), "grep", or whatever you want: "time" it to see how long it takes to process a regular expression after you compile it with your test CFLAGS.
What I really mean is that benchmarking apps are oriented more toward "what hardware runs me faster" than toward "what CFLAGS compile me better and make me run faster".
Salu2. _________________ wherever you go, there you are. |
 |
The_Paya Developer


Joined: 29 Aug 2003 Posts: 23 Location: Argentina
|
Posted: Fri Nov 26, 2004 4:43 am Post subject: mbump |
|
|
First, I'm trying to revive the "benchmark these CFLAGS in a real app with nice -n -20" idea I wrote about in this thread; second, I want to show some success from my work while trying "not to do the same thing that someone else maybe already did...". So I'm posting this "experience" and a little question.
I have worked with Linux in various places over time, but all of those "places" had policies and stuff about which Linux distribution they would use (or not use at all), so the only thing I could do with Gentoo was "I use it only on my machine". But now I work in a complete mess of an extremely badly managed "governmental institution", one of those that makes -INTENSIVE- use of processing power while looking for the minimal-lowest-nonexistent "cost".
Sounds funny, right? :+)
Well, here is where Gentoo makes my life easier:
The "guy before me" was a fan of OpenBSD, which I respect and still use for our "internet" firewalls, but it really lacks the support that everyone else at work needed. A little example: they were working with databases in "a couple" of RDBMSs: Gupta/Centura SQLBase, Oracle, Interbase, among others. So... "how the hell can I make an apache+mysql+php-THING work with these databases on OBSD?" Just "no way". So before starting the real "web services" they asked me to prove myself by "migrating" some "non-cost-effective" M$ services, which I did:
The hardware they have is some Fujitsu-Siemens Primergy P470 servers with dual Pentium II or Pentium III CPUs, no faster than 500MHz for the P3 and 450MHz for the P2. If I wanted 1GB of RAM they had it, but it wasn't necessary, and the only thing I really liked about these machines was the hot-pluggable backplanes with Mylex DAC960 and DAC1100 SCSI-2 RAID controllers.
So I took my livecd and migrated 2 M$ systems and an old Red Hat box. First was the "internet S and A" server (ISA :+P), which became squid->winbind/samba->w2kdomaingroups. Then came the mail server, which was a qmail installed by some strange script that downloads, compiles, installs and configures everything by itself without asking a sh*t, and which had more than 3000 incoming viruses per week; now it's a postfix+mysql+amavis+clamav+dspam solution I built myself (following TONS of howtos, of course :+P). The last one was the web server itself, which was an "it needs to be restarted every 4 hours" win2k with apache and php. Now it can access ANY database that has an ODBC driver, even on WINDOWS (using a DBTCP proxy on a win2k machine; google it, it's cool. I also patched the php module so it compiles and added a pear module to access it from Pear DB.php ;+), from php under apache, accessing oracle 8.1.7 and oracle 9.2.0.5.
So far I haven't needed "great" optimization. CFLAGS like "-O2 -march=pentium3/2 -frename-registers -fomit-frame-pointer -pipe" did the magic very well.
But now "the time has come" X'D
Now I have to build -THE- database server.
I already started it, but with -extra-safe-cflags-and-package-versions-, because it has to run Oracle 9.2.0.5, and you find more "trouble" than "solutions" when running Oracle on non-certified (and on certified too, btw) distributions.
Now, after 3 weeks of recovering dead disks, going for the S40 storage, and compiling/downloading/installing everything down to the tiniest detail....
it's working...
This is what I've got:
The computer has two Pentium III 500MHz CPUs with 1GB of ECC RAM and a total of 14 SCSI-2 hard disks (yes, 14). Two of them (9GB each) are in a RAID1 (mirror) configuration for the boot system (Gentoo ;+) with reiserfs on a DAC960PRL controller.
The other 12 are configured this way: since the DAC1100 that connects the server to the storage cannot handle more than 8 disks in a single array pack, the storage has 2 RAID5 configurations, one made of eight 9GB disks and the other of four 18GB disks. I combined these into three different "partitions" using software RAID0 (striping) with a total of (around) 110GB, divided into 60GB/30GB/20GB (Oracle users may guess these are u01/u02/u03 :+). The storage also holds the big swap partition (2GB, outside the softraid) and, of course, will hold the database. Since this is going to handle "big" files, it uses XFS tuned according to Daniel Robbins' suggestions ;+).
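As a quick check of the capacity figures, this sketch applies the usual RAID5 rule that one disk's worth of space in each array goes to parity. The function name `raid5_usable` is made up for the example, and the result is nominal capacity before any controller or filesystem overhead.

```python
def raid5_usable(disks, size_gb):
    """Usable capacity of a RAID5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disks - 1) * size_gb


if __name__ == "__main__":
    array_a = raid5_usable(8, 9)    # eight 9GB disks
    array_b = raid5_usable(4, 18)   # four 18GB disks
    print(array_a, array_b, array_a + array_b)
```

That gives 63GB and 54GB, 117GB nominal in total, which is in line with the roughly 110GB split into three volumes plus the 2GB swap kept outside the software RAID.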
Now, regarding the original subject: I'm thinking about doing an emerge -e world with new CFLAGS and checking whether it breaks anything, whether it runs faster, and whether I can squeeze all the juice out of these machines. So I'm about to start "benchmarking".
But, bearing in mind the -fact- that this won't recompile the Oracle stuff (Oracle doesn't compile anything on your machine, it just "links" stuff), I was thinking about "forcing" some CFLAGS onto glibc itself and onto other libs Oracle may use that can be optimized "the Gentoo way".
So, before I start this "adventure" X'D, I'm asking whether anybody has done something similar before: results, advice, CFLAGS!!!, even ~x86 stuff like gcc3.4 if it doesn't break oracle ;+), kernel schedulers (staircase?, cfq?, and io: anticipatory?, deadline?), kernel patches. Anything (ideas, insults, flames X'D) can be of help and will be appreciated.
(I may add that this is a sort of "competition" between me and the DBA, who says "the server must be Red Hat 8, since Oracle has certified it", while I say "the server must be tested first with Gentoo, since it will run faster and we need speed on the old machines we have"....
So in the end this will be a benchmark with a title like "Indestructible Redhat-Oracle" vs "Fast-Bleeding-Edge Gentoo-Oracle".)
So I hope the enthusiasm of all the Gentoo users can help me win this battle ;+)
Thanks a lot for reading all of this. I know it's very long, but I thought it would be nice to let people know that Gentoo is growing at scales like this :+)
Salu2,
Javier. _________________ wherever you go, there you are. |
 |
nxsty Veteran


Joined: 23 Jun 2004 Posts: 1556 Location: .se
|
Posted: Sun Nov 28, 2004 2:30 pm Post subject: Re: mbump |
|
|
| The_Paya wrote: | | So, before I start this "adventure" X'D, I'm asking whether anybody has done something similar before: results, advice, CFLAGS!!!, even ~x86 stuff like gcc3.4 if it doesn't break oracle ;+), kernel schedulers (staircase?, cfq?, and io: anticipatory?, deadline?), kernel patches. Anything (ideas, insults, flames X'D) can be of help and will be appreciated. |
Staircase is a CPU scheduler and the others are I/O schedulers, so they are completely different things. I think you should try staircase; it's the best-performing CPU scheduler available today, and it's faster than the standard O(1) scheduler in almost any situation.
You can read about the I/O schedulers here:
/usr/src/linux/Documentation/block
but I think deadline is probably the one you want, because it's supposed to be good for database loads.
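A short sketch for checking which I/O scheduler a disk is actually using, assuming a kernel that exposes `/sys/block/<device>/queue/scheduler` (on older kernels the scheduler is chosen at boot with the `elevator=` parameter instead). The function names and the sample line are illustrative; the kernel marks the active scheduler with square brackets.

```python
def active_scheduler(line):
    """Return the bracketed (active) scheduler from a sysfs scheduler line."""
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def read_scheduler(device):
    """Read the active I/O scheduler for a block device such as 'sda'."""
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())


if __name__ == "__main__":
    # Sample line in the format the kernel uses; no real device is read here.
    print(active_scheduler("noop anticipatory [deadline] cfq"))
```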
And use NPTL instead of LinuxThreads. NPTL is much faster! |
 |
The_Paya Developer


Joined: 29 Aug 2003 Posts: 23 Location: Argentina
|
Posted: Fri Dec 10, 2004 3:28 am Post subject: |
|
|
First of all, thanks for the reply. The server is now running with the staircase scheduler and the deadline I/O scheduler. I don't think Oracle is going to support NPTL; has anyone tested this? _________________ wherever you go, there you are. |
 |
asimon l33t


Joined: 27 Jun 2002 Posts: 979 Location: Germany, Old Europe
|
Posted: Fri Dec 10, 2004 10:30 am Post subject: |
|
|
| The_Paya wrote: | | I don't think Oracle is going to support NPTL; has anyone tested this? |
Not tested, but a Google search indicates that 10g does support NPTL and works without exporting LD_ASSUME_KERNEL=2.4.1. |
 |
The_Paya Developer


Joined: 29 Aug 2003 Posts: 23 Location: Argentina
|
Posted: Mon Dec 13, 2004 10:53 am Post subject: |
|
|
We're currently using Oracle 9.2i because of compatibility issues with newer versions :+/ _________________ wherever you go, there you are. |