Gentoo Forums
Help tweak out our Opteron cluster!!!

erikm
l33t

Joined: 08 Feb 2005
Posts: 634

Posted: Wed Jun 08, 2005 2:40 pm    Post subject: Help tweak out our Opteron cluster!!!

Hi all,

I am about to design and install a 16 CPU Opteron Beowulf cluster at our research department. I run Gentoo x86/~x86 on a Pentium 4, a Pentium M, a dual Pentium 3 and another, smaller Athlon MP cluster, but I have not built any complete AMD64 systems yet, just done intermittent maintenance. Being short on time, I'd like to minimize the guesswork as much as possible, hence this thread.

I have the following general wishlist / questions for the OS, in order of priority:

1. Stability. I need a rock solid toolchain, preferably as free from 32-bit emulation as possible. I have never had much trouble going ~x86 with the system on the other architectures in terms of stability, so what do you recommend, ~x86 or x86?

2. Optimization. Given a little more experience, I might well join the anti-ricer missionaries and preach the "i686 -O2" gospel to the dark side :) . However, every little tiny bit of performance increase counts here, and counts a lot; so go crazy (they'll have my ass for this :D ): What compiler flags do I choose?

3. Maintainability. It would be great to be able to schedule upgrades etc., and actually have them work, i.e. not having to constantly babysit the thing. Since we're talking Linux, and better still Gentoo Linux, there are a number of ways to achieve the same functionality with different software combinations. So, are there any packages I definitely should steer clear of?

4. Has anyone tried the new 2.6 OpenMosix patches on an Opteron?


Each slave node will consist of dual Opteron 248s and 4 GB RAM on a Tyan MB, with dual Broadcom Gb NICs and a 36 GB 15 K rpm SCSI drive. The slaves need to run as optimized, light and free from process queues as possible. Additionally, network latency is paramount. My slave specific questions are thus as follows:

1. I'm thinking
Code:
USE="-* sse sse2 3dnowext 3dnow mmx"
is a good choice, considering that the slaves won't provide any services whatsoever. Not sure if there even is a "3dnowext" flag... ;). Comments?

2. Kernel config do's and don'ts?

3. Which is the best performing NIC driver for this system?


The frontend or main node will be a dual Opteron 244, 4 GB RAM, same MB as the slaves with triple 10 K rpm SCSI drives in RAID 5 for storage and nfs mounts. It will in principle be a regular server, providing ip forwarding, firewall, ntp, dns, nfs etc. for the slaves. It will also run an X server and a DE with some form of web browser for simpler maintenance. I thought I'd try a few more security measures on it than I have so far. Questions concerning the frontend are thus:

1. Security. Any experience / comment with the SELinux / grsecurity kernel patches and the hardened gcc profiles (which do I choose? nopie? ssp? 8O ) is welcome. I will also run the standard sudo / rsa authenticated ssh / non standard ports stuff.

2. "General" question 3 above is important here. Which packages / USE flags do I avoid?

Also, the switch we are looking at can handle layer 3 routing. Do we really need that? I would guess layer 3 routing might affect latencies adversely?

Finally, for those of you who have been longing to share your showstopper HPC tips; here is your chance. Thanks all!

wazoo42
Apprentice

Joined: 13 Apr 2004
Posts: 165

Posted: Thu Jun 09, 2005 12:01 am

Are you using fortran? If so you really want to steer clear of gcc b/c its -i8 flag does not work properly. Also, for all out performance you will probably get more improvement going to pgi or pathscale. I have tried both for electronic structure calcs (GAMESS) and they are speedy.

Are you installing 32 bit gentoo on these? I would guess not, but the x86 flags have me wondering. In terms of gcc I would say a good start would be to have -march=opteron -O2 -fomit-frame-pointer -funroll-loops -pipe.
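
As a rough sketch, the flags suggested above would end up in /etc/make.conf along these lines. The CHOST value is the usual one for a 64-bit install, and setting CXXFLAGS to match CFLAGS is a common convention rather than something stated in this post, so treat both as assumptions.

```shell
# /etc/make.conf fragment (sketch only, not a tested configuration).
# CHOST is the usual value for a 64-bit amd64 install (assumption).
CHOST="x86_64-pc-linux-gnu"
# The flags suggested in the post above.
CFLAGS="-march=opteron -O2 -fomit-frame-pointer -funroll-loops -pipe"
# Assumption: reuse the same flags for C++.
CXXFLAGS="${CFLAGS}"
```

Whether -funroll-loops helps or hurts depends on the code, so it is worth benchmarking with and without it.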

erikm
l33t

Joined: 08 Feb 2005
Posts: 634

Posted: Thu Jun 09, 2005 8:54 am

Ah yes, I noticed I should have put amd64 as the architecture :oops: . So, do you suggest using icc or the PGI C compiler? Where would I have used the -i8 flag? Thanks! :)

wazoo42
Apprentice

Joined: 13 Apr 2004
Posts: 165

Posted: Thu Jun 09, 2005 12:13 pm

I have only used the demos, but pgi has much better support than pathscale. That being said, pathscale usually handily outperforms pgi. The -i8 comes in when you want fortran to use 64-bit integers.

brankob
Apprentice

Joined: 29 Apr 2004
Posts: 188

Posted: Sun Jun 12, 2005 6:55 pm

OT, I know, but I'm curious: why are you not using dual-core Opterons, like the 275?

That would mean only 4 motherboards instead of 8 and those are not exactly cheap, besides it could kill some of the net latency...

Unless of course you will be packing each CPU with as much ram as possible... :?

visaris
n00b

Joined: 10 Jun 2005
Posts: 11

Posted: Sun Jun 12, 2005 10:44 pm

Depending on your needs, the dual-core parts may be a great way to go. I have a Tyan K8WE (s2895) with two Opteron 265s (dual-core @ 1.8GHz). This thing is fast.. I'll admit it took me just a little while to get it all working, but it was worth it.

As for basic advice, just be sure you get a MB that has RAM banks connected to each CPU. Since each Opteron has its own on-board memory controller, make sure you get a motherboard that supports them all. I've seen some lame dual-proc boards that only use one mem controller.
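
One way to check this on a running box is to look at how much memory the kernel attaches to each NUMA node. The sketch below assumes a kernel built with NUMA support, which exposes per-node meminfo files under /sys/devices/system/node; the function name and the optional directory argument are made up for illustration.

```shell
# Print each NUMA node and its total memory, reading the kernel's sysfs tree.
# The base directory is a parameter (hypothetical convenience) so the function
# can also be pointed at a saved copy of the tree.
list_node_memory() {
    base="${1:-/sys/devices/system/node}"
    for f in "$base"/node*/meminfo; do
        [ -r "$f" ] || continue
        # Lines look like: "Node 0 MemTotal:  2097152 kB"
        awk '$3 == "MemTotal:" { print "node" $2, $4, $5 }' "$f"
    done
}
```

Running `list_node_memory` on a dual Opteron board should list two nodes with roughly equal memory. A node showing 0 kB, or fewer nodes than sockets, would suggest that one CPU's memory controller has no DIMMs behind it.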

get sirius
Guru

Joined: 27 Apr 2002
Posts: 316
Location: Madison, WI

Posted: Mon Jun 13, 2005 9:18 pm

Just a point to make about your proposed USE flags - don't put those specific flags there. "sse" and "sse2" are both implied by the line in your make.conf, CHOST="x86_64-pc-linux-gnu", so they do not need to be mentioned at all. If you have processors built on the 90 nm process, then they have SSE3 enabled. To make sure that gcc takes that into account, include "-msse3" in your CFLAGS line. And there is also no need to mention 3dnow, 3dnowext, or (and especially!) mmx.

I believe those USE flags are included in the list of USE flags mainly for the use of developers.
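
Before adding -msse3 it is worth confirming that the CPUs really report SSE3. The Linux kernel lists it in /proc/cpuinfo under the name "pni" rather than "sse3". A minimal sketch of such a check follows; the function name and the optional file argument are assumptions added for illustration.

```shell
# Succeed if the named flag appears in the first "flags" line of a
# /proc/cpuinfo-style file. The second argument defaults to /proc/cpuinfo
# and exists only so the function can be tried against a saved copy.
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Split the flags line into one token per line, then match a whole
    # token exactly so that "sse" does not match "sse2".
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -qxF -- "$flag"
}
```

For example, `has_cpu_flag pni && echo "SSE3 present"` prints the message only on CPUs that report the flag.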

Albert_Alligator
Apprentice

Joined: 12 May 2004
Posts: 193
Location: Okefenokee Swamp

Posted: Thu Jul 21, 2005 2:51 pm

If you have a large amount of memory, using the -O3 flag is fine for HPC type computing, but you should test it out with both. A good benchmark is the HPC challenge benchmark. I will be posting a full tutorial of how to install and make it work soon.

GCC really isn't the compiler of high performance computing, although it will do in a pinch. If you're using g77 you should be fine, anything newer and you'll get into trouble with GCC.

Both Pathscale and PGI compilers are fast, but they are spendy, however, if you're willing to give up the Opteron boards, Intel has a very good compiler that if you don't sell it with a commercial system, you can have for free.

But as we all know, Intel CPUs don't hold a candle to the Opterons, so you'll have to make the choice yourself.

In a preliminary warm up test with GCC, I did the HPL test from the HPC challenge site and clocked a whopping 9.7 Gigaflops on a Tyan S4882 4-way board with 4 800 series Opterons.

Well, I'm tuning and I hope to break the 10 Gigaflop barrier, and who knows, maybe even get as high as 15 Gigaflops.

We'll have to wait and see.
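
For context, an HPL result is usually judged against theoretical peak, which is sockets x clock x floating-point operations per cycle. The sketch below does that arithmetic. The post does not give the clock speed, so the 2.2 GHz used in the usage note is purely an assumption, as is the figure of 2 double-precision operations per cycle for these single-core Opterons. The function names are made up.

```shell
# Theoretical peak in GFLOPS: sockets * clock (GHz) * flops per cycle.
peak_gflops() {
    LC_ALL=C awk -v n="$1" -v ghz="$2" -v fpc="$3" \
        'BEGIN { printf "%.1f\n", n * ghz * fpc }'
}

# Measured result as a whole-number percentage of the theoretical peak.
hpl_efficiency() {
    LC_ALL=C awk -v measured="$1" -v peak="$2" \
        'BEGIN { printf "%.0f\n", 100 * measured / peak }'
}
```

Under those assumptions, `peak_gflops 4 2.2 2` gives 17.6 and `hpl_efficiency 9.7 17.6` gives 55, so 9.7 Gigaflops would be a bit over half of peak. With a different clock speed the percentage changes accordingly.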

Al
_________________
As Socrates once said "I drank what?"

erikm
l33t

Joined: 08 Feb 2005
Posts: 634

Posted: Fri Jul 22, 2005 8:18 am

Albert_Alligator wrote:
If you have a large amount of memory, using the -O3 flag is fine for HPC type computing, but you should test it out with both. A good benchmark is the HPC challenge benchmark. I will be posting a full tutorial of how to install and make it work soon.

GCC really isn't the compiler of high performance computing, although it will do in a pinch. If you're using g77 you should be fine, anything newer and you'll get into trouble with GCC.

Both Pathscale and PGI compilers are fast, but they are spendy, however, if you're willing to give up the Opteron boards, Intel has a very good compiler that if you don't sell it with a commercial system, you can have for free.

But as we all know, Intel CPUs don't hold a candle to the Opterons, so you'll have to make the choice yourself.

In a preliminary warm up test with GCC, I did the HPL test from the HPC challenge site and clocked a whopping 9.7 Gigaflops on a Tyan S4882 4-way board with 4 800 series Opterons.

Well, I'm tuning and I hope to break the 10 Gigaflop barrier, and who knows, maybe even get as high as 15 Gigaflops.

We'll have to wait and see.

Al
Super. I'm looking forward to your tutorial. Reading up on the pathscale compiler, it seems they claim it is completely transparent with gcc - to the point that a trivial wrapper script would let me use it as gcc. Don't know if that is true, but if it is, using Pathscale would also fix my Fortran 90 needs. Trying to incorporate IFC with gcc is a major bitch, IMO. I've had to compile dual libs for most Fortran code, which I would rather not do.
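
Such a wrapper could be as small as the sketch below. It only forwards every argument unchanged; whether PathScale really accepts all the gcc options a given build passes is exactly the untested part. REAL_CC is a hypothetical variable introduced here, and pathcc is assumed to be the name of PathScale's C driver on the PATH.

```shell
# Forward all arguments, unchanged, to another compiler driver.
# REAL_CC is a hypothetical override; the pathcc default is an assumption.
cc_wrapper() {
    "${REAL_CC:-pathcc}" "$@"
}
```

Saved as an executable script named gcc in a directory early in PATH, with the body written as `exec "${REAL_CC:-pathcc}" "$@"`, it would make builds that call gcc run the other compiler instead.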

Any tuning tips you could share would be most gratefully received. I'll look in on this thread from time to time. Thanks!!! :D

Btw - the frontend for the mainframe just arrived yesterday (2x Opteron 244, 8 GB PC 3200, SATA II RAID 6), and it did glibc with nptl in just over half an hour :twisted:

Desti²
Tux's lil' helper

Joined: 06 Sep 2003
Posts: 127

Posted: Fri Jul 22, 2005 9:08 am

Some POVRAY Benches GCC vs. Icc: http://pov4grasp.free.fr/articles/fastpov1/ :)
_________________
Linux Users Everywhere @ climateprediction.net

dweigert
Guru

Joined: 04 Oct 2002
Posts: 369
Location: Somerset, NJ USA

Posted: Fri Jul 22, 2005 5:25 pm

Folks,
Just remember that the Intel compiler suite has been intentionally crippled by checking for Genuine Intel, before checking the processor capability flags.
Any other manufacturer's CPU will be routed to the *Slowest* code paths, including ones that truly are brain damaged.
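
To make the ordering concrete, here is a sketch of the two dispatch strategies. It is not Intel's code and is not taken from any compiler. It only contrasts "check the vendor string first" with "check the capability flags", using /proc/cpuinfo as a stand-in for CPUID. All function names and the path labels "generic" and "sse2" are made up.

```shell
# Return the value of the first matching /proc/cpuinfo field (e.g. vendor_id).
cpu_field() {
    sed -n "s/^$1[[:space:]]*:[[:space:]]*//p" "$2" | head -n 1
}

# Vendor-first dispatch, as the post describes: any CPU that does not
# identify as GenuineIntel gets the generic path, whatever it supports.
pick_by_vendor() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if [ "$(cpu_field vendor_id "$cpuinfo")" != "GenuineIntel" ]; then
        echo generic
        return
    fi
    pick_by_features "$cpuinfo"
}

# Feature-based dispatch: look only at what the CPU says it can do.
pick_by_features() {
    cpuinfo="${1:-/proc/cpuinfo}"
    case " $(cpu_field flags "$cpuinfo") " in
        *" sse2 "*) echo sse2 ;;
        *)          echo generic ;;
    esac
}
```

On an Opteron, `pick_by_vendor` prints generic while `pick_by_features` prints sse2, which is the gap being complained about.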

Dan
_________________
"Always remember to mount a scratch monkey..."

erikm
l33t

Joined: 08 Feb 2005
Posts: 634

Posted: Fri Jul 22, 2005 5:59 pm

That doesn't change the fact that gcc obviously is the better choice for an Opteron, though.

nephros
Advocate

Joined: 07 Feb 2003
Posts: 2139
Location: Graz, Austria (Europe - no kangaroos.)

Posted: Fri Jul 22, 2005 6:18 pm

dweigert wrote:
Folks,
Just remember that the Intel compiler suite has been intentionally crippled by checking for Genuine Intel, before checking the processor capability flags.
Any other manufacturer's CPU will be routed to the *Slowest* code paths, including ones that truly are brain damaged.

Interesting.
Would you happen to have a link handy?

Just to cool down my FUDmeter(TM), although it does make sense of course.
_________________
Please put [SOLVED] in your topic if you are a moron.

piwacet
Guru

Joined: 30 Dec 2004
Posts: 486

Posted: Sat Jul 23, 2005 4:19 am

http://www.swallowtail.org/naughty-intel.html

nephros
Advocate

Joined: 07 Feb 2003
Posts: 2139
Location: Graz, Austria (Europe - no kangaroos.)

Posted: Sat Jul 23, 2005 6:54 pm

piwacet wrote:
http://www.swallowtail.org/naughty-intel.html

very interesting, thanks!
_________________
Please put [SOLVED] in your topic if you are a moron.
All times are GMT