erikm l33t
Joined: 08 Feb 2005 Posts: 634
|
Posted: Wed Jun 08, 2005 2:40 pm Post subject: Help tweak out our Opteron cluster!!! |
|
|
Hi all,
I am about to design and install a 16-CPU Opteron Beowulf cluster at our research department. I run Gentoo x86/~x86 on a Pentium 4, a Pentium M, a dual Pentium 3 and another, smaller Athlon MP cluster, but I have not yet built a complete AMD64 system, only done intermittent maintenance on one. Being short on time, I'd like to minimize the guesswork as much as possible, hence this thread.
I have the following general wishlist / questions for the OS, in order of priority:
1. Stability. I need a rock-solid toolchain, preferably as free from 32-bit emulation as possible. I have never had much trouble running the system on ~x86 on the other architectures in terms of stability, so what do you recommend, ~x86 or x86?
2. Optimization. Given a little more experience, I might well join the anti-ricer missionaries and preach the "i686 -O2" gospel to the dark side. However, every tiny bit of performance counts here, and counts a lot, so go crazy (they'll have my ass for this): what compiler flags do I choose?
3. Maintainability. It would be great to be able to schedule upgrades etc. and actually have them work, i.e. without having to constantly babysit the thing. Since we're talking Linux, and better still Gentoo Linux, there are a number of ways to achieve the same functionality with different software combinations. So, are there any packages I definitely should steer clear of?
4. Has anyone tried the new 2.6 OpenMosix patches on an Opteron?
Each slave node will consist of dual Opteron 248s and 4 GB RAM on a Tyan motherboard, with dual Broadcom Gigabit NICs and a 36 GB 15K rpm SCSI drive. The slaves need to run as optimized, light and free from process queues as possible. Additionally, network latency is paramount. My slave-specific questions are thus as follows:
1. I'm thinking USE="-* sse sse2 3dnowext 3dnow mmx" is a good choice, considering that the slaves won't provide any services whatsoever. Not sure if there even is a "3dnowext" flag. Comments?
2. Kernel config do's and don'ts?
3. Which is the best performing NIC driver for this system?
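Whichever NIC driver ends up in the kernel, it helps to measure latency instead of guessing. Below is a rough Python sketch of a TCP ping-pong test. It is not from this thread. One node echoes fixed-size messages back, and the other node times the round trips and reports half the median as one-way latency. Python adds interpreter overhead to every round trip, so treat the numbers as relative only (driver A vs. driver B, switch setting X vs. Y). They are not what an MPI library will see.

```python
import socket
import statistics
import threading
import time


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes; TCP may split one message into pieces."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_loop(conn: socket.socket, size: int, rounds: int) -> None:
    """Send every `size`-byte message straight back, `rounds` times."""
    for _ in range(rounds):
        conn.sendall(recv_exact(conn, size))


def pingpong_latency_us(sock: socket.socket, size: int = 64, rounds: int = 1000) -> float:
    """Median one-way latency in microseconds (half of each round trip)."""
    payload = b"x" * size
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        sock.sendall(payload)
        recv_exact(sock, size)
        samples.append((time.perf_counter() - start) / 2 * 1e6)
    return statistics.median(samples)


def serve_echo(port: int, size: int = 64, rounds: int = 1000) -> None:
    """Run on the remote node: accept one client and echo its messages."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            # Disable Nagle's algorithm so small replies go out immediately.
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            echo_loop(conn, size, rounds)


def connect(host: str, port: int) -> socket.socket:
    """Run on the measuring node: open a TCP connection with Nagle disabled."""
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def measure_loopback(size: int = 64, rounds: int = 200) -> float:
    """Sanity-check the harness on one machine via a local socket pair."""
    a, b = socket.socketpair()
    with a, b:
        worker = threading.Thread(target=echo_loop, args=(b, size, rounds))
        worker.start()
        latency = pingpong_latency_us(a, size, rounds)
        worker.join()
    return latency
```

For a real test, run `serve_echo(5001)` on a slave (the port is a hypothetical choice). Then call `pingpong_latency_us(connect("node01", 5001))` from the frontend, where `node01` is a hypothetical host name. Keep `size` and `rounds` identical on both ends.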
The frontend or main node will be a dual Opteron 244 with 4 GB RAM and the same motherboard as the slaves. It will have three 10K rpm SCSI drives in RAID 5 for storage and NFS mounts. It will in principle be a regular server, providing IP forwarding, firewall, NTP, DNS, NFS etc. for the slaves. It will also run an X server and a DE with some form of web browser for simpler maintenance. I thought I'd apply more security measures on it than I have so far. Questions concerning the frontend are thus:
1. Security. Any experience with or comments on the SELinux / grsecurity kernel patches and the hardened gcc profiles (which do I choose? nopie? ssp?) are welcome. I will also run the standard sudo / RSA-authenticated ssh / non-standard ports stuff.
2. "General" question 3 above is important here. Which packages / USE flags do I avoid?
Also, the switch we are looking at can handle layer 3 routing. Do we really need that? I would guess layer 3 routing might affect latencies adversely?
Finally, for those of you who have been longing to share your showstopper HPC tips: here is your chance. Thanks all! |
|
|
|
wazoo42 Apprentice
Joined: 13 Apr 2004 Posts: 165
|
Posted: Thu Jun 09, 2005 12:01 am Post subject: |
|
|
Are you using Fortran? If so, you really want to steer clear of gcc, because its -i8 flag does not work properly. Also, for all-out performance you will probably get more improvement going to PGI or PathScale. I have tried both for electronic structure calculations (GAMESS) and they are speedy.
Are you installing 32-bit Gentoo on these? I would guess not, but the x86 flags have me wondering. In terms of gcc, I would say a good start would be CFLAGS="-march=opteron -O2 -fomit-frame-pointer -funroll-loops -pipe". |
|
|
|
erikm l33t
Joined: 08 Feb 2005 Posts: 634
|
Posted: Thu Jun 09, 2005 8:54 am Post subject: |
|
|
Ah yes, I should have put amd64 as the architecture. So, do you suggest using icc or the PGI C compiler? Where would I have used the -i8 flag? Thanks! |
|
|
|
wazoo42 Apprentice
Joined: 13 Apr 2004 Posts: 165
|
Posted: Thu Jun 09, 2005 12:13 pm Post subject: |
|
|
I have only used the demos, but PGI has much better support than PathScale. That being said, PathScale usually handily outperforms PGI. The -i8 flag comes in when you want Fortran to use 64-bit integers by default. |
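To see why the default integer size matters, here is a small Python sketch that is not from the thread. It mimics what a 32-bit default INTEGER does to a value that no longer fits. The example value is the linear index of the last element of a 50000 x 50000 array, a hypothetical case. Strictly speaking, the Fortran standard leaves integer overflow undefined. Wrapping around as below is simply what typical compilers on two's-complement hardware end up doing, usually without any warning.

```python
def as_int32(value: int) -> int:
    """What a 32-bit two's-complement integer holds after wraparound."""
    return (value + 2**31) % 2**32 - 2**31


def as_int64(value: int) -> int:
    """Same for a 64-bit integer, i.e. an 8-byte default INTEGER."""
    return (value + 2**63) % 2**64 - 2**63


# Hypothetical example: 1-based linear index of the last element of an
# n x n array, which is simply n * n.
n = 50_000
index = n * n

print("exact:         ", index)
print("32-bit INTEGER:", as_int32(index))
print("64-bit INTEGER:", as_int64(index))
```

The exact index is 2,500,000,000, which exceeds 2^31 - 1. The 32-bit version therefore comes out as -1794967296, while the 64-bit one is unchanged. Compilers whose -i8 flag makes default INTEGERs 8 bytes (PGI and Intel, for example) avoid this without touching the source.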
|
|
|
brankob Apprentice
Joined: 29 Apr 2004 Posts: 188
|
Posted: Sun Jun 12, 2005 6:55 pm Post subject: |
|
|
OT, I know, but I'm curious: why are you not using dual-core Opterons, like the 275?
That would mean only 4 motherboards instead of 8, and those are not exactly cheap; besides, it could cut some of the network latency...
Unless, of course, you will be packing each CPU with as much RAM as possible... |
|
|
|
visaris n00b
Joined: 10 Jun 2005 Posts: 11
|
Posted: Sun Jun 12, 2005 10:44 pm Post subject: |
|
|
Depending on your needs, the dual-core parts may be a great way to go. I have a Tyan K8WE (S2895) with two Opteron 265s (dual-core @ 1.8 GHz). This thing is fast. I'll admit it took me a little while to get it all working, but it was worth it.
As for basic advice, just be sure you get a motherboard that has RAM banks connected to each CPU. Since each Opteron has its own on-board memory controller, make sure you get a motherboard that supports them all. I've seen some lame dual-processor boards that only use one memory controller. |
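One way to check on a running system that every CPU really has memory behind its controller is to look at the kernel's per-node memory totals. The sketch below assumes a NUMA-enabled kernel that exposes /sys/devices/system/node/nodeN/meminfo with lines like "Node 0 MemTotal: 4046620 kB". Verify that path and format on your own kernel. On a correctly populated dual-Opteron board you would expect two nodes, each with a nonzero MemTotal. A missing node, or a node reporting 0 kB, suggests that one CPU's banks are empty or unused.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed line format in /sys/devices/system/node/nodeN/meminfo.
MEMTOTAL = re.compile(r"^Node\s+(\d+)\s+MemTotal:\s+(\d+)\s+kB", re.MULTILINE)


def parse_memtotal(text: str) -> tuple[int, int] | None:
    """Return (node id, MemTotal in kB) from one meminfo file, if present."""
    match = MEMTOTAL.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def node_memory_kb(root: str = "/sys/devices/system/node") -> dict[int, int]:
    """Map NUMA node id -> MemTotal (kB) for every node directory found."""
    totals = {}
    for meminfo in Path(root).glob("node[0-9]*/meminfo"):
        parsed = parse_memtotal(meminfo.read_text())
        if parsed is not None:
            node, kb = parsed
            totals[node] = kb
    return totals


def numa_problems(totals: dict[int, int], sockets: int) -> list[str]:
    """List reasons to suspect that not every CPU has its own memory."""
    problems = []
    if len(totals) < sockets:
        problems.append(f"only {len(totals)} NUMA node(s) for {sockets} sockets")
    for node, kb in sorted(totals.items()):
        if kb == 0:
            problems.append(f"node {node} reports no memory")
    return problems
```

On a dual-socket node, `numa_problems(node_memory_kb(), sockets=2)` should return an empty list. A kernel built without NUMA support will show a single node at most, so the check only makes sense on a NUMA kernel.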
|
|
|
get sirius Guru
Joined: 27 Apr 2002 Posts: 316 Location: Madison, WI
|
Posted: Mon Jun 13, 2005 9:18 pm Post subject: |
|
|
Just a point to make about your proposed USE flags: don't put those specific flags there. "sse" and "sse2" are both implied by the line in your make.conf, CHOST="x86_64-pc-linux-gnu", so they do not need to be mentioned at all. If you have processors built on a 90 nm process, they support SSE3. To make sure that gcc takes that into account, include "-msse3" in your CFLAGS line. There is also no need to mention 3dnow, 3dnowext, or (especially!) mmx.
I believe those USE flags are included in the list of USE flags mainly for the use of developers. |
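Before adding -msse3, it is worth checking that every node actually reports SSE3. With that flag gcc may emit SSE3 instructions, and a binary containing them dies with an illegal-instruction error on a CPU that lacks them. Note that Linux lists SSE3 in /proc/cpuinfo as "pni" (Prescott New Instructions), not "sse3". Here is a minimal Python sketch that parses the flags line:

```python
from __future__ import annotations

from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_report(flags: set[str]) -> dict[str, bool]:
    """Which of the instruction sets discussed in this thread are present."""
    return {
        "mmx": "mmx" in flags,
        "3dnow": "3dnow" in flags,
        "3dnowext": "3dnowext" in flags,
        "sse": "sse" in flags,
        "sse2": "sse2" in flags,
        # The kernel names SSE3 "pni"; also accept "sse3" if spelled that way.
        "sse3": "pni" in flags or "sse3" in flags,
        "64-bit (lm)": "lm" in flags,
    }


def local_report(path: str = "/proc/cpuinfo") -> dict[str, bool]:
    """Report for the machine this runs on (Linux only)."""
    return simd_report(cpu_flags(Path(path).read_text()))
```

Running `local_report()` on each node (by hand or over ssh) shows quickly whether any node lacks SSE3 before a -msse3 build goes out.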
|
|
|
Albert_Alligator Apprentice
Joined: 12 May 2004 Posts: 193 Location: Okefenokee Swamp
|
Posted: Thu Jul 21, 2005 2:51 pm Post subject: |
|
|
If you have a large amount of memory, using the -O3 flag is fine for HPC-type computing, but you should test with both -O2 and -O3. A good benchmark is the HPC Challenge benchmark. I will be posting a full tutorial on how to install it and make it work soon.
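A minimal way to test both is to build the same program twice, once with -O2 and once with -O3, and compare median wall-clock times over several runs. The Python sketch below does only the timing. The binary names in the comment (./bench-O2, ./bench-O3) are hypothetical. For HPL itself, comparing the GFLOPS figure it reports is more direct.

```python
from __future__ import annotations

import statistics
import subprocess
import sys
import time


def median_runtime(cmd: list[str], repeats: int = 5) -> float:
    """Run `cmd` `repeats` times and return the median wall-clock seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def speedup(baseline: list[str], candidate: list[str], repeats: int = 5) -> float:
    """Baseline time divided by candidate time; > 1 means candidate is faster."""
    return median_runtime(baseline, repeats) / median_runtime(candidate, repeats)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g.  python3 compare.py ./bench-O2 ./bench-O3
    print(f"speedup of {sys.argv[2]} over {sys.argv[1]}: "
          f"{speedup([sys.argv[1]], [sys.argv[2]]):.3f}x")
```

The sketch uses the median rather than the mean so that one run disturbed by other activity on the node does not skew the comparison.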
GCC really isn't the compiler of high performance computing, although it will do in a pinch. If you're using g77 you should be fine; anything newer and you'll get into trouble with GCC.
Both the PathScale and PGI compilers are fast, but they are spendy. However, if you're willing to give up the Opteron boards, Intel has a very good compiler that you can have for free, as long as you don't sell it with a commercial system.
But as we all know, Intel CPUs don't hold a candle to the Opterons, so you'll have to make the choice yourself.
In a preliminary warm-up test with GCC, I ran the HPL test from the HPC Challenge site and clocked a whopping 9.7 GFLOPS on a Tyan S4882 4-way board with four 800-series Opterons.
Well, I'm tuning and I hope to break the 10 GFLOPS barrier, and who knows, maybe even get as high as 15 GFLOPS.
We'll have to wait and see.
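To judge how good a figure like 9.7 GFLOPS is, compare it with the theoretical peak (Rpeak). Rpeak is sockets x cores per socket x clock x double-precision flops per cycle. For the single-core K8 Opteron, that last factor is 2 (one add and one multiply per cycle). The post doesn't say which 800-series part was used, so the 2.2 GHz below (an Opteron 848) is a hypothetical choice; plug in the real clock.

```python
def peak_gflops(sockets: int, ghz: float, cores_per_socket: int = 1,
                flops_per_cycle: int = 2) -> float:
    """Theoretical double-precision peak (Rpeak) in GFLOPS.

    flops_per_cycle=2 matches the K8 Opteron (one add + one multiply per cycle).
    """
    return sockets * cores_per_socket * ghz * flops_per_cycle


def hpl_efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Fraction of peak actually achieved by an HPL run (Rmax / Rpeak)."""
    return rmax_gflops / rpeak_gflops


# Hypothetical: four Opteron 848s at 2.2 GHz on the S4882, 9.7 GFLOPS measured.
board_peak = peak_gflops(sockets=4, ghz=2.2)
print(f"S4882 peak {board_peak:.1f} GFLOPS, "
      f"efficiency {hpl_efficiency(9.7, board_peak):.0%}")

# Assuming all 16 CPUs of the planned cluster are Opteron 248s (2.2 GHz).
print(f"cluster peak {peak_gflops(sockets=16, ghz=2.2):.1f} GFLOPS")

# Same core count on dual-core Opteron 275s (2.2 GHz): 8 sockets on 4 boards.
print(f"dual-core peak {peak_gflops(sockets=8, ghz=2.2, cores_per_socket=2):.1f} GFLOPS")
```

Under those assumptions the board peak is 17.6 GFLOPS, so 9.7 GFLOPS is about 55% of peak. Sixteen 2.2 GHz single-core Opterons give 70.4 GFLOPS, the same as eight dual-core 275s.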
Al _________________ As Socrates once said "I drank what?" |
|
|
|
erikm l33t
Joined: 08 Feb 2005 Posts: 634
|
Posted: Fri Jul 22, 2005 8:18 am Post subject: |
|
|
Albert_Alligator wrote: | If you have a large amount of memory, using the -O3 flag is fine for HPC type computing, but you should test it out with both. A good benchmark is the HPC challenge benchmark. I will be posting a full tutorial of how to install and make it work soon.
[...]
Al | Super. I'm looking forward to your tutorial. Reading up on the PathScale compiler, it seems they claim it is completely compatible with gcc, to the point that a trivial wrapper script would let me use it as gcc. I don't know if that is true, but if it is, using PathScale would also cover my Fortran 90 needs. Trying to combine IFC with gcc is a major bitch, IMO: I've had to compile dual libs for most Fortran code, which I would rather avoid.
Any tuning tips you could share would be most gratefully received. I'll look in on this thread from time to time. Thanks!!!
Btw, the frontend for the mainframe arrived yesterday (2x Opteron 244, 8 GB PC3200, SATA II RAID 6), and it built glibc with NPTL in just over half an hour |
|
|
|
Desti² Tux's lil' helper
Joined: 06 Sep 2003 Posts: 127
|
|
|
|
dweigert Guru
Joined: 04 Oct 2002 Posts: 369 Location: Somerset, NJ USA
|
Posted: Fri Jul 22, 2005 5:25 pm Post subject: |
|
|
Folks,
Just remember that the Intel compiler suite has been intentionally crippled: it checks for the "GenuineIntel" vendor string before checking the processor capability flags.
Any other manufacturer's CPU will be routed to the *slowest* code paths, including ones that truly are brain damaged.
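To make the difference concrete, here is a toy Python sketch of the two dispatch strategies being contrasted. One checks the vendor string first; the other looks only at the capability flags. It illustrates the claim and is not Intel's actual code. The path names are made up, and the flag names follow /proc/cpuinfo, where SSE3 appears as "pni".

```python
from __future__ import annotations


def feature_based_path(flags: set[str]) -> str:
    """Pick the best code path the CPU says it supports, whoever made it."""
    for flag, path in (("pni", "sse3"), ("sse2", "sse2"), ("sse", "sse")):
        if flag in flags:
            return path
    return "generic"


def vendor_gated_path(vendor: str, flags: set[str]) -> str:
    """Vendor check first, as alleged above: non-Intel CPUs never get past it."""
    if vendor != "GenuineIntel":
        return "generic"
    return feature_based_path(flags)


opteron_flags = {"mmx", "sse", "sse2", "pni", "lm", "3dnow", "3dnowext"}
print(feature_based_path(opteron_flags))                 # sse3
print(vendor_gated_path("AuthenticAMD", opteron_flags))  # generic
```

The same SSE3-capable Opteron gets the fast path from the feature-based check but the generic path from the vendor-gated one.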
Dan _________________ "Always remember to mount a scratch monkey..." |
|
|
|
erikm l33t
Joined: 08 Feb 2005 Posts: 634
|
Posted: Fri Jul 22, 2005 5:59 pm Post subject: |
|
|
That doesn't change the fact that gcc obviously is the better choice for an Opteron, though. |
|
|
|
nephros Advocate
Joined: 07 Feb 2003 Posts: 2139 Location: Graz, Austria (Europe - no kangaroos.)
|
Posted: Fri Jul 22, 2005 6:18 pm Post subject: |
|
|
dweigert wrote: | Folks,
Just remember that the Intel compiler suite has been intentionally crippled by checking for Genuine Intel, before checking the processor capability flags.
Any other manufacturer's CPU will be routed to the *Slowest* code paths, including ones that truly are brain damaged. |
Interesting.
Would you happen to have a link handy?
Just to cool down my FUDmeter(TM), although it does make sense of course. _________________ Please put [SOLVED] in your topic if you are a moron. |
|
|
|
piwacet Guru
Joined: 30 Dec 2004 Posts: 486
|
|
|
|
nephros Advocate
Joined: 07 Feb 2003 Posts: 2139 Location: Graz, Austria (Europe - no kangaroos.)
|
Posted: Sat Jul 23, 2005 6:54 pm Post subject: |
|
|
piwacet wrote: | http://www.swallowtail.org/naughty-intel.html |
very interesting, thanks! _________________ Please put [SOLVED] in your topic if you are a moron. |
|
|
|
|
|
|
|