qubex Tux's lil' helper
Joined: 06 Mar 2003 Posts: 133 Location: Portland
|
Posted: Thu Mar 13, 2003 6:43 pm Post subject: openmosix from the beginning... |
|
|
I just finished installing a network of 4 computers running Gentoo and the latest openmosix kernel (2 P4s, 2 PIIs). This is my first experience with openmosix and I can't believe I've never used it before. It works like a champ! Compile times have greatly decreased.
My question is this: is there a way to enable openmosix from, say, stage 1 or 2? Perhaps a live CD with multiple kernel images (including one with the latest OM kernel)? This would have vastly increased my compile times on all the PIIs, I think. Would this be possible to do?
Thanks! |
|
qubex Tux's lil' helper
Joined: 06 Mar 2003 Posts: 133 Location: Portland
|
Posted: Thu Mar 13, 2003 6:44 pm Post subject: openmosix from the beginning... |
|
|
I guess I should preview before I submit.. I meant to say this would have vastly DECREASED my compile times on the PIIs. Whoops! |
|
m1kee n00b
Joined: 17 Dec 2002 Posts: 9
|
Posted: Thu Mar 13, 2003 7:24 pm Post subject: |
|
|
I tried openmosix a while ago, but it wouldn't compile. Still, the idea of having an openmosix kernel available at install time would be interesting, especially where one already has a 40+ machine cluster crunching SETI (yes, these things do exist); it could decrease things quite a bit.
And FYI, the Edit button is also such a lovely thing to use _________________ delirium tremens |
|
dennis Tux's lil' helper
Joined: 06 Jun 2002 Posts: 84
|
Posted: Fri Mar 14, 2003 12:30 am Post subject: |
|
|
hi,
I am planning to get my computers running, too. What documentation did you use?
dennis |
|
qubex Tux's lil' helper
Joined: 06 Mar 2003 Posts: 133 Location: Portland
|
Posted: Fri Mar 14, 2003 5:57 pm Post subject: openMosix briefly... |
|
|
Believe it or not I did not use anything other than the FAQ and howto. It was so much easier than I thought - the process is as follows:
1. Make sure all machines are running the same kernel and same version of openmosix (made for that kernel). If not, recompile. Code: | #emerge openmosix-sources | and then compile as normal, making sure to include openmosix in the kernel config. A list of kernel opts is available at the openmosix homepage (url below).
2. Edit /etc/mosix.map and include all the computers running openmosix kernels. This does not work well in DHCP environments unless the clients have static addresses or hostnames are set properly. This is the hardest part, and if you run across any problems READ THE HOWTO!!! (Don't take my word on this: I run DHCP and I just know I had to "hardcode" the IPs in this file, but as I'm running "dedicated IPs" off of DHCP it did not affect me.)
3. Install openMosixview - add openmosix to your init.d files and start it. Code: | #rc-update add openmosix default | and then Code: | #/etc/init.d/openmosix start |
4. Repeat steps 2 and 3 on all openMosix machines.
5. Run openmosixview - it should show all your machines and their "mosix up" status. Voila! You have a cluster.
This does not include the ability to modify the machines remotely via the openmosixview program (which needs a "passwordless" ssh login). This is also a very VERY basic "pool" type setup that may not be suitable for all purposes.
More information on configuring a key-authenticated passwordless ssh login can be found at http://www.openmosixview.com/ssh.html, and for more information on configuring/running/whatever openMosix go to http://openmosix.sourceforge.net/.
Note - after bootstrapping and "emerge world" is complete, you can then install the openmosix kernel. Basically I was looking for a way to be able to have this activated on bootup of the stage or livecds.. This would allow the cluster to help with the bootstrapping process.. Not that it takes long mind you, but it should be possible.. Maybe I should "roll my own" livecd..
Just a disclaimer I've found - not all compiles "like" distributed processing. I had to disable the makeopt of -j in /etc/make.conf for xfree-base to install and also for kde-base. Everything else has compiled and run smoothly. Then again, I'm not using pmake..... I know, I know.... |
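The -j workaround above can be scripted instead of editing /etc/make.conf back and forth. This is a minimal sketch, assuming Portage picks up MAKEOPTS from the environment in preference to make.conf, and that the two packages named above are the only ones needing a serial build. The function name makeopts_for and the SERIAL_PKGS list are made up for illustration.

```shell
#!/bin/sh
# Packages reported above as failing with a parallel make.
# The list is an assumption; extend it as you find others.
SERIAL_PKGS="xfree-base kde-base"

# Print the MAKEOPTS value to use for a package name:
# -j1 for packages on the serial list, the given default (or -j2) otherwise.
makeopts_for() {
    pkg=$1
    default=${2:--j2}
    for p in $SERIAL_PKGS; do
        if [ "$p" = "$pkg" ]; then
            printf '%s\n' "-j1"
            return 0
        fi
    done
    printf '%s\n' "$default"
}

# Dry run: show what would be used, without calling emerge.
for pkg in xfree-base kde-base mozilla; do
    printf '%s: MAKEOPTS=%s\n' "$pkg" "$(makeopts_for "$pkg")"
done
```

A real run would then look like MAKEOPTS="$(makeopts_for kde-base)" emerge kde-base, which leaves make.conf untouched for everything else.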
|
dol-sen Retired Dev
Joined: 30 Jun 2002 Posts: 2805 Location: Richmond, BC, Canada
|
Posted: Fri Mar 14, 2003 6:44 pm Post subject: |
|
|
I'm just starting to cluster 3 old PCs I have. BTW, distcc is supposed to compile faster than openMosix on the same PCs.
Brian |
|
qubex Tux's lil' helper
Joined: 06 Mar 2003 Posts: 133 Location: Portland
|
Posted: Fri Mar 14, 2003 6:58 pm Post subject: |
|
|
Perhaps so, but from what I've read distcc is not as "easy" to use as openmosix (and from what I've heard, distcc is almost as quick as using pmake on OM). Plus it will not help with other processor-intensive activities. And if you have a process that bogs down while you are compiling with distcc you will have to deal with it, while if you are running OM your process will migrate to a less busy node, and if you have a really empty node you will not notice any sort of sluggishness at all!
There are a lot of benefits to having a distributed computing network for general tasks..
Does anyone have any benchmarks or hard proof that distcc is faster/slower than pmake? What about numbers of distcc vs make? |
|
puddpunk l33t
Joined: 20 Jul 2002 Posts: 681 Location: New Zealand
|
Posted: Fri Mar 14, 2003 10:56 pm Post subject: |
|
|
I have access to a 64 node Beowulf.
Each node has dual Athlon 2100MP's and 512 MB of RAM. Currently it's running MPI, which is old and outdated, but I'm in the process of convincing the admin to run openMosix and Gentoo!
Yipee!
The only problem is that AFAIK, openMosix doesn't have a batch queue system like MPI does (we use PBS, Portable Batch System). Does any guru know of a queueing system for openMosix? Because MPI is such a pain in the behind. |
|
qubex Tux's lil' helper
Joined: 06 Mar 2003 Posts: 133 Location: Portland
|
Posted: Fri Mar 14, 2003 11:14 pm Post subject: |
|
|
This is probably starting to get off topic, but here is a link that pretty well states that Quote: | openMosix and MPI are like bread and peanut butter, they just love each other. |
|
|
jimbo n00b
Joined: 17 Apr 2003 Posts: 31
|
Posted: Wed May 21, 2003 7:18 pm Post subject: Install manually |
|
|
I was able to get MOSIX 1.9 to run on Gentoo clustered to my Slackware box without using portage. Here's how I did it.
1.) Emerge the vanilla-sources kernel
2.) Download the MOSIX 1.9 patch from here: http://www.mosix.org/txt_distribution.html
3.) Download the MOSIX 1.9 user level Tools from that same site
4.) Patch your vanilla-sources with the MOSIX 1.9 patch
5.) Config the MOSIX option with
[x] MOSIX process migration
[x] MOSIX file system
6.) Compile and install your new kernel. Reboot.
7.) Install the MOSIX 1.9 user level Tools (do a manual install according to the README)
8.) Copy the mosix.init script to /etc/init.d/openmosix. Add openmosix to your default runlevel (#rc-update add openmosix default)
9.) Make your mfs dir. (#mkdir /mfs)
10.) Edit your /etc/fstab by adding:
mymfs /mfs mfs defaults 0 0
11.) Start MOSIX (#/etc/init.d/openmosix start)
12.) MOSIX will prompt you to create your mosix.map file. Add your nodes as follows:
1 192.168.1.1 1
2 192.168.1.2 1
etc.
13.) Start MOSIX again. You should see "Initializing MOSIX..."
14.) Install MOSIX on your other node(s) in the exact same way with the exact same kernel version and MOSIX patch/user level tools (otherwise, and I guarantee you, MOSIX will not work!)
15.) Reboot your machines. Pull up a terminal and run #mon. You will see a bar-graph monitor of all machine nodes.
16.) Peruse /mfs/node-number to access the filesystems on your other nodes.
17.) Enjoy! _________________ Lian-Li PC-07
Athlon-XP 1900
512MB Crucial PC2100
Soltek SL-75FRV KT400
NVIDIA GF3 ti200
SBLive! |
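Step 12 can be scripted when there are more than a handful of nodes. This is a minimal sketch, assuming every node gets its own line with a range size of 1, as in the two lines shown in step 12. gen_mosix_map is a hypothetical helper, not part of the MOSIX tools.

```shell
#!/bin/sh
# Print one map line per IP address given on the command line:
# node number (counting from 1), the address, and a range size of 1.
gen_mosix_map() {
    n=1
    for ip in "$@"; do
        printf '%s %s 1\n' "$n" "$ip"
        n=$((n + 1))
    done
}

# Example with the two addresses from step 12.
gen_mosix_map 192.168.1.1 192.168.1.2
```

Redirect the output to /etc/mosix.map on one machine and copy the same file to every other node, since the map has to agree across the cluster.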
|
bhar99328 n00b
Joined: 19 Jun 2003 Posts: 8
|
Posted: Thu Jun 19, 2003 1:17 pm Post subject: Re: openMosix briefly... |
|
|
I just installed openMosix, kernel 2.4.20-openmosix-r6, and I have been having problems where portage segfaults during an emerge, usually either in the install phase (after compiling) or when unmerging the old package. You posted earlier that:
qubex wrote: | not all compiles "like" distributed processing. I had to disable the makeopt of -j in /etc/make.conf for xfree-base to install and also for kde-base. |
However, I'm experiencing this problem with pretty much all ebuilds, and the segfault seems to occur randomly. If I re-run the emerge after the segfault, it usually finishes without error the second time.
If I stop openmosix (/etc/init.d/openmosix stop), however, the problem disappears. It doesn't seem to matter if openmosix is actually migrating processes, since I shut down one machine of my two-machine cluster, but left openmosix running on the remaining machine, and I still got the segfaults.
The following info from my make.conf may be helpful:
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=pentium3 -O3 -pipe"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j2"
Oh, and both machines are P3s, so the compiler optimizations are identical.
Thanks in advance. _________________ -Bryce |
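Since re-running the emerge usually succeeds, a retry wrapper is one stopgap until the cause is found. This is a minimal sketch, assuming the shell reports the crash as exit status 139 (128 plus signal 11, SIGSEGV), which is the usual convention. retry_on_segv is a hypothetical helper, and it hides the problem rather than fixing it.

```shell
#!/bin/sh
# Run a command; if it exits with status 139 (segfault as seen by the shell),
# run it again, up to max_tries attempts in total. Any other status,
# including success, is returned immediately.
retry_on_segv() {
    max_tries=$1
    shift
    try=1
    while :; do
        if "$@"; then
            status=0
        else
            status=$?
        fi
        if [ "$status" -ne 139 ] || [ "$try" -ge "$max_tries" ]; then
            return "$status"
        fi
        echo "attempt $try exited with 139 (segfault); retrying" >&2
        try=$((try + 1))
    done
}

# Harmless demonstration: a command that succeeds on the first attempt.
retry_on_segv 3 true && echo "ok"
```

With Portage this would be called as retry_on_segv 3 emerge followed by the package name. Whether a merge interrupted by a crash is always safe to repeat is not something this sketch checks.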
|
dol-sen Retired Dev
Joined: 30 Jun 2002 Posts: 2805 Location: Richmond, BC, Canada
|
Posted: Thu Jun 19, 2003 1:55 pm Post subject: |
|
|
Sounds like failing RAM. Get memtest86 and let it run overnight if need be. The machine may need to be quite warm for the fault to show up. I had a bad RAM module that worked and tested fine when first started but began to fail 2-3 hrs later. _________________ Brian
Porthole, the Portage GUI frontend irc@freenode: #gentoo-guis, #porthole, Blog
layman, gentoolkit, CoreBuilder, esearch... |
|
bhar99328 n00b
Joined: 19 Jun 2003 Posts: 8
|
Posted: Thu Jun 19, 2003 2:34 pm Post subject: ram problem? |
|
|
Both of my machines experience the segfault issue, not just one. But I could run it tonight anyway just to be sure.
I'm thinking though that I'm messing something up in the configuration of openmosix, my kernel, or my make.conf.
Oh, and BTW, I'm using autodetection for openmosix, and mosctl saw both machines when I last had the cluster up. Since then, I've installed openmosixview, but it is reporting memory usage incorrectly on the one machine that I left openmosix running on (I'm running "emerge -u world" on the other machine at the moment and don't want to mess with openmosix on it any more until it finishes).
It just seems to me that if it were failing RAM, then I'd get the segfaults with and without openmosix, and on only one machine. _________________ -Bryce |
|
bhar99328 n00b
Joined: 19 Jun 2003 Posts: 8
|
Posted: Fri Jun 20, 2003 3:07 pm Post subject: openmosixtest... |
|
|
I installed and ran the openmosix test programs this morning, and if it means anything, the "eatmem" test segfaulted 3 times, and as far as I can tell the other tests ran ok.
I have noticed that some programs (like mozilla) close suddenly when openmosix is running.
Does anyone know if that test can fail for any reason other than hardware?
For both machines, openmosixview recognizes the correct amount of memory, but shows memory usages about 10-20% below actual memory usage. Any ideas? _________________ -Bryce |
|
_Max_ Apprentice
Joined: 03 Mar 2003 Posts: 264 Location: London, UK
|
Posted: Wed Jun 25, 2003 2:32 pm Post subject: |
|
|
I also have this problem, and there seem to be others, too:
https://forums.gentoo.org/viewtopic.php?t=55543
(this is in German)
Someone suggested that it is to do with the fact that OpenMosix is supposed to work with gcc 2.95...
I would be very interested to find out what causes this. |
|
bhar99328 n00b
Joined: 19 Jun 2003 Posts: 8
|
Posted: Wed Jun 25, 2003 4:17 pm Post subject: could it be libpthread? |
|
|
I've read up a bit on the openmosix forums, and it seems that there are currently problems with openmosix and libpthread on i686 machines. I guess we could wait for an updated python and try it then.
Is portage, by any chance, written in python? If so, that would explain why I can't emerge anything reliably when I'm clustering... _________________ -Bryce |
|
_Max_ Apprentice
Joined: 03 Mar 2003 Posts: 264 Location: London, UK
|
Posted: Wed Jun 25, 2003 4:33 pm Post subject: |
|
|
From the Portage Manual: Quote: | The Portage system is a merge of a Python core with Bash script based Ebuilds. |
Hm.... |
|
bhar99328 n00b
Joined: 19 Jun 2003 Posts: 8
|
Posted: Wed Jun 25, 2003 6:03 pm Post subject: build python w/o threading? |
|
|
I saw somewhere that you can build python to not use threading... apparently you use ./configure --with-threads=no
What's the easiest way to do this with ebuilds? I'm really not familiar with the structure of the ebuilds (I'm a newbie).
It may not fix the problem, but I think it's worth a shot. _________________ -Bryce |
|
endgamer n00b
Joined: 10 Mar 2003 Posts: 59 Location: Columbus, Ohio
|
Posted: Fri Jul 04, 2003 3:28 pm Post subject: |
|
|
My experience with OpenMosix.
Some info on computers:
#1: P4mobile 2ghz laptop (pentium4 on kernel and make.conf).
#2: Celeron 333mhz desktop (celeron on kernel and pentium2 in make.conf).
I'm a big fan, so I tried it 3 different ways. The second two worked.
First I tried building it from source: getting vanilla kernels from kernel.org, patching them, building them. But for some reason (probably due to my ignorance), when I tried to "./configure" the userland-tools, the openmosix headers were not found. So I threw my hands in the air and emerged the openmosix-sources. And my most exciting moment was after I'd finished everything on the server and was waiting for emerge userland-tools to complete on the slow desktop: openMosix started migrating big parts of it to the laptop! So this was a good process for me, except for the problems I'll describe below.
The third way I've experimented with is using the clusterKnoppix cd, which is knoppix+openmosix (http://bofh.be/clusterknoppix/). That was a big success too.
My problems with the built kernel and everything are only on the server. My laptop runs beautifully, everything peacefully coexists with openMosix running. My server however, would start segfaulting when I tried to emerge anything.
Then, some bad things happened on the server, which I don't think have anything to do with openMosix (gcc disappeared..., details at https://forums.gentoo.org/viewtopic.php?t=65030), and this is where I am right now: I've booted off of the clusterKnoppix cd, formatted the / partition of the server (/boot and /home are safe on separate partitions), untarred the stage3 bz2 to /, chrooted, and am (trying to) emerge system.
For the first several builds, things flew by. ClusterKnoppix migrated the bigger gcc commands and others (including tar -xjf stage3.bz2 !!!), and I went to sleep. When I woke up, I had a Segmentation Fault waiting for me right after it finished downloading the "misc-tools" package...
I cannot find where portage might log what it did (IS there any such thing?!). But I can tell you that now, more than 3/4 of the time, emerge [anything] will segfault. That was not the case when I started.
My idea: 'emerge system' built and started using something which cannot co-exist with openMosix running in the background. That makes no sense, though, because my laptop runs fine!!
I've tried tweaking the make.conf on my server, removing and adding the makeopts="-j2", changing -march to -mcpu and back... nothing doing. Some evil entity is causing emerge to die. (python works: I can start the interpreter, calculate 888^8888, etc.) I could turn off openmosix and continue, but I'd rather figure out what was going on.
Edit: I discovered that I could ls -rt /var/tmp/portage to see what it had emerged or gotten to on the server. Here are the results:
Quote: | $ ls -rt /var/tmp/portage
zlib-1.1.4
sed-4.0.5
python-fchksum-1.6.1
debianutils-1.16.3
bash-2.05b-r3
portage-2.0.48-r1
portage-2.0.47-r3
m4-1.4
m4-1.4p
db-3.2.9-r2
db-3.2.9-r1
texinfo-4.5
texinfo-4.3-r1
groff-1.18.1-r2
groff-1.18.1-r1
man-1.5l-r6
man-1.5k-r1
perl-5.8.0-r10
perl-5.8.0-r9
patch-2.5.4-r5
patch-2.5.4-r4
binutils-2.13.90.0.18
binutils-2.13.90.0.16-r1
gcc-config-1.3.3-r1
gcc-config-1.3.1
gcc-3.2.2
gcc-3.2.1-r6
glibc-2.3.1-r4
glibc-2.3.1-r2
gawk-3.1.2-r3
gawk-3.1.1-r1
baselayout-1.8.6.8-r1
baselayout-1.8.5.8
modutils-2.4.25
modutils-2.4.22
nano-1.2.1
nano-1.0.9-r2
dhcpcd-1.3.22_p4
dhcpcd-1.3.22_p3-r3
iputils-020927
cpio-2.5
cpio-2.4.2-r4
help2man-1.29
diffutils-2.8.4-r3
diffutils-2.8.4-r1
e2fsprogs-1.33
e2fsprogs-1.32-r2
file-4.02
file-3.39
fileutils-4.1.11-r1
fileutils-4.1.11
findutils-4.1.7-r4
findutils-4.1.7-r1
miscfiles-1.3-r1
|
It was only on miscfiles-1.3-r1 that I had segfaulted.
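The same listing can be narrowed to just the last entry, which is the package emerge was working on when it died. This is a minimal sketch, assuming the newest entry in /var/tmp/portage is the one being built at the time of the crash. newest_entry is a hypothetical helper, and parsing ls output like this breaks on names containing newlines, which package directories should not have.

```shell
#!/bin/sh
# Print the most recently modified entry in a directory.
# ls -rt sorts oldest first, so the last line is the newest entry.
# Defaults to the Portage build directory used in the post above.
newest_entry() {
    ls -rt "${1:-/var/tmp/portage}" | tail -n 1
}

# Only query the default location if it exists on this machine.
if [ -d /var/tmp/portage ]; then
    newest_entry
fi
```

Run right after a segfault, this should print the same name as the bottom line of the long listing above.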
Forgive me for my garrulous post, but I wanted to be as complete as possible. Hope the above helps, and any ideas would be greatly appreciated. |
|
trjones4 n00b
Joined: 27 Feb 2003 Posts: 28 Location: Somerville, MA
|
Posted: Tue Jul 08, 2003 9:20 pm Post subject: Another OpenMosix Experience ... so far |
|
|
Well endgamer, I don't have an answer to your problem, but I've had a similar OpenMosix w/ LTSP experience ... so here goes ...
First, the hardware list (which we all love):
Server:
Dell Poweredge 4400
2 x PIII Xeon 1GHz
1 GB Ram
Adaptec PERC 3/Di RAID controller
Broadcom Gigabit LAN
2 x Intel Pro 100 LAN's
2 x 36 GB Seagate SCSI HD's in RAID 0 (mirror) config
6 x 36 GB Seagate SCSI HD's in RAID 5 config
-> All RAID's recognized great by the AACRAID kernel driver BTW
1 Adaptec Fireconnect PCI Firewire card
2 Firewire 180GB external HDs (used only as backups for the RAID 5 array)
Kernel: (was) 2.4.20-openmosix-r6
Nodes:
2 x completely diskless nodes (except floppy of course), dual PIV Xeon 2.6GHz on Supermicro mobos with integrated Gigabit LAN.
2 GB Ram on each node
1 x 3C905 NIC (since the onboards don't quite work yet with Etherboot)
Kernel: 2.4.20-openmosix-r6 (etherbooted with LTSP from the server)
The Experience
I started on this project for my company to create a diskless cluster system which would allow any of our Windoze office machines to be booted into the cluster at night to crunch on Computational Fluid Dynamics problems (I'm an aerospace engineering geek, so please forgive my bad spelling).
As a total n00b, my first Gentoo experience was to create a Samba server for the Windoze machines to serve as a primary domain controller. That all went well (considering my general lack of experience) and the server worked like a charm.
Next, I wanted to make the server the host for DISKLESS OpenMosix nodes and create our own personal cluster. To do this I wanted to use LTSP with an OpenMosix-patched kernel. Following the general guidelines in the LTSP and OpenMosix docs, we got the system etherbooting thin clients into the cluster in about a week (fantastic!).
Everything seemed to be going fairly well ... we ran some jobs, etc., but when the time came to do some server maintenance, emerge kept seg faulting. I got on the forums and found the info about probable issues with OM and python ... sure enough, if I turned off the omdiscd daemon emerge worked just fine ... there were also some of the other phantom issues, with applications like mozilla and gnome-terminal closing randomly.
Currently I've moved the LTSP/OpenMosix server to one of the nodes (which I added a HD and CDROM to) instead of the main company server (sketchy behavior was not well received by my coworkers).
All told, I've been impressed with the LTSP/OpenMosix combination - but I'm hoping some of the apparent issues can be resolved down the road. I'd like to (eventually) put our company server back onto the cluster (right now it's only running the main node and one diskless client) since it has lots of CPU time to spare.
My coworker and I are working on some docs to cover what we did and detail all the stuff I left out ... I'll post a link if anyone is interested in proofing and giving opinions.
Ok .. it's time to get outta here ... later. _________________ ------------------------------------------
Troy B. Jones
troy (dot) b (dot) jones (at) gmail.com |
|
bashir Tux's lil' helper
Joined: 23 May 2003 Posts: 107 Location: EU (Ger)
|
Posted: Thu Jul 17, 2003 2:37 pm Post subject: |
|
|
Quote: | My coworker and I are working on some docs to cover what we did and detail all the stuff I left out ... I'll post a link if anyone is interested in proofing and giving opinions. |
I am interested!
bashir |
|
gzhang27 n00b
Joined: 16 Jul 2003 Posts: 3
|
Posted: Thu Jul 17, 2003 2:48 pm Post subject: Gentoo+LTSP+openMosix |
|
|
Sirs, Madams, and Otherwise,
The Gentoo + LTSP + openMosix documentation is complete. You can access it at www.techsburg.com/gentoo/docs/ltspdoc.html. Please feel free to offer comments, questions, notes about bugs / typos / etc. as you see fit.
Thanks
Yingzhi Zhang |
|
sblainey n00b
Joined: 23 Jul 2003 Posts: 5
|
Posted: Wed Jul 23, 2003 6:51 am Post subject: recompile python |
|
|
I am receiving the same errors - segfaults on 3/4 of emerges. I have done some research and it seems to be caused by the i686 optimisations in libpthread. On other distros you can remove the optimised libpthread and it will fall back to the normal one, which should work okay; however, on Gentoo we only have the optimised library. So, I recompiled python without threads... just edited the ebuild. I am going to try this out tonight and will post my results here. |
|
trjones4 n00b
Joined: 27 Feb 2003 Posts: 28 Location: Somerville, MA
|
Posted: Fri Jul 25, 2003 7:32 pm Post subject: |
|
|
And now for something completely similar ... well, sorta ...
sblainey, I'm interested to see how your fix for the seg faults turned out ... we still have the same issue ... I tried editing the ebuild and commenting out the "--with-threads" option, but then when I recompile python it (of course) randomly seg faults ... anyway, I could turn off omdiscd, or boot from the CD, etc., but right now people are still running lots of jobs ...
In other openmosix news, I'm trying to compile the masked linux-2.4.21-openmosix kernel and keep getting this error ...
Code: | ...
rivers/char/agp/agp.o drivers/char/drm/drm.o drivers/pci/driver.o drivers/pnp/pnp.o drivers/video/video.o drivers/media/media.o \
net/network.o \
/usr/src/linux-2.4.21-openmosix/arch/i386/lib/lib.a /usr/src/linux-2.4.21-openmosix/lib/lib.a /usr/src/linux-2.4.21-openmosix/arch/i386/lib/lib.a \
--end-group \
-o vmlinux
/usr/src/linux-2.4.21-openmosix/hpc/hpc.o(.text+0x1fa37): In function `get_remote_file':
: undefined reference to `bad_super_block'
make: *** [vmlinux] Error 1 |
I realize I'm playing with fire here ... but I really want my gigabit NIC to work _________________ ------------------------------------------
Troy B. Jones
troy (dot) b (dot) jones (at) gmail.com |
|
sblainey n00b
Joined: 23 Jul 2003 Posts: 5
|
Posted: Sun Jul 27, 2003 8:28 am Post subject: python without threads doesn't work, but 2.4.21 looks good |
|
|
Python compiled without threads still causes segfaults left, right and centre. But 2.4.21 looks very promising; I've been running it for a couple of hours and it hasn't had any problems yet *fingers-crossed*.
trjones4 - To make 2.4.21 compile you have to enable the openmosix file system. Found this in the openmosix mailing list archives. |
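Whether the file system option is switched on can be checked before starting a long kernel build. This is a minimal sketch, assuming the kernel .config sits in the source tree named in the error output above and that the option is called CONFIG_MOSIX_FS. That symbol name is an assumption, so confirm it in your own .config or in the menuconfig help. config_enabled is a hypothetical helper.

```shell
#!/bin/sh
# Succeed if the given symbol is set to y or m in the given kernel config file.
# Lines such as "# CONFIG_FOO is not set" do not match.
config_enabled() {
    grep -Eq "^$1=(y|m)\$" "$2"
}

# Path taken from the build output above; the symbol name is an assumption.
cfg=/usr/src/linux-2.4.21-openmosix/.config
if [ -f "$cfg" ]; then
    if config_enabled CONFIG_MOSIX_FS "$cfg"; then
        echo "openMosix file system: enabled"
    else
        echo "openMosix file system: not enabled"
    fi
fi
```

If the check reports the option as not enabled, turning it on in the kernel configuration and rebuilding is the fix described above.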
|