alexice n00b
Joined: 08 Nov 2004 Posts: 30 Location: Vancouver, Canada
Posted: Thu Oct 04, 2007 12:14 am Post subject: [SOLVED] Can't run programs on diskless nodes
Hello everybody,
I hope I did not miss an existing post about this; if I did, sorry for the repost. I have had a diskless cluster up and running for quite some time. Everything works well when I start jobs from the master: the slaves run smoothly, MPI jobs and the like, no problem.
The problem appears when I log in to a node via ssh and try to start an application there. I always get:
Code: |
-bash: ./gmsh: No such file or directory
|
Here the application I want to run is called gmsh, and I am in the directory that contains it.
I think my diskless NFS exports are set up correctly, but to be sure, here are the configs:
/etc/exports
Code: |
# /etc/exports: NFS file systems being exported. See exports(5).
## node 1
# one line like this for each slave
/diskless/192.168.1.11 192.168.1.11(sync,rw,no_root_squash,no_all_squash)
/diskless/192.168.1.12 192.168.1.12(sync,rw,no_root_squash,no_all_squash)
/diskless/192.168.1.13 192.168.1.13(sync,rw,no_root_squash,no_all_squash)
/diskless/192.168.1.14 192.168.1.14(sync,rw,no_root_squash,no_all_squash)
# common to all slaves
/opt 192.168.1.0/24(sync,ro,no_root_squash,no_all_squash)
/usr 192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)
/home 192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)
/var/log 192.168.1.11(sync,rw,no_root_squash,no_all_squash)
/var/log 192.168.1.12(sync,rw,no_root_squash,no_all_squash)
/var/log 192.168.1.13(sync,rw,no_root_squash,no_all_squash)
/var/log 192.168.1.14(sync,rw,no_root_squash,no_all_squash)
|
To show how node 2 (192.168.1.12) mounts the shared directories, here is the fstab from node 2:
Code: |
192.168.1.10:/diskless/192.168.1.12 / nfs sync,hard,intr,rw,rsize=8192,wsize=8192 0 0
192.168.1.10:/opt /opt nfs sync,hard,intr,ro,rsize=8192,wsize=8192 0 0
192.168.1.10:/usr /usr nfs sync,hard,intr,nolock,rw,rsize=8192,wsize=8192 0 0
192.168.1.10:/home /home nfs sync,hard,intr,rw,rsize=8192,wsize=8192 0 0
# NOTE: The next line is critical for boot!
none /proc proc defaults 0 0
# glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
# POSIX shared memory (shm_open, shm_unlink).
# (tmpfs is a dynamically expandable/shrinkable ramdisk, and will
# use almost no memory if not populated with files)
# Adding the following line to /etc/fstab should take care of this:
none /dev/shm tmpfs defaults 0 0
192.168.1.10:/var/log /var/log nfs hard,intr,rw 0 0
|
All the other nodes do the same thing.
I am pretty sure I am missing something, but so far I can't work out what, and I did not find a lot on the forum. If anybody has had this problem and has a solution, it would be great to have it posted here.
Any suggestions or comments are very much appreciated!
Cheers,
alexice
_________________
if you have to ask:
"why Linux?"
you will not understand the answer
Last edited by alexice on Thu Oct 04, 2007 5:49 pm; edited 1 time in total
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9679 Location: almost Mile High in the USA
Posted: Thu Oct 04, 2007 12:52 am
Usually those weird "No such file or directory" errors mean that the dynamic loader the binary asks for (ld.so, ld-linux.so) is missing. Did you compile the program properly for the machine you're running it on?
Just a guess...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
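One way to test this guess is to ask the binary which loader it expects and then check whether that file exists on the node. The following is only a minimal sketch: it assumes readelf (from binutils) is installed on the node, and the helper names parse_interp, elf_interp and check_loader are made up for illustration.

```shell
#!/bin/sh
# Sketch: find out which dynamic loader a binary requests and whether
# that loader exists on this node. Assumes readelf (binutils) is present.
# parse_interp, elf_interp and check_loader are hypothetical helper names.

# Extract the interpreter path from "readelf -l" output read on stdin.
parse_interp() {
    sed -n 's/.*Requesting program interpreter: \(.*\)]$/\1/p'
}

# Print the loader path recorded in the ELF program headers of "$1".
elf_interp() {
    LC_ALL=C readelf -l "$1" 2>/dev/null | parse_interp
}

# Report whether the requested loader is present on this machine.
check_loader() {
    interp=$(elf_interp "$1")
    if [ -z "$interp" ]; then
        echo "no interpreter found (static, script, or not ELF): $1"
    elif [ -e "$interp" ]; then
        echo "loader present: $interp"
    else
        echo "loader MISSING: $interp"
    fi
}

# Usage on a node: check_loader ./gmsh
```

If the loader is reported as missing, the shell's "No such file or directory" refers to that loader rather than to the program file itself.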
alexice n00b
Joined: 08 Nov 2004 Posts: 30 Location: Vancouver, Canada
Posted: Thu Oct 04, 2007 1:07 am
Well, the thing is that the programs which are not working were not compiled on the cluster. One is Matlab, which is closed source, and the other is gmsh, which came precompiled.
Is there anything I can do without compiling the apps again?
Thanks for the hint.
alexice
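Without recompiling, one can at least see which shared libraries a precompiled binary expects and which of them the node cannot resolve. A minimal sketch, assuming ldd is available on the node; missing_libs is a made-up helper name that simply filters ldd output. If the binary's loader is itself absent, ldd may report "not a dynamic executable" instead of listing libraries.

```shell
#!/bin/sh
# Sketch: list the shared libraries that ldd cannot resolve on this node.
# missing_libs is a hypothetical helper; it reads ldd output on stdin
# and prints the name of every library marked "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage on a node: ldd ./gmsh | missing_libs
```

Any library printed here has to be made visible on the node, for example by installing it into the node's root or by sharing the directory that holds it.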
alexice n00b
Joined: 08 Nov 2004 Posts: 30 Location: Vancouver, Canada
Posted: Thu Oct 04, 2007 5:48 pm
Thanks, eccerr0r, for the hint about the libraries; that was the problem. The software I was running on the cluster (Matlab) needed some 32-bit libs, so I just had to share those with the cluster nodes as well. Interestingly, Matlab also needs /tmp to be shared in order to work. I did not look into why, but if /tmp is shared as well, things work.
Thanks again
alexice
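For readers with the same setup, the fix described in this last post might look roughly like the fragment below, modeled on the export and mount lines from the first post. This is a sketch under assumptions, not the poster's actual configuration: the thread does not name the directory holding the 32-bit libs, so /path/to/lib32 is a placeholder, and the ro/rw choices are guesses.

```
# /etc/exports on the master (192.168.1.10), in the "common to all slaves" part
# /path/to/lib32 is a placeholder; the thread does not say where the 32-bit libs live
/path/to/lib32 192.168.1.0/24(sync,ro,no_root_squash,no_all_squash)
/tmp 192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)

# /etc/fstab on each node
192.168.1.10:/path/to/lib32 /path/to/lib32 nfs sync,hard,intr,ro,rsize=8192,wsize=8192 0 0
192.168.1.10:/tmp /tmp nfs sync,hard,intr,rw,rsize=8192,wsize=8192 0 0
```

After changing /etc/exports, running exportfs -ra on the master re-reads the export table; the nodes then need to mount the new entries.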