Gentoo Forums
[SOLVED] Can't run programs on diskless nodes
alexice
n00b
Joined: 08 Nov 2004
Posts: 30
Location: Vancouver, Canada

Posted: Thu Oct 04, 2007 12:14 am    Post subject: [SOLVED] Can't run programs on diskless nodes

Hello everybody,

I hope I did not miss an existing post about this; if I did, sorry for reposting the issue. I have had a diskless cluster up and running for quite some time. Everything works nicely when I start jobs from the master, and the slaves run smoothly: MPI jobs and the like are no problem.

Now I have run into a problem: when I log onto a node via ssh and try to start an application there, I always get:
Code:

 -bash: ./gmsh: No such file or directory


Here the app I want to run is called gmsh, and I am in the directory that contains it.

I think I have set up my diskless NFS exports correctly, but to be sure, here are the configs:

/etc/exports
Code:

# /etc/exports: NFS file systems being exported.  See exports(5).

## node 1
# one line like this for each slave
/diskless/192.168.1.11   192.168.1.11(sync,rw,no_root_squash,no_all_squash)
/diskless/192.168.1.12   192.168.1.12(sync,rw,no_root_squash,no_all_squash)
/diskless/192.168.1.13   192.168.1.13(sync,rw,no_root_squash,no_all_squash)
/diskless/192.168.1.14   192.168.1.14(sync,rw,no_root_squash,no_all_squash)


# common to all slaves
/opt   192.168.1.0/24(sync,ro,no_root_squash,no_all_squash)
/usr   192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)
/home  192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)

/var/log   192.168.1.11(sync,rw,no_root_squash,no_all_squash)
/var/log   192.168.1.12(sync,rw,no_root_squash,no_all_squash)
/var/log   192.168.1.13(sync,rw,no_root_squash,no_all_squash)
/var/log   192.168.1.14(sync,rw,no_root_squash,no_all_squash)


And to show how node 2 (192.168.1.12) mounts the shared directories, here is the fstab from node 2:

Code:

192.168.1.10:/diskless/192.168.1.12 / nfs sync,hard,intr,rw,rsize=8192,wsize=8192 0 0
192.168.1.10:/opt /opt nfs sync,hard,intr,ro,rsize=8192,wsize=8192 0 0
192.168.1.10:/usr /usr nfs sync,hard,intr,nolock,rw,rsize=8192,wsize=8192 0 0
192.168.1.10:/home /home nfs sync,hard,intr,rw,rsize=8192,wsize=8192 0 0

# NOTE: The next line is critical for boot!
none /proc proc defaults 0 0

# glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
# POSIX shared memory (shm_open, shm_unlink).
# (tmpfs is a dynamically expandable/shrinkable ramdisk, and will
# use almost no memory if not populated with files)
# Adding the following line to /etc/fstab should take care of this:

none /dev/shm tmpfs defaults 0 0

192.168.1.10:/var/log /var/log nfs hard,intr,rw 0 0


All the other nodes do the same thing.
I am pretty sure I am missing something, but so far I can't work out what, and I did not find a lot on the forum. If anybody has had this problem and has a solution, it would be great to get it posted here.

Any suggestions or comments are very much appreciated! :)

Cheers,
alexice
_________________
if you have to ask:
"why Linux?"

you will not understand the answer


Last edited by alexice on Thu Oct 04, 2007 5:49 pm; edited 1 time in total
eccerr0r
Watchman
Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Thu Oct 04, 2007 12:52 am

Usually those weird "No such file or directory" problems mean that the dynamic loader the binary asks for (ld.so, ld-linux.so) is missing, not the program itself. Did you compile the program properly for the machine you're running it on?
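One way to check this on a node is to read the loader path recorded in the binary and see whether that file exists there. The sketch below assumes `readelf` (from binutils) is installed on the node; the function name `check_interp` is made up for illustration. `file ./gmsh` (32-bit or 64-bit ELF) and `ldd ./gmsh` (libraries reported as "not found") are useful companions.

```shell
#!/bin/sh
# Sketch: print the ELF interpreter (dynamic loader) a binary requests
# and report whether that loader exists on this machine.
# check_interp is a hypothetical helper name; readelf comes from binutils.
check_interp() {
    bin="$1"
    if [ ! -e "$bin" ]; then
        echo "missing-binary: $bin"
        return 2
    fi
    # readelf -l prints a line like:
    #   [Requesting program interpreter: /lib/ld-linux.so.2]
    interp=$(readelf -l "$bin" 2>/dev/null |
        sed -n 's/.*program interpreter: \([^]]*\)\].*/\1/p')
    if [ -z "$interp" ]; then
        # Static binaries, scripts and non-ELF files have no interpreter entry.
        echo "no-interpreter: $bin"
        return 0
    fi
    if [ -e "$interp" ]; then
        echo "ok: $bin -> $interp"
        return 0
    fi
    # The kernel reports this case as "No such file or directory".
    echo "missing-loader: $bin -> $interp"
    return 1
}
```

Running `check_interp ./gmsh` on the node that fails would be the natural first step; a "missing-loader" result names the file that has to be made available there.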

Just a guess...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
alexice
n00b
Joined: 08 Nov 2004
Posts: 30
Location: Vancouver, Canada

Posted: Thu Oct 04, 2007 1:07 am

Well, the thing is, the programs that are not working were not actually compiled on the cluster. They are Matlab, which is closed source, and gmsh, which came precompiled.

Is there a way I can do something without compiling the apps again?

Thanks for the hint.

alexice
alexice
n00b
Joined: 08 Nov 2004
Posts: 30
Location: Vancouver, Canada

Posted: Thu Oct 04, 2007 5:48 pm

Thanks, eccerr0r, for the hint about the libraries; that was the problem. Since the software (Matlab) I was running on the cluster needed some 32-bit libs, I just had to share those with the cluster nodes as well. Interestingly, Matlab also needs the /tmp directory to be shared in order to work. I did not look into why, but if one shares /tmp as well, things work.
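For anyone hitting the same thing, here is a rough sketch of what sharing the extra directories could look like, modeled on the exports and fstab lines from the first post. The thread does not say which directories were actually shared: /lib32 below is an assumption (a typical location for 32-bit libraries on a multilib system), so substitute whatever path holds the 32-bit libs on your master.

```
# /etc/exports on the master (sketch; the /lib32 path is an assumption)
/lib32   192.168.1.0/24(sync,ro,no_root_squash,no_all_squash)
/tmp     192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)

# /etc/fstab on each node (sketch; same assumption about /lib32)
192.168.1.10:/lib32 /lib32 nfs sync,hard,intr,ro,rsize=8192,wsize=8192 0 0
192.168.1.10:/tmp /tmp nfs sync,hard,intr,rw,rsize=8192,wsize=8192 0 0
```

After editing /etc/exports, `exportfs -ra` on the master re-reads the file; the nodes then need to mount the new entries.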

Thanks again :D

alexice
All times are GMT