Gentoo Forums
How do I use a cluster nicely?

odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

PostPosted: Fri Jul 26, 2013 6:06 pm    Post subject: How do I use a cluster nicely? Reply with quote

I have overloaded the math department's cluster and have been asked not to do that anymore. The cluster runs Gentoo Linux, and I sign in from a character terminal (PuTTY and Cygwin Terminal). Since I don't have a graphical interface (as some other students do), I cannot reach the web page they use to see which nodes are in use. The system does not have any cluster queuing software installed (and won't any time soon).

I have two questions, which both really come down to "How do I use this cluster nicely?": how do I use the nice command, and how can I find out which nodes are not being used?

First, I have been told I can use the nice command to make sure my long-running, processor-hogging jobs give way to other users and don't crash them. However, man nice (and nice --help) do not tell me whether a nice command issued to a bash script will be applied to the commands within it, nor do they indicate what will happen if one of those commands starts an MPI program which runs on several other nodes.

In other words, I actually run my processes using an executable bash script called submitJOB which submits many jobs, sort of like this:
Code:
for ((parm1=0; parm1<=4; parm1++))
do
  for ((parm2=0; parm2<=30; parm2=parm2+6))
  do
    echo mpirun -np <num_processors> ...  # This line submits a job to run on many processors
         mpirun -np <num_processors> ...  # This line submits a job to run on many processors
  done
done

I (used to) run my job like this:
Quote:
>submitJOB


Should I now run it as
Quote:
nice -n 15 submitJOB
or do I modify the script file so that the command inside the loop reads
Quote:
nice -n 15 mpirun ...
(or both)?

Second, I need to find out which nodes are being used. I have been told about the top command, but it has an interactive output (and its man page doesn't indicate a way to redirect that). That means I have to rsh to each node and run the top command. We do have a bash script called rcom (and rcom-nodes) which will rsh to every node and issue a command. I am looking for a command that will tell me who the biggest users, or biggest processes, on a node are and how much CPU and memory they are using, and that gives plain text output, so that I can run it as rcom <that_command> and get a quick read on which processors are in heavy use.
_________________
Desperately needs help learning Gentoo Linux in order to use a 32-node cluster for my master's thesis in mathematics.

eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9645
Location: almost Mile High in the USA

PostPosted: Fri Jul 26, 2013 6:41 pm    Post subject: Reply with quote

On a distributed computing cluster, there should be a master node that distributes jobs to nodes in a cluster. But you mention you don't have a computing cluster software system installed, so this means you are scheduling jobs by rsh'ing.

ps(1) will give you most of what you want in terms of memory utilization. You also should refer to free(1).
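As a rough sketch of the kind of one-shot, plain-text report these commands can give, something like the following could be run on each node. The function name node_report and the choice of columns are only illustrative, and the ps options are the procps ones found on typical Linux systems.

```shell
#!/bin/bash
# Print a one-shot, plain-text load report for the node this runs on.
# node_report is a hypothetical helper name, not an existing tool.
node_report() {
    echo "== $(uname -n): load average =="
    uptime
    echo "== top 5 processes by CPU =="
    # -e: every process; -o: pick the columns; --sort=-pcpu: highest CPU first
    ps -eo user,pid,ni,pcpu,pmem,etime,comm --sort=-pcpu | head -n 6
    echo "== memory (MB) =="
    free -m
}

node_report
```

Since the output is plain text, the same ps line could probably be handed to a per-node wrapper such as the rcom script mentioned above (its calling convention is not shown in this thread, so that part is an assumption). top also has a batch mode, top -b -n 1, which prints one snapshot and exits.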

And nice(1) lowers the priority of the command it launches and of all of that command's child processes. But if the job hops to another machine it loses that property. You said you don't have a queuing system, so does that mean your program will rsh to another machine??? (Bad practice IMHO without a centralized queuing system.) If so, it has to be niced again on every machine it hops to...
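A quick way to see the inheritance on a single machine is the sketch below. It relies on nice, run with no arguments, printing the niceness of the current process. The variable names are just for illustration.

```shell
#!/bin/bash
# Niceness of this shell (usually 0 for an ordinary login).
base=$(nice)

# Start a child shell with niceness raised by 15. It prints its own niceness,
# then starts a grandchild shell that prints its niceness as well.
inherited=$(nice -n 15 bash -c 'nice; bash -c nice')
child=$(echo "$inherited" | sed -n 1p)
grandchild=$(echo "$inherited" | sed -n 2p)

echo "this shell:  $base"
echo "child:       $child"
echo "grandchild:  $grandchild"
```

The child and grandchild lines should show the same value, which is the point: whatever a niced script starts locally stays niced. Nothing here crosses machines, so it says nothing about processes started on other nodes.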

If a lot of people are doing stuff like this, investment into a queuing system would be highly suggested...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

PostPosted: Fri Jul 26, 2013 7:05 pm    Post subject: Reply with quote

Are we using the same OS? The command ps(1) returns "-bash: syntax error near unexpected token `1'". I must be misunderstanding your notation.

I am not manually scheduling jobs by rsh'ing. MPI may be doing that automatically in the background. Did you mean that I am manually scheduling jobs using rsh? Did I forget to mention that I'm using MPI? Specifically, I'm using MPICH Version: 1.2.7.

vaxbrat
l33t


Joined: 05 Oct 2005
Posts: 731
Location: DC Burbs

PostPosted: Sat Jul 27, 2013 1:47 am    Post subject: Did you know that cygwin has an xorg server? Reply with quote

Cygwin can install an xorg server and give you an xterm into the cluster, or let you do "X -query" if the cluster members are providing XDMCP logins. It and a number of other packages are not installed by default when you run the Cygwin setup.exe, though. It pains me to see people still shelling out big bucks at corporate for Exceed when they can just install a Cygwin X server for free.

Other stupid tricks that people don't know it can do are NFS server, TFTP server, and Qt4 and PyQt4. However, I hate the way that Red Hat bastardizes the packaging of the latter two. Someone needs to get them to cut the crap and finally drop Qt3 in favor of Qt4 while they are at it, so they stop doing all of the crappy command renaming.

GabrielYYZ
n00b


Joined: 03 May 2012
Posts: 24
Location: Dominican Republic

PostPosted: Sat Jul 27, 2013 5:16 am    Post subject: Reply with quote

odeSolver wrote:
Are we using the same OS? The command ps(1) returns "-bash: syntax error near unexpected token `1'". I am misunderstanding your notation.

I am not manually scheduling jobs by rsh'ing. MPI may be doing that automatically in the background. Did you mean that I am manually scheduling jobs using rsh? Did I forget to mention that I'm using MPI? Specifically, I'm using MPICH Version: 1.2.7.


ps(1), free(1) and nice(1) are how they appear in the man pages; the number in parentheses is the manual section they appear in (section 1, general commands, in this case). You're supposed to type ps, free and nice. :)

https://en.wikipedia.org/wiki/Man_page

toralf
Developer


Joined: 01 Feb 2004
Posts: 3920
Location: Hamburg

PostPosted: Sat Jul 27, 2013 1:35 pm    Post subject: Re: How do I use a cluster nicely? Reply with quote

odeSolver wrote:
I sign into the system using a character terminal (PuTTy and Cygwin Terminal).
This
Code:
ssh -Y user@system
should give you a forwarded X11 connection. Furthermore, with
Code:
  -L lport:localhost:rport
you can forward any port to your system.

odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

PostPosted: Sat Jul 27, 2013 4:01 pm    Post subject: Reply with quote

GabrielYYZ wrote:
ps(1), free(1) and nice(1) are how they appear in the man pages; the number in parentheses is the manual section they appear in (section 1, general commands, in this case). You're supposed to type ps, free and nice. :)

https://en.wikipedia.org/wiki/Man_page


OK (and duh!) :D. Thanks. IOW
Code:
man 1 ps
is how I learn about the ps command.

I think if I can figure out the right ps command (and understand its output), I will be on track.


Last edited by odeSolver on Sat Jul 27, 2013 4:06 pm; edited 1 time in total

odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

PostPosted: Sat Jul 27, 2013 4:04 pm    Post subject: Re: How do I use a cluster nicely? Reply with quote

vaxbrat wrote:
Cygwin can install an xorg server and give you an xterm into the cluster, or let you do "X -query" if the cluster members are providing XDMCP logins. It and a number of other packages are not installed by default when you run the Cygwin setup.exe, though. It pains me to see people still shelling out big bucks at corporate for Exceed when they can just install a Cygwin X server for free.


Thanks. OK, I installed the xorg server (at least I think I did), but I don't know where to go from here. What is an xterm, how do you do an X -query, and how does that help me?


toralf wrote:
This
Code:
ssh -Y user@system
should give you a forwarded X11 connection. Furthermore, with
Code:
  -L lport:localhost:rport
you can forward any port to your system.


Thanks, but I'm not sure what I can do with an X11 port. I have installed - I think - Cygwin/X and started the terminal. I tried
Code:
ssh -Y user@system
, but it just signed me into the same system just like before. I also tried
Code:
ssh -X user@system
. How does that help me?

What does forwarding everything to my port mean/do?

vaxbrat
l33t


Joined: 05 Oct 2005
Posts: 731
Location: DC Burbs

PostPosted: Sat Jul 27, 2013 7:19 pm    Post subject: ssh -X Reply with quote

ssh -X and ssh -Y set your xorg display back to your local desktop when you log into the cluster. Any application you run on the remote node will have its graphics windows go back to display on your desktop. That should allow you to use the web tools that the others were using to keep track of the cluster.

If you want to have the full desktop login experience of the remote node, that's where xdmcp comes in. However the remote node must have its login manager set up to enable remote xdmcp logins before this can happen. Check with your sysadmins.

Assuming that's enabled, don't start the local Cygwin X server. Instead, bring up a standard Cygwin terminal shell and do

Code:
X -query system


where system is the remote system's node name or IP address. That will start a Cygwin/X server, but it will use the remote system for session management. You should then see the remote system's login window just as if you had sat down at its local console.

odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

PostPosted: Sat Jul 27, 2013 8:43 pm    Post subject: Re: ssh -X Reply with quote

It seems we're on two different tracks here, and there is more about my setup that I didn't mention. I'm not actually logging directly into the cluster in question: I first have to sign into a passthrough Linux machine, then ssh from there into the cluster.

I used the X command you gave me, it opened a new window, but I never got the logon prompt in the new window. I suspect the passthrough does not have xdmcp enabled.

I also notice a new Cygwin-X program group in my start menu - but there are no programs in it. I appreciate the help you've given so far. Got any more for me?

vaxbrat
l33t


Joined: 05 Oct 2005
Posts: 731
Location: DC Burbs

PostPosted: Sun Jul 28, 2013 12:25 am    Post subject: X passthru Reply with quote

If you just got a blank screen from the X -query, then that proxy linux box probably doesn't have xdmcp enabled. So we're back to the local X session and xterm.

When you start your local cygwin x server, you should get an xterm window popping up with a cygwin bash shell. Do an

Code:
ipconfig /all


and note the IP address for your local desktop (e.g. 192.168.1.15). An insecure but easy way to allow others to open X windows on your local desktop is to disable access control with

Code:
xhost +


Then ssh into the proxy system, or wherever you eventually log in. On that final system you are going to do

Code:
export DISPLAY=192.168.1.15:0


You should now be able to get remote windows to pop up on your desktop from the linux applications. If the ssh -X and ssh -Y stuff weren't allowing windows to come back to your desktop in the first place, it's probably because your sysadmin never enabled X forwarding in the remote linux box's /etc/ssh/sshd_config file.

odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

PostPosted: Sun Jul 28, 2013 12:48 am    Post subject: Re: X passthru Reply with quote

Thanks. I probably won't be able to try this until at least Monday. But I'm pretty sure X forwarding is enabled, because other students are doing this or something similar.

unitstep
n00b


Joined: 17 Oct 2012
Posts: 9

PostPosted: Tue Jul 30, 2013 1:32 pm    Post subject: Reply with quote

Doesn't the department provide any guidelines or documentation on how to use the cluster responsibly, and how to use it in general?

I don't really see why you "Desperately needs help learning Gentoo Linux in order to use a 32-node cluster for my master's thesis in mathematics."
There shouldn't be any Gentoo-specific things that you have to learn if you are not a system admin/maintainer. It would rather be Linux things in general.


vaxbrat wrote:

An insecure but easy way to allow others to open X windows on your local desktop is to disable access control with
Code:
xhost +


Insecure indeed: it basically means anyone on your network can record your keystrokes, screen, everything.

At least use
Code:
xhost +login.node.hostname

But this wouldn't be very secure either, since
1) The X traffic won't be encrypted.
2) Any user on the login node could point their DISPLAY at your X server.
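
One alternative that avoids xhost altogether is to let ssh carry the X traffic through both hops, which also encrypts it. The sketch below only prints the command rather than connecting. The host names are placeholders, and it assumes X11 forwarding is enabled in sshd_config on both machines.

```shell
#!/bin/bash
# Placeholder names (assumptions): replace with your gateway and cluster logins.
GATEWAY="user@gateway.example.edu"
CLUSTER="user@cluster.example.edu"

# -Y requests (trusted) X11 forwarding on each hop;
# -t gives the second ssh a terminal so the remote login shell is interactive.
cmd=(ssh -t -Y "$GATEWAY" ssh -Y "$CLUSTER")

# Print the command for review; run "${cmd[@]}" to actually connect.
echo "${cmd[*]}"
```

If forwarding works, DISPLAY should already be set by sshd when you land on the cluster, so there should be no need to export it by hand.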

odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

PostPosted: Tue Jul 30, 2013 2:08 pm    Post subject: Reply with quote

You're right, I don't need anything Gentoo-specific. But this is a Gentoo forum, and one of the most helpful forums around, so I ask here (and I didn't think anyone really read the footers). The server is brand new, and there are no guidelines and no MPI queue set up. The administrator is my adviser, a math professor, who knows more Linux than I do, but not much more. We are figuring these things out together.

unitstep
n00b


Joined: 17 Oct 2012
Posts: 9

PostPosted: Tue Jul 30, 2013 2:43 pm    Post subject: Reply with quote

Possibly, you could do something like:
Code:
mpirun -np .... ./nicempiscript.sh ...

and then put in nicempiscript.sh:
Code:
#!/bin/bash
# Run the real MPI program at reduced priority, passing all arguments through unchanged.
exec nice -n 15 ./yourmpiexecutable "$@"
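
To check that a process really ended up niced, ps can print the nice value directly. The sketch below uses a niced sleep as a stand-in for the MPI executable, so it only demonstrates the check itself, not the MPI launch.

```shell
#!/bin/bash
# Stand-in for a long-running job: a niced sleep in the background.
nice -n 15 sleep 10 &
pid=$!

# Give the child a moment to start before inspecting it.
sleep 1

# "ni=" prints only the nice value, with no header; tr strips the padding.
ni=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "pid $pid is running at nice $ni"

# Clean up the stand-in job.
kill "$pid"
```

On a real node, ps -u "$USER" -o pid,ni,pcpu,comm lists the same column for all of your own processes.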

erikm
l33t


Joined: 08 Feb 2005
Posts: 634

PostPosted: Wed Jul 31, 2013 11:37 am    Post subject: Reply with quote

My two cents:

I've built Beowulf clusters on Gentoo, and am currently running a multi-core workstation as a mini compute cluster. Having a large server where a number of users run whatever they want from the command line, as you describe your situation, is begging for problems that no amount of nice'ing and shell script trickery will solve.

Tell your advisor to look into the Torque package (in Portage as sys-cluster/torque). It is a resource manager based on the old PBS system. It works beautifully under Gentoo, and will allow your system admins to precisely allocate resources where needed and avoid the kinds of problems you describe.
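
For a feel of what that would look like from the user's side, here is a sketch of a Torque job file. The job name, resource request, and executable name are made up for illustration, and the script is only written to disk here, not submitted.

```shell
#!/bin/bash
# Write a hypothetical Torque/PBS job file; the values are placeholders.
cat > thesis_run.pbs <<'EOF'
#!/bin/bash
#PBS -N thesis_run
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00

# Torque starts the job in the home directory; go back to where qsub was run.
cd "$PBS_O_WORKDIR"

# PBS_NODEFILE lists the nodes the scheduler assigned to this job.
mpirun -np 8 -machinefile "$PBS_NODEFILE" ./yourmpiexecutable
EOF

echo "wrote thesis_run.pbs; on a Torque system it would be submitted with: qsub thesis_run.pbs"
```

The design point is that the scheduler, not the user, decides which nodes run the job, so nobody has to poll nodes by hand before starting work.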

All times are GMT