odeSolver Tux's lil' helper
Joined: 11 Jul 2010 Posts: 84 Location: NJ, USA
Posted: Fri Jul 26, 2013 6:06 pm Post subject: How do I use a cluster nicely?
I have overloaded the math department's cluster and have been asked not to do that anymore. The cluster is running Gentoo Linux and I sign into the system using a character terminal (PuTTY and Cygwin Terminal). Since I don't have a graphical interface (which some other students have), I cannot get to the web page that they use to see which nodes are being used. The system does not have any cluster queuing software installed (and won't any time soon).
I have two questions, which both really come down to "How do I use this cluster nicely?": how do I use the nice command, and how can I find out which nodes are not being used?
First, I have been told I can use the nice command to make sure my long-running, processor-hogging jobs give way to other users and don't crash them. However, man nice (and nice --help) does not tell me whether a nice command applied to a bash script also applies to the commands within it, nor what happens if one of those commands starts an MPI program that runs on several other nodes.
In other words, I actually run my processes using an executable bash script called submitJOB which submits many jobs, sort of like this:
Code: | for ((parm1=0; parm1<=4; parm1++))
do
    for ((parm2=0; parm2<=30; parm2=parm2+6))
    do
        echo mpirun -np <num_processors> ... # This line only prints the command that is about to run
        mpirun -np <num_processors> ... # This line submits a job to run on many processors
    done
done |
I (used to) run my job like this:
Should I now run it as Quote: | nice -n 15 submitJOB | or do I modify the script file so that the command inside the loop reads Quote: | nice -n 15 mpirun ... | (or both)?
Second, I need to find out which nodes are being used. I have been told about the top command, but its output is interactive (and its man page doesn't indicate a way to redirect that). That means I have to rsh to each node and run the top command. We do have a bash script called rcom (and rcom-nodes) which will rsh to every node and issue a command. I am looking for a command that tells me who the biggest users, or which the biggest processes, on a node are and how much CPU and memory they have used, and that gives me plain-text output, so that I can run it as rcom <that_command> and get a quick read on which processors are in heavy use.
_________________
Desperately needs help learning Gentoo Linux in order to use a 32-node cluster for my master's thesis in mathematics.
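On the first question: a niceness set with nice is inherited by child processes on the same machine, so either form lowers the priority of whatever the script starts locally. Neither form by itself is guaranteed to lower the priority of MPI ranks started on other nodes. Here is a minimal sketch of the second option, with nice inside the loop. NP, NICE_LEVEL, DRY_RUN and ./my_mpi_program are hypothetical placeholders that do not come from the original submitJOB script, and the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch of a submitJOB-style parameter sweep with nice applied per job.
# NP, NICE_LEVEL, DRY_RUN and ./my_mpi_program are illustrative placeholders.
NP=${NP:-4}
NICE_LEVEL=${NICE_LEVEL:-15}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = also run them

submit_all() {
    local parm1 parm2 cmd
    for ((parm1 = 0; parm1 <= 4; parm1++)); do
        for ((parm2 = 0; parm2 <= 30; parm2 += 6)); do
            # Build the command as an array so each argument stays one word.
            cmd=(nice -n "$NICE_LEVEL" mpirun -np "$NP" ./my_mpi_program "$parm1" "$parm2")
            echo "${cmd[@]}"
            if [ "$DRY_RUN" -eq 0 ]; then
                "${cmd[@]}"
            fi
        done
    done
}

submit_all
```

Running it with the defaults prints one line per parameter combination, which makes it easy to check the sweep before anything is launched.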

eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9656 Location: almost Mile High in the USA
Posted: Fri Jul 26, 2013 6:41 pm
On a distributed computing cluster, there should be a master node that distributes jobs to the other nodes. But you mention you don't have any cluster software installed, so this means you are scheduling jobs by rsh'ing.
ps(1) will give you most of what you want in terms of memory utilization. You should also look at free(1).
And nice(1) lowers the priority of the command it starts and of all of that command's child processes. But if a job hops to another machine it loses that property. You said you don't have a queuing system, so that means your program will rsh to another machine??? (Bad practice IMHO without a centralized queuing system.) That means you will have to nice it again on each machine it hops to...
If a lot of people are doing stuff like this, investing in a queuing system would be highly recommended... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
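The inheritance described above can be checked directly on one machine. This is a small sketch that assumes the GNU coreutils behaviour where nice with no arguments prints the niceness it is running at. The increment of 5 is arbitrary.

```shell
#!/bin/bash
# With no arguments, GNU nice prints the niceness it is running at.
base=$(nice)

# Start a child shell with its niceness raised by 5. The nice process that
# child shell launches is a grandchild of this script and inherits the value.
inherited=$(nice -n 5 bash -c 'nice')

echo "this shell: $base, process started under 'nice -n 5': $inherited"
```

The same check says nothing about processes started on another node through rsh, which begin with whatever niceness the remote login gives them.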

odeSolver Tux's lil' helper
Joined: 11 Jul 2010 Posts: 84 Location: NJ, USA
Posted: Fri Jul 26, 2013 7:05 pm
Are we using the same OS? The command ps(1) returns "-bash: syntax error near unexpected token `1'". I am misunderstanding your notation.
I am not manually scheduling jobs by rsh'ing. MPI may be doing that automatically in the background. Did you mean that I am manually scheduling jobs using rsh? Did I forget to mention that I'm using MPI? Specifically, I'm using MPICH version 1.2.7.

vaxbrat l33t
Joined: 05 Oct 2005 Posts: 731 Location: DC Burbs
Posted: Sat Jul 27, 2013 1:47 am Post subject: Did you know that cygwin has an xorg server?
cygwin can install an xorg server and give you an xterm into the cluster or let you do "x -query" if the cluster members are providing xdmcp logins. It and a number of packages are not installed by default when you run the cygwin setup.exe though. It pains me to see people still shelling out big bucks at corporate for Exceed when they can just install a cygwin server for free.
Other stupid tricks that people don't know it can do are nfs server, tftp server and QT4 and PyQT4. However I hate the way that RedHat bastardizes the packaging of the latter two. Someone needs to get them to cut the crap and finally drop qt3 in favor of support for qt4 while they are at it so they don't do all of the crappy command renaming that they do. |

GabrielYYZ n00b
Joined: 03 May 2012 Posts: 24 Location: Dominican Republic
Posted: Sat Jul 27, 2013 5:16 am
odeSolver wrote: | Are we using the same OS? The command ps(1) returns "-bash: syntax error near unexpected token `1'". I am misunderstanding your notation. |
ps(1), free(1) and nice(1) is how they appear in the man pages, the (number) is the section they appear in (general commands in this case). You're supposed to use ps, free and nice.
https://en.wikipedia.org/wiki/Man_page |
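As a concrete starting point for the "which nodes are busy" question, here is a minimal sketch of a plain-text snapshot built from ps and free. It assumes the procps versions of both tools found on typical Linux systems (the --sort option is procps-specific). The function name, the column selection and the default row count are illustrative choices.

```shell
#!/bin/bash
# Print a plain-text snapshot of the busiest processes on this node.
# node_snapshot is a hypothetical helper name; pick any columns you like.
node_snapshot() {
    local rows=${1:-10}
    echo "== $(uname -n) =="
    # Sort by CPU share, highest first; keep the header plus $rows lines.
    ps -eo user,pid,ni,pcpu,pmem,etime,comm --sort=-pcpu | head -n "$((rows + 1))"
    # Memory totals in megabytes.
    free -m
}

node_snapshot 5
```

top can also produce non-interactive output in batch mode (top -b -n 1). How either command has to be quoted when passed through the rcom script is not shown in the thread, so that part would need checking locally.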

toralf Developer
Joined: 01 Feb 2004 Posts: 3922 Location: Hamburg
Posted: Sat Jul 27, 2013 1:35 pm Post subject: Re: How do I use a cluster nicely?
odeSolver wrote: | I sign into the system using a character terminal (PuTTY and Cygwin Terminal). | This should give you a forwarded X11 port. Furthermore, with Code: | -L lport:localhost:rport | you can forward any port to your system.

odeSolver Tux's lil' helper
Joined: 11 Jul 2010 Posts: 84 Location: NJ, USA
Posted: Sat Jul 27, 2013 4:01 pm
GabrielYYZ wrote: | ps(1), free(1) and nice(1) is how they appear in the man pages, the (number) is the section they appear in (general commands in this case). You're supposed to use ps, free and nice.
https://en.wikipedia.org/wiki/Man_page |
OK (and duh!) . Thanks. IOW is how I learn about the ps command.
I think if I can figure out the right ps command (and understand its output), I will be on track.
Last edited by odeSolver on Sat Jul 27, 2013 4:06 pm; edited 1 time in total |
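One possible "right ps command" for finding the biggest users is to add up the per-process figures by user name. This is a sketch under the assumption that ps and awk are available on every node. usage_by_user is a hypothetical helper name. The %CPU that ps reports is an average over each process's lifetime, so treat the totals as a rough guide.

```shell
#!/bin/bash
# Sum %CPU and %MEM per user on this node, biggest CPU consumer first.
usage_by_user() {
    # The trailing "=" on each column suppresses the header line.
    ps -e -o user= -o pcpu= -o pmem= |
        awk '{ cpu[$1] += $2; mem[$1] += $3 }
             END { for (u in cpu) printf "%-20s %6.1f %6.1f\n", u, cpu[u], mem[u] }' |
        sort -k2,2nr
}

printf '%-20s %6s %6s\n' USER '%CPU' '%MEM'
usage_by_user
```

A node where only root and system accounts appear near the top is probably idle; a node where another student's name leads the list is one to avoid.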

odeSolver Tux's lil' helper
Joined: 11 Jul 2010 Posts: 84 Location: NJ, USA
Posted: Sat Jul 27, 2013 4:04 pm Post subject: Re: How do I use a cluster nicely?
vaxbrat wrote: | cygwin can install an xorg server and give you an xterm into the cluster or let you do "x -query" if the cluster members are providing xdmcp logins. It and a number of packages are not installed by default when you run the cygwin setup.exe though. It pains me to see people still shelling out big bucks at corporate for Exceed when they can just install a cygwin server for free. |
Thanks. OK, I installed the xorg server - at least I think - but I don't know where to go from here. What is an xterm, how do you do an x -query, and how does that help me?
toralf wrote: | This should give you a forwarded X11 port. Furthermore, with Code: | -L lport:localhost:rport | you can forward any port to your system. |
Thanks, but I'm not sure what I can do with an X11 port. I have installed - I think - Cygwin/X and started the terminal. I tried , but it just signed me into the same system just like before. I also tried . How does that help me?
What does forwarding everything to my port mean/do?

vaxbrat l33t
Joined: 05 Oct 2005 Posts: 731 Location: DC Burbs
Posted: Sat Jul 27, 2013 7:19 pm Post subject: ssh -X
ssh -X and ssh -Y set your xorg display back to your local desktop when you log into the cluster. Any application you run on the remote node will have its graphics windows go back to display on your desktop. That should allow you to use the web tools that the others were using to keep track of the cluster.
If you want to have the full desktop login experience of the remote node, that's where xdmcp comes in. However the remote node must have its login manager set up to enable remote xdmcp logins before this can happen. Check with your sysadmins.
Assuming that's enabled, don't start the local Cygwin X server. Instead bring up a standard Cygwin terminal shell and do
where system is the remote system's node name or IP address. That will start a Cygwin/X server, but it will use the remote system for session management. You should then see the remote system's login window just as if you had sat down at its local console.

odeSolver Tux's lil' helper
Joined: 11 Jul 2010 Posts: 84 Location: NJ, USA
Posted: Sat Jul 27, 2013 8:43 pm Post subject: Re: ssh -X
It seems we're on two different tracks here, and there is more about my setup that I didn't mention. I'm not actually logging directly into the cluster in question - I first have to sign into a passthrough Linux machine, then from there ssh into the cluster.
I used the X command you gave me; it opened a new window, but I never got the logon prompt in the new window. I suspect the passthrough does not have xdmcp enabled.
I also notice a new Cygwin-X program group in my start menu - but there are no programs in it. I appreciate the help you've given so far. Got any more for me?
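With a passthrough machine in the way, X forwarding has to be requested on both hops. The sketch below only prints the command so it can be inspected first. The two host names are placeholders. It assumes that both sshd configurations allow X11 forwarding and that a local X server (for example Cygwin/X) is already running.

```shell
#!/bin/bash
# Print (not run) an ssh command that carries X11 forwarding across a
# gateway machine. two_hop_x_command is a hypothetical helper name.
two_hop_x_command() {
    local gateway=$1 cluster=$2
    # -X requests X11 forwarding on each hop; -t allocates a terminal on the
    # gateway so the inner ssh session is interactive.
    printf 'ssh -X -t %s ssh -X %s\n' "$gateway" "$cluster"
}

two_hop_x_command gateway.example.edu cluster.example.edu
```

Once logged in on the cluster, starting any small X program such as xterm is a quick way to see whether windows make it back to the desktop.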

vaxbrat l33t
Joined: 05 Oct 2005 Posts: 731 Location: DC Burbs
Posted: Sun Jul 28, 2013 12:25 am Post subject: X passthru
If you just got a blank screen from the X -query, then that proxy linux box probably doesn't have xdmcp enabled. So we're back to the local X session and xterm.
When you start your local cygwin x server, you should get an xterm window popping up with a cygwin bash shell. Do an
and note the ip address for your local desktop (eg 192.168.1.15). An insecure but easy way to allow others to open X windows on your local desktop is to disable access control with
Then ssh to get into the proxy system or wherever you eventually log in. On that final system you are going to do
Code: | export DISPLAY=192.168.1.15:0 |
You should now be able to get remote windows to pop up on your desktop from the linux applications. If the ssh -X and ssh -Y stuff weren't allowing windows to come back to your desktop in the first place, it's probably because your sysadmin never enabled X forwarding in the remote linux box's /etc/ssh/sshd_config file. |

odeSolver Tux's lil' helper
Joined: 11 Jul 2010 Posts: 84 Location: NJ, USA
Posted: Sun Jul 28, 2013 12:48 am Post subject: Re: X passthru
Thanks. I probably won't be able to try this until at least Monday. But I'm pretty sure X forwarding is enabled, because other students are doing this or something similar.

unitstep n00b
Joined: 17 Oct 2012 Posts: 9
Posted: Tue Jul 30, 2013 1:32 pm
Doesn't the department provide any guidelines/documentation on how to use the clusters responsibly and how to use them in general?
I don't really see why you "Desperately needs help learning Gentoo Linux in order to use a 32-node cluster for my master's thesis in mathematics."
There shouldn't be any Gentoo-specific things that you have to learn if you are not a system admin/maintainer. It would rather be Linux things in general.
vaxbrat wrote: |
An insecure but easy way to allow others to open X windows on your local desktop is to disable access control with
|
Insecure indeed: it basically means anyone on your network can record your keystrokes, screen, everything.
At least use
Code: | xhost +login.node.hostname |
But this wouldn't be very secure either since
1) It won't be encrypted or anything.
2) Any user on the login node could export to your X-server |
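When ssh -X works, neither xhost nor a manual export DISPLAY is needed: sshd sets DISPLAY itself to a proxy display on the remote machine, and the X traffic travels inside the encrypted ssh connection. Here is a sketch of a quick check, assuming the default sshd behaviour of using values such as localhost:10.0. display_is_forwarded is a hypothetical helper name.

```shell
#!/bin/bash
# Return success if a DISPLAY value looks like an ssh-forwarded display
# (sshd normally sets something like "localhost:10.0").
display_is_forwarded() {
    case "$1" in
        localhost:[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

if display_is_forwarded "${DISPLAY:-}"; then
    echo "DISPLAY=$DISPLAY looks like an ssh-forwarded display"
else
    echo "DISPLAY='${DISPLAY:-}' does not look like an ssh-forwarded display"
fi
```

A value such as 192.168.1.15:0 means windows would be sent straight to the desktop over the network, which is the unencrypted setup the points above warn about.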

odeSolver Tux's lil' helper
Joined: 11 Jul 2010 Posts: 84 Location: NJ, USA
Posted: Tue Jul 30, 2013 2:08 pm
You're right, I don't need anything Gentoo specific. But this is a Gentoo forum, and one of the most helpful forums around, so I ask here (and I didn't think anyone really read the footers). The server is brand new and there are no guidelines and no MPI queue setup. The administrator is my adviser, a math professor, who knows more Linux than I do - but not much more. We are figuring these things out together.

unitstep n00b
Joined: 17 Oct 2012 Posts: 9
Posted: Tue Jul 30, 2013 2:43 pm
Possibly, you could do something like:
Code: | mpirun -np .... ./nicempiscript.sh ... |
and then put in nicempiscript.sh:
Code: | #!/bin/bash
# Run the real MPI executable at reduced priority on whichever node this script is started on.
nice -n 15 ./yourmpiexecutable "$@" |
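A slightly more general variant of the wrapper idea takes the program to run as its first argument, so one script serves every executable. This is a sketch only. nicewrap and the NICE_LEVEL variable are hypothetical names, and whether MPICH 1.2.7 passes its start-up arguments cleanly through such a wrapper has not been tested here.

```shell
#!/bin/bash
# Run the given program at reduced priority.
# NICE_LEVEL is an illustrative environment variable, defaulting to 15.
nicewrap() {
    if [ "$#" -lt 1 ]; then
        echo "usage: nicewrap program [args...]" >&2
        return 2
    fi
    # "$@" keeps each argument intact, including ones containing spaces.
    nice -n "${NICE_LEVEL:-15}" "$@"
}

# Example: arguments with spaces arrive as single words.
nicewrap printf '[%s]\n' "first arg" second
```

Saved as a script whose last line is nicewrap "$@", it would be started as mpirun -np ... ./nicewrap.sh ./yourmpiexecutable ..., so that each node applies nice locally.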

erikm l33t
Joined: 08 Feb 2005 Posts: 634
Posted: Wed Jul 31, 2013 11:37 am
My two cents:
I've built Beowulf clusters on Gentoo, and am currently running a multi-core workstation as a mini compute cluster. Having a large server where a number of users run whatever they want from the command line, as you describe your situation, is begging for problems that no amount of nice'ing and shell script trickery will solve.
Tell your advisor to look into the Torque package (in Portage as sys-cluster/torque). It is a resource manager based on the old PBS system. It works beautifully under Gentoo, and will allow your system admins to precisely allocate resources where needed, and avoid the kinds of problems you describe.
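For a sense of what working under Torque looks like, here is a sketch that prints a minimal job script. The job name, the resource request, the walltime and ./my_mpi_program are placeholders, and the -machinefile option is an assumption about the local mpirun.

```shell
#!/bin/bash
# Print a minimal Torque/PBS job script to standard output.
# write_job_script is a hypothetical helper; all values are placeholders.
write_job_script() {
    cat <<'EOF'
#!/bin/bash
#PBS -N ode_sweep
#PBS -l nodes=4:ppn=2
#PBS -l walltime=02:00:00
# Torque starts the job in the home directory; move to where qsub was run.
cd "$PBS_O_WORKDIR"
# $PBS_NODEFILE lists the processor slots Torque assigned, one per line.
NP=$(wc -l < "$PBS_NODEFILE")
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./my_mpi_program
EOF
}

write_job_script
```

Redirected to a file (write_job_script > job.pbs), the result would be submitted with qsub job.pbs and monitored with qstat. The scheduler then decides which nodes run it, which removes the need to look for idle nodes by hand.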