Gentoo Forums
NIS and groups

big_gie
Apprentice
Joined: 31 Aug 2004
Posts: 158
Posted: Tue Dec 07, 2010 10:43 pm    Post subject: NIS and groups

Hi,

A machine of ours is set up as a NIS server to propagate passwords and such to several other machines, which can then be used as a single cluster.

I'm having trouble submitting jobs to the torque installation on this cluster. The jobs sit in the queue indefinitely. Looking at torque's logs, the compute nodes of the cluster seem to have trouble establishing a connection. I can ssh just fine between all the different machines though.

Here is a snippet of the log when I submit a job to run on two nodes. On one node of the cluster, I have:
Quote:
11/30/2010 20:31:31;0008; pbs_mom;Job;90858.mycluster;no group entry for group me, user=me, errno=0 (Success)
11/30/2010 20:31:31;0008; pbs_mom;Job;90858.mycluster;ERROR: received request 'ABORT_JOB' from 10.0.0.105:1023 for job '90858.mycluster' (job does not exist locally)
[repeated many times, until I cancel the job]

while on another one I get:
Quote:
11/30/2010 20:10:09;0008; pbs_mom;Req;send_sisters;sending ABORT to sisters
11/30/2010 20:10:09;0008; pbs_mom;Job;90857.unicron.cl.uottawa.ca;Job Modified at request of PBS_Server@mycluster
11/30/2010 20:10:09;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
11/30/2010 20:10:09;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
11/30/2010 20:10:09;0080; pbs_mom;Svr;preobit_reply;cannot locate job that triggered req
11/30/2010 20:10:09;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
11/30/2010 20:10:09;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
11/30/2010 20:10:09;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
11/30/2010 20:10:09;0008; pbs_mom;Job;90857.mycluster;checking job post-processing routine
11/30/2010 20:10:09;0080; pbs_mom;Job;90857.mycluster;obit sent to server
11/30/2010 20:10:10;0001; pbs_mom;Svr;pbs_mom;Bad UID for job execution (15023) in 90857.mycluster, job_start_error from node 10.0.0.104:15003 in job_start_error
11/30/2010 20:10:10;0001; pbs_mom;Svr;pbs_mom;Bad UID for job execution (15023) in 90857.mycluster, abort attempted 16 times in job_start_error. ignoring abort request from node 10.0.0.104:15003
[repeated many times, until I cancel the job]
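
For sifting through longer pbs_mom logs, here is a minimal Python sketch that splits lines like the ones quoted above on their ';' separators and pulls out the "no group entry for group" messages. The field names (timestamp, event_code, and so on) are labels chosen for this example, not Torque's own terminology, and the layout is inferred only from the quoted lines.

```python
from typing import NamedTuple, Optional


class MomLogLine(NamedTuple):
    # Field names are illustrative labels, inferred from the quoted log lines.
    timestamp: str
    event_code: str
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_mom_line(line: str) -> Optional[MomLogLine]:
    """Split one log line on its first five ';' separators."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # blank lines, "[repeated many times ...]" notes, etc.
    return MomLogLine(*(p.strip() for p in parts))


def group_lookup_failures(lines):
    """Yield (job, group) pairs for 'no group entry for group' messages."""
    prefix = "no group entry for group "
    for raw in lines:
        rec = parse_mom_line(raw)
        if rec and rec.message.startswith(prefix):
            group = rec.message[len(prefix):].split(",", 1)[0]
            yield rec.obj_name, group
```

Feeding it the lines of a node's log file should list each affected job together with the group name that could not be looked up.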


It seems it might have something to do with ids... I thus checked them on the headnode and the compute nodes:
Quote:
me@headnode $ id me
uid=1001(me) gid=1009(me) groups=1009(me)

Quote:
me@node104 $ id me
uid=1001(me) gid=1009 groups=1009

It looks like the nodes don't know the group names? When I type "ls -l" on the nodes, the group of the files and folders in my home directory shows as "1009", while on the headnode it shows my username.

I initially thought the problem was with torque, but could it be with NIS? I don't know anything about NIS; is there a way I can test it?

Thanks a lot for any insights, suggestions or help!
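
One way to test the symptom above on each machine is to ask the system's name service directly whether a user's primary gid resolves to a group name. The following is a minimal Python sketch using the standard pwd and grp modules; the helper names are made up for this example, and the lookup functions are parameters only so the logic can be exercised without a real NIS setup.

```python
import grp
import pwd


def group_name_or_none(gid, lookup=grp.getgrgid):
    """Return the group name for gid, or None if it cannot be resolved."""
    try:
        return lookup(gid).gr_name
    except KeyError:
        return None


def check_user(username, pw_lookup=pwd.getpwnam, gr_lookup=grp.getgrgid):
    """Report whether a user's primary gid resolves to a name on this host."""
    entry = pw_lookup(username)  # raises KeyError if the user is unknown
    name = group_name_or_none(entry.pw_gid, gr_lookup)
    return {
        "user": username,
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "group": name,
        "resolved": name is not None,
    }
```

Running check_user("me") on the headnode and on a compute node would be expected to show "resolved" as True on the first and False on the second, matching the two "id me" outputs.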

tony-curtis
Tux's lil' helper
Joined: 20 May 2006
Posts: 111
Posted: Wed Dec 08, 2010 9:59 pm

what's the group setting in /etc/nsswitch.conf?

big_gie
Apprentice
Joined: 31 Aug 2004
Posts: 158
Posted: Wed Dec 08, 2010 10:02 pm

Here's the content of the /etc/nsswitch.conf file (on the headnode):
Quote:
#passwd: compat
#shadow: compat
#group: compat
passwd: files nis
shadow: files nis
group: files nis

# passwd: db files nis
# shadow: db files nis
# group: db files nis

hosts: files dns
networks: files dns

services: db files
protocols: db files
rpc: db files
ethers: db files
netmasks: files
netgroup: files
bootparams: files

automount: files
aliases: files
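
To compare this file across many nodes without reading it by eye, a small parser is enough. This is a sketch only: it handles the plain "database: source source" lines shown above, ignores commented-out lines, and keeps any bracketed action items (such as [NOTFOUND=return]) as ordinary tokens instead of interpreting them.

```python
def parse_nsswitch(text):
    """Map each database name to its ordered list of sources."""
    databases = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or ":" not in line:
            continue
        name, sources = line.split(":", 1)
        databases[name.strip()] = sources.split()
    return databases
```

For the file quoted above, the "group" entry should come back as the list files, nis, which is the lookup order the rest of this thread relies on.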

tony-curtis
Tux's lil' helper
Joined: 20 May 2006
Posts: 111
Posted: Wed Dec 08, 2010 10:04 pm

what's in nsswitch.conf on the compute nodes?

big_gie
Apprentice
Joined: 31 Aug 2004
Posts: 158
Posted: Wed Dec 08, 2010 10:06 pm

I just checked and it is exactly the same file on the headnode and compute nodes...

tony-curtis
Tux's lil' helper
Joined: 20 May 2006
Posts: 111
Posted: Wed Dec 08, 2010 10:10 pm

can you "ypcat passwd" (and group) on both the head and compute nodes?

Check also that "getent passwd" (and group) delivers the concatenation of /etc/passwd(group) and the NIS map.
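
The concatenation check can be automated once the three outputs are saved as text (the contents of /etc/group, the output of "ypcat group", and the output of "getent group"). The sketch below compares group names only; the function names are invented for this example, and it assumes the usual name:password:gid:members line layout.

```python
def group_names(text):
    """Collect group names from 'name:passwd:gid:members' style lines."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "+", "-")):
            names.add(line.split(":", 1)[0])
    return names


def compare_sources(etc_group, ypcat_group, getent_group):
    """Compare getent's view against the union of local file and NIS map."""
    expected = group_names(etc_group) | group_names(ypcat_group)
    seen = group_names(getent_group)
    return {
        "missing": sorted(expected - seen),
        "unexpected": sorted(seen - expected),
    }
```

Note that an empty NIS input makes the comparison pass trivially on a node, so an empty "ypcat group" output is worth treating as a finding in its own right.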

big_gie
Apprentice
Joined: 31 Aug 2004
Posts: 158
Posted: Wed Dec 08, 2010 10:15 pm

"ypcat passwd"'s output is identical on headnode and compute nodes. Here is an example:
Quote:
me:PASSWORDHASH:1001:1009:My name:/home/me:/bin/bash

(there's a dozen users though)

"ypcat group" does not return anything on either headnode or compute nodes. Is this normal?

tony-curtis
Tux's lil' helper
Joined: 20 May 2006
Posts: 111
Posted: Wed Dec 08, 2010 10:18 pm

> normal?

depends on your setup. From what you've said, I'm guessing that the YP group map has been set up but is empty, and that the group(s) you're expecting to see are only in /etc/group on the head (so the "files" repository on the head picks up the groups for you, but neither "files" nor "nis" on the nodes will see anything). You need to set up the group map to get the local groups into YP, and then the nodes should see the groups properly.
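
The reasoning above can be pictured with a toy model of the "group: files nis" lookup order. This is only an illustration in Python, not how glibc implements NSS; the dictionaries stand in for /etc/group and the NIS group map, and their contents are made up apart from gid 1009 and the name "me" taken from this thread.

```python
def resolve_gid(gid, order, backends):
    """Walk the sources in nsswitch order and return the first matching name."""
    for source in order:
        name = backends.get(source, {}).get(gid)
        if name is not None:
            return name
    return None


order = ["files", "nis"]  # as in "group: files nis"
empty_nis_map = {}        # what an empty "ypcat group" suggests

# The head node has the group in its local file; the compute node does not.
head = {"files": {0: "root", 1009: "me"}, "nis": empty_nis_map}
node = {"files": {0: "root"}, "nis": empty_nis_map}

head_result = resolve_gid(1009, order, head)  # found via "files"
node_result = resolve_gid(1009, order, node)  # found nowhere
```

With the map empty, the head node still answers from "files" while the compute node finds nothing in either source; once the map contains gid 1009, the node would pick it up through "nis".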

big_gie
Apprentice
Joined: 31 Aug 2004
Posts: 158
Posted: Wed Dec 08, 2010 10:47 pm

I think you are right: the YP group map was set up but is empty, since "ypcat group" returns without error but prints nothing.

I tried something to see if I could fix my original problem. I copied the headnode's /etc/group file to 2 compute nodes and then tried to submit a torque job. It seems the job ran on two nodes without failing!! I'll do more tests to really verify this, but it's encouraging and validates my initial guess of a problem with NIS... ;)

Now to fix it permanently... As you said, I will need to "set up the group map into YP" so the nodes will see the different groups too. How can I achieve this?

Thanx a lot ;)

tony-curtis
Tux's lil' helper
Joined: 20 May 2006
Posts: 111
Posted: Wed Dec 08, 2010 10:51 pm

The build for the YP maps is in /var/yp on the YP/NIS server (presumably also the head node? or use "ypwhich" to find the server).

A "make" in there will incorporate local changes into YP/NIS. passwd and group should be handled by default.

big_gie
Apprentice
Joined: 31 Aug 2004
Posts: 158
Posted: Wed Dec 08, 2010 11:17 pm

Ok thanx.
I've followed the (archived) Gentoo wiki for NIS[1] and read "Verifying the NIS/NYS Installation"[2] and "Creating and Updating NIS maps"[3]. The makefile already contained:
Quote:
[...]
all: passwd group hosts rpc services netid protocols netgrp mail \
shadow # publickey # networks ethers bootparams printcap \
# amd.home auto.master auto.home auto.local passwd.adjunct \
# timezone locale netmasks
[...]


Running make (as root) in /var/yp gives:
Quote:
sudo make
gmake[1]: Entering directory `/var/yp/[nisdomainname]'
Updating netid.byname...
Updating shadow.byname... Ignored -> merged with passwd
gmake[1]: Leaving directory `/var/yp/[nisdomainname]'

I then restarted ypbind on head node and compute node. But unfortunately, the same behavior is observed: "ypcat group" reports nothing and:
Quote:
me@computenode $ ypmatch me group
Can't match key me in map group.byname. Reason: No such key in map


Am I missing something?

[1] http://www.gentoo-wiki.info/HOWTO_Setup_NIS
[2] http://www.tldp.org/HOWTO/NIS-HOWTO/verification.html
[3] http://www.tldp.org/HOWTO/NIS-HOWTO/maps.html

tony-curtis
Tux's lil' helper
Joined: 20 May 2006
Posts: 111
Posted: Thu Dec 09, 2010 1:15 am

is MERGE_GROUP=true in /var/yp/Makefile ?

also make sure MINGID is incorporating the groups you want visible to YP.

one thing to try is to force a group update: touch /etc/group and make in /var/yp
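
To preview which groups MINGID would let through, something like the Python sketch below can be run against a copy of /etc/group. It assumes the Makefile exports only entries whose gid is greater than or equal to MINGID, which is worth confirming against the actual rule in /var/yp/Makefile; the function name is invented for this example.

```python
def groups_exported(etc_group_text, mingid):
    """Return (name, gid) pairs with gid >= mingid, in file order."""
    exported = []
    for line in etc_group_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) < 3 or not fields[0] or fields[0][0] in "+-#":
            continue  # skip comments, NIS compat lines and junk
        try:
            gid = int(fields[2])
        except ValueError:
            continue  # non-numeric gid, cannot be compared with MINGID
        if gid >= mingid:
            exported.append((fields[0], gid))
    return exported
```

If the groups you expect are missing from the result for your MINGID value, the threshold is the first thing to adjust.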

big_gie
Apprentice
Joined: 31 Aug 2004
Posts: 158
Posted: Thu Dec 09, 2010 7:39 pm

MERGE_GROUP was set to true in the makefile. MINGID is set to 500, while our ids are 1000 and up.
I touched the /etc/group file and ran make:
Quote:
$ sudo make
gmake[1]: Entering directory `/var/yp/[nisdomainname]'
Updating group.byname...
yphelper: This program is for internal use from some
ypserv scripts and should never be called
from a terminal
Updating group.bygid...
yphelper: This program is for internal use from some
ypserv scripts and should never be called
from a terminal
Updating netid.byname...
Updating shadow.byname... Ignored -> merged with passwd
gmake[1]: Leaving directory `/var/yp/[nisdomainname]'

Restarted ypbind and ypserv on the server, and ypbind on the compute node.

I still see "me 1009" instead of "me me" in "ls -l"'s output on the compute nodes. ypcat group is still empty...

tony-curtis
Tux's lil' helper
Joined: 20 May 2006
Posts: 111
Posted: Thu Dec 09, 2010 8:35 pm

Baffling.

Could there perhaps be a minor formatting error in /etc/group that is being ignored locally, but the YP make process is choking on?
How did you create/edit the local groups? groupadd/vigr and friends, or ... ?
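
A quick way to look for that kind of formatting slip is to scan the file for lines that do not have the usual four colon-separated fields. The Python sketch below is a rough check written for this thread, not a replacement for grpck from the shadow tools, and the list of problems it looks for is a guess at what might trip up a map build.

```python
def check_group_file(text):
    """Return (line_number, problem) pairs for lines that look malformed."""
    problems = []
    seen_names = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        if line != line.strip():
            problems.append((number, "leading or trailing whitespace"))
        fields = line.strip().split(":")
        if len(fields) != 4:
            problems.append((number, f"expected 4 fields, found {len(fields)}"))
            continue
        name, _passwd, gid, _members = fields
        if not gid.isdigit():
            problems.append((number, f"non-numeric gid {gid!r}"))
        if name in seen_names:
            problems.append((number, f"duplicate group name {name!r}"))
        seen_names.add(name)
    return problems
```

An empty result only means none of these particular problems were found.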

big_gie
Apprentice
Joined: 31 Aug 2004
Posts: 158
Posted: Tue Mar 22, 2011 5:46 pm

I had to stop working on this for some time, but then I just checked again since I really need a queuing system (fighting for compute nodes is painful...)

The problem seemed to be that the compute nodes could not map gid numbers to group names. I tried installing sys-auth/munge-0.5.9 and one of its tests revealed this:
Quote:
headnode$ munge -n | ssh computenode unmunge
STATUS: Success (0)
ENCODE_HOST: headnode.cl.... (10.0.0.1)
ENCODE_TIME: 2011-03-22 13:37:28 (1300815448)
DECODE_TIME: 2011-03-22 13:37:30 (1300815450)
TTL: 300
CIPHER: aes128 (4)
MAC: sha1 (3)
ZIP: none (0)
UID: me (1001)
GID: ??? (1009)
LENGTH: 0

Note the "???".
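
To run the same check across every compute node, the unmunge output can be scanned for ids whose name came back as "???". This is a minimal Python sketch that assumes the "KEY: name (number)" layout seen in the output quoted above; the function name is made up for this example.

```python
import re

# Matches lines such as "GID: ??? (1009)" or "UID: me (1001)".
_ID_LINE = re.compile(r"^(UID|GID):\s*(\S+)\s*\((\d+)\)\s*$")


def unresolved_ids(unmunge_output):
    """Return the UID/GID entries whose name was printed as '???'."""
    unresolved = {}
    for line in unmunge_output.splitlines():
        match = _ID_LINE.match(line.strip())
        if match and match.group(2) == "???":
            unresolved[match.group(1)] = int(match.group(3))
    return unresolved
```

A non-empty result for a node means that node still cannot turn the number into a name.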

I then grepped "group.bygid" in /var/yp/Makefile and found this line: "group.bygid: $(GROUP) $(GSHADOW) $(YPDIR)/Makefile". Then, grepping for "GSHADOW" revealed that the "GSHADOW=" line was commented out! I uncommented it, ran make and now the gid seems to propagate correctly:
Quote:
headnode$ munge -n | ssh computenode unmunge
STATUS: Success (0)
ENCODE_HOST: headnode.cl.... (10.0.0.1)
ENCODE_TIME: 2011-03-22 13:37:28 (1300815448)
DECODE_TIME: 2011-03-22 13:37:30 (1300815450)
TTL: 300
CIPHER: aes128 (4)
MAC: sha1 (3)
ZIP: none (0)
UID: me (1001)
GID: me (1009)
LENGTH: 0
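
The manual grep can be turned into a small check. In make, a variable that is never assigned expands to an empty string without any warning, so a rule that references $(GSHADOW) still runs when the assignment is commented out, presumably handing the helper an incomplete argument list. The Python sketch below lists variables that a target's prerequisites reference but that have no uncommented assignment; it is deliberately simple (no line continuations, no includes, no variables set on the command line or in the environment), and the function name is invented for this example.

```python
import re


def undefined_prerequisite_vars(makefile_text, target):
    """List $(VAR) names in a target's prerequisites that are never assigned."""
    assigned = set()
    prerequisites = ""
    for line in makefile_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out lines do not count as assignments
        assignment = re.match(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:?+]?=", line)
        if assignment:
            assigned.add(assignment.group(1))
        elif line.startswith(target + ":"):
            prerequisites = line.split(":", 1)[1]
    referenced = re.findall(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)", prerequisites)
    return [name for name in referenced if name not in assigned]
```

Called with the text of /var/yp/Makefile and the target "group.bygid", it would be expected to report GSHADOW before the fix and nothing afterwards.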


Now I don't know why the line was commented out. Whether it was me or the vendor, I can't tell. A backup I created last December has the line commented out. I don't have a backup from before that (I did not back up /var...), so I can't verify.

Is the GSHADOW=... line normally commented out? Could it be an old default value?

Thanks for all your help.
All times are GMT