Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Mon Feb 27, 2006 5:37 pm Post subject: [SOLVED?] EVMS: not a valid root device; Start udevd w/ldap |
|
|
Hello,
I installed Gentoo on an Intel EM64T machine using the 2005.1-r1 AMD64 Universal CD with the default 2.6.12-r10 kernel.
I compiled it with genkernel --menuconfig --evms2 all
I selected ramdisk support, raid and device mapper.
Grub.conf contained:
title=INF-BL07 64bit EM64T NOCONA 2.6.12-r10 SCSI EVMS RAID1
root (hd0,0)
kernel /kernel-genkernel-x86_64-2.6.12-gentoo-r10 root=/dev/ram0 init=/linuxrc mem=4096M ramdisk=8192 doscsi vga=0 real_root=/dev/evms/root udev doevms2
initrd /initramfs-genkernel-x86_64-2.6.12-gentoo-r10
This system booted fine.
However, I made the mistake (?) of updating the whole system:
emerge --update --deep --newuse system
emerge --update --deep --newuse world
When I rebooted this system with the 2.6.12-r10 kernel, it "hung" on "Starting udev...".
If I pressed CTRL-C, the init process resumed but "hung again" on "Cleaning /tmp...".
So I commented out some lines in /etc/init.d/bootmisc, in particular:
mkdir -p /tmp/.{ICE,X11}-unix
chown 0:0 /tmp/.{ICE,X11}-unix
chmod 1777 /tmp/.{ICE,X11}-unix
[[ -x /sbin/restorecon ]] && restorecon /tmp/.{ICE,X11}-unix
along with all the >/dev/null redirections, and that allowed the system to boot (I don't understand why).
I suspected that upgrading the whole system (baselayout 1.11.14-r5) without also building a recent kernel might have caused udev to hang, so I built the current 2.6.15-r5 kernel:
genkernel --menuconfig --evms2 all
I selected ramdisk support, raid and device mapper.
I updated grub.conf but when I rebooted I got these messages:
>> Activating udev OK
>> Activating EVMS OK
Determining root device...
Block device /dev/evms/root is not a valid root device
The root block device is unspecified or not detected.
Specify device to boot or "shell" for a shell.
So I typed "shell" and noticed that, even though evms_activate reports no errors or warnings, ls /dev/evms/ only lists "dm" and "dm/control".
Does anyone know why I can't see the /dev/evms/root or /dev/evms/.nodes/sda devices?
How could I debug? Any suggestions?
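For anyone landing in the same rescue shell: a minimal sketch of one way to check whether the kernel sees any disks at all, assuming sysfs is mounted at /sys inside the initramfs. The list_disks helper is hypothetical and is not part of genkernel or EVMS.

```shell
#!/bin/sh
# Sketch: list the block devices the kernel has registered.
# If no sd* entry shows up, the SCSI host driver is probably missing,
# and EVMS has nothing to scan.
list_disks() {
    sysblock="${1:-/sys/block}"      # sysfs directory to inspect
    for dev in "$sysblock"/*; do
        [ -e "$dev" ] || continue    # skip the unexpanded glob if the directory is empty
        name="${dev##*/}"
        case "$name" in
            ram*|loop*) ;;           # ignore ramdisks and loop devices
            *) echo "$name" ;;
        esac
    done
}

# Hypothetical usage from the rescue shell:
#   list_disks                 # expect sda, sdb, ... if the controller driver is loaded
#   cat /proc/scsi/scsi        # attached SCSI devices as seen by the kernel
#   dmesg | grep -i scsi       # driver probe messages, if any
```

An empty result from list_disks would point at the disk controller driver rather than at EVMS itself.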
Last edited by Vieri on Tue Feb 28, 2006 6:53 pm; edited 1 time in total |
|
jschellhaass Guru
Joined: 20 Jan 2004 Posts: 341
|
Posted: Mon Feb 27, 2006 10:12 pm Post subject: |
|
|
What type of SCSI card are you using to boot?
I believe genkernel only puts SATA drivers in the initrd. You can try compiling the disk controller driver into the kernel instead of building it as a module.
jeff |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Tue Feb 28, 2006 7:20 am Post subject: |
|
|
The SCSI cards are LSI 1020 Ultra320 (integrated, one channel).
Two SCSI disks are connected via a PERC 4/im RAID controller.
I will double-check whether the LSI driver is built into the kernel (genkernel actually worked fine for 2.6.12; oddly, the problem only arose with 2.6.15). |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Tue Feb 28, 2006 9:46 am Post subject: |
|
|
I tried installing Gentoo with the new 2006.0 amd64 image on a Dell PowerEdge 1855 EM64T. This system only has a USB CD drive. 2006.0 and 2005.1 could not find/detect it. 2005.1-r1 detected it as /dev/sr0 and booted just fine.
It seems that the enhancements made to 2005.1-r1 were not propagated to 2006.0... |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Tue Feb 28, 2006 12:36 pm Post subject: |
|
|
Just in case someone has the same problem, here's how I pinpointed mine (thanks to the evms mailing list).
- EVMS did not detect the disks because the SCSI adapter driver was not built into the kernel (I wrongly assumed that 2.6.15 had more or less the same defaults as 2.6.12; if you have the same system, enable the FUSION drivers in the kernel)
- the endless "Starting udevd..." hang was due to my "special" configuration, and I suppose quite a few users may be in the same situation. Authentication on my system is done via LDAP, so nsswitch.conf contains references to ldap. Somehow, the latest stable udev tries to resolve a tss user/group, and udevd hangs on that lookup.
To fix this problem there are 2 or 3 quick solutions:
- upgrade to an unstable udevd (untested and may not work, but the developers are aware of this problem)
- edit /etc/nsswitch.conf and remove ldap. Of course that's not a permanent solution unless you change your authentication scheme, but at least you will be able to boot.
- edit /etc/udev/rules.d/50-udev.rules and comment out the entry for the tss user/group (search for KERNEL=="tpm)
There's a Gentoo bug report on this issue: https://bugs.gentoo.org/show_bug.cgi?id=99564
I think the latest udev ebuild was marked stable too soon (LDAP environments weren't tested?).
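The kernel side can be checked before rebooting. A minimal sketch, assuming the kernel sources live in /usr/src/linux and that the Fusion MPT options all start with CONFIG_FUSION (the exact option names may differ between kernel versions, so the pattern is deliberately loose). show_fusion_options is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print every Fusion-related option from a kernel .config,
# including the ones that are explicitly disabled.
show_fusion_options() {
    config="${1:-/usr/src/linux/.config}"   # assumed location of the kernel config
    # Matches both 'CONFIG_FUSION...=y' and '# CONFIG_FUSION... is not set'.
    grep -E '^(# )?CONFIG_FUSION' "$config"
}

# How to read the output:
#   =y            built into the kernel image
#   =m            built as a module (it then has to end up in the initramfs)
#   is not set    disabled
```

If the function prints nothing or only "is not set" lines, the adapter driver is not being built at all.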
[EDIT1]:
upgrading to an unstable udevd does not solve the issue (as of Feb 28th 2006)
[EDIT2]:
Commenting out the line
KERNEL=="tpm*", NAME="%k", OWNER="tss", GROUP="tss", MODE="0600"
is a quick way to avoid udev's endless lookups.
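A minimal sketch of that edit as a script, assuming the rule still begins with KERNEL=="tpm as quoted above. comment_out_tpm_rule is a hypothetical helper; it keeps a .bak copy and is safe to run twice.

```shell
#!/bin/sh
# Sketch: comment out the tpm rule in a udev rules file.
comment_out_tpm_rule() {
    rules="${1:-/etc/udev/rules.d/50-udev.rules}"
    # Keep a backup, then rewrite the file from it.
    cp "$rules" "$rules.bak" || return 1
    # Only lines that START with KERNEL=="tpm are touched, so lines that
    # are already commented out are left alone.
    sed '/^KERNEL=="tpm/s/^/#/' "$rules.bak" > "$rules"
}

# Hypothetical usage (as root):
#   comment_out_tpm_rule
```

As noted elsewhere in this thread, an etc-update that replaces the rules file would undo the change.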
However, another step also blocks the init process: "Cleaning /tmp", in
/etc/init.d/bootmisc,
on the line
chown 0:0 /tmp/.{ICE,X11}-unix
If I comment that line out, the system boots fine (not a definitive solution, though).
System is EM64T (amd64 iso), latest udev and latest baselayout.
nsswitch.conf needs ldap in my case.
Last edited by Vieri on Tue Feb 28, 2006 5:57 pm; edited 1 time in total |
|
skyPhyr Apprentice
Joined: 17 Sep 2004 Posts: 159 Location: London, UK
|
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Tue Feb 28, 2006 6:05 pm Post subject: |
|
|
Glad it could help someone.
Strangely, the udev "bug" I mentioned above was reported 6 months ago.
I hope they at least add a big ewarn for LDAP users. |
|
sedorox Apprentice
Joined: 13 Feb 2004 Posts: 206
|
Posted: Sat Mar 04, 2006 7:57 pm Post subject: |
|
|
This is weird... I have 2 machines... one is my new ldap test box.. the other is my 'production' box... Today (spring break, yay!) I booted up both. The 'test' box came up just fine, however, the 'production' box didn't. It hung at the udev thing. Your solution (commenting out the TPM device) did the trick.
Here's the kicker.... Both boxes have ldap (as server) and have the ldap entries in nsswitch.conf... Both have udev-85... (I did notice -86 is out.. still gotta test). But one box had a problem, and one didn't...
Funky..... |
|
Vieri l33t
Joined: 18 Dec 2005 Posts: 874
|
Posted: Sat Mar 04, 2006 9:41 pm Post subject: |
|
|
Do both boxes have the same sys-apps/baselayout version? |
|
sedorox Apprentice
Joined: 13 Feb 2004 Posts: 206
|
Posted: Sun Mar 05, 2006 11:04 pm Post subject: |
|
|
Vieri wrote: | Do both boxes have the same sys-apps/baselayout version? |
Actually, yes, they are the same:
'test' box:
1.12.0_pre16-r1
'production' box:
1.12.0_pre16-r1
Thought I should do updates.. there are updates to both udev and baselayout.... |
|
twam Apprentice
Joined: 15 Feb 2005 Posts: 189 Location: Ammerbuch, Germany
|
Posted: Mon Mar 13, 2006 10:11 pm Post subject: |
|
|
Same problem here with sys-apps/baselayout-1.12.0_pre16-r3 on 2 machines: an EM64T and a Pentium M. :/ |
|
net n00b
Joined: 18 Mar 2006 Posts: 5
|
Posted: Sat Mar 18, 2006 10:55 pm Post subject: |
|
|
The same problem here after the last emerge -uD world yesterday.
(system stable x86 : Linux sk-srv 2.6.14-hardened-r5 #1 PREEMPT Wed Feb 1 22:17:18 CET 2006 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux)
As a workaround I removed ldap from nsswitch.conf.
Any ideas about that?
It's not a big problem at this time, but I'm working on ldap, so it has to work in the future.
Regards |
|
sedorox Apprentice
Joined: 13 Feb 2004 Posts: 206
|
Posted: Sun Mar 19, 2006 7:45 am Post subject: |
|
|
I've developed some other problems on my test box (yeah, the tpm bug finally appeared), and now a few things lag on start. I also have problems when slapd starts: it tries to bind to itself, plus other nsswitch.conf-related stuff (looking up users). So I'm hesitant about upgrading my production box; however, I think I'm going to do it package by package and see what breaks it... |
|
BernieKe Tux's lil' helper
Joined: 02 Jul 2002 Posts: 130 Location: California/Bangalore/Belgium
|
Posted: Thu Mar 30, 2006 6:04 am Post subject: |
|
|
putting the following in /etc/ldap.conf fixed the udev problem for me:
|
|
sedorox Apprentice
Joined: 13 Feb 2004 Posts: 206
|
Posted: Fri Apr 07, 2006 2:27 am Post subject: |
|
|
Ok.. here's the thing... I upgraded my 'production' box slowly, and it isn't baselayout. It's sys-auth/nss_ldap.
The system was running: 239-r1
As soon as I upgraded to 249 I started having issues.
Mine is that when slapd starts, it tries to bind to itself (isn't this a bad thing?), and of course so do udev, apache, and other things that start before slapd does.
The only fix I've found, besides downgrading, was the 'bind_policy soft' thingy.
Granted, I don't know which version broke this, but at least we know what package it is... Maybe I should file a bug report? (though I don't know what to report) _________________ Home Desktop: Ryzen 3900X 3.8ghz | 32G Ram | 2x 1TB NVMe
Previous 7 Year Build: Intel i5-2400 3.1ghz | 16G Ram | 1x 60G SSD, 1x 1TB HDD |
|
Ausdonky n00b
Joined: 12 May 2004 Posts: 15 Location: Brisbane, Oztralia :)
|
Posted: Tue Apr 18, 2006 10:38 am Post subject: |
|
|
Hi guys..
After having spent the last 4 hours thinking that my semi-production box had farked itself after a forced reboot (I was getting segfaults from udev?!), I managed to find out that there was nothing wrong with it. It was ldap. I managed to boot the bugger and then re-enabled ldap in the nsswitch.conf file, but of course this was just a temporary fix. Anyway, after re-enabling ldap I rebooted to see if it still had issues, but this time it just hung on udevd. In a fit of rage I gave the keyboard a good whack and then, out of habit, hit Ctrl-C, and to my amazement it booted! I would assume that this causes udev not to load devices after the point where I break, but it will get you to a shell to fix things if you need to (rather than having to boot a livecd or similar).
By the way, I applied the change to the udev rules file described above and it worked great. I also tried setting bind_policy to soft, but that didn't seem to work..
HTH
Andrew |
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Tue Apr 18, 2006 1:53 pm Post subject: |
|
|
I've had the same problem, as described here:
https://forums.gentoo.org/viewtopic-t-448608-highlight-.html
and commenting out the appropriate line on /etc/udev/rules.d/50-udev.rules worked flawlessly. No need to mess with /etc/ldap.conf (yes, I'm using ldap also).
I know it's something that can be easily sent to oblivion by a bad etc-update, but nice hack anyway
Thanks a lot, Cantão! |
|
sedorox Apprentice
Joined: 13 Feb 2004 Posts: 206
|
Posted: Wed Apr 19, 2006 4:12 am Post subject: |
|
|
cantao wrote: |
and commenting out the appropriate line on /etc/udev/rules.d/50-udev.rules worked flawlessly. No need to mess with /etc/ldap.conf (yes, I'm using ldap also).
|
This works. However, I found that when starting other services (like ldap itself, or apache, etc.), they try to bind to ldap, and since, for some reason, it's one of the last things to be started, that fails, so I needed the ldap.conf setting... I wish I knew exactly what caused this in the first place. It was working fine until that one package update... |
|
McManus Apprentice
Joined: 10 Apr 2002 Posts: 176 Location: Austin, TX
|
Posted: Sun Jun 11, 2006 2:54 am Post subject: |
|
|
sedorox wrote: | cantao wrote: |
and commenting out the appropriate line on /etc/udev/rules.d/50-udev.rules worked flawlessly. No need to mess with /etc/ldap.conf (yes, I'm using ldap also).
|
This works. However, I found that when starting other services (like ldap itself, or apache, etc.), they try to bind to ldap, and since, for some reason, it's one of the last things to be started, that fails, so I needed the ldap.conf setting... I wish I knew exactly what caused this in the first place. It was working fine until that one package update... |
I am experiencing exactly the same thing. Any ideas, short of removing ldap support? _________________ McManus
----
Linux user #267375 - http://counter.li.org |
|
sedorox Apprentice
Joined: 13 Feb 2004 Posts: 206
|
Posted: Fri Jun 23, 2006 7:39 pm Post subject: |
|
|
McManus wrote: |
I am experiencing exactly the same thing. Any ideas, short of removing ldap support? |
Sorry it took me a while to get back to you.... here is what I have changed in my ldap.conf that has seemed to work:
Code: |
bind_policy soft
nss_reconnect_tries 3
|
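A minimal sketch of applying those two settings from a script without duplicating lines on repeated runs. set_ldap_option is a hypothetical helper, and /etc/ldap.conf is the path mentioned earlier in this thread.

```shell
#!/bin/sh
# Sketch: set "key value" in an ldap.conf-style file, replacing an existing
# entry for the same key or appending a new one.
set_ldap_option() {
    conf="$1"; key="$2"; value="$3"
    if grep -q "^${key}[[:space:]]" "$conf" 2>/dev/null; then
        # Option already present: rewrite its value, keeping the original
        # file (and its permissions) by writing back through cat.
        sed "s/^${key}[[:space:]].*/${key} ${value}/" "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" &&
            rm -f "$conf.tmp"
    else
        # Option missing (or file absent): append it.
        printf '%s %s\n' "$key" "$value" >> "$conf"
    fi
}

# Hypothetical usage (as root):
#   set_ldap_option /etc/ldap.conf bind_policy soft
#   set_ldap_option /etc/ldap.conf nss_reconnect_tries 3
```

As far as I understand the nss_ldap documentation, bind_policy soft makes a lookup fail right away when the server is unreachable instead of retrying, which would explain why it helps for services that start before slapd.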
I also still have the tpm device commented out in /etc/udev/rules.d/50-udev.rules |
|
MorpheuS.Ibis Tux's lil' helper
Joined: 22 Apr 2006 Posts: 143
|
Posted: Sun Oct 08, 2006 5:41 pm Post subject: |
|
|
I am kind of a n00b at this, but I also use LDAP and udev...
What about making nsswitch.conf a symlink and changing it from the local initscript (/etc/conf.d/local.start)? local starts at the end of the boot process, so the network connection should be up, and so should the LDAP server (if you have it on that machine). Also, change the symlink back when stopping the system (/etc/conf.d/local.stop)...
This more or less takes over the local initscript (and you don't want to mess with traffic shaping or something like that when experimenting with LDAP), so creating a dedicated initscript for it (a copied and slightly edited local should be sufficient) might be a good idea. But there is one more thing... it's too simple to work, but why not give it a try? |
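A minimal and untested-on-a-real-boot sketch of the symlink idea. The file names nsswitch.conf.local and nsswitch.conf.ldap and the use_nsswitch helper are all hypothetical; the two variant files would have to be created by hand first.

```shell
#!/bin/sh
# Sketch: point nsswitch.conf at one of two prepared variants.
use_nsswitch() {
    variant="$1"             # "local" (files only) or "ldap"
    etc="${2:-/etc}"         # directory holding the files
    # Refuse to point the link at a file that does not exist.
    [ -f "$etc/nsswitch.conf.$variant" ] || return 1
    # Relative target, so the link stays valid if the directory is
    # mounted somewhere else (for example from a rescue CD).
    ln -sf "nsswitch.conf.$variant" "$etc/nsswitch.conf"
}

# Hypothetical usage:
#   in /etc/conf.d/local.start:  use_nsswitch ldap
#   in /etc/conf.d/local.stop:   use_nsswitch local
```

One thing to keep in mind: after an unclean shutdown local.stop never runs, so the link would still point at the ldap variant on the next boot.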
|