smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Fri Sep 29, 2006 10:16 am Post subject: Human intervention always required on boot (Blade100, RAID1)
|
I have been - very slowly - ironing out the problems on my Blade 100.
My system is based on 2006.0, kernel 2.6.17.5, with "world" up to date. I'm running software RAID 1 on a pair of 80 GB IDE discs.
The most annoying problem that I still have is that every time I boot, I have to give SILO my boot parameters by hand, despite having a valid /etc/silo.conf. SILO always says "No config file loaded, you can boot from..."
I type in /boot/vmlinux root=/dev/md0 and everything works fine. (/boot is not on a separate partition.)
Today, I just happened to try this:
Code: |
narsil ~ # lsraid -a /dev/md0
lsraid: Device "/dev/hda1" does not have a valid raid superblock
lsraid: Device "/dev/hdc1" does not have a valid raid superblock
[dev 9, 0] /dev/md0 00000000.00000000.00000000.00000000 online
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing
|
Could this be related to the problem? I shut down, booted from the LiveCD and created the array from scratch (mdadm --create ...), thinking that this might fix things, but I still don't have valid RAID superblocks (as above). My /dev/md1 (swap), however, looks a lot healthier:
Code: |
narsil ~ # lsraid -a /dev/md1
[dev 9, 1] /dev/md1 F3902FDF.A1C764DA.547D8643.F45680D9 online
[dev 22, 2] /dev/hdc2 F3902FDF.A1C764DA.547D8643.F45680D9 good
[dev 3, 2] /dev/hda2 F3902FDF.A1C764DA.547D8643.F45680D9 good
|
So: 1) could the bad superblocks be the cause of my boot problem, and 2) how do I fix them?
Ideas, anyone?
_________________
Matthew Smith

Weeve Retired Dev
Joined: 30 Oct 2002 Posts: 641
Posted: Fri Sep 29, 2006 4:12 pm
|
SILO cannot read from anything other than ext2/3, so when you run SILO from Linux you have to give it the path to a config file in a place it can read. It won't be able to read the config from inside the RAID array.
Also, if you already have a /boot partition separate from /, then silo.conf needs to live in that /boot partition rather than in /etc. You can tell SILO the alternative location of the config with the command silo -C /boot/silo.conf.
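Following that rule of thumb, a quick sanity check before running silo -C is that /boot is mounted from a plain (non-md) partition formatted ext2 or ext3. The sketch below reads a mount table in /proc/mounts format; the function name boot_fs_ok is hypothetical, and it only inspects the mount table, not what SILO will actually do at boot time.

```shell
# boot_fs_ok MOUNTS_FILE
# Hypothetical helper. Succeeds (exit 0) if MOUNTS_FILE (normally
# /proc/mounts) lists /boot as mounted from a non-md device with an
# ext2 or ext3 filesystem; fails otherwise, including when /boot is
# just a directory on / and so has no mount entry of its own.
boot_fs_ok() {
  awk '$2 == "/boot" && $1 !~ /^\/dev\/md/ && ($3 == "ext2" || $3 == "ext3") { ok = 1 }
       END { exit (ok ? 0 : 1) }' "$1"
}

# Example use on a live system (commented out so the file can be sourced):
# boot_fs_ok /proc/mounts && silo -C /boot/silo.conf
```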

smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Fri Sep 29, 2006 9:45 pm
|
Hmm. I wonder, then, whether I should create a separate non-RAID /boot, with a copy on each disc...
Because I misread the partition data when doing the original partitioning, I happen to have 32 MB spare on each disc.
_________________
Matthew Smith

smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Sat Sep 30, 2006 12:16 am
|
OK, I now have a separate /boot (partition 4 of my first disc). I have put my silo.conf in there and am running silo with -C /boot/silo.conf, but the machine still won't boot by itself.
No matter what I do, I just get "No config file loaded, you can boot just from this command line". And yes, I can boot from the command line with no problems. Now that I'm using the separate (non-RAID) partition, I just boot with:
Code: | boot: 4/vmlinux root=/dev/md0 |
My silo.conf looked like this:
Code: |
partition = 4
root = /dev/md0
timeout = 50
image = /vmlinux
label = Linux
|
(I previously had image = /boot/vmlinux, but that did not work either.)
Ideas, anyone?
_________________
Matthew Smith

Weeve Retired Dev
Joined: 30 Oct 2002 Posts: 641
Posted: Sat Sep 30, 2006 7:35 pm
|
What does your partition table look like?

smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Sat Sep 30, 2006 8:38 pm
|
Here you go, Weeve:
Code: |
Disk /dev/hda (Sun disk label): 255 heads, 63 sectors, 10009 cylinders
Units = cylinders of 16065 * 512 bytes
Device Flag Start End Blocks Id System
/dev/hda1 0 9483 76172197+ fd Linux raid autodetect
/dev/hda2 9843 10009 1333395 fd Linux raid autodetect
/dev/hda3 0 10009 80397292+ 5 Whole disk
/dev/hda4 9483 9843 2891700 83 Linux native
|
Note that /dev/hda4 is the new partition that I created from the "spare" space for /boot. Originally, /boot was on /dev/md0, which consists of /dev/hda1 and /dev/hdc1. The new /boot is formatted with ext2.
_________________
Matthew Smith
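The Blocks column can be checked from the header: each cylinder is 16065 sectors (255 heads x 63 sectors) of 512 bytes, so a partition covering cylinders START to END holds (END - START) * 16065 / 2 one-KiB blocks, with fdisk's trailing "+" marking a leftover half block. The sketch below (sun_blocks is a hypothetical name) reproduces the column; for /dev/hda4 (cylinders 9483 to 9843) it gives 2891700 blocks, about 2.76 GiB.

```shell
# sun_blocks START END
# Hypothetical helper. Prints the size in 1 KiB blocks of a partition
# spanning cylinders START..END, assuming 16065 sectors per cylinder and
# 512-byte sectors (the geometry in the fdisk header above). An odd
# sector count leaves half a block over, which fdisk marks with "+".
sun_blocks() {
  sectors=$(( ($2 - $1) * 16065 ))
  if [ $(( sectors % 2 )) -eq 1 ]; then
    echo "$(( sectors / 2 ))+"
  else
    echo "$(( sectors / 2 ))"
  fi
}

# Rows from the table above:
sun_blocks 0 9483      # /dev/hda1 -> 76172197+
sun_blocks 9843 10009  # /dev/hda2 -> 1333395
sun_blocks 9483 9843   # /dev/hda4 -> 2891700
```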

Weeve Retired Dev
Joined: 30 Oct 2002 Posts: 641
Posted: Sat Sep 30, 2006 9:14 pm
|
OK, that looks normal. Do you have a line in your silo.conf similar to the following:
Code: | append="md=0,/dev/sda4,/dev/sdb1" | where /dev/sda4 is the first partition in your RAID and /dev/sdb1 is the second?
I have a linear RAID setup here, with /dev/sda4 and /dev/sdb1 as the two partitions in the array and /dev/sda1 as my /boot partition. My silo.conf looks like this:
Code: | partition = 1
root = /dev/md0
timeout = 100
append="md=0,/dev/sda4,/dev/sdb1"
read-only
default=2.6.17
image = /kernel-2.6.17
label = 2.6.17 |
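The append line above uses the kernel's md= boot parameter for an array with persistent superblocks: the md device number followed by the member partitions, all comma-separated. A tiny sketch that builds that string (md_param is a hypothetical helper; the device names are the ones used in this thread) makes the "no spaces" shape explicit.

```shell
# md_param N DEV...
# Hypothetical helper. Builds a kernel "md=" boot parameter that
# assembles /dev/mdN from the given member partitions, in the form
# "md=N,DEV1,DEV2,...". The kernel splits its command line on
# whitespace, so the result must not contain any spaces.
md_param() {
  out="md=$1"
  shift
  for dev in "$@"; do
    out="$out,$dev"
  done
  printf '%s\n' "$out"
}

md_param 0 /dev/sda4 /dev/sdb1   # prints md=0,/dev/sda4,/dev/sdb1
md_param 0 /dev/hda1 /dev/hdc1   # prints md=0,/dev/hda1,/dev/hdc1
```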
|
|
smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Sat Sep 30, 2006 11:59 pm
|
OK, thanks, Weeve; let's give that a go. My /boot/silo.conf now reads:
Code: |
partition = 4
root = /dev/md0
timeout = 50
image = /vmlinux
label = Linux
append = "ide=nodma, md=0, /dev/hda1, /dev/hdc1"
|
/me runs silo -C /boot/silo.conf and reboots...
...still the same result. For whatever reason, SILO is not finding my silo.conf (I guess), even though it is now in /boot, on its own non-RAID ext2 partition.
All I have to do to fire it up is type 4/vmlinux ide=nodma root=/dev/md0 at the SILO boot prompt, and then it all comes up normally.
So it isn't a lack of those parameters being passed to the kernel, as we aren't even getting as far as the second boot stage (the one on disc).
_________________
Matthew Smith

Weeve Retired Dev
Joined: 30 Oct 2002 Posts: 641
Posted: Sun Oct 01, 2006 11:00 pm
|
In your append line, try removing the comma after ide=nodma, and the spaces after "md=0," and "/dev/hda1,". The resulting line should read:
Code: | append="ide=nodma md=0,/dev/hda1,/dev/hdc1" |
See if that does anything for you.
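The reason the spacing matters is that the append string ends up on the kernel command line, which is split into separate parameters at whitespace. A rough sketch (show_kernel_params is a hypothetical name, and it ignores the kernel's quoting rules) prints the words each version of the line turns into.

```shell
# show_kernel_params STRING
# Hypothetical helper. Prints each whitespace-separated word of STRING
# on its own line, roughly how the kernel splits its command line into
# parameters (quoting is ignored). The subshell body keeps "set -f"
# (no filename globbing) local to the function.
show_kernel_params() (
  set -f
  for word in $1; do
    printf '%s\n' "$word"
  done
)

show_kernel_params "ide=nodma, md=0, /dev/hda1, /dev/hdc1"
# ide=nodma,   <- the comma is now part of the ide= value
# md=0,        <- md= parameter with no member devices
# /dev/hda1,
# /dev/hdc1
show_kernel_params "ide=nodma md=0,/dev/hda1,/dev/hdc1"
# ide=nodma
# md=0,/dev/hda1,/dev/hdc1
```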

smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Mon Oct 02, 2006 12:10 am
|
Tried changing that string - still no joy.
I don't think that the kernel parameters are even getting read here - Silo reckons that it can't even FIND a config file, and this is weird because, as I understand it, to see "S I L O" appear on the screen, it must have found the second stage boot loader, which lives in /boot.
Here is what I have in /boot:
Code: |
narsil boot # ls -l
total 4440
lrwxrwxrwx 1 root root 5 Sep 30 07:14 boot -> /boot
-rw-r--r-- 1 root root 1024 Sep 30 07:14 fd.b
-rw-r--r-- 1 root root 512 Sep 30 07:14 first.b
-rw-r--r-- 1 root root 1024 Sep 30 07:14 generic.b
-rw-r--r-- 1 root root 816 Sep 30 07:14 ieee32.b
-rw-r--r-- 1 root root 7112 Sep 30 07:14 isofs.b
drwx------ 2 root root 16384 Sep 30 07:04 lost+found
-rw-r--r-- 1 root root 7680 Sep 30 07:14 old.b
-rw-r--r-- 1 root root 7680 Sep 30 07:14 old.b-raid1-0
-rw-r--r-- 1 root root 7680 Sep 30 07:14 old.b-raid1-1
-rw-r--r-- 1 root root 67072 Oct 2 09:26 second.b
-rw-r--r-- 1 root root 125 Oct 2 09:26 silo.conf
-rw-r--r-- 1 root root 62436 Sep 30 07:14 silotftp.b
-rw-r--r-- 1 root root 512 Sep 30 07:14 ultra.b
-rwxr-xr-x 1 root root 4370088 Sep 30 07:14 vmlinux
|
(Bear in mind that I'm running silo -C /boot/silo.conf.)
_________________
Matthew Smith

Weeve Retired Dev
Joined: 30 Oct 2002 Posts: 641
Posted: Mon Oct 02, 2006 12:18 am
|
A few other suggestions to try, if you haven't already (they assume that /boot is now in your /etc/fstab file):
- Re-emerge SILO and re-run silo -C /boot/silo.conf
- When running silo, use the -f argument, as it forces the boot block to be overwritten and may help.
- Use the silo -a /boot/silo.conf command to verify that silo thinks the syntax of your config file is OK.
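Since these suggestions assume /boot is listed in /etc/fstab, a minimal way to check is to look for an uncommented line whose mount point (second field) is /boot. The helper name fstab_has_boot below is hypothetical.

```shell
# fstab_has_boot FSTAB
# Hypothetical helper. Succeeds if FSTAB (normally /etc/fstab) has an
# uncommented entry whose mount point is /boot, and prints that entry.
fstab_has_boot() {
  awk '$1 !~ /^#/ && $2 == "/boot" { print; found = 1 }
       END { exit (found ? 0 : 1) }' "$1"
}

# On the live system:
# fstab_has_boot /etc/fstab || echo "/boot is not in /etc/fstab" >&2
```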
|
|
smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Mon Oct 02, 2006 2:00 am
|
I'd tried the -f, but hadn't got to the point of re-emerging silo.
Done that and just rebooting now...
Doh! No difference. OK, let's try deleting everything in /boot (except for the kernel and silo.conf) and re-emerging silo again.
Still no difference. So, we have:
* Completely replaced Silo (1.4.13), including prior removal of all of its files in /boot
* Re-written the boot block using -f
But:
* Silo cannot find a configuration file (I wish it would say what it was actually looking for - by name)
* Typing the parameters from silo.conf at the Silo command line results in a successful boot.
Any more ideas here, or should I chuck this one to the Silo developers?
Thanks for your help on this, Weeve!
_________________
Matthew Smith

Weeve Retired Dev
Joined: 30 Oct 2002 Posts: 641
Posted: Mon Oct 02, 2006 2:20 am
|
I noticed in your earlier listing of /boot that boot was symlinked to /boot. What is it linked to now, with /boot mounted? It should be symlinked to . (the directory itself), so if it isn't, remove the old link and run this in the /boot directory: ln -s . boot
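That fix can be sketched as a function that leaves a correct link alone and otherwise replaces it (fix_boot_link is a hypothetical name; run it with /boot mounted). The -n flag to GNU ln makes it replace an existing boot symlink that points at a directory, instead of creating a new link inside that directory.

```shell
# fix_boot_link DIR
# Hypothetical helper. Makes DIR/boot a symlink to "." so that paths of
# the form DIR/boot/NAME resolve to DIR/NAME. Leaves a correct link
# alone and refuses to touch a real file or directory named boot.
fix_boot_link() {
  link="$1/boot"
  if [ -L "$link" ] && [ "$(readlink "$link")" = "." ]; then
    return 0
  fi
  if [ -e "$link" ] && [ ! -L "$link" ]; then
    echo "$link exists and is not a symlink; leaving it alone" >&2
    return 1
  fi
  # -s: symbolic link, -f: remove an existing link first,
  # -n: don't follow an existing link to a directory
  ln -sfn . "$link"
}

# With /boot mounted:
# fix_boot_link /boot && ls -l /boot/boot
```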

smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Mon Oct 02, 2006 2:35 am
|
Ah, that must have crept in when I created the new /boot partition (after I took it out of md0); I created the symlink incorrectly. (I don't even know why it's there; I can't see the point of a /boot/boot that is just /boot.)
Putting the correct symlink in doesn't fix the problem, but at least everything now looks as it should.
_________________
Matthew Smith

Weeve Retired Dev
Joined: 30 Oct 2002 Posts: 641
Posted: Mon Oct 02, 2006 2:54 am
|
Hrm, one other thing to check: what is your boot-device in OBP set to, and what does that device alias translate to?
For instance, my Blade 100 has its boot-device set to disk, and using the devalias command I can see that disk is /pci@1f,0/ide@d/disk@0,0. If you've built in /dev/openprom support and have it loaded, you can also see these with the command prtconf -p -v. You'll want to pipe it into a pager like less, as it dumps out a lot of output.
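The alias lookup described above can also be done offline from a saved copy of the devalias listing, which prints one alias name and device path per line. The resolve_devalias helper and the saved-file approach are hypothetical conveniences; at the ok prompt, devalias itself shows the same mapping.

```shell
# resolve_devalias NAME FILE
# Hypothetical helper. Prints the device path that alias NAME maps to
# in FILE, a saved copy of the OBP "devalias" listing (alias name, then
# device path, one pair per line). Fails if NAME is not listed.
resolve_devalias() {
  awk -v name="$1" '$1 == name { print $2; found = 1; exit }
                    END { exit (found ? 0 : 1) }' "$2"
}

# Example, with the alias quoted above for a Blade 100:
# printf 'disk /pci@1f,0/ide@d/disk@0,0\n' > devalias.txt
# resolve_devalias disk devalias.txt    # prints /pci@1f,0/ide@d/disk@0,0
```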

smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Mon Oct 02, 2006 3:06 am
|
Just had a play with that. I'm not auto-booting, as I had some disc trouble (it turned out to be DMA being switched on) and I always want the option to 'boot cdrom'.
Default is just 'disk', although I have also tried 'disk0'. Working through /dev/openprom (trying to avoid yet another reboot), I find this:
Code: |
bootpath: '/pci@1f,0/ide@d/disk@0,0:a'
|
The whole PROM dump is here, just for the record: http://www.smiffysplace.com/files/narsil_promconfig
I'm guessing that this is the default boot thingy - it certainly looks like what comes up on startup (when I type boot at the OK prompt), although I don't recall seeing the 'a' at the end before.
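On the ':a' suffix: in an OBP device path, the part after the colon names the slice on that disk, with a through h standing for the eight slices of a Sun disk label, which Linux numbers 1 to 8. A small sketch of that mapping (slice_to_partition is a hypothetical name):

```shell
# slice_to_partition PATH
# Hypothetical helper. Prints the Linux partition number for the
# ":<letter>" slice suffix of an OBP device path (a=1 ... h=8).
# Fails if PATH has no recognised slice suffix.
slice_to_partition() {
  case "$1" in
    *:*) letter=${1##*:} ;;
    *) return 1 ;;
  esac
  case "$letter" in
    a) echo 1 ;; b) echo 2 ;; c) echo 3 ;; d) echo 4 ;;
    e) echo 5 ;; f) echo 6 ;; g) echo 7 ;; h) echo 8 ;;
    *) return 1 ;;
  esac
}

slice_to_partition '/pci@1f,0/ide@d/disk@0,0:a'   # prints 1
```

With the partition table posted earlier, slice a on this disk is /dev/hda1, and the new /boot is slice d (/dev/hda4).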
And yes, I will be changing the hostname from narsil to andruil once everything's working OK.
_________________
Matthew Smith

Weeve Retired Dev
Joined: 30 Oct 2002 Posts: 641
Posted: Mon Oct 02, 2006 3:21 am
|
Yeah I get the same for bootpath here so I don't think that's an issue.
Sometimes SILO can have issues with certain versions of OBP (though we haven't really nailed down which). We normally suggest running the latest version. I'm running 4.17.1 here, and it appears you are running 4.0.45, so you might consider upgrading.
There's currently no way to do it from Linux, but you can netboot the update itself. We have a guide for setting up a netboot server here if you are interested; just substitute the OBP update wherever the guide mentions the netboot image. You can get the OBP update from http://sunsolve.sun.com under the System Handbook entry for the Blade 100.

smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Mon Oct 02, 2006 3:54 am
|
OK, thanks; a netboot upgrade looks like the way to go. As far as I can make out, I just supply the patch as the netboot kernel, yes?
I'll give it a go. (Might as well upgrade my AXi too, if I have to set up the netboot server. That's got some daft boot issues, too.)
_________________
Matthew Smith

smiffy Apprentice
Joined: 28 Jun 2006 Posts: 259 Location: SA.AU.AP.EARTH
Posted: Mon Oct 02, 2006 4:37 am
|
Well, I now have shiny new firmware. That netboot upgrade was far simpler than I expected and not at all painful. I'll do my AXi as well.
Sadly, it hasn't fixed the SILO issue, even after re-running SILO with the -f flag.
For the record, the new PROM data is here: http://www.smiffysplace.com/files/narsil_newpromconfig
_________________
Matthew Smith