Gentoo Forums
Human intervention always required on boot (Blade100, RAID1)
Gentoo Forums Forum Index: Gentoo on Sparc

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Fri Sep 29, 2006 10:16 am

I have been - very slowly - ironing out the problems on my Blade 100.

My system is based on 2006.0, kernel 2.6.17.5, "world" up to date. Running software RAID 1 on a pair of 80GB IDE discs.

The most annoying problem that I still have is the fact that when I boot, I have to tell SILO my boot parameters every time, despite having a valid /etc/silo.conf - SILO always says "No config file loaded, you can boot from..."

I type in /boot/vmlinux root=/dev/md0, everything works fine. (/boot is not on a separate partition.)

Today, I just happened to try this:

Code:

narsil ~ # lsraid -a /dev/md0
lsraid: Device "/dev/hda1" does not have a valid raid superblock
lsraid: Device "/dev/hdc1" does not have a valid raid superblock
[dev   9,   0] /dev/md0         00000000.00000000.00000000.00000000 online
[dev   ?,   ?] (unknown)        00000000.00000000.00000000.00000000 missing
[dev   ?,   ?] (unknown)        00000000.00000000.00000000.00000000 missing


Could this be related to the problem? I shut down, booted from the liveCD and created the array from scratch (mdadm --create ... ), thinking that this might fix things, but I still don't have valid raid superblocks (as above). My /dev/md1 (/swap), however, looks a lot more healthy:

Code:

narsil ~ # lsraid -a /dev/md1
[dev   9,   1] /dev/md1         F3902FDF.A1C764DA.547D8643.F45680D9 online
[dev  22,   2] /dev/hdc2        F3902FDF.A1C764DA.547D8643.F45680D9 good
[dev   3,   2] /dev/hda2        F3902FDF.A1C764DA.547D8643.F45680D9 good


So, 1) would the bad superblocks be the cause of my boot problems and 2) how do I fix them?

Ideas anyone?
_________________
Matthew Smith

Weeve
Retired Dev
Joined: 30 Oct 2002
Posts: 641

Posted: Fri Sep 29, 2006 4:12 pm

SILO cannot read from anything other than ext2/3, so you have to make sure that when you run SILO from Linux, you tell it the path to a config file where it will be able to read it from. It won't be able to read it from inside the RAID configuration.

Also, if you already have a /boot partition separate from /, then /etc/silo.conf needs to live in that /boot partition. You can tell SILO the alternative location of the config using the command silo -C /boot/silo.conf.
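
A rough way to check that precondition before running silo, sketched under assumptions: the helper name silo_can_read is made up for illustration, it only encodes the two constraints described in this post (ext2/ext3, and not inside the RAID), and it reads /proc/mounts unless another file is given.

```shell
# Succeed only if the given mount point looks readable by SILO according to
# the constraints described above: ext2 or ext3, and not on an md device.
# silo_can_read is a hypothetical helper name, not part of SILO.
silo_can_read() {
    # $1 = mount point, $2 = optional mounts table in /proc/mounts format
    awk -v mp="$1" '
        $2 == mp { dev = $1; fs = $3 }            # last matching entry wins
        END {
            if (dev == "") exit 2                 # mount point not found
            if (dev ~ /^\/dev\/md/) exit 1        # sits on a software RAID device
            if (fs != "ext2" && fs != "ext3") exit 1
            exit 0
        }' "${2:-/proc/mounts}"
}
```

Usage might look like: silo_can_read /boot && silo -C /boot/silo.conf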

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Fri Sep 29, 2006 9:45 pm

Hmm. I wonder if, therefore, I should create a separate /boot, non-RAID, with a copy on each disc...

Due to mis-reading the partition data when doing the original partitioning, I happen to have 32Mb spare on each disc.
_________________
Matthew Smith

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Sat Sep 30, 2006 12:16 am

OK - I now have a separate /boot (partition 4 of my first disc). I have put my silo.conf in there and am running silo with -C /boot/silo.conf. I don't seem to have got any further with the thing booting though.

No matter what I seem to do, I just get "No config file loaded, you can boot just from this command line". And yes, I can, no problems. Now that I'm using the separate (non-RAID) partition, I just boot thus:
Code:
boot: 4/vmlinux root=/dev/md0


My silo.conf looked like this:
Code:

partition = 4
root = /dev/md0
timeout = 50

image = /vmlinux
label = Linux

(Previously had image = /boot/vmlinux, but this did not work either).

Ideas anyone?
_________________
Matthew Smith

Weeve
Retired Dev
Joined: 30 Oct 2002
Posts: 641

Posted: Sat Sep 30, 2006 7:35 pm

What does your partition table look like?

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Sat Sep 30, 2006 8:38 pm

Here you go Weeve:

Code:

Disk /dev/hda (Sun disk label): 255 heads, 63 sectors, 10009 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Flag    Start       End    Blocks   Id  System
/dev/hda1             0      9483  76172197+  fd  Linux raid autodetect
/dev/hda2          9843     10009   1333395   fd  Linux raid autodetect
/dev/hda3             0     10009  80397292+   5  Whole disk
/dev/hda4          9483      9843   2891700   83  Linux native


Note that /dev/hda4 is the new partition that I created with "spare" space for /boot. Originally, /boot was on /dev/md0, which is made up of /dev/hda1 and /dev/hdc1. The new /boot is formatted with ext2.
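
As a cross-check of the table above, the Blocks column can be recomputed from the cylinder ranges. This is only a sketch: part_blocks is a made-up helper, and it assumes the End column is exclusive, which is what the listed numbers suggest.

```shell
# Size in 1 KiB blocks of a partition spanning cylinders [start, end),
# using the "Units = cylinders of 16065 * 512 bytes" line from the table.
# part_blocks is a hypothetical helper name.
part_blocks() {
    # $1 = start cylinder, $2 = end cylinder (assumed exclusive)
    echo $(( ($2 - $1) * 16065 * 512 / 1024 ))
}

part_blocks 9483 9843     # /dev/hda4, prints 2891700
part_blocks 9843 10009    # /dev/hda2, prints 1333395
```

Both results match the Blocks column, so under that assumption the three data partitions sit end to end without overlapping.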
_________________
Matthew Smith

Weeve
Retired Dev
Joined: 30 Oct 2002
Posts: 641

Posted: Sat Sep 30, 2006 9:14 pm

Ok that looks normal. Do you have a line in your silo.conf that looks similar to the following:
Code:
append="md=0,/dev/sda4,/dev/sdb1"
where /dev/sda4 is the first partition in your raid and /dev/sdb1 is the second partition in your raid?

I have a linear raid setup here with /dev/sda4 and /dev/sdb1 as the two partitions in the raid and /dev/sda1 as my /boot partition. My silo.conf looks like the following:
Code:
partition = 1
root = /dev/md0
timeout = 100
append="md=0,/dev/sda4,/dev/sdb1"
read-only
default=2.6.17
image = /kernel-2.6.17
        label = 2.6.17

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Sat Sep 30, 2006 11:59 pm

OK, thanks Weeve, let's give that a go. My /boot/silo.conf now reads:
Code:

partition = 4
root = /dev/md0
timeout = 50
image = /vmlinux
  label = Linux
  append = "ide=nodma, md=0, /dev/hda1, /dev/hdc1"


/me runs silo -C /boot/silo.conf and reboots...

...still the same result. For whatever reason, silo is not finding my silo.conf (I guess), even though it is now in /boot, on its own non-RAID, ext2 partition.

All I have to do to fire it up is to type 4/vmlinux ide=nodma root=/dev/md0 at the silo boot prompt - then it all fires up normally.

So, it isn't the lack of those parameters being passed to the kernel, as we aren't even getting to the second boot stage (the one on disc).
_________________
Matthew Smith

Weeve
Retired Dev
Joined: 30 Oct 2002
Posts: 641

Posted: Sun Oct 01, 2006 11:00 pm

In your append line, try removing the comma after ide=nodma and removing the spaces after md=0, and /dev/hda1,. The resulting line should read:
Code:
append="ide=nodma md=0,/dev/hda1,/dev/hdc1"

See if that does anything for you.
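
A small sanity check for that kind of mistake, as a sketch only: check_append is a made-up helper that splits the string on whitespace, the way the parameters are separated on the command line, and prints any token left ending in a comma.

```shell
# Print every whitespace-separated token of a kernel command line that
# ends in a comma; such a token was probably split by a stray space.
# check_append is a hypothetical helper name.
check_append() {
    # $1 = the append string; left unquoted below so the shell splits it
    for tok in $1; do
        case "$tok" in
            *,) echo "$tok" ;;
        esac
    done
}

check_append 'ide=nodma, md=0, /dev/hda1, /dev/hdc1'   # prints three tokens
check_append 'ide=nodma md=0,/dev/hda1,/dev/hdc1'      # prints nothing
```

It only catches trailing commas, so an empty result does not prove the line is right.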

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Mon Oct 02, 2006 12:10 am

Tried changing that string - still no joy.

I don't think that the kernel parameters are even getting read here - Silo reckons that it can't even FIND a config file, and this is weird because, as I understand it, to see "S I L O" appear on the screen, it must have found the second stage boot loader, which lives in /boot.

Here is what I have in /boot:

Code:

narsil boot # ls -l
total 4440
lrwxrwxrwx 1 root root       5 Sep 30 07:14 boot -> /boot
-rw-r--r-- 1 root root    1024 Sep 30 07:14 fd.b
-rw-r--r-- 1 root root     512 Sep 30 07:14 first.b
-rw-r--r-- 1 root root    1024 Sep 30 07:14 generic.b
-rw-r--r-- 1 root root     816 Sep 30 07:14 ieee32.b
-rw-r--r-- 1 root root    7112 Sep 30 07:14 isofs.b
drwx------ 2 root root   16384 Sep 30 07:04 lost+found
-rw-r--r-- 1 root root    7680 Sep 30 07:14 old.b
-rw-r--r-- 1 root root    7680 Sep 30 07:14 old.b-raid1-0
-rw-r--r-- 1 root root    7680 Sep 30 07:14 old.b-raid1-1
-rw-r--r-- 1 root root   67072 Oct  2 09:26 second.b
-rw-r--r-- 1 root root     125 Oct  2 09:26 silo.conf
-rw-r--r-- 1 root root   62436 Sep 30 07:14 silotftp.b
-rw-r--r-- 1 root root     512 Sep 30 07:14 ultra.b
-rwxr-xr-x 1 root root 4370088 Sep 30 07:14 vmlinux


Bearing in mind that I'm running silo -C /boot/silo.conf
_________________
Matthew Smith

Weeve
Retired Dev
Joined: 30 Oct 2002
Posts: 641

Posted: Mon Oct 02, 2006 12:18 am

A few other suggestions to try if you haven't already (and they assume that /boot is now in your /etc/fstab file)

  • Re-emerge SILO and re-run silo -C /boot/silo.conf
  • When running silo, use the -f argument as that will force the overwriting of the boot block and may help.
  • Use the silo -a /boot/silo.conf command to verify that silo thinks the syntax of your config file is OK

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Mon Oct 02, 2006 2:00 am

I'd tried the -f, but hadn't got to the point of re-emerging silo.

Done that and just rebooting now...

Doh! No difference. OK, let's try deleting everything in /boot (except for the kernel and silo.conf) and re-emerge silo again:

Still no difference. So, we have:

* Completely replaced Silo (1.4.13), including prior removal of all of its files in /boot
* Re-written the boot block using -f

But:

* Silo cannot find a configuration file (I wish it would say what it was actually looking for - by name)
* Passing the parameters in silo.conf to the Silo command line results in a successful boot.

Any more ideas here, or should I chuck this one to the Silo developers?

Thanks for your help on this, Weeve!
_________________
Matthew Smith

Weeve
Retired Dev
Joined: 30 Oct 2002
Posts: 641

Posted: Mon Oct 02, 2006 2:20 am

I noticed in your earlier post of what was in /boot that boot was symlinked to /boot. What is it linked to now when /boot is mounted? It should be symlinked to . so if it isn't, try this in the /boot directory: ln -s . boot
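
To see what the relative link does without touching the real /boot, here is a sketch in a scratch directory (the directory and the empty second.b are stand-ins for illustration). The point, presumably, is that a path such as /boot/second.b still resolves when the /boot partition itself is treated as the root.

```shell
# Scratch directory standing in for the mounted /boot partition.
scratch=$(mktemp -d)
( cd "$scratch" && ln -s . boot )     # same command as suggested above

readlink "$scratch/boot"              # prints: .

# With the link in place, boot/second.b and second.b are the same file.
touch "$scratch/second.b"
[ -e "$scratch/boot/second.b" ] && echo "boot/second.b resolves"
```

A link pointing at the absolute path /boot would only resolve the same way while the partition happens to be mounted at /boot.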

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Mon Oct 02, 2006 2:35 am

Ah, that must have crept in when I created the new /boot partition (after I took it out of md0) - that was me creating the symlink incorrectly. (Don't even know why it's there - can't see the point in a /boot/boot===/boot)

Putting the correct symlink in doesn't fix the problem, but at least everything now looks like it should ;-)
_________________
Matthew Smith

Weeve
Retired Dev
Joined: 30 Oct 2002
Posts: 641

Posted: Mon Oct 02, 2006 2:54 am

Hrm, one other thing to check. What is your boot-device in OBP set to and what does that device alias translate to?

For instance, my Blade 100 has its boot-device set to disk, and using the devalias command I can see that disk is set to /pci@1f,0/ide@d/disk@0,0. If you've built in /dev/openprom support and have it loaded, you can use the command prtconf -p -v to see these as well. You'll want to pipe it into a pager like less, as it's going to dump out a lot of output.

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Mon Oct 02, 2006 3:06 am

Just had a play with that. I'm not auto-booting as I had some disc trouble (turned out to be DMA switched on) and always want the option to 'boot cdrom'.

Default is just 'disk', although I have also tried 'disk0'. Working through /dev/openprom (trying to avoid yet another reboot), I find this:
Code:

bootpath: '/pci@1f,0/ide@d/disk@0,0:a'


Whole prom thing dumped here, just for the record: http://www.smiffysplace.com/files/narsil_promconfig

I'm guessing that this is the default boot thingy - it certainly looks like what comes up on startup (when I type boot at the OK prompt), although I don't recall seeing the 'a' at the end before.
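
On the trailing 'a': with a Sun disk label the letter after the colon names a slice, and slices a, b, c... correspond to Linux partitions 1, 2, 3... (consistent with /dev/hda3 being the "Whole disk" slice in the partition table posted earlier). A sketch of that mapping, with slice_to_partition as a made-up helper name:

```shell
# Map the slice letter at the end of an OBP device path to a Linux
# partition number (a -> 1, b -> 2, ...). Returns non-zero if the path
# has no ":x" suffix. slice_to_partition is a hypothetical helper name.
slice_to_partition() {
    case "$1" in
        *:[a-h]) ;;
        *) return 1 ;;
    esac
    letter=${1##*:}                               # text after the last colon
    echo $(( $(printf '%d' "'$letter") - 96 ))    # 'a' is character code 97
}

slice_to_partition '/pci@1f,0/ide@d/disk@0,0:a'   # prints 1
```

So this bootpath points at the first partition of the first disc.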

And yes, I will be changing the hostname from narsil to andruil, once everything's working OK.
_________________
Matthew Smith

Weeve
Retired Dev
Joined: 30 Oct 2002
Posts: 641

Posted: Mon Oct 02, 2006 3:21 am

Yeah I get the same for bootpath here so I don't think that's an issue.

Sometimes SILO can have issues with certain versions of OBP (though we haven't really nailed down which). We normally suggest running the latest. I'm running 4.17.1 here and it appears you are running 4.0.45. You might consider upgrading.

There's no way to do it via Linux currently, but you can netboot the update itself. We have a guide for setting up a netboot server here if you are interested. Just substitute the OBP update for where the guide talks about the netboot image. You can get the OBP update from http://sunsolve.sun.com under the System Handbook entry for the Blade 100.

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Mon Oct 02, 2006 3:54 am

OK, thanks, netboot upgrade looks the way to go. As far as I can make out, I just supply the patch as the netboot kernel, yes?

I'll give it a go. (Might as well upgrade my Axi as well, if I have to set up the netboot server. That's got some daft boot issues, too.)
_________________
Matthew Smith

smiffy
Apprentice
Joined: 28 Jun 2006
Posts: 259
Location: SA.AU.AP.EARTH

Posted: Mon Oct 02, 2006 4:37 am

Well, I now have shiny, new, firmware. That netboot upgrade was far simpler than I expected and not at all painful. Will now do my AXi as well.

Sadly, it hasn't fixed the Silo issue, even after re-running Silo with the -f flag.

For the record, the new prom data is here: http://www.smiffysplace.com/files/narsil_newpromconfig.
_________________
Matthew Smith