Gentoo Forums
RAID Problems on boot Kernel: 2.4.32-sparc-r2

 
Gentoo Forums Forum Index Gentoo on Sparc
endemic
n00b


Joined: 06 Oct 2003
Posts: 23
Location: Dayton, OH

Posted: Thu Apr 20, 2006 4:38 am    Post subject: RAID Problems on boot Kernel: 2.4.32-sparc-r2

After upgrading the kernel on my Ultra 60, I am unable to mount my RAID1 array as / on boot. Here is what happens during boot:

Code:

Remapping the kernel... done.
Booting Linux...
Starting CPU 2... OK
PROMLIB: Sun IEEE Boot Prom 3.17.0 1998/10/23 11:26
Linux version 2.4.32-sparc-r2 (root@sirius) (gcc version 3.4.5 (Gentoo Linux 3.4.5)) #5 SMP Thu Apr 20 03:30:05 UTC 2006
ARCH: SUN4U
Ethernet address: 08:00:20:a1:50:7a
On node 0 totalpages: 130532
zone(0): 130967 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Found CPU 0 (node=f006d394,mid=0)
Found CPU 1 (node=f006d700,mid=2)
Found 2 CPU prom device tree node(s).
Kernel command line: root=/dev/md0 md=0,/dev/sda4,/dev/sdb4
md: Will configure md0 (super-block) from /dev/sda4,/dev/sdb4, below.
Calibrating delay loop... 897.84 BogoMIPS
Memory: 1031912k available (1800k kernel code, 464k data, 360k init) [fffff80000000000,00000000bff2e000]
Dentry cache hash table entries: 131072 (order: 8, 2097152 bytes)
Inode cache hash table entries: 65536 (order: 7, 1048576 bytes)
Mount cache hash table entries: 512 (order: 0, 8192 bytes)
Buffer cache hash table entries: 65536 (order: 6, 524288 bytes)
Page-cache hash table entries: 131072 (order: 7, 1048576 bytes)
POSIX conformance testing by UNIFIX
Entering UltraSMPenguin Mode...
Calibrating delay loop... 897.84 BogoMIPS
Total of 2 processors activated (1795.68 BogoMIPS).
Waiting on wait_init_idle (map = 0x4)
CPU 2: synchronized TICK with master CPU (last diff 0 cycles,maxerr 544 cycles)
All processors have done init_idle
PCI: Probing for controllers.
PCI: Found PSYCHO, control regs at 000001fe00000000
PSYCHO: Shared PCI config space at 000001fe01000000
PCI: Address space collision on region 2 [000001ff80080000:000001ff800fffff] of device Digital Equipment Corporation DECchip 21554
PCI-IRQ: Routing bus[ 0] slot[ 1] map[0] to INO[21]
PCI-IRQ: Routing bus[ 0] slot[ 3] map[0] to INO[20]
PCI-IRQ: Routing bus[ 0] slot[ 3] map[0] to INO[26]
PCI-IRQ: Routing bus[ 0] slot[ 4] map[0] to INO[18]
PCI0(PBMB): Bus running at 33MHz
PCI0(PBMA): Bus running at 66MHz
ebus0: [auxio] [power] [SUNW,pll] [sc] [se] [su] [su] [ecpp] [fdthree] [eeprom] [flashprom] [SUNW,CS4231]
PCIO serial driver version 1.54
su(mouse) at 0x1fff13062f8 (irq = 4,7ea) is a 16550A
Sun Mouse-Systems mouse driver version 1.00
su(kbd) at 0x1fff13083f8 (irq = 9,7e9) is a 16550A
keyboard: not present
SAB82532 serial driver version 1.65
ttyS00 at 0x1fff1400000 (irq = 12,7eb) is a SAB82532 V3.2
ttyS01 at 0x1fff1400040 (irq = 12,7eb) is a SAB82532 V3.2
Console: ttyS0 (SAB82532)
power: Control reg at 000001fff1724000 ... not using powerd.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
Journalled Block Device driver loaded
devfs: v1.12c (20020818) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
pty: 256 Unix98 ptys configured
rtc_init: no PC rtc found
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
SCSI subsystem driver Revision: 1.00
sym.0.3.0: setting PCI_COMMAND_INVALIDATE.
sym.0.3.1: setting PCI_COMMAND_PARITY...
sym.0.3.1: setting PCI_COMMAND_INVALIDATE.
sym0: <875> rev 0x14 on pci bus 0 device 3 function 0 irq 4,7e0
sym0: No NVRAM, ID 7, Fast-20, SE, parity checking
sym0: SCSI BUS has been reset.
sym1: <875> rev 0x14 on pci bus 0 device 3 function 1 irq 4,7e6
sym1: No NVRAM, ID 7, Fast-20, SE, parity checking
sym1: SCSI BUS has been reset.
scsi0 : sym-2.1.17a
scsi1 : sym-2.1.17a
  Vendor: FUJITSU   Model: MAN3367MC         Rev: 0107
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: FUJITSU   Model: MAN3367MC         Rev: 0108
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: TOSHIBA   Model: XM6201TASUN32XCD  Rev: 1103
  Type:   CD-ROM                             ANSI SCSI revision: 02
sym0:0:0: tagged command queuing enabled, command queue depth 16.
sym0:1:0: tagged command queuing enabled, command queue depth 16.
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
sym0:0: FAST-20 WIDE SCSI 40.0 MB/s ST (50.0 ns, offset 16)
SCSI device sda: 71771688 512-byte hdwr sectors (36747 MB)
Partition check:
 /dev/scsi/host0/bus0/target0/lun0: p1 p2 p3 p4
sym0:1: FAST-20 WIDE SCSI 40.0 MB/s ST (50.0 ns, offset 16)
SCSI device sdb: 71771688 512-byte hdwr sectors (36747 MB)
 /dev/scsi/host0/bus0/target1/lun0: p1 p2 p3 p4
md: raid1 personality registered as nr 3
md: Autodetecting RAID arrays.
 [events: 000000c8]
 [events: 000000c8]
md: autorun ...
md: considering scsi/host0/bus0/target1/lun0/part4 ...
md:  adding scsi/host0/bus0/target1/lun0/part4 ...
md:  adding scsi/host0/bus0/target0/lun0/part4 ...
md: created md0
md: bind<scsi/host0/bus0/target0/lun0/part4,1>
md: bind<scsi/host0/bus0/target1/lun0/part4,2>
md: running: <scsi/host0/bus0/target1/lun0/part4><scsi/host0/bus0/target0/lun0/part4>
md: scsi/host0/bus0/target1/lun0/part4's event counter: 000000c8
md: scsi/host0/bus0/target0/lun0/part4's event counter: 000000c8
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 248k
md0: 1 data-disks, max readahead per data-disk: 248k
raid1: device scsi/host0/bus0/target1/lun0/part4 operational as mirror 0
raid1: device scsi/host0/bus0/target0/lun0/part4 operational as mirror 1
raid1: raid set md0 active with 2 out of 2 mirrors
md: updating md0 RAID superblock on device
md: scsi/host0/bus0/target1/lun0/part4 [events: 000000c9]<6>(write) scsi/host0/bus0/target1/lun0/part4's sb offset: 35316928
md: scsi/host0/bus0/target0/lun0/part4 [events: 000000c9]<6>(write) scsi/host0/bus0/target0/lun0/part4's sb offset: 35316928
md: ... autorun DONE.
md: Ignoring md=0, already autodetected. (Use raid=noautodetect)
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 8192 buckets, 128Kbytes
TCP: Hash tables configured (established 65536 bind 65536)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Mounted devfs on /dev
version 2.86 booting


Gentoo Linux; http://www.gentoo.org/
 Copyright 1999-2005 Gentoo Foundation; Distributed under the GPLv2

 * Mounting proc at /proc ...                                             [ ok ]
 * Kernel automatically mounted devfs at /dev ...                         [ ok ]
 * Starting devfsd ...Started device management daemon v1.3.25 for /dev
                                                    [ ok ]
 Adding Swap: 499664k swap-space (priority -1)
 Adding Swap: 499664k swap-space (priority -2)
 * Activating (possible) swap ...                                         [ ok ]
 * Checking root filesystem ...fsck.ext3: Invalid argument while trying to open /dev/md0

/dev/md0:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>


 * Filesystem couldn't be fixed :(
                                                                          [ !! ]
Give root password for maintenance
(or type Control-D to continue): Type  'go' to resume


I have RAID configured properly, and I believe the kernel is set up accordingly: RAID support and RAID1 are both built in. As you can see, the RAID array starts properly and the kernel mounts it read-only as root, but the root filesystem check then fails to open /dev/md0. I don't have this problem when I roll back to kernel 2.4.23-sparc-r1. However, I only have a backed-up image of that kernel, so it is fairly important that I upgrade to the newer kernel, which is the latest stable kernel in Portage for sparc64.

If I boot the LiveCD, I am able to mount /dev/md0 with no issues (that is, the filesystem is intact and working properly).

Does anyone have any ideas?
overkll
Veteran


Joined: 21 Sep 2004
Posts: 1249
Location: Austin, Texas

Posted: Sat Apr 22, 2006 4:57 pm

Looking at your boot log, fsck.ext3 is being called, but the error message is for an ext2 partition.

Do you have an ext3 or ext2 partition on the md devices?

I don't have a SPARC, but I did see a similar error message when I was converting my RAID devices from ReiserFS to ext3. If I remember correctly, mke2fs was the issue. I thought I was making an ext3 filesystem with the command
Code:
mke2fs -j -O dir_index /dev/<md device>

Instead of adding the "dir_index" feature to the default filesystem features, it eliminated the defaults and added only "dir_index". So I lost the "has_journal" feature, which turned what I thought was an ext3 partition into an ext2 partition. I had to specify all of the filesystem options on the command line for it to work.

Again, relying on memory here, but I think the command I used to correct the issue was:
Code:
mke2fs -O has_journal,dir_index,filetype,sparse_super /dev/mdx

to format the partition or
Code:
tune2fs -O <needed feature(s)> /dev/mdx

to add the features (like has_journal) to an existing ext partition.

What is the output of tune2fs -l /dev/mdX, where X is the RAID device in question? Look especially at line 7 of the output, labeled "Filesystem features". Here's what a correct ext3 feature list with the "dir_index" feature should look like in tune2fs:
Code:
tune2fs 1.38 (30-Jun-2005)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          7f357a52-c6cb-4353-bddf-0dbc4b130d0f
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super
...


You may also want to check your /etc/fstab and make sure the right fs is specified.

WARNING: While you can use "tune2fs -l" (lowercase L) on a mounted filesystem, do not use tune2fs to MODIFY a mounted filesystem. It's advisable to read the man pages for tune2fs and mke2fs.
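The feature check described above can be scripted. A minimal sketch in Python, assuming the `tune2fs -l` layout shown above (one `Key: value` pair per line); the function names are hypothetical, and `read_features` only runs the read-only `-l` listing:

```python
from __future__ import annotations

import subprocess

# Features from the ext3 reference output above, minus needs_recovery,
# which is a state flag rather than a format choice.
REFERENCE_FEATURES = {"has_journal", "dir_index", "filetype", "sparse_super"}


def parse_features(tune2fs_output: str) -> set[str]:
    """Return the names on the 'Filesystem features:' line of `tune2fs -l` output."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    raise ValueError("no 'Filesystem features' line in tune2fs output")


def read_features(device: str) -> set[str]:
    """Run the read-only `tune2fs -l` listing on a device and parse its features."""
    out = subprocess.run(["tune2fs", "-l", device],
                         capture_output=True, text=True, check=True).stdout
    return parse_features(out)


def is_ext3(features: set[str]) -> bool:
    # ext3 is ext2 plus a journal, so has_journal is the deciding feature.
    return "has_journal" in features


if __name__ == "__main__":
    sample = """tune2fs 1.38 (30-Jun-2005)
Filesystem magic number:  0xEF53
Filesystem features:      has_journal filetype needs_recovery sparse_super
"""
    feats = parse_features(sample)
    print(is_ext3(feats))                      # True
    print(sorted(REFERENCE_FEATURES - feats))  # ['dir_index']
```

For that sample line, the only reference feature missing is `dir_index`, which ext3 does not require; `has_journal` is the one that makes it ext3 rather than ext2.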
endemic
n00b


Joined: 06 Oct 2003
Posts: 23
Location: Dayton, OH

Posted: Sat Apr 22, 2006 5:39 pm

The md device is most certainly ext3. This is the first kernel I've experienced this issue with.

Here is the output of tune2fs -l /dev/md0:

Code:

tune2fs 1.38 (30-Jun-2005)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          45e9618d-3ec6-4ff4-a4bc-fcddc3a3d654
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              4415040
Block count:              8829232
Reserved block count:     441461
Free blocks:              7991830
Free inodes:              4165787
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16352
Inode blocks per group:   511
Filesystem created:       Tue Apr  4 20:37:26 2006
Last mount time:          Thu Apr 20 10:31:41 2006
Last write time:          Thu Apr 20 10:31:41 2006
Mount count:              11
Maximum mount count:      24
Last checked:             Mon Apr 17 18:49:17 2006
Check interval:           15552000 (6 months)
Next check after:         Sat Oct 14 18:49:17 2006
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
First orphan inode:       3385433
Default directory hash:   tea
Directory Hash Seed:      080840b6-ffb8-4037-b70a-04f94c7fb081
Journal backup:           inode blocks


Also, for good measure, in case I'm missing something, here are the contents of /etc/fstab:

Code:

# <fs>                  <mountpoint>    <type>          <opts>          <dump/pass>

# NOTE: If your BOOT partition is ReiserFS, add the notail option to opts.
/dev/sda1               /boot           ext2            noauto,noatime  1 2
/dev/md0                /               ext3            defaults        1 2
/dev/sda2               none            swap            sw              0 0
/dev/sdb2               none            swap            sw              0 0
/dev/cdroms/cdrom0      /mnt/cdrom      iso9660         noauto,ro       0 0
#/dev/fd0               /mnt/floppy     auto            noauto          0 0

# NOTE: The next line is critical for boot!
proc                    /proc           proc            defaults        0 0

# glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
# POSIX shared memory (shm_open, shm_unlink).
# (tmpfs is a dynamically expandable/shrinkable ramdisk, and will
#  use almost no memory if not populated with files)
shm                     /dev/shm        tmpfs           nodev,nosuid,noexec     0 0


What doesn't make sense to me is that it works flawlessly with an older kernel image I had backed up for this machine (2.4.23-sparc-r1). I'm fairly certain I have everything I need compiled into the new kernel I would like to migrate to (2.4.32-sparc-r2), namely ext3 support and RAID1.
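The fsck message in the first post suggests `e2fsck -b 8193`, but that location only applies to filesystems with 1 KiB blocks. With the geometry in the tune2fs output above (4096-byte blocks, 32768 blocks per group, First block 0) and `sparse_super`, backup superblocks sit at the start of block group 1 and of every group that is a power of 3, 5, or 7. A small Python sketch that computes those locations from the tune2fs figures; the function names are hypothetical:

```python
from __future__ import annotations


def is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_superblocks(block_count: int, blocks_per_group: int,
                       first_block: int = 0) -> list[int]:
    """Backup superblock block numbers for an ext2/ext3 fs with sparse_super."""
    # Number of block groups, rounded up (ceiling division).
    groups = -(-(block_count - first_block) // blocks_per_group)
    backups = []
    for g in range(1, groups):  # group 0 holds the primary superblock
        if g == 1 or any(is_power_of(g, b) for b in (3, 5, 7)):
            backups.append(first_block + g * blocks_per_group)
    return backups


if __name__ == "__main__":
    # Block count, Blocks per group and First block from the tune2fs -l output above.
    print(backup_superblocks(8829232, 32768))
    # [32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624]
    # A 1 KiB-block filesystem has 8192 blocks per group and First block 1,
    # which is where the generic "e2fsck -b 8193" hint comes from.
    print(backup_superblocks(100000, 8192, first_block=1)[0])  # 8193
```

If the primary superblock really were damaged, the matching invocation would be along the lines of `e2fsck -b 32768 -B 4096 /dev/md0`. Since the array mounts cleanly from the LiveCD, the superblock itself is probably fine here.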
overkll
Veteran


Joined: 21 Sep 2004
Posts: 1249
Location: Austin, Texas

Posted: Sun Apr 23, 2006 2:30 pm

Quote:
What doesn't make sense to me is how it works flawlessly when using an older Kernel image I had ...

Yes, that is odd. What you posted looks correct, except for the md0 line in your /etc/fstab file. According to the fstab man page, the root filesystem should have "1" in the last field, not "2". I don't know if that's the issue, but it wouldn't hurt to check. It would be nice if the fix were that easy.
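The fs_passno rule from fstab(5) (root gets 1, other filesystems get 2, and 0 or a missing sixth field means fsck skips the entry) is easy to check mechanically. A minimal Python sketch, assuming whitespace-separated fstab fields as in the file posted above; `parse_fstab` and `check_passno` are hypothetical helpers:

```python
from __future__ import annotations

from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: str
    dump: int    # fs_freq, used by dump(8)
    passno: int  # fs_passno, the order in which fsck checks filesystems at boot


def parse_fstab(text: str) -> list[FstabEntry]:
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"malformed fstab line: {raw!r}")
        # The fifth and sixth fields are optional and default to 0.
        fields += ["0"] * (6 - len(fields))
        entries.append(FstabEntry(*fields[:4], int(fields[4]), int(fields[5])))
    return entries


def check_passno(entries: list[FstabEntry]) -> list[str]:
    """Flag pass numbers that differ from the fstab(5) convention."""
    problems = []
    for e in entries:
        if e.mountpoint == "/" and e.passno not in (0, 1):
            problems.append(f"/ ({e.device}): fs_passno is {e.passno}, expected 1")
        elif e.mountpoint != "/" and e.passno not in (0, 2):
            problems.append(f"{e.mountpoint} ({e.device}): fs_passno is {e.passno}, expected 2")
    return problems


if __name__ == "__main__":
    sample = """\
/dev/sda1  /boot  ext2  noauto,noatime  1 2
/dev/md0   /      ext3  defaults        1 2
/dev/sda2  none   swap  sw              0 0
proc       /proc  proc  defaults        0 0
"""
    print(check_passno(parse_fstab(sample)))
    # ['/ (/dev/md0): fs_passno is 2, expected 1']
```

A root entry ending in "0 0" also passes this check, because a pass number of 0 simply tells fsck to skip that filesystem at boot.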
endemic
n00b


Joined: 06 Oct 2003
Posts: 23
Location: Dayton, OH

Posted: Sun Apr 23, 2006 3:51 pm

Yeah, I've tried so many different things, thinking I was doing something stupid with fstab... The old kernel image mounts it fine with fstab the way it is. I'll change it anyway, just to be correct.

I'm wondering if it could be a bug in this version of sparc-sources?
starrbuck
Tux's lil' helper


Joined: 04 Apr 2005
Posts: 138
Location: North Texas

Posted: Wed Apr 26, 2006 9:19 pm

This issue just started for me; however, my 2.4.32-sparc-r2 kernel works fine, while my 2.4.32-sparc-r4 kernel is broken as described here.

I got around the issue for now by disabling the boot-time filesystem check for root:

Code:
/dev/md0                /               ext3            noatime                 0 0

I'd rather not leave it like this so I would like to find the solution.

Interestingly, I noticed that my r4 kernel is a bit smaller than my r2 kernel, which still boots fine:

Code:
lavon boot # ls -l kernel*
-rwxr-xr-x 1 root root 2404464 Apr  4 11:12 kernel-2.4.32-r2
-rwxr-xr-x 1 root root 2365812 Apr 26 15:17 kernel-2.4.32-r4

I used the .config file from the r2 kernel build to make the r4. I wonder what could be different between the two kernels to cause the bootup error?
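One way to look for a difference between two builds that started from the same .config is to diff the option settings: if the r2 .config was carried forward with `make oldconfig`, the newer sources may have added or changed options. A Python sketch, assuming the usual .config line formats `CONFIG_FOO=value` and `# CONFIG_FOO is not set`; the function names are hypothetical and the sample options are illustrative, not taken from either poster's config:

```python
from __future__ import annotations

import re

# The two line shapes used in a kernel .config file.
SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_LINE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict[str, str]:
    """Map each CONFIG_ option to its value; 'is not set' options map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_LINE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_LINE.match(line)
        if m:
            options[m.group(1)] = "n"
        # Blank lines and ordinary comments are ignored.
    return options


def diff_configs(old_text: str, new_text: str) -> dict[str, tuple[str, str]]:
    """Return {option: (old, new)} for options whose value differs; absent counts as 'n'."""
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key, "n"), new.get(key, "n")
        if before != after:
            changes[key] = (before, after)
    return changes


if __name__ == "__main__":
    # Illustrative snippets only, not the posters' real configs.
    r2 = "CONFIG_BLK_DEV_MD=y\nCONFIG_MD_RAID1=y\nCONFIG_EXT3_FS=y\n"
    r4 = "CONFIG_BLK_DEV_MD=y\n# CONFIG_MD_RAID1 is not set\nCONFIG_EXT3_FS=y\n"
    print(diff_configs(r2, r4))  # {'CONFIG_MD_RAID1': ('y', 'n')}
```

In practice you would pass the full text of the two .config files (read with `open(...).read()`); an empty result means the option settings match and the difference lies in the sources themselves.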
_________________
Gentoo Linux is groovy, baby! Yeah!
overkll
Veteran


Joined: 21 Sep 2004
Posts: 1249
Location: Austin, Texas

Posted: Thu Apr 27, 2006 6:34 pm

Have y'all checked bugs.gentoo.org for this bug? If there isn't one, maybe y'all should report it to the devs.
starrbuck
Tux's lil' helper


Joined: 04 Apr 2005
Posts: 138
Location: North Texas

Posted: Thu Apr 27, 2006 6:57 pm

This bug appears to describe the issue, but it hasn't been touched since last month.

The SPARC devs read this forum, so between the bug and this message thread, I'm confident they are at least aware of it.
NetrixTardis
n00b


Joined: 21 Sep 2004
Posts: 12
Location: Selma, TX

Posted: Fri Apr 28, 2006 2:01 am

My findings are as follows:

The tune2fs changes recommended by overkll didn't change anything. Changing the dump/pass fields for / (root) in fstab to "0 0" "fixed" the issue. However, if you force an fsck on reboot, you get the same error again. So, as a temporary fix, edit your /etc/fstab and set the dump/pass fields for root to "0 0" to get your box working with 2.4.32-r4. Just don't try to force an fsck. When you need one, reboot into the old kernel (you do keep a backup kernel to boot from, don't you? <G>) so that the fsck can run, then boot back into -r4.
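With the root pass number at 0, the periodic check never runs under -r4, while the mount count that tune2fs reports keeps advancing. A minimal Python sketch, assuming the `Mount count` and `Maximum mount count` fields shown in the tune2fs output earlier in the thread, that reports how many mounts remain before e2fsck would consider a full check due, which is roughly when to boot the older kernel as suggested above. The helper names are hypothetical, and the time-based "Next check after" date still has to be watched separately:

```python
from __future__ import annotations


def parse_tune2fs(output: str) -> dict[str, str]:
    """Split each 'Key:   value' line of `tune2fs -l` output at its first colon."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def mounts_until_check(fields: dict[str, str]) -> int | None:
    """Mounts left before the mount-count limit is reached; None if that check is disabled."""
    count = int(fields["Mount count"])
    maximum = int(fields["Maximum mount count"])
    if maximum <= 0:  # tune2fs shows -1 when mount-count based checking is turned off
        return None
    return max(maximum - count, 0)


if __name__ == "__main__":
    sample = """Mount count:              11
Maximum mount count:      24
Next check after:         Sat Oct 14 18:49:17 2006
"""
    fields = parse_tune2fs(sample)
    print(mounts_until_check(fields))  # 13
    print(fields["Next check after"])  # Sat Oct 14 18:49:17 2006
```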

Hopefully some SPARC devs are looking at this...

NetrixTardis



P.S. Tested and working on a Sun Ultra1-170E.
markpr
n00b


Joined: 07 Nov 2004
Posts: 26

Posted: Sat Apr 29, 2006 2:54 am

That's great info. I had the same problems, tried several rebuilds, and came up with the same workaround (no fsck). That just wasn't good enough for me, so I scrapped my plans for some servers and went back to Solaris and SVM. I'm now going to go with your suggestion: after building with r4, I'll build an r2 kernel just in case. I had abandoned RAID on gentoo-sparc until this problem was better understood, but this is good enough for now.
starrbuck
Tux's lil' helper


Joined: 04 Apr 2005
Posts: 138
Location: North Texas

Posted: Sat Apr 29, 2006 9:36 pm

Excellent advice, NetrixTardis. I always keep the previous kernel around for issues like this. Hopefully the issue will be fixed by the next kernel release, but if not, I can keep the r2 kernel around for its properly working fsck.
gaidh
n00b


Joined: 23 Jun 2006
Posts: 4

Posted: Wed Jun 28, 2006 7:15 pm

Just to add a little confirmation, I've got the same problem on an Enterprise 220R (2xT1 UltraSparc II, 2GB mem, 2x 37GB U160s on a SYM53C875).

I tried setting up root RAID1 (which I've done many times on other archs) during the system install, and it all went smoothly until rebooting. At that point I got exactly what is described in the first post of this thread: fsck.ext3 errors relating to a corrupt superblock. `tune2fs -l /dev/md0` would also give the corrupt superblock error. However, if I boot the LiveCD (2006.0) and assemble the device there, fsck and tune2fs work perfectly. It seems most likely to be a problem with the way md assembles the array at boot time, which differs from the way it is done after the system has booted.

I tried both 2.4.32-gentoo-r2 and r4, and both have the same problem. Unfortunately I don't have a source tree or an image of an earlier kernel, as this is a fresh install. Can someone provide a source tree for sparc-sources-2.4.23 or a similar version (which seems to work)? If I can confirm that this kernel works (as described in the first post), I'll do some hunting to find where it went wrong (unless someone is already doing this, which I don't see any indication of).
Weeve
Retired Dev


Joined: 30 Oct 2002
Posts: 641

Posted: Wed Jun 28, 2006 9:27 pm

For folks having problems, what do your silo.conf files look like?
P3SM
Tux's lil' helper


Joined: 13 Apr 2006
Posts: 93
Location: Gronsveld - The Netherlands

Posted: Thu Jun 29, 2006 11:07 am

Just my 2c, but I had a similar problem with my RAID1, which worked fine until a reboot and then reported corrupt superblocks. I solved it by rewriting the superblocks and resizing the array. It never showed up again :D
Not sure if this is the same thing though :?:
_________________
Smaug: Sun Netra T1 105, UltraSPARC-IIi 440MHz, 512MB, 2*36GB 10kRPM; 2 Sun Netra D130: 6*36GB 10kRPM, swraid 0
Haku: Dual P3 Xeon 500MHz, 512MB; Sun Multipack: 12*18GB 10kRPM, hwraid 5
Falkor: Sun SparcStation LX, 128 MB, 2.1GB
All times are GMT