lvm2 corruption on reboot

redwood · Guru Joined: 27 Jan 2006 Posts: 306

Hi, I'm hoping somebody on this forum might know why
some of my LVM2 lv's are getting corrupted upon rebooting.

My setup is an amd64 computer with three 160 GB SATA II hard drives
connected to a Promise TX4 PCI card.
Two of the drives, sdb & sdc, are mirrored using software RAID1,
divided into 4 partitions:
Partition       Array      Use
/dev/sd{b,c}1   /dev/md1   /boot
/dev/sd{b,c}2   -          swap (sw,pri=1)
/dev/sd{b,c}3   /dev/md3   /
/dev/sd{b,c}4   /dev/md4   LVM2: /dev/mapper/main-{opt,usr,home,var,vartmp,archives}

# tail -n 4 /etc/mdadm.conf
#ARRAY /dev/md2 level=raid0 num-devices=2 UUID=1804dc2c:dd6f8317:556141bf:402ddc1e
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=3a2324a3:84d53c59:a2e40f9f:47883ab1
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=d3501709:d15904eb:b42d50ea:bbe26b87
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=a3d99254:3cc875d7:166fc9cd:fa754fb8

# sfdisk -l /dev/sdb /dev/sdc

Disk /dev/sdb: 19457 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sdb1 * 0+ 11 12- 96358+ fd Linux raid autodetect
/dev/sdb2 12 254 243 1951897+ 82 Linux swap / Solaris
/dev/sdb3 255 862 608 4883760 fd Linux raid autodetect
/dev/sdb4 863 19456 18594 149356305 fd Linux raid autodetect

Disk /dev/sdc: 19457 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sdc1 * 0+ 11 12- 96358+ fd Linux raid autodetect
/dev/sdc2 12 254 243 1951897+ 82 Linux swap / Solaris
/dev/sdc3 255 862 608 4883760 fd Linux raid autodetect
/dev/sdc4 863 19456 18594 149356305 fd Linux raid autodetect

The third drive, sda, is a backup LVM2 physical volume (pvcreate /dev/sda; vgcreate bkup /dev/sda)
with no partitions:
# sfdisk -l /dev/sda

Disk /dev/sda: 19457 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/sda: unrecognized partition table type
No partitions found

My problem is that after repartitioning my hard drives and transferring my old system back,
three LVs (/opt, /usr and /archives) weren't mounted when I rebooted. Trying to mount them manually
resulted in "Operation not permitted", and further testing with reiserfsck revealed that they were
corrupted (something about the top-level inode). I re-created a reiserfs on the three LVs,
restored my data from backup and manually restarted services with `rc default`. The LVs seem
to work fine, with no read/write errors, but when I reboot, these same three LVs always get
corrupted. I've deleted them with `lvremove main/{opt,usr,archives}` and re-created and re-initialized them,
but to no avail. Every time I reboot, they get corrupted.

Some messages from dmesg:

md: running: <sdc4><sdb4>
raid1: raid set md4 active with 2 out of 2 mirrors
md: running: <sdc3><sdb3>
raid1: raid set md3 active with 2 out of 2 mirrors
md: running: <sdc1><sdb1>
raid1: raid set md1 active with 2 out of 2 mirrors

ReiserFS: dm-7: found reiserfs format "3.6" with standard journal
ReiserFS: dm-7: using ordered data mode
ReiserFS: dm-7: journal params: device dm-7, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: dm-7: checking transaction log (dm-7)
ReiserFS: warning: is_tree_node: node level 25938 does not match to the expected one 1
ReiserFS: dm-7: warning: vs-5150: search_by_key: invalid format found in block 8211. Fsck?
ReiserFS: dm-7: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]

ReiserFS: dm-6: found reiserfs format "3.6" with standard journal
ReiserFS: dm-6: using ordered data mode
ReiserFS: dm-6: journal params: device dm-6, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: dm-6: checking transaction log (dm-6)
ReiserFS: warning: is_tree_node: node level 25938 does not match to the expected one 1
ReiserFS: dm-6: warning: vs-5150: search_by_key: invalid format found in block 8211. Fsck?
ReiserFS: dm-6: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]

ReiserFS: dm-5: found reiserfs format "3.6" with standard journal
ReiserFS: dm-5: using ordered data mode
ReiserFS: dm-5: journal params: device dm-5, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: dm-5: checking transaction log (dm-5)
ReiserFS: warning: is_tree_node: node level 25938 does not match to the expected one 1
ReiserFS: dm-5: warning: vs-5150: search_by_key: invalid format found in block 8211. Fsck?
ReiserFS: dm-5: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
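
To pull the affected devices out of a longer dmesg, something like the following could be used. It is only a sketch: the function name `reiserfs_bad_blocks` is made up for illustration, and it assumes each vs-5150 warning sits on one line in exactly the form shown above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel messages on stdin and print
# "<dm device> <block>" for every ReiserFS vs-5150 warning, de-duplicated.
# Usage: dmesg | reiserfs_bad_blocks
reiserfs_bad_blocks() {
    sed -n 's/^ReiserFS: \(dm-[0-9]*\): warning: vs-5150: .* in block \([0-9]*\)\..*/\1 \2/p' |
        sort -u
}
```

Fed the messages above, it prints `dm-5 8211`, `dm-6 8211` and `dm-7 8211`, so all three devices complain about the same block number. `dmsetup ls` lists the device-mapper names with their major/minor numbers, which should help map each dm-N back to an LV.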

If I shut down and boot from a LiveCD, I can run `mdadm -A --scan && vgchange -ay`
and then mount all my LVs manually with no corruption.

There was a post about a baselayout upgrade causing shutdown problems on x86, so I've upgraded to the
latest ~amd64 masked baselayout, but have not yet tried rebooting.
There was also a post in this forum (436514) about lvm2 not mounting during boot, marked as
solved (kind of). I think he ended up repartitioning his hard drives, although he also mentioned
adding two lines somewhere in /etc/init.d/checkroot:

vgscan --mknodes --ignorelockingfailure
vgchange -ay --ignorelockingfailure

I need some advice from someone who understands LVM2 and init better than I do.
Will repartitioning my hard drives solve this problem? Or is a baselayout upgrade the answer?
Or a change to checkroot? Or something else?
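
For reference, the LiveCD commands and the two checkroot lines can be put together as one ordered sequence. This is a rough sketch, not what baselayout actually does: the helper names `run` and `activate_storage` and the `DRY_RUN` switch are invented here, and the ordering (RAID first, then LVM on top) is an assumption based on the layout described above. With the default `DRY_RUN=1` nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: assemble the md arrays, then scan for and
# activate the volume groups that live on top of them.
activate_storage() {
    run mdadm -A --scan &&
    run vgscan --mknodes --ignorelockingfailure &&
    run vgchange -ay --ignorelockingfailure
}
```

Calling `activate_storage` in dry-run mode prints the three commands in order, each prefixed with `+ `. Setting `DRY_RUN=0` would run them for real and needs root.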

Here is the tail from `vgdisplay main`

/dev/md4: lvm2 label detected
/dev/md4: Found metadata at 134656 size 2744 for main (CcE5mw-EKsk-u2QZ-tqZp-26YU-QTx6-4YIlys)
Read main metadata (49) from /dev/md4 at 134656 size 2744
/dev/md4 0: 0 1937: archives(18543:0)
/dev/md4 1: 1937 5743: usr(0:0)
/dev/md4 2: 7680 512: portage(0:0)
/dev/md4 3: 8192 2560: home(0:0)
/dev/md4 4: 10752 1937: usr(5743:0)
/dev/md4 5: 12689 111: NULL(0:0)
/dev/md4 6: 12800 1536: vartmp(0:0)
/dev/md4 7: 14336 512: tmp(0:0)
/dev/md4 8: 14848 1024: NULL(0:0)
/dev/md4 9: 15872 1024: var(0:0)
/dev/md4 10: 16896 1024: opt(0:0)
/dev/md4 11: 17920 18543: archives(0:0)
--- Volume group ---
VG Name main
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 49
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 8
Getting device info for main-portage
dm version O [16384]
dm info LVM-CcE5mwEKsku2QZtqZp26YUQTx64YIlysxJOAb8ERUBxCzQ7bgLrdPUC8d9eQCzdk O [16384]
Getting device info for main-home
dm info LVM-CcE5mwEKsku2QZtqZp26YUQTx64YIlys0lnabC1A2xWwwWejywcctENDtBxn4exj O [16384]
Getting device info for main-vartmp
dm info LVM-CcE5mwEKsku2QZtqZp26YUQTx64YIlysXXkM5IYLIvUyx0h0I8kNLK1R8i2fyPmT O [16384]
Getting device info for main-tmp
dm info LVM-CcE5mwEKsku2QZtqZp26YUQTx64YIlysTpyh1z5fxA98aPlbuLGKYCVuER3kvDtB O [16384]
Getting device info for main-var
dm info LVM-CcE5mwEKsku2QZtqZp26YUQTx64YIlyse9ahmisyqPN2H6vKujCaXan2X1u7N9WV O [16384]
Getting device info for main-opt
dm info LVM-CcE5mwEKsku2QZtqZp26YUQTx64YIlysN4hqrcOChtoe1FPGdsLNA1rVcYS7zWDg O [16384]
Getting device info for main-archives
dm info LVM-CcE5mwEKsku2QZtqZp26YUQTx64YIlys2xxbPKk5qEWGCLCULbrANz0DybY1tBgv O [16384]
Getting device info for main-usr
dm info LVM-CcE5mwEKsku2QZtqZp26YUQTx64YIlysqFXgvsrFIhvYHN8eOiJXwcYSHrhJLqPa O [16384]
Open LV 6
Max PV 0
Cur PV 1
Act PV 1
VG Size 142.43 GB
PE Size 4.00 MB
Total PE 36463
Alloc PE / Size 35328 / 138.00 GB
Free PE / Size 1135 / 4.43 GB
VG UUID CcE5mw-EKsk-u2QZ-tqZp-26YU-QTx6-4YIlys

Read volume group main from /etc/lvm/backup/main
Unlocking /var/lock/lvm/V_main
Closed /dev/md4
Dumping persistent device cache to /etc/lvm/.cache
Wiping internal VG cache
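
As a sanity check on the output above, the segment map can be summed to confirm that the extents are contiguous and add up to the reported totals. A minimal sketch: the function name `sum_extents` is made up, and it assumes the map lines have the shape `/dev/md4 N: START LEN: NAME(...)` exactly as printed, with `NULL` marking free space.

```shell
#!/bin/sh
# Hypothetical helper: read the vgdisplay output shown above on stdin and
# total the physical extent map. Only lines whose second field is "N:" count.
sum_extents() {
    awk '
        $2 ~ /^[0-9]+:$/ {
            start = $3 + 0
            len = $4; sub(/:$/, "", len); len += 0     # "1937:" -> 1937
            name = $5; sub(/\(.*/, "", name)           # "usr(5743:0)" -> "usr"
            if (start != expect) gaps++                # segment does not follow the previous one
            expect = start + len
            total += len
            if (name == "NULL") free += len
        }
        END {
            printf "total=%d alloc=%d free=%d gaps=%d\n", total, total - free, free, gaps + 0
        }
    '
}
```

For the map above this prints `total=36463 alloc=35328 free=1135 gaps=0`, which agrees with the Total PE, Alloc PE and Free PE lines. At 4 MB per extent, 36463 extents is 145852 MB, or about 142.43 GB, matching the VG Size.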

THANKS in advance for any help.

tomk · Posted: Wed Aug 09, 2006 7:21 am

Moved from Kernel & Hardware to Duplicate Threads; please don't crosspost. If you want a topic moved, please report it and one of the moderators will move it for you. Follow-ups to this topic: lvm2 corruption on reboot.