Broke my dmraid? Help please!

9 posts • Page 1 of 1
E-Razor
n00b
Posts: 69
Joined: Sun Jul 11, 2004 5:58 pm

Broke my dmraid? Help please!

Post by E-Razor » Mon Jul 08, 2013 4:46 pm

Hi all,

I'm having quite a serious problem.

Today my server hung and I rebooted it, without any luck. Since it's a root server, I booted the recovery image and tried to mount my root partition.

I am using dmraid, and unfortunately started with:
mdadm --create --level=1 --disk-count=2 /dev/md0 /dev/sda2 /dev/sdb2

It tried to resync, and I had to reboot. After the reboot I tried:
mdadm --assemble /dev/md0 /dev/sda2 /dev/sdb2

The resync took quite a long time, and afterwards I was still not able to mount /dev/md0.

Kernel log is:

Code:

[  156.281879] md: md0 stopped.
[  156.282986] md: bind<sdb2>
[  156.283161] md: bind<sda2>
[  156.297031] md: raid1 personality registered for level 1
[  156.299897] md/raid1:md0: not clean -- starting background reconstruction
[  156.299904] md/raid1:md0: active with 2 out of 2 mirrors
[  156.299942] md0: detected capacity change from 0 to 248841306112
[  156.307937] md: resync of RAID array md0
[  156.307944] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  156.307947] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[  156.307952] md: using 128k window, over a total of 243009088k.
[  156.417063]  md0: unknown partition table
[  935.195018] SQUASHFS error: Can't find a SQUASHFS superblock on md0
[  935.195661] EXT4-fs (md0): VFS: Can't find ext4 filesystem
[  937.801446] EXT4-fs (md0): VFS: Can't find ext4 filesystem
[ 5023.803727] md: md0: resync done.
[ 5023.892662] RAID1 conf printout:
[ 5023.892665]  --- wd:2 rd:2
[ 5023.892668]  disk 0, wo:0, o:1, dev:sda2
[ 5023.892671]  disk 1, wo:0, o:1, dev:sdb2
[ 5360.859027] EXT4-fs (md0): VFS: Can't find ext4 filesystem
[ 5363.666290] SQUASHFS error: Can't find a SQUASHFS superblock on md0
[ 5363.666867] EXT4-fs (md0): VFS: Can't find ext4 filesystem

I'd appreciate any help.

Thanks a lot!
NeddySeagoon
Administrator
Posts: 56100
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Mon Jul 08, 2013 6:10 pm

E-Razor,

mdadm --create is a very bad thing to do. It writes new raid metadata, which, in effect, destroys your old raid. The sync won't have helped either.

However, all may not be lost. Creating new raid metadata is harmless *if* it's identical to the old metadata. User data is not harmed in the process.
The downside is that mdadm's defaults changed a few months ago, so if your original raid was a year or more old and you did not specify the parameters explicitly, you now have raid metadata version 1.2 while the old raid used version 0.9.
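One way to avoid that trap in the first place is to pin the metadata format explicitly at create time, so a later mdadm upgrade cannot silently change the on-disk layout. A hedged sketch, using this thread's device names (`--metadata=0.90` is mdadm's option for the old superblock format):

```shell
# Sketch only: creating a mirror while pinning the old 0.9 metadata format
mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 \
      /dev/sda2 /dev/sdb2
```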

So when did you create the old raid and how?

It gets slightly worse. Raid version 0.9 metadata is written at the end of the volume, and the filesystem starts in the usual place, as if the volume were not a member of a raid set.
Raid version 1.2 metadata is written at the start of the volume and tramples over the primary extX filesystem superblock. That means you can no longer mount the filesystem using the primary superblock, which is what the standard invocation of mount does.
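To make the difference concrete, here is a rough sketch of where each superblock format sits on a partition the size of sda2 from this thread (243140352 1 KiB blocks, per the fdisk output below). The 64 KiB end-alignment for 0.9 and the 4 KiB start offset for 1.2 are the documented md superblock locations; treat the exact byte values as illustrative:

```shell
# Where md metadata would sit on a 243140352-KiB partition (sda2 here)
size_bytes=$((243140352 * 1024))

# v0.9: a 64 KiB-aligned block near the end of the device
sb09=$(( size_bytes / 65536 * 65536 - 65536 ))

# v1.2: fixed offset, 4 KiB from the start of the device
sb12=4096

echo "v0.9 superblock at byte $sb09, v1.2 superblock at byte $sb12"
# → v0.9 superblock at byte 248975654912, v1.2 superblock at byte 4096
```

Since 0.9 lives at the tail end, everything before it looks like a plain filesystem; 1.2 lives right where the filesystem's own primary superblock region is, which is why mount stops working.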

If we know what you used to have, something may be recoverable.

Your partition table would also be useful but I suppose that is inside the raid set and no longer available.
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
E-Razor

Post by E-Razor » Mon Jul 08, 2013 6:18 pm

I think the versions are the same, since it's also the same rescue image.

I created it about a year ago.

The fdisk output looks like this:

Code:

root@grml ~ # fdisk -l                                                                                             :(

Disk /dev/sda: 250.1 GB, 250059350016 bytes
224 heads, 56 sectors/track, 38934 cylinders, total 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x52c44f76

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              56     2107391     1053668   82  Linux swap / Solaris
/dev/sda2   *     2107392   488388095   243140352   fd  Linux raid autodetect

Disk /dev/sdb: 250.1 GB, 250059350016 bytes
224 heads, 56 sectors/track, 38934 cylinders, total 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9fa5628b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              56     2107391     1053668   82  Linux swap / Solaris
/dev/sdb2   *     2107392   488388095   243140352   fd  Linux raid autodetect

The good thing is that I was able to mount sda2 after I did:
# mdadm --stop /dev/md0
and
# e2fsck /dev/sda2

I did not finish e2fsck; it told me about a backup superblock which it used, and all the other questions I answered with "no". After that I was able to mount it again.
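For anyone hitting the same situation: the backup-superblock trick e2fsck used here can also be invoked explicitly. A hedged sketch; the device name is a placeholder, and the block numbers below are only the usual ext4 defaults. `mke2fs -n` is a dry run that writes nothing, but it must be given the same options the filesystem was created with to print the right locations:

```shell
mke2fs -n /dev/sdXn                   # dry run: lists "Superblock backups stored on blocks: 32768, 98304, ..."
e2fsck -b 32768 -B 4096 /dev/sdXn     # check using backup superblock 32768 (4 KiB filesystem blocks)
mount -o ro,sb=131072 /dev/sdXn /mnt  # mount's sb= counts in 1 KiB units: 32768 * 4 = 131072
```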

I'm going to back up as much as possible now.

The next step would be to enable md0 again; maybe I can also fsck md0, which could fix my filesystem.

Do you think this would help?
NeddySeagoon

Post by NeddySeagoon » Mon Jul 08, 2013 8:25 pm

E-Razor,

If you allowed half the raid to mount rw, the two mirrors are now out of sync.
Recover what you can from /dev/sda2. It's very important that you do not write to a damaged raid/filesystem until you understand the damage.

I suspect you used to have raid superblock version 0.9 but now you have version 1.2
What does

Code:

mdadm -E /dev/sdb2
show?

It sounds like fsck repaired your filesystem superblock, which was damaged as I described above by the raid 1.2 superblock being written in the middle of it.

What counts is not the rescue image you used but the versions of mdadm.

How did you start your raid?
With kernel raid auto assemble or some other way?
Regards,

NeddySeagoon

E-Razor

Post by E-Razor » Tue Jul 09, 2013 11:10 am

I finally got it to work again.

Thanks for your hints!

I had version 0.9 and wrote "mdadm --create ...", which confused my system.

However, after e2fsck of one of the partitions I was able to get my data back. I disconnected the working partition from the raid, formatted the raid again, and then copied the old files back onto the empty raid partition.

My init was broken; a simple chroot and emerge fixed it, and the system is online now.

It seems I was very lucky that the --create did not destroy the partition.
iandoug
l33t
Posts: 888
Joined: Fri Feb 11, 2005 5:05 pm
Location: Cape Town, South Africa

Post by iandoug » Sun Sep 22, 2013 1:45 pm

NeddySeagoon wrote:
It gets slightly worse. Raid version 0.9 metadata is written at the end of the volume and the filesystem starts in the usual place, as if the volume is not a member of a raid set.
Raid version 1.2 metadata is written at the start of the volume and tramples over the primary extX filesystem superblock, that means you can no longer mount the filesystem using the primary superblock, which is what the standard invocation of mount does.

If we know what you used to have, something may be recoverable.

Your partition table would also be useful but I suppose that is inside the raid set and no longer available.

I did a normal update and noticed portage wanted me to update mdadm.conf, which I did ... I accepted the new version as I did not notice anything unusual.

Now I can't see my drives... my /home is on them.

I get a message in dmesg about an "invalid raid superblock magic": sdb1 and sdc1 do not have a valid 0.9 superblock and were not imported.

What you describe the new version doing sounds like the height of dumb to me, unless there is a way to automagically deal with existing disks.

The installed version of mdadm is 3.2.6.

/dev has md, md0 and md127, while fstab has /dev/md1

What can a desperate person do under these conditions? I need the box to work... :-)

thanks, Ian
Asus X570-PRO, Ryzen 7 5800X, GeForce GTX 1650, 32 GB RAM | Asus Sabertooth P990, AMD FX-8150, GeForce GTX 560, 16GB Ram
iandoug

would downgrading help?

Post by iandoug » Sun Sep 22, 2013 1:51 pm

Would it help to downgrade back to mdadm 3.1.4?
Jaglover
Watchman
Posts: 8291
Joined: Sun May 29, 2005 1:57 am
Location: Saint Amant, Acadiana

Post by Jaglover » Sun Sep 22, 2013 2:21 pm

iandoug wrote: /dev has md, md0 and md127, while fstab has /dev/md1
It is possible it is md0 or md127 now; did you look at those volumes?
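A quick, hedged way to check that with the standard md tooling (none of this writes to disk):

```shell
cat /proc/mdstat            # which arrays the kernel actually assembled, and under what names
mdadm --detail /dev/md127   # members, metadata version and UUID of the suspect array
blkid /dev/md0 /dev/md127   # whether a mountable filesystem is visible on either device
```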
My Gentoo installation notes.
Please learn how to denote units correctly!
iandoug

solved

Post by iandoug » Sun Sep 22, 2013 2:38 pm

iandoug wrote: What can a desperate person do under these conditions? I need the box to work... :-)
Edit mdadm.conf, specify the DEVICE and ARRAY lines, and reboot ...

I guess the etc-update step had changed those lines and I didn't notice ...

What a relief.
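For reference, the sort of lines that need to survive an etc-update here; a hedged mdadm.conf sketch, where the UUID is a placeholder (`mdadm --detail --scan` prints the real ARRAY line for a running array):

```
DEVICE /dev/sdb1 /dev/sdc1
ARRAY /dev/md1 metadata=0.90 UUID=<your-array-uuid>
```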

cheers, Ian