mdadm superblock problem
drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Sep 22, 2010 7:13 pm    Post subject: mdadm superblock problem

I had a problem with a Gentoo server that has been in production for 3 to 5 years now (I updated udev but forgot to remove the deprecated sysfs support).

I could not fix the problem in the running system, so I tried to boot from a sysrescue CD and hit a big problem. The machine has 6 750 GB hard drives that each have 5 partitions; 4 of those partitions are mdadm RAID members. The problem is that during boot, 2 of the disks were detected as whole-disk members of md0 instead of /dev/sda1 and /dev/sdd1, the partitions that actually belong to the md0 RAID 1 array. It appears that my disks have whole-disk superblocks (probably left over from testing before deployment) that should not be there.


Code:
datastore1 ~ # mdadm -E /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 89328e5a:110661e4:4dd9d63e:8a6c4e0e
  Creation Time : Fri Oct 12 13:21:08 2007
     Raid Level : raid5
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
     Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Oct 15 13:35:42 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d578e9c2 - correct
         Events : 18

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        0        0      active sync   /dev/sda

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd


Code:
datastore1 ~ # mdadm -E /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.03
           UUID : 7acd778f:ed62583c:a2ef05c9:d06c0a48
  Creation Time : Thu Jun 15 00:12:24 2006
     Raid Level : raid1
  Used Dev Size : 256896 (250.92 MiB 263.06 MB)
     Array Size : 256896 (250.92 MiB 263.06 MB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Tue Sep 21 17:02:39 2010
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 15e41b5b - correct
         Events : 445


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1



Where is the superblock stored for full disks? Can I safely zero that without corrupting the working arrays?


Here are the arrays when they are properly assembled.
Code:
datastore1 ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdf1[5] sdd1[3] sde1[4] sdb1[1] sdc1[2] sda1[0]
      256896 blocks [6/6] [UUUUUU]

md2 : active raid6 sdf5[5] sdd5[3] sde5[4] sdb5[1] sdc5[2] sda5[0]
      1199283200 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

md3 : active raid6 sdf6[5] sdd6[3] sde6[4] sdb6[1] sdc6[2] sda6[0]
      1680013056 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md1 : active raid6 sdf3[5] sdd3[3] sde3[4] sdb3[1] sdc3[2] sda3[0]
      46909440 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

unused devices: <none>

krinn
Watchman

Joined: 02 May 2003
Posts: 7470

Posted: Wed Sep 22, 2010 7:57 pm    Post subject: Re: mdadm superblock problem

drescherjm wrote:

Where is the superblock stored for full disks? Can I safely zero that without corrupting the working arrays?


I don't have the answer, but re-asking your question differently should give you the answer. Here's my version of your question:

Do I have data on the array that I'm OK to lose?


As for your main problem, if I get it correctly: mdadm mis-detects the members of your arrays, and you are trying to alter the disks (the array members) so that mdadm will autodetect correctly who is who?

I don't use mdadm, but I really doubt you cannot split the array (split as in "unload" or "unset", not as in destroy) and rebuild it manually.
This way you would get the array back, access it, and correct your udev trouble.
Of course this will only work if mdadm allows you to do that, but it seems so basic that it would be a shame if it didn't.
That method should let you fix your Gentoo, and, better still, it should make your question moot.
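Something like this, perhaps (I don't use mdadm, so take it only as a sketch of the idea, with the device names from your mdstat output):

Code:
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sd[a-f]1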
drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Sep 22, 2010 8:15 pm

Quote:
Do I have data on the array that I'm OK to lose?

There is between 4 and 5 TB of data that should all be on tape, but it would take a long time to make a second backup.

Quote:
mdadm mis-detects the members of your arrays, and you are trying to alter the disks (the array members) so that mdadm will autodetect correctly who is who?

I am trying to remove the superblocks from the whole-disk RAID members so that mdadm detects the correct arrays instead of finding arrays that no longer exist.
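(That is what mdadm's --zero-superblock option is for; a sketch, assuming the stale whole-disk array is not assembled and the superblocks have been saved somewhere first:)

Code:
mdadm --zero-superblock /dev/sda
mdadm --zero-superblock /dev/sdd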

Quote:
This way you would get the array back, access it, and correct your udev trouble.

I fixed the udev problem, and the system actually boots correctly with the current kernel, provided genkernel's mdadm support is in the initrd. If I boot from a livecd, however, the autodetection of the non-existent arrays makes a mess of the arrays:

/dev/sda and /dev/sdd go into a single array md0 that will not start because of missing members.

The other three RAID 6 arrays start, but with only 4 out of 6 members each.

I figured out how to recover from this situation.
1. Stop all 4 arrays:

Code:
mdadm --manage /dev/md0 --stop
mdadm --manage /dev/md1 --stop
mdadm --manage /dev/md2 --stop
mdadm --manage /dev/md3 --stop


2. Force the kernel to reload the partition tables of /dev/sda and /dev/sdd:
Code:
sfdisk -R /dev/sda
sfdisk -R /dev/sdd


3. Reassemble md0:
Code:
mdadm -A /dev/md0 /dev/sd[abcdef]1


4. Add the missing members back to the degraded arrays:
Code:
mdadm --manage /dev/md1 --add /dev/sda3
mdadm --manage /dev/md1 --add /dev/sdd3


After this I can chroot into the system and perform maintenance.
BTW, to save time I did not re-add the missing partitions to the other two arrays (md2 and md3), since they hold data only; the equivalent commands would look like the sketch below.
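For reference, going by the /proc/mdstat layout above (md2 on the sd?5 partitions, md3 on sd?6), the re-adds would presumably be:

Code:
mdadm --manage /dev/md2 --add /dev/sda5
mdadm --manage /dev/md2 --add /dev/sdd5
mdadm --manage /dev/md3 --add /dev/sda6
mdadm --manage /dev/md3 --add /dev/sdd6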
krinn
Watchman

Joined: 02 May 2003
Posts: 7470

Posted: Wed Sep 22, 2010 8:45 pm

You should back up before altering your array. It's not like it's a critical thing you must do; it's a cosmetic feature you wish to have working with a livecd. Is it worth gambling 4-5 TB?


If you are looking for an alternate but faster way of testing that (as secure as it can be without a full backup):
I would pick one disk as my "test disk", of course one that is affected by the issue.

Then I would snapshot that drive (sorry, I don't have the exact command in mind, but I'm sure dd can do it easily).
This way I back up a snapshot of just that drive, reducing the data to back up to the drive's capacity (750 GB in your case); of course, don't store that image on the array itself, you need to find space elsewhere.
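Something like this dd invocation, perhaps (the destination path is made up; it must live outside the array):

Code:
# image the whole test disk to storage outside the array
dd if=/dev/sda of=/mnt/external/sda.img bs=1M conv=noerror,sync
# and to roll back later if the experiment goes wrong:
# dd if=/mnt/external/sda.img of=/dev/sda bs=1M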

Then I would set my array (if possible) and my Gentoo to work read-only on the array (trying my best to keep the array from writing information about a failure that might come next), and alter the target disk with your modifications.

Now if you boot the array/Gentoo and the modification prevents the array from working: the array should be read-only, and all disks except the modified one should be fine. Restoring that disk (with dd or whatever tool you used to make the snapshot) should get your array back to the previous state (hmm, well, in theory).

This is tricky, this is risky, and it is what I have always done because I'm too lazy to back up.

As a footnote: you shouldn't be asking advice from unknown users on a forum. What risk do they take? Zero. I will still sleep very well if your array is dead; I might get banned in retaliation, woooooo, poor me.
But thinking about your side: you could lose 4-5 TB of data, and worse, you've put a production server in a stopped state for some time.
So even if someone here tells you "don't worry, alter it, it's OK, it will work", you should still think about what happens if anything goes wrong, and many things can go wrong when tweaking stuff: who will face the result? Getting fired over this kind of thing is possible.

Krinn gains a level, wisdom+1
drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Sep 22, 2010 8:59 pm

Quote:
You should back up before altering your array. It's not like it's a critical thing you must do; it's a cosmetic feature you wish to have working with a livecd. Is it worth gambling 4-5 TB?

I will most likely do that. The issue with backups is that the data is subdivided into 10 to 20 projects/grants (we do medical imaging research), and the backup procedure generally is that data gets backed up to tape manually, per project, just after it is added to the project. This is efficient for restores, since we know where to look and do not have to worry about name collisions among tens of millions of files. Normally data gets added at 10 to 100 GB at a time, 0 to 4 times a month. The problem with this method is that people do not always tell me when they have created an entirely new project, and it's not easy to tell what is and is not backed up.

Quote:
You shouldn't be asking advice from unknown users on a forum. What risk do they take? Zero. I will still sleep very well if your array is dead; I might get banned in retaliation, woooooo, poor me.

I was thinking about that when I posted. I am > 90% sure that if I zero the superblocks on the whole-disk members all will be well, but I had better be more careful.
drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Sep 22, 2010 10:39 pm

With the help of the kernel.org RAID wiki I found the location of the superblock:

https://raid.wiki.kernel.org/index.php/RAID_superblock_formats#The_version-0.90_Superblock_Format
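That page puts the version-0.90 superblock 64 KiB before the end of the device, rounded down to a 64 KiB boundary. A quick sanity check of the arithmetic in shell, using the sector count from the fdisk output below:

Code:
# 0.90 superblock: round the device size down to a 64 KiB (128-sector)
# boundary, then step back one more 64 KiB block
SECTORS=1465149168                      # total sectors, per fdisk -l /dev/sda
SB_SECTOR=$(( (SECTORS & ~127) - 128 ))
echo $(( SB_SECTOR * 512 ))             # prints 750156251136 = 0xaea8cc0000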

Code:
datastore1 ~ # hexdump -s 750156242944 -C /dev/sda
aea8cbe000  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
*
aea8cc0000  fc 4e 2b a9 00 00 00 00  5a 00 00 00 00 00 00 00  |.N+.....Z.......|
aea8cc0010  00 00 00 00 5a 8e 32 89  04 ad 0f 47 05 00 00 00  |....Z.2....G....|
aea8cc0020  00 33 aa 2b 04 00 00 00  04 00 00 00 00 00 00 00  |.3.+............|
aea8cc0030  00 00 00 00 e4 61 06 11  3e d6 d9 4d 0e 4e 6c 8a  |.....a..>..M.Nl.|
aea8cc0040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|



And it is within the 2.6 MB (5104 sectors) after the last partition.

Code:
datastore1 ~ # fdisk -l /dev/sda

Disk /dev/sda: 750.2 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders, total 1465149168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      514079      257008+  fd  Linux raid autodetect
/dev/sda2          514080     2024189      755055   82  Linux swap / Solaris
/dev/sda3         2024190    25479089    11727450   fd  Linux raid autodetect
/dev/sda4        25479090  1465144064   719832487+   5  Extended
/dev/sda5        25479153   625137344   299829096   fd  Linux raid autodetect
/dev/sda6       625137408  1465144064   420003328+  fd  Linux raid autodetect


I am 99% sure I can just corrupt this superblock and all will be well. I know mdadm has a zero-superblock option, but I am concerned that there may be more than one superblock (even though the doc does not mention that). Or is there only one? I guess I can verify that using VirtualBox.
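(A lighter-weight way to test that than VirtualBox might be a throwaway loop device; a sketch, with made-up paths, that creates a 0.90 whole-disk member and checks what --zero-superblock does to it:)

Code:
# build a scratch 0.90 whole-disk member on a loop device
dd if=/dev/zero of=/tmp/testdisk.img bs=1M count=64
losetup /dev/loop0 /tmp/testdisk.img
mdadm --create /dev/md9 --metadata=0.90 --level=1 --raid-devices=2 /dev/loop0 missing
mdadm --stop /dev/md9
mdadm --zero-superblock /dev/loop0
mdadm -E /dev/loop0    # should now report no md superblock
losetup -d /dev/loop0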

Feeling confident that I knew what I was doing (or at least that writing zeros past the end of the last partition would be safe), I went ahead and zapped the superblock:

0xaea8cc0000 = 750156251136 bytes

Code:

datastore1 ~ # dd of=/dev/sda if=/dev/zero bs=1 count=4096 seek=750156251136
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0112831 s, 363 kB/s
datastore1 ~ # mdadm -E /dev/sda
mdadm: No md superblock detected on /dev/sda.


Note: I did save the superblock before attempting this.
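Presumably with the mirror image of the dd above, reading instead of writing (destination file name made up):

Code:
# copy out the 4 KiB superblock region before zeroing it
dd if=/dev/sda of=/root/sda-sb.bin bs=1 count=4096 skip=750156251136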

And to verify that I did not mess anything up:
Code:

datastore1 ~ # mdadm -E /dev/sda6
/dev/sda6:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 19ded0b8:91d05f83:5b8944ff:62ae5209
  Creation Time : Mon Oct 22 14:15:28 2007
     Raid Level : raid6
  Used Dev Size : 420003264 (400.55 GiB 430.08 GB)
     Array Size : 1680013056 (1602.19 GiB 1720.33 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 3

    Update Time : Tue Sep 21 17:02:49 2010
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : bfd1dc5e - correct
         Events : 6

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        6        0      active sync   /dev/sda6

   0     0       8        6        0      active sync   /dev/sda6
   1     1       8       22        1      active sync   /dev/sdb6
   2     2       8       38        2      active sync   /dev/sdc6
   3     3       8       54        3      active sync   /dev/sdd6
   4     4       8       70        4      active sync   /dev/sde6
   5     5       8       86        5      active sync   /dev/sdf6


datastore1 ~ # echo check > /sys/block/md3/md/sync_action
datastore1 ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdf1[5] sdd1[3] sde1[4] sdb1[1] sdc1[2] sda1[0]
      256896 blocks [6/6] [UUUUUU]

md2 : active raid6 sdf5[5] sdd5[3] sde5[4] sdb5[1] sdc5[2] sda5[0]
      1199283200 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

md3 : active raid6 sdf6[5] sdd6[3] sde6[4] sdb6[1] sdc6[2] sda6[0]
      1680013056 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  check =  0.0% (164776/420003264) finish=169.8min speed=41194K/sec

md1 : active raid6 sdf3[5] sdd3[3] sde3[4] sdb3[1] sdc3[2] sda3[0]
      46909440 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

unused devices: <none>
