Gentoo Forums
sw raid 1 / wrong "/" after disc repair [solved]
little_bob
Retired Dev


Joined: 27 Jul 2004
Posts: 34

PostPosted: Thu Sep 27, 2012 7:38 pm    Post subject: sw raid 1 / wrong "/" after disc repair [solved]

hello community,

I am using two Crucial 128 GB discs in a software RAID 1 (metadata 0.90) on an Asus P8B WS. The discs are connected to the two 6 Gb/s SATA ports (port 1 and port 2). Because of a firmware bug I was forced to run my system for some days with only the disc on port 2; I had unplugged the disc on port 1 for checking. I fixed the bug with a firmware update on both discs and then plugged the disc on port 1 back in.

When the system now comes up, I see this in /proc/mdstat:

Code:

wooki ~ # cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sdb3[1]
      122834176 blocks [2/1] [_U]
     
md1 : active raid1 sda1[0] sdb1[1]
      102336 blocks [2/2] [UU]
     
md3 : active raid1 sda3[0]
      122834176 blocks [2/1] [U_]
     
unused devices: <none>


md1 is OK.
md3 is broken and running with the old data from the disc that was out of the system for some days.
md127 is wrongly named, but it holds the current data I need.

mdadm --examine shows this for /dev/sda3:

Code:

wooki ~ # mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b2406504:81284d9b:78a9b883:5f5f67d5
  Creation Time : Thu Jan 12 16:05:31 2012
     Raid Level : raid1
  Used Dev Size : 122834176 (117.14 GiB 125.78 GB)
     Array Size : 122834176 (117.14 GiB 125.78 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 3

    Update Time : Thu Sep 27 19:24:02 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5b7a591b - correct
         Events : 761467


      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       0        0        1      faulty removed


mdadm --examine shows this for /dev/sdb3:

Code:

wooki ~ # mdadm --examine /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b2406504:81284d9b:78a9b883:5f5f67d5
  Creation Time : Thu Jan 12 16:05:31 2012
     Raid Level : raid1
  Used Dev Size : 122834176 (117.14 GiB 125.78 GB)
     Array Size : 122834176 (117.14 GiB 125.78 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 3

    Update Time : Wed Sep 26 13:53:21 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 5b64ebbc - correct
         Events : 112460


      Number   Major   Minor   RaidDevice State
this     1       8        3        1      active sync   /dev/sda3

   0     0       0        0        0      removed
   1     1       8        3        1      active sync   /dev/sda3


I can see that sda3 and sdb3 have the same UUID. How come?

/etc/mdadm.conf
Code:

ARRAY /dev/md/3_0 metadata=0.90 UUID=b2406504:81284d9b:78a9b883:5f5f67d5
ARRAY /dev/md/1_0 metadata=0.90 UUID=cb91b603:9860658e:78a9b883:5f5f67d5
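
(For reference: ARRAY lines like these can be regenerated from the running arrays.)

Code:

mdadm --detail --scan   # prints one ARRAY line per active md device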


I am not sure how to fix this.
Does anyone have a tip?

Best regards


Last edited by little_bob on Sun Sep 30, 2012 11:57 am; edited 1 time in total
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54096
Location: 56N 3W

PostPosted: Thu Sep 27, 2012 8:43 pm

little_bob,

The two UUIDs are the same as they both refer to the same raid.
Code:
 $ sudo mdadm -E /dev/sd[abcd]1 | grep UUID
           UUID : 9392926d:64086e7a:86638283:4138a597
           UUID : 9392926d:64086e7a:86638283:4138a597
           UUID : 9392926d:64086e7a:86638283:4138a597
           UUID : 9392926d:64086e7a:86638283:4138a597
That output is from my four-spindle raid1 /boot.

You need to fail, then remove, the old partition from /dev/md3, so it's back in degraded mode on one drive.
Then zero the raid superblock on that partition and add it back to the /dev/md3 raid set.

Zeroing the superblock on the partition is destructive.

The problem is that the old drive has not failed; it was just dropped out of the raid. It no longer holds a current mirror of the data, but it is a valid degraded raid set on its own, so mdadm has started it as a raid set of its own.

When you add the 'replacement' partition, the raid set will rebuild. Watch /proc/mdstat.
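
A generic sketch of that sequence, with illustrative device names (here /dev/sdXn stands for the stale member and /dev/mdN for the array you keep; double-check which side holds the current data before running anything):

Code:

mdadm /dev/mdN --fail /dev/sdXn       # mark the stale member faulty
mdadm /dev/mdN --remove /dev/sdXn     # drop it from the array
mdadm --zero-superblock /dev/sdXn     # destructive: erases the raid metadata
mdadm /dev/mdN --add /dev/sdXn        # re-add it; the mirror starts rebuilding
cat /proc/mdstat                      # watch the rebuild progress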
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
little_bob
Retired Dev

PostPosted: Sat Sep 29, 2012 6:55 pm

hello NeddySeagoon,

Thank you for the information.

"/" is on the active but "old" disc /dev/sda3 in /dev/md3.
I have stopped the stray raid /dev/md127 (which holds the current data on /dev/sdb3) with no problem.
Then I tried to fail /dev/sda3 out of /dev/md3, but as this raid holds my "/", it is busy:

Code:
wooki ~ # mdadm -f /dev/md3 /dev/sda3
mdadm: set device faulty failed for /dev/sda3:  Device or resource busy


I guess that would be no problem if the running /dev/md3 were using /dev/sdb3 and not /dev/sda3.
Is there a way to tell the system to use /dev/sdb3 instead of /dev/sda3 at boot time?

Best regards
NeddySeagoon
Administrator

PostPosted: Sat Sep 29, 2012 7:01 pm

little_bob,

Set root=/dev/md127 in grub.conf and in /etc/fstab, in place of the /dev/md3 you have there currently.
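
A sketch of the two edits; the kernel image name and the filesystem type here are illustrative, not taken from your system:

Code:

# /boot/grub/grub.conf -- kernel line
kernel /boot/vmlinuz root=/dev/md127

# /etc/fstab -- root entry
/dev/md127    /    ext4    noatime    0 1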
little_bob
Retired Dev

PostPosted: Sat Sep 29, 2012 9:23 pm

After a longer IRC session with you (NeddySeagoon), it is solved :D

Thank you very much for the help and the time.
little_bob
Retired Dev

PostPosted: Sun Sep 30, 2012 11:54 am

hi community,

Now that the forum is running stable again, I will describe more formally what I did to fix this (a command sketch follows the list):

- Boot from a USB stick with a live DVD image.
- Create the md devices: md1, md3 (with sdb3 and missing), and md127 (with sda3 and missing).
- Mount md1 and adjust grub.conf (to boot again from md3).
- Mount md127 and check whether it holds the old content (it did).
- Mount md3 and check whether it holds the current content (it did).
- Stop md127 and remove sda3 from it.
- Zero the superblock on sda3.
- Add sda3 to md3.
- Observe /proc/mdstat and wait until the recovery is finished.
- Reboot and check the content.
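
As a sketch, the mdadm side of those steps could look like this. Treat it as illustrative only: the --create parameters must exactly match the original arrays (raid1, two members, metadata 0.90), and zeroing a superblock is destructive.

Code:

# run from the live environment; "missing" leaves the second mirror slot empty
mdadm --create /dev/md3   --metadata=0.90 --level=1 --raid-devices=2 /dev/sdb3 missing
mdadm --create /dev/md127 --metadata=0.90 --level=1 --raid-devices=2 /dev/sda3 missing
mkdir -p /mnt/new /mnt/old
mount /dev/md3 /mnt/new               # verify: current content
mount /dev/md127 /mnt/old             # verify: old content
umount /mnt/old
mdadm --stop /dev/md127               # release sda3
mdadm --zero-superblock /dev/sda3     # destructive: wipes the old raid metadata
mdadm /dev/md3 --add /dev/sda3        # the mirror starts rebuilding
cat /proc/mdstat                      # wait until the recovery finishes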

Descriptions I used for handling software raid:
http://www.gentoo.org/doc/en/gentoo-x86+raid+lvm2-quickinstall.xml
http://en.wikipedia.org/wiki/Mdadm

They also contain example commands.
NeddySeagoon
Administrator

PostPosted: Sun Sep 30, 2012 5:02 pm

little_bob,

Thank you for sharing the solution.