Gentoo Forums
RAID6: emergency help?
RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Fri Dec 07, 2012 8:20 am    Post subject: RAID6: emergency help?

I created a raid6 array from six drives and copied all my old data onto it. I still have the old drives but ... setting them up and copying them over would be a rather large task.

The first time I booted the machine with the new raid6 array it worked perfectly.

Then I powered it off, put the cover back on, and rebooted, and one of the drives became faulty.

Then, while I was trying to re-add it to the array so it would rebuild, another drive went faulty.

Then, after one more mdadm -A, another drive went away, and it looks like all the data is lost.

What am I doing wrong?

Here's what mdadm currently says:

Code:
server ~ # mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
     Array Size : 11720534016 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 2930133504 (2794.39 GiB 3000.46 GB)
   Raid Devices : 6
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Fri Dec  7 00:10:39 2012
          State : clean, FAILED
 Active Devices : 3
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : SparePC:soulstorage
           UUID : bfc07787:5075d763:c70b65ac:687f7544
         Events : 61

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       0        0        3      removed
       4       0        0        4      removed
       5       0        0        5      removed

       3       8       97        -      faulty spare   /dev/sdg1
       6       8       81        -      spare   /dev/sdf1


Here's what it said after the previous failure:

Code:
server ~ # mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
  Used Dev Size : -1
   Raid Devices : 6
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Fri Dec  7 00:04:04 2012
          State : active, degraded, Not Started
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : SparePC:soulstorage
           UUID : bfc07787:5075d763:c70b65ac:687f7544
         Events : 56

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       97        3      active sync   /dev/sdg1
       4       0        0        4      removed
       5       0        0        5      removed

       6       8       81        -      spare   /dev/sdf1



It is interesting to note that the failed drives are all connected to a RAID card (with its hardware RAID disabled) that had, until now, been working fine.

Can someone help me? I have no idea what's failing or why...

Update: It looks like they are hard errors....

Code:
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] CDB:
end_request: I/O error, dev sdh, sector 264200
sd 9:0:0:0: [sdh] Unhandled sense code
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] CDB:
end_request: I/O error, dev sdh, sector 265224
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] CDB:
end_request: I/O error, dev sdg, sector 264200
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 4, o:1, dev:sdh1
 disk 5, o:1, dev:sdf1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 4, o:1, dev:sdh1
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] 
sd 9:0:0:0: [sdh] CDB:
end_request: I/O error, dev sdh, sector 265224
md/raid:md127: read error NOT corrected!! (sector 263176 on sdh1).
md/raid:md127: Disk failure on sdh1, disabling device.
md/raid:md127: read error not correctable (sector 263184 on sdh1).
md/raid:md127: read error not correctable (sector 263192 on sdh1).
md/raid:md127: read error not correctable (sector 263200 on sdh1).
md/raid:md127: read error not correctable (sector 263208 on sdh1).
md/raid:md127: read error not correctable (sector 263216 on sdh1).
md/raid:md127: read error not correctable (sector 263224 on sdh1).
md/raid:md127: read error not correctable (sector 263232 on sdh1).
md/raid:md127: read error not correctable (sector 263240 on sdh1).
md/raid:md127: read error not correctable (sector 263248 on sdh1).
md/raid:md127: read error not correctable (sector 263256 on sdh1).
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 4, o:0, dev:sdh1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
nfsd: last server has exited, flushing export cache
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdh1>
md: export_rdev(sdh1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdf1>
md: bind<sdb1>
md: kicking non-fresh sdh1 from array!
md: unbind<sdh1>
md: export_rdev(sdh1)
md/raid:md127: device sdb1 operational as raid disk 0
md/raid:md127: device sdg1 operational as raid disk 3
md/raid:md127: device sdd1 operational as raid disk 2
md/raid:md127: device sdc1 operational as raid disk 1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdf1>
md: bind<sdb1>
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdh1>
md: export_rdev(sdh1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdf1>
md: bind<sdb1>
md: kicking non-fresh sdh1 from array!
md: unbind<sdh1>
md: export_rdev(sdh1)
md/raid:md127: device sdb1 operational as raid disk 0
md/raid:md127: device sdg1 operational as raid disk 3
md/raid:md127: device sdd1 operational as raid disk 2
md/raid:md127: device sdc1 operational as raid disk 1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdg1
 disk 4, o:1, dev:sdf1
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] 
sd 8:0:0:0: [sdg] CDB:
end_request: I/O error, dev sdg, sector 264192
md/raid:md127: read error not correctable (sector 262144 on sdg1).
md/raid:md127: Disk failure on sdg1, disabling device.
md/raid:md127: read error not correctable (sector 262152 on sdg1).
md/raid:md127: read error not correctable (sector 262160 on sdg1).
md/raid:md127: read error not correctable (sector 262168 on sdg1).
md/raid:md127: read error not correctable (sector 262176 on sdg1).
md/raid:md127: read error not correctable (sector 262184 on sdg1).
md/raid:md127: read error not correctable (sector 262192 on sdg1).
md/raid:md127: read error not correctable (sector 262200 on sdg1).
md/raid:md127: read error not correctable (sector 262208 on sdg1).
md/raid:md127: read error not correctable (sector 262216 on sdg1).
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:0, dev:sdg1
 disk 4, o:1, dev:sdf1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:0, dev:sdg1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:0, dev:sdg1
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdf1>
md: bind<sdb1>
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdh1>
md: export_rdev(sdh1)
md: unbind<sdg1>
md: export_rdev(sdg1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)

_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Fri Dec 07, 2012 10:05 pm

RayDude,

Don't do anything that may involve writes. Post the output of
Code:
mdadm -E /dev/sd[abcdef]1

What you hope to find is four members of the set with the same event count so you can assemble the raid in degraded mode.
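
A quick way to line the event counts up side by side, if it helps (just a sketch; adjust the device list to whatever your member partitions actually are):
Code:
for d in /dev/sd[abcdef]1; do
    echo "== $d"
    mdadm -E "$d" | egrep 'Update Time|Events|Device Role'
done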

I've just been through this with my 4 spindle raid5.

It's also worth installing smartmontools and looking at the drives' internal error logs.
If you saved dmesg with the error reports that showed why the drives were kicked out of the array, that would be good too.
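
For the smartmontools check, something along these lines should do (a sketch; substitute the device that got kicked out):
Code:
emerge --ask sys-apps/smartmontools   # install on Gentoo
smartctl -H /dev/sdg                  # overall health self-assessment
smartctl -l error /dev/sdg            # the drive's internal error log
smartctl -a /dev/sdg                  # full report, SMART attributes included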
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sat Dec 08, 2012 12:51 am

Thanks Neddy!

I think the RAID controller I used for my external SATA box is not compatible with these drives, because the three drives that failed are all attached to it (two inside, one outside).

It looks like b, c, d, and f are all at the same event count, which means I might be able to recover this. I bought two new dual controllers (DGMS, Fry's sucks) to see if they will work with the drives. I think b, c, and d are okay because they are plugged into the motherboard.

It would be so awesome if I could get the array to rebuild itself. It takes days to copy 8 TB over gigabit.

Here's what mdadm said

Code:
server ~ # mdadm -E /dev/sd[bcdfgh]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6b6e4f45:5d52d5ea:c6ff4d3a:ebf8b515

    Update Time : Fri Dec  7 00:10:47 2012
       Checksum : ed444042 - correct
         Events : 63

         Layout : left-symmetric
     Chunk Size : 512K                                                   
                                                                         
   Device Role : Active device 0
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : dc35e656:81f9e617:e9a6eafe:00cf70d6

    Update Time : Fri Dec  7 00:10:47 2012
       Checksum : b1442c11 - correct
         Events : 63

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 59a13e07:c83416e1:4c96063b:6ca6bbb3

    Update Time : Fri Dec  7 00:10:47 2012
       Checksum : 443299a5 - correct
         Events : 63

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 374073a6:b52ff21f:01661c22:cc5f2acb

    Update Time : Fri Dec  7 00:10:47 2012
       Checksum : 20c7bc89 - correct
         Events : 63

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 0445633e:5bf226e9:50f860b0:4784965b

    Update Time : Fri Dec  7 00:10:36 2012
       Checksum : a0a13abe - correct
         Events : 57

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAA. ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 697e05c1:644d35c8:1cb2b136:97a14f39

    Update Time : Thu Dec  6 23:58:39 2012
       Checksum : 665ba374 - correct
         Events : 48

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAA. ('A' == active, '.' == missing)

_________________
Some day there will only be free software.

frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Sat Dec 08, 2012 1:37 am

If you have solved the drive failure problem (by hooking the drives up through some other card), and if the drives then work reliably, you should be able to reassemble the RAID. By your output, the first four are good (same timestamp and event count), whereas the latter two are out of date. So assemble (using --force if you must) with only the first four drives, and then, once the array is up and running, re-add the other two. Since RAID6 allows two drive failures, it will resync. As long as the first four drives aren't bad, the sync should succeed and you are back in the game with no additional data loss since the md failure.
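
Roughly, the sequence would look like this (only a sketch; the sdW1/sdX1/sdY1/sdZ1 names are placeholders for whichever four members share the newest event count, so double-check against your -E output before running anything):
Code:
mdadm --stop /dev/md127
# assemble degraded from the four up-to-date members only
mdadm --assemble --force /dev/md127 /dev/sdW1 /dev/sdX1 /dev/sdY1 /dev/sdZ1
cat /proc/mdstat                      # should come up with 4 of 6 devices
# once it is running, re-add the stale members so they resync
mdadm /dev/md127 --add /dev/sdU1 /dev/sdV1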

Good luck.

If it does not work out you may have to resort to your backup after all.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sat Dec 08, 2012 5:40 am

Thanks!

I removed the old SATA card and added the two new SIL3132 boards. Unfortunately only one of them is recognized, and I don't know why: the one plugged into the x16 slot is not initialized, no BIOS, nothing. So I can only see five drives.

I had a four-port PCI SATA RAID card in my hand, but I couldn't remember if this mobo had a PCI slot, so I got the PCIe cards...

Now I'm stuck either buying a four-port card from Newegg (Fry's didn't have any four-port PCIe cards in stock, and PCI is probably too slow anyway), or biting the bullet, assembling with the four good drives, and hoping there are no write errors until I find a way to hook up the sixth drive.

Man, I wish I'd planned this better. I forgot this mobo only has four SATA ports.

Update: Well, I let it rebuild the drives and it went to active sync.

Then I added the last drive and it's currently rebuilding. So I'll have one redundant drive until I can find a solution that gives me four more SATA devices.

Any suggestions for a good card that's less than a hundred?

Thanks again guys!
_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sat Dec 08, 2012 5:45 pm

RayDude,

Write errors are actually fairly safe: the drive will realise the write failed and reallocate the failed sector.
It's read errors that are the problem. When a drive has problems with a read but it's still successful, the data will be moved to a spare sector.
When a read fails, the data is lost and can't be moved. That's a fairly simplistic explanation anyway.
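
That reallocation activity shows up in the SMART attributes, by the way (a sketch, assuming smartmontools is installed and sdg is the suspect drive):
Code:
# non-zero raw values here mean the drive has remapped sectors or is holding pending ones
smartctl -A /dev/sdg | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'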
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Mad Merlin
Veteran


Joined: 09 May 2005
Posts: 1155

Posted: Mon Dec 10, 2012 3:46 am

RayDude wrote:
Any suggestions for a good card that's less than a hundred?


As far as I know, such a thing doesn't exist. You can grab an LSI 9211-4i for ~$200 or a 9211-8i for ~$250, which are 4-port (1x SFF-8087) and 8-port (2x SFF-8087), respectively. These are barebones HBA cards meant for passing the disks through to the host OS rather than doing any RAID themselves, and they work well.

You can often find rebrands and/or used cards for less; have a look at this list (the 9211 uses the SAS2008 chipset).

I can't really recommend anything less expensive than that; I've tried a couple of cheaper cards and have been burned more than once.
_________________
Game! - Where the stick is mightier than the sword!

frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Mon Dec 10, 2012 1:24 pm

I'm not sure how expensive they are in dollars, but the Lian Li IB-01 and the Dawicontrol DC-624e are around 80€, so they shouldn't be over $100.

The Lian Li is just a port multiplier, though, and the Dawicontrol may need some extras ( http://theangryangel.co.uk/blog/marvell-88se9172-sata3-under-linux-as-of-320 ), but it may work out of the box with newer kernels.

Since you mentioned you used internal and external ports on your card, there are lots of cards where the external port is shared, i.e. you can use either the external or the internal connector but not both at the same time.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Thu Dec 13, 2012 12:44 am

mdadm help again. I bought a cheap Marvell-based RAID III card from Newegg.

It worked from the start, and drive six of the array started rebuilding.

At some point, after about 20 hours, the new controller died and took three hard drives with it ... again...

I guess the old adage "you get what you pay for" applies here.

Anywho, I bought a SIL3124-based PCI RAID card, and it's up and running. All six drives read as clean, but three of them have smaller event counts. How do I rebuild this with minimal damage? Is it possible?

Code:
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6b6e4f45:5d52d5ea:c6ff4d3a:ebf8b515

    Update Time : Wed Dec 12 16:23:17 2012
       Checksum : ed4c7e51 - correct
         Events : 49889

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : dc35e656:81f9e617:e9a6eafe:00cf70d6

    Update Time : Wed Dec 12 16:23:17 2012
       Checksum : b14c6a20 - correct
         Events : 49889

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 59a13e07:c83416e1:4c96063b:6ca6bbb3

    Update Time : Wed Dec 12 16:23:17 2012
       Checksum : 443ad7b4 - correct
         Events : 49889

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA... ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 0445633e:5bf226e9:50f860b0:4784965b

    Update Time : Wed Dec 12 12:34:15 2012
       Checksum : a0b11323 - correct
         Events : 49876

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x2
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 4761467048 sectors
          State : clean
    Device UUID : b74d0c97:082efe8c:f85ebd68:2e5d8734

    Update Time : Wed Dec 12 12:34:15 2012
       Checksum : 4a4bffb9 - correct
         Events : 49876

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : bfc07787:5075d763:c70b65ac:687f7544
           Name : SparePC:soulstorage
  Creation Time : Thu Nov 29 09:03:33 2012
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 23441068032 (11177.57 GiB 12001.83 GB)
  Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 45527180:c644c76d:c1b4c0f2:d2f7ed4a

    Update Time : Wed Dec 12 12:34:15 2012
       Checksum : 9915d2c6 - correct
         Events : 49876

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAA ('A' == active, '.' == missing)

_________________
Some day there will only be free software.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Thu Dec 13, 2012 5:31 am

Update: I just typed 'mdadm -A --force /dev/md127' and it assembled and began rebuilding disk 6 again.

Then I ran fsck.ext4 on /dev/md127 and let it delete a few bad inodes.

Unfortunately, as I suspected, the PCI card is too slow and the rebuild of drive 6 hasn't moved a percent in several hours.

I'm still without a solution.
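
For anyone in the same boat, the rebuild progress and md's resync throttles can be checked like this (a sketch; the value below is only an example):
Code:
cat /proc/mdstat                                   # resync progress and estimated finish
cat /proc/sys/dev/raid/speed_limit_min             # current resync floor, in KB/s
echo 50000 > /proc/sys/dev/raid/speed_limit_min    # example: raise the floor if the bus allows it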
_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Thu Dec 13, 2012 9:41 pm

RayDude,

If all your RAID6 was doing was rebuilding, i.e. you were not writing anything to it, you might be lucky.
If there were files open for writing when your three drives went offline, you can expect those files to be in a mess.
If directory writes were in progress, the contents of those directories may be lost.

This wiki article works. The advantage of --create in degraded mode over --force is that you can try all the combinations to see whether one degraded combination is better than another.

At first sight there is no reason why it should be, but the drives are not all written concurrently, so a failure such as yours will leave the drives in slightly different states. Degraded mode for you means four out of six drives. You need to think carefully before you mount the filesystem, even read-only, as journal replay and the resulting writes will still happen. I think you can avoid the journal replay if you want, but I don't know how.
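
Edit: for ext3/ext4 the noload mount option is meant to do exactly that, i.e. mount without replaying the journal (a sketch, mount point assumed):
Code:
mount -o ro,noload /dev/md127 /mnt/recovery    # read-only, journal is not replayed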
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Thu Dec 13, 2012 10:17 pm

--create is also dangerous, though: if you get the command wrong, that's bye-bye to your data. But your top priority is the hardware issue. It just won't do to have three drives vanish in one go. If you have this many controllers failing, maybe you should check for short circuits in your PSU/case, or any other cables for that matter. I've used cheap controllers myself and never had a failure, so your problems seem fishy to me somehow.
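
If you do end up experimenting with --create or with different --assemble combinations, one way to keep the real disks untouched is to work on copy-on-write overlays via device-mapper. This is only a rough sketch of the idea; the device list, file sizes and names are assumptions, not something tested against your array:
Code:
# one throwaway overlay per member; all writes land in the sparse files, not the disks
for d in /dev/sd[bcdefg]1; do
    n=$(basename "$d")
    truncate -s 4G "/tmp/ov-$n"                        # sparse copy-on-write file
    loop=$(losetup -f --show "/tmp/ov-$n")
    sz=$(blockdev --getsz "$d")                        # size in 512-byte sectors
    echo "0 $sz snapshot $d $loop N 8" | dmsetup create "ov-$n"
done
# then point mdadm at /dev/mapper/ov-* instead of the real partitions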

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Fri Dec 14, 2012 3:51 am

Thanks guys.

There were only three bad inodes after the fsck, so the array seems fine.

With the PCI card, it's still rebuilding the final drive. It is at 94% and will hopefully be done before the morning...

I have ordered an LSI Logic RAID card from Amazon; it will arrive tomorrow. Hopefully it will be reliable in my motherboard. Since I'll likely be able to boot my SSD off it, I'll connect four drives to the motherboard; that way, if I have problems, the most I will lose is two drives.

This sure has been an experience though, wow.
_________________
Some day there will only be free software.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sat Dec 15, 2012 7:19 am

Just Venting...

I bought a SAS controller from LSI.

It didn't appear to support 3TB drives, so I upgraded the firmware from a Kubuntu boot USB drive.

It still doesn't appear to support 3TB drives.

What do I have to do to get a working four-port SATA card?

*exasperated*
_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sat Dec 15, 2012 1:15 pm

RayDude,

How do you mean
RayDude wrote:
didn't appear to support 3TB drives
?

What happens when you connect a 3TB drive? The only difference is 48-bit LBA or not, and without 48-bit LBA you max out at 137GB.
A lot of bolt-on goodies claim a 2TB limit so that Windows users with MSDOS partition tables are not surprised when they find a 2TB limit, but it's not hardware related.

Try it, I will be surprised if it doesn't 'just work'.
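
A quick way to see what size the kernel actually gets from the card (a sketch, device name assumed):
Code:
blockdev --getsize64 /dev/sdb    # size in bytes as the kernel sees it; a 3TB drive is roughly 3.0e12
lsblk -b -o NAME,SIZE /dev/sdb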
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sun Dec 16, 2012 4:30 am

Thanks Neddy.

I already have working 3TB GPT partitions on the drives. When I connect them to the LSI card, they are reported as 2048 GB, and the GPT partitions are not present when the machine is booted.

I've found some forums complaining about this problem with LSI, but I haven't found any solutions. I've emailed tech support.

I should point out that I removed the BD-ROM from the PCI RAID card and the RAID performance doubled to 135 MB/second... Pretty interesting.

_________________
Some day there will only be free software.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54028
Location: 56N 3W

Posted: Sun Dec 16, 2012 2:22 pm

RayDude,

GPT uses two copies of the partition table, one at the start of the drive and one at the end. It gets really upset if the two don't match.
With a 2048 GB limit, the copy at the end of the drive can't be read.

dmesg probably has errors about attempting to read beyond the end of the device.
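
gdisk can confirm whether the backup header is reachable (a sketch, device name assumed):
Code:
gdisk -l /dev/sdb          # warns if the backup GPT header is missing or unreadable
sgdisk --verify /dev/sdb   # same check in scriptable form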
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Sun Dec 16, 2012 4:00 pm

Thanks, I'm sure that's why the partition table seems empty when I attempt to look at it. Now it's up to LSI.

I guess I'll have to break down and buy another RAID card. The reviews on Promise and HighPoint look bad. I might just buy another Silicon Image board, but make it PCI Express instead of PCI... I don't want to, because the one I used for several years died...
_________________
Some day there will only be free software.

Mad Merlin
Veteran


Joined: 09 May 2005
Posts: 1155

Posted: Wed Dec 19, 2012 12:35 am

You didn't mention which model of LSI card you ended up with. However, LSI has an article on the issue here: http://webcache.googleusercontent.com/search?q=cache:6MN0yCPeVn0J:http://kb.lsi.com/Print16399.aspx%2Blsi+2TB&oe=UTF-8&hl=en&ct=clnk

It looks like the 6Gbit/s cards (such as the 9211 I suggested above) support >= 3TB drives while the older ones do not (unless you have SAS drives, which you likely don't). However, I've personally only used the 9211 with SSDs (which are, sadly, still smaller than 2TB).
_________________
Game! - Where the stick is mightier than the sword!

RayDude
Advocate


Joined: 29 May 2004
Posts: 2050
Location: San Jose, CA

Posted: Tue Jan 01, 2013 8:45 pm

Final update: my older card would not recognize 3TB drives even with the IT firmware. I sent it back and ordered a Marvell SAS card. It works okay, but the performance is kind of random. For example, the SSD plugged into the MVSAS card is about half as fast as it was on the motherboard. But it wasn't always this slow... I'm having trouble figuring out how to get full performance out of it.
_________________
Some day there will only be free software.