Gentoo Forums
HELP: 2of6 devices faulty in Softraid-5

Master One
l33t

Joined: 25 Aug 2003
Posts: 754
Location: Austria

Posted: Sat Nov 19, 2005 10:58 am    Post subject: HELP: 2of6 devices faulty in Softraid-5

I really thought my storage server was fairly safe: a softraid-5 built from six Maxtor MaxLine II Plus 250GB IDE drives (these are sold as ultra-reliable enterprise-class disk drives designed for low-I/O enterprise applications such as midline, nearline, NAS and other secondary storage solutions). But yesterday at noon something terrible happened that I could not have imagined before: it looks like two drives are now faulty, with different issues. Luckily these drives have a 5-year warranty, and I have already requested two advance replacement drives, which I expect to arrive in the coming week. So this is not about the drives themselves, but about the content of my 1.25TB raid (it does not contain anything really important, but it would be quite a loss if it could not be recovered).

This softraid-5 was built as follows:
Code:
Device0 /dev/hda1
Device1 /dev/hdc1 -> failed on 18th November
Device2 /dev/hde1
Device3 /dev/hdg1
Device4 /dev/hdi1 -> failed on 25th October
Device5 /dev/hdk1
Filesystem is reiserfs
Assembled as /dev/md7

After some analysis I found out that the two drives did not die at the same time: it looks like /dev/hdi already failed on 25th October. (That server sits unattended in an office; software is usually updated once a week, but because network access to the samba share on that raid device worked without any flaws until yesterday, the status of the raid was only checked occasionally and, as it turns out, not within the last month.) I didn't know that a raid-5 could keep operating with one device down and no spare drive; I thought a defective drive would have to be replaced before the raid could be started again.

So this is the part of the syslog, when the first drive (/dev/hdi) died:
Code:
Oct 25 03:12:45 storemaster hdi: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 25 03:12:45 storemaster hdi: dma_intr: error=0x40 { UncorrectableError }, LBAsect=273969472, high=16, low=5534016, sector=273969471
Oct 25 03:12:45 storemaster end_request: I/O error, dev hdi, sector 273969471
Oct 25 03:12:45 storemaster raid5: Disk failure on hdi1, disabling device. Operation continuing on 5 devices
Oct 25 03:12:45 storemaster disk 4, o:0, dev:hdi1

And now what happened yesterday:
Code:
Nov 18 12:17:05 storemaster hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 18 12:17:05 storemaster hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=348709327, high=20, low=13165007, sector=348709327
Nov 18 12:17:05 storemaster end_request: I/O error, dev hdc, sector 348709327
Nov 18 12:17:05 storemaster raid5: Disk failure on hdc1, disabling device. Operation continuing on 4 devices
Nov 18 12:17:07 storemaster hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 18 12:17:07 storemaster hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=348709335, high=20, low=13165015, sector=348709335
Nov 18 12:17:07 storemaster end_request: I/O error, dev hdc, sector 348709335
Nov 18 12:17:08 storemaster hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 18 12:17:08 storemaster hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=348709351, high=20, low=13165031, sector=348709343
Nov 18 12:17:08 storemaster end_request: I/O error, dev hdc, sector 348709343
Nov 18 12:17:10 storemaster hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 18 12:17:10 storemaster hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=348709351, high=20, low=13165031, sector=348709351
Nov 18 12:17:10 storemaster end_request: I/O error, dev hdc, sector 348709351
Nov 18 12:17:11 storemaster hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 18 12:17:11 storemaster hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=348709359, high=20, low=13165039, sector=348709359
Nov 18 12:17:11 storemaster end_request: I/O error, dev hdc, sector 348709359
Nov 18 12:17:12 storemaster hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 18 12:17:12 storemaster hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=348709370, high=20, low=13165050, sector=348709367
Nov 18 12:17:12 storemaster end_request: I/O error, dev hdc, sector 348709367
Nov 18 12:17:14 storemaster hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 18 12:17:14 storemaster hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=348709379, high=20, low=13165059, sector=348709375
Nov 18 12:17:14 storemaster end_request: I/O error, dev hdc, sector 348709375
Nov 18 12:17:14 storemaster disk 1, o:0, dev:hdc1

I already checked both drives with the Maxtor PowerMax disk diagnostic tool; it reported "failed" for both, with different diagnostic codes (#DEA6991 for hdi and #DE97987D for hdc).

The question now is what my chances are of recovering the data once the two replacement drives arrive. Neither drive is completely dead: I can run a check with hdparm on both, and I can also read the partition table with fdisk.

From what I can see in the syslog on a reboot, it looks really bad for /dev/hdi:
Code:
Nov 18 20:11:49 storemaster hdi: max request size: 1024KiB
Nov 18 20:11:49 storemaster hdi: 490234752 sectors (251000 MB) w/7936KiB Cache, CHS=30515/255/63, UDMA(133)
Nov 18 20:11:49 storemaster hdi: cache flushes supported
Nov 18 20:11:49 storemaster hdi: hdi1
Nov 18 20:11:49 storemaster hdi: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 18 20:11:49 storemaster hdi: dma_intr: error=0x40 { UncorrectableError }, LBAsect=490223295, high=29, low=3684031, sector=490223295
Nov 18 20:11:49 storemaster end_request: I/O error, dev hdi, sector 490223295
Nov 18 20:11:49 storemaster md: disabled device hdi1, could not read superblock.
Nov 18 20:11:49 storemaster md: hdi1 has invalid sb, not importing!

The situation for /dev/hdc is different:
Code:
Nov 18 20:31:58 storemaster hdc: max request size: 1024KiB
Nov 18 20:31:58 storemaster hdc: 490234752 sectors (251000 MB) w/7936KiB Cache, CHS=30515/255/63, UDMA(133)
Nov 18 20:31:58 storemaster hdc: cache flushes supported
Nov 18 20:31:58 storemaster hdc: hdc1
Nov 18 20:31:58 storemaster md: hdc1 has different UUID to sdb9
Nov 18 20:31:58 storemaster md: hdc1 has different UUID to sdb8
Nov 18 20:31:58 storemaster md: hdc1 has different UUID to sdb7
Nov 18 20:31:58 storemaster md: hdc1 has different UUID to sdb6
Nov 18 20:31:58 storemaster md: hdc1 has different UUID to sdb5
Nov 18 20:31:58 storemaster md: hdc1 has different UUID to sdb3
Nov 18 20:31:58 storemaster md: hdc1 has different UUID to sdb1
Nov 18 20:31:58 storemaster md:  adding hdc1 ...
Nov 18 20:31:58 storemaster md: bind<hdc1>
Nov 18 20:31:58 storemaster md: running: <hdk1><hdg1><hde1><hdc1><hda1>
Nov 18 20:31:58 storemaster md: kicking non-fresh hdc1 from array!
Nov 18 20:31:58 storemaster md: unbind<hdc1>
Nov 18 20:31:58 storemaster md: export_rdev(hdc1)

Of course /dev/md7 cannot be assembled or run any more in this state, so there is nothing that can be analysed with mdadm ATM.

I hope there are some raid specialists around here who can answer the following questions:

1. Can the superblock on /dev/hdi be restored in any way?
2. What would be the proper procedure once the two replacement drives arrive?
3. Would it make sense to first copy the content of both drives sector-by-sector to the new drives, and could this be done?
4. Is there anything else I could try or check in advance (as long as the replacement drives are not here)?
_________________
Las torturas mentales de la CIA

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 45406
Location: 56N 3W

Posted: Sat Nov 19, 2005 11:11 am

Master One,

Your chances of getting all your data back are close to zero.
Find dd_rhelp (it's not in portage) and a machine you can install the new drive and a faulty drive in.

Run dd_rhelp to copy the faulty drive to the replacement. Do both drives.
dd_rhelp does a binary search for good sectors on a damaged drive and recovers what it can.
It will gradually do more and more retries, so it never completes unless you are very lucky.

Put the recovered drives in your raid array and see how lucky you are.
You will need to devise some sort of data integrity test.

Oh, and set up a cron job to send you a status email once a day, or just when you get a drive failure.
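
A daily status check of the kind suggested here could look something like the sketch below. It assumes the usual /proc/mdstat layout, where each array's status line ends in a member map such as [UUUUUU] and an underscore marks a missing member. The function name md_degraded and the mail recipient in the comment are made up for illustration.

```shell
#!/bin/sh
# Sketch: list md arrays that have a missing member.
# Reads /proc/mdstat-style text on stdin; md_degraded is a hypothetical helper name.
md_degraded() {
    awk '
        # Remember which array the following lines describe.
        /^md[0-9]+ :/ { name = $1 }
        # A member map such as [UUUU_U] contains "_" when a member is missing.
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}

# Possible cron usage (the recipient is a placeholder):
#   out=$(md_degraded < /proc/mdstat)
#   [ -n "$out" ] && echo "$out" | mail -s "RAID degraded" root
```

Run once a day from cron, this stays silent while every map is all U and sends a mail as soon as one array is running degraded.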
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Master One
Posted: Sat Nov 19, 2005 1:59 pm

Thanks for the feedback, NeddySeagoon. I can live with partial data loss, but hopefully not everything is lost. I already took a look at dd_rhelp; I guess this is the only way to go. I can still use that machine to do the drive copying, because the OS is installed on a separate softraid-1 with two U160 SCSI disks. The great mystery now is what exactly will happen after I have installed the two replacement drives holding the copied content of the faulty ones. Will it be possible to assemble and run the array, and if not, what can be done in that case? If I am lucky, what sort of data integrity test would be available (or is it just fsck.reiserfs)?

The other thing, that's actually on my mind:

When I try to assemble the faulty raid in its current state, it fails because of the missing superblock on /dev/hdi (the superblock on /dev/hdc is intact, so hdc most likely only suffers from some bad sectors).

Shouldn't I just replace /dev/hdi with the new replacement drive, and then try to assemble and run the raid?
Or would that attempt cause more harm than benefit?
Is there any chance that it could work out that way? I mean that /dev/hdi gets reconstructed, so that I can then swap the faulty /dev/hdc for the new replacement drive, have a final reconstruction, and everything is done?

Oh boy, what a terrible situation, having to sit here with the uncertainty of whether this will work out at all, and having to wait until the replacement drives arrive next week... :cry:

If I get it going again, I will certainly use the mdadm reporting feature, so that such a mess will not happen again (or should I use smartmon for all IDE drives in addition to the mdadm reporting feature in daemon mode)?

NeddySeagoon
Posted: Sat Nov 19, 2005 5:51 pm

Master One,

You should be able to bring up your raid with one drive missing anyway. RAID is about drive redundancy.
Your situation is complicated by the differing failure dates.

When the first drive failed it was no longer used, and the system continued in degraded RAID operation.
You probably cannot use that drive (or a copy of it) to bring up the raid in degraded mode, as the drive contents are not consistent with the rest of the raid set. Neither can the raid set be reconstructed from the data on the good drives; there is not enough data. You need to recover the data from the most recently failed drive, or as much of it as you can.

I don't know how you check your data integrity. Damage will be evident at two levels.
Where metadata is lost (sectors holding directory information), the files that these directory entries pointed to will be difficult to recover. You need to know how reiserfs allocates disk space.

Where data sectors are lost, the data is gone.

In the case of unused sectors, it doesn't make any difference - there was nothing there to lose.
It's down to luck now.
Do not operate the drive unless you really have to; if the platter bearings have failed it will get worse quite quickly.

Master One
Posted: Sat Nov 19, 2005 10:02 pm

Of course! I was so caught up in my first assumption that both drives died at the same time yesterday that I completely overlooked the fact that the drive that died on 25th October (/dev/hdi) does not matter at all any more.

So it is all about saving as much content as possible from yesterday's failing drive /dev/hdc, and this is what I am planning to do once the two replacement drives have arrived:

1. Copy the content of /dev/hdc to the first new drive using dd_rhelp.
2. Simply replace /dev/hdi with the second new drive, and partition it identically to the other drives (it's only one large primary type fd partition).
3. Assemble & run the raid.

The only uncertainty at this point is whether the raid can be assembled properly. With the defective /dev/hdi, mdadm complained that the raid cannot be assembled due to the missing superblock on /dev/hdi. But the new drive will also have no superblock (which is written when the raid is created, right?). So unless I am missing something here, mdadm will also complain about the missing superblock on the new drive, or will it somehow recognize that it is a new drive and create the superblock itself (using the --add option in manage mode of mdadm)? I just want to be 100% sure that I am not doing anything wrong once the replacement drives are here.

Any further recommendations and ideas are highly appreciated.

NeddySeagoon
Posted: Sat Nov 19, 2005 10:11 pm

Master One,

You need to bring the raid up in degraded mode, with the partitions from the new drive missing, then raidhotadd them, so the redundant data is recreated.
You only fdisk the new drive. raidhotadd can take a long time on a large partition, but you can use the raid while it rebuilds like this.

With only one drive missing, the raid will form if its parts appear to be correct.
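
raidhotadd comes from the older raidtools package; with mdadm the same steps would look roughly like the sketch below. The device names follow the layout in the first post, on the assumption that the copy of hdc's content sits in hdc's old place and the blank replacement has been partitioned as hdi1. The run wrapper and rebuild_plan are made-up helpers: they only print the commands unless DRY_RUN=0 is set, since the real commands need root and the actual hardware.

```shell
#!/bin/sh
# Sketch of the degraded start plus hot-add sequence using mdadm.
# Prints the commands by default (DRY_RUN=1); run and rebuild_plan are hypothetical helpers.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

rebuild_plan() {
    # Start the array degraded from the five members that hold current data.
    run mdadm --assemble --run /dev/md7 /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdk1
    # Add the freshly partitioned replacement; md then rebuilds onto it.
    run mdadm /dev/md7 --add /dev/hdi1
    # Check the rebuild progress.
    run cat /proc/mdstat
}
```

Calling rebuild_plan with the default settings just lists the three commands for review. If md rejects a member as out of date, mdadm also offers --force for assembly; whether that is safe depends on how stale that member really is.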

Master One
Posted: Sun Nov 20, 2005 11:49 am

Thanks, NeddySeagoon, you are a great help. On my various machines I have one hardware raid-5, two software raid-1 and one software raid-5, but I never ran into any trouble until now, so I am not that confident in dealing with the current situation. I read the Linux Softraid FAQ, but the following question is still not properly answered:

Does the new replacement drive (for my defective /dev/hdi) have to be partitioned (and set to type 'fd' for linux autoraid) before it gets added to the raid?

The FAQ only says to swap the defective drive and raidhotadd it. You mention fdisking the new drive. I am confused now.

NeddySeagoon
Posted: Sun Nov 20, 2005 1:00 pm

Master One,

Linux software raid does not raid whole drives, it makes raid systems from partitions.
This allows different raid levels to be hosted on the same drives. For example, my /boot is raid 1 but all my other partitions are raid 0. They are all on the same pair of drives.

When you raidhotadd, you will raidhotadd a partition, not a drive. The partition must be there for raidhotadd to find it, so you need to create it before you can tell raidhotadd about it.

Silly thought for the day.
You could make a raid5 set from 3 or more partitions on the same drive. It may not be useful, other than for testing but the kernel won't mind.

j-m
Retired Dev

Joined: 31 Oct 2004
Posts: 975

Posted: Sun Nov 20, 2005 1:22 pm

NeddySeagoon wrote:
Master One,

You need to bring the raid up in degraded mode, with the partitions from the new drive missing, then raidhotadd them, so the redundant data is recreated.


It's already been running in degraded mode since the first drive failed.

NeddySeagoon wrote:

You only fdisk the new drive. raidhotadd can take a long time on a large partition, but you can use the raid while it rebuilds like this.

With only one drive missing, the raid will form if its parts appear to be correct.


No chance this would work w/ two drives failed.

Master One
Posted: Sun Nov 20, 2005 1:34 pm

Ok, understood now. Silly me, of course it is not possible to add a whole drive, only a partition (I should have had a clear mind before writing, but to me it meant the whole drive, because I only have one large partition on each of those raid5 drives).

j-m, only the first drive (with the missing superblock) is really gone; the second only seems to have some bad sectors, so I am counting on the chance that I can save it by using dd_rhelp to copy its content over to the second new replacement drive.

I only hope that this will work out somehow; I'll let you know the result. But for now I have to wait until the two replacement drives arrive (Maxtor is sending them from Hungary, hopefully they arrive within the next few days).

NeddySeagoon
Posted: Sun Nov 20, 2005 2:44 pm

j-m,

I'm well aware that Master One needs to recover the most recently failed drive to attempt to get the raid to start in degraded mode. If that works, and the data is worth saving, then the oldest failed drive can be replaced by raidhotadd.

I said earlier
Quote:

Neither can the raid set be reconstructed from the data on the good drives; there is not enough data. You need to recover the data from the most recently failed drive, or as much of it as you can.


Master One
Posted: Tue Nov 22, 2005 12:30 am

BTW I'm going to use the mdadm email monitoring function in the future for my softraid-1 & softraid-5, which should prevent such a mess from happening ever again.

But I have no clue how to remotely monitor the hardware raid-5 in my IBM eServer 226 with ServeRAID 6i U320 SCSI raid controller.

Anyone any hint?

Master One
Posted: Wed Nov 23, 2005 9:25 am

I have just been informed about a software called "HDD Regenerator", which is intended for the following purpose:
Quote:
Program features

Ability to detect physical bad sectors on a hard disk drive surface

Ability to repair physical bad sectors (magnetic errors) on a hard disk surface

The product ignores file system, scans disk at physical level. It can be used with FAT, NTFS or any other file system, and also with unformatted or unpartitioned disks.

Bootable regenerating diskette allows starting regenerating process under DOS automatically. The diskette can be created under Windows 95 / 98 / ME / NT4 / 2000 / XP / 2003

Main benefits

Hard disk drive is an integral part of every computer. It stores all your information. One of the most prevalent defects of hard drives is bad sectors on the disk surface. Bad sectors are a part of the disk surface which contains not readable, but frequently necessary information. As a result of bad sectors you may have difficulties to read and copy data from your disk, your operating system becomes unstable and finally your computer may unable to boot altogether. When a hard drive is damaged with bad sectors, the disk not only becomes unfit for use, but also you risk losing information stored on it. The HDD Regenerator can repair damaged hard disks without affecting or changing existing data. As a result, previously unreadable and inaccessible information is restored.

How it works

Almost 60% of all hard drives damaged with bad sectors have an incorrectly magnetized disk surface. We have developed an algorithm which is used to repair damaged disk surfaces. This technology is hardware independent, it supports many types of hard drives and repairs damage that even low-level disk formatting cannot repair. As a result, previously unreadable information will be restored. Because of the way the repair is made, the existing information on the disk drive will not be affected!

Important notes

Since the program does not change the logical structure of a hard drive, the file system may still show some sectors marked earlier as 'bad', and other disk utilities such as Scandisk will detect logical bad sectors even though the disk has been successfully regenerated and is no longer damaged by physical bad sectors. If you want to remove these marks, repartition the hard disk drive.

Would that software provide a better chance to save my raiddrive with bad sectors (/dev/hdc), than just trying to copy the content using dd_rhelp?

Once again the objective:

From the 6 disks in that softraid-5, one is gone (/dev/hdi -> no superblock found), and a second one (/dev/hdc) is marked as "non-fresh" due to bad sectors.

Before the replacement drive for /dev/hdi can be hot-added to the raid-5, the raid will have to be started in degraded mode, which is currently not possible because /dev/hdc is marked as "non-fresh".

So my concern now is that I have to somehow regenerate /dev/hdc, so that the raid can be assembled and run in degraded mode at all.

Would this be the best way of proceeding:

1. Copy content of defective /dev/hdc to a new drive using dd_rhelp
2. Run HDD REGENERATOR on defective /dev/hdc
3. Try to assemble and run the raid using the defective but regenerated /dev/hdc drive
4. If that fails, try to assemble and run the raid using the new /dev/hdc drive (which contains the copy of the content of the defective drive)

Or would it be better to swap steps 3 and 4?

Still missing info:

Is there a way to mark /dev/hdc as "fresh" again once it has been copied to the new drive, or been regenerated using that HDD REGENERATOR software, so that the raid can be assembled and started in degraded mode at all?

I mean, let's say /dev/hdc only suffers from a few bad sectors and could be successfully regenerated / copied to the new drive: I assume mdadm will still complain that it's marked as "non-fresh", and therefore the raid can't be assembled and started.

I hope, someone can provide the missing info. I am really desperate. :cry:
(The two replacement drives still did not arrive here, but online-tracking confirms, they have left Maxtor Hungary yesterday)
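
For reference, "non-fresh" in the md log means that the member's superblock event counter is behind the rest of the set, which is what happens to a drive that was kicked out while the others kept running. mdadm --examine prints that counter for a single member even when the array itself cannot be assembled. Below is a small sketch for pulling the counter out of that output; events_of is a hypothetical helper name, and the "Events :" line format is assumed from mdadm's listing for this kind of superblock.

```shell
#!/bin/sh
# Sketch: print the event counter from `mdadm --examine` output read on stdin.
# events_of is a hypothetical helper name.
events_of() {
    awk -F' : ' '/^ *Events :/ { print $2; exit }'
}

# Intended use (needs root and the real member devices):
#   for d in /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdk1; do
#       printf '%s ' "$d"; mdadm --examine "$d" | events_of
#   done
```

A member whose counter is lower than the others is the one md refuses. mdadm --assemble --force is documented as assembling even when some members appear out of date, so a slightly stale copy of /dev/hdc would not necessarily block a degraded start.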

NeddySeagoon
Posted: Fri Nov 25, 2005 3:31 pm

Master One,

"HDD Regenerator" appears to defy the laws of physics. First, the software running on your PC plays no part in reading the data from the disk platter. If the drive can't read it, it can't read it for any software. The advertising is misleading too. It's been about 10 years since you could do a low-level format on a hard drive. One read-only platter surface is given over to head servo and formatting information. There was a brief period when you could overwrite this information, but it rendered the drive unusable, since the head servo information was lost.

When something appears too good to be true, it usually is.

I've lost the plot with which drive you need to recover - it needs to be the one that failed most recently, regardless of what's wrong with it.
Most filesystems store several copies of the superblock so if the primary one is damaged, it can still be repaired.

Master One
Posted: Fri Nov 25, 2005 6:51 pm

You are probably right, and dd_rhelp may already be the best bet; I'll see. I just don't want to miss any available chance to save as much as possible of the raid5 content. It would be really sad if all the data were lost just because of a few bad sectors. I am not concerned about being unable to recover the content of those bad sectors, only about being able to assemble and run the raid at all.

BTW I am getting a little impatient by now, because my replacement drives should have arrived days ago, but something went wrong during transport (online tracking shows that the two parcels left Maxtor EMEA in Hungary on Tuesday, but for an unknown reason they went to the Netherlands instead of Austria). It's been a week now that my terabyte file server is down... :(

NeddySeagoon
Posted: Fri Nov 25, 2005 9:18 pm

Master One,

If you have had a spin motor bearing failure you may get data back if you operate the failed drive in unusual positions.
If it normally operates on edge try it on the opposite edge and any other stable position you like.
Sometimes making gravity move the spindle onto an unworn part of the bearing can make the drive read well enough to recover your data. It only needs to work once per faulty sector.

Do not move the drive while it's spinning unless you have already recovered everything you expect to get; the gyroscopic forces can do a lot of harm.

Master One
Posted: Sat Nov 26, 2005 12:36 pm

Good hint, may be worth a try. That drive was positioned normally, so I'll try the recovery with dd_rhelp with the drive positioned upside down (at least it should cause no further harm).

Master One
Posted: Fri Dec 02, 2005 10:06 am

Damn, still stuck here. The replacement drives arrived last Monday, but when I tried the latest version of dd_rhelp with dd_rescue, it always stopped after a while with the following error message:
Code:
dd_rhelp: error: sources add_chunk : invalid argument '0-(info)' (is not correctly formatted as number:number)

I already contacted Valentin (dd_rhelp) as well as Kurt (dd_rescue) by email with the details and the logfile. Only Valentin has replied so far, mentioning that the error indicates incoherent info in the log file, but I have not received any further reply since Tuesday... :(

Does anybody know of an alternative to dd_rhelp+dd_rescue?

The goal is just to copy the content of the defective drive to the new drive despite the defective sectors (so a simple "dd" is a no-go, because it would simply stop at the first unreadable sector).

NeddySeagoon
Posted: Fri Dec 02, 2005 5:07 pm

Master One,

You can use good old dd and tell it not to abort on errors; however, that's very slow, as it runs retries on bad blocks until it gives up and moves on. I don't know how to control the retries (maybe that's the kernel) without hacking the code.

dd_rhelp gives up on the first fail and tries somewhere else on the disk, so it recovers the maximum amount of data in the minimum time.

If you want to give dd a try, the man page tells you how. You can also use skip and seek to skip blocks in the input and output streams to manually get over bad areas.
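
As a rough illustration of the dd options meant here: conv=noerror keeps dd going after a read error, sync pads a short read with zeros so that offsets on the copy stay aligned with the source, and skip and seek count in blocks of bs. The helper name copy_range and the file names are placeholders, and the example in the comments works on ordinary files rather than a failing drive.

```shell
#!/bin/sh
# Sketch: copy COUNT blocks starting at block START from SRC to the same offset in DST.
# copy_range is a hypothetical helper; 512-byte blocks match the sector size in the logs.
copy_range() {
    src=$1 dst=$2 start=$3 count=$4
    # notrunc keeps data already written to DST by earlier calls.
    dd if="$src" of="$dst" bs=512 skip="$start" seek="$start" count="$count" \
       conv=noerror,sync,notrunc 2>/dev/null
}

# Example on ordinary files: copy blocks 0-3, step over block 4, copy blocks 5-7.
#   copy_range source.img copy.img 0 4
#   copy_range source.img copy.img 5 3
```

Stepping over a bad area by hand like this leaves a hole in the copy at exactly the skipped offsets, so everything after it still lands where the raid expects it.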

Master One
Posted: Tue Dec 06, 2005 8:45 pm

THIS IS UNBELIEVABLE!!! I REALLY DID IT!!! :D

Ok, it's not completely my victory, but Valentin's (the creator of dd_rhelp). The problem was that dd_rhelp walked straight through to about 160 GB, but then met the first set of bad sectors and jumped to a point beyond the capacity of the disk. Instead of reporting eof, this led to some other feedback, and therefore to an error. The problem could easily be fixed by deleting those sections from the dd_rhelp logfile and setting the eof manually. After that, dd_rhelp kept walking and jumping through that drive (there were two areas with a few bad sectors) and indeed finished (!) after a short while (I left that machine unattended for some hours, and when I came back, it was simply done).

I then reinstalled all drives, had to fiddle around a little to get the raid to assemble ("mdadm --assemble /dev/md7 -Rf && mdadm --manage /dev/md7 -a /dev/hdi" finally did the job), and after a reconstruction time of about 3 hours all 6 drives were up and running. It does not seem that any data loss has occurred at all (and that with 868 GB of that 1.25TB capacity in use).

What an amazing experience! I still cannot believe that I could indeed save that softraid-5 with 2 of 6 drives down!

Thanks a lot, NeddySeagoon, it would definitely not have worked out without your help (I wouldn't have come to dd_rhelp without your hint).

What a wonderful day... :D

muchtall
n00b

Joined: 07 Mar 2008
Posts: 1

Posted: Fri Mar 07, 2008 1:39 pm

Please excuse my resurrection of an ancient thread.

Where did you find the value for eof? I tried some of the values shown in fdisk -l without success. Perhaps I'm not editing the logfile properly?

Master One
Posted: Mon Mar 10, 2008 9:41 am

I am sorry, I don't remember how I got the value for eof in my case at that time, it's just too long ago. You should contact Valentin (the creator of dd_rhelp).
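
One place to look for the end of the device, offered as a guess rather than as what was done at the time: the kernel log in the first post reports 490234752 sectors for these drives. Converting that to other units is simple arithmetic, sketched below. Which unit the dd_rhelp logfile expects is not stated in this thread and would need checking against the log itself.

```shell
#!/bin/sh
# Sketch: convert a count of 512-byte sectors to bytes and to KiB.
# The sector count is the one the kernel printed for hdc and hdi in the first post.
# On a live system, `blockdev --getsize64 /dev/hdc` reports the size in bytes directly.
sectors=490234752
bytes=$((sectors * 512))
kib=$((sectors / 2))
echo "bytes=$bytes kib=$kib"
```

The byte figure comes out at 251000193024, which agrees with the "251000 MB" in the same log line.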

All times are GMT