Gentoo Forums
Dying hard disk, or is it?

 
Gentoo Forums Forum Index > Kernel & Hardware
Ian Goldby
Guru


Joined: 18 May 2002
Posts: 539
Location: (Inactive member)

Posted: Sun Jul 27, 2003 3:16 pm    Post subject: Dying hard disk, or is it?

A few days ago my hard disk made me nervous enough to finally do a proper backup :oops:. I was doing a big emerge and came back to find that the hard disk had stopped spinning, the system was completely frozen, and the light on the front of the case was blinking out Morse code.

I rebooted, but after a short while I heard the hard disk motor straining, and then it stalled, started up again, stalled, and then started clicking forlornly.

I turned off the power and left it for a while. Later that day, I tried making a tar archive of important stuff to write out to a CD. About halfway through, the hard disk stalled again, so I turned everything off.

A few days later, I succeeded in making the backup, and today it's been running for some time without any problems.

Now, I would diagnose that as a definite hard disk problem, possibly heat-related as it's been quite warm the last week or so. BUT: A few months ago, my HP 9100 CD writer failed in a similar sort of way. It works fine for a while after turning everything on, but then starts clicking and refusing to accept any disks. I've never known one of these units to fail before.

I could just go out and buy a new hard disk and a new CD writer, but I wonder if the two apparent failures are related. Maybe the real problem is something on the motherboard, or maybe even the power supply going under-voltage or something. But I'm just guessing.

I'd like to know if anyone else has had problems similar to this, and if so, how you solved it. Would you go with the coincidence theory, or have you had experience of something else that can cause various disks to appear to fail?

Thanks
madmango
Guru


Joined: 15 Jul 2003
Posts: 507
Location: PA, USA

Posted: Sun Jul 27, 2003 4:09 pm

Hmm. I've had the same problem, but only on my *ancient* hard disks, the ones that came with my 486 (250 MB). Before you write it off as a coincidence, try testing the power supply: get a voltmeter and measure the sucka.
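
To judge a reading once you have one, compare it against the rail's nominal voltage. The sketch below is a hypothetical helper (`check_rail` is not an existing tool) and assumes the usual ATX tolerance of ±5% for the +5 V and +12 V rails; pass a different percentage as the third argument if your supply's spec says otherwise.

```shell
#!/bin/sh
# check_rail NOMINAL MEASURED [TOLERANCE_PERCENT]
# Hypothetical helper: prints OK or OUT_OF_SPEC for one measured rail.
# The default tolerance of 5 (percent) is an assumption based on the
# common ATX figure for the +5 V and +12 V rails.
check_rail() {
    awk -v nom="$1" -v meas="$2" -v tol="${3:-5}" 'BEGIN {
        tol += 0                                  # force numeric comparison
        dev = (meas - nom) / nom * 100            # deviation from nominal, in percent
        status = (dev < -tol || dev > tol) ? "OUT_OF_SPEC" : "OK"
        printf "%s: %.2f V measured, %.2f V nominal (%.1f%%)\n", status, meas, nom, dev
    }'
}

# Examples:
#   check_rail 12 11.2   -> OUT_OF_SPEC: 11.20 V measured, 12.00 V nominal (-6.7%)
#   check_rail 5 4.98    -> OK: 4.98 V measured, 5.00 V nominal (-0.4%)
```

A reading taken while the disk is spinning up or under heavy load is probably more telling than one taken at idle.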

Other things to try: Put the affected disks in another box, see if they have similar problems. Try a new drive controller board.

If all else fails, it's your disks. Good thing you backed up!
_________________
word.
Ian Goldby
Guru


Joined: 18 May 2002
Posts: 539
Location: (Inactive member)

Posted: Sun Jul 27, 2003 8:43 pm

Thanks for the suggestions. I can measure the power supply voltage, although the problems tend not to happen on cue, so I'll need some luck to get the DMM on it at a critical moment. Unfortunately, I don't have another box to swap things into.
dma
Guru


Joined: 31 Jan 2003
Posts: 437
Location: Charlotte, NC, USA

Posted: Sun Jul 27, 2003 11:29 pm

If it is a fairly new hard drive (one that supports SMART), you can try these tools for its onboard self-diagnostics:

Code:
emerge ide-smart smartmontools


Here's some output from smartmontools (sorry, forum users, if this is overkill):

Code:
root@laureate:~# smartctl -a /dev/hdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1200JB-00CRA1
Serial Number:    WD-WMA8C3706891
Firmware Version: 17.07W17
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jul 27 19:27:22 2003 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Off-line data collection status: (0x02) Offline data collection activity was
                                        completed without error.
                                        Auto Off-line Data Collection: Disabled.
Self-test execution status:      (  40) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete off-line
data collection:                 (4680) seconds.
Offline data collection
capabilities:                    (0x3b) SMART execute Offline immediate.
                                        Automatic timer ON/OFF support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  87) minutes.
Extended self-test routine
recommended polling time:        (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   095   093   021    Pre-fail  Always       -       6025
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       35
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       7
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3224
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log, version number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended captive    Interrupted (host reset)      80%       483         -
# 2  Extended captive    Interrupted (host reset)      80%       483         -
# 3  Short off-line      Aborted by host               20%       483         -
# 4  Short off-line      Aborted by host               80%       483         -
# 5  Short off-line      Aborted by host               10%       483         -


And ide-smart:
Code:
root@laureate:~# ide-smart /dev/hdb
Id=  1  Status=11  {Prefailure  Online }  Value=200  Threshold= 51  Passed
Id=  3  Status= 7  {Prefailure  Online }  Value= 95  Threshold= 21  Passed
Id=  4  Status=50  {Advisory    Online }  Value=100  Threshold= 40  Passed
Id=  5  Status=51  {Prefailure  Online }  Value=199  Threshold=140  Passed
Id=  7  Status=11  {Prefailure  Online }  Value=200  Threshold= 51  Passed
Id=  9  Status=50  {Advisory    Online }  Value= 96  Threshold=  0  Passed
Id= 10  Status=19  {Prefailure  Online }  Value=100  Threshold= 51  Passed
Id= 11  Status=19  {Prefailure  Online }  Value=100  Threshold= 51  Passed
Id= 12  Status=50  {Advisory    Online }  Value=100  Threshold=  0  Passed
Id=196  Status=50  {Advisory    Online }  Value=196  Threshold=  0  Passed
Id=197  Status=18  {Advisory    Online }  Value=200  Threshold=  0  Passed
Id=198  Status=18  {Advisory    Online }  Value=200  Threshold=  0  Passed
Id=199  Status=10  {Advisory    Online }  Value=200  Threshold=  0  Passed
Id=200  Status= 9  {Prefailure  OffLine}  Value=200  Threshold= 51  Passed
OffLineStatus=2 {Completed}, AutoOffLine=No, OffLineTimeout=78 minutes
OffLineCapability=59 {Immediate Auto SuspendOnCmd}
SmartRevision=16, CheckSum=233, SmartCapability=3 {SaveOnStandBy AutoSave}


Looks like my hard drive isn't complaining.
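
Reading a long attribute table by eye is error-prone, so here is a minimal sketch that summarizes it. It relies on how SMART defines an attribute failure (normalized VALUE at or below a nonzero THRESH) and additionally flags nonzero raw counts on the reallocation-related attributes. `smart_summary` is a hypothetical helper, not part of smartmontools, and it assumes the 10-column table layout shown in the `smartctl -a` output above.

```shell
#!/bin/sh
# smart_summary: hypothetical helper (not part of smartmontools).
# Reads `smartctl -a` output on stdin and flags:
#   FAILING - attributes whose normalized VALUE is at or below a nonzero THRESH
#   WATCH   - reallocation-related attributes (5, 196, 197, 198) with raw count > 0
# Assumes the table layout:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_summary() {
    awk '
    # Attribute rows start with a numeric ID and have at least 10 fields;
    # everything else (info section, self-test log) is skipped.
    $1 ~ /^[0-9]+$/ && NF >= 10 {
        id = $1 + 0; name = $2; value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
        if (thresh > 0 && value <= thresh) {
            printf "FAILING %3d %s (value %d <= threshold %d)\n", id, name, value, thresh
            flagged = 1
        } else if ((id == 5 || id == 196 || id == 197 || id == 198) && raw > 0) {
            printf "WATCH   %3d %s (raw count %s)\n", id, name, $10
            flagged = 1
        }
    }
    END { if (!flagged) print "No failing or reallocation-related attributes flagged" }'
}

# Usage: smartctl -a /dev/hdb | smart_summary
```

Run against the smartctl output above, this would print WATCH lines for Reallocated_Sector_Ct (raw 7) and Reallocated_Event_Count (raw 4) and nothing FAILING, since every attribute is still above its threshold, which matches the PASSED verdict. A small reallocated count on its own is not necessarily alarming, but one that keeps growing between runs is worth keeping an eye on.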
AgenT
Apprentice


Joined: 18 May 2003
Posts: 280

Posted: Mon Jul 28, 2003 11:06 pm

These two programs are really nice, thanks!
Ian Goldby
Guru


Joined: 18 May 2002
Posts: 539
Location: (Inactive member)

Posted: Tue Jul 29, 2003 8:08 pm

Thanks. I tried them, and as far as I could understand it, there were no errors or warnings from my disk.
pmjdebruijn
Guru


Joined: 24 Jul 2003
Posts: 506
Location: Sittard, The Netherlands

Posted: Wed Jul 30, 2003 11:19 am

Do you have the correct cabling, i.e. an 80-conductor cable rated for ATA100 or ATA133?
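
On a SMART-capable drive, attribute 199 (UDMA_CRC_Error_Count) counts checksum errors on transfers over the IDE link, and a nonzero, growing count usually points at the cable or connectors rather than the disk itself. A minimal sketch, assuming `smartctl -A` output in the table layout posted earlier in this thread; `crc_check` is a hypothetical helper, not part of smartmontools.

```shell
#!/bin/sh
# crc_check: hypothetical helper (not part of smartmontools).
# Reads `smartctl -A` output on stdin and reports attribute 199
# (UDMA_CRC_Error_Count); RAW_VALUE is assumed to be the 10th field.
crc_check() {
    awk '
    $1 == "199" && NF >= 10 {
        found = 1
        if ($10 + 0 > 0) printf "CHECK CABLE: %s UDMA CRC errors logged\n", $10
        else             print "OK: no UDMA CRC errors logged"
    }
    END { if (!found) print "UNKNOWN: attribute 199 not reported" }'
}

# Usage: smartctl -A /dev/hdb | crc_check
```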

Also, a friend of mine cools his hard disks with a silent 12x12 cm fan, and the temperature drop is just amazing: without the fan they run at about 40 °C, and with the fan the temperature drops to just below 20 °C.

So you might consider installing a fan, at least if temperature is your problem!
Robelix
l33t


Joined: 21 Jul 2002
Posts: 760
Location: in a World created by a Flying Spaghetti Monster

Posted: Wed Jul 30, 2003 11:08 pm

40 °C is usually no problem for a HD, but if it gets to 50 °C or more it's going to be dangerous.
I use a cheap standard 80 mm fan connected to 5 V. That's enough to cool the disks down by about 10 °C, and you can't hear it.
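
On drives that report SMART attribute 194 (Temperature_Celsius), that rule of thumb can be checked from `smartctl -A` output. A minimal sketch: `disk_temp_check` is a hypothetical helper, it assumes the raw value is printed as plain degrees Celsius (true for many drives, not all), and the WARM band for 41 to 49 °C is an assumption filling the gap between the two figures above. Note that the WD drive in the smartctl output earlier in this thread does not report attribute 194 at all.

```shell
#!/bin/sh
# disk_temp_check: hypothetical helper (not part of smartmontools).
# Reads `smartctl -A` output on stdin, finds attribute 194 and classifies it:
#   <= 40 C -> OK, 41-49 C -> WARM (assumed in-between band), >= 50 C -> DANGEROUS
disk_temp_check() {
    awk '
    $1 == "194" && NF >= 10 {
        t = $10 + 0                      # raw value assumed to be degrees Celsius
        if (t >= 50)      verdict = "DANGEROUS"
        else if (t > 40)  verdict = "WARM"
        else              verdict = "OK"
        printf "%s: %d C\n", verdict, t
        found = 1
    }
    END { if (!found) print "UNKNOWN: attribute 194 not reported" }'
}

# Usage: smartctl -A /dev/hda | disk_temp_check
```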

Robelix
_________________
mysql> SELECT question FROM life, universe, everything WHERE answer=42;
Empty set (2079460347 sec)
All times are GMT