Ian Goldby Guru
Joined: 18 May 2002 Posts: 539 Location: (Inactive member)
Posted: Sun Jul 27, 2003 3:16 pm Post subject: Dying hard disk, or is it?

A few days ago I got nervous enough about my hard disk that it prompted me to do a proper backup. I was doing a big emerge, and came back to find that the hard disk had stopped spinning, the system was completely frozen, and the light on the front of the case was doing Morse code.
I rebooted, but after a short while I heard the hard disk motor straining, and then it stalled, started up again, stalled, and then started clicking forlornly.
I turned off the power and left it for a while. Later that day, I tried making a tar archive of important stuff to write out to a CD. About half way through, the hard disk stalled again. So I turned everything off.
A few days later, I succeeded in making the backup, and today it's been running for some time without any problems.
Now, I would diagnose that as a definite hard disk problem, possibly heat-related, as it's been quite warm for the last week or so. BUT: a few months ago, my HP 9100 CD writer failed in a similar sort of way. It works fine for a while after turning everything on, but then starts clicking and refusing to accept any disks. I've never known one of these units to fail before.
I could just go out and buy a new hard disk and a new CD writer, but I wonder if the two apparent failures are related. Maybe the real problem is something on the motherboard, or maybe even the power supply going under-voltage or something. But I'm just guessing.
I'd like to know if anyone else has had problems similar to this, and if so, how you solved it. Would you go with the coincidence theory, or have you had experience of something else that can cause various disks to appear to fail?
Thanks
|
|
madmango Guru
Joined: 15 Jul 2003 Posts: 507 Location: PA, USA
Posted: Sun Jul 27, 2003 4:09 pm Post subject:

Hmm. I've had the same problem, but only on my *ancient* hard disks, the ones that came with my 486 (250 MB). Before you decide it's just a coincidence, try testing the power supply: get a voltmeter and measure the sucka.
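Measuring the power supply comes down to comparing each rail against its allowed window. Here is a minimal Python sketch. It assumes the ±5% regulation limits that ATX gives the +3.3 V, +5 V and +12 V rails. IDE hard disks use +5 V for their electronics and +12 V for the spindle motor. The readings in the example are made up, not measurements from this thread:

```python
from __future__ import annotations

# Nominal main ATX rails and the assumed +/-5% regulation limit on each.
NOMINAL_VOLTS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05  # 5 percent either way (assumption stated above)


def rail_window(rail: str) -> tuple[float, float]:
    """Return the (low, high) voltage limits for a rail name such as '+12V'."""
    nominal = NOMINAL_VOLTS[rail]
    return nominal * (1 - TOLERANCE), nominal * (1 + TOLERANCE)


def check_readings(readings: dict[str, float]) -> dict[str, bool]:
    """Map each measured rail to True if its reading is inside the window."""
    result = {}
    for rail, volts in readings.items():
        low, high = rail_window(rail)
        result[rail] = low <= volts <= high
    return result


if __name__ == "__main__":
    # Made-up multimeter readings, for illustration only.
    readings = {"+5V": 4.93, "+12V": 11.21}
    for rail, ok in check_readings(readings).items():
        low, high = rail_window(rail)
        status = "OK" if ok else "OUT OF SPEC"
        print(f"{rail}: {readings[rail]:.2f} V (allowed {low:.2f}-{high:.2f} V) {status}")
```

With these sample values, +5 V passes and +12 V is flagged, because its lower limit works out to 11.40 V. A sagging +12 V rail is the kind of fault that could starve a spindle motor.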
Other things to try: put the affected disks in another box and see if they have similar problems, or try a new drive controller board.
If all else fails, it's your disks. Good thing you backed up!
_________________
word.
|
|
Ian Goldby Guru
Joined: 18 May 2002 Posts: 539 Location: (Inactive member)
Posted: Sun Jul 27, 2003 8:43 pm Post subject:

Thanks for the suggestions. I can measure the power supply voltage, although the problems tend not to occur on cue, so I'll need some luck to get the DMM on it at the critical moment. I don't have another box to swap things with, which is a pity.
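The fault doesn't show up on cue. One workaround is to record readings over a long session and look for dips afterwards, for example with a logging meter or by noting values every few minutes during a big emerge. Here is a minimal Python sketch. The log format (one `seconds volts` pair per line), the 5% floor and the sample values are all hypothetical illustrations, not the output of any particular tool:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    seconds: float  # time since logging started
    volts: float    # measured voltage on one rail


def parse_log(text: str) -> list[Sample]:
    """Parse a hypothetical log with one 'seconds volts' pair per line.

    Blank lines and lines starting with '#' are ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        seconds, volts = line.split()
        samples.append(Sample(float(seconds), float(volts)))
    return samples


def find_dips(samples: list[Sample], nominal: float, tolerance: float = 0.05) -> list[Sample]:
    """Return the samples that fell below nominal * (1 - tolerance)."""
    floor = nominal * (1 - tolerance)
    return [s for s in samples if s.volts < floor]


if __name__ == "__main__":
    # Made-up +12 V readings, one per minute.
    log = """
    # seconds volts
    0    12.05
    60   11.98
    120  11.22
    180  12.01
    """
    samples = parse_log(log)
    lowest = min(samples, key=lambda s: s.volts)
    print(f"lowest: {lowest.volts:.2f} V at {lowest.seconds:.0f} s")
    for dip in find_dips(samples, nominal=12.0):
        print(f"dip: {dip.volts:.2f} V at {dip.seconds:.0f} s")
```

If the timestamps of the dips line up with the moments the disk stalls, that points at the supply rather than the drive.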
|
|
dma Guru
Joined: 31 Jan 2003 Posts: 437 Location: Charlotte, NC, USA
Posted: Sun Jul 27, 2003 11:29 pm Post subject:

If it's a fairly new hard drive, you can try these for its onboard self-diagnostics (SMART):
Code: | emerge ide-smart smartmontools |
Here's some output from smartmontools (sorry, forum users, if this is overkill):
Code: | root@laureate:~# smartctl -a /dev/hdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: WDC WD1200JB-00CRA1
Serial Number: WD-WMA8C3706891
Firmware Version: 17.07W17
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 5
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Jul 27 19:27:22 2003 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Off-line data collection status: (0x02) Offline data collection activity was
completed without error.
Auto Off-line Data Collection: Disabled.
Self-test execution status: ( 40) The self-test routine was interrupted
by the host with a hard or soft reset.
Total time to complete off-line
data collection: (4680) seconds.
Offline data collection
capabilities: (0x3b) SMART execute Offline immediate.
Automatic timer ON/OFF support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 87) minutes.
Extended self-test routine
recommended polling time: ( 5) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 095 093 021 Pre-fail Always - 6025
4 Start_Stop_Count 0x0032 100 100 040 Old_age Always - 35
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 7
7 Seek_Error_Rate 0x000b 200 200 051 Pre-fail Always - 0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3224
10 Spin_Retry_Count 0x0013 100 253 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0013 100 253 051 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 196 196 000 Old_age Always - 4
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0012 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0009 200 200 051 Pre-fail Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log, version number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended captive Interrupted (host reset) 80% 483 -
# 2 Extended captive Interrupted (host reset) 80% 483 -
# 3 Short off-line Aborted by host 20% 483 -
# 4 Short off-line Aborted by host 80% 483 -
# 5 Short off-line Aborted by host 10% 483 -
|
And ide-smart:
Code: | root@laureate:~# ide-smart /dev/hdb
Id= 1 Status=11 {Prefailure Online } Value=200 Threshold= 51 Passed
Id= 3 Status= 7 {Prefailure Online } Value= 95 Threshold= 21 Passed
Id= 4 Status=50 {Advisory Online } Value=100 Threshold= 40 Passed
Id= 5 Status=51 {Prefailure Online } Value=199 Threshold=140 Passed
Id= 7 Status=11 {Prefailure Online } Value=200 Threshold= 51 Passed
Id= 9 Status=50 {Advisory Online } Value= 96 Threshold= 0 Passed
Id= 10 Status=19 {Prefailure Online } Value=100 Threshold= 51 Passed
Id= 11 Status=19 {Prefailure Online } Value=100 Threshold= 51 Passed
Id= 12 Status=50 {Advisory Online } Value=100 Threshold= 0 Passed
Id=196 Status=50 {Advisory Online } Value=196 Threshold= 0 Passed
Id=197 Status=18 {Advisory Online } Value=200 Threshold= 0 Passed
Id=198 Status=18 {Advisory Online } Value=200 Threshold= 0 Passed
Id=199 Status=10 {Advisory Online } Value=200 Threshold= 0 Passed
Id=200 Status= 9 {Prefailure OffLine} Value=200 Threshold= 51 Passed
OffLineStatus=2 {Completed}, AutoOffLine=No, OffLineTimeout=78 minutes
OffLineCapability=59 {Immediate Auto SuspendOnCmd}
SmartRevision=16, CheckSum=233, SmartCapability=3 {SaveOnStandBy AutoSave} |
Looks like my hard drive isn't complaining.
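The table above is easier to read with the usual rule in mind. An attribute counts as failed only when its normalized VALUE drops to or below THRESH, and a THRESH of 000 never trips. RAW_VALUE is vendor-specific. Even so, a few raw counters (reallocated, pending and uncorrectable sectors) are commonly watched on a drive that otherwise passes. Here is a minimal Python sketch of that reading. It assumes the column layout of the smartctl 5.1 output above. The helper names and the watch list are illustrative choices, not part of smartmontools:

```python
from __future__ import annotations

from dataclasses import dataclass

# Raw counters commonly watched: non-zero means sectors have been remapped
# (or are waiting to be) even if the normalized value still passes.
WATCH_IDS = {5, 196, 197, 198}


@dataclass
class Attribute:
    id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value seen
    thresh: int  # failure threshold (0 means it never trips)
    raw: str     # raw value, vendor specific


def parse_attributes(text: str) -> list[Attribute]:
    """Pick the attribute rows out of `smartctl -a` output (5.x layout assumed)."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        attrs.append(Attribute(
            id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            raw=" ".join(fields[9:]),
        ))
    return attrs


def failed(attrs: list[Attribute]) -> list[Attribute]:
    """Attributes whose normalized value has reached a non-zero threshold."""
    return [a for a in attrs if a.thresh > 0 and a.value <= a.thresh]


def worth_watching(attrs: list[Attribute]) -> list[Attribute]:
    """Sector-health counters with a non-zero raw value."""
    return [a for a in attrs if a.id in WATCH_IDS and int(a.raw.split()[0]) > 0]


if __name__ == "__main__":
    # A few rows copied from the smartctl output above.
    sample = """
  1 Raw_Read_Error_Rate     0x000b 200 200 051 Pre-fail Always - 0
  5 Reallocated_Sector_Ct   0x0033 199 199 140 Pre-fail Always - 7
196 Reallocated_Event_Count 0x0032 196 196 000 Old_age  Always - 4
197 Current_Pending_Sector  0x0012 200 200 000 Old_age  Always - 0
"""
    attrs = parse_attributes(sample)
    print("failed:", [a.name for a in failed(attrs)])
    print("watch:", [(a.name, a.raw) for a in worth_watching(attrs)])
```

On these rows it reports nothing failed, but it lists Reallocated_Sector_Ct (raw 7) and Reallocated_Event_Count (raw 4). That is consistent with the overall PASSED verdict, and still worth keeping an eye on.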
|
|
AgenT Apprentice
Joined: 18 May 2003 Posts: 280
Posted: Mon Jul 28, 2003 11:06 pm Post subject:

These two programs are really nice, thanks!
|
|
Ian Goldby Guru
Joined: 18 May 2002 Posts: 539 Location: (Inactive member)
Posted: Tue Jul 29, 2003 8:08 pm Post subject:

Thanks. I tried them, and as far as I could understand it, there were no errors or warnings from my disk.
|
|
pmjdebruijn Guru
Joined: 24 Jul 2003 Posts: 506 Location: Sittard, The Netherlands
Posted: Wed Jul 30, 2003 11:19 am Post subject:

Do you have the correct cabling, i.e. cables certified for ATA100 or ATA133?
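Ultra DMA modes faster than 33 MB/s (ATA66, ATA100, ATA133) need the 80-conductor IDE cable. Transfer errors from a bad or wrong cable usually show up in SMART attribute 199, UDMA_CRC_Error_Count, which appears in the smartctl table earlier in the thread. Here is a minimal Python sketch. It assumes you save `smartctl -a` output twice, say before and after a heavy emerge, and compare that counter. The function names are hypothetical:

```python
from typing import Optional


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Raw UDMA_CRC_Error_Count (attribute 199), or None if not reported."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def cable_suspect(before: str, after: str) -> bool:
    """True if the CRC error counter grew between two saved snapshots."""
    old, new = crc_error_count(before), crc_error_count(after)
    if old is None or new is None:
        raise ValueError("attribute 199 is not reported by this drive")
    return new > old


if __name__ == "__main__":
    # Hypothetical snapshots, e.g. saved before and after a big emerge.
    before = "199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 0"
    after = "199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 12"
    print("cable suspect:", cable_suspect(before, after))
```

A counter that keeps climbing under load points at the cable or connectors rather than the platters. A counter that stays at 0 makes the cable a less likely culprit.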
Also, a friend of mine has his hard disks cooled with a silent 12x12 cm fan, and the temperature drop is just amazing... Without the fan they run at about 40 degrees Celsius; with the fan the temperature drops to just below 20 degrees Celsius...
So you might consider installing a fan...
At least if temperature is your problem!
|
|
Robelix l33t
Joined: 21 Jul 2002 Posts: 760 Location: in a World created by a Flying Spaghetti Monster
Posted: Wed Jul 30, 2003 11:08 pm Post subject:

40 °C is usually no problem for a hard disk, but at 50 °C or more it starts to get dangerous.
I use a cheap standard 80 mm fan connected to 5 V instead of the usual 12 V. That's enough to cool the disks down by about 10 °C, and you don't hear it.
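To put numbers on the fan question, you can check what the drive itself reports. Many drives give their temperature as SMART attribute 194 (Temperature_Celsius). The WD drive in the smartctl listing earlier in the thread does not, so this won't work everywhere. Here is a minimal Python sketch that reads that attribute and applies the rule of thumb above (40 °C is fine, 50 °C and up is dangerous). The "warm" band in between and the sample row are assumptions for illustration:

```python
from typing import Optional

# Rule-of-thumb limits from this thread, in degrees Celsius.
COMFORTABLE_MAX = 40
DANGER_MIN = 50


def drive_temperature(smartctl_output: str) -> Optional[int]:
    """Raw value of attribute 194 (Temperature_Celsius), or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "194":
            # The first token of the raw column holds the temperature.
            return int(fields[9])
    return None


def classify(celsius: int) -> str:
    """Map a drive temperature onto the thresholds above."""
    if celsius >= DANGER_MIN:
        return "dangerous"
    if celsius > COMFORTABLE_MAX:
        return "warm"  # assumed in-between band, not stated in the thread
    return "ok"


if __name__ == "__main__":
    # Hypothetical attribute row; not every drive reports attribute 194.
    row = "194 Temperature_Celsius 0x0022 112 095 000 Old_age Always - 38"
    temp = drive_temperature(row)
    if temp is None:
        print("drive does not report its temperature")
    else:
        print(f"{temp} C: {classify(temp)}")
```

Reading the value once with the fan off and once with it on gives a direct measure of how much a 5 V fan actually helps.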
Robelix
_________________
mysql> SELECT question FROM life, universe, everything WHERE answer=42;
Empty set (2079460347 sec)