Author |
Message |
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Fri Nov 18, 2016 4:14 pm Post subject: Can't complete smartctl test, host reset [SOLVED] |
|
|
And kern log has a lot of entries like this: Code: | Nov 17 19:50:14 X3 kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Nov 17 19:50:14 X3 kernel: ata5: irq_stat 0x00400000, PHY RDY changed
Nov 17 19:50:14 X3 kernel: ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
Nov 17 19:50:14 X3 kernel: ata5: hard resetting link
Nov 17 19:50:22 X3 kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 17 19:50:22 X3 kernel: ata5.00: configured for UDMA/133
Nov 17 19:50:22 X3 kernel: ata5: EH complete |
Any idea what this means? Drive is WD Black 5TB, one year old.
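For readers who want to see how the `SErr 0x90202` value in the log maps onto the names the kernel prints inside `SError: { ... }`, here is a minimal sketch. The function name `decode_serr` is made up for illustration, and the bit table is an assumption based on the SATA SError register layout as libata reports it, so check it against your kernel source before relying on it.

```shell
#!/bin/sh
# Decode a SATA SError value (the "SErr 0x..." field in the log) into flag names.
# Assumption: bit positions follow the SError register layout as printed by libata.
decode_serr() {
    value=$(( $1 ))
    out=""
    for entry in 0:RecovData 1:RecovComm 8:UnrecovData 9:Persist 10:Proto \
                 11:HostInt 16:PHYRdyChg 17:PHYInt 18:CommWake 19:10B8B \
                 20:Dispar 21:BadCRC 22:Handshk 23:LinkSeq 24:TrStaTrns \
                 25:UnrecFIS 26:DevExch; do
        bit=${entry%%:*}     # part before the colon: bit number
        name=${entry#*:}     # part after the colon: flag name
        if [ $(( value & (1 << bit) )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

# The value from the log above
decode_serr 0x90202
```

For the value above, the set bits are 1, 9, 16 and 19, which correspond to the four flags in the logged `SError` line. All four describe the link between controller and drive (recovered communication error, PHY ready change, 10b-to-8b decode error) rather than data on the platters.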
EDIT:
I did the following from TTY-1: Code: | /etc/init.d/samba stop
/etc/init.d/minidlna stop
/etc/init.d/xdm stop
umount /dev/sdc2
smartctl -t long /dev/sdc | And it completed. However, the test is supposed to run on a mounted drive, so I still want to know what was wrong.
Last edited by Tony0945 on Thu Dec 01, 2016 11:41 pm; edited 1 time in total |
|
|
|
christoph_peter_s Tux's lil' helper
Joined: 30 Nov 2015 Posts: 106
|
|
|
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Mon Nov 21, 2016 5:01 pm Post subject: |
|
|
Thanks for the reply, but I don't know which road you mean, since that link is about different errors. The BIOS was updated to the latest version years ago (it's an old board), and smartctl showed no errors when run with the disk unmounted. I don't use grub2, so the referenced file doesn't exist.
I don't think it's in IDE mode because: Code: | # hdparm -t -T /dev/sdc
/dev/sdc:
Timing cached reads: 4026 MB in 2.00 seconds = 2013.28 MB/sec
Timing buffered disk reads: 576 MB in 3.01 seconds = 191.49 MB/sec
| Looks like it's running at a good clip. |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54214 Location: 56N 3W
|
Posted: Mon Nov 21, 2016 7:35 pm Post subject: |
|
|
Tony0945,
is probably a spin speed issue.
The drive asserts ready once it's done all its self checks and it's up to speed.
Spin up is what takes the longest.
Move the drive so that the gravity vector is in a different plane to the one you have been running it in.
Check the 12v to the drive. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
christoph_peter_s Tux's lil' helper
Joined: 30 Nov 2015 Posts: 106
|
Posted: Tue Nov 22, 2016 1:32 am Post subject: |
|
|
Tony0945 wrote: | Thanks for the reply but I don't know which road you mean since that link is about different errors. |
Well, what I mean is to first double-check that you really have no H/W issue. If I were you, I'd try a known good disk, as I wouldn't trust the disk until it is proven to be absolutely OK. After that, I would begin to double-check every piece of S/W related to the disk: BIOS of the board, firmware of the disk, a different kernel, and so on. The simplest things first, the more tedious things later.
Btw, I had trouble with my file server for more than a year. In the end it turned out to be the cable between the RAID controller and one HD. Neddy's remark looks pretty plausible, though. The spin-up time is one of the SMART values, which you have most likely double-checked already. Maybe there is a jumper on the HD for delayed spin-up? It might be on the edge of being OK. Or maybe the power supply is starting to fail. A subtle problem can be difficult to track down. An old principle of debugging is to check all individual components in a known good environment, but typically you can't do this for your private machine, as you would need a properly functioning identical machine and then start swapping devices. Anyway, double-check the SMART data, try a different SATA cable (and maybe a different mainboard port). |
|
|
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Tue Nov 22, 2016 5:26 pm Post subject: |
|
|
Swapping ports & cables is easy. I'll try that first. BTW, this isn't a disk drive, it's a hard drive, so I can't put a different disk in it.
I can also try cutting back on the unmounting steps that worked, i.e. shut down the applications that accessed the disk but don't unmount. If that's OK, start leaving services running until it fails again, to locate the service that's resetting it (if any). The PS is not that old and it's running on a UPS, but, yes, they might be problems. |
|
|
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Thu Dec 01, 2016 11:41 pm Post subject: |
|
|
Swapped the port and that didn't help. In fact, it may have caused a problem on the drive, because it switched from sdc to sdb and fstab tried to load it as JFS instead of ext4. fsck fixed that.
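The sdc-to-sdb rename bites because fstab entries keyed on `/dev/sdX` follow the port order. A common way around it is to key the entry on the filesystem UUID instead. The sketch below only prints such a line; the helper name `fstab_uuid_line`, the mount point, and the mount options are placeholders, not taken from this machine.

```shell
#!/bin/sh
# Print an fstab entry keyed on filesystem UUID instead of /dev/sdX.
# Arguments: UUID, mount point, filesystem type.
# "defaults 0 2" is a generic choice for a non-root data filesystem (assumption).
fstab_uuid_line() {
    printf 'UUID=%s %s %s defaults 0 2\n' "$1" "$2" "$3"
}

# Example on a real system (device and mount point are placeholders):
#   fstab_uuid_line "$(blkid -s UUID -o value /dev/sdc2)" /mnt/data ext4
```

With an entry like that, the partition mounts at the same place with the same filesystem type whichever SATA port it is plugged into.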
Replaced the SATA cable. Problem seems solved. No more entries in kern.log and hdparm shows the drive running at full speed.
Ordered some black SATA cables from Cable Matters. I have a few older orange cables that I like to use for ata1 so I know which will be the boot drive, but I'd like to color code all the cables. Right now, the box that had the problem has three red cables. |
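To confirm that the resets really stopped after the cable swap, one option is to count the "hard resetting link" lines per port in the kernel log. This is a rough sketch: `count_link_resets` is a made-up helper name, and the log path in the usage comment is an assumption that depends on the syslog setup.

```shell
#!/bin/sh
# Count "hard resetting link" events per ATA port from kernel log text on stdin.
# Prints one "ataN count" line per port that had at least one reset.
count_link_resets() {
    awk '/hard resetting link/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+(\.[0-9]+)?:$/) {
                sub(/:$/, "", $i)   # drop the trailing colon from "ata5:"
                n[$i]++
            }
    }
    END { for (p in n) print p, n[p] }' | sort
}

# Usage (log path is an assumption):
#   count_link_resets < /var/log/kern.log
```

Running it before and after the change gives a simple per-port number to compare instead of scrolling through the log by eye.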
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54214 Location: 56N 3W
|
Posted: Fri Dec 02, 2016 11:31 am Post subject: |
|
|
Tony0945,
If you run smartctl with the new cable, it should show that there have been no internal drive errors.
It may (or may not) show interface errors. |
|
|
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Fri Dec 02, 2016 4:13 pm Post subject: |
|
|
NeddySeagoon wrote: | If you run smartctl with the new cable, it should show that there have been no internal drive errors.
It may (or may not) show interface errors. |
I ran the short test with no errors. I'll run the long test tonight as it takes about 9 hours on this 5TB drive. |
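Once the long test has had its nine hours, the result can be read back from the drive's self-test log with `smartctl -l selftest`. The sketch below checks only the newest entry. The helper name `last_selftest_ok` is made up, and it assumes the usual ATA log layout in which the most recent entry starts with `# 1` and a clean run reads "Completed without error"; other drive types may format this differently.

```shell
#!/bin/sh
# Succeed if the most recent entry ("# 1") in the self-test log text on stdin
# reports completion without error. Assumes the usual ATA self-test log layout.
last_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Usage on a real system (device name as used earlier in the thread):
#   smartctl -l selftest /dev/sdc | last_selftest_ok && echo "last self-test passed"
```

An entry that was cut short, as in the original problem, would carry a different status text and make the check fail.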
|
|
|
|