Gentoo Forums
Can't complete smartctl test, host reset [SOLVED]
Forum: Kernel & Hardware
Tony0945
Veteran
Joined: 25 Jul 2006
Posts: 1361

Posted: Fri Nov 18, 2016 4:14 pm    Post subject: Can't complete smartctl test, host reset [SOLVED]

The smartctl self-test keeps getting interrupted by a host reset, and kern.log has a lot of entries like this:
Code:
Nov 17 19:50:14 X3 kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Nov 17 19:50:14 X3 kernel: ata5: irq_stat 0x00400000, PHY RDY changed
Nov 17 19:50:14 X3 kernel: ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
Nov 17 19:50:14 X3 kernel: ata5: hard resetting link
Nov 17 19:50:22 X3 kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 17 19:50:22 X3 kernel: ata5.00: configured for UDMA/133
Nov 17 19:50:22 X3 kernel: ata5: EH complete


Any idea what this means? The drive is a 5 TB WD Black, one year old.
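The `SErr 0x90202` value in the first log line is a bitmask of the SATA SError register, and the kernel spells out the set bits on the next line. A minimal Python sketch that decodes such a value follows; the bit-to-label table is transcribed from the labels libata prints (it is an assumption that it covers every bit a given kernel version reports), and `decode_serror` is a hypothetical helper name.

```python
# Bit positions of the SATA SError register and the short labels
# libata prints for them in "ataN: SError: { ... }" lines.
SERROR_BITS = {
    0: "RecovData",    # recovered data integrity error
    1: "RecovComm",    # recovered communication error
    8: "UnrecovData",  # unrecovered data integrity error
    9: "Persist",      # persistent communication or data integrity error
    10: "Proto",       # protocol error
    11: "HostInt",     # internal host error
    16: "PHYRdyChg",   # PHY ready state changed
    17: "PHYInt",      # PHY internal error
    18: "CommWake",    # COMWAKE signal detected
    19: "10B8B",       # 10b/8b decode error
    20: "Dispar",      # disparity error
    21: "BadCRC",      # CRC error
    22: "Handshk",     # handshake error
    23: "LinkSeq",     # link sequence error
    24: "TrStaTrns",   # transport state transition error
    25: "UnrecFIS",    # unrecognized FIS type
    26: "DevExch",     # device exchanged
}


def decode_serror(value: int) -> list[str]:
    """Return the label of every set SError bit, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]
```

Called as `decode_serror(0x90202)`, it returns `['RecovComm', 'Persist', 'PHYRdyChg', '10B8B']`, the same set the kernel lists in its `SError:` line.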

EDIT:
I did the following from TTY-1:
Code:
/etc/init.d/samba stop
/etc/init.d/minidlna stop
/etc/init.d/xdm stop
umount /dev/sdc2
smartctl -t long /dev/sdc
And it completed. However, a SMART self-test is designed to run while the drive is mounted and in use, so I still want to know what was wrong.
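Whether a self-test finished, and how it ended, is recorded in the drive's self-test log, which `smartctl -l selftest /dev/sdc` prints. A minimal Python sketch that pulls the newest entry out of that output follows; it assumes the column layout smartmontools 6.x prints (`# 1  Extended offline    Completed without error  00%  ...`, with at least two spaces between the description and status columns), and `latest_selftest` is a hypothetical helper name. A test cut short as described in the thread title would typically show a status such as `Interrupted (host reset)`.

```python
import re

# One row of the ATA self-test log as printed by smartctl, e.g.
# "# 1  Extended offline    Completed without error       00%      8760         -"
# Assumption: description and status are separated by two or more spaces.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def latest_selftest(log_text: str):
    """Return (test, status, remaining_percent) for log entry #1, or None."""
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group("num") == "1":
            return m.group("test"), m.group("status"), int(m.group("remaining"))
    return None
```

Feed it the captured text of `smartctl -l selftest /dev/sdc` (for example via `subprocess.run(..., capture_output=True, text=True).stdout`, run as root).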

Last edited by Tony0945 on Thu Dec 01, 2016 11:41 pm; edited 1 time in total
christoph_peter_s
n00b
Joined: 30 Nov 2015
Posts: 19

Posted: Mon Nov 21, 2016 1:03 pm

After a short Google search...
http://unix.stackexchange.com/questions/217113/what-causes-the-ata-exceptions-in-my-syslog-and-how-to-solve-them
I'd try to go along that road first.
Tony0945
Veteran
Joined: 25 Jul 2006
Posts: 1361

Posted: Mon Nov 21, 2016 5:01 pm

Thanks for the reply, but I don't know which road you mean, since that link is about different errors. The BIOS was updated to the latest version years ago (it's an old board), and smartctl showed no errors when run with the disk unmounted. I don't use GRUB 2, so the referenced file doesn't exist on my system.

I don't think it's in IDE mode because:
Code:
 # hdparm -t -T /dev/sdc

/dev/sdc:
 Timing cached reads:   4026 MB in  2.00 seconds = 2013.28 MB/sec
 Timing buffered disk reads: 576 MB in  3.01 seconds = 191.49 MB/sec
Looks like it's running at a good clip.
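hdparm measures throughput, but the link rate the controller actually negotiated is already in the kernel log: `SStatus 123` in the `SATA link up` line. In the SATA SStatus register, bits 3:0 are DET (device detection), bits 7:4 SPD (negotiated speed generation) and bits 11:8 IPM (interface power state), and the kernel prints the value in hexadecimal. A minimal Python sketch that splits the value into those fields; `decode_sstatus` is a hypothetical helper name, and only the commonly documented field values are named.

```python
# Named values for the SATA SStatus register fields.
DET = {0: "no device", 1: "device present, no PHY communication",
       3: "device present, PHY communication established", 4: "PHY offline"}
SPEED = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
IPM = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}


def decode_sstatus(hex_text: str) -> dict:
    """Split a hexadecimal SStatus value (as logged by the kernel) into fields."""
    value = int(hex_text, 16)
    det = value & 0xF          # bits 3:0
    spd = (value >> 4) & 0xF   # bits 7:4
    ipm = (value >> 8) & 0xF   # bits 11:8
    return {
        "det": DET.get(det, f"reserved ({det})"),
        "speed": SPEED.get(spd, f"reserved ({spd})"),
        "ipm": IPM.get(ipm, f"reserved ({ipm})"),
    }
```

For the `SStatus 123` in the log above, this reports an established 3.0 Gbps link with the interface active, matching the speed the kernel prints next to it.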
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 36559
Location: 56N 3W

Posted: Mon Nov 21, 2016 7:35 pm

Tony0945,

Code:
PHY RDY changed
is probably a spin speed issue.

The drive asserts ready once it's done all its self-checks and it's up to speed.
Spin up is what takes the longest.
Move the drive so that the gravity vector is in a different plane to the one you have been running it in.
Check the 12 V supply to the drive.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
christoph_peter_s
n00b
Joined: 30 Nov 2015
Posts: 19

Posted: Tue Nov 22, 2016 1:32 am

Tony0945 wrote:
Thanks for the reply but I don't know which road you mean since that link is about different errors.


Well, what I mean is: first double-check that you really have no hardware issue. If I were you, I'd try a known-good disk, as I wouldn't trust this disk until it is proven to be absolutely OK. After that, I would begin double-checking every piece of software related to the disk: the board's BIOS, the disk's firmware, a different kernel, and so on. The simplest things first, the more tedious things later.

Btw, I had trouble with my file server for more than a year. In the end it turned out to be the cable between the RAID controller and one HD. Neddy's remark looks pretty plausible, too, but spin-up time is one of the SMART values, which you have most likely double-checked already. Maybe there is a jumper on the HD for delayed spin-up? The spin-up time might be just on the edge of being OK. Or maybe the power supply is starting to fail. A subtle problem can be difficult to track down. An old debugging principle is to check each individual component in a known-good environment, but you typically can't do this for a private machine, as you would need a properly functioning identical machine to start swapping devices with. Anyway, double-check the SMART data, and try a different SATA cable (and maybe a different mainboard port).
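Both values worth double-checking here can be read from `smartctl -A /dev/sdc`: attribute 3 (`Spin_Up_Time`) and, for the cabling theory, attribute 199 (`UDMA_CRC_Error_Count`), which counts CRC errors on the link between drive and host. A minimal Python sketch that turns the attribute table into a dict, assuming the ten-column layout smartmontools prints for ATA drives (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value); `parse_attributes` is a hypothetical helper, and attribute names can differ between drive vendors.

```python
def parse_attributes(table_text: str) -> dict:
    """Map SMART attribute name -> (id, normalized value, raw value string)."""
    attrs = {}
    for line in table_text.splitlines():
        # At most 9 splits: the raw value may itself contain spaces,
        # e.g. "34 (Min/Max 20/45)" for temperature.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header and any non-table lines
        attr_id, name, value = fields[0], fields[1], fields[3]
        attrs[name] = (int(attr_id), int(value), fields[9].strip())
    return attrs
```

Raw values are kept as strings because their meaning and format are vendor-specific; compare `parse_attributes(...)["UDMA_CRC_Error_Count"]` before and after a cable swap rather than reading much into a single reading.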
Tony0945
Veteran
Joined: 25 Jul 2006
Posts: 1361

Posted: Tue Nov 22, 2016 5:26 pm

Swapping ports & cables is easy. I'll try that first. BTW, this isn't a disk drive, it's a hard drive, so I can't put a different disk in it.

I can also try cutting back on the steps that worked, i.e. shut down the applications that access the disk but don't unmount it. If that's OK, I'll start leaving services running until it fails again, to locate the service that's resetting it (if any). The PSU is not that old and it's running on a UPS, but yes, it might be a problem.
Tony0945
Veteran
Joined: 25 Jul 2006
Posts: 1361

Posted: Thu Dec 01, 2016 11:41 pm

Swapping ports didn't help. In fact, it may have caused a problem on the drive: it moved from sdc to sdb, and fstab tried to mount it as JFS instead of ext4. fsck fixed that.
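A JFS/ext4 mix-up like this is the typical symptom of fstab entries keyed on kernel device names such as /dev/sdb2, which can change when a drive moves to another port. fstab also accepts `UUID=` (or `LABEL=`) in its first field, and `blkid /dev/sdb2` prints a partition's UUID and filesystem type. A minimal Python sketch that builds such an entry from one line of `blkid` output; `fstab_line_from_blkid`, the mount point and the default options are hypothetical placeholders.

```python
import re

# key="value" pairs as printed by blkid, e.g.
# /dev/sdb2: UUID="..." TYPE="ext4" PARTUUID="..."
PAIR = re.compile(r'(\w+)="([^"]*)"')


def fstab_line_from_blkid(blkid_line: str, mount_point: str,
                          options: str = "defaults", passno: int = 2) -> str:
    """Build an fstab entry keyed on the filesystem UUID instead of /dev/sdX."""
    device, _, rest = blkid_line.partition(":")
    tags = dict(PAIR.findall(rest))
    if "UUID" not in tags or "TYPE" not in tags:
        raise ValueError(f"no UUID/TYPE reported for {device}")
    # Fields: spec, mount point, fs type, options, dump, fsck pass.
    return f"UUID={tags['UUID']}\t{mount_point}\t{tags['TYPE']}\t{options}\t0\t{passno}"
```

Because the filesystem type is taken from what blkid detected on that partition, the entry keeps pointing at the right filesystem even if the drive is enumerated under a different name.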

Replacing the SATA cable seems to have solved the problem. There are no more entries in kern.log, and hdparm shows the drive running at full speed.
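To confirm the fix over a longer period, link resets can be counted per ATA port rather than eyeballed. A minimal Python sketch, assuming the syslog line format shown in the first post (`... kernel: ata5: hard resetting link`, optionally with a bracketed kernel timestamp); the log location and `count_link_resets` are assumptions for illustration, and where kern.log lives depends on the syslog configuration.

```python
import re
from collections import Counter

# Matches e.g. "Nov 17 19:50:14 X3 kernel: ata5: hard resetting link",
# with or without a "[  12.345678]" timestamp after "kernel:".
RESET = re.compile(r"kernel:.*?\b(ata\d+): hard resetting link")


def count_link_resets(lines) -> Counter:
    """Count 'hard resetting link' events per ATA port in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = RESET.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Usage, assuming the log is at /var/log/kern.log: `with open("/var/log/kern.log") as f: print(count_link_resets(f))`. Running it again after a few days should show no new resets for the port that had the bad cable.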

I've ordered some black SATA cables from Cable Matters. I have a few older orange cables that I like to use for ata1, so I know which drive will be the boot drive, but I'd like to color-code all the cables. Right now, the box that had the problem has three red cables.
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 36559
Location: 56N 3W

Posted: Fri Dec 02, 2016 11:31 am

Tony0945,

If you run smartctl with the new cable, it should show that there have been no internal drive errors.
It may (or may not) show interface errors.
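Errors the drive has recorded itself appear in its ATA error log, printed by `smartctl -l error /dev/sdc`, which either says that no errors are logged or gives an `ATA Error Count:` line. A minimal Python sketch that extracts that count, assuming those two wordings (`No Errors Logged` and `ATA Error Count: N`) as printed by smartmontools 6.x; `ata_error_count` is a hypothetical helper name.

```python
import re

COUNT = re.compile(r"ATA Error Count:\s*(\d+)")


def ata_error_count(error_log_text: str):
    """Return the logged error count: 0 if none, None if the text is unrecognized."""
    if "No Errors Logged" in error_log_text:
        return 0
    m = COUNT.search(error_log_text)
    return int(m.group(1)) if m else None
```

The count covers errors logged over the drive's lifetime, so errors from before the cable swap stay in it; a count that stops increasing is a better sign than expecting zero.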
Tony0945
Veteran
Joined: 25 Jul 2006
Posts: 1361

Posted: Fri Dec 02, 2016 4:13 pm

NeddySeagoon wrote:
If you run smartctl with the new cable, it should show that there have been no internal drive errors.
It may (or may not) show interface errors.


I ran the short test with no errors. I'll run the long test tonight as it takes about 9 hours on this 5TB drive.
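As a rough plausibility check: an extended self-test reads the whole surface, so its duration is roughly capacity divided by the average sequential read rate. A minimal Python sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes, as drive makers use); the helper names are hypothetical.

```python
def implied_rate_mb_s(capacity_bytes: float, hours: float) -> float:
    """Average read rate (decimal MB/s) needed to scan the whole disk in `hours`."""
    return capacity_bytes / (hours * 3600) / 1e6


def estimated_hours(capacity_bytes: float, rate_mb_s: float) -> float:
    """Hours to read the whole disk at a sustained rate of `rate_mb_s` (decimal MB/s)."""
    return capacity_bytes / (rate_mb_s * 1e6) / 3600
```

With the thread's figures, `implied_rate_mb_s(5e12, 9)` comes to about 154.3 MB/s, while `estimated_hours(5e12, 191.49)` gives about 7.3 hours. A real test taking longer than the hdparm figure suggests is plausible, since `hdparm -t` reads from the start of the disk, where a hard drive's transfer rate is typically highest.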
All times are GMT.