[Solved] SATA drives mess

creaker · l33t Joined: 14 Jul 2012 Posts: 651

I've run into an amazing error. It isn't even really an error; I don't know how to classify it.
OK, I have a server running samba. The box has two SATA drives onboard, sda and sdb.
I was listening to music over the samba share and suddenly amarok stopped playing. I checked the shares and found that one of the two had disappeared; mount no longer showed it as mounted. OK, I logged into the server over ssh and checked samba, which seemed to be working. Dmesg didn't show any fault. But running blkid I noticed a very strange thing: one of my drives had for some reason become /dev/sdc... Initially it was /dev/sda.

lagalopex · Guru Joined: 16 Oct 2004 Posts: 562

Have a look at the output of dmesg when this happens.
You may also check your sda/sdc's SMART values (sys-apps/smartmontools => smartctl -a /dev/sda) and perhaps run some tests.

Anon-E-moose · Posted: Thu Sep 18, 2014 10:49 am Post subject: Re: SATA drives mess

creaker · l33t Joined: 14 Jul 2012 Posts: 651

OK, I'll have to wait for it to occur once more to check dmesg, though I did check dmesg and didn't notice any message added to it since boot.

lagalopex · Guru Joined: 16 Oct 2004 Posts: 562

The SMART values can be checked at any time; they are saved within the disk's firmware.
Your sig mentions a 250Gb hdd... that size is quite old. Maybe the disks are just failing? (=> look at the SMART values!)
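
If you want to keep an eye on a few attributes over time, here is a rough Python sketch that pulls the raw values out of the attribute table printed by `smartctl -A`. The function name `parse_attributes` is made up for illustration, and it assumes the usual ten-column ATA table (ID#, ATTRIBUTE_NAME, ... RAW_VALUE); attribute names vary by vendor, so treat it as a starting point only.

```python
def parse_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output.

    Assumes the usual ATA attribute table, where the first column is the
    numeric attribute ID and the raw value starts at the tenth column.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Table rows start with a numeric ID; header and banner lines do not.
        if len(fields) >= 10 and fields[0].isdigit():
            # The raw value may contain spaces, so keep everything from column 10 on.
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs
```

One way to feed it is `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout`, run as root. Link trouble, if the drive records it, would typically show up in an attribute such as UDMA_CRC_Error_Count.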

creaker · l33t Joined: 14 Jul 2012 Posts: 651

I do not see any errors in the smartctl output. Complete SMART report:
http://pastebin.com/Ba11HCXp
No faults or slowness when accessing data or navigating the filesystem, nothing unusual.

NeddySeagoon · Posted: Thu Sep 18, 2014 7:27 pm Post subject:

creaker,

The failure mechanism is that the drive is lost to the system but udev has not noticed the disconnect before the drive comes back.
As /dev/sda has not been released, the drive is allocated /dev/sdc. When udev catches up, it removes /dev/sda.
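
Because the kernel name can move like this, it helps to look at the persistent symlinks udev normally creates under /dev/disk/by-id, which follow the drive rather than the sdX letter. A minimal Python sketch, assuming that directory exists on your box (the function name `stable_names` is made up for illustration):

```python
import os

def stable_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent symlink name to the kernel device it points at now."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath follows the symlink, e.g. to /dev/sda or /dev/sdc.
            mapping[name] = os.path.realpath(link)
    return mapping
```

Printing `stable_names().items()` before and after the event should show the same by-id name pointing at a different sdX node. Mounting by UUID or by-id instead of /dev/sdX avoids depending on the letter at all.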

I would expect dmesg to contain a set of messages similar to what it produces for a USB drive disconnect and reconnect, unless the box rebooted.
Is the uptime correct?

The smart log looks good.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

creaker · l33t Joined: 14 Jul 2012 Posts: 651

I've caught it. The dmesg tail:

creaker · l33t Joined: 14 Jul 2012 Posts: 651

I scanned for the lost drive on all available hosts:

Akkara · Posted: Fri Sep 19, 2014 8:24 am Post subject:

I've seen that happen when a SATA cable is loose. Try jiggling the cables while watching dmesg.
_________________
Many think that Dilbert is a comic. Unfortunately it is a documentary.

Anon-E-moose · Posted: Fri Sep 19, 2014 9:42 am Post subject:

Everything between the exception and the hard reset means that the sata controller and the sata device lost communication.
As Akkara said it could be a loose cable.
Or it could be the other things I mentioned earlier.

I've swapped out my non-locking sata cables for the ones that lock, in the past.
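
To spot these link drops without reading the whole log, something like the following Python sketch can filter dmesg lines. The substrings are ones libata commonly logs around a reset; the list is an assumption and not exhaustive, and `link_events` is a made-up name.

```python
import re

# Substrings libata commonly logs around a link drop (assumed, not exhaustive).
PATTERNS = ("exception Emask", "hard resetting link",
            "SATA link down", "SATA link up")

# Matches the port prefix in lines such as "ata1: ..." or "ata1.00: ...".
ATA_PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")

def link_events(lines):
    """Return (port, line) pairs for lines that look like SATA link events."""
    events = []
    for line in lines:
        match = ATA_PORT.search(line)
        if match and any(p in line for p in PATTERNS):
            events.append((match.group(1), line.strip()))
    return events
```

Pipe dmesg into a script that calls `link_events(sys.stdin)` and prints the pairs; repeated events on the same ataN port point at that port's cable, connector, or drive.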

creaker · l33t Joined: 14 Jul 2012 Posts: 651

OK, I've replaced both the power and data cables. Let's see whether that solves the problem.
Though I doubt it is a cable issue, for two reasons:
1. If it were a data cable failure, it should show up in the SMART output, but there seem to be no data transmission issues there:

lagalopex · Guru Joined: 16 Oct 2004 Posts: 562

smartctl also mentions a possible firmware update CC49:

creaker · l33t Joined: 14 Jul 2012 Posts: 651

The server has been used intensively for the last two days, and it seems replacing the cables solved the problem.
Thanks for helping, guys!