SATA HW errors with AMD SB950 controller [workaround]

Post by Zucca » Fri Apr 07, 2017 10:29 pm

I bought a new motherboard. I was about to chroot from a live OS and then recompile the kernel to fit the new hardware. But...

Let me first explain my hard drive setup:
Six SSDs, bought at most two at a time. All are set as eSATA (hotswap) because they all reside in a 5.25" bay hotswap cage. The cage itself is a simple passthrough device. It only has indicator LEDs for each drive, and for them to work the drive needs to support an activity LED.

Code: Select all

NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda         8:0    1 238.5G  0 disk  
├─sda1      8:1    1   512M  0 part  
├─sda2      8:2    1   3.5G  0 part  
│ └─md126   9:126  0  17.4G  0 raid5 
└─sda3      8:3    1 234.5G  0 part
sdb         8:16   1 111.8G  0 disk  
├─sdb1      8:17   1   512M  0 part  
├─sdb2      8:18   1   3.5G  0 part  
│ └─md126   9:126  0  17.4G  0 raid5 
└─sdb3      8:19   1 107.8G  0 part  
sdc         8:32   1 489.1G  0 disk  
├─sdc1      8:33   1   512M  0 part  
├─sdc2      8:34   1   3.5G  0 part  
│ └─md126   9:126  0  17.4G  0 raid5 
└─sdc3      8:35   1 485.1G  0 part  
sdd         8:48   1 447.1G  0 disk  
├─sdd1      8:49   1   512M  0 part  
├─sdd2      8:50   1   3.5G  0 part  
│ └─md126   9:126  0  17.4G  0 raid5 
└─sdd3      8:51   1 443.1G  0 part  
sde         8:64   1 447.1G  0 disk  
├─sde1      8:65   1   512M  0 part  
├─sde2      8:66   1   3.5G  0 part  
│ └─md126   9:126  0  17.4G  0 raid5 
└─sde3      8:67   1 443.1G  0 part  
sdf         8:80   1 447.1G  0 disk  
├─sdf1      8:81   1   512M  0 part  
├─sdf2      8:82   1   3.5G  0 part  
│ └─md126   9:126  0  17.4G  0 raid5 
└─sdf3      8:83   1 443.1G  0 part
  • The first partition of each device was a /boot partition on mdraid1. I'm not sure if I have lost that raid stack... All the drives appear as spares now. At first boot, at least, the partition was available.
  • md126 is/was my swap partition on raid5 for the hibernate image. I reformatted it to ext4 to run a test. I dumped data from /dev/urandom to (almost) fill the partition. The data going in and out of the partition had the same md5sum (dropping the write cache in between). While I did that I didn't receive any errors. However, earlier scrubbing of the raid device did produce errors on at least four SATA buses.
  • The last, third, partition of each device is a btrfs filesystem. Reading and writing to it works, although I've been mounting it ro since I started to investigate this problem. I have backups there too, which I have backed up further onto my server.
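
The md5sum round trip described for md126 could look roughly like the sketch below. It is not the exact set of commands used for the test: the function name roundtrip_check, the file name roundtrip.bin and the size argument are made up for illustration, and it assumes GNU coreutils (dd, md5sum, mktemp) are available.

```shell
#!/bin/sh
# Hypothetical helper: write random data into a directory on the filesystem
# under test, read it back, and compare checksums.
# Usage: roundtrip_check <directory> <size in MiB>
roundtrip_check() {
    dir=$1
    mib=$2
    src=$(mktemp) || return 1
    dst="$dir/roundtrip.bin"    # illustrative file name

    # Create the reference data once, in the default temporary directory.
    dd if=/dev/urandom of="$src" bs=1M count="$mib" 2>/dev/null || return 1

    # Copy it onto the filesystem under test and flush dirty pages to disk.
    cp "$src" "$dst" && sync

    # As root, running "echo 3 > /proc/sys/vm/drop_caches" at this point
    # drops the page cache so the read below comes from the drive itself.

    sum_src=$(md5sum < "$src")
    sum_dst=$(md5sum < "$dst")
    rm -f "$src" "$dst"

    # Succeed only when both checksums are identical.
    [ "$sum_src" = "$sum_dst" ]
}
```

Something like `roundtrip_check /mnt/test 1024 && echo "checksums match"` would then exercise about 1 GiB, where /mnt/test is a placeholder for wherever the test filesystem is mounted. A matching checksum only shows that the data survived; the link can still have logged errors and retried along the way.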
The problem
I encounter lots of ata errors. But somehow things still work. The only exception is that the members of the (six drive) raid1 array now all show up as spares. Before, when that raid1 array worked, it usually got stuck when I tried to umount it, but the system didn't freeze. I haven't tried to assemble it yet. The data might be there, but I also have backups. Also, it's only /boot.

Here's some more information:
Motherboard: ASRock 970M Pro3

Code: Select all

Linux livecd 4.5.2-aufs-r1 #1 SMP Sun Jul 3 17:17:11 UTC 2016 x86_64 AMD FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux

Code: Select all

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD990 I/O Memory Management Unit (IOMMU)
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port B)
00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port D)
00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port H)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 42)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
00:15.3 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 3)
00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 5
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ca)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Fiji HDMI/DP Audio Controller
02:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
03:00.0 USB controller: Etron Technology, Inc. EJ188/EJ198 USB 3.0 Host Controller
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

Code: Select all

[    1.723018] ata1: SATA max UDMA/133 abar m1024@0xfeb0b000 port 0xfeb0b100 irq 19
[    1.723021] ata2: SATA max UDMA/133 abar m1024@0xfeb0b000 port 0xfeb0b180 irq 19
[    1.723023] ata3: SATA max UDMA/133 abar m1024@0xfeb0b000 port 0xfeb0b200 irq 19
[    1.723025] ata4: SATA max UDMA/133 abar m1024@0xfeb0b000 port 0xfeb0b280 irq 19
[    1.723027] ata5: SATA max UDMA/133 abar m1024@0xfeb0b000 port 0xfeb0b300 irq 19
[    1.723029] ata6: SATA max UDMA/133 abar m1024@0xfeb0b000 port 0xfeb0b380 irq 19
[    2.178802] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.179794] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.179810] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.179825] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.179842] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.180125] ata3.00: supports DRM functions and may not be fully accessible
[    2.180279] ata1.00: ATA-9: SAMSUNG SSD 830 Series, CXM03B1Q, max UDMA/133
[    2.180281] ata1.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    2.180484] ata3.00: ATA-10: Crucial_CT525MX300SSD1,  M0CR040, max UDMA/133
[    2.180486] ata3.00: 1025610768 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    2.180559] ata1.00: configured for UDMA/133
[    2.180792] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.181071] ata6.00: ATA-11: KINGSTON SUV400S37480G, 0C3FD6SD, max UDMA/133
[    2.181073] ata6.00: 937703088 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    2.181197] ata3.00: supports DRM functions and may not be fully accessible
[    2.181452] ata6.00: configured for UDMA/133
[    2.182044] ata3.00: configured for UDMA/133
[    2.186906] ata2.00: ATA-8: KINGSTON SV300S37A120G, 600ABBF0, max UDMA/133
[    2.186908] ata2.00: 234441648 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    2.186991] ata5.00: ATA-8: KINGSTON SV300S37A480G, 605ABBF2, max UDMA/133
[    2.186993] ata5.00: 937703088 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    2.187450] ata4.00: ATA-8: KINGSTON SV300S37A480G, 605ABBF2, max UDMA/133
[    2.187452] ata4.00: 937703088 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    2.192499] ata5.00: configured for UDMA/133
[    2.192933] ata4.00: configured for UDMA/133
[    2.193052] ata2.00: configured for UDMA/133
[    2.220890] ata3.00: Enabling discard_zeroes_data
[    2.221124] ata3.00: Enabling discard_zeroes_data
[    2.221476] ata3.00: Enabling discard_zeroes_data
[   14.640192] ata6.00: exception Emask 0x10 SAct 0x1800000 SErr 0x400000 action 0x6 frozen
[   14.640194] ata6.00: irq_stat 0x08000000, interface fatal error
[   14.640196] ata6: SError: { Handshk }
[   14.640199] ata6.00: failed command: WRITE FPDMA QUEUED
[   14.640202] ata6.00: cmd 61/80:b8:00:bc:81/00:00:0b:00:00/40 tag 23 ncq 65536 out
[   14.640204] ata6.00: status: { DRDY }
[   14.640206] ata6.00: failed command: WRITE FPDMA QUEUED
[   14.640208] ata6.00: cmd 61/80:c0:80:bc:81/00:00:0b:00:00/40 tag 24 ncq 65536 out
[   14.640210] ata6.00: status: { DRDY }
[   14.640212] ata6: hard resetting link
[   15.096160] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   15.096836] ata6.00: configured for UDMA/133
[   15.096843] ata6: EH complete
[   15.108166] ata6.00: exception Emask 0x10 SAct 0x6 SErr 0x400000 action 0x6 frozen
[   15.108168] ata6.00: irq_stat 0x08000000, interface fatal error
[   15.108169] ata6: SError: { Handshk }
[   15.108171] ata6.00: failed command: WRITE FPDMA QUEUED
[   15.108174] ata6.00: cmd 61/80:08:00:bc:81/00:00:0b:00:00/40 tag 1 ncq 65536 out
[   15.108176] ata6.00: status: { DRDY }
[   15.108177] ata6.00: failed command: WRITE FPDMA QUEUED
[   15.108180] ata6.00: cmd 61/80:10:80:bc:81/00:00:0b:00:00/40 tag 2 ncq 65536 out
[   15.108181] ata6.00: status: { DRDY }
[   15.108184] ata6: hard resetting link
[   15.564138] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   15.564811] ata6.00: configured for UDMA/133
[   15.564815] ata6: EH complete
This snippet shows errors for ata6, but I've encountered the same errors for other drives/buses too.
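
To see how the errors are spread across ports, the kernel log can be tallied per port. A minimal sketch, assuming the log lines have the same "ataN.NN: exception ..." shape as in the snippet above; the function name count_ata_exceptions is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "<port> <number of exception lines>" for each ATA port, sorted by port name.
count_ata_exceptions() {
    # Reduce lines such as "ata6.00: exception Emask ..." to the port name
    # (ata6), drop everything else, then count identical names.
    sed -n 's/.*\(ata[0-9][0-9]*\)\.[0-9][0-9]*: exception .*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Typical use would be `dmesg | count_ata_exceptions`. If only one or two ports stand out, cables or bay connectors become more likely suspects; errors on every port point more towards the controller or its driver.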

I don't want to admit it but I think it's the SATA controller... Swapping the old motherboard back in would take some time... I wish I had some kind of test bench but I don't.

Finally: the whole dmesg from one boot.
Last edited by Zucca on Mon Apr 10, 2017 6:28 pm, edited 1 time in total.
..: Zucca :..

Code: Select all

init=/sbin/openrc-init
-systemd -logind -elogind seatd
I am NaN! I am a man!

Post by roarinelk » Sat Apr 08, 2017 9:15 am

I'd say it's the cables, or the connectors in the hotswap bay:

[ 15.108169] ata6: SError: { Handshk }

These errors appear when the data on the wires is bad (bitflips, CRC errors, etc.).

Post by Zucca » Sat Apr 08, 2017 9:36 am

roarinelk wrote:I'd say it's the cables, or the connectors in the hotswap bay
I'll check the cables again. I have bought four new SATA cables. I bought them because I already had two of them and those were short enough and flexible... I do have spare SATA cables, so I'll start experimenting. Thanks.

Post by Zucca » Sun Apr 09, 2017 4:35 pm

I've now changed the cables for ata5 and ata6.

dmesg showed those familiar errors from ata5 during boot. dmesg also reported errors while I did a scrub... however, running the scrub a second time didn't produce any errors. Scrubbing has always been successful, no matter how many errors show up in dmesg.

This just does not make any sense...

Post by Zucca » Sun Apr 09, 2017 9:22 pm

I think I've found a pattern: all the errors stop after the kernel limits the bandwidth of the SATA bus.

Code: Select all

[  711.235882] ata6: limiting SATA link speed to 3.0 Gbps
[  711.235887] ata6.00: exception Emask 0x12 SAct 0x3800 SErr 0x500 action 0x6 frozen
[  711.235889] ata6.00: irq_stat 0x08000000, interface fatal error
[  711.235890] ata6: SError: { UnrecovData Proto }
[  711.235893] ata6.00: failed command: READ FPDMA QUEUED
[  711.235896] ata6.00: cmd 60/40:58:f8:50:1d/05:00:00:00:00/40 tag 11 ncq 688128 in
                        res 40/00:5c:f8:50:1d/00:00:00:00:00/40 Emask 0x12 (ATA bus error)
[  711.235898] ata6.00: status: { DRDY }
[  711.235899] ata6.00: failed command: READ FPDMA QUEUED
[  711.235902] ata6.00: cmd 60/40:60:38:56:1d/05:00:00:00:00/40 tag 12 ncq 688128 in
                        res 40/00:5c:f8:50:1d/00:00:00:00:00/40 Emask 0x12 (ATA bus error)
[  711.235903] ata6.00: status: { DRDY }
[  711.235905] ata6.00: failed command: READ FPDMA QUEUED
[  711.235907] ata6.00: cmd 60/40:68:78:5b:1d/00:00:00:00:00/40 tag 13 ncq 32768 in
                        res 40/00:5c:f8:50:1d/00:00:00:00:00/40 Emask 0x12 (ATA bus error)
[  711.235909] ata6.00: status: { DRDY }
[  711.235911] ata6: hard resetting link
[  711.895866] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  711.896440] ata6.00: configured for UDMA/133
After the bandwidth limit is announced, the errors appear one final time. After those errors, the link hard resets to 3.0 Gbps, and errors no longer appear.
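
One way to confirm which ports have dropped to 3.0 Gbps is to pick the most recent "SATA link up" line per port out of the kernel log. A minimal sketch, assuming the message format shown above; the function name latest_link_speeds is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "<port> <speed in Gbps>" using the newest "SATA link up" line per port.
latest_link_speeds() {
    awk '
        match($0, /ata[0-9]+: SATA link up [0-9.]+ Gbps/) {
            # Matched text looks like "ata6: SATA link up 3.0 Gbps".
            split(substr($0, RSTART, RLENGTH), f, " ")
            port = f[1]
            sub(/:$/, "", port)
            speed[port] = f[5]    # later lines overwrite earlier ones
        }
        END { for (p in speed) print p, speed[p] }
    ' | sort
}
```

Running `dmesg | latest_link_speeds` after the errors have stopped should then list the affected ports at 3.0 and any untouched ports at 6.0.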

What is going on?
I've now switched the cables on ata[4-6] to different brand ones. They seem to have no visible effect.

Post by frostschutz » Sun Apr 09, 2017 9:33 pm

You can use the libata.force kernel parameter to limit specific interfaces to 1.5 or 3.0 Gbps; if lower speeds resolve your issues, that might silence the errors. You can also use it to disable NCQ, which some controllers/drives don't handle properly.
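
As a rough illustration, libata.force takes a comma-separated list of [ID:]VAL entries, where ID is a port number with an optional device (for example 6 or 6.00). The values in the comments below are examples rather than a recommendation for this board, and the helper name libata_force_value is made up; it only reads back what the running kernel was booted with.

```shell
#!/bin/sh
# Example kernel command line fragments (add one of these to the boot loader entry):
#   libata.force=3.0Gbps              limit every port to 3.0 Gbps
#   libata.force=6:3.0Gbps            limit only ata6
#   libata.force=5:3.0Gbps,6:3.0Gbps  limit ata5 and ata6
#   libata.force=noncq                disable NCQ on all devices

# Hypothetical helper: read a kernel command line on stdin and print the
# value given to libata.force, or nothing if the parameter is absent.
libata_force_value() {
    tr ' ' '\n' | sed -n 's/^libata\.force=//p'
}
```

After rebooting, `libata_force_value < /proc/cmdline` shows whether the parameter actually reached the kernel, and the "SATA link up" lines in dmesg show whether the limit took effect.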

As for the cause, it could be anything from controller, drive, whatever you have in between those, or even a kernel bug.

Post by Zucca » Mon Apr 10, 2017 10:55 am

frostschutz wrote:You can use the libata.force kernel parameter to limit specific interfaces to 1.5 or 3.0 Gbps; if lower speeds resolve your issues, that might silence the errors. You can also use it to disable NCQ, which some controllers/drives don't handle properly.
When you mentioned NCQ, I thought that must be it. Most of the error messages are related to it. However, disabling it didn't make a difference. But forcing the bandwidth to 3 Gbps "solved" the issue.
frostschutz wrote:As for the cause, it could be anything from controller, drive, whatever you have in between those, or even a kernel bug.
I can safely rule out the drives, as all six worked flawlessly on my previous MB. I have also swapped several cables around, and that didn't have any visible effect.

The SATA controller is an AMD SB950. There have been some problems with it. Also see the source of the information. I'm not sure if that bug is related... If it is, then I wonder if clocksource=tsc would resolve the issue. What's the drawback of using tsc compared to hpet?

Post by Zucca » Mon Apr 10, 2017 6:28 pm

So far I haven't found any other solution to this problem than adding libata.force=3.0Gbps to the kernel command line.
I read somewhere that this problem may only occur if there are three or more devices attached to the SATA controller.

If anyone who stumbles upon this has a better solution, please post. For now I'll use this "hack".

Changing the topic from "New MB - almost random (frequent) ata errors" to "SATA HW errors with AMD SB950 controller [workaround]".