Gentoo Forums
Gentoo Forums
Gentoo Forums
Quick Search: in
Multiqueue schedulers kill Marvell 88SE9230 SATA controller?
View unanswered posts
View posts from last 24 hours

 
Reply to topic    Gentoo Forums Forum Index Kernel & Hardware
View previous topic :: View next topic  
Author Message
Pearlseattle
Apprentice
Apprentice


Joined: 04 Oct 2007
Posts: 162
Location: Switzerland

PostPosted: Thu Jul 05, 2018 9:18 pm    Post subject: Multiqueue schedulers kill Marvell 88SE9230 SATA controller? Reply with quote

Hi

Background info
I am running since ~2 months a quite heavy parallelized database job that inserts every hour hundreds of millions of rows into a MariaDB database (using the TokuDB storage engine).
I'm continuously testing small config changes on all levels (DB, storage, compiler, etc...) as any small improvement will have a potentially big impact on a long-term perspective.
"uname -a": Linux quad 4.14.40-gentoo #6 SMP PREEMPT Sat Jun 16 14:11:41 CEST 2018 x86_64 Intel(R) Xeon(R) CPU E31270 @ 3.40GHz GenuineIntel GNU/Linux

The storage used by MariaDB is set up as follows:
1) a "bcache" virtual device on which Mariadb stores all data (DB tables & indexes).
1a) "sdf": a HDD (~8TB) that stores all data (DB tables & indexes) => this is connected to "bcache" as "backing device" (where all data is supposed to end up being stored).
1b) "sda": a SSD (~460GB) that acts as read & write cache ("writeback"-mode) (for "sdf") => this is connected to "bcache" as "caching device".
2) "sde": a HDD (~4TB) that stores the binary logs and TokuDB logs of MariaDB.
This setup worked without problems; when I tested the classical I/O schedulers, the best combo resulted in being "NOOP" for "sda", "CFQ" for "sde" and "DEADLINE" for "sdf".

Problem
A few days ago I decided to test the new "multiqueue" schedulers (deadline, BFQ and Kyber).
I did not have to change anything in the Kernel (I had preemptively activated the relevant options some months ago), so I just rebooted by adding "scsi_mod.use_blk_mq=1" as boot parameter => I then set "Kyber" for "sda", "DEADLINE" for "sdf" and "BFQ" for "sde".
About 30 minutes after running the usual workload I noticed that the "sdf"-disk was quiet => I therefore checked "dmesg" & "/var/log/messages" and saw this:
Quote:

...
Jul 2 21:22:46 quad kernel: ata7.00: exception Emask 0x0 SAct 0x4000000 SErr 0x0 action 0x6 frozen
Jul 2 21:22:46 quad kernel: ata7.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:22:46 quad kernel: ata7.00: cmd 61/18:d0:e8:0f:51/00:00:00:00:00/40 tag 26 ncq dma 12288 out\x0a
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:22:46 quad kernel: ata7.00: status: { DRDY }
Jul 2 21:22:46 quad kernel: ata7: hard resetting link
Jul 2 21:22:47 quad kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 2 21:22:47 quad kernel: ata7.00: configured for UDMA/133
Jul 2 21:22:47 quad kernel: ata7.00: device reported invalid CHS sector 0
Jul 2 21:22:47 quad kernel: ata7: EH complete
Jul 2 21:26:12 quad kernel: ata7.00: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x6 frozen
Jul 2 21:26:12 quad kernel: ata7.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:12 quad kernel: ata7.00: cmd 61/a0:40:60:66:86/01:00:00:00:00/40 tag 8 ncq dma 212992 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:12 quad kernel: ata7.00: status: { DRDY }
Jul 2 21:26:12 quad kernel: ata7: hard resetting link
Jul 2 21:26:17 quad kernel: ata7: link is slow to respond, please be patient (ready=0)
Jul 2 21:26:22 quad kernel: ata7: COMRESET failed (errno=-16)
Jul 2 21:26:22 quad kernel: ata7: hard resetting link
Jul 2 21:26:27 quad kernel: ata7: link is slow to respond, please be patient (ready=0)
Jul 2 21:26:32 quad kernel: ata7: COMRESET failed (errno=-16)
Jul 2 21:26:32 quad kernel: ata7: hard resetting link
Jul 2 21:26:37 quad kernel: ata7: link is slow to respond, please be patient (ready=0)
Jul 2 21:26:43 quad kernel: ata9.00: exception Emask 0x0 SAct 0x1ffe00 SErr 0x0 action 0x6 frozen
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/a8:48:28:9e:ee/01:00:1d:00:00/40 tag 9 ncq dma 217088 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/38:50:d0:9f:ee/00:00:1d:00:00/40 tag 10 ncq dma 28672 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/f8:58:c0:d2:ee/01:00:1d:00:00/40 tag 11 ncq dma 258048 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/00:60:b8:d4:ee/04:00:1d:00:00/40 tag 12 ncq dma 524288 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/40:68:b8:d8:ee/03:00:1d:00:00/40 tag 13 ncq dma 425984 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/08:70:88:e1:ee/00:00:1d:00:00/40 tag 14 ncq dma 4096 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/18:78:60:21:ef/00:00:1d:00:00/40 tag 15 ncq dma 12288 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/10:80:50:57:ef/01:00:1d:00:00/40 tag 16 ncq dma 139264 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/f8:88:58:78:ef/02:00:1d:00:00/40 tag 17 ncq dma 389120 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/08:90:50:7b:ef/02:00:1d:00:00/40 tag 18 ncq dma 266240 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: READ FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 60/08:98:b8:e4:a8/00:00:9e:00:00/40 tag 19 ncq dma 4096 in\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: READ FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 60/40:a0:38:e5:a8/00:00:9e:00:00/40 tag 20 ncq dma 32768 in\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9: hard resetting link
Jul 2 21:26:44 quad kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 2 21:26:49 quad kernel: ata9.00: qc timeout (cmd 0xec)
Jul 2 21:26:50 quad kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 2 21:26:50 quad kernel: ata9.00: revalidation failed (errno=-5)
Jul 2 21:26:50 quad kernel: ata9: hard resetting link
Jul 2 21:26:51 quad kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 2 21:27:01 quad kernel: ata9.00: qc timeout (cmd 0xec)
Jul 2 21:27:02 quad kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 2 21:27:02 quad kernel: ata9.00: revalidation failed (errno=-5)
Jul 2 21:27:02 quad kernel: ata9: limiting SATA link speed to 3.0 Gbps
Jul 2 21:27:02 quad kernel: ata9: hard resetting link
Jul 2 21:27:03 quad kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jul 2 21:27:07 quad kernel: ata7: COMRESET failed (errno=-16)
Jul 2 21:27:07 quad kernel: ata7: limiting SATA link speed to 3.0 Gbps
Jul 2 21:27:07 quad kernel: ata7: hard resetting link
Jul 2 21:27:12 quad kernel: ata7: COMRESET failed (errno=-16)
Jul 2 21:27:12 quad kernel: ata7: reset failed, giving up
Jul 2 21:27:12 quad kernel: ata7.00: disabled
Jul 2 21:27:12 quad kernel: ata7: EH complete
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#11 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#11 CDB: Write(16) 8a 00 00 00 00 00 00 86 66 60 00 00 01 a0 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8808032
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 5246976 starting block 1101056)
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100544
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100545
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100546
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100547
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100548
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100549
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100550
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100551
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100552
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100553
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#9 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#9 CDB: Write(16) 8a 00 00 00 00 00 00 7f 5a 90 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8346256
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 90439689 (offset 0 size 0 starting block 1043283)
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#10 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#10 CDB: Write(16) 8a 00 00 00 00 00 00 86 68 00 00 00 07 88 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8808448
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#11 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#11 CDB: Write(16) 8a 00 00 00 00 00 00 86 6f 88 00 00 08 78 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8810376
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 7344128 starting block 1101312)
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 7344128 starting block 1101568)
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 CDB: Write(16) 8a 00 00 00 00 00 00 86 78 00 00 00 08 d0 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8812544
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 7446528 starting block 1101824)
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 7446528 starting block 1101850)
Jul 2 21:27:12 quad kernel: JBD2: Detected IO errors while flushing file data on sde1-8
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#13 CDB: Read(16) 88 00 00 00 00 00 00 1d 88 38 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 1935416
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#14 CDB: Write(16) 8a 00 00 00 00 00 e8 c4 30 d8 00 00 00 40 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 3905171672
Jul 2 21:27:12 quad kernel: Aborting journal on device sde1-8.
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1) in ext4_do_update_inode:5171: Journal has aborted
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): Delayed block allocation failed for inode 90439689 at logical offset 19027 with max blocks 2048 with error 30
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): This should not happen!! Data will be lost\x0a
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1) in ext4_writepages:2907: Journal has aborted
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 CDB: Write(16) 8a 00 00 00 00 00 e8 c4 08 00 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 3905161216
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 488144896, lost sync page write
Jul 2 21:27:12 quad kernel: JBD2: Error -5 detected when updating journal superblock for sde1-8.
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#13 CDB: Write(16) 8a 00 00 00 00 00 00 00 08 00 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 2048
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1): ext4_journal_check_start:61: Detected aborted journal
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): previous I/O error to superblock detected
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): Remounting filesystem read-only
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 CDB: Write(16) 8a 00 00 00 00 00 00 00 08 00 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 2048
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1) in ext4_dirty_inode:5888: IO failure
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): I/O error while writing superblock
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): previous I/O error to superblock detected
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): ext4_writepages: jbd2_start: 1024 pages, ino 99; err -30
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): previous I/O error to superblock detected
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1) in ext4_da_write_end:3188: IO failure
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 173, lost async page write
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): previous I/O error to superblock detected
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 524301, lost async page write
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 361758736, lost async page write
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 361766945, lost async page write
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:33 quad kernel: ata9.00: qc timeout (cmd 0xec)
Jul 2 21:27:33 quad kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 2 21:27:33 quad kernel: ata9.00: revalidation failed (errno=-5)
Jul 2 21:27:33 quad kernel: ata9.00: disabled
Jul 2 21:27:34 quad kernel: ata9: hard resetting link
Jul 2 21:27:35 quad kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jul 2 21:27:35 quad kernel: ata9: EH complete
Jul 2 21:27:35 quad kernel: scsi_io_completion: 7 callbacks suppressed
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#2 CDB: Read(16) 88 00 00 00 00 00 9e a8 e5 38 00 00 00 40 00 00
Jul 2 21:27:35 quad kernel: print_req_error: 7 callbacks suppressed
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 2661868856
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#1 CDB: Read(16) 88 00 00 00 00 00 9e a8 e4 b8 00 00 00 08 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 2661868728
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#0 CDB: Write(16) 8a 00 00 00 00 00 1d ef 7b 50 00 00 02 08 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502233936
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#29 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#29 CDB: Write(16) 8a 00 00 00 00 00 1d ef 78 58 00 00 02 f8 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502233176
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#28 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#28 CDB: Write(16) 8a 00 00 00 00 00 1d ef 57 50 00 00 01 10 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502224720
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#27 CDB: Write(16) 8a 00 00 00 00 00 1d ef 21 60 00 00 00 18 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502210912
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#26 CDB: Write(16) 8a 00 00 00 00 00 1d ee e1 88 00 00 00 08 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502194568
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#25 CDB: Write(16) 8a 00 00 00 00 00 1d ee d8 b8 00 00 03 40 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502192312
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#24 CDB: Write(16) 8a 00 00 00 00 00 1d ee d4 b8 00 00 04 00 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502191288
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#23 CDB: Write(16) 8a 00 00 00 00 00 1d ee d2 c0 00 00 01 f8 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502190784
Jul 2 21:27:41 quad kernel: scsi_io_completion: 87 callbacks suppressed
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#15 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#15 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a1 18 00 00 01 b8 00 00
Jul 2 21:27:41 quad kernel: print_req_error: 87 callbacks suppressed
Jul 2 21:27:41 quad kernel: print_req_error: I/O error, dev sdf, sector 502505752
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#16 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a2 d0 00 00 04 00 00 00
Jul 2 21:27:41 quad kernel: print_req_error: I/O error, dev sdf, sector 502506192
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#17 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a6 d0 00 00 00 08 00 00
Jul 2 21:27:41 quad kernel: print_req_error: I/O error, dev sdf, sector 502507216
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#18 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a6 d8 00 00 01 60 00 00
Jul 2 21:27:41 quad kernel: print_req_error: I/O error, dev sdf, sector 502507224
Jul 2 21:27:42 quad kernel: sd 8:0:0:0: [sdf] tag#19 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:42 quad kernel: sd 8:0:0:0: [sdf] tag#19 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a8 38 00 00 04 00 00 00
...

Seems that "sde" and "sdf" had problems, but no other device.

I then rebooted the PC, set up for both sdf & sde different multiqueue schedulers but 5 minutes after restarting the usual workload some similar weirdness occurred again:
Quote:

...
Jul 2 22:26:10 quad kernel: ata7.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen
Jul 2 22:26:10 quad kernel: ata7.00: failed command: WRITE FPDMA QUEUED
Jul 2 22:26:10 quad kernel: ata7.00: cmd 61/60:d8:40:cd:85/00:00:02:00:00/40 tag 27 ncq dma 49152 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 22:26:10 quad kernel: ata7.00: status: { DRDY }
Jul 2 22:26:10 quad kernel: ata7: hard resetting link
Jul 2 22:26:10 quad kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 2 22:26:10 quad kernel: ata7.00: configured for UDMA/133
Jul 2 22:26:10 quad kernel: ata7: EH complete
...

2 DB-recoveries per day were enough so I immediately got rid of "scsi_mod.use_blk_mq=1" and since then I am using again the usual noop/deadline/cfq schedulers.
Since going back to the normal I/O schedulers everything works again perfectly, so I can exclude basic hardware problems.

Question
By looking at the output of my "lspci"...
Quote:

# lspci -k
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. Xeon E3-1200 Processor Family DRAM Controller
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
Kernel driver in use: pcieport
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
Kernel driver in use: pcieport
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: mei_me
Kernel modules: mei_me
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: ehci-pci
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
Subsystem: ASUSTeK Computer Inc. 6 Series/C200 Series Chipset Family High Definition Audio Controller
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
Kernel driver in use: pcieport
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
Kernel driver in use: pcieport
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b5)
Kernel driver in use: pcieport
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b5)
Kernel driver in use: pcieport
00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5)
Kernel driver in use: pcieport
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: ehci-pci
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C206 Chipset Family LPC Controller (rev 05)
Subsystem: ASUSTeK Computer Inc. P8B WS Motherboard
Kernel driver in use: lpc_ich
Kernel modules: lpc_ich
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: ahci
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801
01:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] GK104 [GeForce GTX 760]
Kernel driver in use: nvidia
Kernel modules: nvidia_drm, nvidia
01:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] GK104 HDMI Audio Controller
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
02:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 10)
Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller
Kernel driver in use: ahci
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: ASUSTeK Computer Inc. Motherboard
Kernel driver in use: e1000e
Kernel modules: e1000e
06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: ASUSTeK Computer Inc. Motherboard
Kernel driver in use: e1000e
Kernel modules: e1000e
07:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
Subsystem: ASUSTeK Computer Inc. P8B WS Motherboard
Kernel driver in use: xhci_hcd
08:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)
Subsystem: ASUSTeK Computer Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller

...and by looking at the output of "ls -al /sys/block/sd*"...
Quote:

# ls -al /sys/block/sd*
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sda -> ../devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sdb -> ../devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sdc -> ../devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sdc
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sdd -> ../devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0/block/sdd
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sde -> ../devices/pci0000:00/0000:00:06.0/0000:02:00.0/ata7/host6/target6:0:0/6:0:0:0/block/sde
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sdf -> ../devices/pci0000:00/0000:00:06.0/0000:02:00.0/ata9/host8/target8:0:0/8:0:0:0/block/sdf


...would you agree that mooooooost probably the old Marvell 88SE9230 disk controller has a problem with the kind-of-new multiqueue schedulers?
Or maybe I'm overlooking something?

I did not take into consideration complications caused by "bcache" because:
- on one hand "sde" DID have a problem even if not involved in any way with "bcache"
- on the other hand "sda" DID NOT have a problem even if being involved in the "bcache" setup.
Even by taking into consideration the load (MBs or I/O activity), "sde" (logs, impacted) has a much lighter load than "sda" (read & write cache, not impacted) => if the issue would be caused by the "load" on the device then I would rather expect the opposite.

I do have a new disk controller ready to be installed (a serious one - "LSI SAS 9207-8i") which I will definitely test, but I wanted to at least post this as I could not find anything related to this in google/duckduckgo/bing.
Thx for reading :D
Cheers :D
Back to top
View user's profile Send private message
Display posts from previous:   
Reply to topic    Gentoo Forums Forum Index Kernel & Hardware All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum