mrray n00b
Joined: 02 Feb 2011 Posts: 4
|
Posted: Wed May 01, 2013 6:26 pm Post subject: Gentoo KVM guest loses disk access |
|
|
Hi all! (Long time, no see)
Anyway, I have a Gentoo guest running on a KVM host at a cloud provider.
The guest resides on an SSD array, which gives massive performance on a server that runs semi-I/O-intensive services such as amavisd-new, among other things.
The problem, however, is that the guest keeps losing disk access at random intervals. It can take 4 weeks, but it has also happened after as little as 8 hours of uptime.
A reboot solves the immediate problem, but it means I have to be available to do just that, and I like (and need) my beauty sleep.
The provider's support department has been very forthcoming on this issue and has made some configuration changes to the virtual hardware, but they also suggested I head over here and ask whether anyone has seen the same problem with Gentoo KVM guests.
Here is the kernel log:
Code: |
Apr 20 03:07:21 [kernel] [367916.497177] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 20 03:07:21 [kernel] [367916.497184] ata1.00: failed command: WRITE DMA
Apr 20 03:07:21 [kernel] [367916.497188] ata1.00: cmd ca/00:08:5b:f3:e9/00:00:00:00:00/e2 tag 0 dma 4096 out
Apr 20 03:07:21 [kernel] [367916.497188] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Apr 20 03:07:21 [kernel] [367916.497190] ata1.00: status: { DRDY }
Apr 20 03:07:21 [kernel] [367916.502915] ata1: soft resetting link
Apr 20 03:07:21 [kernel] [367916.654706] ata1.01: NODEV after polling detection
Apr 20 03:07:21 [kernel] [367916.655703] ata1.00: configured for MWDMA2
Apr 20 03:07:21 [kernel] [367916.655711] ata1.00: device reported invalid CHS sector 0
Apr 20 03:07:21 [kernel] [367916.655728] sd 0:0:0:0: [sda]
Apr 20 03:07:21 [kernel] [367916.655730] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 20 03:07:21 [kernel] [367916.655731] sd 0:0:0:0: [sda]
Apr 20 03:07:21 [kernel] [367916.655733] Sense Key : Aborted Command [current] [descriptor]
Apr 20 03:07:21 [kernel] [367916.655735] Descriptor sense data with sense descriptors (in hex):
Apr 20 03:07:21 [kernel] [367916.655736] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 20 03:07:21 [kernel] [367916.655740] 00 00 00 00
Apr 20 03:07:21 [kernel] [367916.655743] sd 0:0:0:0: [sda]
Apr 20 03:07:21 [kernel] [367916.655744] Add. Sense: No additional sense information
Apr 20 03:07:21 [kernel] [367916.655746] sd 0:0:0:0: [sda] CDB:
Apr 20 03:07:21 [kernel] [367916.655746] Write(10): 2a 00 02 e9 f3 5b 00 00 08 00
Apr 20 03:07:21 [kernel] [367916.655751] end_request: I/O error, dev sda, sector 48886619
Apr 20 03:07:21 [kernel] [367916.655754] Buffer I/O error on device sda3, logical block 103
Apr 20 03:07:21 [kernel] [367916.655755] lost page write due to I/O error on sda3
Apr 20 03:07:21 [kernel] [367916.655772] ata1: EH complete
Apr 20 03:07:21 [kernel] [367916.829822] REISERFS abort (device sda3): Journal write error in flush_commit_list
Apr 20 03:10:08 [kernel] [367988.070159] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 20 03:10:08 [kernel] [367988.070165] ata1.00: failed command: WRITE DMA
Apr 20 03:10:08 [kernel] [367988.070169] ata1.00: cmd ca/00:08:04:fb:00/00:00:00:00:00/e1 tag 0 dma 4096 out
Apr 20 03:10:08 [kernel] [367988.070169] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Apr 20 03:10:08 [kernel] [367988.070171] ata1.00: status: { DRDY }
Apr 20 03:10:08 [kernel] [367988.070288] ata1: soft resetting link
Apr 20 03:10:08 [kernel] [367988.221587] ata1.01: NODEV after polling detection
Apr 20 03:10:08 [kernel] [367988.222453] ata1.00: configured for MWDMA2
Apr 20 03:10:08 [kernel] [367988.222458] ata1.00: device reported invalid CHS sector 0
Apr 20 03:10:08 [kernel] [367988.222474] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [367988.222476] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 20 03:10:08 [kernel] [367988.222478] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [367988.222480] Sense Key : Aborted Command [current] [descriptor]
Apr 20 03:10:08 [kernel] [367988.222483] Descriptor sense data with sense descriptors (in hex):
Apr 20 03:10:08 [kernel] [367988.222484] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 20 03:10:08 [kernel] [367988.222490] 00 00 00 00
Apr 20 03:10:08 [kernel] [367988.222493] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [367988.222495] Add. Sense: No additional sense information
Apr 20 03:10:08 [kernel] [367988.222497] sd 0:0:0:0: [sda] CDB:
Apr 20 03:10:08 [kernel] [367988.222498] Write(10): 2a 00 01 00 fb 04 00 00 08 00
Apr 20 03:10:08 [kernel] [367988.222504] end_request: I/O error, dev sda, sector 16841476
Apr 20 03:10:08 [kernel] [367988.222507] Buffer I/O error on device sda2, logical block 2097152
Apr 20 03:10:08 [kernel] [367988.222508] lost page write due to I/O error on sda2
Apr 20 03:10:08 [kernel] [367988.222527] ata1: EH complete
Apr 20 03:10:08 [kernel] [368034.360717] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 20 03:10:08 [kernel] [368034.360723] ata1.00: failed command: WRITE DMA
Apr 20 03:10:08 [kernel] [368034.360727] ata1.00: cmd ca/00:08:14:fd:10/00:00:00:00:00/e2 tag 0 dma 4096 out
Apr 20 03:10:08 [kernel] [368034.360727] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Apr 20 03:10:08 [kernel] [368034.360729] ata1.00: status: { DRDY }
Apr 20 03:10:08 [kernel] [368034.360848] ata1: soft resetting link
Apr 20 03:10:08 [kernel] [368034.512510] ata1.01: NODEV after polling detection
Apr 20 03:10:08 [kernel] [368034.513452] ata1.00: configured for MWDMA2
Apr 20 03:10:08 [kernel] [368034.513456] ata1.00: device reported invalid CHS sector 0
Apr 20 03:10:08 [kernel] [368034.513471] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [368034.513473] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 20 03:10:08 [kernel] [368034.513474] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [368034.513476] Sense Key : Aborted Command [current] [descriptor]
Apr 20 03:10:08 [kernel] [368034.513478] Descriptor sense data with sense descriptors (in hex):
Apr 20 03:10:08 [kernel] [368034.513479] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 20 03:10:08 [kernel] [368034.513483] 00 00 00 00
Apr 20 03:10:08 [kernel] [368034.513485] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [368034.513486] Add. Sense: No additional sense information
Apr 20 03:10:08 [kernel] [368034.513488] sd 0:0:0:0: [sda] CDB:
Apr 20 03:10:08 [kernel] [368034.513489] Write(10): 2a 00 02 10 fd 14 00 00 08 00
Apr 20 03:10:08 [kernel] [368034.513494] end_request: I/O error, dev sda, sector 34667796
Apr 20 03:10:08 [kernel] [368034.513502] Buffer I/O error on device sda2, logical block 4325442
Apr 20 03:10:08 [kernel] [368034.513503] lost page write due to I/O error on sda2
Apr 20 03:10:08 [kernel] [368034.513520] ata1: EH complete
Apr 20 03:10:08 [kernel] [368084.435738] ata1.00: limiting speed to MWDMA1:PIO2
Apr 20 03:10:08 [kernel] [368084.435744] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 20 03:10:08 [kernel] [368084.435748] ata1.00: failed command: WRITE DMA
Apr 20 03:10:08 [kernel] [368084.435753] ata1.00: cmd ca/00:08:b4:10:39/00:00:00:00:00/e2 tag 0 dma 4096 out
Apr 20 03:10:08 [kernel] [368084.435753] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Apr 20 03:10:08 [kernel] [368084.435755] ata1.00: status: { DRDY }
Apr 20 03:10:08 [kernel] [368084.435876] ata1: soft resetting link
Apr 20 03:10:08 [kernel] [368084.587551] ata1.01: NODEV after polling detection
Apr 20 03:10:08 [kernel] [368084.588471] ata1.00: configured for MWDMA1
Apr 20 03:10:08 [kernel] [368084.588476] ata1.00: device reported invalid CHS sector 0
Apr 20 03:10:08 [kernel] [368084.588493] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [368084.588494] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 20 03:10:08 [kernel] [368084.588496] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [368084.588497] Sense Key : Aborted Command [current] [descriptor]
Apr 20 03:10:08 [kernel] [368084.588501] Descriptor sense data with sense descriptors (in hex):
Apr 20 03:10:08 [kernel] [368084.588502] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 20 03:10:08 [kernel] [368084.588508] 00 00 00 00
Apr 20 03:10:08 [kernel] [368084.588511] sd 0:0:0:0: [sda]
Apr 20 03:10:08 [kernel] [368084.588512] Add. Sense: No additional sense information
Apr 20 03:10:08 [kernel] [368084.588514] sd 0:0:0:0: [sda] CDB:
Apr 20 03:10:08 [kernel] [368084.588515] Write(10): 2a 00 02 39 10 b4 00 00 08 00
Apr 20 03:10:08 [kernel] [368084.588524] Buffer I/O error on device sda2, logical block 4653750
Apr 20 03:10:08 [kernel] [368084.588526] lost page write due to I/O error on sda2
Apr 20 03:10:08 [kernel] [368084.588548] ata1: EH complete
|
I originally thought it was a hardware issue, but the provider seems to think differently, and I just have to go with it since I don't have access to the hardware...
Any ideas? |
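A quick cross-check for anyone reading the log above: each failed WRITE DMA line carries the ATA taskfile registers, and for a 28-bit command the target LBA can be rebuilt from them. The Python sketch below assumes the libata `cmd` line layout is command/feature:nsect:lbal:lbam:lbah/(five high-order bytes)/device; the regex and the function name `decode_lba28` are made up for illustration and are not part of any tool.

```python
import re

# Assumed layout of the libata "cmd" line:
#   cmd CMD/FEATURE:NSECT:LBAL:LBAM:LBAH/<5 high-order bytes>/DEVICE
TASKFILE_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/(?P<feature>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/(?P<device>[0-9a-f]{2})"
)


def decode_lba28(line):
    """Return command, sector count and 28-bit LBA from a libata cmd line."""
    match = TASKFILE_RE.search(line)
    if match is None:
        return None
    regs = {name: int(value, 16) for name, value in match.groupdict().items()}
    # LBA bits 27:24 live in the low nibble of the device register.
    lba = (
        ((regs["device"] & 0x0F) << 24)
        | (regs["lbah"] << 16)
        | (regs["lbam"] << 8)
        | regs["lbal"]
    )
    # A sector count of 0 means 256 sectors for 28-bit commands.
    sectors = regs["nsect"] or 256
    return {"command": regs["cmd"], "sectors": sectors, "lba": lba}


if __name__ == "__main__":
    sample = "ata1.00: cmd ca/00:08:5b:f3:e9/00:00:00:00:00/e2 tag 0 dma 4096 out"
    print(decode_lba28(sample))
```

For the first failure this yields LBA 48886619, the same sector reported in the `end_request: I/O error` line, and the next two failures decode to 16841476 and 34667796. The timed-out writes are therefore 8-sector (4096-byte) writes at widely separated addresses, not repeated hits on one spot.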
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54244 Location: 56N 3W
|
Posted: Wed May 01, 2013 9:09 pm Post subject: |
|
|
mrray,
It looks like a HDD or HDD data cable issue.
The SMART error log would be useful.
I guess the storage your provider presents to your KVM guest as sda is spread over a lot of physical devices.
I'm surprised to see your storage appear as /dev/sda too. That suggests you are working through the emulated hardware that KVM provides.
The virtio driver is faster but your provider may not want to use that. Your block devices would be /dev/vda ... then.
Bugs in KVM cannot be ruled out. Search the kernel bugtracker.
Will your KVM provider provide storage access via virtio?
That may help narrow down the problem.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
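To see at a glance which kind of disk the guest is using, the device names under /sys/block can be sorted by prefix. This is a minimal Python sketch; the function name and the category labels are invented for illustration, and the prefix test is only a heuristic (a disk attached through virtio-scsi, for example, would still show up as sd*).

```python
import os


def classify_block_devices(names=None):
    """Group block device names by the naming scheme they follow."""
    if names is None:
        # On a running Linux guest, every block device has an entry here.
        names = sorted(os.listdir("/sys/block"))
    kinds = {}
    for name in names:
        if name.startswith("xvd"):
            kinds[name] = "xen"        # Xen paravirtual disk
        elif name.startswith("vd"):
            kinds[name] = "virtio"     # virtio-blk disk
        elif name.startswith(("sd", "hd")):
            kinds[name] = "emulated-or-scsi"
        else:
            kinds[name] = "other"      # loop, sr, dm, ...
    return kinds


if __name__ == "__main__":
    for device, kind in classify_block_devices().items():
        print(f"{device}: {kind}")
```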
|
|
|
mrray n00b
Joined: 02 Feb 2011 Posts: 4
|
Posted: Thu May 02, 2013 9:00 am Post subject: |
|
|
NeddySeagoon wrote: | mrray,
It looks like a HDD or HDD data cable issue.
The SMART error log would be useful. |
I installed smartmontools just now, so I will see what I can cough up and keep you posted.
NeddySeagoon wrote: | I'm surprised to see your storage appear as /dev/sda too. That suggests you are working through the emulated hardware that KVM provides.
The virtio driver is faster but your provider may not want to use that. Your block devices would be /dev/vda ... then.
Bugs in KVM cannot be ruled out. Search the kernel bugtracker.
Will your KVM provider provide storage access via virtio?
That may help narrow down the problem. |
I have to do a bit of guesswork here as I have very limited experience with the KVM hypervisor, but this VM was converted from its original Xen to KVM when I migrated it onto SSD storage.
I am quite sure the provider can give me access via virtio, but wouldn't that require a lot of work on my part? Gentoo and virtio do not seem to be good friends, at least that is what I deduced from a quick Google search. |
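On the "lot of work" question, the guest-side check is mostly about the kernel configuration. The Python sketch below assumes the two options normally needed for a virtio disk on KVM are CONFIG_VIRTIO_PCI and CONFIG_VIRTIO_BLK, and that you pass it the text of your kernel config (for example the contents of /usr/src/linux/.config). The helper name `virtio_status` is hypothetical.

```python
# Assumption: these two options are what a KVM guest typically needs
# in order to see a virtio-blk disk.
REQUIRED = ("CONFIG_VIRTIO_PCI", "CONFIG_VIRTIO_BLK")


def virtio_status(config_text):
    """Map each required option to 'y', 'm', or 'n' (unset or missing)."""
    status = {option: "n" for option in REQUIRED}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines and blanks carry no value.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in status:
            status[key] = value
    return status


if __name__ == "__main__":
    with open("/usr/src/linux/.config") as handle:
        for option, value in virtio_status(handle.read()).items():
            print(f"{option}={value}")
```

If both options come back as `y`, the remaining change is usually the device naming: references to /dev/sda* in /etc/fstab and in the bootloader's `root=` parameter would become /dev/vda*. If they are built as modules (`m`) and the root filesystem is on the virtio disk, an initramfs that loads them would be needed as well.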
|