Gentoo Forums
Periodic I/O problems + growing UDMA_CRC_Error_Count
bat0r
n00b


Joined: 09 Mar 2005
Posts: 30
Location: Moscow

Posted: Sun Jun 24, 2012 4:47 pm    Post subject: Periodic I/O problems + growing UDMA_CRC_Error_Count

# uname -a
Linux myserver 3.3.4-gentoo #1 SMP Wed May 9 22:58:59 MSK 2012 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz GenuineIntel GNU/Linux

From time to time, disk I/O problems appear after the server starts.

While the problem is occurring:

# hdparm -tT /dev/sda

/dev/sda:
Timing cached reads: 2 MB in 5.68 seconds = 360.54 kB/sec
Timing buffered disk reads: 2 MB in 12.76 seconds = 160.50 kB/sec


/var/log/messages:

Jun 24 19:46:32 localhost kernel: res 51/84:00:88:dc:51/00:00:00:00:00/e0 Emask 0x12 (ATA bus error)
Jun 24 19:46:32 localhost kernel: ata3.00: status: { DRDY ERR }
Jun 24 19:46:32 localhost kernel: ata3.00: error: { ICRC ABRT }
Jun 24 19:46:32 localhost kernel: ata3: hard resetting link
Jun 24 19:46:32 localhost kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 24 19:46:32 localhost kernel: ata3.00: configured for UDMA/33
Jun 24 19:46:32 localhost kernel: ata3: EH complete
Jun 24 19:46:32 localhost kernel: ata3.00: exception Emask 0x12 SAct 0x0 SErr 0x7a0601 action 0x6
Jun 24 19:46:32 localhost kernel: ata3.00: BMDMA stat 0x4
Jun 24 19:46:32 localhost kernel: ata3: SError: { RecovData Persist Proto PHYInt 10B8B Dispar BadCRC Handshk }
Jun 24 19:46:32 localhost kernel: ata3.00: failed command: READ DMA
Jun 24 19:46:32 localhost kernel: ata3.00: cmd c8/00:20:48:99:54/00:00:00:00:00/e0 tag 0 dma 16384 in
Jun 24 19:46:32 localhost kernel: res 51/84:00:48:99:54/00:00:00:00:00/e0 Emask 0x12 (ATA bus error)
Jun 24 19:46:32 localhost kernel: ata3.00: status: { DRDY ERR }
Jun 24 19:46:32 localhost kernel: ata3.00: error: { ICRC ABRT }
Jun 24 19:46:32 localhost kernel: ata3: hard resetting link
Jun 24 19:46:32 localhost kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 24 19:46:32 localhost kernel: ata3.00: configured for UDMA/33
Jun 24 19:46:32 localhost kernel: ata3: EH complete
Jun 24 19:46:32 localhost kernel: sis900.c: v1.08.10 Apr. 2 2006
Jun 24 19:46:32 localhost kernel: 0000:00:04.0: Realtek RTL8201 PHY transceiver found at address 1.
Jun 24 19:46:32 localhost kernel: 0000:00:04.0: Using transceiver found at address 1 as default
Jun 24 19:46:32 localhost kernel: eth0: SiS 900 PCI Fast Ethernet at 0xd800, IRQ 19, 00:15:f2:e9:19:7c
Jun 24 19:46:32 localhost udevd[659]: renamed network interface eth0 to eth1
Jun 24 19:46:32 localhost kernel: ata3.00: exception Emask 0x12 SAct 0x0 SErr 0x7a0601 action 0x6
Jun 24 19:46:32 localhost kernel: ata3.00: BMDMA stat 0x5
Jun 24 19:46:32 localhost kernel: ata3: SError: { RecovData Persist Proto PHYInt 10B8B Dispar BadCRC Handshk }
Jun 24 19:46:32 localhost kernel: ata3.00: failed command: READ DMA
Jun 24 19:46:32 localhost kernel: ata3.00: cmd c8/00:40:b8:dc:51/00:00:00:00:00/e0 tag 0 dma 32768 in
Jun 24 19:46:32 localhost kernel: res 51/84:2f:b8:dc:51/00:00:00:00:00/e0 Emask 0x12 (ATA bus error)
Jun 24 19:46:32 localhost kernel: ata3.00: status: { DRDY ERR }
Jun 24 19:46:32 localhost kernel: ata3.00: error: { ICRC ABRT }
Jun 24 19:46:32 localhost kernel: ata3: hard resetting link
Jun 24 19:46:32 localhost kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 24 19:46:32 localhost kernel: ata3.00: configured for UDMA/33
Jun 24 19:46:32 localhost kernel: ata3: EH complete
Jun 24 19:46:32 localhost kernel: 3c59x: Donald Becker and others.
Jun 24 19:46:32 localhost kernel: 0000:00:09.0: 3Com PCI 3c905C Tornado at f87cc800.
Jun 24 19:46:32 localhost kernel: ata3.00: exception Emask 0x12 SAct 0x0 SErr 0x7a0601 action 0x6
Jun 24 19:46:32 localhost kernel: ata3.00: BMDMA stat 0x4
Jun 24 19:46:32 localhost kernel: ata3: SError: { RecovData Persist Proto PHYInt 10B8B Dispar BadCRC Handshk }
Jun 24 19:46:32 localhost kernel: ata3.00: failed command: READ DMA
Jun 24 19:46:32 localhost kernel: ata3.00: cmd c8/00:08:00:98:81/00:00:00:00:00/e0 tag 0 dma 4096 in
Jun 24 19:46:32 localhost kernel: res 51/84:00:00:98:81/00:00:00:00:00/e0 Emask 0x12 (ATA bus error)
Jun 24 19:46:32 localhost kernel: ata3.00: status: { DRDY ERR }
Jun 24 19:46:32 localhost kernel: ata3.00: error: { ICRC ABRT }
Jun 24 19:46:32 localhost kernel: ata3: hard resetting link
Jun 24 19:46:32 localhost kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 24 19:46:32 localhost kernel: ata3.00: configured for UDMA/33
Jun 24 19:46:32 localhost kernel: ata3: EH complete

......
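
To see at a glance which ATA device the CRC errors land on, the log can be tallied. This is only a rough sketch: the function name `count_icrc_errors` is made up for illustration, and it assumes the kernel lines have the same shape as in the excerpt above (`ataN.NN: error: { ICRC ... }`).

```shell
# Count "ICRC" error lines per ATA device in kernel log text read from stdin.
# Assumption: lines look like
#   "... kernel: ata3.00: error: { ICRC ABRT }"
count_icrc_errors() {
    awk '
        /ata[0-9]+\.[0-9]+: error:.*ICRC/ {
            # Pull out the "ataN.NN" device name and tally it.
            if (match($0, /ata[0-9]+\.[0-9]+/)) {
                count[substr($0, RSTART, RLENGTH)]++
            }
        }
        END { for (dev in count) print dev, count[dev] }
    ' | sort
}

# Possible usage (the log path is the one quoted in this post):
#   count_icrc_errors < /var/log/messages
```

Each output line is a device name followed by the number of ICRC error lines seen for it, which makes it easier to compare ports if more than one is affected.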


After a reboot everything is OK.

# hdparm -tT /dev/sda

/dev/sda:
Timing cached reads: 704 MB in 2.00 seconds = 351.22 MB/sec
Timing buffered disk reads: 286 MB in 3.01 seconds = 95.00 MB/sec


After a few reboots the problem comes back as soon as the server starts: the server takes about 10 minutes to boot, then there are severe I/O delays and performance drops sharply. The problem does not depend on whether I start the server manually or via WoL.

I replaced the SATA cable for the /dev/sda disk, but the problem remained.

Moreover, the I/O degradation shows up in the same way on the PATA disk (/dev/sdb).
The SATA/PATA controller is SiS.

# lspci|grep IDE
00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 IDE Controller (rev 01)
00:05.0 IDE interface: Silicon Integrated Systems [SiS] SATA (rev 01)




# smartctl -i /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.3.4-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD20EARS-00MVWB0
LU WWN Device Id: 5 0014ee 25b018e83
Firmware Version: 51.0AB51
User Capacity: 2 000 394 706 432 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Jun 24 20:01:33 2012 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

# smartctl -i /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.3.4-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K250
Device Model: HDS722525VLAT80
Firmware Version: V36OA6MA
User Capacity: 250 058 268 160 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 3a
Local Time is: Sun Jun 24 20:02:15 2012 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


The SATA disk is a WDC Caviar Green, famous for its "intelligent" green habit of frequently parking the heads to reduce power consumption and improve our planet's climate. Its parking timeout is set to 8 seconds, the WD default (the "idle3" timeout value).

"This timeout controls how often the drive parks its heads and enters a low power consumption state. "

I changed this timeout to 300 seconds, but the problem remained.

# hdparm -J /dev/sda

/dev/sda:
wdidle3 = 300 secs (or 13.8 secs for older drives)



myserver ~ # smartctl -A /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.3.4-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 060 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 096 096 024 Pre-fail Always - 347 (Average 377)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 448
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 020 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 13011
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 448
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 998
193 Load_Cycle_Count 0x0012 100 100 050 Old_age Always - 998
194 Temperature_Celsius 0x0002 130 130 000 Old_age Always - 42 (Min/Max 16/5
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 84588

myserver ~ # smartctl -A /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.3.4-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 253 186 021 Pre-fail Always - 1025
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 53
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 255
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 46
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 28
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2576
194 Temperature_Celsius 0x0022 112 101 000 Old_age Always - 38
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 155 000 Old_age Always - 4811
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
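
Since the raw value of attribute 199 is a running counter, what matters is whether it keeps growing, not its absolute size. A minimal sketch for checking that, assuming the `smartctl -A` column layout shown above (raw value in the tenth column); the helper names `crc_raw_value` and `crc_growth` are hypothetical.

```shell
# Print the RAW_VALUE of attribute 199 (UDMA_CRC_Error_Count) from
# `smartctl -A` output on stdin. Assumes the 10-column layout shown above.
crc_raw_value() {
    awk '$1 == 199 && $2 == "UDMA_CRC_Error_Count" { print $10 }'
}

# Print how much the counter grew between two saved readings.
crc_growth() {
    before=$1
    after=$2
    echo $((after - before))
}

# Possible usage:
#   before=$(smartctl -A /dev/sda | crc_raw_value)
#   ... put some read load on the disk ...
#   after=$(smartctl -A /dev/sda | crc_raw_value)
#   crc_growth "$before" "$after"
```

A result of 0 after a period of disk activity would suggest the link was clean during that time; a positive number means new CRC errors were logged by the drive.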


Everywhere I read that this problem is related either to the transport between the disk's controller and the controller on the motherboard (or expansion card), or to a buggy driver.

The first is not confirmed, since the cable was replaced.

Configured in the kernel:

CONFIG_SATA_SIS:

This option enables support for SiS Serial ATA on
SiS 964/965/966/180 and Parallel ATA on SiS 180.
The PATA support for SiS 180 requires additionally to
enable the PATA_SIS driver in the config.
If unsure, say N.

Symbol: SATA_SIS [=y]


CONFIG_PATA_SIS:

This option enables support for SiS PATA controllers

If unsure, say N.

Symbol: PATA_SIS [=y]
Type : tristate
Prompt: SiS PATA support
Defined at drivers/ata/Kconfig:663
Depends on: ATA [=y] && ATA_SFF [=y] && ATA_BMDMA [=y] && PCI [=y]
Location:
-> Device Drivers
-> Serial ATA and Parallel ATA drivers (ATA [=y])
-> ATA SFF support (ATA_SFF [=y])
-> ATA BMDMA support (ATA_BMDMA [=y])
Selected by: SATA_SIS [=y] && ATA [=y] && ATA_SFF [=y] && ATA_BMDMA [=y] && PCI [=y]
burik666
n00b


Joined: 28 Jan 2007
Posts: 51
Location: Saint Petersburg

Posted: Thu Jun 28, 2012 2:36 am

I had a similar problem; I moved the HDD to a different SATA port. If the disk is fine, the SATA controller may have died.
_________________
Linux for you
gentoo.bloodhost.ru Gentoo mirror (Russia, Saint-Petersburg)