bfdi533 Tux's lil' helper
Joined: 11 Jun 2003 Posts: 133
|
Posted: Mon Dec 15, 2008 8:20 pm Post subject: 2.6.25-gentoo-r9 is VERY slow [Solved] |
|
|
I just upgraded my kernel from 2.6.23-gentoo-r9 to 2.6.25-gentoo-r9.
Now that I have done this, every time a program starts, it has a 20-second pause before the program runs, longer if it is an X app.
What information do I need to share to help debug this slowdown?
Any ideas on why this is happening would be GREATLY appreciated.
Last edited by bfdi533 on Wed Dec 24, 2008 7:17 pm; edited 1 time in total |
|
|
|
mgrela Tux's lil' helper
Joined: 26 Jul 2008 Posts: 123 Location: Polska
|
Posted: Mon Dec 15, 2008 8:45 pm Post subject: |
|
|
Run the slow-starting program under "strace".
You may be able to spot the syscall that causes the wait and thus locate the problem. _________________ Maciej Grela
You just keep on trying till you run out of cake. |
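One hedged way to act on the strace suggestion: strace's -T option appends the time spent in each syscall as <seconds> at the end of the line, so a trace saved to a file can be scanned for the slow calls. The sketch below assumes that output format. The function name slow_syscalls, the 0.5-second threshold and the sample timings are illustrative only and do not come from this thread.

```python
import re

# Matches the "<0.000123>" duration that "strace -T" appends to each line.
DURATION_RE = re.compile(r"<(\d+\.\d+)>\s*$")


def slow_syscalls(trace_text, threshold=0.5):
    """Return (seconds, line) pairs for syscalls at or above threshold, slowest first."""
    hits = []
    for line in trace_text.splitlines():
        match = DURATION_RE.search(line)
        if match and float(match.group(1)) >= threshold:
            hits.append((float(match.group(1)), line.strip()))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample in "strace -T" format; the timings are invented.
    sample = '''open("/var/log", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3 <0.000031>
getdents64(3, /* 56 entries */, 4096) = 1800 <9.183204>
close(3) = 0 <0.000012>'''
    for seconds, line in slow_syscalls(sample):
        print(f"{seconds:9.6f}s  {line}")
```

A call that blocks on the disk should stand out by orders of magnitude, which is easier to spot this way than by reading an untimed trace.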
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54220 Location: 56N 3W
|
Posted: Mon Dec 15, 2008 9:43 pm Post subject: |
|
|
bfdi533,
You are probably missing DMA for your hard drive.
Please report what hdparm /dev/... shows.
If it shows DMA is off, also post your lspci output so we can describe how to fix it. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
bfdi533 Tux's lil' helper
Joined: 11 Jun 2003 Posts: 133
|
Posted: Mon Dec 15, 2008 10:18 pm Post subject: |
|
|
mgrela, not really showing anything significant that I can tell with strace. top shows 80-95% id -- not sure what "id" is though.
NeddySeagoon, here is the data requested:
Code: |
localhost ~ # hdparm /dev/hda
/dev/hda:
multcount = 16 (on)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 65535/16/63, sectors = 78165360, start = 0
localhost ~ # hdparm /dev/hdb
/dev/hdb:
multcount = 16 (on)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 65535/16/63, sectors = 78198750, start = 0
localhost ~ # lspci
00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82875P Processor to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV18GL [Quadro NVS with AGP8X] (rev a2)
02:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
localhost ~ #
|
|
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54220 Location: 56N 3W
|
Posted: Mon Dec 15, 2008 10:26 pm Post subject: |
|
|
bfdi533,
id in top is idle.
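For anyone who wants to see where top gets those numbers: the idle ("id") and I/O wait ("wa") percentages come from the tick counters on the first line of /proc/stat. The sketch below assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). The helper names are hypothetical, and it only approximates what top reports.

```python
import os
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which holds the aggregate counters."""
    with open(path) as stat_file:
        return stat_file.readline()


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = parse_cpu_line(read_cpu_line())
        time.sleep(1)  # sample interval, chosen arbitrarily
        second = parse_cpu_line(read_cpu_line())
        usage = cpu_percentages(first, second)
        print(f"idle {usage['idle']:.1f}%  iowait {usage['iowait']:.1f}%")
    else:
        print("/proc/stat is not available on this system")
```

A high iowait share with a low user and system share points at processes waiting on the disk rather than at a busy CPU.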
You have two drive controllers there:
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
With your hardware and that kernel, I would move to the libata driver, like this
It's not clear whether you have two IDE drives on the IDE controller, in which case it looks to be OK, or two SATA drives on the SATA controller running with the old deprecated IDE SATA driver. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
bfdi533 Tux's lil' helper
Joined: 11 Jun 2003 Posts: 133
|
Posted: Wed Dec 17, 2008 5:56 pm Post subject: |
|
|
NeddySeagoon wrote: | With your hardware and that kernel, I would move to the libata driver, like this
It's not clear whether you have two IDE drives on the IDE controller, in which case it looks to be OK, or two SATA drives on the SATA controller running with the old deprecated IDE SATA driver. |
I followed those directions and that seems a lot cleaner to me in the long run.
However, the system is now slower than before as you can see here:
Code: | user@localhost ~ $ time ls /var/log
XFree86.0.log cups messages.1.gz ntpd.log
XFree86.0.log.old dmesg messages.2.gz portage
XFree86.20.log dmesg.20050729 messages.3.gz python-updater.log
XFree86.8.log emerge-sync.log messages.4.gz remote
XFree86.8.log.old emerge.log messages.5.gz samba
Xorg.0.log emerge.log.20080427 messages.6.gz sandbox
Xorg.0.log.old emerge_fix-db.log messages.7.gz scrollkeeper.log.1.gz
Xorg.8.log faillog messages.8.gz smsclient.log
Xorg.8.log.old g-cpan messages.9.gz tor
apache2 galleon mysql wtmp
boinc.log gdm mythtv wtmp.1.gz
boinc.log.old genkernel.log news xdm.log
boot.dmesg lastlog nmap-out.log
btmp messages ntp.log
real 0m9.206s
user 0m0.000s
sys 0m0.000s
user@localhost ~ $ time ls /var/log
XFree86.0.log cups messages.1.gz ntpd.log
XFree86.0.log.old dmesg messages.2.gz portage
XFree86.20.log dmesg.20050729 messages.3.gz python-updater.log
XFree86.8.log emerge-sync.log messages.4.gz remote
XFree86.8.log.old emerge.log messages.5.gz samba
Xorg.0.log emerge.log.20080427 messages.6.gz sandbox
Xorg.0.log.old emerge_fix-db.log messages.7.gz scrollkeeper.log.1.gz
Xorg.8.log faillog messages.8.gz smsclient.log
Xorg.8.log.old g-cpan messages.9.gz tor
apache2 galleon mysql wtmp
boinc.log gdm mythtv wtmp.1.gz
boinc.log.old genkernel.log news xdm.log
boot.dmesg lastlog nmap-out.log
btmp messages ntp.log
real 0m0.003s
user 0m0.000s
sys 0m0.010s
user@localhost ~ $ |
However, I have managed to isolate one factor. When accessing a file or "area of disk" that I have not used before, it is VERY slow. But if I do the same thing, or a similar thing, again, it is "normal" the next and subsequent times. See the second ls "run" above. Don't even think about something like emerge as it is now; it would take 30 minutes or so just to read the portage dependencies.
As it is now, my system has been up for 13 minutes but the startup/init scripts are still running.
Seems like some sort of cache issue. Does any of that make sense? |
|
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9677 Location: almost Mile High in the USA
|
Posted: Wed Dec 17, 2008 6:19 pm Post subject: |
|
|
Is your hard drive making strange noises or otherwise failing? Any SMART issues?
Are you -sure- there are no background tasks running, and does the old kernel exhibit proper behavior?
I'm having a hard time believing that any kernel change would cause a 9 second directory listing. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
|
|
bfdi533 Tux's lil' helper
Joined: 11 Jun 2003 Posts: 133
|
Posted: Wed Dec 17, 2008 6:31 pm Post subject: |
|
|
eccerr0r wrote: | Is your hard drive making strange noises or otherwise failing? Any SMART issues?
Are you -sure- there are no background tasks running, and does the old kernel exhibit proper behavior?
I'm having a hard time believing that any kernel change would cause a 9 second directory listing. |
The reason I switched to the new kernel was that I was having trouble getting modules to load properly and I was not able to track down the problem. Even after recompiling the kernel and the modules, I was still having issues. Anyway, I switched to a newer kernel and it seemed to go okay until I started using it more and realized there was a huge lag when doing disk access. I did not notice it at first and thought it was services running and X just being slow. I turned off a few things like ossec, which seemed to fix the problem, but it only seemed that way.
There are no signs of this in the old kernel and no background tasks that I can account for. The gnome process monitor shows that the CPU stays near 80-90% most of the time but top shows this to be about 5-10%, with 80% idle or 80% wait. No idea what the gnome process monitor thinks is eating CPU.
As to smart, no issues that I am aware of:
Code: | localhost~ # smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar family
Device Model: WDC WD400BB-75AUA1
Serial Number: WD-WMA6R3065709
Firmware Version: 18.20D18
User Capacity: 40,020,664,320 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 5
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Dec 17 12:22:31 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (2040) seconds.
Offline data collection
capabilities: (0x1b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 32) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 200 198 051 Pre-fail Always - 1
3 Spin_Up_Time 0x0007 111 104 021 Pre-fail Always - 3275
4 Start_Stop_Count 0x0032 100 100 040 Old_age Always - 611
5 Reallocated_Sector_Ct 0x0032 198 198 112 Old_age Always - 7
7 Seek_Error_Rate 0x000b 200 200 051 Pre-fail Always - 0
9 Power_On_Hours 0x0032 034 034 000 Old_age Always - 48372
10 Spin_Retry_Count 0x0013 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0013 100 100 051 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 467
196 Reallocated_Event_Count 0x0032 197 197 000 Old_age Always - 3
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0012 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 1
200 Multi_Zone_Error_Rate 0x0009 200 198 051 Pre-fail Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 0 -
Device does not support Selective Self Tests/Logging
localhost~ # smartctl --all /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda ATA IV family
Device Model: ST340016A
Serial Number: 3HS2Q6ZG
Firmware Version: 3.10
User Capacity: 40,037,760,000 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 5
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Dec 17 12:26:45 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 422) seconds.
Offline data collection
capabilities: (0x1b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 31) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 070 065 034 Pre-fail Always - 232989971
3 Spin_Up_Time 0x0003 072 070 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 44
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 088 060 030 Pre-fail Always - 791011282
9 Power_On_Hours 0x0032 060 060 000 Old_age Always - 35623
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 289
194 Temperature_Celsius 0x0022 042 056 000 Old_age Always - 42
195 Hardware_ECC_Recovered 0x001a 070 064 000 Old_age Always - 232989971
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 8 -
Device does not support Selective Self Tests/Logging
localhost ~ #
|
Here is the output from strace. Maybe it will make sense to someone who can say why this is happening:
Code: | localhost ~ # cat /tmp/strace.ls
execve("/usr/bin/ls", ["ls", "/usr/src/linux"], [/* 51 vars */]) = 0
brk(0) = 0x8063000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=187402, ...}) = 0
mmap2(NULL, 187402, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f7f000
close(3) = 0
open("/lib/librt.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\30\0\0004\0\0\0\250"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=30632, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7e000
mmap2(NULL, 33356, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7f75000
mmap2(0xb7f7c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6) = 0xb7f7c000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@a\1\0004\0\0\0\314"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1237356, ...}) = 0
mmap2(NULL, 1242576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e45000
mmap2(0xb7f6f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12a) = 0xb7f6f000
mmap2(0xb7f72000, 9680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f72000
close(3) = 0
open("/lib/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 H\0\0004\0\0\0\320"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=84256, ...}) = 0
mmap2(NULL, 90592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e2e000
mmap2(0xb7e41000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13) = 0xb7e41000
mmap2(0xb7e43000, 4576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7e43000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e2d000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e2d6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7e41000, 4096, PROT_READ) = 0
mprotect(0xb7f6f000, 8192, PROT_READ) = 0
mprotect(0xb7f7c000, 4096, PROT_READ) = 0
mprotect(0x8061000, 4096, PROT_READ) = 0
mprotect(0xb7fc8000, 4096, PROT_READ) = 0
munmap(0xb7f7f000, 187402) = 0
set_tid_address(0xb7e2d708) = 16951
set_robust_list(0xb7e2d710, 0xc) = 0
rt_sigaction(SIGRTMIN, {0xb7e32320, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0xb7e323a0, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
uname({sys="Linux", node="ebdhome", ...}) = 0
brk(0) = 0x8063000
brk(0x8084000) = 0x8084000
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=25, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
stat64("/usr/src/linux", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/usr/src/linux", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
getdents64(3, /* 42 entries */, 4096) = 1312
getdents64(3, /* 0 entries */, 4096) = 0
close(3) = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 1), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fac000
write(1, "COPYING Module.symvers cr"..., 71) = 71
write(1, "CREDITS README\t driv"..., 57) = 57
write(1, "Documentation REPORTING-BUGS fs"..., 50) = 50
write(1, "Kbuild\t System.map inc"..., 57) = 57
write(1, "MAINTAINERS arch\t init\tn"..., 48) = 48
write(1, "Makefile block\t ipc\ts"..., 55) = 55
close(1) = 0
munmap(0xb7fac000, 4096) = 0
close(2) = 0
exit_group(0) = ?
localhost ~ #
|
|
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54220 Location: 56N 3W
|
Posted: Wed Dec 17, 2008 7:30 pm Post subject: |
|
|
bfdi533,
The two different times you posted are due to the kernel buffering disc reads in case the data is needed again.
Your first ls forces the kernel to read the drive; the second one only reads the in-RAM cache.
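The cold-versus-cached effect described here can be reproduced with a small timing sketch. It simply lists a directory twice and reports both durations. The function names are hypothetical, and the first run is only "cold" if the directory has not been read since boot. Forcing another cold run would mean dropping the caches as root (writing 3 to /proc/sys/vm/drop_caches), which this sketch deliberately does not do.

```python
import os
import time


def time_listing(path):
    """Return how many seconds one os.listdir() call on path takes."""
    start = time.perf_counter()
    os.listdir(path)
    return time.perf_counter() - start


def cold_and_warm(path):
    """Time two back-to-back listings; the second is normally served from RAM."""
    first = time_listing(path)
    second = time_listing(path)
    return first, second


if __name__ == "__main__":
    # /var/log mirrors the directory used in the thread; fall back to the
    # current directory where it does not exist.
    target = "/var/log" if os.path.isdir("/var/log") else "."
    first, second = cold_and_warm(target)
    print(f"first run:  {first:.6f}s")
    print(f"second run: {second:.6f}s")
```

On a healthy drive the first run should take milliseconds, not seconds, so a multi-second first run says more about the drive than about the cache.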
I'm not sure what the data in the RAW_VALUE fields indicates, but sdb is clearly in a poor state.
Seek errors cause retries to read the data. A retry costs a single revolution of the disk at minimum, sometimes several.
If it also needs the head to be recalibrated, the retry process will take a lot longer.
Hardware_ECC_Recovered errors mean the data was read from the platter incorrectly but the drive electronics were subsequently able to correct the errors.
I would suggest that sdb is dying. It's been operating for 35623 hours, which is over 4 years nonstop. It's working hard to return valid data, both with error correction and retries. What the SMART data does not tell us is whether the errors occur all over the drive surface or whether it's a small part that is read repeatedly. I'm inclined to think it's the former, as kernel caching should minimise the latter.
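As a cross-check when reading smartctl output: each attribute carries a normalized VALUE and a THRESH alongside the vendor-defined RAW_VALUE, and the smartmontools documentation treats VALUE falling to or below THRESH as the vendor-independent failure signal. The sketch below parses attribute rows in the layout posted earlier in this thread, flags any that have reached their threshold, and converts power-on hours to years. The function names are hypothetical, and the result is one input to a judgment about a drive, not a verdict on it.

```python
import re

# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Extract SMART attribute rows from 'smartctl --all' style output."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append({
                "id": int(match.group(1)),
                "name": match.group(2),
                "value": int(match.group(3)),
                "worst": int(match.group(4)),
                "thresh": int(match.group(5)),
                "type": match.group(6),
                "raw": int(match.group(9)),
            })
    return rows


def at_or_below_threshold(rows):
    """Attributes whose normalized VALUE has reached THRESH (a THRESH of 0 never trips)."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


def hours_to_years(hours):
    """Convert power-on hours to years of continuous operation."""
    return hours / (24 * 365)


if __name__ == "__main__":
    # Rows copied from the sdb output posted in this thread.
    sample = """
  1 Raw_Read_Error_Rate 0x000f 070 065 034 Pre-fail Always - 232989971
  5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
  7 Seek_Error_Rate 0x000f 088 060 030 Pre-fail Always - 791011282
  9 Power_On_Hours 0x0032 060 060 000 Old_age Always - 35623
195 Hardware_ECC_Recovered 0x001a 070 064 000 Old_age Always - 232989971
"""
    rows = parse_attributes(sample)
    for row in rows:
        print(f"{row['name']:<24} value={row['value']:3d} thresh={row['thresh']:3d}")
    print("at or below threshold:", [r["name"] for r in at_or_below_threshold(rows)])
    hours = next(r["raw"] for r in rows if r["name"] == "Power_On_Hours")
    print(f"power-on time: {hours_to_years(hours):.2f} years")
```

Raw values are encoded differently by each vendor, so comparing them across the two drives in this thread is not meaningful by itself.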
For a more thorough test, get the manufacturer's test software from their website. However, it will need to write all over the drive, so you will need to move your data off first. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
bfdi533 Tux's lil' helper
Joined: 11 Jun 2003 Posts: 133
|
Posted: Wed Dec 17, 2008 8:18 pm Post subject: |
|
|
NeddySeagoon wrote: | bfdi533,
The two different times you posted are due to the kernel buffering disc reads in case the data is needed again.
Your first ls forces the kernel to read the drive; the second one only reads the in-RAM cache. |
I can see that. Definitely makes sense why subsequent execution of stuff is faster.
NeddySeagoon wrote: | I would suggest that sdb is dying. It's been operating for 35623 hours, which is over 4 years nonstop. It's working hard to return valid data, both with error correction and retries. What the SMART data does not tell us is whether the errors occur all over the drive surface or whether it's a small part that is read repeatedly. I'm inclined to think it's the former, as kernel caching should minimise the latter. |
Obviously I am not in a position to deny that. However, sda is where my root is, and /bin/ls and /var/log are both on sda. So the failing sdb aside, which I see I need to fix, that does not explain why there is so much delay in the ls execution (and in other disk reads too). |
|
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9677 Location: almost Mile High in the USA
|
Posted: Wed Dec 17, 2008 9:26 pm Post subject: |
|
|
bfdi533 wrote: | The gnome process monitor shows that the CPU stays near 80-90% most of the time but top shows this to be about 5-10%, with 80% idle or 80% wait.
|
Just to confirm, what process is in iowait? Is it zombied? For sure something is consuming disk bandwidth.
Is your HDD LED always on? Are there messages constantly being added to your log files? Any mess in your dmesg(1)?
Is udevd being io-waited? Is it in poll mode to look for new devices? Are CONFIG_DNOTIFY and CONFIG_INOTIFY turned on? _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
|
|
bfdi533 Tux's lil' helper
Joined: 11 Jun 2003 Posts: 133
|
Posted: Wed Dec 17, 2008 9:51 pm Post subject: |
|
|
eccerr0r wrote: | bfdi533 wrote: | The gnome process monitor shows that the CPU stays near 80-90% most of the time but top shows this to be about 5-10%, with 80% idle or 80% wait.
|
Just to confirm, what process is in iowait? Is it zombied? For sure something is consuming disk bandwidth.
Is your HDD LED always on? Are there messages constantly being added to your log files? Any mess in your dmesg(1)?
Is udevd being io-waited? Is it in poll mode to look for new devices? Are CONFIG_DNOTIFY and CONFIG_INOTIFY turned on? |
I must admit that although I understand most of what you are asking, I do not know how to get you all of that info.
Config_notify:
Code: | # grep NOTIFY .config
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
|
udev:
Code: | # ps axl | grep udevd
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
5 0 8457 1 16 -4 2624 1376 - S<s ? 0:00 /sbin/udev --daemon
|
Code: | # udev.conf
# The initial syslog(3) priority: "err", "info", "debug" or its
# numerical equivalent. For runtime debugging, the daemons internal
# state can be changed with: "udevcontrol log_priority=<value>".
udev_log="err"
# If you need to change mount-options, do it in /etc/fstab
|
Dmesg does not show any problems. The log files are written to somewhat regularly but not constantly, just the normal stuff every couple of minutes like any other linux system.
How do I determine which process is in iowait and consuming this wait state? |
|
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9677 Location: almost Mile High in the USA
|
Posted: Thu Dec 18, 2008 12:15 am Post subject: |
|
|
Run 'ps ax' and look for any processes whose STAT is "Z" or "D"...
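A rough sketch of that check, for anyone who would rather not scan the list by eye: it filters ps output for processes whose state starts with D (uninterruptible sleep, usually waiting on I/O) or Z (zombie). It assumes a procps-style ps that accepts -eo pid,stat,comm. The function name blocked_processes is made up for this example.

```python
import subprocess


def blocked_processes(ps_output):
    """Return (pid, stat, command) for processes in state D or Z."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1][0] in ("D", "Z"):
            found.append((int(parts[0]), parts[1], parts[2]))
    return found


if __name__ == "__main__":
    try:
        output = subprocess.run(
            ["ps", "-eo", "pid,stat,comm"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError) as error:
        print(f"could not run ps: {error}")
    else:
        for pid, stat, command in blocked_processes(output):
            print(pid, stat, command)
```

Processes drift in and out of D state all the time, so it is worth running this several times; a process that shows up in D on every run is the one to look at.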
Also cat /proc/interrupts and see whether any interrupts are "ringing off the hook". A screwed-up USB device, perhaps? _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
|
|
bfdi533 Tux's lil' helper
Joined: 11 Jun 2003 Posts: 133
|
Posted: Wed Dec 24, 2008 7:17 pm Post subject: |
|
|
It turned out to be just the hard drive. I copied all of the contents to a new drive and replaced it and the system is now zippy again. My guess is that about the time of one of the kernel builds and reboots, the hard drive started to have issues since I KNOW it was coincident with the new kernel and reboot.
Thanks to all for the helpful tips and insight. |
|