Gentoo Forums
2.6.25-gentoo-r9 is VERY slow [Solved]
bfdi533
Tux's lil' helper


Joined: 11 Jun 2003
Posts: 133

Posted: Mon Dec 15, 2008 8:20 pm    Post subject: 2.6.25-gentoo-r9 is VERY slow [Solved]

I just upgraded my kernel from 2.6.23-gentoo-r9 to 2.6.25-gentoo-r9.

Now that I have done this, every time a program starts, it has a 20-second pause before the program runs, longer if it is an X app.

What information do I need to share to help debug this slowdown?

Any ideas on why this is happening would be GREATLY appreciated.


Last edited by bfdi533 on Wed Dec 24, 2008 7:17 pm; edited 1 time in total
mgrela
Tux's lil' helper


Joined: 26 Jul 2008
Posts: 123
Location: Polska

Posted: Mon Dec 15, 2008 8:45 pm

Run the slow-starting program under "strace", like this:

Code:

strace bash


You may be able to spot the syscall that causes the wait and thus locate the problem.
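
To make a slow call stand out, strace can print the time spent in each syscall (-T) and write its log to a file (-o). The helper below is only a minimal sketch and not something posted in this thread: slow_syscalls is a made-up name, and the 0.5-second default threshold is an arbitrary assumption. It does nothing more than filter on the <seconds> suffix that -T appends to each line.

```shell
# slow_syscalls: read `strace -T` output on stdin and print every call
# whose elapsed time exceeds a threshold in seconds (default 0.5).
# The function name and the default threshold are assumptions.
slow_syscalls() {
    threshold="${1:-0.5}"
    awk -v t="$threshold" '
        # strace -T ends each completed call with "<seconds>"
        match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            if (secs + 0 > t + 0) print secs, $0
        }'
}

# Usage sketch (assumes strace is installed):
#   strace -T -o /tmp/strace.ls ls /var/log
#   slow_syscalls 0.5 < /tmp/strace.ls
```

If nothing is printed, the delay is probably not inside a single syscall, and adding -f to follow child processes may be worth a try.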
_________________
Maciej Grela
You just keep on trying till you run out of cake.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54220
Location: 56N 3W

Posted: Mon Dec 15, 2008 9:43 pm

bfdi533,

You are probably missing DMA for your hard drive.

Please report what hdparm /dev/... shows.
If it shows DMA is off, also post your lspci output, so we can describe how to fix it.
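
When several drives are involved, it can be handy to boil the hdparm listing down to one word per drive. A minimal sketch follows; dma_state is a hypothetical helper name, and it assumes the using_dma line has the layout printed by the old IDE driver, with 0 or 1 in the third field. If that line is missing, the helper prints "unknown" rather than guessing.

```shell
# dma_state: read `hdparm <device>` output on stdin and print
# "on", "off", or "unknown" depending on the using_dma line.
# The helper name is an assumption made for this example.
dma_state() {
    awk '/using_dma/ { print ($3 == 1 ? "on" : "off"); found = 1 }
         END { if (!found) print "unknown" }'
}

# Usage sketch (as root; /dev/hda is only an example device):
#   hdparm /dev/hda | dma_state
```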
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
bfdi533
Tux's lil' helper


Joined: 11 Jun 2003
Posts: 133

Posted: Mon Dec 15, 2008 10:18 pm

mgrela, not really showing anything significant that I can tell with strace. top shows 80-95% id -- not sure what "id" is though.

NeddySeagoon, here is the data requested:

Code:

localhost ~ # hdparm /dev/hda

/dev/hda:
 multcount     = 16 (on)
 IO_support    =  1 (32-bit)
 unmaskirq     =  1 (on)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 65535/16/63, sectors = 78165360, start = 0
localhost ~ # hdparm /dev/hdb

/dev/hdb:
 multcount     = 16 (on)
 IO_support    =  1 (32-bit)
 unmaskirq     =  1 (on)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 65535/16/63, sectors = 78198750, start = 0
localhost ~ # lspci
00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82875P Processor to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV18GL [Quadro NVS with AGP8X] (rev a2)
02:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
localhost ~ #
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54220
Location: 56N 3W

Posted: Mon Dec 15, 2008 10:26 pm

bfdi533,

id in top is idle.
You have two drive controllers there:-
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)

With your hardware and that kernel, I would move to the libata driver, like this.
It's not clear whether you have two IDE drives on the IDE controller, in which case it looks to be OK, or two SATA drives on the SATA controller running with the old deprecated IDE SATA driver.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
bfdi533
Tux's lil' helper


Joined: 11 Jun 2003
Posts: 133

Posted: Wed Dec 17, 2008 5:56 pm

NeddySeagoon wrote:
With your hardware and that kernel, I would move to the libata driver, like this.
It's not clear whether you have two IDE drives on the IDE controller, in which case it looks to be OK, or two SATA drives on the SATA controller running with the old deprecated IDE SATA driver.


I followed those directions and that seems a lot cleaner to me in the long run.

However, the system is now slower than before as you can see here:

Code:
user@localhost ~ $ time ls /var/log
XFree86.0.log      cups                 messages.1.gz  ntpd.log
XFree86.0.log.old  dmesg                messages.2.gz  portage
XFree86.20.log     dmesg.20050729       messages.3.gz  python-updater.log
XFree86.8.log      emerge-sync.log      messages.4.gz  remote
XFree86.8.log.old  emerge.log           messages.5.gz  samba
Xorg.0.log         emerge.log.20080427  messages.6.gz  sandbox
Xorg.0.log.old     emerge_fix-db.log    messages.7.gz  scrollkeeper.log.1.gz
Xorg.8.log         faillog              messages.8.gz  smsclient.log
Xorg.8.log.old     g-cpan               messages.9.gz  tor
apache2            galleon              mysql          wtmp
boinc.log          gdm                  mythtv         wtmp.1.gz
boinc.log.old      genkernel.log        news           xdm.log
boot.dmesg         lastlog              nmap-out.log
btmp               messages             ntp.log

real    0m9.206s
user    0m0.000s
sys     0m0.000s
user@localhost ~ $ time ls /var/log
XFree86.0.log      cups                 messages.1.gz  ntpd.log
XFree86.0.log.old  dmesg                messages.2.gz  portage
XFree86.20.log     dmesg.20050729       messages.3.gz  python-updater.log
XFree86.8.log      emerge-sync.log      messages.4.gz  remote
XFree86.8.log.old  emerge.log           messages.5.gz  samba
Xorg.0.log         emerge.log.20080427  messages.6.gz  sandbox
Xorg.0.log.old     emerge_fix-db.log    messages.7.gz  scrollkeeper.log.1.gz
Xorg.8.log         faillog              messages.8.gz  smsclient.log
Xorg.8.log.old     g-cpan               messages.9.gz  tor
apache2            galleon              mysql          wtmp
boinc.log          gdm                  mythtv         wtmp.1.gz
boinc.log.old      genkernel.log        news           xdm.log
boot.dmesg         lastlog              nmap-out.log
btmp               messages             ntp.log

real    0m0.003s
user    0m0.000s
sys     0m0.010s
user@localhost ~ $


However, I have managed to isolate one factor. When accessing a file or "area of disk" that I have not used before, it is VERY slow. But if I do the same thing, or a similar thing, again, it is "normal" the next and subsequent times. See the second ls "run" above. Don't even think about something like emerge as it is now; it will take 30 minutes or so just to read the portage dependencies.

As it is now, my system has been up for 13 minutes but the startup/init scripts are still running.

Seems like some sort of cache issue. Does any of that make sense?
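
The gap between the two runs above can be reproduced on demand without a reboot, because the kernel exposes a drop_caches knob under /proc/sys/vm (root only, kernels 2.6.16 and later). What follows is a rough sketch under those assumptions: time_cmd is a hypothetical helper that reports whole seconds only, which is coarse but enough to tell a 9-second listing from an instant one.

```shell
# time_cmd: run a command with its output discarded and print the
# elapsed wall-clock time in whole seconds. The name is hypothetical.
time_cmd() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Usage sketch (as root; /var/log is just the directory from the post above):
#   sync                                # flush dirty pages first
#   echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
#   time_cmd ls /var/log                # cold run: has to go to the disk
#   time_cmd ls /var/log                # warm run: served from RAM
```

A cold run that stays slow every time after dropping the caches points at the disk or its driver rather than at the program being started.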
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

Posted: Wed Dec 17, 2008 6:19 pm

Is your hard drive making strange noises or otherwise failing? Any SMART issues?

Are you -sure- there are no background tasks running, and does the old kernel exhibit proper behavior?

I'm having a hard time believing that any kernel change would cause a 9 second directory listing.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
bfdi533
Tux's lil' helper


Joined: 11 Jun 2003
Posts: 133

Posted: Wed Dec 17, 2008 6:31 pm

eccerr0r wrote:
Is your hard drive making strange noises or otherwise failing? Any SMART issues?

Are you -sure- there are no background tasks running, and does the old kernel exhibit proper behavior?

I'm having a hard time believing that any kernel change would cause a 9 second directory listing.



The reason I switched to the new kernel was that I was having trouble getting modules to load properly and I was not able to track down the problem. Even after recompiling the kernel and the modules, I was still having issues. Anyway, I switched to a newer kernel and it seemed to go okay until I started using it more and realized there was a huge lag when doing disk access. I did not notice it at first and thought it was services running and X just being slow. I turned off a few things like ossec, and that seemed to fix the problem, but it only seemed that way.

There are no signs of this in the old kernel and no background tasks that I can account for. The gnome process monitor shows that the CPU stays near 80-90% most of the time but top shows this to be about 5-10%, with 80% idle or 80% wait. No idea what gnome process monitor thinks is eating CPU.

As to smart, no issues that I am aware of:

Code:
localhost~ # smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar family
Device Model:     WDC WD400BB-75AUA1
Serial Number:    WD-WMA6R3065709
Firmware Version: 18.20D18
User Capacity:    40,020,664,320 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Dec 17 12:22:31 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (2040) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  32) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   198   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0007   111   104   021    Pre-fail  Always       -       3275
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       611
  5 Reallocated_Sector_Ct   0x0032   198   198   112    Old_age   Always       -       7
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   034   034   000    Old_age   Always       -       48372
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       467
196 Reallocated_Event_Count 0x0032   197   197   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0009   200   198   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         0         -

Device does not support Selective Self Tests/Logging
localhost~ # smartctl --all /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ATA IV family
Device Model:     ST340016A
Serial Number:    3HS2Q6ZG
Firmware Version: 3.10
User Capacity:    40,037,760,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Dec 17 12:26:45 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 422) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  31) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   070   065   034    Pre-fail  Always       -       232989971
  3 Spin_Up_Time            0x0003   072   070   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       44
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       791011282
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       35623
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       289
194 Temperature_Celsius     0x0022   042   056   000    Old_age   Always       -       42
195 Hardware_ECC_Recovered  0x001a   070   064   000    Old_age   Always       -       232989971
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         8         -

Device does not support Selective Self Tests/Logging
localhost ~ #


Here is the output from strace. Maybe it will make sense to someone who can say why this is happening:

Code:
localhost ~ # cat /tmp/strace.ls
execve("/usr/bin/ls", ["ls", "/usr/src/linux"], [/* 51 vars */]) = 0
brk(0)                                  = 0x8063000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=187402, ...}) = 0
mmap2(NULL, 187402, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f7f000
close(3)                                = 0
open("/lib/librt.so.1", O_RDONLY)       = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\30\0\0004\0\0\0\250"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=30632, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7e000
mmap2(NULL, 33356, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7f75000
mmap2(0xb7f7c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6) = 0xb7f7c000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@a\1\0004\0\0\0\314"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1237356, ...}) = 0
mmap2(NULL, 1242576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e45000
mmap2(0xb7f6f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12a) = 0xb7f6f000
mmap2(0xb7f72000, 9680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f72000
close(3)                                = 0
open("/lib/libpthread.so.0", O_RDONLY)  = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 H\0\0004\0\0\0\320"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=84256, ...}) = 0
mmap2(NULL, 90592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e2e000
mmap2(0xb7e41000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13) = 0xb7e41000
mmap2(0xb7e43000, 4576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7e43000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e2d000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e2d6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7e41000, 4096, PROT_READ)   = 0
mprotect(0xb7f6f000, 8192, PROT_READ)   = 0
mprotect(0xb7f7c000, 4096, PROT_READ)   = 0
mprotect(0x8061000, 4096, PROT_READ)    = 0

mprotect(0xb7fc8000, 4096, PROT_READ)   = 0
munmap(0xb7f7f000, 187402)              = 0
set_tid_address(0xb7e2d708)             = 16951
set_robust_list(0xb7e2d710, 0xc)        = 0
rt_sigaction(SIGRTMIN, {0xb7e32320, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0xb7e323a0, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
uname({sys="Linux", node="ebdhome", ...}) = 0
brk(0)                                  = 0x8063000
brk(0x8084000)                          = 0x8084000
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=25, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
stat64("/usr/src/linux", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/usr/src/linux", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
getdents64(3, /* 42 entries */, 4096)   = 1312
getdents64(3, /* 0 entries */, 4096)    = 0
close(3)                                = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 1), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fac000
write(1, "COPYING        Module.symvers  cr"..., 71) = 71
write(1, "CREDITS        README\t       driv"..., 57) = 57
write(1, "Documentation  REPORTING-BUGS  fs"..., 50) = 50
write(1, "Kbuild\t       System.map      inc"..., 57) = 57
write(1, "MAINTAINERS    arch\t       init\tn"..., 48) = 48
write(1, "Makefile       block\t       ipc\ts"..., 55) = 55
close(1)                                = 0
munmap(0xb7fac000, 4096)                = 0
close(2)                                = 0
exit_group(0)                           = ?
localhost ~ #
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54220
Location: 56N 3W

Posted: Wed Dec 17, 2008 7:30 pm

bfdi533,

The two different times you posted are due to the kernel buffering disc reads in case the data is needed again.
Your first ls forces the kernel to read the drive; the second one only reads the in-RAM cache.

I'm not sure what the data in the RAW_VALUE fields indicates, but sdb is clearly in a poor state.
Seek errors cause retries to read the data. A retry costs a single revolution of the disk at minimum, sometimes several.
If it also needs the head to be recalibrated, the retry process will take a lot longer.
Hardware_ECC_Recovered errors mean the data was read from the platter incorrectly but the drive electronics were subsequently able to correct the errors.

I would suggest that sdb is dying. It's been operating for 35623 hours, which is over 4 years nonstop. It's working hard to return valid data, both with error correction and retries. What the SMART data does not tell is whether the errors occur all over the drive surface, or whether it's a small part that is read repeatedly. I'm inclined to think it's the former, as kernel caching should minimise the latter.

For a more thorough test, get the manufacturer's test software from their website. However, it will need to write all over the drive, so you will need to move your data off.
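
Before moving data off for a destructive test, the long smartctl listings can be cut down to a few attributes that are commonly watched. This is only a sketch under stated assumptions: smart_summary and power_on_years are made-up helper names, the choice of attributes is illustrative and not a vendor recommendation, and the column positions assume the table layout shown in the smartctl 5.38 output earlier in the thread.

```shell
# smart_summary: read `smartctl -A <device>` output on stdin and print
# name, normalized VALUE, THRESH and RAW_VALUE for a few attributes.
# Helper name and attribute selection are assumptions for this example.
smart_summary() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ {
        printf "%s value=%d thresh=%d raw=%s\n", $2, $4, $6, $10
    }'
}

# power_on_years: convert a Power_On_Hours raw value into whole years
# of continuous running (integer division, so it rounds down).
power_on_years() {
    echo $(( $1 / 24 / 365 ))
}

# Usage sketch (as root; device names are the ones from this thread):
#   smartctl -A /dev/sda | smart_summary
#   power_on_years 35623              # the sdb figure quoted above
#   smartctl -t long /dev/sdb         # start an extended self-test
#   smartctl -l selftest /dev/sdb     # read the result once it has finished
```

The extended self-test is run by the drive itself, and the listings above give its recommended polling time as 31 to 32 minutes for these two drives.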
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
bfdi533
Tux's lil' helper


Joined: 11 Jun 2003
Posts: 133

Posted: Wed Dec 17, 2008 8:18 pm

NeddySeagoon wrote:
bfdi533,

The two different times you posted are due to the kernel buffering disc reads in case the data is needed again.
Your first ls forces the kernel to read the drive; the second one only reads the in-RAM cache.


I can see that. Definitely makes sense why subsequent execution of stuff is faster.

NeddySeagoon wrote:
I would suggest that sdb is dying. It's been operating for 35623 hours, which is over 4 years nonstop. It's working hard to return valid data, both with error correction and retries. What the SMART data does not tell is whether the errors occur all over the drive surface, or whether it's a small part that is read repeatedly. I'm inclined to think it's the former, as kernel caching should minimise the latter.


Obviously I am not in a position to deny that. However, sda is where my root is, and /bin/ls and /var/log are both on sda. So the failing sdb aside, which I see I need to fix, that does not explain why there is so much delay in the ls execution (or in other disk reads).
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

Posted: Wed Dec 17, 2008 9:26 pm

bfdi533 wrote:
The gnome process monitor shows that the CPU stays near 80-90% most of the time but top shows this to be about 5-10%, with 80% idle or 80% wait.


Just to confirm, what process is in iowait? Is it zombied? For sure something is consuming disk bandwidth.

Is your HDD LED always on? Are there messages constantly being added to your log files? Any mess in your dmesg(1)?

Is udevd being io-waited? Is it in poll mode to look for new devices? Are CONFIG_DNOTIFY and CONFIG_INOTIFY turned on?
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
bfdi533
Tux's lil' helper


Joined: 11 Jun 2003
Posts: 133

Posted: Wed Dec 17, 2008 9:51 pm

eccerr0r wrote:
bfdi533 wrote:
The gnome process monitor shows that the CPU stays near 80-90% most of the time but top shows this to be about 5-10%, with 80% idle or 80% wait.


Just to confirm, what process is in iowait? Is it zombied? For sure something is consuming disk bandwidth.

Is your HDD LED always on? Are there messages constantly being added to your log files? Any mess in your dmesg(1)?

Is udevd being io-waited? Is it in poll mode to look for new devices? Are CONFIG_DNOTIFY and CONFIG_INOTIFY turned on?


I must admit that although I understand most of what you are asking, I do not know how to get you all of that info.

Config_notify:

Code:
# grep NOTIFY .config
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y


udev:

Code:
# ps axl | grep udevd
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
5     0  8457     1  16  -4   2624  1376 -      S<s  ?          0:00 /sbin/udev  --daemon


Code:
# udev.conf

# The initial syslog(3) priority: "err", "info", "debug" or its
# numerical equivalent. For runtime debugging, the daemons internal
# state can be changed with: "udevcontrol log_priority=<value>".
udev_log="err"

# If you need to change mount-options, do it in /etc/fstab


Dmesg does not show any problems. The log files are written to somewhat regularly but not constantly, just the normal stuff every couple of minutes like any other Linux system.

How do I determine what process is in iowait and consuming this wait state?
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

Posted: Thu Dec 18, 2008 12:15 am

Run 'ps ax' and look for any processes whose STAT is "Z" or "D"...

Also cat /proc/interrupts and see if there are any interrupts that are "ringing off the hook". Screwed-up USB, maybe?
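
Picking those states out of a long process list by eye is tedious, so a filter can help. The sketch below is an illustration only: stuck_procs is a made-up name, and it assumes the ps output starts with a header line and has the state in the first column, which is what the stat,pid,comm format list produces.

```shell
# stuck_procs: read `ps axo stat,pid,comm` output on stdin and keep the
# header plus any process whose state starts with D (uninterruptible
# sleep, usually waiting on I/O) or Z (zombie). The name is hypothetical.
stuck_procs() {
    awk 'NR == 1 || $1 ~ /^[DZ]/'
}

# Usage sketch:
#   ps axo stat,pid,comm | stuck_procs
```

Processes pass through state D briefly during normal disk access, so it is worth running this a few times and looking for names that keep showing up.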
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
bfdi533
Tux's lil' helper


Joined: 11 Jun 2003
Posts: 133

Posted: Wed Dec 24, 2008 7:17 pm

It turned out to be just the hard drive. I copied all of the contents to a new drive and replaced it and the system is now zippy again. My guess is that about the time of one of the kernel builds and reboots, the hard drive started to have issues since I KNOW it was coincident with the new kernel and reboot.

Thanks to all for the helpful tips and insight.
All times are GMT