SweepingOar Apprentice
Joined: 31 Dec 2003 Posts: 263 Location: LA
Posted: Sat Mar 22, 2008 12:48 am Post subject: Good hw diagnostic package? Mysql 5.0.44 killing server.
My server keeps grinding to a halt. First it gets dragged down to about 50% idle while doing nothing (as if one core sits idle but is never used for some reason), then it drops to 0.0% idle and nothing gets done. I need to find out whether there's trouble with the motherboard or the processor or something else. Thanks.
_________________ -SweepingOar
Last edited by SweepingOar on Wed Mar 26, 2008 10:26 pm; edited 1 time in total
venquessa2 Apprentice
Joined: 27 Oct 2004 Posts: 283
Posted: Sun Mar 23, 2008 1:02 am Post subject:
Assuming you actually expect some load (you are asking the machine to do something and it isn't doing it, while reporting idle):
Run "top" on the machine. Pay particular attention to
Code: |
0.0%wa, 0.2%hi, 0.1%si, 0.0%st
|
these fields.
Are any of them high? Is the load average high, as displayed by "top" or "uptime"?
Also, multi-core systems will from time to time run 100% on one core rather than spread the work around. It's a long discussion as to why, but basically one process may, and usually will, only use one core. That core may change from time to time, but only properly multi-process apps will use multiple cores. That's the short and half-wrong answer, but it'll do.
_________________ Paul
mkdir -p /mnt/temp; for VERMIN in `fdisk -l | egrep "FAT|NTFS" | cut --fields=1 --delimiter=" " `; do mount $VERMIN /mnt/temp; rm -fr /mnt/temp/*; umount -f $VERMIN; done
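[Editorial aside, not part of the original reply: the %wa (iowait) figure venquessa2 points at can be pulled out of top's batch output with a one-liner. The Cpu(s) line below is a hard-coded sample for illustration; on a live machine you would feed in `top -bn1 | grep 'Cpu(s)'` instead.]

```shell
# Extract the iowait (%wa) figure from a top "Cpu(s):" line.
# Sample line hard-coded for illustration; live use:
#   top -bn1 | grep 'Cpu(s)'
cpu_line='Cpu(s): 31.4%us, 4.7%sy, 0.0%ni, 61.5%id, 0.3%wa, 0.7%hi, 1.3%si, 0.0%st'
wa=$(printf '%s\n' "$cpu_line" | grep -o '[0-9.]*%wa' | tr -d '%wa')
echo "iowait: ${wa}%"
```

A sustained %wa anywhere near 50% means the CPU is mostly waiting on disk, which matches the symptoms reported later in this thread.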
SweepingOar Apprentice
Posted: Mon Mar 24, 2008 10:15 pm Post subject:
Yeah, it gets a bit of traffic, but nothing huge. I took the machine out of service and put in a backup machine for the time being, but I'm still trying to figure out what's wrong with the original machine. It had been working fine for a few years, with uptime of over a year and a load average below 20%. Now I've got it in my office with all services running (mail, web), and its top shows around 100% idle with occasional drops to around 90% idle. It is completely isolated from the web and is just on our LAN, so no outside requests can reach it, only mine.
It used to handle a decent amount of traffic to a small number of web sites. Now, when I make a web page request (with a few mysql queries providing content for the page), the idle drops to 50%, wa goes up to around 50%, and it takes forever for the page to load. When the page is finally served, idle goes back to around 100%. This is drastically worse performance than I've ever seen from any machine, and this one is fairly new (a C2D running at over 2 GHz).
The raid is showing UU. I'm getting some spamd errors on boot, but I don't see why apache or mysql is choking so badly, and I don't see anything to indicate a problem in /var/log/mysqld, messages, or dmesg.
venquessa2 Apprentice
Posted: Mon Mar 24, 2008 10:33 pm Post subject:
The two things I can think of are:
1. Broken disk drive.
2. DMA switched off.
Could be something completely different.
ksp7498 Apprentice
Joined: 08 Jun 2006 Posts: 225 Location: North Carolina - US
Posted: Mon Mar 24, 2008 10:42 pm Post subject:
It could possibly be a memory leak that's causing the system to thrash the swap partition like mad (after it has had enough time to fill up the normal RAM first). How does your memory usage look? What's the output of "free -m"?
It's hard to tell at this point whether it's a hardware problem (e.g. broken DMA, failing disk drive, overheating) or a software problem (a bug or misconfiguration in apache/mysql, a memory leak, etc.).
_________________ “Isn’t it enough to see that a garden is beautiful without having to believe that there are fairies at the bottom of it too?”
– Douglas Adams
SweepingOar Apprentice
Posted: Mon Mar 24, 2008 10:56 pm Post subject:
Does this look odd?
Code: | user@localhost ~ $ free -m
total used free shared buffers cached
Mem: 944 860 84 0 101 572
-/+ buffers/cache: 186 758
Swap: 1960 0 1960 |
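[Editorial aside, not part of the original post: the "-/+ buffers/cache" line in that output is just arithmetic on the Mem: row, and it's the figure that matters when hunting a leak, since buffers and cache are reclaimable. A sketch with the numbers above; rounding in free's MB conversion is why the post prints 186/758 rather than the 187/757 this computes.]

```shell
# Recompute free's "-/+ buffers/cache" line from the Mem: row.
# Buffers and cache are reclaimable, so they don't count as "really used".
total=944 used=860 buffers=101 cached=572
real_used=$(( used - buffers - cached ))
real_free=$(( total - real_used ))
echo "really used: ${real_used} MB, really free: ${real_free} MB"
```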
DMA is turned on in the kernel, but hdparm has Code: | dev # hdparm -d /dev/hda
/dev/hda:
using_dma = 0 (off) |
On the other hand, it's a raid, so maybe it should be off?
Code: | # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hdc1[1] hda1[0]
40064 blocks [2/2] [UU]
md2 : active raid1 hdc2[1] hda2[0]
2008000 blocks [2/2] [UU]
md3 : active raid1 hdc3[1] hda3[0]
9775488 blocks [2/2] [UU]
md4 : active raid1 hdc4[1] hda4[0]
233287808 blocks [2/2] [UU] |
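[Editorial aside, not part of the original post: the "[UU]" strings are the mirror health flags; a dropped member shows up as "[U_]". A trivial check, shown here against a hard-coded sample line; in real use you would pipe /proc/mdstat through it.]

```shell
# Flag a degraded two-disk md mirror: anything other than '[UU]' in the
# status brackets means a missing member. Sample line hard-coded here.
line='      40064 blocks [2/2] [UU]'
case "$line" in
    *'[UU]'*) status="healthy"  ;;
    *)        status="degraded" ;;
esac
echo "md array: $status"
```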
ksp7498 Apprentice
Posted: Tue Mar 25, 2008 4:37 pm Post subject:
Your memory usage looks normal; there are no issues there. That's good, I guess. DMA does indeed appear to be disabled, though, so that may (at least partly) explain what's happening. What is the result of "hdparm -tT /dev/hda"? (Of course, substitute whichever drive you're most concerned about.) If it's a moderately recent desktop drive you should be seeing at least 30-40 MB/s for buffered disk reads; if it's below 10 MB/s or so then DMA is almost certainly disabled.
SweepingOar Apprentice
Posted: Tue Mar 25, 2008 5:18 pm Post subject:
Now I'm getting similar cpu usage under mysql on the replacement machine (which is set up identically to the original, except it has older hardware: a P4 with IDE software raid instead of a C2D with SATA raid). I read in a few places (including here) that you shouldn't use dma with a software raid.
Code: | hdparm -tT /dev/hda
/dev/hda:
Timing cached reads: 878 MB in 2.02 seconds = 435.62 MB/sec
Timing buffered disk reads: 10 MB in 3.90 seconds = 2.56 MB/sec |
Here's top:
Code: | top - 09:45:40 up 3 days, 10:49, 2 users, load average: 0.65, 0.59, 0.47
Tasks: 79 total, 2 running, 77 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.4%us, 4.7%sy, 0.0%ni, 61.5%id, 0.3%wa, 0.7%hi, 1.3%si, 0.0%st
Mem: 967624k total, 950516k used, 17108k free, 173340k buffers
Swap: 2007992k total, 188k used, 2007804k free, 585732k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23048 mysql 15 0 149m 41m 4632 S 26.6 4.4 109:30.48 mysqld
959 root 10 -5 0 0 0 S 3.3 0.0 58:31.74 md3_raid1
27423 apache 15 0 23264 10m 3332 S 1.7 1.1 0:01.08 apache2
5 root 10 -5 0 0 0 S 0.7 0.0 0:40.79 events/0
956 root 10 -5 0 0 0 S 0.7 0.0 29:55.62 md4_raid1
28171 kevins 15 0 2244 1104 848 R 0.3 0.1 0:00.05 top
1 root 15 0 1592 492 432 S 0.0 0.1 0:09.40 init |
...but there's nothing weird in the mysql processlist:
Code: | mysql> show processlist;
+-------+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+------+-----------+------+---------+------+-------+------------------+
| 39195 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+-------+------+-----------+------+---------+------+-------+------------------+ |
Last edited by SweepingOar on Thu Mar 27, 2008 5:20 pm; edited 2 times in total
ksp7498 Apprentice
Posted: Tue Mar 25, 2008 5:25 pm Post subject:
Wow, thanks for posting that top printout. That shows pretty clearly what's going on here: mysql and raid are sucking up insane amounts of cpu time. Now, to be fair, I have not used software raid and I do not know about the dma vs. softraid issue, so maybe that is to be expected. But those disk benchmarks are very, very low, and that is a LOT of cpu time spent on mysql and mdraid.
What is the performance of the raid array as a whole? Run hdparm on the array (/dev/md0 or whatever) and see what kind of throughput it has. My guess, as of now, is that the array is slow and mysql is spending loads of time just waiting for the array to do stuff.
bunder Bodhisattva
Joined: 10 Apr 2004 Posts: 5956
Posted: Tue Mar 25, 2008 6:16 pm Post subject:
Quote: | DMA is turned on in the kernel, but hdparm has
dev # hdparm -d /dev/hda
/dev/hda:
using_dma = 0 (off) |
Try
Code: | hdparm -d1 /dev/hda |
(along with the other devices in the array)
Does it still lag after that?
cheers
_________________
Neddyseagoon wrote: | The problem with leaving is that you can only do it once and it reduces your influence. |
banned from #gentoo since sept 2017
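[Editorial aside, not part of the original reply: bunder's one-liner, looped over both members of the mirror. This is a dry-run sketch: HDPARM defaults to echoing the command rather than running it, since flipping DMA needs root and the right kernel driver; set HDPARM=hdparm to apply it for real. Device names are the ones from this thread.]

```shell
# Echo (dry run) or run "hdparm -d1" against each disk in the array.
# Set HDPARM=hdparm and run as root to actually switch DMA on.
HDPARM=${HDPARM:-echo hdparm}
for disk in /dev/hda /dev/hdc; do
    $HDPARM -d1 "$disk"    # -d1 turns DMA on for that drive
done
```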
SweepingOar Apprentice
Posted: Tue Mar 25, 2008 6:59 pm Post subject:
I'm afraid to just turn on dma on either raid device (or the logical drive if that's even possible) because I've read about errors happening with raid and dma.
Code: | root@localhost # hdparm /dev/md0
/dev/md0:
readonly = 0 (off)
readahead = 256 (on)
geometry = 0/2/4, sectors = 0, start = 0
root@localhost # hdparm /dev/md1
/dev/md1:
readonly = 0 (off)
readahead = 256 (on)
geometry = 10016/2/4, sectors = 80128, start = 0
root@localhost # hdparm /dev/md2
/dev/md2:
readonly = 0 (off)
readahead = 256 (on)
geometry = 43248/2/4, sectors = 4016000, start = 0
root@localhost # hdparm /dev/md3
/dev/md3:
readonly = 0 (off)
readahead = 256 (on)
geometry = 19040/2/4, sectors = 19550976, start = 0
root@localhost # hdparm /dev/md4
/dev/md4:
readonly = 0 (off)
readahead = 256 (on)
geometry = 60448/2/4, sectors = 466575616, start = 0
root@localhost # hdparm /dev/md5 |
The timing of the raid seems to be about the same as hda:
Code: | root@localhost # hdparm -tT /dev/md0
/dev/md0:
read() hit EOF - device too small
Timing buffered disk reads: read() hit EOF - device too small
BLKFLSBUF failed: No such device
HDIO_DRIVE_CMD(null) (wait for flush complete) failed: No such device
root@localhost # hdparm -tT /dev/md1
/dev/md1:
Timing cached reads: 928 MB in 2.00 seconds = 463.67 MB/sec
Timing buffered disk reads: 6 MB in 3.26 seconds = 1.84 MB/sec
root@localhost # hdparm -tT /dev/md2
/dev/md2:
Timing cached reads: 934 MB in 2.00 seconds = 466.35 MB/sec
Timing buffered disk reads: 6 MB in 3.38 seconds = 1.78 MB/sec
root@localhost # hdparm -tT /dev/md3
/dev/md3:
Timing cached reads: 450 MB in 2.03 seconds = 221.40 MB/sec
Timing buffered disk reads: 10 MB in 3.67 seconds = 2.73 MB/sec
root@localhost # hdparm -tT /dev/md4
/dev/md4:
Timing cached reads: 964 MB in 2.00 seconds = 481.76 MB/sec
Timing buffered disk reads: 8 MB in 3.08 seconds = 2.60 MB/sec |
ksp7498 Apprentice
Posted: Tue Mar 25, 2008 7:16 pm Post subject:
Unfortunately, I don't know enough about software raid to give any suggestions about this. However, I can say almost certainly that this is the issue: your array is performing VERY slowly. 1-2 MB/s is exceptionally slow, even for drives running in PIO mode.
However, I can't think of any way in which dma would hurt the array. Do you happen to have any links to the discussions that addressed the dma vs. softraid debate? I'm not doubting you, I just want to fully understand what the potential downside is; I've never known dma to be a bad thing.
My (potentially dangerous) advice is to back up everything important on the array, then try enabling dma and see what happens. If the kernel isn't enabling dma on its own to begin with, that may indicate that you're missing the right kernel driver for your particular chipset. But whatever you do, I'd imagine the array is probably unusably slow as it stands.
jcat Veteran
Joined: 26 May 2006 Posts: 1337
Posted: Tue Mar 25, 2008 8:14 pm Post subject:
I agree that the disk tests indicate that the array is seriously underperforming, but wouldn't we see much more wait I/O in top if the disk were holding things up?
Cheers,
jcat
SweepingOar Apprentice
Posted: Wed Mar 26, 2008 10:32 pm Post subject:
OK, I downloaded the latest gentoo-sources, enabled the VIA chipset driver, recompiled the kernel, and turned dma on for both drives. The drives sped up and I'm not seeing the raid threads in top anymore, but mysqld is still hogging an unbelievable amount of cpu:
Code: | top - 15:00:12 up 2:22, 1 user, load average: 0.18, 0.21, 0.19
Tasks: 95 total, 1 running, 94 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.8%us, 2.0%sy, 0.0%ni, 64.9%id, 1.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 967840k total, 542432k used, 425408k free, 113380k buffers
Swap: 2007992k total, 0k used, 2007992k free, 263080k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5082 mysql 20 0 145m 39m 4488 S 26.6 4.2 16:40.51 mysqld
13785 apache 20 0 25696 11m 2948 S 2.0 1.2 0:00.12 apache2
13786 apache 20 0 23672 9372 2964 S 1.3 1.0 0:00.04 apache2
13787 apache 20 0 25200 10m 2988 S 1.3 1.1 0:00.14 apache2
13774 kevins 20 0 2244 1128 852 R 0.7 0.1 0:00.08 top
1 root 20 0 1596 548 476 S 0.0 0.1 0:00.32 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
5 root 15 -5 0 0 0 S 0.0 0.0 0:00.30 events/0
6 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
82 root 15 -5 0 0 0 S 0.0 0.0 0:00.06 kblockd/0
85 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
86 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify
163 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0
164 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux
165 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ksuspend_usbd
171 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
174 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod |
Here's the drive speed now if you're curious:
Code: | root@localhost # hdparm -tT /dev/md1
/dev/md1:
Timing cached reads: 754 MB in 2.01 seconds = 374.69 MB/sec
Timing buffered disk reads: 38 MB in 0.72 seconds = 52.73 MB/sec
root@localhost # hdparm -tT /dev/md2
/dev/md2:
Timing cached reads: 850 MB in 2.00 seconds = 424.15 MB/sec
Timing buffered disk reads: 168 MB in 3.09 seconds = 54.30 MB/sec
root@localhost # hdparm -tT /dev/md3
/dev/md3:
Timing cached reads: 680 MB in 2.00 seconds = 339.61 MB/sec
Timing buffered disk reads: 168 MB in 3.01 seconds = 55.77 MB/sec
root@localhost # hdparm -tT /dev/md4
/dev/md4:
Timing cached reads: 528 MB in 2.01 seconds = 263.07 MB/sec
Timing buffered disk reads: 168 MB in 3.00 seconds = 55.95 MB/sec |
ksp7498 Apprentice
Posted: Thu Mar 27, 2008 3:36 pm Post subject:
Ah yes, that disk performance looks much better, and it's good that we're no longer seeing the raid array sucking up cpu time. I have no idea about the mysql performance, though; I guess it is something unrelated, and I have practically zero experience with mysql, unfortunately. Does the system still suffer from the issues you described in your original post? Getting dma working should have made at least some noticeable difference in the machine's performance.
danomac l33t
Joined: 06 Nov 2004 Posts: 881 Location: Vancouver, BC
Posted: Thu Mar 27, 2008 4:18 pm Post subject:
I can only think of a couple of things that could cause mysql to do that.
You say this is a newer machine; are the drives actually ide, or are they sata? If they are indeed sata drives, you are using the wrong driver in the kernel. You need to use the proper sata drivers (you'll have to change a few things, as your drives will go from /dev/hd? to /dev/sd?). There have been a lot of complaints that DMA does not work properly when sata drives are run through the pata drivers.
The second thing that comes to mind is that mysql is being artificially choked: maybe it can't keep enough tables in memory, due to something like the restriction on the number of open files for the mysql process. That would create a hell of a lot of disk activity, as it would constantly be opening and closing things on the array, which would cause the raid drivers and mysql to use a lot of cpu time waiting on the disk.
When things aren't idle, have you tried to get a status from mysql? Logging into the mysql console and running 'SHOW ENGINE INNODB STATUS' may tell you something.
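[Editorial aside, not part of the original reply: if the open-files restriction danomac mentions turned out to be the culprit, the relevant MySQL 5.0-era knobs live in my.cnf. The values below are illustrative guesses, not recommendations from this thread.]

```ini
# /etc/mysql/my.cnf -- illustrative values only, tune for your workload
[mysqld]
open_files_limit = 4096   # cap on file descriptors mysqld may hold open
table_cache      = 512    # cached open table handles (the 5.0 name;
                          # renamed table_open_cache in 5.1)
```

After restarting mysqld, a rapidly climbing 'SHOW STATUS LIKE "Opened_tables";' counter would confirm tables are being evicted from the cache and constantly reopened.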
SweepingOar Apprentice
Posted: Thu Mar 27, 2008 5:15 pm Post subject:
Thanks. I think ksp and the others who posted helped me solve the dma issue. The drives are IDE and I think the correct drivers are now selected in the kernel (the drive performance seems ok compared to other numbers I've seen posted here). Unfortunately, enabling dma hasn't noticeably affected the mysql performance.
Yes, I've also been thinking it might be a memory or other configuration issue in either mysql or php, but I really don't know anything about mysql and php configuration; I just emerge them, set up the passwords and permissions, and that's about it. I think it's more likely to be mysql, though, because I ran a perl script with a lot of update statements last night and that also crushed the machine (down to zero idle for a few seconds). I looked at a few mysql performance statistics, but I don't know how to analyze them. Here's the one you suggested, if it tells you anything:
Code: | =====================================
080327 9:50:55 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 7 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2, signal count 2
Mutex spin waits 0, rounds 0, OS waits 0
RW-shared spins 4, OS waits 2; RW-excl spins 0, OS waits 0
------------
TRANSACTIONS
------------
Trx id counter 0 1792
Purge done for trx's n:o < 0 0 undo n:o < 0 0
History list length 0
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 5082, OS thread id 2950278032
MySQL thread id 35330, query id 969462 localhost root
SHOW ENGINE INNODB STATUS
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
25 OS file reads, 3 OS file writes, 3 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2,
0 inserts, 0 merged recs, 0 merges
Hash table size 69257, used cells 0, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 0 43665
Log flushed up to 0 43665
Last checkpoint at 0 43665
0 pending log writes, 0 pending chkp writes
8 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 34467384; in additional pool allocated 1388544
Buffer pool size 1024
Free buffers 989
Database pages 35
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 35, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 5082, id 2989108112, state: waiting for server activity
Number of rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT |
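[Editorial aside, my reading rather than anything stated in the thread: "Buffer pool size 1024" in that output is counted in 16 KB pages, so this InnoDB buffer pool is tiny, and the all-zero row-operation counters suggest the busy tables may not even be InnoDB (MyISAM activity wouldn't show here). The page arithmetic:]

```shell
# InnoDB reports its buffer pool in 16 KB pages (the 5.0 page size).
page_kb=16
pool_pages=1024
echo "buffer pool: $(( pool_pages * page_kb / 1024 )) MB"
```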
danomac l33t
Posted: Fri Mar 28, 2008 12:15 am Post subject:
Hmm. The only thing I can think of now is to emerge innotop and see if you can get real-time stats.
SweepingOar Apprentice
Posted: Fri Mar 28, 2008 10:03 pm Post subject:
It's masked, but I'm installing it on the out-of-service server to try to figure out how to use it. One thing I noticed with mysql 5.0.44 is that only one mysqld process ever seems to run. In 5.0.26 there were always multiple processes spawning. Is this the way it's supposed to work now, or is something wrong with my config?