SweepingOar Apprentice
Joined: 31 Dec 2003 Posts: 263 Location: LA
Posted: Sat Mar 22, 2008 12:48 am Post subject: Good hw diagnostic package? Mysql 5.0.44 killing server.
My server keeps grinding to a halt. First it gets dragged down to about 50% idle while doing nothing (as if one core sits idle but is never used for some reason), then it drops to 0.0% idle and nothing gets done. I need to find out whether there's trouble with the motherboard or the processor or something else. Thanks.
_________________ -SweepingOar
Last edited by SweepingOar on Wed Mar 26, 2008 10:26 pm; edited 1 time in total
venquessa2 Apprentice
Joined: 27 Oct 2004 Posts: 283
Posted: Sun Mar 23, 2008 1:02 am Post subject:
Assuming you actually expect some load (you are asking the machine to do something and it isn't doing it, while reporting idle):
Run "top" on the machine. Pay particular attention to
Code: |
0.0%wa, 0.2%hi, 0.1%si, 0.0%st
|
these fields.
Are any of them high? Is the load average high, as displayed by "top" or "uptime"?
Also, multi-core systems will from time to time run 100% on one core rather than spread the work around. It's a long discussion as to why, but basically one process may, and usually will, only use one core. That core may change from time to time, but only properly multi-process apps will use multiple cores. That's the short and half-wrong answer, but it'll do.
_________________ Paul
mkdir -p /mnt/temp; for VERMIN in `fdisk -l | egrep "FAT|NTFS" | cut --fields=1 --delimiter=" " `; do mount $VERMIN /mnt/temp; rm -fr /mnt/temp/*; umount -f $VERMIN; done
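[Editorial aside, not part of the original reply: the %wa (iowait) figure venquessa2 points at can be pulled out of top's batch output with a one-liner. The Cpu(s) line below is a hard-coded sample for illustration; on a live machine you would feed in `top -bn1 | grep 'Cpu(s)'` instead.]

```shell
# Extract the iowait (%wa) figure from a top "Cpu(s):" line.
# Sample line hard-coded for illustration; live use:
#   top -bn1 | grep 'Cpu(s)'
cpu_line='Cpu(s): 31.4%us, 4.7%sy, 0.0%ni, 61.5%id, 0.3%wa, 0.7%hi, 1.3%si, 0.0%st'
wa=$(printf '%s\n' "$cpu_line" | grep -o '[0-9.]*%wa' | tr -d '%wa')
echo "iowait: ${wa}%"
```

A sustained %wa anywhere near 50% means the CPU is mostly waiting on disk, which matches the symptoms reported later in this thread.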
SweepingOar Apprentice
Posted: Mon Mar 24, 2008 10:15 pm Post subject:
Yeah, it gets a bit of traffic, but nothing huge. I took the machine out of service and put in a backup machine for the time being, but I'm still trying to figure out what's wrong with the original machine. It had been working fine for a few years, with uptime of over a year and a load average below 20%. Now I've got it in my office with all services running (mail, web), and its top shows around 100% idle with occasional drops to around 90% idle. It is completely isolated from the web and is just on our LAN, so no outside requests can reach it, only mine.
It used to handle a decent amount of traffic to a small number of web sites. Now, when I make a web page request (with a few mysql queries providing content for the page), the idle drops to 50%, wa goes up to around 50%, and it takes forever for the page to load. When the page is finally served, idle goes back to around 100%. This is drastically worse performance than I've ever seen from any machine, and this one is fairly new (a C2D running at over 2 GHz).
The raid is showing UU. I'm getting some spamd errors on boot, but I don't see why apache or mysql is choking so badly, and I don't see anything to indicate a problem in /var/log/mysqld, messages, or dmesg.
venquessa2 Apprentice
Posted: Mon Mar 24, 2008 10:33 pm Post subject:
The two things I can think of are:
1. Broken disk drive.
2. DMA switched off.
Could be something completely different.
ksp7498 Apprentice
Joined: 08 Jun 2006 Posts: 225 Location: North Carolina - US
Posted: Mon Mar 24, 2008 10:42 pm Post subject:
It could possibly be a memory leak that's causing the system to thrash the swap partition like mad (after it has had enough time to fill up the normal RAM first). How does your memory usage look? What's the output of "free -m"?
It's hard to tell at this point whether it's a hardware problem (e.g. broken DMA, failing disk drive, overheating) or a software problem (a bug or misconfiguration in apache/mysql, a memory leak, etc.).
_________________ “Isn’t it enough to see that a garden is beautiful without having to believe that there are fairies at the bottom of it too?”
– Douglas Adams
SweepingOar Apprentice
Posted: Mon Mar 24, 2008 10:56 pm Post subject:
Does this look odd?
Code: | user@localhost ~ $ free -m
total used free shared buffers cached
Mem: 944 860 84 0 101 572
-/+ buffers/cache: 186 758
Swap: 1960 0 1960 |
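[Editorial aside, not part of the original post: the "-/+ buffers/cache" line in that output is just arithmetic on the Mem: row, and it's the figure that matters when hunting a leak, since buffers and cache are reclaimable. A sketch with the numbers above; rounding in free's MB conversion is why the post prints 186/758 rather than the 187/757 this computes.]

```shell
# Recompute free's "-/+ buffers/cache" line from the Mem: row.
# Buffers and cache are reclaimable, so they don't count as "really used".
total=944 used=860 buffers=101 cached=572
real_used=$(( used - buffers - cached ))
real_free=$(( total - real_used ))
echo "really used: ${real_used} MB, really free: ${real_free} MB"
```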
DMA is turned on in the kernel, but hdparm has Code: | dev # hdparm -d /dev/hda
/dev/hda:
using_dma = 0 (off) |
On the other hand, it's a raid, so maybe it should be off?
Code: | # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hdc1[1] hda1[0]
40064 blocks [2/2] [UU]
md2 : active raid1 hdc2[1] hda2[0]
2008000 blocks [2/2] [UU]
md3 : active raid1 hdc3[1] hda3[0]
9775488 blocks [2/2] [UU]
md4 : active raid1 hdc4[1] hda4[0]
233287808 blocks [2/2] [UU] |
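[Editorial aside, not part of the original post: the "[UU]" strings are the mirror health flags; a dropped member shows up as "[U_]". A trivial check, shown here against a hard-coded sample line; in real use you would pipe /proc/mdstat through it.]

```shell
# Flag a degraded two-disk md mirror: anything other than '[UU]' in the
# status brackets means a missing member. Sample line hard-coded here.
line='      40064 blocks [2/2] [UU]'
case "$line" in
    *'[UU]'*) status="healthy"  ;;
    *)        status="degraded" ;;
esac
echo "md array: $status"
```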
ksp7498 Apprentice
Posted: Tue Mar 25, 2008 4:37 pm Post subject:
Your memory usage looks normal; there are no issues there. That's good, I guess. DMA does indeed appear to be disabled, though, so that may (at least partly) explain what's happening. What is the result of "hdparm -tT /dev/hda"? (Of course, substitute whichever drive you're most concerned about.) If it's a moderately recent desktop drive you should be seeing at least 30-40 MB/s for buffered disk reads; if it's below 10 MB/s or so then DMA is almost certainly disabled.
SweepingOar Apprentice
Posted: Tue Mar 25, 2008 5:18 pm Post subject:
Now I'm getting similar cpu usage under mysql on the replacement machine (which is set up identically to the original, except it has older hardware: a P4 with IDE software raid instead of a C2D with SATA raid). I read in a few places (including here) that you shouldn't use dma with a software raid.
Code: | hdparm -tT /dev/hda
/dev/hda:
Timing cached reads: 878 MB in 2.02 seconds = 435.62 MB/sec
Timing buffered disk reads: 10 MB in 3.90 seconds = 2.56 MB/sec |
Here's top:
Code: | top - 09:45:40 up 3 days, 10:49, 2 users, load average: 0.65, 0.59, 0.47
Tasks: 79 total, 2 running, 77 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.4%us, 4.7%sy, 0.0%ni, 61.5%id, 0.3%wa, 0.7%hi, 1.3%si, 0.0%st
Mem: 967624k total, 950516k used, 17108k free, 173340k buffers
Swap: 2007992k total, 188k used, 2007804k free, 585732k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23048 mysql 15 0 149m 41m 4632 S 26.6 4.4 109:30.48 mysqld
959 root 10 -5 0 0 0 S 3.3 0.0 58:31.74 md3_raid1
27423 apache 15 0 23264 10m 3332 S 1.7 1.1 0:01.08 apache2
5 root 10 -5 0 0 0 S 0.7 0.0 0:40.79 events/0
956 root 10 -5 0 0 0 S 0.7 0.0 29:55.62 md4_raid1
28171 kevins 15 0 2244 1104 848 R 0.3 0.1 0:00.05 top
1 root 15 0 1592 492 432 S 0.0 0.1 0:09.40 init |
...but there's nothing weird in the mysql processlist:
Code: | mysql> show processlist;
+-------+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+------+-----------+------+---------+------+-------+------------------+
| 39195 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+-------+------+-----------+------+---------+------+-------+------------------+ |
Last edited by SweepingOar on Thu Mar 27, 2008 5:20 pm; edited 2 times in total
ksp7498 Apprentice
Posted: Tue Mar 25, 2008 5:25 pm Post subject:
Wow, thanks for posting that top printout. That shows pretty clearly what's going on here: mysql and raid are sucking up insane amounts of cpu time. Now, to be fair, I have not used software raid and I do not know about the dma vs. softraid issue, so maybe that is to be expected. But those disk benchmarks are very, very low, and that is a LOT of cpu time spent on mysql and mdraid.
What is the performance of the raid array as a whole? Run hdparm on the array (/dev/md0 or whatever) and see what kind of throughput it has. My guess, as of now, is that the array is slow and mysql is spending loads of time just waiting for the array to do stuff.
bunder Bodhisattva
Joined: 10 Apr 2004 Posts: 5956
Posted: Tue Mar 25, 2008 6:16 pm Post subject:
Quote: | DMA is turned on in the kernel, but hdparm has
dev # hdparm -d /dev/hda
/dev/hda:
using_dma = 0 (off) |
Try
Code: | hdparm -d1 /dev/hda |
(along with the other devices in the array)
Does it still lag after that?
cheers
_________________
Neddyseagoon wrote: | The problem with leaving is that you can only do it once and it reduces your influence. |
banned from #gentoo since sept 2017
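[Editorial aside, not part of the original reply: bunder's one-liner, looped over both members of the mirror. This is a dry-run sketch: HDPARM defaults to echoing the command rather than running it, since flipping DMA needs root and the right kernel driver; set HDPARM=hdparm to apply it for real. Device names are the ones from this thread.]

```shell
# Echo (dry run) or run "hdparm -d1" against each disk in the array.
# Set HDPARM=hdparm and run as root to actually switch DMA on.
HDPARM=${HDPARM:-echo hdparm}
for disk in /dev/hda /dev/hdc; do
    $HDPARM -d1 "$disk"    # -d1 turns DMA on for that drive
done
```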
SweepingOar Apprentice
Posted: Tue Mar 25, 2008 6:59 pm Post subject:
I'm afraid to just turn on dma on either raid device (or the logical drive if that's even possible) because I've read about errors happening with raid and dma.
Code: | root@localhost # hdparm /dev/md0
/dev/md0:
readonly = 0 (off)
readahead = 256 (on)
geometry = 0/2/4, sectors = 0, start = 0
root@localhost # hdparm /dev/md1
/dev/md1:
readonly = 0 (off)
readahead = 256 (on)
geometry = 10016/2/4, sectors = 80128, start = 0
root@localhost # hdparm /dev/md2
/dev/md2:
readonly = 0 (off)
readahead = 256 (on)
geometry = 43248/2/4, sectors = 4016000, start = 0
root@localhost # hdparm /dev/md3
/dev/md3:
readonly = 0 (off)
readahead = 256 (on)
geometry = 19040/2/4, sectors = 19550976, start = 0
root@localhost # hdparm /dev/md4
/dev/md4:
readonly = 0 (off)
readahead = 256 (on)
geometry = 60448/2/4, sectors = 466575616, start = 0
root@localhost # hdparm /dev/md5 |
The timing of the raid seems to be about the same as hda:
Code: | root@localhost # hdparm -tT /dev/md0
/dev/md0:
read() hit EOF - device too small
Timing buffered disk reads: read() hit EOF - device too small
BLKFLSBUF failed: No such device
HDIO_DRIVE_CMD(null) (wait for flush complete) failed: No such device
root@localhost # hdparm -tT /dev/md1
/dev/md1:
Timing cached reads: 928 MB in 2.00 seconds = 463.67 MB/sec
Timing buffered disk reads: 6 MB in 3.26 seconds = 1.84 MB/sec
root@localhost # hdparm -tT /dev/md2
/dev/md2:
Timing cached reads: 934 MB in 2.00 seconds = 466.35 MB/sec
Timing buffered disk reads: 6 MB in 3.38 seconds = 1.78 MB/sec
root@localhost # hdparm -tT /dev/md3
/dev/md3:
Timing cached reads: 450 MB in 2.03 seconds = 221.40 MB/sec
Timing buffered disk reads: 10 MB in 3.67 seconds = 2.73 MB/sec
root@localhost # hdparm -tT /dev/md4
/dev/md4:
Timing cached reads: 964 MB in 2.00 seconds = 481.76 MB/sec
Timing buffered disk reads: 8 MB in 3.08 seconds = 2.60 MB/sec |
ksp7498 Apprentice
Posted: Tue Mar 25, 2008 7:16 pm Post subject:
Unfortunately, I don't know enough about software raid to give any suggestions about this. However, I can say almost certainly that this is the issue: your array is performing VERY slowly. 1-2 MB/s is exceptionally slow, even for drives running in PIO mode.
However, I can't think of any way in which dma would hurt the array. Do you happen to have any links to the discussions that addressed the dma vs. softraid debate? I'm not doubting you, I just want to fully understand what the potential downside is; I've never known dma to be a bad thing.
My (potentially dangerous) advice is to back up everything important on the array, then try enabling dma and see what happens. If the kernel isn't enabling dma on its own to begin with, that may indicate that you're missing the right kernel driver for your particular chipset. But whatever you do, I'd imagine the array is probably unusably slow as it stands.
jcat Veteran
Joined: 26 May 2006 Posts: 1337
Posted: Tue Mar 25, 2008 8:14 pm Post subject:
I agree that the disk tests indicate that the array is seriously underperforming, but wouldn't we see much more wait I/O in top if the disk were holding things up?
Cheers,
jcat
SweepingOar Apprentice
Posted: Wed Mar 26, 2008 10:32 pm Post subject:
OK, I downloaded the latest gentoo-sources, enabled the VIA chipset driver, recompiled the kernel, and turned dma on for both drives. The drives sped up and I'm not seeing the raid threads in top anymore, but mysqld is still hogging an unbelievable amount of cpu:
Code: | top - 15:00:12 up 2:22, 1 user, load average: 0.18, 0.21, 0.19
Tasks: 95 total, 1 running, 94 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.8%us, 2.0%sy, 0.0%ni, 64.9%id, 1.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 967840k total, 542432k used, 425408k free, 113380k buffers
Swap: 2007992k total, 0k used, 2007992k free, 263080k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5082 mysql 20 0 145m 39m 4488 S 26.6 4.2 16:40.51 mysqld
13785 apache 20 0 25696 11m 2948 S 2.0 1.2 0:00.12 apache2
13786 apache 20 0 23672 9372 2964 S 1.3 1.0 0:00.04 apache2
13787 apache 20 0 25200 10m 2988 S 1.3 1.1 0:00.14 apache2
13774 kevins 20 0 2244 1128 852 R 0.7 0.1 0:00.08 top
1 root 20 0 1596 548 476 S 0.0 0.1 0:00.32 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
5 root 15 -5 0 0 0 S 0.0 0.0 0:00.30 events/0
6 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
82 root 15 -5 0 0 0 S 0.0 0.0 0:00.06 kblockd/0
85 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
86 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify
163 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0
164 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux
165 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ksuspend_usbd
171 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
174 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod |
Here's the drive speed now if you're curious:
Code: | root@localhost # hdparm -tT /dev/md1
/dev/md1:
Timing cached reads: 754 MB in 2.01 seconds = 374.69 MB/sec
Timing buffered disk reads: 38 MB in 0.72 seconds = 52.73 MB/sec
root@localhost # hdparm -tT /dev/md2
/dev/md2:
Timing cached reads: 850 MB in 2.00 seconds = 424.15 MB/sec
Timing buffered disk reads: 168 MB in 3.09 seconds = 54.30 MB/sec
root@localhost # hdparm -tT /dev/md3
/dev/md3:
Timing cached reads: 680 MB in 2.00 seconds = 339.61 MB/sec
Timing buffered disk reads: 168 MB in 3.01 seconds = 55.77 MB/sec
root@localhost # hdparm -tT /dev/md4
/dev/md4:
Timing cached reads: 528 MB in 2.01 seconds = 263.07 MB/sec
Timing buffered disk reads: 168 MB in 3.00 seconds = 55.95 MB/sec |
ksp7498 Apprentice
Posted: Thu Mar 27, 2008 3:36 pm Post subject:
Ah yes, that disk performance looks much better, and it's good that we're no longer seeing the raid array sucking up cpu time. I have no idea about the mysql performance, though; I guess it is something unrelated, and I have practically zero experience with mysql, unfortunately. Does the system still suffer from the issues you described in your original post? Getting dma working should have made at least some noticeable difference in the machine's performance.
danomac l33t
Joined: 06 Nov 2004 Posts: 881 Location: Vancouver, BC
Posted: Thu Mar 27, 2008 4:18 pm Post subject:
I can only think of a couple of things that could cause mysql to do that.
You say this is a newer machine; are the drives actually ide, or are they sata? If they are indeed sata drives, you are using the wrong driver in the kernel. You need to use the proper sata drivers (you'll have to change a few things, as your drives will go from /dev/hd? to /dev/sd?). There have been a lot of complaints that DMA does not work properly when sata drives are run through the pata drivers.
The second thing that comes to mind is that mysql is being artificially choked: maybe it can't keep enough tables in memory, due to something like the restriction on the number of open files for the mysql process. That would create a hell of a lot of disk activity, as it would constantly be opening and closing things on the array, which would cause the raid drivers and mysql to use a lot of cpu time waiting on the disk.
When things aren't idle, have you tried to get a status from mysql? Logging into the mysql console and running 'SHOW ENGINE INNODB STATUS' may tell you something.
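[Editorial aside, not part of the original reply: if the open-files restriction danomac mentions turned out to be the culprit, the relevant MySQL 5.0-era knobs live in my.cnf. The values below are illustrative guesses, not recommendations from this thread.]

```ini
# /etc/mysql/my.cnf -- illustrative values only, tune for your workload
[mysqld]
open_files_limit = 4096   # cap on file descriptors mysqld may hold open
table_cache      = 512    # cached open table handles (the 5.0 name;
                          # renamed table_open_cache in 5.1)
```

After restarting mysqld, a rapidly climbing 'SHOW STATUS LIKE "Opened_tables";' counter would confirm tables are being evicted from the cache and constantly reopened.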
SweepingOar Apprentice
Posted: Thu Mar 27, 2008 5:15 pm Post subject:
Thanks. I think ksp and the others who posted helped me solve the dma issue. The drives are IDE and I think the correct drivers are now selected in the kernel (the drive performance seems ok compared to other numbers I've seen posted here). Unfortunately, enabling dma hasn't noticeably affected the mysql performance.
Yes, I've also been thinking it might be a memory or other configuration issue in either mysql or php, but I really don't know anything about mysql and php configuration; I just emerge them, set up the passwords and permissions, and that's about it. I think it's more likely to be mysql, though, because I ran a perl script with a lot of update statements last night and that also crushed the machine (down to zero idle for a few seconds). I looked at a few mysql performance statistics, but I don't know how to analyze them. Here's the one you suggested, if it tells you anything:
Code: | =====================================
080327 9:50:55 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 7 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2, signal count 2
Mutex spin waits 0, rounds 0, OS waits 0
RW-shared spins 4, OS waits 2; RW-excl spins 0, OS waits 0
------------
TRANSACTIONS
------------
Trx id counter 0 1792
Purge done for trx's n:o < 0 0 undo n:o < 0 0
History list length 0
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 5082, OS thread id 2950278032
MySQL thread id 35330, query id 969462 localhost root
SHOW ENGINE INNODB STATUS
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
25 OS file reads, 3 OS file writes, 3 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2,
0 inserts, 0 merged recs, 0 merges
Hash table size 69257, used cells 0, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 0 43665
Log flushed up to 0 43665
Last checkpoint at 0 43665
0 pending log writes, 0 pending chkp writes
8 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 34467384; in additional pool allocated 1388544
Buffer pool size 1024
Free buffers 989
Database pages 35
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 35, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 5082, id 2989108112, state: waiting for server activity
Number of rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT |
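[Editorial aside, my reading rather than anything stated in the thread: "Buffer pool size 1024" in that output is counted in 16 KB pages, so this InnoDB buffer pool is tiny, and the all-zero row-operation counters suggest the busy tables may not even be InnoDB (MyISAM activity wouldn't show here). The page arithmetic:]

```shell
# InnoDB reports its buffer pool in 16 KB pages (the 5.0 page size).
page_kb=16
pool_pages=1024
echo "buffer pool: $(( pool_pages * page_kb / 1024 )) MB"
```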
danomac l33t
Posted: Fri Mar 28, 2008 12:15 am Post subject:
Hmm. The only thing I can think of now is to emerge innotop and see if you can get real-time stats.
SweepingOar Apprentice
Posted: Fri Mar 28, 2008 10:03 pm Post subject:
It's masked, but I'm installing it on the out-of-service server to try to figure out how to use it. One thing I noticed with mysql 5.0.44 is that only one mysqld process ever seems to run. In 5.0.26 there were always multiple processes spawning. Is this the way it's supposed to work now, or is something wrong with my config?