Gentoo Forums
Resources for analyzing linux/*nix performance & tuning?
pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Tue Feb 02, 2010 8:08 pm    Post subject: Resources for analyzing linux/*nix performance & tuning?

Just curious if anyone has anything. I've looked at various times with a focus on Solaris and was never impressed with what I could find.

Particularly for identifying problem applications, identifying if they have a memory leak, or proving the app is "out of control" (prove it is the app and not an OS problem). Capacity planning too.
_________________
Safety is my gaol.
US Constitution | Amendments

The Earth
l33t

Joined: 22 Oct 2002
Posts: 648
Location: The Holy city of Honolulu

Posted: Tue Feb 02, 2010 8:13 pm    Post subject: Re: Resources for analyzing linux/*nix performance & tun

pjp wrote:
Just curious if anyone has anything. I've looked at various times with a focus on Solaris and was never impressed with what I could find.

Particularly for identifying problem applications, identifying if they have a memory leak, or proving the app is "out of control" (prove it is the app and not an OS problem). Capacity planning too.


I just use top, though I don't think it fits all the criteria you are looking for.
_________________
Libertarianism : The radical notion that other people are not your property.

Kenji Miyamoto
Veteran

Joined: 28 May 2005
Posts: 1452
Location: Looking over your shoulder.

Posted: Tue Feb 02, 2010 8:15 pm    Post subject: Re: Resources for analyzing linux/*nix performance & tun

pjp wrote:
Particularly for identifying problem applications, identifying if they have a memory leak, or proving the app is "out of control" (prove it is the app and not an OS problem). Capacity planning too.
Valgrind will help here, especially if you compile the stuff you want to test with -g.
_________________
[ Kawa-kun, new and improved!! ]

Alex Libman seems to be more of an anarchist than a libertarian.

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Tue Feb 02, 2010 8:19 pm

That's a start, but yeah, doesn't go very far toward proving to the developer it isn't an OS problem.

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Tue Feb 02, 2010 8:21 pm    Post subject: Re: Resources for analyzing linux/*nix performance & tun

Kenji Miyamoto wrote:
Valgrind will help here, especially if you compile the stuff you want to test with -g.
Thanks, I'll take a look. I'm assuming RH or other common "Enterprise" Linux implementations... don't know if -g would be used.

mdeininger
Veteran

Joined: 15 Jun 2005
Posts: 1737
Location: University of Tuebingen, Germany

Posted: Tue Feb 02, 2010 8:32 pm

i think "enterprise" linux distros all had some way to install external debugging symbols that gdb and stuff would automatically pull in, so you should probably be fine.

but yeah, valgrind all the way. it's THE tool i'm missing the most when developing/debugging programmes on windows...
_________________
"Confident, lazy, cocky, dead." -- Felix Jongleur, Otherland

( hot: libcurie - freestanding C goodness | alea.iacta.at | syn.chroni.se )

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Tue Feb 02, 2010 8:39 pm

I've only just started reading about it, but it seems more of a development tool to me. I'm looking for something to deal with Production issues, where downtime should be avoided if at all possible (and otherwise put off until it would have minimal impact, aka 2am). Am I mistaken?

mdeininger
Veteran

Joined: 15 Jun 2005
Posts: 1737
Location: University of Tuebingen, Germany

Posted: Tue Feb 02, 2010 10:17 pm

ah... no you're right. valgrind is indeed a development tool, usually used to search for memory leaks and erratic behaviour in programmes. identifying memory leaks and/or proving an application is malfunctioning is, however, impossible in the general case: deciding whether an arbitrary programme has some non-trivial behaviour is undecidable (halting problem, Rice's theorem).

your original post was, however, somewhat broad, so it might help if you narrowed it down a bit. maybe with an example? i dunno, it might just be me though...

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Tue Feb 02, 2010 11:53 pm

mdeininger wrote:
your original post was, however, somewhat broad, so it might help if you narrowed it down a bit. maybe with an example? i dunno, it might just be me though...
I was trying to be general, as it isn't a specific problem. I see 'performance analysis & tuning' listed for job requirements, so I'm just trying to familiarize myself.

The specific Solaris problem was related to java app(s), and we were convinced it was a code problem, but couldn't really prove it (other than the fact that restarting the app made it work better for a while). Memory usage wasn't typically at capacity, and disk utilization wasn't either. Some problems were related to the # of processes open, but that's as far as I was able to go (time and knowledge).

More specifically, what proves something isn't (or is) swapping, cpu bound, disk bound (SAN), ...? I've seen vmstat output, but it seems like something interpreted as opposed to "read." Finally, instead of reacting to the problem, what tools are used for trending? If the problem is simply increased usage, that should show up and be somewhat predictable and something which can be planned around.
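
For what it's worth, the vmstat columns can be reproduced from two snapshots of the kernel counters, which makes them a bit easier to "read". Below is a minimal sketch, assuming a Linux box with /proc mounted; the function names are made up for illustration and the Solaris equivalents (kstat) would look different. It shows the share of CPU time spent in each state over an interval, plus the number of pages swapped in and out.

```python
"""Rough vmstat-style breakdown from two snapshots of Linux /proc counters."""
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict of ticks."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # skip the per-core cpu0, cpu1, ... lines
            values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def parse_swap_counters(vmstat_text):
    """Return cumulative pages swapped in/out from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def cpu_percentages(before, after):
    """Share of elapsed CPU time spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval=5.0):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    def snapshot():
        with open("/proc/stat") as f:
            cpu = parse_cpu_line(f.read())
        with open("/proc/vmstat") as f:
            swap = parse_swap_counters(f.read())
        return cpu, swap

    cpu0, swap0 = snapshot()
    time.sleep(interval)
    cpu1, swap1 = snapshot()
    swapped = {k: swap1.get(k, 0) - swap0.get(k, 0) for k in ("pswpin", "pswpout")}
    return cpu_percentages(cpu0, cpu1), swapped
```

As a rule of thumb only: mostly user/system time with little iowait points toward CPU bound, high iowait with flat swap counters points toward disk, and steadily climbing pswpin/pswpout points toward memory pressure. It is evidence to show a developer, not proof.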

Kenji Miyamoto
Veteran

Joined: 28 May 2005
Posts: 1452
Location: Looking over your shoulder.

Posted: Wed Feb 03, 2010 2:28 am

If it's Java, run the program in a console and hit control-\ to get an on-demand thread dump. I learned about that a while back when messing with threads for class and it's been a very useful tip (whenever I actually use Java).

notageek
Tux's lil' helper

Joined: 05 Jun 2008
Posts: 76
Location: Bangalore, India

Posted: Wed Feb 03, 2010 4:34 am

Do you know how to use DTrace in Solaris? Its documentation does say you can do all that (find out specific points of failure in the OS etc.); I didn't study it further.
_________________
What looks like a cat, flies like a bat, brays like a donkey, and plays like a monkey?

cach0rr0
Moderator

Joined: 13 Nov 2008
Posts: 3592
Location: Houston, Republic of Texas

Posted: Wed Feb 03, 2010 7:12 am

I don't know if something like nagios could be used to at least alert you when something like this crops up? Just thinking maybe that would save you from having to sit there and stare at it - i don't know what monitoring you do already.

That may well be way way completely off and irrelevant to what you're doing, though. It doesn't really do heaps for hashing out whether or not an app is working.

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Wed Feb 03, 2010 7:23 pm

Kenji Miyamoto wrote:
If it's Java
That was just an example, and why I didn't originally mention it. I'm looking for generic tips. In that scenario, I don't believe I had that as an option in a Production environment.


notageek wrote:
Do you know how to use DTrace in Solaris? Its documentation does say you can do all that (find out specific points of failure in the OS etc.); I didn't study it further.
It is pretty in depth. I've seen a little bit of it, but certainly don't know near enough about it. I think a linux port was in the works, but not sure where that is now. I'm assuming there was something before DTrace too (I'm assuming it won't be available / installable on all linux or unix systems).


cach0rr0 wrote:
That may well be way way completely off and irrelevant to what you're doing, though. It doesn't really do heaps for hashing out whether or not an app is working
Put another way, if a system has a problem, how do you track it down from a performance / tuning perspective?

It just seems like a magic black hole of SA.

mdeininger
Veteran

Joined: 15 Jun 2005
Posts: 1737
Location: University of Tuebingen, Germany

Posted: Wed Feb 03, 2010 8:02 pm

pjp wrote:
It just seems like a magic black hole of SA.
it kind of is... finding misbehaving programmes is a very hard task, and proving it's really the programme and not something else tends to be as well.

usually all you will be able to get is output of things like top/htop and vmstat, possibly also valgrind and strace, etc. if you can either isolate the misbehaving programme in a testing environment or reasonably find times where you can restart and test it on a production server.
since every application is different, it's not really possible to write a programme that analyses general applications and ends up giving you a definitive answer as to whether it's acting up. what you could do, however, would be writing a programme yourself that regularly gathers the data you can get (the output of the tools above, possibly also queries if it's a server), and then analyse it in some way, possibly finding a way to find out if that specific piece of software is currently failing (statistics is your friend at this point).
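
a minimal sketch of the "gather regularly, then let statistics decide" idea, assuming linux with /proc mounted. the function names and the 3-sigma cut-off are made up for illustration; what counts as abnormal really depends on the application.

```python
import statistics


def read_rss_kib(pid):
    """Return the resident set size of `pid` in KiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # the line looks like "VmRSS:  1234 kB"
    raise ValueError(f"no VmRSS entry for pid {pid}")


def is_outlier(history, latest, sigmas=3.0):
    """True when `latest` lies more than `sigmas` sample standard deviations
    away from the mean of `history` (earlier measurements of the same metric)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return latest != mean
    return abs(latest - mean) > sigmas * spread
```

run from cron, something like this would append read_rss_kib(pid) (or a response time, or a queue length) to a log and compare each new value against the past few days. an outlier is only a hint that something changed, not a verdict on the application.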

for example, at work we've set up a number of cron jobs that, next to common backup tasks, will try to "ping" all of the webapps that we're hosting for customers on a regular basis by fetching a data page from them, and generate an e-mail report of that if there were any issues with connectivity or an extra report that everything was okay during that day. this is a fairly simple but typically effective way of finding faults on the production servers, and if things go bad we're typically able to react faster than the customers are able to call in.
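
roughly what such a check could look like, as a sketch only: this is not the actual script from work, the names are invented for illustration, and sending the mail is left to cron (MAILTO) or smtplib.

```python
import http.client
import urllib.error
import urllib.request


def check_url(url, timeout=10.0):
    """Fetch `url`; return (ok, detail), where ok means an HTTP 2xx answer arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            return 200 <= status < 300, f"HTTP {status}"
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError) as exc:
        # covers HTTP 4xx/5xx, DNS failures, refused connections, timeouts, bad URLs
        return False, f"{type(exc).__name__}: {exc}"


def build_report(urls, check=check_url):
    """Check every URL and return (subject, body) for an e-mail report."""
    results = [(url, *check(url)) for url in urls]
    failures = [r for r in results if not r[1]]
    if failures:
        subject = f"{len(failures)} of {len(results)} webapps FAILED"
    else:
        subject = f"all {len(results)} webapps OK"
    lines = [f"{'OK  ' if ok else 'FAIL'} {url} ({detail})" for url, ok, detail in results]
    return subject, "\n".join(lines)
```

the `check` parameter is only there so the report logic can be tried out without touching the network.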

// "performance analysis & tuning" seems like an odd job requirement; unless it's for a software developer or administrative position... i'd say for a developer position that'd mean you'd have to be familiar with typical profiling tools and be able to optimise code by substituting and applying new algorithms where possible, and for an administrative position it'd really be how to use top or analyse slow server queries and possibly rewrite things where appropriate.

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Tue Feb 09, 2010 9:12 pm

mdeininger wrote:
// "performance analysis & tuning" seems like an odd job requirement; unless it's for a software developer or administrative position... i'd say for a developer position that'd mean you'd have to be familiar with typical profiling tools and be able to optimise code by substituting and applying new algorithms where possible, and for an administrative position it'd really be how to use top or analyse slow server queries and possibly rewrite things where appropriate.
Here's an example, though not exact...
Quote:
Required Skills:

* Deploy, configure and maintain physical and virtual hosts running Linux (SUSE and Redhat)
* Perform intermediate- to advanced-level UNIX systems administration activities for a physical and virtual enterprise infrastructure.
* Provide backup system-level support of multiple Oracle databases.
* Undertake preventative monitoring and performance analysis.
* Analyze and tune systems for optimal uptime and performance.
* Use shell or Perl scripting to automate processes and maintain the application environments.
* Document support procedures and systems configurations.
* Provide collaborative 4th tier support for Service Desk user problems pertaining to applications hosted on the UNIX / Linux servers.
* Provide on-call staff support 24/7/365 for complex issues which require escalation to the expert level.
* Consolidation/Standardization: Work with network engineers, application support teams, and other groups throughout the internal IT organization to identify, plan, and implement consolidation of standardized infrastructure hardware and software resources where appropriate.


linuxtuxhellsinki
l33t

Joined: 15 Nov 2004
Posts: 691
Location: Hellsinki

Posted: Tue Feb 09, 2010 9:43 pm

There are some tutorials on capacity planning for LPIC-3 at IBM (you have to register though).
_________________
1st use 'Search' & lastly add [Solved] to
the subject of your first post in the thread.

msalerno
Veteran

Joined: 17 Dec 2002
Posts: 1336
Location: Sweating in South Florida

Posted: Tue Feb 09, 2010 9:49 pm

My best guess would be something along the lines of using sysstat and cacti for recording system performance (disk queue, disk I/O, network I/O, etc.), then using something like cacti for trending.
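
For the trending part, the arithmetic behind a capacity forecast is just a straight-line fit over the recorded samples. A minimal sketch, assuming one usage sample per day in whatever unit you record (GB of disk, for instance); the function names are hypothetical and this is not how cacti does it internally.

```python
def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept) for (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    spread = sum((x - mean_x) ** 2 for x, _ in points)
    if spread == 0:
        raise ValueError("all points share the same x value")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / spread
    return slope, mean_y - slope * mean_x


def days_until_full(daily_usage, capacity):
    """Estimate the days from the last sample until `capacity` is reached.

    Returns None when usage is flat or shrinking, so no date can be projected."""
    slope, intercept = fit_line(list(enumerate(daily_usage)))
    if slope <= 0:
        return None
    day_full = (capacity - intercept) / slope
    last_day = len(daily_usage) - 1
    return max(0.0, day_full - last_day)
```

A linear fit only holds up while growth really is roughly linear; seasonal load or a sudden new customer will throw it off, so treat the number as a planning aid.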
_________________
When harmonious relationships dissolve
Then respect and devotion arise;
When a nation falls to chaos
Then loyalty and patriotism are born.
-Lao Tse

hellbringer
Tux's lil' helper

Joined: 12 Feb 2003
Posts: 82

Posted: Tue Feb 09, 2010 9:56 pm    Post subject: Re: Resources for analyzing linux/*nix performance & tun

pjp wrote:
Just curious if anyone has anything. I've looked at various times with a focus on Solaris and was never impressed with what I could find.

Particularly for identifying problem applications, identifying if they have a memory leak, or proving the app is "out of control" (prove it is the app and not an OS problem). Capacity planning too.

Actually you will find that Solaris has DTrace, which is much better than what Linux can offer (strace, gdb, etc.). Valgrind kind of rules for memory leaks if your dependent libs don't suck.
_________________
There is a lot of novelty and truth in what you say, but that which is true is not novel and that which is novel is not true.

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Tue Feb 09, 2010 10:17 pm

linuxtuxhellsinki wrote:
There are some tutorials on capacity planning for LPIC-3 at IBM (you have to register though).
I've tried to avoid registering there, but seems it might be worth checking out. Thanks.


msalerno wrote:
My best guess would be something along the lines of using sysstat and cacti for recording system performance (disk queue, disk I/O, network I/O, etc.), then using something like cacti for trending.
Makes sense. I've done some with HP tools, but not much.

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Tue Feb 09, 2010 10:20 pm    Post subject: Re: Resources for analyzing linux/*nix performance & tun

hellbringer wrote:
Actually you will find that Solaris has DTrace, which is much better than what Linux can offer (strace, gdb, etc.). Valgrind kind of rules for memory leaks if your dependent libs don't suck.
Yeah, that seems to be the Solaris 10 consensus. I'm hoping to get into Linux and move away from the Ancients though :)

kernelOfTruth
Watchman

Joined: 20 Dec 2005
Posts: 5314
Location: Vienna, Austria; Germany; hello world :)

Posted: Tue Feb 09, 2010 11:49 pm    Post subject: Re: Resources for analyzing linux/*nix performance & tun

Chopinzee wrote:
pjp wrote:
Just curious if anyone has anything. I've looked at various times with a focus on Solaris and was never impressed with what I could find.

Particularly for identifying problem applications, identifying if they have a memory leak, or proving the app is "out of control" (prove it is the app and not an OS problem). Capacity planning too.


I just use top, though I don't think it fits all the criteria you are looking for.


++

htop iotop iftop atop powertop xrestop

<-- I use a bunch of those *top programs and that's mostly all I need
_________________
Unofficial minimal livecd x86/amd64 w/reiser4+truecrypt (by Neo2)
2.6.37.2_plus_v1: BFS, CFS,THP,compaction, zcache or TOI
Hardcore Linux user since 2004 :D

msalerno
Veteran

Joined: 17 Dec 2002
Posts: 1336
Location: Sweating in South Florida

Posted: Wed Feb 10, 2010 2:10 am

The answer to the question also depends on how segmented the company is. There should be little reason for a sysadmin to have to dig too deep into an application to look for memory leaks.

pjp
Administrator

Joined: 16 Apr 2002
Posts: 15977
Location: Colorado

Posted: Wed Feb 10, 2010 3:23 am

msalerno wrote:
The answer to the question also depends on how segmented the company is. There should be little reason for a sysadmin to have to dig too deep into an application to look for memory leaks.
Yeah, the problem is when they say it isn't their program, it's the OS. Being able to demonstrate why it isn't the OS was the key part there.

djsmiley2k
n00b

Joined: 08 Apr 2005
Posts: 70
Location: Coventry

Posted: Wed Feb 10, 2010 1:39 pm

pjp wrote:
msalerno wrote:
The answer to the question also depends on how segmented the company is. There should be little reason for a sysadmin to have to dig too deep into an application to look for memory leaks.
Yeah, the problem is when they say it isn't their program, it's the OS. Being able to demonstrate why it isn't the OS was the key part there.


You stop the program, everything else works ok - then it's the program, or the program causing a fault within the O/S. Either way, it's still the program causing the problem, and (THIS IS THE IMPORTANT BIT) if they don't fix the problem then the $$$ will be going elsewhere; but of course this is dependent on a lot of other things (does your boss trust you when you tell him the developers won't listen and your only choice is to go elsewhere...)

Any good developer for a system should understand the underlying O/S anyway to know if their program is causing the problem or not. From what you've said previously it sounds like you've experienced this before?

msalerno
Veteran

Joined: 17 Dec 2002
Posts: 1336
Location: Sweating in South Florida

Posted: Wed Feb 10, 2010 2:06 pm

pjp wrote:
Yeah, the problem is when they say it isn't their program, it's the OS. Being able to demonstrate why it isn't the OS was the key part there.

I come across that all of the time; I would imagine that most sysadmins do too. The solution to the above issue is 50% technical and 50% political. Most of the time on the technical side, it's best to clear all hardware before you start looking at the OS. Check system performance, queue lengths, IOPS, memory, network and CPU utilization, etc. Searching the developer's code is usually not the approach to take. Just whittle the problem down to the lowest common denominator. You will always hear developers say "it's not my app" and sysadmins say "it's not my system".

I always approach it the same way.
1. Eliminate hardware
2. Standard health checks on OS, check patches etc.
3. Environmental - check network, power and fiber cables, temperature of data center, etc.
4. Try to duplicate the issue with other software, e.g.
   SQL database running slow due to disk subsystem: run bonnie++ and some testing with dd.
   Slow network communication: fire up netcat, tcpdump.
   etc.
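
As a rough stand-in for the dd test in step 4, here is a minimal sketch that times a sequential write on the suspect filesystem without involving the application at all. The function name and default sizes are made up for illustration; bonnie++ measures far more than this.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(directory, total_mib=64, block_kib=1024):
    """Write about `total_mib` MiB of zeros to a temp file in `directory`,
    fsync it, and return the observed throughput in MiB/s."""
    blocks = (total_mib * 1024) // block_kib
    if blocks < 1:
        raise ValueError("total_mib must be at least one block")
    block = b"\0" * (block_kib * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data left the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # never leave the test file behind
    return (blocks * block_kib / 1024) / elapsed
```

Pick a size well above any controller cache, and keep in mind that zeros compress well on some filesystems and arrays, which can flatter the number. If this is slow too, the database is probably not the culprit.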

It also takes a manager who is willing to stand up to development and stand their ground when they announce, "It's not the OS or hardware". Pretty rare by my experience.

Never had to dig too deep into application debugging, and I don't expect to.

All times are GMT
Page 1 of 2