Gentoo Forums
using bash to add up distccd time -- ugly hack
chrisis
Posted: Tue Sep 09, 2003 2:53 am    Post subject: using bash to add up distccd time -- ugly hack

Background: I have a distccd running on boxA that is being used by boxB to assist with compiling during an 'emerge system', during a stage1 install. I'm trying to find out how much the distccd is actually used.

I have distccd outputting to a log called /tmp/distccd-log.

Here's the hack for extracting the seconds values contained in the log:
Code:

for i in `grep ' user ' /tmp/distccd-log | cut -f6,8 -d\  | sed -e 's/s,/ +/g'`; do echo -n $i; done | sed 's/+$//' | sed 's/self://g' | sed 's/user//g'


Next step: the output of this is a single line of values separated by '+'. At the moment I can only perform the actual calculation by redirecting this output into a_file, using vi to bung an "echo " on the front and a "| bc -l" on the back, then executing:

Code:

$ sh a_file


Question 1: how do I change the command line so that the calc is included? I've tried all manner of combos of redirects and nothing seems to work.

Question 2: the output after boxB has been merrily emerging for a good 4 hours is only around 1000 seconds -- around 17 minutes. How can I verify that this is a valid result? I have NO IDEA whether I'm calculating anything meaningful at this point! 17 minutes of shared CPU load seems a tiny amount to me -- but is this normal?

Question 3: I'm sure I should be able to combine those last three sed substitutions (or, at the least, the last two). But again, I fiddled with a regex combining 'self:' and 'user', and couldn't get it right. Help!

Disclaimer: this may not be pretty, but I'm trying to get my head around sed, mostly, and bash command line hacking; more elegant solutions appreciated.

:)
Chrisis
_________________
But the situation seemed to call for witty repartee. "Huh?" I said.
robdavies
Posted: Tue Sep 09, 2003 3:02 pm

Frankly you'd be far better off using perl, python or even a gawk program for this sort of processing. With perl, you can use an re (regular expression) to separate the parts into variables, so you should be able to add the seconds directly.

That said can you post a few lines of the data, showing how it's transformed? At the moment I can't see the point of 'echo -n $i' in the loop, as that seems to be just what you would have as output from the `grep ...'s/s,/ +/g'`. So :

Code:
echo `grep ' user ' /tmp/distccd-log | cut -f6,8 -d\  | sed -e 's/s,/ +/g'` | sed 's/+$//' | sed 's/self://g' | sed 's/user//g'


The final echo and pipe seem redundant too; why doesn't bc < file just work? If it does, you could just generate an expression for direct input into bc, e.g. echo 2 + 2 | bc.

If the cut -f6,8 -d\  is setting the field separator to <space>, then in awk:

Code:
awk < /tmp/distccd-log '/ user / { print $6,$8 }' |


Does the same job as the grep & cut in one step.

I think all your sed stuff, could be done by multiple -e options to sed.

My brain hurts bad from looking at this... but I think

1) pipe the expression direct into bc
2) tail the log whilst it's running and manually calculate some of it. Run your scripts on a small subset of this data and see if it's correct. To debug pipelines the tee command is useful.
3) Oh man, doesn't sed -e 's/(self:|user)//' work? Nope so it's :

Code:
$ echo 'bill self: 2 user' | sed -e 's/self://' -e 's/user//'
bill  2
$ echo 'bill self: 2 user' | perl -pl -e 's/(self:|user)//g;'
bill  2


Yet another reason to use perl!
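For what it's worth, the alternation in 3) fails in plain sed because basic regular expressions treat ( and | as literal characters. A minimal sketch of the same substitution with extended regular expressions follows; it assumes a sed that accepts -E (current GNU and BSD seds do; GNU sed of this era spelled the option -r), and strip_labels is just a name made up for the example:

```shell
# Assumption: this sed understands -E (extended regular expressions).
# strip_labels is a hypothetical helper name, not part of any tool.
strip_labels() {
    # With -E, ( ) group and | alternates, so one expression removes both
    # labels; the g flag removes every occurrence on the line.
    sed -E 's/(self:|user)//g'
}

echo 'bill self: 2 user' | strip_labels
```

This removes the same text as the two -e expressions above, and like them it leaves the surrounding spaces in place.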

Lastly, a programmer's trick: rather than trim the last +, it might be simpler just to add a final 0 at the end of the input. Similarly, perhaps it's easier to search for what you want (the digits making up the seconds) and throw the rest away.
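The trailing-0 trick might look something like this. It is only a sketch: build_expr is a made-up name, and the field numbers assume the "cc times" line layout used in the original cut command.

```shell
# Hypothetical helper (name made up for this sketch): turn the "cc times"
# lines of a distccd log into one expression such as 0.6+0.02+0.
build_expr() {
    # Fields 6 and 8 are "0.600000s," and "0.020000s,"; sed swaps each
    # "s," for "+", and tr joins everything onto a single line.
    grep ' user ' "$@" | cut -d' ' -f6,8 | sed -e 's/s,/+/g' | tr -d ' \n'
    # The final 0 absorbs the dangling "+" and supplies the closing newline.
    echo 0
}

# Demo on two sample log lines fed through stdin.
build_expr <<'EOF'
distccd[1562] (dcc_collect_child) cc times: user 0.600000s, system 0.020000s, 1200 minflt, 1112 majflt
distccd[1562] job complete
distccd[1579] (dcc_collect_child) cc times: user 0.145000s, system 0.020000s, 831 minflt, 1075 majflt
EOF
```

The result can then go straight into bc with no temporary file, for example: build_expr /tmp/distccd-log | bc -l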

Now for the bugs I spot :

I think that if you have many lines to process, the + expression will get too long: there's a limit on the length of a command line (around 10KB on some systems; the exact figure varies), at which point the `grep ...` will fail. As a general technique, the solution is either to use xargs, or to use a program like perl or awk to do line-by-line processing and add the numbers to a variable.
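A rough sketch of the line-by-line approach, using awk from the shell. The function name sum_times and the variable names are invented for the example, and the field positions assume the "cc times" line layout implied by the cut -f6,8 above:

```shell
# Hypothetical helper: add up user and system seconds one line at a time,
# so no long expression is ever built. Reads the named files, or stdin.
sum_times() {
    awk '/cc times: user / {
             u = $6; s = $8
             # Drop the trailing "s," from each field before adding.
             sub(/s,$/, "", u); sub(/s,$/, "", s)
             utime += u; stime += s
         }
         END { printf "user %.3fs sys %.3fs\n", utime, stime }' "$@"
}

# Demo on two sample log lines fed through stdin.
sum_times <<'EOF'
distccd[1562] (dcc_collect_child) cc times: user 0.600000s, system 0.020000s, 1200 minflt, 1112 majflt
distccd[1579] (dcc_collect_child) cc times: user 0.145000s, system 0.020000s, 831 minflt, 1075 majflt
EOF
```

On the real log this would be run as: sum_times /tmp/distccd-log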

Yes, you are right the script is ugly :) If my hunch is right this will only be a few lines of perl, using funky command line switches, one re and a sum variable which is printed at end. Still you are absolutely right to give it a try, better maybe post shorter code snippets for folk to comment on, and also go read system shell scripts to pick up neat ways of doing things.
chrisis
Posted: Tue Sep 09, 2003 9:59 pm    Post subject: thanks :-)

Ahem.

I guess my problem, being so new at this, is that 'just' using perl, or awk or even xargs = go away for a few months and try again.

Which I'm not averse to at all :-).

Thanks for the tips, I'll digest this.

In response to
Quote:
That said can you post a few lines of the data, showing how it's transformed?


Here is a sample of the log data:

Code:

distccd[1562] (dcc_r_file_timed) 153123 bytes received in 0.014656s, rate 10203kB/s
distccd[1562] (dcc_collect_child) cc times: user 0.600000s, system 0.020000s, 1200 minflt, 1112 majflt
distccd[1562] gcc on localhost completed ok
distccd[1562] job complete
distccd[1579] (dcc_check_client) connection from 10.110.201.45:2000
distccd[1579] compile from fe.c to fe.o
distccd[1579] (dcc_r_file_timed) 132599 bytes received in 0.012850s, rate 10077kB/s
distccd[1579] (dcc_collect_child) cc times: user 0.145000s, system 0.020000s, 831 minflt, 1075 majflt
distccd[1579] gcc on localhost completed ok
distccd[1579] job complete


Which results in:

Code:

0.600000+0.020000+0.145000+0.020000

after being squeezed thru my sed-experiment.

In response to
Quote:
I can't see the point of 'echo -n $i'


I did this to create one single line of arguments to be added (it eliminates the newline that plain old grep outputs). Maybe not necessary? Can bc add arguments separated by newlines?

Quote:
why doesn't bc < file just work


bc returns a (standard_in) parse error. Dunno why.

Thanks for your input so far. I gotta go do some reading! :-D

Chrisis
robdavies
Posted: Tue Sep 09, 2003 10:56 pm

Quote:
I guess my problem, being so new at this, is that 'just' using perl, or awk or even xargs = go away for a few months and try again.


No, no!! Don't take it that way. See, if I don't mention how I'd do it, you'll wonder why a 'simple' report program is so hard in UNIX/Linux. Also, later on when you read about perl or awk, you'd wonder why no one mentioned them :)

Quote:
I did this to create one single line of arguments to be added (it eliminates the newline that plain old grep outputs). Maybe not necessary?


Right, it's not necessary. Try using date(1) (the (1) means it's a user command in section 1 of the man pages) on its own. Then wonder why:

Code:
echo "Embedded `date`, where did date's newline go?"


See, the shell actually does the right thing and chomps the trailing newline when you use ``.
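A small sketch of that behaviour. It uses $( ), which is equivalent to the backticks, and two_lines is just a stand-in for any command that prints several lines:

```shell
# Stand-in command: two lines of output, each ending in a newline.
two_lines() { printf 'one\ntwo\n'; }

out=$(two_lines)

# Quoted: the trailing newline is gone, but the inner one survives.
printf '[%s]\n' "$out"

# Unquoted: word splitting also turns the inner newline into a space,
# which is why the grep output already lands on a single line.
echo $out
```

So the 'echo -n $i' loop is doing work the shell would have done anyway.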

Quote:

Can bc add arguments separated by newlines?


Arguments are on command line, bc(1) reads stdin, so it would read each expression one to a line. The argument to bc(1) is interpreted as a filename.

Oh the data is useful, shows my mental thought experiment was not so far off, I don't have a 2nd machine at moment to try distccd on, so it's useful to see an extract.

Quote:

Thanks for your input so far. I gotta go do some reading!


Oh it's been kind of fun, unravelling the program then trying to explain things clearly. The program is actually doing something useful, so I've actually got some outlines for the perl version to compare already. I'm planning to use Gentoo on a Pentium MMX, for firewalling and it needs to use the grunt, of my SMP box for source builds.
chrisis
Posted: Tue Sep 09, 2003 11:09 pm

robdavies wrote:
Quote:
I guess my problem, being so new at this, is that 'just' using perl, or awk or even xargs = go away for a few months and try again.


No, no!! Don't take it that way. See, if I don't mention how I'd do it, you'll wonder why a 'simple' report program is so hard in UNIX/Linux. Also, later on when you read about perl or awk, you'd wonder why no one mentioned them :)


What I meant is "go away for a few months, do a bunch of reading and learn a whole lot of new things (which is cool)"... :-D

Quote:
Oh it's been kind of fun, unravelling the program then trying to explain things clearly. The program is actually doing something useful, so I've actually got some outlines for the perl version to compare already. I'm planning to use Gentoo on a Pentium MMX, for firewalling and it needs to use the grunt, of my SMP box for source builds.


*warm feeling*. Good luck with your build. I'm planning on doing a similar thing with a celery 300. If you don't mind, can you post your perl version when it's ready?

Chrisis
robdavies
Posted: Wed Sep 10, 2003 8:16 am

OK here it is, just a few minutes' debugging time over 1 cup of coffee! If it looks cryptic to you at first, just console yourself with the thought that it will take you much less time to figure out and look up 4 statements than it took me to puzzle out all the grep & cut, loops, and sed commands:

Code:

539rob@ash$ !per
perl distcc-time.pl < distcc.log
user    0.745s  sys     0.04s
543rob@ash$ perl distcc-time.pl distcc.log
user    0.745s  sys     0.04s


It wouldn't be too hard to alter END to report in a more human-readable format, like time(1)'s 0m0.000s. Perhaps you'd like to try that? The perl switches '-nl' turn on automatic line processing and make it loop like 'sed -n' over stdin or the files named on the command line.

Code:

#!/usr/bin/perl -nl

END {
    print "user\t", $utime, "s\tsys\t", $stime, "s";
}

# distccd[12] (dcc_collect_child) cc times: user 0.600000s, system 0.020000s, 1200 minflt, 1112 majflt
if (/cc\s+times:\s+user\s+(\d+\.\d+)s,\s+system\s+(\d+\.\d+)s,/) {
    $utime += $1;
    $stime += $2;
}


The re uses \s+ to match one or more whitespace chars, and stuff between ( ) is automatically saved in $1, $2 etc. \d+ matches digits, \. a literal point. An re returns true, when it matches, so the times are only added for cc times: lines.

The main part of the program is run on each line of input, which could come from multiple log files specified on the command line. The END { } block is awk(1)-like, and is run at the end of all input. It's possible to have BEGIN blocks too, and you could use getopts to set command line switches, to only print a combined total for example.

Most perl programs should use the '-w' switch, which will warn you of sloppiness, like using uninitialised variables. Generally perl programs do not use -n or -p, nor BEGIN or END blocks, and are structured in a similar way to shell or C.

Note the awesome power of re's applied to text processing, how the reporting is kept separate, and finally how much boilerplate the funky perl switches can do for you.
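On the time(1)-style report suggested earlier in this post: the arithmetic is just a split into whole minutes and leftover seconds. Here is a rough sketch of it as a shell function (fmt_time is an invented name, and this is a shell illustration of the idea, not the perl END block itself):

```shell
# Hypothetical helper: print a number of seconds as XmY.YYYs.
fmt_time() {
    awk -v t="$1" 'BEGIN {
        m = int(t / 60)                      # whole minutes
        printf "%dm%.3fs\n", m, t - m * 60   # leftover seconds
    }'
}

fmt_time 0.745   # 0m0.745s
fmt_time 1000    # 16m40.000s
```

The second call is the roughly 1000 seconds from Question 2, which comes out at a little under 17 minutes. A leftover just below 60 can still round up to 60.000s in the output; the sketch does not handle that case.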
chrisis
Posted: Mon Sep 22, 2003 8:38 pm    Post subject: Wow

Where did you learn this stuff? Thanks for sharing! :-)