Dralnu Veteran
Joined: 24 May 2006 Posts: 1919
|
Posted: Mon Sep 03, 2007 6:49 pm Post subject: Why is sed faster than grep? |
|
|
I have a few scripts that use grep quite frequently (the major one is a script I wrote to make my life easier when dealing with my music collection and mpc), and I spent some time one boring night measuring the performance difference between sed and grep.
What I found (surprisingly to me) was that sed outperforms grep by several times, especially on longer inputs (keep in mind this was done on what amounts to a single file).
Anyone know why it seems that grep would be slower than sed? _________________ The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner. |
|
|
|
ppurka Advocate
Joined: 26 Dec 2004 Posts: 3256
|
Posted: Mon Sep 03, 2007 7:44 pm Post subject: Re: Why is sed faster than grep? |
|
|
Dralnu wrote: | Anyone know why it seems that grep would be slower than sed? | Did you try grep with UTF turned off? LC_ALL="C" grep <something>,
and compare that with sed. _________________ emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/ |
|
|
Dralnu Veteran
Joined: 24 May 2006 Posts: 1919
|
Posted: Tue Sep 04, 2007 5:34 am Post subject: Re: Why is sed faster than grep? |
|
|
ppurka wrote: | Dralnu wrote: | Anyone know why it seems that grep would be slower than sed? | Did you try grep with UTF turned off? LC_ALL="C" grep <something>,
and compare that with sed. |
Since my system uses UTF-8, I don't think that will do me any real good. I do have a few bits that mpd doesn't like when they are in their original format (but then again, ttys don't like to display them either, and show them in \??? format...).
I'm trying --mmap, but so far from a single run of grep -v --mmap vs. sed -e '/word/d', sed wins by ~.2 seconds.
I'm going to mess around some, but this still seems, well, a bit retarded... _________________ The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner. |
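On the worry that a C-locale run tells you nothing on a UTF-8 system, here is a minimal sketch of what the override does and does not change. It assumes a reasonably recent grep; the sample text and the helper name `count_matches` are made up for illustration and are not from the thread.

```shell
# Sketch only: shows what LC_ALL=C changes for grep on UTF-8 data.
# The sample text and the helper name count_matches are hypothetical.

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# Two lines: "café" (é written as the UTF-8 bytes \303\251) and a lone "é".
printf 'caf\303\251\n\303\251\n' > "$sample"

# count_matches LOCALE PATTERN: count matching lines with grep run under LOCALE.
# `env` sets LC_ALL for that one grep process only; the shell keeps its locale.
count_matches() {
    env LC_ALL="$1" grep -c -e "$2" "$sample"
}

# A plain ASCII pattern is compared byte for byte, so C still finds it.
literal_c=$(count_matches C 'caf')

# In the C locale "." matches one byte, and "é" is two bytes in UTF-8,
# so "^.$" does not match the second line.
single_char_c=$(count_matches C '^.$')

echo "literal match under C: $literal_c"
echo "single-character match under C: $single_char_c"
```

So for plain ASCII search words such as "New" or "Tool" the C locale should select the same lines; the difference only shows up when the pattern itself depends on what counts as a character.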
|
|
sts Tux's lil' helper
Joined: 02 Jul 2007 Posts: 97
|
Posted: Tue Sep 04, 2007 5:51 am Post subject: |
|
|
If you are not doing regex searches then use fgrep or grep -F instead.
I read an article on "writing a faster grep" months ago and the author determined that grep actually was implemented with some tricks that made searches a lot faster ~90% of the time and much slower for the remaining 10%. I don't remember what the exceptions were so I'll try and find the article for you. Of course, if your search type is pretty common this is likely not the issue.
Edit: Here's the article for those interested.
Quote: | Put another way, grep sells out its worst case (lots of partial matches) to make the best case (few partial matches) go faster. How treacherous! |
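To make the `grep -F` suggestion concrete, the sketch below contrasts a regex search with a fixed-string search on the same input. The sample lines are invented for illustration.

```shell
# Sketch only: fixed-string search (grep -F) versus the default regex search.
# The sample lines are hypothetical.

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'New/a.c\nNew/abc\nTool/a.c\n' > "$sample"

# As a regex, "." matches any character, so "a.c" also matches "abc".
regex_hits=$(grep -c -e 'a.c' "$sample")

# With -F the pattern is a literal string: only lines containing "a.c" count.
fixed_hits=$(grep -c -F -e 'a.c' "$sample")

echo "regex: $regex_hits, fixed: $fixed_hits"
```

For a pattern with no metacharacters, such as "New", both forms select the same lines; `-F` only tells grep it can skip the regex machinery. `fgrep` is the older spelling of `grep -F`.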
|
|
|
Dralnu Veteran
Joined: 24 May 2006 Posts: 1919
|
Posted: Tue Sep 04, 2007 6:08 am Post subject: |
|
|
I'm not going to kill myself trying this a dozen times, so I'll post my findings:
Command format: time | mpc listall | <command timed>
Command piped in: mpc listall (Total of 705 lines)
First goal: Remove everything in the "New" dir
Matches: 383 lines displayed
Code: |
fgrep -v --mmap "New"
.237
.234
.253
fgrep -v "New"
.231
.228
.232
sed -e '/New/d'
.032
.015
.034
|
Second goal: Match a specific group (Tool)
Matches: 45 lines displayed
Code: |
fgrep --mmap "Tool"
.056
.039
.043
fgrep "Tool"
.041
.037
.037
sed -n -e '/Tool/p'
.014
.014
.016
|
I'm not going to bother trying this with a standard file right now, but so far sed has blown fgrep out of the water, and I'm not even going to bother trying grep itself. I have a page of times (granted, I don't remember which times refer to which test, but it was quite a bit more extensive, with similar results).
The gap on fewer returned results is narrower, but still significant. _________________ The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner. |
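The figures above come from piping live `mpc listall` output, so every run also includes the cost of producing the input. A rough sketch of a more repeatable setup follows: it uses a fixed file, checks that both filters really produce the same output, and loops each command so process start-up noise averages out. The generated file only stands in for the mpc output, and the helper names (`make_input`, `same_output`, `bench`) are hypothetical.

```shell
# Sketch only: a repeatable version of the comparison above.
# The generated file stands in for `mpc listall` output, and the helper
# names (make_input, same_output, bench) are hypothetical.

input=$(mktemp)
trap 'rm -f "$input"' EXIT

# make_input N: write N lines, alternating between a "Tool/" and a "New/" path.
make_input() {
    i=1
    while [ "$i" -le "$1" ]; do
        if [ $((i % 2)) -eq 0 ]; then
            echo "New/track $i.ogg"
        else
            echo "Tool/track $i.ogg"
        fi
        i=$((i + 1))
    done > "$input"
}

# same_output: succeed only if both filters drop exactly the same lines,
# otherwise the timings would be comparing different work.
same_output() {
    a=$(mktemp); b=$(mktemp)
    grep -F -v -e 'New' "$input" > "$a"
    sed -e '/New/d' "$input" > "$b"
    cmp -s "$a" "$b"; rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# bench RUNS CMD...: run CMD on the same input RUNS times, discarding output.
# Wrap the call in your shell's `time` to get one figure per command.
bench() {
    runs=$1; shift
    n=0
    while [ "$n" -lt "$runs" ]; do
        "$@" < "$input" > /dev/null
        n=$((n + 1))
    done
}

make_input 705
same_output && echo "outputs match"
kept=$(grep -F -v -c -e 'New' "$input")
echo "lines kept: $kept"
```

In bash, something like `time bench 20 grep -F -v -e 'New'` followed by `time bench 20 sed -e '/New/d'` would then give one total per command on identical input.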
|
|
Dralnu Veteran
Joined: 24 May 2006 Posts: 1919
|
Posted: Tue Sep 04, 2007 6:16 am Post subject: |
|
|
sts wrote: | If you are not doing regex searches then use fgrep or grep -F instead.
I read an article on "writing a faster grep" months ago and the author determined that grep actually was implemented with some tricks that made searches a lot faster ~90% of the time and much slower for the remaining 10%. I don't remember what the exceptions were so I'll try and find the article for you. Of course, if your search type is pretty common this is likely not the issue.
Edit: Here's the article for those interested.
Quote: | Put another way, grep sells out its worst case (lots of partial matches) to make the best case (few partial matches) go faster. How treacherous! |
|
I read the article. I should mention how slow grep seems next to sed, in what I can only expect to be a case where the longer the file, the larger the gap... _________________ The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner. |
|
|
Sadako Advocate
Joined: 05 Aug 2004 Posts: 3792 Location: sleeping in the bathtub
|
Posted: Tue Sep 04, 2007 6:36 am Post subject: |
|
|
Dralnu, I think you really need to try this without utf8, as ppurka suggested.
On my system, with LC_ALL="en_IE@euro"; Code: | time fgrep --mmap portage /var/log/emerge.log
real 0m6.746s
user 0m0.036s
sys 0m0.060s
time fgrep portage /var/log/emerge.log
real 0m6.510s
user 0m0.036s
sys 0m0.065s
time sed -n -e '/portage/p' /var/log/emerge.log
real 0m6.782s
user 0m0.095s
sys 0m0.071s |
grep is actually a little faster; however, after an `export LC_ALL=en_IE.UTF8`; Code: | time fgrep --mmap portage /var/log/emerge.log
real 0m38.121s
user 0m34.504s
sys 0m0.226s
time fgrep portage /var/log/emerge.log
real 0m31.048s
user 0m28.860s
sys 0m0.201s
time sed -n -e '/portage/p' /var/log/emerge.log
real 0m6.796s
user 0m0.485s
sys 0m0.080s |
I've read about how utf8 can have a performance hit in some cases, and it's one of the main reasons I don't use it, but this is ridiculous... _________________ "You have to invite me in" |
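A comparison like the one above is easier to trust when you also confirm that both locales select the same lines. The sketch below does that on a tiny made-up log; the helper name `matches_under` is hypothetical, and the UTF-8 locale name is an assumption (check `locale -a` for what your system actually provides).

```shell
# Sketch only: run one search under two locales and confirm the results agree.
# The sample log and the helper name matches_under are hypothetical; the
# UTF-8 locale name is an assumption, check `locale -a` on your system.

log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf 'emerge: portage sync\nemerge: world update\nportage done\n' > "$log"

# matches_under LOCALE PATTERN: print matching lines with grep run under LOCALE.
# `env` changes LC_ALL for that single process; nothing is exported.
matches_under() {
    env LC_ALL="$1" grep -F -e "$2" "$log"
}

c_result=$(matches_under C 'portage')
utf8_result=$(matches_under en_US.UTF-8 'portage')

# For an ASCII pattern the two locales should select the same lines,
# so any gap in run time is overhead rather than different output.
if [ "$c_result" = "$utf8_result" ]; then
    echo "same lines under both locales"
fi
```

Pointing `log` at a large file such as /var/log/emerge.log and prefixing each call with `time` would reproduce the kind of side-by-side figures shown in this post.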
|
|
Dralnu Veteran
Joined: 24 May 2006 Posts: 1919
|
Posted: Tue Sep 04, 2007 6:45 am Post subject: |
|
|
Except for the fact I do use utf-8, in which case I see no point in using a different locale for a test on my machine. In either case, there is still an issue with grep if there is such a massive difference in performance when it comes down to utf-8. _________________ The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner. |
|
|
Sadako Advocate
Joined: 05 Aug 2004 Posts: 3792 Location: sleeping in the bathtub
|
Posted: Tue Sep 04, 2007 6:50 am Post subject: |
|
|
Dralnu wrote: | Except for the fact I do use utf-8, in which case I see no point in using a different locale for a test on my machine. In either case, there is still an issue with grep if there is such a massive difference in performance when it comes down to utf-8. |
Are you not even remotely interested in seeing the results on your box without utf-8? _________________ "You have to invite me in" |
|
|
Dralnu Veteran
Joined: 24 May 2006 Posts: 1919
|
Posted: Tue Sep 04, 2007 6:59 am Post subject: |
|
|
Hopeless wrote: | Dralnu wrote: | Except for the fact I do use utf-8, in which case I see no point in using a different locale for a test on my machine. In either case, there is still an issue with grep if there is such a massive difference in performance when it comes down to utf-8. |
Are you not even remotely interested in seeing the results on your box without utf-8? |
Are you even remotely understanding that I use utf8? I'm not going to switch just for something like this. _________________ The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner. |
|
|
Sadako Advocate
Joined: 05 Aug 2004 Posts: 3792 Location: sleeping in the bathtub
|
Posted: Tue Sep 04, 2007 7:05 am Post subject: |
|
|
Dralnu wrote: | Are you even remotely understanding that I use utf8? I'm not going to switch just for something like this. |
I never said anything about switching, but if it was me I'd at least like to know for sure whether or not utf-8 was the culprit, and what kind of performance hit it causes.
I thought that was the reason you started this thread in the first place, guess I was mistaken. _________________ "You have to invite me in" |
|
|
Dralnu Veteran
Joined: 24 May 2006 Posts: 1919
|
Posted: Tue Sep 04, 2007 7:12 am Post subject: |
|
|
Hopeless wrote: | Dralnu wrote: | Are you even remotely understanding that I use utf8? I'm not going to switch just for something like this. |
I never said anything about switching, but if it was me I'd at least like to know for sure whether or not utf-8 was the culprit, and what kind of performance hit it causes.
I thought that was the reason you started this thread in the first place, guess I was mistaken. |
I asked for the reason why grep was so slow. Your argument is a nice example of it working somewhere else, but I'm not going to run export every time I want to run grep. What would be the point of seeing if UTF is the culprit, when UTF is a factor that is there to stay? _________________ The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner. |
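On the point about running export every time: no export is needed, because the locale can be set for a single grep process while the shell and everything else stay on UTF-8. A minimal sketch, assuming a POSIX shell; the wrapper name `cgrep` is hypothetical and could live in a shell startup file or at the top of a script.

```shell
# Sketch only: a wrapper so the C locale applies to grep and nothing else.
# The name cgrep is hypothetical.

# cgrep ARGS...: run grep with LC_ALL=C for that one process.
# Going through `env` also bypasses any alias or function named grep.
cgrep() {
    env LC_ALL=C grep "$@"
}

# Remember the shell's own setting so we can show it is left alone.
before=${LC_ALL-unset}

hits=$(printf 'New/one\nTool/two\nNew/three\n' | cgrep -c -e 'New')

after=${LC_ALL-unset}

echo "hits: $hits"
if [ "$before" = "$after" ]; then
    echo "shell locale unchanged"
fi
```

The trade-off is that matching becomes byte-oriented inside the wrapper, so this suits plain ASCII search words; case-insensitive matching or character classes over non-ASCII text may behave differently than under the UTF-8 locale.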
|
|
Sadako Advocate
Joined: 05 Aug 2004 Posts: 3792 Location: sleeping in the bathtub
|
Posted: Tue Sep 04, 2007 7:14 am Post subject: |
|
|
Dralnu wrote: | I asked for the reason why grep was so slow. Your argument is a nice example of it working somewhere else, but I'm not going to run export every time I want to run grep. What would be the point of seeing if UTF is the culprit, when UTF is a factor that is there to stay? |
I'm sorry, I just don't understand your logic here at all.
Anyway, goodnight. _________________ "You have to invite me in" |
|
|
tylerwylie Guru
Joined: 19 Sep 2004 Posts: 458 Location: /US/Georgia/Atlanta
|
Posted: Tue Sep 04, 2007 9:15 am Post subject: |
|
|
Hopeless wrote: |
I'm sorry, I just don't understand your logic here at all.
| Good luck. |
|
|
ppurka Advocate
Joined: 26 Dec 2004 Posts: 3256
|
Posted: Wed Sep 05, 2007 12:38 am Post subject: |
|
|
For those of you (still) interested
Code: | time grep portage /var/log/emerge.log >& /dev/null
grep portage /var/log/emerge.log >&/dev/null 24.45s user 0.12s system 99% cpu 24.687 total
time sed -n -e '/portage/p' /var/log/emerge.log >& /dev/null
sed -n -e '/portage/p' /var/log/emerge.log >&/dev/null 0.45s user 0.00s system 97% cpu 0.463 total
time LC_ALL="C" grep portage /var/log/emerge.log >& /dev/null
LC_ALL="C" grep portage /var/log/emerge.log >&/dev/null 0.01s user 0.00s system 75% cpu 0.025 total
|
NB: this topic had come into focus earlier in the forum thread on cfg-update _________________ emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/ |
|
|
sts Tux's lil' helper
Joined: 02 Jul 2007 Posts: 97
|
Posted: Wed Sep 05, 2007 12:44 am Post subject: |
|
|
ppurka wrote: | For those of you (still) interested
Code: | time grep portage /var/log/emerge.log >& /dev/null
grep portage /var/log/emerge.log >&/dev/null 24.45s user 0.12s system 99% cpu 24.687 total
time sed -n -e '/portage/p' /var/log/emerge.log >& /dev/null
sed -n -e '/portage/p' /var/log/emerge.log >&/dev/null 0.45s user 0.00s system 97% cpu 0.463 total
time LC_ALL="C" grep portage /var/log/emerge.log >& /dev/null
LC_ALL="C" grep portage /var/log/emerge.log >&/dev/null 0.01s user 0.00s system 75% cpu 0.025 total
|
NB: this topic had come into focus earlier in the forum thread on cfg-update |
Doesn't sed respect the locale settings, though?
If it does not, then what this essentially means is that by using sed Dralnu was no better off than using grep without utf8 support, and was actually worse off since it is slower. |
|
|
ppurka Advocate
Joined: 26 Dec 2004 Posts: 3256
|
Posted: Wed Sep 05, 2007 3:57 pm Post subject: |
|
|
sts wrote: | Doesn't sed respect the locale settings, though?
If it does not, then what this essentially means is that by using sed Dralnu was no better off than using grep without utf8 support, and was actually worse off since it is slower. | Glad you mentioned it! Code: | time LC_ALL="C" sed -n -e '/portage/p' /var/log/emerge.log >& /dev/null
LC_ALL="C" sed -n -e '/portage/p' /var/log/emerge.log >&/dev/null 0.06s user 0.01s system 99% cpu 0.068 total | Yes, sed seems slightly slower for the same locale (as long as the locale is set to C). I guess grep is just optimized for certain specific cases. _________________ emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/ |
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
|
Posted: Tue Oct 20, 2009 10:48 am Post subject: |
|
|
Moved from Off the Wall to Duplicate Threads; refer to "[SOLVED] grep is too slow". |
|