Gentoo Forums
Why is sed faster than grep?
Dralnu
Veteran
Joined: 24 May 2006
Posts: 1919

Posted: Mon Sep 03, 2007 6:49 pm    Post subject: Why is sed faster than grep?

I have a few scripts that use grep quite heavily (the main one is a script I wrote to make my life easier when dealing with my music collection and mpc), and I spent some time one boring night measuring the performance difference between sed and grep.

What I found (surprisingly to me) was that sed outperforms grep several times over, especially on longer inputs (keep in mind this was done on what amounts to a single file).

Anyone know why it seems that grep would be slower than sed?
_________________
The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner.

Sadako
Advocate
Joined: 05 Aug 2004
Posts: 3792
Location: sleeping in the bathtub

Posted: Mon Sep 03, 2007 7:02 pm

Did you try grep with the --mmap flag?
_________________
"You have to invite me in"

ppurka
Advocate
Joined: 26 Dec 2004
Posts: 3256

Posted: Mon Sep 03, 2007 7:44 pm    Post subject: Re: Why is sed faster than grep?

Dralnu wrote:
Anyone know why it seems that grep would be slower than sed?
Did you try grep with UTF turned off? LC_ALL="C" grep <something>,
and compare that with sed.
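
For anyone wanting to try this suggestion, here is a minimal sketch. The sample file and its contents are made up for illustration; substitute a real log or listing. The point it shows is that a `VAR=value` prefix sets the locale for that one grep invocation only, and that for plain ASCII data the match results are the same either way.

```shell
#!/bin/sh
# Hypothetical sample data; replace with your own file.
sample=$(mktemp)
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: emerge portage sync"
    echo "line $i: something else"
    i=$((i + 1))
done > "$sample"

# Count matches under the current locale and under the C locale.
# The LC_ALL=C prefix applies to that single command only.
count_default=$(grep -c portage "$sample")
count_c=$(LC_ALL=C grep -c portage "$sample")
echo "default locale: $count_default matches"
echo "C locale:       $count_c matches"

# To compare speed, prefix each command with `time`, for example:
#   time grep portage "$sample" > /dev/null
#   time LC_ALL=C grep portage "$sample" > /dev/null
rm -f "$sample"
```

Timings will depend on the grep version, the locale and the input, so treat any numbers you get as specific to your own box.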
_________________
emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/

Dralnu
Veteran
Joined: 24 May 2006
Posts: 1919

Posted: Tue Sep 04, 2007 5:34 am    Post subject: Re: Why is sed faster than grep?

ppurka wrote:
Dralnu wrote:
Anyone know why it seems that grep would be slower than sed?
Did you try grep with UTF turned off? LC_ALL="C" grep <something>,
and compare that with sed.


Since my system uses UTF-8, I don't think that will do me any real good. I do have a few bits that mpd doesn't like when they are in their original format (but then again, ttys don't like to display them either, and show them in \??? format...).

I'm trying --mmap, but so far from a single run of grep -v --mmap vs. sed -e '/word/d', sed wins by ~.2 seconds.

I'm going to mess around some, but this still seems, well, a bit retarded...
_________________
The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner.

sts
Tux's lil' helper
Joined: 02 Jul 2007
Posts: 97

Posted: Tue Sep 04, 2007 5:51 am

If you are not doing regex searches then use fgrep or grep -F instead.
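
As a small illustration of what "not doing regex searches" means in practice, the sketch below runs the same pattern as a regular expression and as a fixed string. The input lines are invented for the example.

```shell
#!/bin/sh
# Sample input containing regex metacharacters (made up for this example).
input='version 1.5
version 1x5
a.b*c'

# As a regex, "." matches any single character, so both "version" lines match.
regex_hits=$(printf '%s\n' "$input" | grep -c '1.5')

# With -F the pattern is a literal string, so only the line with "1.5" matches.
fixed_hits=$(printf '%s\n' "$input" | grep -F -c '1.5')

echo "regex: $regex_hits, fixed: $fixed_hits"   # prints "regex: 2, fixed: 1"
```

So `-F` is not only a possible speed-up; it also changes what counts as a match whenever the pattern contains characters such as `.` or `*`.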

I read an article on "writing a faster grep" months ago and the author determined that grep actually was implemented with some tricks that made searches a lot faster ~90% of the time and much slower for the remaining 10%. I don't remember what the exceptions were so I'll try and find the article for you. Of course, if your search type is pretty common this is likely not the issue.

Edit: Here's the article for those interested.
Quote:
Put another way, grep sells out its worst case (lots of partial matches) to make the best case (few partial matches) go faster. How treacherous!

Dralnu
Veteran
Joined: 24 May 2006
Posts: 1919

Posted: Tue Sep 04, 2007 6:08 am

I'm not going to kill myself trying this a dozen times, so I'll post my findings:

Command format: time | mpc listall | <command timed>

Command piped in: mpc listall (Total of 705 lines)
First goal: Remove everything in the "New" dir

Matches: 383 lines displayed

Code:

fgrep -v --mmap "New"

.237
.234
.253

fgrep -v "New"

.231
.228
.232

sed -e '/New/d'

.032
.015
.034


Second goal: Match a specific group (Tool)
Matches: 45 lines displayed

Code:

fgrep --mmap "Tool"

.056
.039
.043

fgrep "Tool"

.041
.037
.037

sed -n -e '/Tool/p'

.014
.014
.016


I'm not going to bother trying this with a standard file right now, but so far sed has blown fgrep out of the water, and I'm not even going to bother trying grep itself. I've a page of times (granted I don't remember what times refer to what test, but it was quite a bit more extensive, with similar results).

The gap on fewer returned results is narrower, but still significant.
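
A possible way to make this comparison easier to repeat is sketched below. It assumes the listing is saved to a file first, so that only the filter is being timed, and it checks that the two filters really produce the same output. The file contents are an invented stand-in for `mpc listall` output.

```shell
#!/bin/sh
# Stand-in for `mpc listall` output; the paths are invented.
listing=$(mktemp)
cat > "$listing" <<'EOF'
New/Artist A/track1.mp3
Tool/Lateralus/01.mp3
New/Artist B/track2.mp3
Tool/Aenima/02.mp3
Other/misc.mp3
EOF

# Both commands drop every line containing "New".
grep_out=$(grep -F -v 'New' "$listing")
sed_out=$(sed -e '/New/d' "$listing")

if [ "$grep_out" = "$sed_out" ]; then
    echo "outputs are identical"
fi

# Time only the filter by reading from the saved file:
#   time grep -F -v 'New' "$listing" > /dev/null
#   time sed -e '/New/d' "$listing" > /dev/null
rm -f "$listing"
```

With an input of only a few hundred lines, process start-up can easily dominate the measurement, so a larger file or several repeated runs would give a clearer picture.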
_________________
The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner.

Dralnu
Veteran
Joined: 24 May 2006
Posts: 1919

Posted: Tue Sep 04, 2007 6:16 am

sts wrote:
If you are not doing regex searches then use fgrep or grep -F instead.

I read an article on "writing a faster grep" months ago and the author determined that grep actually was implemented with some tricks that made searches a lot faster ~90% of the time and much slower for the remaining 10%. I don't remember what the exceptions were so I'll try and find the article for you. Of course, if your search type is pretty common this is likely not the issue.

Edit: Here's the article for those interested.
Quote:
Put another way, grep sells out its worst case (lots of partial matches) to make the best case (few partial matches) go faster. How treacherous!


I read the article. I should stress how slow grep seems next to sed here, and I can only expect the gap to grow as the file gets longer...
_________________
The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner.

Sadako
Advocate
Joined: 05 Aug 2004
Posts: 3792
Location: sleeping in the bathtub

Posted: Tue Sep 04, 2007 6:36 am

Dralnu, I think you really need to try this without utf8, as ppurka suggested.

On my system, with LC_ALL="en_IE@euro";
Code:
time fgrep --mmap portage /var/log/emerge.log

real    0m6.746s
user    0m0.036s
sys     0m0.060s

time fgrep portage /var/log/emerge.log

real    0m6.510s
user    0m0.036s
sys     0m0.065s

time sed -n -e '/portage/p' /var/log/emerge.log

real    0m6.782s
user    0m0.095s
sys     0m0.071s

grep is actually a little faster; however, after an `export LC_ALL=en_IE.UTF8`:
Code:
time fgrep --mmap portage /var/log/emerge.log

real    0m38.121s
user    0m34.504s
sys     0m0.226s

time fgrep portage /var/log/emerge.log

real    0m31.048s
user    0m28.860s
sys     0m0.201s

time sed -n -e '/portage/p' /var/log/emerge.log

real    0m6.796s
user    0m0.485s
sys     0m0.080s
8O

I've read about how utf8 can have a performance hit in some cases, and it's one of the main reasons I don't use it, but this is ridiculous...
_________________
"You have to invite me in"

Dralnu
Veteran
Joined: 24 May 2006
Posts: 1919

Posted: Tue Sep 04, 2007 6:45 am

Except for the fact that I do use utf-8, in which case I see no point in using a different locale for a test on my machine. In either case, there is still an issue with grep if there is such a massive difference in performance when it comes down to utf-8.
_________________
The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner.

Sadako
Advocate
Joined: 05 Aug 2004
Posts: 3792
Location: sleeping in the bathtub

Posted: Tue Sep 04, 2007 6:50 am

Dralnu wrote:
Except for the fact that I do use utf-8, in which case I see no point in using a different locale for a test on my machine. In either case, there is still an issue with grep if there is such a massive difference in performance when it comes down to utf-8.

Are you not even remotely interested in seeing the results on your box without utf-8?
_________________
"You have to invite me in"

Dralnu
Veteran
Joined: 24 May 2006
Posts: 1919

Posted: Tue Sep 04, 2007 6:59 am

Hopeless wrote:
Dralnu wrote:
Except for the fact that I do use utf-8, in which case I see no point in using a different locale for a test on my machine. In either case, there is still an issue with grep if there is such a massive difference in performance when it comes down to utf-8.

Are you not even remotely interested in seeing the results on your box without utf-8?

Are you even remotely understanding that I use utf8? I'm not going to switch just for something like this.
_________________
The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner.

Sadako
Advocate
Joined: 05 Aug 2004
Posts: 3792
Location: sleeping in the bathtub

Posted: Tue Sep 04, 2007 7:05 am

Dralnu wrote:
Are you even remotely understanding that I use utf8? I'm not going to switch just for something like this.

I never said anything about switching, but if it was me I'd at least like to know for sure whether or not utf-8 was the culprit, and what kind of performance hit it causes.

I thought that was the reason you started this thread in the first place, guess I was mistaken.
_________________
"You have to invite me in"

Dralnu
Veteran
Joined: 24 May 2006
Posts: 1919

Posted: Tue Sep 04, 2007 7:12 am

Hopeless wrote:
Dralnu wrote:
Are you even remotely understanding that I use utf8? I'm not going to switch just for something like this.

I never said anything about switching, but if it was me I'd at least like to know for sure whether or not utf-8 was the culprit, and what kind of performance hit it causes.

I thought that was the reason you started this thread in the first place, guess I was mistaken.


I asked for the reason why grep was so slow. Your argument is a nice example of it working somewhere else, but I'm not going to run export every time I want to run grep. What would be the point of seeing if UTF is the culprit, when UTF is a factor that is there to stay?
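
On the point about running export every time: a per-command assignment does not touch the session locale at all. The sketch below is only an illustration; the temporary file and the `cgrep` wrapper name are hypothetical.

```shell
#!/bin/sh
# Small test file so grep can be run directly, outside any subshell.
tmp=$(mktemp)
printf 'portage\nother\n' > "$tmp"

# Remember what LC_ALL was beforehand (it may well be unset).
before=${LC_ALL-unset}

# The assignment prefix applies to this single grep process only.
LC_ALL=C grep -c 'portage' "$tmp"

after=${LC_ALL-unset}
echo "LC_ALL before: $before, after: $after"

# Hypothetical wrapper for scripts that only search for ASCII patterns.
cgrep() {
    LC_ALL=C grep "$@"
}
cgrep -c 'other' "$tmp"
rm -f "$tmp"
```

The shell's own `LC_ALL` is the same before and after, so a UTF-8 setup stays as it is for everything else.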
_________________
The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner.

Sadako
Advocate
Joined: 05 Aug 2004
Posts: 3792
Location: sleeping in the bathtub

Posted: Tue Sep 04, 2007 7:14 am

Dralnu wrote:
I asked for the reason why grep was so slow. Your argument is a nice example of it working somewhere else, but I'm not going to run export every time I want to run grep. What would be the point of seeing if UTF is the culprit, when UTF is a factor that is there to stay?

I'm sorry, I just don't understand your logic here at all.

Anyway, goodnight.
_________________
"You have to invite me in"

tylerwylie
Guru
Joined: 19 Sep 2004
Posts: 458
Location: /US/Georgia/Atlanta

Posted: Tue Sep 04, 2007 9:15 am

Hopeless wrote:

I'm sorry, I just don't understand your logic here at all.

Good luck.

ppurka
Advocate
Joined: 26 Dec 2004
Posts: 3256

Posted: Wed Sep 05, 2007 12:38 am

For those of you (still) interested
Code:
time grep portage /var/log/emerge.log >& /dev/null
grep portage /var/log/emerge.log >&/dev/null  24.45s user 0.12s system 99% cpu 24.687 total

time sed -n -e '/portage/p' /var/log/emerge.log >& /dev/null
sed -n -e '/portage/p' /var/log/emerge.log >&/dev/null  0.45s user 0.00s system 97% cpu 0.463 total

time LC_ALL="C" grep portage /var/log/emerge.log >& /dev/null
LC_ALL="C" grep portage /var/log/emerge.log >&/dev/null  0.01s user 0.00s system 75% cpu 0.025 total

NB: this topic had come into focus earlier in the forum thread on cfg-update
_________________
emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/

sts
Tux's lil' helper
Joined: 02 Jul 2007
Posts: 97

Posted: Wed Sep 05, 2007 12:44 am

ppurka wrote:
For those of you (still) interested
Code:
time grep portage /var/log/emerge.log >& /dev/null
grep portage /var/log/emerge.log >&/dev/null  24.45s user 0.12s system 99% cpu 24.687 total

time sed -n -e '/portage/p' /var/log/emerge.log >& /dev/null
sed -n -e '/portage/p' /var/log/emerge.log >&/dev/null  0.45s user 0.00s system 97% cpu 0.463 total

time LC_ALL="C" grep portage /var/log/emerge.log >& /dev/null
LC_ALL="C" grep portage /var/log/emerge.log >&/dev/null  0.01s user 0.00s system 75% cpu 0.025 total

NB: this topic had come into focus earlier in the forum thread on cfg-update

Doesn't sed respect the locale settings, though?

If it does not, then what this essentially means is that by using sed Dralnu was no better-off than using grep w/out utf8 support and was actually worse-off since it is slower. :)

ppurka
Advocate
Joined: 26 Dec 2004
Posts: 3256

Posted: Wed Sep 05, 2007 3:57 pm

sts wrote:
Doesn't sed respect the locale settings, though?

If it does not, then what this essentially means is that by using sed Dralnu was no better-off than using grep w/out utf8 support and was actually worse-off since it is slower. :)
Glad you mentioned it!
Code:
time LC_ALL="C" sed -n -e '/portage/p' /var/log/emerge.log >& /dev/null
LC_ALL="C" sed -n -e '/portage/p' /var/log/emerge.log >&/dev/null  0.06s user 0.01s system 99% cpu 0.068 total
Yes, sed seems slightly slower for the same locale (as long as the locale is set to C). I guess grep is just optimized for certain specific cases.
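
One caveat worth keeping in mind with the C locale: matching is done byte by byte, so patterns that rely on multi-byte characters can behave differently than under UTF-8. A minimal sketch, assuming a reasonably recent GNU grep (other implementations may differ):

```shell
#!/bin/sh
# "é" encoded in UTF-8 is the two bytes 0xC3 0xA9 (written in octal below).
# In the C locale grep sees two separate bytes, so "." matches only one of them.
one_dot=$(printf '\303\251\n' | LC_ALL=C grep -c '^.$')
two_dots=$(printf '\303\251\n' | LC_ALL=C grep -c '^..$')

echo "one-dot pattern matched $one_dot line(s)"
echo "two-dot pattern matched $two_dots line(s)"
```

Under the C locale the one-dot pattern is expected to match nothing and the two-dot pattern to match the line; in a UTF-8 locale it would be the other way round. For plain ASCII search strings such as "portage" this makes no difference to the result.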
_________________
emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/

desultory
Bodhisattva
Joined: 04 Nov 2005
Posts: 9410

Posted: Tue Oct 20, 2009 10:48 am

Moved from Off the Wall to Duplicate Threads, refer to "[SOLVED] grep is too slow".