Gentoo Forums
grep 2 out of 3?
bastibasti
Guru


Joined: 27 Nov 2006
Posts: 581

Posted: Thu Sep 19, 2013 5:37 pm    Post subject: grep 2 out of 3?

Hi, I have 3 words. If a line contains at least two of them, I need grep to output the line. Is this possible?

The next step is two out of 4 or 5 words, but if there's a function for this, the number should only be a parameter, right?
John R. Graham
Administrator


Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

Posted: Thu Sep 19, 2013 5:48 pm

Is grep the required tool here? This is much more elegantly implemented in AWK or Perl. Let me know; I can provide examples in either of those languages. With grep, it's not just a parameter, though: it's a complex regular expression that grows combinatorially more complicated as your numbers go up.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
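
To illustrate John's point about grep, here is a rough sketch of what 2 of 3 looks like with a plain extended regular expression. It assumes the placeholder words one, two and three, and the function name two_of_three is made up for this example. Each pair has to be listed in both orders, so 3 words need 6 alternatives and 5 words would already need 20.

```shell
# Sketch only: the words and the function name are illustrative assumptions.
# Print lines containing at least two different words out of one/two/three.
two_of_three() {
    grep -E 'one.*two|two.*one|one.*three|three.*one|two.*three|three.*two' "$@"
}

# Usage: reads the named files, or standard input if none are given.
printf 'one three\none four\none one\n' | two_of_three
```

Only "one three" is printed here: "one one" is rejected because no alternative pairs a word with itself.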
bastibasti
Guru


Joined: 27 Nov 2006
Posts: 581

Posted: Thu Sep 19, 2013 6:16 pm

AWK would be great. I was thinking of doing it with two loops involving grep, but that might become terribly slow ;-)
John R. Graham
Administrator


Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

Posted: Thu Sep 19, 2013 10:33 pm

Okay, here you go. ;)

Example test file:
test.txt:
one three
one four
one two
two three
five one
frog bert
fish one
Here's a simple 2 of 3 implemented as an AWK one-liner:
Code:
~ $ awk '/one/ && /two/ || /one/ && /three/ || /two/ && /three/' test.txt
one three
one two
two three
And here's a somewhat more complex AWK implementation that allows you to set up an M of N match:
test20a.awk:
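#!/usr/bin/awk -f

# Print each line that contains at least MinMatch different words from WordList.
BEGIN {
    MinMatch = 2;
    split("one two three four five", WordList);
}

{
    Count = 0;
    for (i in WordList) {
        # index() is a substring test, so "bone" also counts as "one".
        if (index($0, WordList[i])) {
            if (++Count == MinMatch) {
                print;
                break;
            }
        }
    }
}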
which results in:
Code:
~ $ awk -f test20a.awk test.txt
one three
one four
one two
two three
five one
and voilà! :)

- John
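
On the original question of whether the count can be "only a parameter": one possible way, sketched here rather than taken from John's post, is to wrap the same substring logic in a shell function and hand the threshold and the word list to AWK with -v. The function name m_of_n and the AWK variable names min and words are assumptions made up for this sketch.

```shell
# Sketch only: m_of_n, min and words are hypothetical names.
# Usage: m_of_n MIN "word1 word2 ..." [file...]
m_of_n() {
    m_of_n_min=$1
    m_of_n_words=$2
    shift 2
    awk -v min="$m_of_n_min" -v words="$m_of_n_words" '
    BEGIN { n = split(words, list, " ") }
    {
        count = 0
        for (i = 1; i <= n; i++) {
            # index() is a substring test, as in the script above
            if (index($0, list[i]) && ++count >= min) {
                print
                next
            }
        }
    }' "$@"
}

printf 'one three\none four\nfish one\n' | m_of_n 2 "one two three four five"
```

This prints "one three" and "one four". Going from 2 of 3 to 2 of 5, or to 3 of 5, then only changes the arguments, not the program.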
BitJam
Advocate


Joined: 12 Aug 2003
Posts: 2506
Location: Silver City, NM

Posted: Fri Sep 20, 2013 9:23 am

Here is a Perl version. It only detects complete words. For example, the general awk script above will hit on "bone driftwood" or even "twoness" but the Perl script will not.
Code:
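#!/usr/bin/perl
use strict;
use warnings;

# Words to look for, stored as hash keys for fast lookup.
my %word_list = map { $_ => 1 } qw/one two three four five/;
my $min_match = 2;

while (<>) {
    my $cnt = 0;
    my $found = {};    # words already counted on this line
    for my $word (split /\b/) {
        $word_list{$word}    or  next;    # not a word we are looking for
        $found->{$word}++    and next;    # repeats do not count again
        ++$cnt >= $min_match or  next;    # not enough distinct words yet
        print;
        last;
    }
}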
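
BitJam's two examples can be checked with a small experiment. This is only a sketch: the function name compare_matching is invented here, and splitting on anything other than letters, digits and underscores is an assumed approximation of Perl's \b, not an exact equivalent.

```shell
# Sketch only: compare_matching, loose and strict are hypothetical names.
# For each input line, count list words found as substrings (loose)
# and as whole tokens (strict).
compare_matching() {
    awk '
    BEGIN { n = split("one two three four five", words, " ") }
    {
        loose = 0
        strict = 0
        ntok = split($0, tok, /[^A-Za-z0-9_]+/)
        for (i = 1; i <= n; i++) {
            if (index($0, words[i])) loose++
            for (j = 1; j <= ntok; j++) {
                if (tok[j] == words[i]) { strict++; break }
            }
        }
        printf "%s: substring=%d whole-word=%d\n", $0, loose, strict
    }' "$@"
}

printf 'bone driftwood\ntwoness\none two\n' | compare_matching
```

Both "bone driftwood" and "twoness" report two substring hits but no whole-word hits, while "one two" reports two of each.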
mv
Watchman


Joined: 20 Apr 2005
Posts: 6747

Posted: Fri Sep 20, 2013 10:57 am

John R. Graham wrote:
And here's a somewhat more complex AWK implementation that allows you to set up an M of N match:

If you allow repeated occurrences of the same word to count as matches, then you can do this rather simply with a single Perl regular expression (untested: probably also with GNU grep if you switch on Perl regular expressions):
Code:
perl -ne '/((word1|word2|word3).*){2}/ and print' file(s)
Replace the inner parentheses with \b( ... )\b if you want only full words to match. Overlapping matches are not recognized in either case.
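
mv marked the grep variant as untested. As a sketch, assuming a grep that implements POSIX extended regular expressions, the same pattern needs only grouping, alternation and a repetition count, so -E should be enough for this form and the Perl mode (-P in GNU grep) is not strictly needed. The words one, two and three and the function name two_hits are placeholders chosen for this example.

```shell
# Sketch only: two_hits and the word list are illustrative assumptions.
# Matches lines with two non-overlapping hits; a repeated word counts twice.
two_hits() {
    grep -E '((one|two|three).*){2}' "$@"
}

printf 'one three\none one\nfish one\ntwoness\n' | two_hits
```

Here "one one" is printed because repeats count, while "twoness" is not, because its "two" and "one" overlap.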
John R. Graham
Administrator


Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

Posted: Fri Sep 20, 2013 1:22 pm

Right. I was assuming that repeated words would not count towards the match criteria. If they would, then it's much simpler.

BitJam also has a very good point. Here's a corrected version of the AWK implementation that respects word boundaries:
Code:
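#!/usr/bin/awk -f

# Print each line in which at least MinMatch different words from WordList
# appear as whole whitespace-separated fields.
BEGIN {
    MinMatch = 2;
    split("one two three four five", WordList);
}

{
    Count = 0;
    for (i in WordList) {
        for (j = 1; j <= NF; j++) {
            if (WordList[i] == $j) {
                if (++Count == MinMatch) {
                    print;
                    next;   # line printed; move on to the next record
                }
                break;      # count each word at most once per line
            }
        }
    }
}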
- John
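
One limitation worth noting: the script compares whole whitespace-separated fields, so a word with punctuation attached, as in "one, two.", would not be counted. A possible variation, sketched here under the assumption that words consist only of letters, digits and underscores, splits each line on everything else and looks the tokens up in a table. The function name whole_word_match and the array names want and seen are made up for this sketch.

```shell
# Sketch only: whole_word_match, want and seen are hypothetical names.
# Print lines containing at least MinMatch distinct whole words from the list.
whole_word_match() {
    awk '
    BEGIN {
        MinMatch = 2
        n = split("one two three four five", list, " ")
        for (i = 1; i <= n; i++) want[list[i]] = 1
    }
    {
        split("", seen)        # portable way to empty an array
        count = 0
        ntok = split($0, tok, /[^A-Za-z0-9_]+/)
        for (j = 1; j <= ntok; j++) {
            t = tok[j]
            if ((t in want) && !(t in seen)) {
                seen[t] = 1    # count each distinct word once
                if (++count >= MinMatch) { print; next }
            }
        }
    }' "$@"
}

printf 'one, two.\none one\nbone driftwood\n' | whole_word_match
```

Only "one, two." is printed. The table lookup also means each line is scanned once, rather than once per word in the list.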
All times are GMT