bastibasti Guru
Joined: 27 Nov 2006 Posts: 581
Posted: Thu Sep 19, 2013 5:37 pm Post subject: grep 2 out of 3?
Hi, I have 3 words. If a line contains at least two of them, I need grep to output the line. Is this possible?
The next step is two of 4 and 5, but if there's a function for this, that should only be a parameter, right?
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10589 Location: Somewhere over Atlanta, Georgia
Posted: Thu Sep 19, 2013 5:48 pm
Is grep the required tool here? This is much more elegantly implemented in AWK or Perl. Let me know; I can provide examples in either of those languages. With grep, it's not just a parameter, though: it's a complex regular expression that grows combinatorially more complicated as your numbers go up.
- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
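To make the point about grep concrete, here is a minimal sketch of what a grep-only "2 of 3" could look like. The function name `two_of_three` and the sample lines are assumptions made up for illustration; the pattern simply lists every ordered pair of the three words and, like any plain substring pattern, ignores word boundaries.

```shell
# Hypothetical helper: 2-of-3 using only grep -E.
# One alternative is needed per ordered pair of words, so N words need
# N*(N-1) alternatives for "2 of N", and many more for "3 of N".
two_of_three() {
    grep -E 'one.*two|two.*one|one.*three|three.*one|two.*three|three.*two'
}

# Made-up sample input, not the thread's test file.
printf '%s\n' 'one three' 'one four' 'two one' 'frog bert' | two_of_three
```

With the sample input above, only "one three" and "two one" should come through. Adding a fourth word already doubles the pattern from 6 to 12 alternatives, which is why a loop over a word list scales better.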
bastibasti Guru
Joined: 27 Nov 2006 Posts: 581
Posted: Thu Sep 19, 2013 6:16 pm
AWK would be great. I was thinking of doing it with two loops involving grep, but that might become terribly slow.
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10589 Location: Somewhere over Atlanta, Georgia
Posted: Thu Sep 19, 2013 10:33 pm
Okay, here you go.
Example test file: test.txt:
one three
one four
one two
two three
five one
frog bert
fish one

Here's a simple 2 of 3 implemented as an AWK one-liner:
Code:
~ $ awk '/one/ && /two/ || /one/ && /three/ || /two/ && /three/' test.txt
one three
one two
two three

And here's a somewhat more complex AWK implementation that allows you to set up an M of N match:
test20a.awk:
#!/usr/bin/awk -f
BEGIN {
    MinMatch = 2;                                  # M: how many distinct words must appear
    split("one two three four five", WordList);    # the N candidate words
}
{
    Count = 0;
    for (i in WordList) {
        # index() is a substring search, not a whole-word match
        if (index($0, WordList[i])) {
            if (++Count == MinMatch) {
                print;
                break;
            }
        }
    }
}

which results in:
Code:
~ $ awk -f test20a.awk test.txt
one three
one four
one two
two three
five one

and voilà!
- John
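Since the original question asked whether the count could be "only a parameter", one possible way to get there is to wrap the same loop in a shell function and pass both M and the word list in with `awk -v`. This is a sketch under assumptions: the function name `m_of_n` and the variable names `Words` and `min` are invented here, and matching is still by substring, exactly as in the script above.

```shell
# m_of_n MIN "WORD LIST" [FILE...]   (hypothetical helper name)
# Prints each line that contains at least MIN distinct words from the list.
# Reads standard input when no FILE is given.
m_of_n() {
    min=$1
    words=$2
    shift 2
    awk -v MinMatch="$min" -v Words="$words" '
        BEGIN {
            n = split(Words, WordList, " ")
            min = MinMatch + 0              # force a numeric comparison
        }
        {
            Count = 0
            for (i = 1; i <= n; i++) {
                # substring match, as in the original script
                if (index($0, WordList[i]) && ++Count == min) {
                    print
                    break
                }
            }
        }
    ' "$@"
}
```

Usage would then look like `m_of_n 2 "one two three" test.txt` for 2 of 3, or `m_of_n 2 "one two three four five" test.txt` for 2 of 5, with no change to the AWK code itself.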
BitJam Advocate
Joined: 12 Aug 2003 Posts: 2508 Location: Silver City, NM
Posted: Fri Sep 20, 2013 9:23 am
Here is a Perl version. It only detects complete words. For example, the general awk script above will hit on "bone driftwood" or even "twoness" but the Perl script will not.
Code:
#!/usr/bin/perl
my %word_list = map { $_ => 1 } qw/one two three four five/;
my $min_match = 2;
while (<>) {
    my $cnt = 0;
    my $found = {};
    for my $word (split /\b/) {
        #print "$word\n";
        $word_list{$word} or next;        # skip tokens that are not in the list
        $found->{$word}++ and next;       # count each word only once per line
        ++$cnt >= $min_match or next;     # keep scanning until enough words are seen
        print;
        last;
    }
}
mv Watchman
Joined: 20 Apr 2005 Posts: 6747
Posted: Fri Sep 20, 2013 10:57 am
John R. Graham wrote: And here's a somewhat more complex AWK implementation that allows you to set up an M of N match:

If you allow repeated occurrences of the same word to count as matches, then you can do this rather simply with a single Perl regular expression (untested: probably also with GNU grep if you switch on Perl regular expressions):
Code:
perl -ne '/((word1|word2|word3).*){2}/ and print' file(s)
Replace the inner group by \b( ... )\b if you want only full words to match. Overlapping matches are not recognized in any case.
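The pattern in this post uses only grouping, alternation and a repeat count, so it can also be tried as a POSIX extended regular expression. The sketch below swaps in `grep -E` for Perl, which is a substitution made here and not what the post used; the function name `at_least_two` and the sample lines are likewise made up. It also shows the stated caveat that a repeated word satisfies the pattern.

```shell
# Hypothetical helper: the "{2}" pattern from the post, run through grep -E
# instead of perl. A repeated word such as "one one" matches as well.
at_least_two() {
    grep -E '((one|two|three).*){2}'
}

# Made-up sample input.
printf '%s\n' 'one three' 'one four' 'one one' 'frog bert' | at_least_two
```

Here "one three" and "one one" should be printed, while "one four" and "frog bert" are dropped. If repeats must not count, the AWK and Perl scripts in this thread are the better fit.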
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10589 Location: Somewhere over Atlanta, Georgia
Posted: Fri Sep 20, 2013 1:22 pm
Right. I was assuming that repeated words would not count towards the match criteria. If they would, then it's much simpler.
BitJam also has a very good point. Here's a corrected version of the AWK implementation that respects word boundaries:
Code:
#!/usr/bin/awk -f
BEGIN {
    MinMatch = 2;
    split("one two three four five", WordList);
}
{
    Count = 0;
    for (i in WordList) {
        # Compare against whole whitespace-separated fields, not substrings
        for (j = 1; j <= NF; j++) {
            if (WordList[i] == $j) {
                if (++Count == MinMatch) {
                    print;
                    next;     # enough matches: go on to the next input line
                }
                break;        # count each word at most once per line
            }
        }
    }
}
- John
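As a variation on the whole-word version, the nested loop can be replaced by a lookup table so each line's fields are scanned only once. This is a sketch under assumptions, not the script from the post: the function name `m_of_n_words` and the arrays `want` and `seen` are invented for illustration. Like the script above, it compares whole whitespace-separated fields, so a field with attached punctuation such as "two," would not count.

```shell
# m_of_n_words MIN "WORD LIST" [FILE...]   (hypothetical helper name)
# Prints lines in which at least MIN distinct list words appear as whole fields.
m_of_n_words() {
    min=$1
    words=$2
    shift 2
    awk -v MinMatch="$min" -v Words="$words" '
        BEGIN {
            n = split(Words, list, " ")
            for (i = 1; i <= n; i++) want[list[i]] = 1   # lookup table of words
            min = MinMatch + 0
        }
        {
            split("", seen)      # portable way to clear the per-line array
            count = 0
            for (j = 1; j <= NF; j++) {
                if (($j in want) && !($j in seen)) {
                    seen[$j] = 1                         # ignore repeats of this word
                    if (++count == min) { print; break }
                }
            }
        }
    ' "$@"
}

# Made-up sample input using the examples mentioned earlier in the thread.
printf '%s\n' 'bone driftwood' 'twoness one' 'one one' 'one two' |
    m_of_n_words 2 'one two three'
```

Of the four sample lines, only "one two" should be printed: the first two contain the words only as substrings, and "one one" repeats a single word.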