Gentoo Forums
hashing out every second line after hashed line [SOLVED]
Gentoo Forums Forum Index > Portage & Programming
ryszardzonk
Apprentice


Joined: 18 Dec 2003
Posts: 225
Location: Rzeszów, POLAND

Posted: Fri Jun 29, 2018 10:16 pm    Post subject: hashing out every second line after hashed line [SOLVED]

The Adblock2privoxy converter I use sometimes, for some reason, converts a rule twice instead of just once. The second conversion is worse than the first and should not be there in the first place. So, until the converter gets fixed, the question is: is it possible to hash out (comment out) the line that comes two lines after a hash line, and start the count over whenever a new hash line is reached?

The file has about 700 000 lines and is recreated every 2 days, so manual intervention is not feasible.

Original example
Code:
# ||wwwapteka.info^$third-party,popup (popup.txt: 456)
.wwwapteka.info
# ||www.*.xyz/*&key=$third-party,popup (popup.txt: 455)
.www.*./(*PRUNE).*?\.xyz/(*PRUNE).*?&key=
.www.*.xyz/(*PRUNE).*?&key=
# ||www.*.club/*&key=$third-party,popup (popup.txt: 454)
.www.*./(*PRUNE).*?\.club/(*PRUNE).*?&key=
.www.*.club/(*PRUNE).*?&key=
# ||wow-partners.com/click.php^$popup (popup.txt: 453)
.wow-partners.com/click\.php[^\w%.-]
# ||wildmikky.com^$popup (popup.txt: 452)
.wildmikky.com


What I would like to achieve
Code:
# ||wwwapteka.info^$third-party,popup (popup.txt: 456)
.wwwapteka.info
# ||www.*.xyz/*&key=$third-party,popup (popup.txt: 455)
.www.*./(*PRUNE).*?\.xyz/(*PRUNE).*?&key=
#.www.*.xyz/(*PRUNE).*?&key=
# ||www.*.club/*&key=$third-party,popup (popup.txt: 454)
.www.*./(*PRUNE).*?\.club/(*PRUNE).*?&key=
#.www.*.club/(*PRUNE).*?&key=
# ||wow-partners.com/click.php^$popup (popup.txt: 453)
.wow-partners.com/click\.php[^\w%.-]
# ||wildmikky.com^$popup (popup.txt: 452)
.wildmikky.com
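A minimal Python sketch of exactly the behaviour described above. It assumes "hash out" means prefixing the line with '#' and that only the second non-comment line after each '#' line is touched: a third or fourth rule stays active, and the count restarts at every '#' line. The function name hash_second_rule is hypothetical. Because it consumes an iterable of lines, it can stream a 700 000-line file without loading it into memory.

```python
from typing import Iterable, Iterator


def hash_second_rule(lines: Iterable[str]) -> Iterator[str]:
    """Prefix '#' to the 2nd non-comment line after each '#' line.

    Hypothetical helper: the counter restarts on every comment line, and
    only the second rule is touched; a third or fourth rule stays active.
    """
    count = 0
    for line in lines:
        if line.startswith("#"):
            count = 0            # new rule header: restart the count
        else:
            count += 1
            if count == 2:       # duplicate conversion: comment it out
                line = "#" + line
        yield line
```

To use it as a filter, one could call sys.stdout.writelines(hash_second_rule(sys.stdin)) under an if __name__ == "__main__": guard.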

_________________
Sky is not the limit...


Last edited by ryszardzonk on Sat Jun 30, 2018 8:22 am; edited 1 time in total
Hu
Moderator


Joined: 06 Mar 2007
Posts: 21607

Posted: Sat Jun 30, 2018 12:11 am

This could be a bit simpler, but is written long to make it clearer.
Code:
#!/usr/bin/python3

import sys

def main():
   state = None
   seen_comment = object()
   for line in sys.stdin:
      if line.startswith('#'):
         state = seen_comment
         # Fallthrough, print comment line unchanged
      elif state is seen_comment:
         state = None
         # Fallthrough, print active line unchanged
      else:
         # Current line is not a comment.
         # Most recent line was not a comment.
         # Convert current line.
         line = '#' + line
      print(line, end='')

main()
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3131

Posted: Sat Jun 30, 2018 12:51 am

Code:
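#!/bin/sh
# Comment out the second non-comment line after each '#' line.
# IFS= keeps leading/trailing blanks; -r keeps backslashes literal
# (a plain "read line" treats \ as an escape character and drops it).
count=0
while IFS= read -r line || [ -n "$line" ]
do
    case "$line" in
        \#*)
            count=0 ;;
        *)
            count=$((count + 1))
            # POSIX test; [[ ]] is a bashism that /bin/sh may lack
            if [ "$count" -eq 2 ]
            then line="#$line"
            fi ;;
    esac
    # printf prints the line verbatim; some sh builtins' echo expands \
    printf '%s\n' "$line"
done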
ryszardzonk
Apprentice


Joined: 18 Dec 2003
Posts: 225
Location: Rzeszów, POLAND

Posted: Sat Jun 30, 2018 7:59 am

Wow. I spent quite some time reading up, hoping to find some sort of magical sed one-liner, but I guess it is much more complicated than that.

@szatox
I ran your code with cat filter.orig |./hash_fix.sh >filter.new and it did exactly what it was told, except that it also removed the \ from all lines, which is undesired. I'm not sure why it did that, and when I removed the \ from the code it obviously failed with a syntax error.

@Hu
Yes, it worked, and even better than I expected, because it turned out there were cases I had not anticipated. Your code hashed not only the second line after a hash but every line following it until the next hash. That is not what I asked for, but I had not noticed before that the converter sometimes creates more than two rules, each one greedier than the previous one. Thanks to you, that problem is gone before I even knew it existed ;)

Part of the diff after running cat filter.orig |python hash_fix.py >filter.new:
Code:
@@ -22380,14 +22380,14 @@
 /(*PRUNE).*?=ads_top&rand
 # ://*.*.biz/x$third-party,script (ru_advblock.txt: 1908)
 /(*PRUNE).*?://(*PRUNE).*?\.(*PRUNE).*?\.biz/x
-/(*PRUNE).*?\.(*PRUNE).*?\.biz/x
-.*.*./(*PRUNE).*?\.biz/x
-.*.*.biz/x
+#/(*PRUNE).*?\.(*PRUNE).*?\.biz/x
+#.*.*./(*PRUNE).*?\.biz/x
+#.*.*.biz/x
 # /zozoter.php?bid= (ru_advblock.txt: 1907)
 /(?:(*PRUNE).*?/)?zozoter\.php\?bid=
 # /youtube.php|$third-party,script (ru_advblock.txt: 1906)
 /(?:(*PRUNE).*?/)?youtube\.php$
-youtube.php
+#youtube.php
 # /ya-awaps2/* (ru_advblock.txt: 1905)
 /(?:(*PRUNE).*?/)?ya-awaps2/(*PRUNE).*?
 # /xhr_ab_block (ru_advblock.txt: 1904)


EDIT:
Since the original file also contains lines that should not be hashed, the code did a bit more than intended, but those lines were easily fixed with sed:
sed -i -e 's/#}/}/' filter.new
sed -i -e 's/#TAG/TAG/' filter.new

The original lines, which are required for the rules to work:
Code:
#-ab2p-block-request-nX
{-client-header-tagger{ab2p-block-request-nX} \
}
TAG:^-ab2p-block-request-nX$

#ab2p-block-dnt
{+add-header{DNT: 1} \
}

Hu
Moderator


Joined: 06 Mar 2007
Posts: 21607

Posted: Sat Jun 30, 2018 2:52 pm

Those are Useless Uses of Cat. You can redirect stdin from the file instead of using cat file |, for example python hash_fix.py < filter.orig > filter.new.
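If you would rather drop the redirection entirely, Python's standard fileinput module reads from the files named on the command line, or from stdin when none are given. A sketch assuming the script is restructured into functions; the names hash_extra_rules and run are hypothetical, and the filter keeps the behaviour of the script posted above (the first rule after a comment stays, every later one is hashed).

```python
import fileinput
import sys
from typing import Iterable, Iterator, Sequence, TextIO


def hash_extra_rules(lines: Iterable[str]) -> Iterator[str]:
    """Keep the first rule after each '#' line; prefix '#' to the rest."""
    after_comment = False
    for line in lines:
        if line.startswith("#"):
            after_comment = True
        elif after_comment:
            after_comment = False      # first rule under a comment: keep it
        else:
            line = "#" + line          # any later rule: comment it out
        yield line


def run(files: Sequence[str], out: TextIO = sys.stdout) -> None:
    """Filter the named files (or stdin if none) and write to out."""
    # fileinput reads each file in turn and falls back to stdin when the
    # list is empty, so neither "cat file |" nor "< file" is required.
    with fileinput.input(files=files) as lines:
        out.writelines(hash_extra_rules(lines))
```

Calling run(sys.argv[1:]) from the script's entry point would then allow python hash_fix.py filter.orig > filter.new.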

That is a useless double use of sed. You can do both expressions in a single run; you could even do them in a single expression with the right syntax. For two expressions in one run, use sed -i -e expr1 -e expr2 file instead of your sed -i -e expr1 file; sed -i -e expr2 file.

If there are particular lines that need to be excluded, you can modify the Python script not to mangle them in the first place. Change the else: to elif condition:, where condition is true only for lines that should still be commented out, so that the excluded lines fall through and are printed unchanged. Based on your shown sed, I think you want elif not (line.startswith('}') or line.startswith('TAG')):
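A sketch of that change, written as a function so it can be checked against the header block quoted earlier in the thread. It assumes, based only on the sed fixes, that lines starting with '}' or 'TAG' must never be commented out; the names hash_extra_rules and KEEP_PREFIXES are hypothetical. Note the not: the branch that adds '#' has to run only for lines that are not excluded, otherwise the excluded lines would be the only ones commented out.

```python
from typing import Iterable, Iterator

# Assumption (inferred from the sed fixes in this thread): lines starting
# with these prefixes belong to the action/tag definitions and must stay
# active. str.startswith accepts a tuple of prefixes.
KEEP_PREFIXES = ("}", "TAG")


def hash_extra_rules(lines: Iterable[str]) -> Iterator[str]:
    """Comment out extra rules after each '#' line, except excluded lines."""
    after_comment = False
    for line in lines:
        if line.startswith("#"):
            after_comment = True
        elif after_comment:
            after_comment = False          # first rule after a comment: keep
        elif not line.startswith(KEEP_PREFIXES):
            line = "#" + line              # extra rule: comment it out
        # excluded lines fall through unchanged
        yield line
```

Blank lines are still prefixed with '#', which only turns them into empty comments; adding "\n" to KEEP_PREFIXES would leave them untouched as well.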
All times are GMT