ryszardzonk Apprentice
Joined: 18 Dec 2003 Posts: 225 Location: Rzeszów, POLAND
|
Posted: Fri Jun 29, 2018 10:16 pm Post subject: hashing out every second line after hashed line [SOLVED] |
|
|
The adblock2privoxy converter I use sometimes converts a rule twice instead of just once; the second rule is worse than the first and should not be there at all. Until the converter gets fixed, I need a workaround. Is it possible to hash out the line that comes two lines after a hashed line, and to start the count over whenever a new hashed line is reached?
The file has about 700,000 lines and is recreated every 2 days, so manual intervention is not feasible.
Original example
Code: | # ||wwwapteka.info^$third-party,popup (popup.txt: 456)
.wwwapteka.info
# ||www.*.xyz/*&key=$third-party,popup (popup.txt: 455)
.www.*./(*PRUNE).*?\.xyz/(*PRUNE).*?&key=
.www.*.xyz/(*PRUNE).*?&key=
# ||www.*.club/*&key=$third-party,popup (popup.txt: 454)
.www.*./(*PRUNE).*?\.club/(*PRUNE).*?&key=
.www.*.club/(*PRUNE).*?&key=
# ||wow-partners.com/click.php^$popup (popup.txt: 453)
.wow-partners.com/click\.php[^\w%.-]
# ||wildmikky.com^$popup (popup.txt: 452)
.wildmikky.com |
What I would like to achieve
Code: | # ||wwwapteka.info^$third-party,popup (popup.txt: 456)
.wwwapteka.info
# ||www.*.xyz/*&key=$third-party,popup (popup.txt: 455)
.www.*./(*PRUNE).*?\.xyz/(*PRUNE).*?&key=
#.www.*.xyz/(*PRUNE).*?&key=
# ||www.*.club/*&key=$third-party,popup (popup.txt: 454)
.www.*./(*PRUNE).*?\.club/(*PRUNE).*?&key=
#.www.*.club/(*PRUNE).*?&key=
# ||wow-partners.com/click.php^$popup (popup.txt: 453)
.wow-partners.com/click\.php[^\w%.-]
# ||wildmikky.com^$popup (popup.txt: 452)
.wildmikky.com |
_________________ Sky is not the limit...
Last edited by ryszardzonk on Sat Jun 30, 2018 8:22 am; edited 1 time in total |
Hu Moderator
Joined: 06 Mar 2007 Posts: 21607
|
Posted: Sat Jun 30, 2018 12:11 am Post subject: |
|
|
This could be a bit simpler, but is written long to make it clearer.
Code: | #!/usr/bin/python3
import sys

def main():
    state = None
    seen_comment = object()
    for line in sys.stdin:
        if line.startswith('#'):
            state = seen_comment
            # Fallthrough, print comment line unchanged
        elif state is seen_comment:
            state = None
            # Fallthrough, print active line unchanged
        else:
            # Current line is not a comment.
            # Most recent line was not a comment.
            # Convert current line.
            line = '#' + line
        print(line, end='')

main() |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3131
|
Posted: Sat Jun 30, 2018 12:51 am Post subject: |
|
|
Code: | #! /bin/sh
while read line
do
    case "$line" in
        \#* )
            count=0 ;;
        * )
            count=$(($count+1))
            if [[ "$count" == 2 ]]
            then line="#$line"
            fi ;;
    esac
    echo "$line"
done |
ryszardzonk Apprentice
Joined: 18 Dec 2003 Posts: 225 Location: Rzeszów, POLAND
|
Posted: Sat Jun 30, 2018 7:59 am Post subject: |
|
|
Wow, I spent quite some time reading, hoping to find some sort of magical one-liner using sed, but I guess it is much more complicated than that.
@szatox
I ran your code with cat filter.orig | ./hash_fix.sh > filter.new and it did exactly what it was told, except that it also removed every \ from the lines, which is undesired. I am not sure why it did that, but when I removed the \ from the code it obviously failed with a syntax error.
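A likely cause of the missing backslashes: the shell's `read` builtin, when called without `-r`, treats a backslash as an escape character and drops it. As a minimal sketch (not szatox's script; the function name `hash_second_line` is made up for illustration), the same counting logic can be written in Python, which reads each line verbatim:

```python
def hash_second_line(lines):
    """Comment out only the second non-comment line after each '#' line."""
    count = 0
    for line in lines:
        if line.startswith('#'):
            count = 0              # a new hashed line restarts the count
        else:
            count += 1
            if count == 2:         # second rule generated for the same filter
                line = '#' + line
        yield line


# Small sample taken from the example in the first post (raw string keeps the backslashes).
sample = r"""# ||www.*.xyz/*&key=$third-party,popup (popup.txt: 455)
.www.*./(*PRUNE).*?\.xyz/(*PRUNE).*?&key=
.www.*.xyz/(*PRUNE).*?&key=
# ||wow-partners.com/click.php^$popup (popup.txt: 453)
.wow-partners.com/click\.php[^\w%.-]
""".splitlines(keepends=True)

for line in hash_second_line(sample):
    print(line, end='')
```

For the real file, passing `sys.stdin` instead of `sample` and redirecting input and output as before should behave the same way.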
@Hu
Yes. It worked, and worked even better than I expected, as it turned out there were cases I had not anticipated. Your code hashed not only the second line after a hash but all lines following it until the next hash. That is not what I asked for, but, although I had not noticed it before, the converter sometimes creates more than two rules, each one greedier than the previous. Thanks to you that problem is gone before I even knew it existed.
Part of the diff after running cat filter.orig | python hash_fix.py > filter.new
Code: | @@ -22380,14 +22380,14 @@
/(*PRUNE).*?=ads_top&rand
# ://*.*.biz/x$third-party,script (ru_advblock.txt: 1908)
/(*PRUNE).*?://(*PRUNE).*?\.(*PRUNE).*?\.biz/x
-/(*PRUNE).*?\.(*PRUNE).*?\.biz/x
-.*.*./(*PRUNE).*?\.biz/x
-.*.*.biz/x
+#/(*PRUNE).*?\.(*PRUNE).*?\.biz/x
+#.*.*./(*PRUNE).*?\.biz/x
+#.*.*.biz/x
# /zozoter.php?bid= (ru_advblock.txt: 1907)
/(?:(*PRUNE).*?/)?zozoter\.php\?bid=
# /youtube.php|$third-party,script (ru_advblock.txt: 1906)
/(?:(*PRUNE).*?/)?youtube\.php$
-youtube.php
+#youtube.php
# /ya-awaps2/* (ru_advblock.txt: 1905)
/(?:(*PRUNE).*?/)?ya-awaps2/(*PRUNE).*?
# /xhr_ab_block (ru_advblock.txt: 1904) |
EDIT:
Since the original file also contained lines that should not be hashed, the code did a bit more than intended, but those lines were easily fixed with sed
sed -i -e 's/#}/}/' filter.new
sed -i -e 's/#TAG/TAG/' filter.new
Original lines required for the rules to work
Code: | #-ab2p-block-request-nX
{-client-header-tagger{ab2p-block-request-nX} \
}
TAG:^-ab2p-block-request-nX$
#ab2p-block-dnt
{+add-header{DNT: 1} \
}
|
Hu Moderator
Joined: 06 Mar 2007 Posts: 21607
|
Posted: Sat Jun 30, 2018 2:52 pm Post subject: |
|
|
Those are Useless Uses of Cat. You can redirect stdin from the file instead of using cat file |.
That is a useless double use of sed. You can do both expressions in a single run. You could even do them in a single expression with the right syntax. For two expressions in one run, sed -i -e expr1 -e expr2 file instead of your sed -i -e expr1 file; sed -i -e expr2 file.
If there are particular lines that need to be excluded, you can modify the Python script not to mangle them in the first place. Change the else: to elif condition:, where condition is false for lines that my script would otherwise comment out, but that you want to exclude from modification. Based on your shown sed, I think you want elif not line.startswith(('}', 'TAG')):. |
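A minimal sketch of that change, assuming the only lines to protect are those starting with `}` or `TAG` (taken from the sed commands above; the names `KEEP_PREFIXES` and `hash_extra_rules` are hypothetical). The test is negated so that the branch which prepends `#` is skipped for protected lines:

```python
KEEP_PREFIXES = ('}', 'TAG')  # assumption: derived from the two sed fixes


def hash_extra_rules(lines, keep_prefixes=KEEP_PREFIXES):
    """Comment out every rule after the first one that follows a '#' line,
    except lines starting with one of keep_prefixes."""
    state = None
    seen_comment = object()
    for line in lines:
        if line.startswith('#'):
            state = seen_comment        # comment line, printed unchanged
        elif state is seen_comment:
            state = None                # first active line, printed unchanged
        elif not line.startswith(keep_prefixes):
            line = '#' + line           # extra rule, comment it out
        yield line


sample = r"""#-ab2p-block-request-nX
{-client-header-tagger{ab2p-block-request-nX} \
}
TAG:^-ab2p-block-request-nX$
# /youtube.php|$third-party,script (ru_advblock.txt: 1906)
/(?:(*PRUNE).*?/)?youtube\.php$
youtube.php
""".splitlines(keepends=True)

for line in hash_extra_rules(sample):
    print(line, end='')
```

Because `str.startswith` accepts a tuple, further prefixes can be protected by extending `KEEP_PREFIXES` rather than by adding more sed passes afterwards.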