Gentoo Forums
Mail Magic, creating filters from preexisting mail
nalin
Apprentice
Joined: 27 Sep 2002
Posts: 172
Location: Long Beach

Posted: Wed Sep 29, 2004 8:18 pm    Post subject: Mail Magic, creating filters from preexisting mail

A beta,beta,beta setup for generating procmail filters that sort incoming mail based on how existing mail is already sorted.

The filters (roughly) reproduce, for incoming messages, the way the user has already sorted their mail. Specifically, the script examines the sender and subject of every message in a given folder. If those fields share a limited number of common characteristics (for instance an identical subject, a subject that always contains "this is an example" with or without variations before or after it, or a sender domain that always matches example.com or test.com), it generates appropriate regular expressions and spits out a procmailrc file.
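
As a rough illustration of that last step, here is a minimal sketch of how the distinct values found at one word position could be joined into an alternation. build_alternation is a hypothetical helper name, not something the script below defines, and it uses paste where the script uses tr, so treat it as an approximation of the idea rather than the script's exact behavior.

```shell
#!/bin/bash
# build_alternation is a hypothetical helper, not part of mailmagic.sh.
# It reads one value per line on stdin, joins the values with "|",
# and wraps the result as "(a|b).*" for use in a procmail condition.
build_alternation() {
    paste -sd'|' - | sed -e 's/^\(.*\)$/(\1).*/'
}

# Two sender domains seen in the same folder.
printf '%s\n' 'example.com' 'test.com' | build_alternation
# prints: (example.com|test.com).*
```

Note that the values are not escaped, so a "." in a domain matches any character. That makes the generated condition slightly looser than the literal text.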

Caveats:
The handling breaks down when there is too much variation: the script cannot find a pattern. Nothing adverse happens, mail just won't be sorted into that folder. The solution is to divide the mail into more subfolders (i.e. filter manually so that the messages in each folder have more in common).
The handling also breaks down when there are only a handful of messages in a folder. There is too little variation, and the script picks up specifics rather than broader criteria. The solution is to wait until there are ~5 messages before breaking them out into a new folder.
The handling favors large folders over small ones (this is intentional because of the previous caveat, but I figured I would mention it).
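
To see which folders are likely to hit the "too few messages" caveat, something like the following sketch could be run before generating rules. The function names count_messages and list_small_folders are made up for this example, and the default threshold of 5 is only the rough figure mentioned above.

```shell
#!/bin/bash
# Hypothetical helpers, not part of mailmagic.sh.

# Count the message files directly inside one Maildir "cur" directory.
count_messages() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Print every "cur" directory below a Maildir that holds fewer than
# MIN messages (MIN defaults to 5, the rough threshold from the caveats).
list_small_folders() {
    local maildir=$1 min=${2:-5} dir
    find "$maildir" -mindepth 2 -type d -name cur | while read -r dir; do
        if [ "$(count_messages "$dir")" -lt "$min" ]; then
            echo "$dir"
        fi
    done
}

# Example (the path is an assumption based on the script's defaults):
# list_small_folders /var/vpopmail/domains/example_domain.com/user_to_sort/.maildir
```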

The idea (and script) could be extended to cover other header fields, but first and foremost it requires variation (generally present in the two fields I have chosen), because repeated rules execute only on the first occurrence. Second, I don't really need any other fields, so I have not checked them.

Requirements:
IMAP mail handling. IMAP folder names must not contain spaces (use underscores), regular expression (sed) reserved characters, or anything else funky.
procmail (my instructions assume it is fully setup and following your .procmailrc)
vpopmail (I use a pretty standard qmail setup per the qmail vpopmail guide)
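
The folder-name requirement can be checked up front. The sketch below is deliberately conservative: it accepts only letters, digits, underscore, hyphen and dot, which is my own assumption about what counts as "not funky" (the dot is allowed because Maildir subfolders start with one). is_safe_folder_name is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper, not part of mailmagic.sh.
# Succeeds only if the folder name is non-empty and made up solely of
# letters, digits, "_", "-" and "." (an assumed, conservative whitelist).
is_safe_folder_name() {
    case $1 in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in '.Mailing_Lists' 'My Folder' 'a*b'; do
    if is_safe_folder_name "$name"; then
        echo "ok: $name"
    else
        echo "rename: $name"
    fi
done
```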

Script
Code:

#!/bin/bash
#mailmagic.sh
USER_DOMAIN="example_domain.com"
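USER_NAME=$1
#refuse to run without a user name, otherwise the paths below point at the domain directory itself
if [ -z "$USER_NAME" ]; then
        echo "usage: $0 user_to_sort" >&2
        exit 1
fi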
MAILDIR_NAME=".maildir"
MAILDIR_ROOT="/var/vpopmail/domains"
MAILDIR_PATH=$MAILDIR_ROOT/$USER_DOMAIN/$USER_NAME/$MAILDIR_NAME
TMP_DIR="/tmp/auto_rules/"
MAX_PERCENTAGE_VARIATIONS=5
MAX_ALLOWED_VARIATIONS=3
PROCMAIL_FILE=$MAILDIR_ROOT/$USER_DOMAIN/$USER_NAME/.procmailrc
PROCMAIL_FILE_BAC=$PROCMAIL_FILE.bac

mv $PROCMAIL_FILE $PROCMAIL_FILE_BAC
echo "processing maildirs"
echo "" > $PROCMAIL_FILE
for MAIL_DIR in $(\
                        find $MAILDIR_PATH/ -mindepth 2 -type d -name cur -exec du -s {} \; | \
                        sort -n -r | \
                        sed -e "s/^[0-9]*[ \t]*//"
                ); do
        echo started $MAIL_DIR...
        if [ $MAIL_DIR != $MAILDIR_PATH/.Templates/cur ]; then
        if [ `ls $MAIL_DIR | wc -l` -gt 0 ]; then
                rm -rf $TMP_DIR
                mkdir $TMP_DIR
                echo -e "#autogenerated rule for folder $MAIL_DIR\n:0" >> $TMP_DIR\_procmail_new
                for CONTENT in From Subject; do
                        grep -Erh -m1 "^$CONTENT: " $MAIL_DIR | sed  -e 's/^/\\:\n\\:\\:\\:\n\\:\\:\n/' -e 's/ /\n/g' | nl -nrz -v1 | sort > $TMP_DIR$CONTENT\_temp
                        cat $TMP_DIR$CONTENT\_temp | uniq -c > $TMP_DIR$CONTENT\_temp_counts
                        cat $TMP_DIR$CONTENT\_temp | uniq -c -w 7 > $TMP_DIR$CONTENT\_temp_totals
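                        #the first totals line counts the blank lines nl emits for its section delimiters (3 per message), so dividing by 3 gives the number of messages
                        MESSAGE_WORD_COUNT_MAX=$(( $(head -n1 $TMP_DIR$CONTENT\_temp_totals | sed -e "s/^[^0-9]*\([0-9]*\).*/\1/") / 3 ))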
                        echo -n "* ^" > $TMP_DIR$CONTENT\_regexp
                        for MESSAGE_VALID_LINES in $(grep -e "^[^0-9]*$MESSAGE_WORD_COUNT_MAX" $TMP_DIR$CONTENT\_temp_totals | sed -e "s/^[ \t]*$MESSAGE_WORD_COUNT_MAX[ \t]*\([0-9]*\).*$/\1/"); do
                                MESSAGE_VARIATIONS=`grep $MESSAGE_VALID_LINES $TMP_DIR$CONTENT\_temp_counts | wc -l`
                                if [ $MESSAGE_VARIATIONS -lt 10 -a $MESSAGE_VARIATIONS -lt $[$MESSAGE_WORD_COUNT_MAX/2] ]; then
                                        grep $MESSAGE_VALID_LINES $TMP_DIR$CONTENT\_temp_counts | sed -e "s/^[ \t]*[0-9]*[ \t]*[0-9]*[ \t*]//"  | tr "\012" "\174" | sed -e "s/^\(.*\)|$/\(\1\).*/" >> $TMP_DIR$CONTENT\_regexp
                                elif [ $MESSAGE_VARIATIONS -lt 10 ]; then
                                        grep $MESSAGE_VALID_LINES $TMP_DIR$CONTENT\_temp_counts | sed -e "s/^[ \t]*[0-9]*[ \t]*[0-9]*[ \t*]//"  | tr "\012" "\174" | sed -e "s/^\(.*\)|$/\(\1\).*/" >> $TMP_DIR$CONTENT\_regexp
                                fi
                        done
                        #echo "\$" >> $TMP_DIR$CONTENT\_regexp
                        #cat $TMP_DIR$CONTENT\_regexp
                        #sed -i -e "s/\(^\^($CONTENT:)\.\*\$$\|^\^\$$\)//" $TMP_DIR$CONTENT\_regexp
                        if [ `cat $TMP_DIR$CONTENT\_regexp | wc -c` -gt `echo "* ^(" $CONTENT ").*^" | wc -c` ]; then
                                cat $TMP_DIR$CONTENT\_regexp >> $TMP_DIR\_procmail_new
                                echo "" >> $TMP_DIR\_procmail_new
                        else
                                rm $TMP_DIR$CONTENT\_regexp
                        fi
                done
                if [ `cat $TMP_DIR\_procmail_new | wc -l` -gt 2 ]; then
                        cat $TMP_DIR\_procmail_new >> $PROCMAIL_FILE
                        echo -e "$MAIL_DIR\n" | sed "s/cur$//" >> $PROCMAIL_FILE
                else
                        echo -e "#autogenerated rule for folder $MAIL_DIR\n#CANNOT DETERMINE A PATTERN\n" >> $PROCMAIL_FILE
                fi
        else
                echo "empty..."
        fi
        else
                echo "in ignore list..."
        fi
        echo completed $MAIL_DIR
done
echo -e "#unhandled\n:0\n$MAILDIR_PATH/" >> $PROCMAIL_FILE
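
One step in the script that is easy to miss: the sed call prepends the lines \: , \:\:\: and \:\: to every header, and nl treats those as its footer, header and body section delimiters. The effect is that word numbering restarts for each message, so "000001" always means "first word of the header". The following is a small sketch of that step in isolation. number_words is a hypothetical wrapper name, and it assumes GNU sed (for \n in the replacement) and GNU nl.

```shell
#!/bin/bash
# Hypothetical wrapper around the numbering step used in mailmagic.sh.
# Splits each header line into one word per line and lets nl restart
# the numbering for every message via its section delimiters.
number_words() {
    sed -e 's/^/\\:\n\\:\\:\\:\n\\:\\:\n/' -e 's/ /\n/g' | nl -nrz -v1
}

# Two From: headers, as grep would pull them from two messages.
printf 'From: a@example.com\nFrom: b@example.com\n' | number_words
```

Both "From:" words come out numbered 000001 and both addresses 000002, with blank lines where the delimiters were. Sorting that output and running uniq -c on it is what lets the script count how many different words occur at each position.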


Setup:
copy the above to some location and chmod 755 it
change "example_domain.com" to your domain (it should match a subdirectory of /var/vpopmail/domains/)

Usage
/path/to/mailmagic.sh user_to_sort
(user_to_sort should be a subdirectory of /var/vpopmail/domains/example_domain.com/)

Note that the old .procmailrc is renamed to .procmailrc.bac. You might want to prepend a recipe to .procmailrc that backs up all incoming mail, particularly if this is your first time using the script. Any feedback would be much appreciated.
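
A minimal sketch of such a backup recipe, assuming a standard procmail carbon-copy recipe (":0 c") that delivers to a Maildir-style folder. prepend_backup_recipe and the "backup" folder name are hypothetical. Because the script regenerates .procmailrc from scratch on every run, this would have to be re-applied after each run.

```shell
#!/bin/bash
# Hypothetical helper, not part of mailmagic.sh.
# Puts a carbon-copy recipe at the top of a procmailrc so that every
# incoming message is also saved to backup_dir before other rules run.
prepend_backup_recipe() {
    local rc=$1 backup_dir=$2 tmp
    tmp=$(mktemp) || return 1
    {
        printf '#keep a copy of every incoming message\n:0 c\n%s/\n\n' "$backup_dir"
        cat "$rc"
    } > "$tmp" && cat "$tmp" > "$rc"   # cat keeps the owner and mode of $rc
    rm -f "$tmp"
}

# Example (path and folder name are assumptions):
# prepend_backup_recipe /var/vpopmail/domains/example_domain.com/user_to_sort/.procmailrc backup
```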