nalin Apprentice
Joined: 27 Sep 2002 Posts: 172 Location: Long Beach
Posted: Wed Sep 29, 2004 8:18 pm Post subject: Mail Magic, creating filters from preexisting mail
A beta, beta, beta setup for generating procmail filters that sort incoming mail based on the way existing mail is already sorted.
The filters (roughly) reproduce, for incoming messages, the way the user has already sorted their mail: for each folder, the script examines the sender and subject of every message in it. If the senders, the subjects, or both have enough in common (only a limited number of variations), for instance an identical subject, a subject that always contains "this is an example" with or without extra words before or after it, or a sender domain that is always example.com or test.com, the script generates matching regular expressions and writes them out as a .procmailrc file.
Caveats:
This approach breaks down when there is too much variation: the script cannot find a pattern, so it writes no rule for that folder (nothing harmful happens, mail just won't be sorted into it). The fix is to split the mail into more subfolders (i.e. sort it manually so that the messages in each folder have more in common).
It also breaks down when a folder holds only a handful of messages: there is too little variation, so the script picks up specifics rather than broader criteria. The fix here is to wait until there are ~5 related messages before moving them into a new folder.
The handling favors large folders over small ones: folders are processed largest first (by disk usage), and procmail delivers a message with the first rule that matches. This is intentional because of the previous caveat, but I figured I would mention it.
The idea (and script) could be extended to cover other header fields, but first and foremost it relies on variation between folders (generally present in the two fields I have chosen), because if two folders end up with the same rule, only the first occurrence ever fires; second, I don't really need any other fields, so I have not tested them.
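To follow the ~5-message rule of thumb, it helps to see which folders are still small. The helper below is a hypothetical sketch, not part of the script; it assumes the Maildir++ layout the script itself walks (one `.Folder/cur` directory per IMAP folder under the user's maildir) and, like the script, only counts messages in `cur/`.

```bash
# small_folders: hypothetical helper, not part of mailmagic.sh.
# Prints "count folder" for every Maildir++ folder under $1 whose cur/
# directory holds fewer than $2 messages (default 5).
small_folders() {
  local maildir=$1 min=${2:-5} cur count
  for cur in "$maildir"/.[!.]*/cur; do
    [ -d "$cur" ] || continue              # no folders: the glob stays literal
    count=$(find "$cur" -type f | wc -l)   # one file per message in cur/
    if [ "$count" -lt "$min" ]; then
      echo "$count $(basename "$(dirname "$cur")")"
    fi
  done
}

# usage (path built from the script's default settings):
# small_folders /var/vpopmail/domains/example_domain.com/user_to_sort/.maildir
```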
Requirements:
IMAP mail handling. IMAP folder names must not contain spaces (use underscores), regular expression (sed) reserved characters, or anything else funky.
procmail (my instructions assume it is fully set up and reading your .procmailrc)
vpopmail (I use a pretty standard qmail setup per the qmail vpopmail guide)
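A quick way to check the folder-name requirement before running the script. `check_folder_names` is a hypothetical helper; the "safe" character set it accepts (letters, digits, `.`, `_` and `-`) is an assumption about what counts as not funky, and dots are allowed because Maildir++ uses them as the folder hierarchy separator.

```bash
# check_folder_names: hypothetical helper, not part of mailmagic.sh.
# Prints every Maildir++ folder (a ".Name" directory directly under $1)
# whose name contains a character outside the assumed safe set
# [[:alnum:]._-]. Returns 1 if any such folder was found, 0 otherwise.
check_folder_names() {
  local maildir=$1 dir name status=0
  for dir in "$maildir"/.[!.]*/; do
    [ -d "$dir" ] || continue              # no folders: the glob stays literal
    name=$(basename "$dir")
    case $name in
      *[![:alnum:]._-]*) echo "rename: $name"; status=1 ;;
    esac
  done
  return $status
}
```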
Script
#!/bin/bash
#mailmagic.sh
# Generate procmail rules for a vpopmail user from the way that user's
# existing IMAP folders are sorted. Usage: mailmagic.sh user_to_sort
USER_DOMAIN="example_domain.com"
USER_NAME=$1
if [ -z "$USER_NAME" ]; then
  echo "usage: $0 user_to_sort" >&2
  exit 1
fi
MAILDIR_NAME=".maildir"
MAILDIR_ROOT="/var/vpopmail/domains"
MAILDIR_PATH=$MAILDIR_ROOT/$USER_DOMAIN/$USER_NAME/$MAILDIR_NAME
TMP_DIR="/tmp/auto_rules/"
# not used below; the variation limit is hard-coded to 10
MAX_PERCENTAGE_VARIATIONS=5
MAX_ALLOWED_VARIATIONS=3
PROCMAIL_FILE=$MAILDIR_ROOT/$USER_DOMAIN/$USER_NAME/.procmailrc
PROCMAIL_FILE_BAC=$PROCMAIL_FILE.bac
# keep the previous rules as .procmailrc.bac (there is none on the first run)
[ -f "$PROCMAIL_FILE" ] && mv "$PROCMAIL_FILE" "$PROCMAIL_FILE_BAC"
echo "processing maildirs"
echo "" > "$PROCMAIL_FILE"
# every subfolder's cur/ directory (depth >= 2 skips the inbox), largest
# first, so rules for big folders come first in .procmailrc and are tried first
for MAIL_DIR in $(
  find "$MAILDIR_PATH"/ -mindepth 2 -type d -name cur -exec du -s {} \; |
  sort -n -r |
  sed -e "s/^[0-9]*[ \t]*//"
); do
  echo started $MAIL_DIR...
  if [ "$MAIL_DIR" != "$MAILDIR_PATH/.Templates/cur" ]; then
    if [ $(ls "$MAIL_DIR" | wc -l) -gt 0 ]; then
      rm -rf "$TMP_DIR"
      mkdir "$TMP_DIR"
      RULE=${TMP_DIR}_procmail_new
      echo -e "#autogenerated rule for folder $MAIL_DIR\n:0" >> "$RULE"
      for CONTENT in From Subject; do
        WORDS=${TMP_DIR}${CONTENT}_temp
        COUNTS=${TMP_DIR}${CONTENT}_temp_counts
        TOTALS=${TMP_DIR}${CONTENT}_temp_totals
        REGEXP=${TMP_DIR}${CONTENT}_regexp
        # first header line of each message, one word per line; the three
        # prepended lines are nl page delimiters (\: \:\:\: \:\:), so nl
        # restarts its numbering for every message
        grep -Erh -m1 "^$CONTENT: " "$MAIL_DIR" | sed -e 's/^/\\:\n\\:\\:\\:\n\\:\\:\n/' -e 's/ /\n/g' | nl -nrz -v1 | sort > "$WORDS"
        # messages sharing each (position, word) pair
        uniq -c "$WORDS" > "$COUNTS"
        # messages with any word at each position (7 chars = number + tab)
        uniq -c -w 7 "$WORDS" > "$TOTALS"
        # the first totals line counts the empty lines nl prints for the
        # three delimiters, i.e. 3 per message, so /3 gives the message count
        TOTAL_LINES=$(head -n1 "$TOTALS" | sed -e "s/^[^0-9]*\([0-9]*\).*/\1/")
        MESSAGE_WORD_COUNT_MAX=$(( ${TOTAL_LINES:-0} / 3 ))
        echo -n "* ^" > "$REGEXP"
        # word positions present in every message
        for MESSAGE_VALID_LINES in $(grep -e "^[^0-9]*$MESSAGE_WORD_COUNT_MAX " "$TOTALS" | sed -e "s/^[ \t]*$MESSAGE_WORD_COUNT_MAX[ \t]*\([0-9]*\).*$/\1/"); do
          # distinct words seen at this position
          MESSAGE_VARIATIONS=$(grep $MESSAGE_VALID_LINES "$COUNTS" | wc -l)
          # few enough variants: append them as an alternation "(w1|w2).*"
          if [ $MESSAGE_VARIATIONS -lt 10 ]; then
            grep $MESSAGE_VALID_LINES "$COUNTS" | sed -e "s/^[ \t]*[0-9]*[ \t]*[0-9]*[ \t]//" | tr "\012" "\174" | sed -e "s/^\(.*\)|$/\(\1\).*/" >> "$REGEXP"
          fi
        done
        #echo "\$" >> "$REGEXP"
        #cat "$REGEXP"
        #sed -i -e "s/\(^\^($CONTENT:)\.\*\$$\|^\^\$$\)//" "$REGEXP"
        # keep the condition only if it says more than the bare header name
        if [ $(wc -c < "$REGEXP") -gt $(echo "* ^(" $CONTENT ").*^" | wc -c) ]; then
          cat "$REGEXP" >> "$RULE"
          echo "" >> "$RULE"
        else
          rm "$REGEXP"
        fi
      done
      # more than the comment and ":0" lines means at least one condition
      if [ $(wc -l < "$RULE") -gt 2 ]; then
        cat "$RULE" >> "$PROCMAIL_FILE"
        # action line: folder path without "cur"; the trailing / means Maildir
        echo -e "$MAIL_DIR\n" | sed "s/cur$//" >> "$PROCMAIL_FILE"
      else
        echo -e "#autogenerated rule for folder $MAIL_DIR\n#CANNOT DETERMINE A PATTERN\n" >> "$PROCMAIL_FILE"
      fi
    else
      echo "empty..."
    fi
  else
    echo "in ignore list..."
  fi
  echo completed $MAIL_DIR
done
echo -e "#unhandled\n:0\n$MAILDIR_PATH/" >> "$PROCMAIL_FILE"
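The least obvious step in the script is how it numbers words per message. The helper below isolates that first pipeline stage so it can be tried on sample headers; `number_words` is a hypothetical name, and it needs GNU sed (for `\n` in the replacement) and GNU nl. The three prepended lines are nl's section delimiters (`\:\:\:` header, `\:\:` body, `\:` footer); each header section starts a new logical page, where nl resets its count, so every message's words are numbered from 000001.

```bash
# number_words: hypothetical helper, a copy of the script's first stage.
# stdin:  one header line per message
# stdout: "NNNNNN<TAB>word" lines, numbering restarting at 000001 for each
#         message, plus one empty line per delimiter
number_words() {
  sed -e 's/^/\\:\n\\:\\:\\:\n\\:\\:\n/' -e 's/ /\n/g' | nl -nrz -v1
}

printf 'Subject: hello world\nSubject: hi\n' | number_words
```

In the output both `Subject:` words are numbered 000001, and the empty lines nl prints for the delimiters (three per message) are what the script later divides by 3 to get the message count.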
Setup:
copy the above to some location and chmod 755 it
change "example_domain.com" to your domain (it should match a subdirectory of /var/vpopmail/domains/)
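The domain change can also be done non-interactively. `set_domain` is a hypothetical helper that assumes GNU sed (`-i` without a suffix) and that the script still contains a single `USER_DOMAIN=` line starting at the beginning of a line, as above.

```bash
# set_domain: hypothetical helper; rewrites the USER_DOMAIN= line in place.
# $1: path to the script, $2: the vpopmail domain to use
set_domain() {
  local script=$1 domain=$2
  sed -i "s/^USER_DOMAIN=.*/USER_DOMAIN=\"$domain\"/" "$script"
}

# usage: set_domain /path/to/script mydomain.example
```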
Usage
/path/to/mailmagic.sh user_to_sort
(user_to_sort should be a subdirectory of /var/vpopmail/domains/example_domain.com/)
Note that the old .procmailrc is renamed to .procmailrc.bac. You might want to prepend a rule to .procmailrc that backs up all incoming mail, particularly the first time you use the script. Any feedback would be much appreciated.
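A backup rule of that kind could look like the sketch below. `prepend_backup_rule` and the backup path are hypothetical; `:0 c` is procmail's carbon-copy flag, so a copy goes to the backup Maildir and processing continues with the generated rules. Because the script rewrites .procmailrc on every run, the rule has to be prepended again after each run. Keeping the backup Maildir outside the user's .maildir means the script's folder scan never sees it.

```bash
# prepend_backup_rule: hypothetical helper, not part of mailmagic.sh.
# $1: the .procmailrc to modify
# $2: backup Maildir; a trailing / is added, which tells procmail to
#     deliver in Maildir format
prepend_backup_rule() {
  local rcfile=$1 backup=$2 tmp
  tmp=$(mktemp) || return 1
  # ":0 c" with no conditions matches every message and delivers a copy only
  printf '#backup of all incoming mail\n:0 c\n%s/\n\n' "$backup" > "$tmp"
  cat "$rcfile" >> "$tmp"
  cat "$tmp" > "$rcfile"   # overwrite in place, keeping owner and mode
  rm -f "$tmp"
}

# usage (paths are examples only):
# prepend_backup_rule /var/vpopmail/domains/example_domain.com/user_to_sort/.procmailrc \
#     /var/vpopmail/domains/example_domain.com/user_to_sort/mail_backup
```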