Gentoo Forums
Causing spammers serious pain
Forum: Off the Wall

jkcunningham
l33t


Joined: 28 Apr 2003
Posts: 649
Location: 47.49N 121.79W

Posted: Mon Jun 02, 2003 1:05 am

Actually, Bayes filtering is in my field of expertise. If you read my comment more carefully, you'll notice my caveat about small LANs. I was not commenting on the implementations needed for high-volume email systems. I was responding to what looked to me like a failure to recognize that SpamAssassin has Bayes filtering at all. Here's the quote:
Quote:

A properly trained bogofilter is much better than SpamAssassin because it's trained on the spam that you get, not Somebody Else's Spam.

Do you see another way to read that? They both have Bayes filters (sort of), and they both train on your own spam and your own not-spam.

For people running small volumes of email - like most people - the speed differences are probably not quantifiable - like I said. SpamAssassin is a decent piece of work, and it was one of the first, and there's really no reason for you to be bad-mouthing it.

-Jeff

LosD
n00b


Joined: 12 Jun 2002
Posts: 61
Location: Taastrup, Denmark

Posted: Mon Jun 02, 2003 4:50 am

jkcunningham wrote:

For people running small volumes of email - like most people - the speed differences are probably not quantifiable - like I said. SpamAssassin is a decent piece of work, and it was one of the first, and there's really no reason for you to be bad-mouthing it.
-Jeff

Of course there is... He thinks something else is better, because he has seen fewer false positives and a faster, less CPU-hungry implementation on a heavily used mail server.

Why shouldn't he tell that?

Dennis

smoku
n00b


Joined: 02 Jun 2003
Posts: 0
Location: Warsaw, Poland

Posted: Mon Jun 02, 2003 8:41 am

There is a way of actually hurting spammers without hitting innocent people.
Hurt their email harvesters, the web spiders hunting for email addresses.
Fill their databases with junk.

Look at how Spam-X works.
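smoku doesn't describe how Spam-X works internally, so the following is only a generic sketch of the "fill their databases with junk" idea, not Spam-X's code. It generates random decoy addresses for a page that only harvesters would crawl. The function names are hypothetical; the one deliberate design choice is the reserved `.invalid` top-level domain (RFC 2606), which guarantees the junk never lands in an innocent person's mailbox.

```javascript
// Hypothetical decoy generator: random addresses for a page meant only for harvesters.
// The .invalid TLD is reserved (RFC 2606), so no real mailbox can ever receive mail.
function randomLabel(length, rand = Math.random) {
  const letters = 'abcdefghijklmnopqrstuvwxyz';
  let label = '';
  for (let i = 0; i < length; i++) {
    label += letters[Math.floor(rand() * letters.length)];
  }
  return label;
}

function decoyAddresses(count, rand = Math.random) {
  const addresses = [];
  for (let i = 0; i < count; i++) {
    addresses.push(`${randomLabel(8, rand)}@${randomLabel(6, rand)}.invalid`);
  }
  return addresses;
}

// Print a few decoys, e.g. for a hidden page that only spiders follow
console.log(decoyAddresses(3).join('\n'));
```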

absinthe
Retired Dev


Joined: 06 Oct 2002
Posts: 111
Location: San Francisco, CA, USA

Posted: Mon Jun 02, 2003 11:15 am

jkcunningham wrote:
They both have Bayes filters (sort of), and they both train on your own spam and your own not-spam.


Once more: just because SpamAssassin has a Bayesian mechanism in it doesn't mean it works as well in real-world use. I'll say it again... not all Bayesian filters are created equal. Results vary a great deal depending on how the engine is tuned and which specific algorithms it uses.
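The claim that algorithm choice matters can be made concrete. The sketch below applies two well-known ways of combining per-token spam probabilities, Paul Graham's naive-Bayes product and Gary Robinson's geometric-mean method, to the same made-up token probabilities. It is an illustration, not code from either program. As far as I know, bogofilter uses a Robinson-style method (with Fisher's chi-square combining), which is worth checking against its own documentation.

```javascript
// Graham-style: naive-Bayes product; tends to push scores toward 0 or 1.
function grahamScore(probs) {
  let spam = 1;
  let ham = 1;
  for (const p of probs) {
    spam *= p;
    ham *= 1 - p;
  }
  return spam / (spam + ham);
}

// Robinson-style geometric-mean combination; less extreme when evidence is mixed.
// Assumes every probability is strictly between 0 and 1.
function robinsonScore(probs) {
  const n = probs.length;
  let logHam = 0;
  let logSpam = 0;
  for (const p of probs) {
    logHam += Math.log(1 - p);
    logSpam += Math.log(p);
  }
  const P = 1 - Math.exp(logHam / n);  // "spamminess"
  const Q = 1 - Math.exp(logSpam / n); // "hamminess"
  return (1 + (P - Q) / (P + Q)) / 2;
}

// Two strong spam tokens plus three mildly hammy ones (made-up numbers)
const tokens = [0.99, 0.99, 0.2, 0.2, 0.2];
console.log('Graham:', grahamScore(tokens).toFixed(3));
console.log('Robinson:', robinsonScore(tokens).toFixed(3));
```

On this mixed input, the Graham score comes out near certainty while the Robinson score lands near the middle of the range, so the same token statistics can lead to different verdicts depending on the combining rule and the threshold you pick.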

Properly trained, I'd suggest bogofilter is the better solution for most people -- especially people on slower machines who don't have the clock cycles to spare for a Perl-based spam engine.

jkcunningham wrote:
For people running small volumes of email - like most people - the speed differences are probably not quantifiable - like I said. SpamAssassin is a decent piece of work, and it was one of the first, and there's really no reason for you to be bad-mouthing it.


1. On slower machines there is an obvious, subjective performance difference even on a light volume of personal mail. It's even noticeable on 1.x GHz Athlons and 2 GHz P4s. You can run 'time' on this if you doubt me. The numbers don't lie.

2. I'm not bad-mouthing SpamAssassin ... SpamAssassin would be the logical choice if more effective solutions didn't exist. Discussing methods of handling spam without mentioning a product that is superior in its class would be irresponsible.

It's safe to assume that everyone will continue to get more email over time -- not less. Even if SpamAssassin later implements the same Bayesian algorithms as bogofilter, it will still not be comparable because of the speed issue. The Perl implementation will always be subjectively and quantitatively slower. Therefore it is advisable to give people the solution that performs best. End of story.

Defending SpamAssassin because it was "one of the first" is a weak argument. It seems here like you're trying to defend SpamAssassin without knowing anything about bogofilter. I've run both extensively (and others, not least of which is my own homegrown procmail stuff, and popfile). Furthermore, you accused me of not knowing anything about SpamAssassin 2.50 -- when in fact I was running SA 2.53/razor, very possibly on many more machines than you were.

You know what they say about point-counterpoint: "Without understanding the opposing viewpoint, you don't fully understand your own."

That said, I agree with your post(s) about not retaliating against spammers. Spammers are just like women -- the only true way to drive them crazy is to ignore them.

nerdbert
l33t


Joined: 09 Feb 2003
Posts: 981
Location: Berlin

Posted: Mon Jun 02, 2003 1:14 pm

I don't think there is a good way to fight spam right now - it's nearly impossible to hurt a spammer with a computer.

All we need is better legislation.
First of all, everybody should be able to sue a spammer who obscures his real identity (it should be quite easy to find out who is really behind some spam - try to purchase a "product" from him, e.g. a nifty penis enlargement, and you'll know who he is). Such a law already exists in the state of Washington AFAIK - I heard that some people made heaps of money by suing spammers.

My second point is that every spammer should be forced to put "ADV" into their header. One simple filter rule - no spam.
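The "one simple filter rule" could look roughly like this. It is a sketch under an assumption: the mandated label is taken to be a Subject line starting with `ADV:`, since the post doesn't say exactly where "ADV" would go. The function name is hypothetical.

```javascript
// Hypothetical rule: assumes a law requiring the Subject line to start with "ADV:".
// The exact label format is an assumption, not taken from any specific statute.
function isLabeledAdvert(headers) {
  const subject = headers.subject || '';
  // Tolerate leading spaces, any letter case, and a space before the colon
  return /^\s*ADV\s*:/i.test(subject);
}

// Route labeled mail straight to a junk folder
const incoming = { subject: 'ADV: Cheap printer ink' };
console.log(isLabeledAdvert(incoming) ? 'junk' : 'inbox');
```

Of course, the rule only works if spammers actually comply.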

Call me a lunatic if you like - I know that it is quite hard to make every country on the globe adopt a law, but it is possible (think about IP). I just don't see any other effective way to make spam unattractive to those who send it.

BTW: Has anybody heard of the (sadly Windows-only) tool SpamNet? It's a plugin for Outlook. Its users send a fingerprint of their spam to a database, so whenever another user receives the very same mail it is deleted right away. It worked better than any Bayes filter for me, and I didn't have to worry about false positives...

[edit & ot] I'm just curious Absinthe, what is your favorite brand?

dabooty
Guru


Joined: 15 May 2003
Posts: 482
Location: Belgium

Posted: Mon Jun 02, 2003 3:24 pm    Post subject: anti spammers

I used to work for a company (which I left because of its shady tactics, btw) that sent out a lot of semi-legit spam, and this is how I stay out of their lists:

Give absolutely ZERO sign of life. Try to look like an address that's either fake or abandoned.
Addresses get sold for a few cents per verified email address. Addresses that seem to be non-existent get filtered out of the database.

Here are my tricks (I'm not even sure they help, but I don't seem to get a lot of spam):
- never let my mail client download images from the internet
> spammers track which mails had their images downloaded from their servers; this way they clean up their database.
- never reply to a spam message
- bounce mails that are spam right off the server
- try to get into as few databases as possible: be careful with your email address and try not to post it on any website.
- whenever you need to register with a phony email address, use a Microsoft email address :) If anybody has the money to do something about it, it's them.
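The first tip works because a remote image in an HTML mail acts as a "web bug": fetching it tells the sender's server that the address is live. A minimal sketch of how a mail client or filter might spot such images before loading anything is shown below. The function name is hypothetical, and the regex is a heuristic rather than a real HTML parser (it ignores, for example, CSS background images).

```javascript
// Heuristic: list remote (http/https) image URLs in an HTML mail body.
// Embedded images (cid:) and data: URIs are ignored because they don't phone home.
function remoteImageUrls(html) {
  const urls = [];
  const re = /<img\b[^>]*\bsrc\s*=\s*["']?(https?:\/\/[^"'\s>]+)/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

const body = '<p>Hi</p><img src="http://track.example.com/p.gif?id=123">';
console.log(remoteImageUrls(body));
```

A client following the tip would simply refuse to fetch anything this returns unless the user asks for it.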

And if you are a webmaster like me, don't publish your email addresses in plain text; scramble them with JavaScript (this breaks in some browsers, but those visitors will only miss the address):
Code:
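<script type="text/javascript">
// Build the address at runtime so it never appears verbatim in the HTML source
var a = 'be';
var b = 'tom';
var c = 'm' + 'a' + 'i' + 'l' + 't' + 'o';  // "mailto", split up to dodge naive harvesters
var addr = b + 'vergote.linux@subdimension.' + a;
document.write('<a href="' + c + ':' + addr + '">' + addr + '</a>');
</script>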

This produces my (old) email address.

absinthe
Retired Dev


Joined: 06 Oct 2002
Posts: 111
Location: San Francisco, CA, USA

Posted: Mon Jun 02, 2003 7:00 pm

nerdbert wrote:
All we need is better legislation.

Oh man, I wish it were that simple...

Here in Connecticut (USA), we have a state-government-sponsored "no-call" list when it comes to telephones. That means if you notify them that you don't want calls from telemarketers, you will be placed on a list, and telemarketers must check this list before attempting to market any products or services to residents in Connecticut. If by chance you receive calls despite being on the "no-call" list, each resident can complain and the district attorney's office will pursue any company that did not comply.

Surprisingly it is effective. I never receive any telemarketing phone calls -- except for "non-profit" companies asking for donations to organizations such as the Firefighter's Victim's Fund, etc. Apparently those get through a loophole in the legislation.

This email problem is three-fold:

  1. it reveals deficiencies in the SMTP protocol when it comes to preventing UCE.
  2. it reveals the general lack of courtesy in our society these days.
  3. it reveals the extent of the buying & selling of personally identifiable information among companies.

Unfortunately, I doubt the federal government will do anything right. The feds already legislate more than they should, and what they do they typically screw up in some way. Chiefly, I don't trust them to put through a bill that should be relatively simple without adding in a bunch of pork. Many times, this pork hurts citizens by either reducing privacy rights or trashing basic freedoms.

At least with telephones, it's fairly easy to track down who the telemarketer was. With UCE and the current state of the SMTP protocol, it may be impossible to determine who exactly was responsible for your spam. It's very difficult to enforce anything without sufficient evidence linking someone to the crime -- or more accurately, evidence that removes any doubt that someone else committed it.

Second, as you mentioned -- any telemarketer who actually gets pursued by the law in the US will simply move their operations to a country that has no such law. Unless the US starts imposing tariffs on foreign email (which would require changes to SMTP and would otherwise suck), enforcement is neither feasible nor possible. Any legislation like this is essentially doomed to failure.

nerdbert wrote:

BTW: Has anybody heard of the (sadly Windows-only) tool SpamNet? It's a plugin for Outlook. Its users send a fingerprint of their spam to a database, so whenever another user receives the very same mail it is deleted right away. It worked better than any Bayes filter for me, and I didn't have to worry about false positives...

This is commercial now, and has a new name... in typical fashion for anything that starts out free for Windows (ends up commercial).

However, for that system to even be effective, they have to have people constantly maintaining it if anyone hopes to avoid false positives. Otherwise everybody who is on a mailing list will run into problems, for example. I'm not as impressed by the technology used by that system as I am by Bayesian filtering. With Bayesian filtering, if you feed it right it will treat you right... even on lesser Bayesian engines than bogofilter.

Additionally, the spam sent nowadays is specifically written to get around Spam Inspector/SpamNet/etc. Each email is sent to each recipient separately, with some random garbage buried in the email to make it unique.
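Why per-recipient garbage hurts fingerprinting is easy to see with an exact-match hash. I don't know what signatures SpamNet (or Vipul's Razor, which it grew out of) actually computes; I believe real systems use fuzzier signatures for exactly this reason. So the sketch below simply assumes a plain SHA-256 over lightly normalized body text, using Node's built-in `crypto` module. The `fingerprint` name is hypothetical.

```javascript
const crypto = require('crypto');

// Exact-match fingerprint: normalize case and whitespace, then hash the body.
function fingerprint(body) {
  const normalized = body.toLowerCase().replace(/\s+/g, ' ').trim();
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

const base = 'Lose weight fast! Click here.';
// The same spam with different random junk appended for each recipient
const withJunkA = fingerprint(base + '\n\nqzxv 81723');
const withJunkB = fingerprint(base + '\n\nplmk 40918');
console.log(withJunkA === withJunkB); // false: one junk token breaks an exact match
```

Normalization absorbs trivial differences such as extra spaces or changed capitalization, but any added token yields a completely different hash, which is the weakness absinthe describes.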

nerdbert wrote:
I'm just curious Absinthe, what is your favorite brand?

This is a bit of a moving target. The problem: nobody these days truly makes absinthe how it used to be made. There's no real standard for it... and consistency is often a problem (one bottle may be good, the next may be bad...etc)

Deva (Spain) is probably the best overall brand going; it's perhaps also the most consistent, without breaking your bank account.

I haven't tried my hand at homebrewing yet.

nerdbert
l33t


Joined: 09 Feb 2003
Posts: 981
Location: Berlin

Posted: Tue Jun 03, 2003 1:07 am

absinthe wrote:

Oh man, I wish it were that simple...


me too! But I also believe there is no other way to diminish the amount of spam which is being sent every day.
absinthe wrote:


Here in Connecticut (USA), we have a state-government-sponsored "no-call" list when it comes to telephones. That means if you notify them that you don't want calls from telemarketers, you will be placed on a list, and telemarketers must check this list before attempting to market any products or services to residents in Connecticut. If by chance you receive calls despite being on the "no-call" list, each resident can complain and the district attorney's office will pursue any company that did not comply.

Surprisingly it is effective. I never receive any telemarketing phone calls -- except for "non-profit" companies asking for donations to organizations such as the Firefighter's Victim's Fund, etc. Apparently those get through a loophole in the legislation.


I get your point, but I must admit that I'm not that familiar with this subject, because telemarketers don't show up that often in my country (thanks to our privacy rights). I've actually never talked to one.
However, I don't think those problems are related. On the one hand we have a medium that enables anybody to send thousands of messages at no cost. On the other hand we have something that requires you to employ people who actually talk to potential customers. I know this is a problem in the US right now, but I believe telemarketing will disappear soon (because everybody hates it, and companies will realize that it annoys everyone and won't earn them any profit).
Telemarketing is a local problem, whereas spam is a global problem (I just received ~20 Latin American spam mails, which makes no sense because I don't speak Spanish, BTW).

absinthe wrote:


Something like this email problem is three-fold.

  1. it reveals some deficiencies in the SMTP protocol to prevent UCE.
  2. it reveals the general lack of courtesy in our society during these times.
  3. it reveals the extent of the buying & selling of personally-identifiable information among companies.

Unfortunately, I doubt the federal government will do anything right. The feds already legislate more than they should, and what they do they typically screw up in some way. Chiefly, I don't trust them to put through a bill that should be relatively simple without adding in a bunch of pork. Many times, this pork hurts citizens by either reducing privacy rights or trashing basic freedoms.


I forgot to mention this in my last post - another way to reduce spam would be a new 'secure' version of SMTP. IMO it would also be great to require everybody to have a certificate that establishes their identity. I don't believe your privacy is harmed by letting the person who receives your e-mail know who you are (I also think it is a natural right to know whom you are talking to).
I don't think this is caused by a lack of courtesy in one society, because this is a problem of many individual societies which don't share common values.
absinthe wrote:

At least with telephones, it's fairly easy to track down who the telemarketer was. With UCE and the current state of the SMTP protocol, it may be impossible to determine who exactly was responsible for your spam. Very difficult to enforce anything without sufficient evidence linking them to the crime -- or more accurately -- removing any doubt it could be someone else who committed the crime.

Secondarily, as you mentioned -- any telemarketer who actually gets pursued by the law in the US will simply move their operations to a country that has no such law. Unless the US starts imposing tariffs on foreign email (which would require changes to SMTP and would otherwise suck), this is neither feasible nor possible. Any legislation like this is essentially doomed to failure.

I may not know from which computer the spammer sent his waste onto the net, but every spammer wants to make money... if you pretend that you want a penis enlargement, the spammer is forced to reveal his identity in order to close the deal.

Spam actually damages the economy, so I see good reason for many nations to enforce strong anti-spam laws. It should be possible to sue someone who resides in your country even if he used an SMTP server located in Taiwan.
absinthe wrote:

This is commercial now, and has a new name... in typical fashion for anything that starts out free for Windows (ends up commercial).

However, for that system to even be effective, they have to have people constantly maintaining it if anyone hopes to avoid false positives. Otherwise everybody who is on a mailing list will run into problems, for example. I'm not as impressed by the technology used by that system as I am by Bayesian filtering. With Bayesian filtering, if you feed it right it will treat you right... even on lesser Bayesian engines than bogofilter.

I agree that their approach isn't impressive at all, but it is still quite effective. I receive heaps of newsletters every day, but I never saw one of them in my junk folder while using SpamNet.

absinthe wrote:

Additionally, the spam sent nowadays is specifically written to get around Spam Inspector/SpamNet/etc. Each email is sent to each recipient separately, with some random garbage buried in the email to make it unique.

SpamNet's fingerprinting proved quite reliable for me (back when I was still using Windows :oops: ). Spam that uses terms like "O v e r w e i g h t" in particular seems to cause Bayes serious trouble. On SpamNet, someone trustworthy (they've got a rating system) just declares the mail to be spam, and nobody else is bothered with it.
I know their product just went commercial (which is quite sad), but their beta version is still available for free.
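One possible countermeasure for the "O v e r w e i g h t" trick is to normalize text before tokenizing, so the filter sees the same token it already learned from "Overweight". This is a hypothetical preprocessing step, not something the post says any particular filter does. The trade-off is that genuine runs of single letters (say, "a b c" in a list) also get merged.

```javascript
// Rejoin runs of three or more single letters separated by single spaces,
// e.g. "O v e r w e i g h t" -> "Overweight".
function collapseSpacedLetters(text) {
  return text.replace(/\b(?:[A-Za-z] ){2,}[A-Za-z]\b/g, run => run.replace(/ /g, ''));
}

// Simple tokenizer that applies the normalization first
function tokenize(text) {
  return collapseSpacedLetters(text).toLowerCase().match(/[a-z0-9']+/g) || [];
}

console.log(tokenize('Are you O v e r w e i g h t?'));
```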

absinthe wrote:

Deva (Spain) is probably the best overall brand going; it's perhaps also the most consistent, without breaking your bank account.

I haven't tried my hand at homebrewing yet.

Deva is quite popular in Europe, but my favorite is Ulex. They make it the way it used to be made - it even has a freaking >30 mg/kg alc. of thujone (I don't know how to convert this into your measures, but this is as much as you can get over here - Deva has ~3 mg/kg alc.)
Do you know of anybody who has ever successfully homebrewed this stuff? I didn't even know that it was possible. Please let me know if you have a link to some instructions.

jkcunningham
l33t


Joined: 28 Apr 2003
Posts: 649
Location: 47.49N 121.79W

Posted: Tue Jun 03, 2003 2:52 am

The sad truth of the matter is that both spam and telemarketing do pay off. As you have said, spam costs almost nothing. Telemarketing costs very little. I met a girl recently who was a telemarketer. I asked her why she did it. She said it was a job - it paid her share of the rent (barely). She had a stoical personality, didn't like to bother people, and said she wasn't persistent with those who said no. I couldn't work up the ire to savage her personally, although there have been many times I've savaged telemarketers who called my home. I used to have an unlisted number for exactly that reason. But the telemarketers finally figured out they could reach people like me with sequential dialers. The girl said that a small but surprising number of people like talking to them - probably because they're lonely, maybe shut-in, maybe dysfunctional, whatever. Some of them reward the conversation with a purchase, and apparently often enough to make it profitable for the companies that use these services.

And spam has almost no costs associated with it. I used to think no one would actually go for anything from spam, but we've all seen that Nigerian email, right? The one where there's a large sum of money left over from some fallen government that needs to be laundered in the US, and they're very polite and going to give someone 10 to 20 percent - what would amount to a couple million dollars. I never believed there was anyone in the entire world stupid enough to fall for that so far that they'd actually pay money into it - not if you disqualify the retarded and insane. Well, I was wrong. I heard on the radio recently that a prosecuting attorney in New York was trying to go after someone involved in that scam after a divorced, middle-aged woman lost over twenty thousand dollars. Twenty thousand dollars. Now think about that. Some con artist hit the frickin' jackpot. That story will make spammer history. The guy's probably a cult hero. In psychology this is called intermittent reinforcement - unpredictable, sporadic (large) rewards are much more powerful motivators than regular small ones. After a hit like that, do you think these guys will ever quit? I am very pessimistic. The best thing we have going for us is adaptive filtering (like Bayes), and above all the fact that most spam-handling tools in the Windows world don't use it, so we are small potatoes and not worth devoting a lot of energy to cracking. The Unix user world is not their market.

And the last thing in the world I want to see is Linux spam tools ported to Windows.

-Jeff

absinthe
Retired Dev


Joined: 06 Oct 2002
Posts: 111
Location: San Francisco, CA, USA

Posted: Tue Jun 03, 2003 3:37 am

nerdbert wrote:
Deva is quite popular in Europe, but my favorite is Ulex. They make it the way it used to be made - it even has a freaking >30 mg/kg alc. of thujone (I don't know how to convert this into your measures, but this is as much as you can get over here - Deva has ~3 mg/kg alc.)

Do you know of anybody who has ever successfully homebrewed this stuff? I didn't even know that it was possible. Please let me know if you have a link to some instructions.


Some recipes here.

Reciclagem
n00b


Joined: 17 Mar 2003
Posts: 3

Posted: Tue Jun 03, 2003 2:53 pm

Would you like to phone a spammer?

OK. I asked the "Africa million dollars" spammer to call a phone number (a police department). They answered that they couldn't call me, "AS I AM NOT IN A BETTER POSITION TO MAKE CALLS"

But they gave two telephone numbers, "please call me at":

234-80-34701033
--(234 -> Nigeria-Africa)


If everyone fooled these moth******er spammers by replying to their mail, they could lose their profits.

jkcunningham
l33t


Joined: 28 Apr 2003
Posts: 649
Location: 47.49N 121.79W

Posted: Fri Jul 04, 2003 1:59 pm

Well, for the benefit of anyone who runs across this thread and is evaluating the merits of bogofilter vs spamassassin, I have some experimental data to contribute. Some of the extreme hyperbole about the significant superiority of bogofilter induced me to install it a month ago. I put it in FRONT of spamassassin in my procmail stream, so it got first crack at the spam. I trained it on my entire mail collection - the same mail spamassassin trained on. This includes over a thousand spam emails and tens of thousands of non-spam emails. And, yes, I know how to feed it the one vs the other.

After running bogofilter for a month here's what I found: Bogofilter has a false negative rate that is about four times as high as spamassassin used to have. That is, it lets about four times as much spam through. The false-alarm rate is about the same (very low).

Bogofilter may be faster, but it sure as hell doesn't work better. Of the spam that it let through, spamassassin caught the vast majority (all but three). The overall rate using both together is actually better than using either one separately. I'm not necessarily recommending anything like that. Obviously, if spamassassin is too inefficient for high-volume systems, do what makes sense.
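Why two filters in series do better on spam can be sketched in a few lines. Assuming a message caught by the first filter in the procmail chain is filed away and never reaches the second, the chain behaves like a logical OR: it is spam if either filter flags it. The keyword "filters" below are toy stand-ins for illustration, not real engines, and all names are hypothetical.

```javascript
// Two filters in series: spam if EITHER flags it.
function inSeries(...filters) {
  return message => filters.some(isSpam => isSpam(message));
}

// Tally errors over a hand-labeled sample.
function countErrors(classify, labeled) {
  let falseNegatives = 0;
  let falsePositives = 0;
  for (const { message, spam } of labeled) {
    const flagged = classify(message);
    if (spam && !flagged) falseNegatives++;
    if (!spam && flagged) falsePositives++;
  }
  return { falseNegatives, falsePositives };
}

// Toy stand-ins for two filters (hypothetical keyword checks)
const filterA = m => m.includes('viagra');
const filterB = m => m.includes('$$$');
const sample = [
  { message: 'buy viagra now', spam: true },
  { message: '$$$ fast cash', spam: true },
  { message: 'lunch on friday?', spam: false },
  { message: 'forwarded viagra joke from mom', spam: false },
];
console.log(countErrors(filterA, sample));
console.log(countErrors(filterB, sample));
console.log(countErrors(inSeries(filterA, filterB), sample));
```

The OR chain can only remove false negatives, but it can also only add false positives: any ham that either filter misfires on gets caught. Chaining therefore pays off mainly when both filters already have a low false-alarm rate, as Jeff reports here.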

-Jeff

absinthe
Retired Dev


Joined: 06 Oct 2002
Posts: 111
Location: San Francisco, CA, USA

Posted: Fri Jul 04, 2003 2:45 pm

jkcunningham wrote:
Well, for the benefit of anyone who runs across this thread and is evaluating the merits of bogofilter vs spamassassin I have some experimental data to contribute.


This is anecdotal. A badly trained database will give bad results even if you say you trained it correctly. I think your so-called evaluation is a convenient vehicle for you to attempt to redeem yourself in this conversation.

You know, instead of just respectfully letting people make up their own minds.

jkcunningham
l33t


Joined: 28 Apr 2003
Posts: 649
Location: 47.49N 121.79W

Posted: Fri Jul 04, 2003 3:55 pm

What are you talking about? Redeem myself? For what? I'm merely offering some statistical - not anecdotal - information. It's more than you have offered - everything you have said HAS been anecdotal. How can it be "badly trained"? I have a large collection of email sorted into spam and non-spam. I feed bogofilter all of the spam with -s. I feed it all of the non-spam with -n. As far as I've been able to determine, there is no better way to train it; the only way to do worse is to train it insufficiently. What is your problem? Wanted to be a psychoanalyst but couldn't make it?
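To see what "feeding it spam with -s and non-spam with -n" records, here is a toy model of a trained wordlist: per-token counts plus per-class message totals, similar in spirit to the `.MSG_COUNT` line in absinthe's wordlist dumps. The class and method names are hypothetical, and neither the storage nor the probability formula is bogofilter's actual implementation. Real filters use smoothed estimates (e.g. Robinson's), whereas this sketch only clamps the result.

```javascript
// Toy wordlist: register mail as spam (like -s) or non-spam (like -n).
class WordList {
  constructor() {
    this.tokens = new Map();              // token -> { spam, good } message counts
    this.msgCount = { spam: 0, good: 0 }; // messages registered per class
  }

  register(text, isSpam) {
    const kind = isSpam ? 'spam' : 'good';
    this.msgCount[kind]++;
    // Count each distinct token once per message
    const words = new Set(text.toLowerCase().match(/[a-z0-9']+/g) || []);
    for (const word of words) {
      const counts = this.tokens.get(word) || { spam: 0, good: 0 };
      counts[kind]++;
      this.tokens.set(word, counts);
    }
  }

  // Per-token spam probability from relative frequencies, clamped away from 0 and 1
  spamProbability(word) {
    const counts = this.tokens.get(word);
    if (!counts) return 0.5; // never seen: no evidence either way
    const spamFreq = counts.spam / Math.max(this.msgCount.spam, 1);
    const goodFreq = counts.good / Math.max(this.msgCount.good, 1);
    const p = spamFreq / (spamFreq + goodFreq);
    return Math.min(0.99, Math.max(0.01, p));
  }
}

const wl = new WordList();
wl.register('cheap pills now', true);    // like feeding spam with -s
wl.register('lunch meeting now', false); // like feeding ham with -n
console.log(wl.msgCount, wl.spamProbability('pills'), wl.spamProbability('now'));
```

Because probabilities are computed relative to each class's message count, mislabeled mail directly skews the token statistics. That is one reason both posters stress training "correctly".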

I'm not trying to make up anyone's mind for them - I'm only offering information.

absinthe
Retired Dev


Joined: 06 Oct 2002
Posts: 111
Location: San Francisco, CA, USA

Posted: Fri Jul 04, 2003 7:13 pm

jkcunningham wrote:
I'm not trying to make up anyone's mind for them - I'm only offering information.


Seems to me that you're trying to turn this into some kind of contest, or moreover something to defend SpamAssassin at every opportunity you get.

Frankly, I don't trust your results. I use bogofilter professionally (through a mail gateway for >500 users) as well as personally and have much more success with it than other spam filters. That's my own anecdotal experience.

However, the moral of this tale is that everybody has to do their own evaluation. Pardon me if I neither agree with nor accept your analysis as the "final word" on this thread.

jkcunningham
l33t


Joined: 28 Apr 2003
Posts: 649
Location: 47.49N 121.79W

Posted: Sat Jul 05, 2003 4:15 pm

absinthe wrote:
Seems to me that you're trying to turn this into some kind of contest, or moreover something to defend SpamAssassin at every opportunity you get.


Not at all. It is true that I entered this fray defending SpamAssassin from what appeared to me to be a rather heavy-handed slam, but all along my only purpose has been to provide other readers - not you - with possibly useful information. That's supposed to be the purpose of these forums, isn't it?

I have no connection with either one of the apps in question other than as a user. But many - perhaps even most - of the readers of this forum are users, not "IT professionals", so my information may be as relevant or even more so than yours.

But I am also a scientist, and when someone makes a claim that does not agree with my understanding of how something works, I experiment with it until I find out what's going on. That was why I went to the trouble of setting up bogofilter when I already had a stable, functioning spam system that worked well. If something works better, I would switch in a heartbeat. And based on your black and white statements and hyperbole, I was actually expecting bogofilter to significantly outperform spamassassin. Had it done so, I would have said so and thanked you. It did not.

I didn't leave it at that, because I figured that anyone following this thread might benefit from my experience. Mine, at least, is comparative. Yours, for all your appeals to psychology, authority and abuse, is still anecdotal. I don't care if you handle 500,000 users and are president of AOL: if it lets through 3-4 times as much spam, it's still not working as well from the standpoint of the users. You may have very good reasons for preferring it. But other than its purported speed, I don't know what they are.

In the interests of getting at the truth, I will point out a criticism of my own findings: they rely on an ergodic assumption at this point. That is, since my comparison involves two different intervals in time (the period where I used only spamassassin, and the period after I installed bogofilter), if the statistical nature of the spam (or ham) changed significantly between those times, my results could be misleading. Now, I controlled for the steady increase in spam through both intervals, but did nothing to examine or account for any differences in the statistical nature of the spam itself (the 'ham' hasn't changed). But I doubt it is significant, if there at all.

I don't care if you trust my results or not. And as for others, here's a simple experiment you can run on your own systems to decide for yourselves (something I am always in favor of): install them both. Train them both on the same collections of mail (spam,ham). Then run all new mail through them both, side by side, collecting the spams into separate mailboxes. Then wait and watch and count up the numbers. Use what works best.
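The side-by-side experiment boils down to computing two error rates per filter on the same labeled mail. Dividing by the number of spam and ham messages, rather than comparing raw counts, also keeps the comparison fair if spam volume changes during the test, which is the concern raised above about comparing two time periods. The data below is a made-up sample for illustration only, not anyone's real results, and the filter names are placeholders.

```javascript
// Side-by-side evaluation: each message went through both filters and was
// later hand-labeled. Returns false-negative and false-positive rates.
function errorRates(results, filterName) {
  let spam = 0, ham = 0, missed = 0, falseAlarms = 0;
  for (const { isSpam, verdicts } of results) {
    const flagged = verdicts[filterName];
    if (isSpam) {
      spam++;
      if (!flagged) missed++;
    } else {
      ham++;
      if (flagged) falseAlarms++;
    }
  }
  return {
    falseNegativeRate: spam ? missed / spam : 0,
    falsePositiveRate: ham ? falseAlarms / ham : 0,
  };
}

// Made-up sample for illustration only
const results = [
  { isSpam: true,  verdicts: { filterA: true,  filterB: true } },
  { isSpam: true,  verdicts: { filterA: false, filterB: true } },
  { isSpam: true,  verdicts: { filterA: true,  filterB: false } },
  { isSpam: true,  verdicts: { filterA: true,  filterB: true } },
  { isSpam: false, verdicts: { filterA: false, filterB: true } },
  { isSpam: false, verdicts: { filterA: false, filterB: false } },
];
for (const name of ['filterA', 'filterB']) {
  console.log(name, errorRates(results, name));
}
```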

Or - since you already have them both installed - put them in series as I ended up doing, and watch your spam almost completely disappear. Works great.

-Jeff

absinthe
Retired Dev


Joined: 06 Oct 2002
Posts: 111
Location: San Francisco, CA, USA

Posted: Sat Jul 05, 2003 6:23 pm

jkcunningham wrote:
I didn't leave it at that, because I figured that anyone following this thread might benefit from my experience. Mine, at least, is comparative. Yours, for all your appeals to psychology, authority and abuse, is still anecdotal.

It's amusing how you (on one hand) try to insult me, and then (on the other) try to project the appearance of a reasonable "scientist", all in the same message. Any halfwit with a computer can jump on an internet forum and call themselves whatever they want.

Maybe you are a scientist; it doesn't matter. What matters is your content, not your job title. You don't need to know the job titles I've had the last 14 years in IT, and I don't need to know yours.
jkcunningham wrote:
I don't care if you handle 500,000 users and are president of AOL: if it lets through 3-4 times as much spam, it's still not working as well from the standpoint of the users. You may have very good reasons for preferring it. But other than its purported speed, I don't know what they are.

  • I used SpamAssassin for well over a year in the same environments I now use Bogofilter instead. For a fair representation, I have run both in a personal environment and a professional one, so that there are two different sets of requirements to judge both by.
  • After some time, I get fewer false positives, and particularly fewer false negatives, using Bogofilter than I did in more than a year of using SpamAssassin.
  • Timeliness of results depends on how much spam you actually get. I get tons, so my database fills rather quickly. For users who don't get as much spam, it could take 30 days or more of ham and spam sorting before the false negatives drop off entirely.

    I have not seen any false negatives since April. My personal mailboxes (aggregated) average a few hundred spams per day. My personal spam database is rather small...
    Code:

                           spam   good
    .MSG_COUNT              574   2556

    As you can see, there is roughly a 4.5:1 (good:bad) ratio. I fed it an initial load of ham and have not given it any more ham since. After that, I've only fed it the occasional spam that gets through, which on my personal mail has not happened since April.

    Professionally, the spam database is larger, but not significantly (the spam:user count is much lower). We had a mix of false negatives and positives in the first 3 months, but since then the spam problem has gone away almost entirely. I'd say 5-6 emails per week, across more than 500 users, slip through as spam. We no longer get any false negatives, and the system is incredibly efficient at processing a large volume of data.
    Code:

                           spam   good
    .MSG_COUNT             5413    228

    As you can see, the ham:spam ratio is reversed in a corporate environment (roughly 24 spams for every good message). But Bogofilter does the job either way.
  • SpamAssassin will work better in the short term. This is not anecdotal, unless you happen to have saved all of your spam for the last 3-4 months and have it ready to feed Bogofilter when you initialize the database.
  • Bogofilter works better in the long term. Sure, this is anecdotal (what isn't?), but humor me. It is significantly faster (an order of magnitude) and scales to meet the needs of a mail gateway, which SpamAssassin absolutely cannot do. Bogofilter produces fewer false positives, which SpamAssassin tends to produce with Bayes turned on or off. If it is trained correctly, Bogofilter produces fewer false negatives over time.
  • Bogofilter has the same weakness that all Bayes engines do: bad data in the database will skew your results. It's like water in the gas tank. I have heard various theories about the right ham:spam ratio for the best results, but I don't think any of them has a basis in fact. It's simple GIGO (garbage in, garbage out). People who are careless when populating the database will get bad results and shouldn't be using a Bayesian filter.
  • While it is true that SpamAssassin has a Bayes filter, and it works fine, it's not SpamAssassin's core strength and it doesn't work as well as Bogofilter.
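The "water in the gas tank" point can be seen in the per-token arithmetic that Bayesian filters rely on. The sketch below uses Gary Robinson's smoothed token probability, f(w) = (s*x + n*p(w)) / (s + n), as an illustration; it is not Bogofilter's code, it shows only the per-token step (not how tokens are combined into a message score), and the values of `s` and `x` plus the `token_spamicity` name and the counts are hypothetical:

```python
def token_spamicity(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed probability that a message containing one token is spam.

    spam_hits / ham_hits -- spam / good training messages containing the token
    n_spam / n_ham       -- total spam / good training messages (must be > 0)
    s (prior strength) and x (prior guess) are illustrative, not tuned values.
    """
    b = spam_hits / n_spam  # fraction of spam messages containing the token
    g = ham_hits / n_ham    # fraction of good messages containing the token
    n = spam_hits + ham_hits
    if n == 0:
        return x  # never seen in training: fall back to the prior
    p = b / (b + g)
    return (s * x + n * p) / (s + n)


# A token such as "meeting" in a hypothetical, carefully sorted database:
# found in 1 of 100 spams and 20 of 100 good messages.
clean = token_spamicity(spam_hits=1, ham_hits=20, n_spam=100, n_ham=100)

# Same mail, but 10 of those 20 good messages were carelessly filed as spam.
dirty = token_spamicity(spam_hits=11, ham_hits=10, n_spam=110, n_ham=90)

print(round(clean, 3), round(dirty, 3))  # 0.068 0.475
```

Ten mislabeled messages move this token from a strong sign of good mail to roughly a coin flip, so every future message containing it loses that evidence.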

Anyway.

The purpose of this thread, among other things, was to point out the tools available to users. There is a subset of SpamAssassin users who want it to be everyone's default choice. Depending on what your needs are, there are better alternatives.

My conclusions:
  1. SpamAssassin is a decent product. It suits people who want to drop in a spam filter and not think about it too much. However, it's a resource hog, and not very effective on a large, varied volume of spam. You can use the Bayes engine in SpamAssassin, but if you are going to the effort of using a Bayesian filter, you should just use Bogofilter instead.

    That is:

    • Switch to Bogofilter entirely (probably the best), or
    • Put bogofilter behind SpamAssassin for 30-60 days so it can learn the basics. Then put bogofilter in front of SpamAssassin, so that SpamAssassin doesn't need to burn CPU on messages bogofilter can identify more efficiently as spam.
  2. Among Bayesian engines, Bogofilter produces the best results and is probably the most efficient.
  3. Bogofilter improves with time and quality data. If you teach Bogofilter when it makes a mistake, it will learn and not make the same mistake again. As time goes on, the probability of Bogofilter making another mistake approaches zero.
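The second option in point 1 is a two-stage cascade: the cheap filter runs first, and only mail it doesn't flag is handed to the expensive one. A minimal Python sketch, assuming each filter has been wrapped as a function that returns True for spam; `cascade`, `cheap` and `expensive` are hypothetical names, and the keyword checks are toys standing in for the real tools:

```python
from typing import Callable, Iterable, List, Tuple


def cascade(messages: Iterable[str],
            first: Callable[[str], bool],
            second: Callable[[str], bool]) -> Tuple[List[str], List[str], int]:
    """Two-stage spam filtering: only mail that `first` passes reaches `second`.

    Each filter returns True for spam. Returns (spam, good, second_calls),
    where second_calls counts how often the second (expensive) filter ran.
    """
    spam: List[str] = []
    good: List[str] = []
    second_calls = 0
    for msg in messages:
        if first(msg):
            spam.append(msg)  # caught early; the second filter never sees it
            continue
        second_calls += 1
        (spam if second(msg) else good).append(msg)
    return spam, good, second_calls


# Hypothetical keyword "filters" standing in for the real tools.
def cheap(msg: str) -> bool:  # plays the role of a fast Bayesian filter
    return "viagra" in msg


def expensive(msg: str) -> bool:  # plays the role of a slower rule-based filter
    return "viagra" in msg or "free" in msg


inbox = ["cheap viagra", "free money", "lunch at noon", "viagra free", "meeting notes"]
spam, good, calls = cascade(inbox, first=cheap, second=expensive)
print(spam)   # ['cheap viagra', 'free money', 'viagra free']
print(good)   # ['lunch at noon', 'meeting notes']
print(calls)  # 3, instead of 5 if the expensive filter saw every message
```

Calling `cascade(inbox, first=expensive, second=cheap)` gives the "training wheels" order from the first phase; in that phase you would also feed mistakes back to Bogofilter as training data, which this sketch leaves out.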


Final thought: Anybody can evaluate SpamAssassin in one day. Most will come to the same "works, but slow" conclusion. Bogofilter needs time and care to be evaluated properly, as any purely bayesian filter does. The amount of time really depends on the user, the volume of spam, etc.

Eventually Bogofilter should match SpamAssassin's "day one" results and surpass them. But it requires time. Many people use SpamAssassin as the "training wheels"; once they switch off, I don't know one person who has gone back. The long-term results are amazing.

You can get similar long-term results if you use SpamAssassin with the bayes stuff turned on, sure. But why would you choose that when a much lighter, faster and more effective bayes solution exists?
jkcunningham wrote:
Use what works best.

Certainly we find agreement here.
funkmankey
Guru

Joined: 06 Mar 2003
Posts: 304
Location: CH

Posted: Sun Jul 20, 2003 12:06 am    Post subject: two more ways

1. bubblegum proxypot
2. jackpot

jackpot is java-based but only recommended to run on windoze. played with it back when I used to run the beast, but never long enough for success.

proxypot is a perl script. have been trying it recently and it has been highly amusing...
absinthe
Retired Dev

Joined: 06 Oct 2002
Posts: 111
Location: San Francisco, CA, USA

Posted: Sun Jul 20, 2003 1:05 am    Post subject: Re: two more ways

funkmankey wrote:
1. bubblegum proxypot
2. jackpot

jackpot is java-based but only recommended to run on windoze. played with it back when I used to run the beast, but never long enough for success.

proxypot is a perl script. have been trying it recently and it has been highly amusing...


These are good suggestions. I have not used them myself but I know others who have. Along those same lines, I have run honeypots and tarpits, etc... and it is fun.
Senso
Apprentice

Joined: 17 Jun 2003
Posts: 250
Location: Montreal, Quebec

Posted: Mon Dec 01, 2003 1:02 pm

jkcunningham wrote:
Sure they can - but most of them don't, I'll warrant. Most don't even know about it, because their primary mark is a Windows client. But start throttling their pipeline and they're going to figure it out damn fast, I guarantee it. Every one of them.


Actually, statistics show that 43% of spammers use Linux. And only 0.7% use Windows.

I use Proxypot to fool spammers into thinking I'm an open relaying proxy. I've blocked thousands of spam emails so far.
All times are GMT