K9 efficiency against SPAM

As you probably already know, I am also the administrator of a web site that fights SPAM (SpamAnti.net). On this site, I have been actively promoting a statistical mail-filtering tool that sorts SPAM out of your mailbox: K9, by Robin Keir.

But many of us who have tested some of the available tools (some of which sell at very high prices) quickly found that their efficiency is often quite poor, sometimes surprisingly bad. So, how is K9 doing? Very well.

How does it work?

We must first understand how such a tool operates. It sits between the mail server of your Internet Service Provider (ISP) and your email reader. It runs on your PC and requires a small change to the configuration of your mail reader; after that, it works silently without help.

Or nearly without help: during the first days of use, you have to show it which email messages are good and which are SPAM (it has some a priori knowledge, but it is quite crude). Then it draws its own conclusions about the criteria to use to clean up your mailbox.
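To give an idea of what "showing it the good and the bad messages" means for a statistical filter, here is a minimal sketch of a word-counting (naive Bayes style) classifier. It is only an illustration: the `ToyFilter` class, its method names and the sample messages are all made up for this example, and K9's real internals are certainly more refined.

```python
import math
import re
from collections import Counter


class ToyFilter:
    """Hypothetical toy filter; NOT K9's actual algorithm."""

    def __init__(self):
        # Number of training messages of each class that contain each word.
        self.counts = {"good": Counter(), "spam": Counter()}
        # Number of training messages seen for each class.
        self.totals = {"good": 0, "spam": 0}

    @staticmethod
    def tokens(text):
        # Distinct lowercase words of the message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        """The user tells the filter that `text` is 'good' or 'spam'."""
        self.totals[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        """Return a value between 0.0 (good) and 1.0 (spam)."""
        # Start from the smoothed ratio of spam to good messages seen so far.
        log_odds = math.log((self.totals["spam"] + 1) / (self.totals["good"] + 1))
        for tok in self.tokens(text):
            # Smoothed share of spam / good messages containing this word.
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_good = (self.counts["good"][tok] + 1) / (self.totals["good"] + 2)
            log_odds += math.log(p_spam / p_good)
        # Clamp to avoid overflow, then convert log-odds to a probability.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


f = ToyFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("meeting notes for monday", "good")
f.train("lunch on monday?", "good")

print(f.spam_probability("cheap pills") > 0.5)     # scored as SPAM
print(f.spam_probability("monday meeting") < 0.5)  # scored as Good
```

With only four training messages the scores are crude, which matches the experience above: the filter needs a few days of corrections before its conclusions become reliable.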

Efficiency

To be precise, K9 includes its own statistics collection and reporting, which makes its efficiency level easy to see. You have it all in front of your eyes.

There are two important measurements to make:

efficiency during continuous operation
efficiency during the training phase (normally, it should be lower)

For continuous operation, I have had nothing but compliments and kudos to give so far: after months of electronic mail (more than 100,000 emails in my mailboxes), the result speaks for itself: at least 99.95%, even though I receive SPAM from absolutely all kinds of origins.

The question was still open for the training/learning phase (I had not measured it before). But I recently rebuilt part of my PC email architecture and had to completely re-install K9, so I could quietly observe a full training run again. And the good news is that… out of 2994 emails (in a day… Boy, am I spammed!), I had to re-sort only 9 messages re-classified as “Good” and 5 messages re-classified as “Bad” (or SPAM). This gives a nice 99.53% efficiency, and the training phase was visibly finished well before 1500 received messages.
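For those who want to check the arithmetic, the efficiency figure is simply the share of messages that did not need a manual correction. The small function below is my own illustration (the name `efficiency` is not something K9 provides):

```python
def efficiency(total_messages, corrections):
    """Percentage of messages that were classified correctly."""
    return 100.0 * (total_messages - corrections) / total_messages


# 9 messages moved back to "Good" plus 5 moved to "Bad" during training.
corrections = 9 + 5
print(f"{efficiency(2994, corrections):.2f}%")  # prints 99.53%
```

By the same formula, 99.95% over 100,000 messages corresponds to roughly 50 wrongly classified emails.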

Of course, I would not advise closing your eyes completely afterwards. But K9 provides an easy way to go back in time and identify wrongly classified messages (this is always a possibility): it sorts messages according to their probability of being SPAM (0% is Good, 100% is Bad, or SPAM). You just have to look at messages near the 50% cut-off. Wrongly classified messages tend to cluster around that limit and are very visible there.

In the image above, K9 clearly shows the legitimate message (in blue, with a 2.5% probability of being SPAM) and the SPAM messages (in black, with probabilities much higher than 50%).
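The "look near the cut-off" habit can be sketched in a few lines. This is only an illustration of the idea, not something K9 exposes: the function name `needs_review`, the 15-point `margin` and the sample subjects and scores are all assumptions of mine.

```python
def needs_review(scored, cutoff=0.5, margin=0.15):
    """Return the messages whose SPAM probability is close to the cut-off.

    `scored` is a list of (subject, probability) pairs, with probability
    between 0.0 (Good) and 1.0 (SPAM). The closest to the cut-off come first.
    """
    near = [(subject, p) for subject, p in scored if abs(p - cutoff) <= margin]
    return sorted(near, key=lambda item: abs(item[1] - cutoff))


# Made-up scores, for illustration only.
mailbox = [
    ("Invoice for March", 0.025),
    ("Weekly newsletter", 0.46),
    ("WIN A PRIZE", 0.99),
    ("Re: your offer", 0.58),
]

for subject, p in needs_review(mailbox):
    print(f"{p:.0%}  {subject}")
```

Only the two borderline messages come out; the clear-cut ones at both ends of the scale can safely be left alone.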

For non-English speakers

It may be interesting to know that, although the web site of the author, Robin Keir, is in English and I use the default English version, there are extensions that can be used to translate it into various European languages.

And it is FREE! As in free beer.
