Recently bogofilter's spam sensing abilities seem to have gone all wrong in evolution.
I was getting way too much spam in the inbox (maybe 10% of my spam wasn't getting detected), and even though I was highlighting it and marking it as spam, the same sorts of messages kept appearing.
So I:
1. Stopped evolution getting mail (using the offline icon).
2. Deleted all my spam and emptied the trash.
3. Closed evolution (and ran evolution --force-shutdown).
4. Moved ~/.bogofilter to ~/.bogofilter.orig.
5. Restarted evolution and went back online.
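The database move in step 4 could be scripted roughly like this. It is only a sketch: the function name reset_bogofilter_db is made up for illustration, and it assumes evolution has already been taken offline and shut down.

```shell
#!/bin/sh
# Move bogofilter's database directory aside so training starts from scratch.
# reset_bogofilter_db is a hypothetical helper name, not part of bogofilter.
reset_bogofilter_db() {
    db_dir="${1:-$HOME/.bogofilter}"
    backup="${db_dir}.orig"
    if [ ! -d "$db_dir" ]; then
        echo "no database directory at $db_dir" >&2
        return 1
    fi
    # Refuse to clobber an earlier backup.
    if [ -e "$backup" ]; then
        echo "backup $backup already exists; not overwriting" >&2
        return 1
    fi
    mv "$db_dir" "$backup"
}
```

Typical use would be to run evolution --force-shutdown first and then call reset_bogofilter_db with no arguments; moving the .orig directory back restores the old training.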
For a while, bogofilter seemed to be doing a better job filtering spam (much less coming through to the inbox).
However, having a look through the spam, I notice that all my fedora-(test|devel)-list email and a whole bunch of other email is getting filtered as spam. I highlight these and mark them as not spam.
Now I'm back where I started.
When I first started using bogofilter it needed a little training (including having put non-spam in the spam box), but after a couple of days it settled and was quite good (maybe 1% of spam got into my inbox and very rarely a message I wanted went to spam), so I'm not sure what's gone wrong.
Are others noticing the same issues?
I prefer bogofilter over spamassassin as the latter takes forever to filter through email, especially when you've been on holidays for a week and have to pull a couple of thousand messages.
R.
On Thu, 2009-06-04 at 20:01 +1000, Rodd Clarkson wrote:
<snip>
Experienced everything you did, including my Fedora messages being marked as spam. It just doesn't seem that bogofilter and evo are working together like they did in F10.
I thought I was the only one experiencing this.
On Thu, 2009-06-04 at 05:05 -0500, Mike Chambers wrote:
<snip>
Experienced everything you did, including my Fedora messages being marked as spam. It just doesn't seem that bogofilter and evo are working together like they did in F10.
I thought I was the only one experiencing this.
Filed as: https://bugzilla.redhat.com/show_bug.cgi?id=504112
R.
On Thursday 04 June 2009 11:26:11 Rodd Clarkson wrote:
<snip>
Filed as: https://bugzilla.redhat.com/show_bug.cgi?id=504112
Spammers are getting a lot more clever/careful these days, using words that won't be detected. I've found that I have to collect spam and 'unsure' ham into folders until I get a reasonable number, then every few days I run
bash /usr/share/bogofilter/contrib/contrib/trainbogo.sh -c -H /home/anne/Maildir/.INBOX.bogotrain_ham/cur/ -S /home/anne/Maildir/.INBOX.bogotrain_spam/cur/
(watch for line-wrap - it's all one line), repeating until the missed spam is down to about 3. I then delete all the tested messages and collect the next batch. I'm still seeing a number of unsures, but bogofilter is definitely learning the new stuff.
If you are seeing ham messages being detected as spam, copy a large number of similar messages, for instance mailing-list messages, into your ham testing folder before the run. Doing this a few times should sort out any mis-training already there. HTH
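For anyone who wants to see what a training pass boils down to, here is a minimal sketch that registers every message in a folder with bogofilter directly. It is not the contrib script: it simply registers each message once, using bogofilter's -n (register as ham) and -s (register as spam) options. The function name train_dir and the BOGOFILTER override variable are made up for illustration.

```shell
#!/bin/sh
# BOGOFILTER is an illustrative override so the command can be swapped for a dry run.
BOGOFILTER="${BOGOFILTER:-bogofilter}"

# Register every regular file in a folder and print how many were processed.
#   $1: -n to register as ham, -s to register as spam
#   $2: folder holding one message per file (e.g. a Maildir cur/ directory)
train_dir() {
    flag="$1"
    dir="$2"
    count=0
    for msg in "$dir"/*; do
        # Skip the unexpanded glob (empty folder) and anything that is not a file.
        [ -f "$msg" ] || continue
        "$BOGOFILTER" "$flag" < "$msg" || return 1
        count=$((count + 1))
    done
    echo "$count"
}
```

With the folders from the command above, that would be called as train_dir -n /home/anne/Maildir/.INBOX.bogotrain_ham/cur and train_dir -s /home/anne/Maildir/.INBOX.bogotrain_spam/cur.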
Anne
On Thu, 2009-06-04 at 12:13 +0100, Anne Wilson wrote:
<snip>
Thanks Anne,
Sadly, I'm not feeling like being manual about this, and I guess that I just expect my mail client to work well with the spam software and do it for me. After all, my mail client has a great collection of ham and spam, so if I can do something like it manually, then surely it can't be hard for the spam software to do it without me having to think about it.
bogofilter used to work well, and I'm hoping that it can once again be the great spam filter it was, fast and accurate.
R.
On Thursday 04 June 2009 12:53:25 Rodd Clarkson wrote:
Sadly, I'm not feeling like being manual about this, and I guess that I just expect my mail client to work well with the spam software and do it for me.
In the end, it's your decision. But just stop for a moment and think about what you wrote. Is it reasonable to expect any software to learn about new spam methods, which change almost on a daily basis, without any help from you? My training takes me about 3 minutes per week, if that, and in return I get greater accuracy. A good return on my effort, IMO.
After all, my mail client has a great collection of ham and spam
How does it know which is which, if you never tell it? That's all the training command does. It says "when you see one roughly like this, it's ham" and vice versa. You said that there are a few false-positives, so it needs to be told about those, for instance.
so if I can do something like it manually, then surely it can't be hard for the spam software to do it without me having to think about it.
bogofilter used to work well, and I'm hoping that it can once again be the great spam filter it was, fast and accurate.
Unless you can persuade all spam creators to stop here and now, it will not happen without your help. You need to see this as a matter of self-preservation.
Anne
On Thu, 2009-06-04 at 13:15 +0100, Anne Wilson wrote:
On Thursday 04 June 2009 12:53:25 Rodd Clarkson wrote:
Sadly, I'm not feeling like being manual about this, and I guess that I just expect my mail client to work well with the spam software and do it for me.
In the end, it's your decision. But just stop for a moment and think about what you wrote. Is it reasonable to expect any software to learn about new spam methods, which change almost on a daily basis, without any help from you? My training takes me about 3 minutes per week, if that, and in return I get greater accuracy. A good return on my effort, IMO.
It's fairly easy to automate, anyway - just write a cron job that iterates over the contents of your spam folders and your not spam folders every so often, running the appropriate training command.
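A rough sketch of such a cron job follows. The function names, the stamp-file scheme and the BOGOFILTER override variable are all assumptions made for illustration. The stamp file is there so each run only registers messages that arrived since the previous run, because registering the same message repeatedly would count its words more than once.

```shell
#!/bin/sh
# BOGOFILTER is an illustrative override so the command can be swapped for a dry run.
BOGOFILTER="${BOGOFILTER:-bogofilter}"

# Register messages in folder $2 that are newer than stamp file $3.
# $1 is -n (register as ham) or -s (register as spam).
train_new() {
    flag="$1"; dir="$2"; stamp="$3"
    [ -d "$dir" ] || return 0
    if [ -f "$stamp" ]; then
        find "$dir" -type f -newer "$stamp"
    else
        # First run: no stamp yet, so take everything.
        find "$dir" -type f
    fi | while IFS= read -r msg; do
        "$BOGOFILTER" "$flag" < "$msg"
    done
}

# One training pass over a ham folder and a spam folder.
run_training() {
    ham_dir="$1"; spam_dir="$2"; state_dir="$3"
    mkdir -p "$state_dir"
    # Record the start time first; a message that arrives while the
    # pass is running may be picked up again on the next pass.
    touch "$state_dir/stamp.next"
    train_new -n "$ham_dir" "$state_dir/stamp"
    train_new -s "$spam_dir" "$state_dir/stamp"
    mv "$state_dir/stamp.next" "$state_dir/stamp"
}
```

To use it from cron, the script would end with a call to run_training naming your own ham folder, spam folder and a state directory, and the crontab entry would run it every few hours or once a day.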
On 06/04/2009 07:53 AM, Rodd Clarkson wrote:
Sadly, I'm not feeling like being manual about this, and I guess that I just expect my mail client to work well with the spam software and do it for me. After all, my mail client has a great collection of ham and spam, so if I can do something like it manually, then surely it can't be hard for the spam software to do it without me having to think about it.
bogofilter used to work well, and I'm hoping that it can once again be the great spam filter it was, fast and accurate.
I can't speak specifically to the integration of bogofilter into evolution, since I use a completely home-grown bogofilter integration, which, as shown here http://stuff.mit.edu/%7Ejik/#spam, successfully blocks thousands of spam messages and viruses per day.
However, I do want to reiterate what Anne said. She's right that the spammers are getting smarter. They're /always/ getting smarter -- it's a constant battle for the anti-spammers to keep up with the new ideas that the spammers come up with. Therefore, what worked well enough is no longer good enough.
For bogofilter to be most effective, here's what needs to happen:
1. Incoming email needs to be divided into three categories -- ham, spam, and unsure -- not just into ham and spam.
2. Ham and spam messages need to be added to the bogofilter database automatically after they are categorized.
3. Unsure messages need to be categorized by the user and then added to the bogofilter database as either ham or spam, depending on the user's categorization.
4. Incorrectly classified messages need to be reclassified when they are detected, e.g., a spam message incorrectly classified as ham needs to first be removed from the database as ham and then added to the database as spam.
5. Bogofilter needs to be tuned periodically using a large collection of known-ham and known-spam messages.
6. The bogofilter database needs to be pruned periodically, i.e., words that haven't been seen in any incoming email in a while (I personally use 180 days as my threshold) need to be removed, preferably before tuning.
All of these are important, but the first four are by far the most important. If the evolution integration doesn't use tristate classification, or if it doesn't make it easy for you to identify and classify unsure messages and reclassify incorrectly classified ones, then it is inevitable that over time, bogofilter's ability to detect spam will degrade.
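As a rough illustration of items 1 and 4, the sketch below relies on bogofilter's exit status (0 for spam, 1 for ham, 2 for unsure, anything else an error) and on its -Ns and -Sn options, which undo one registration and apply the other in a single step. The function names and the BOGOFILTER override variable are made up for illustration, and whether you ever see "unsure" depends on a ham cutoff being set in your bogofilter configuration.

```shell
#!/bin/sh
# BOGOFILTER is an illustrative override so the command can be swapped for a dry run.
BOGOFILTER="${BOGOFILTER:-bogofilter}"

# Print "spam", "ham" or "unsure" for the message file $1.
classify() {
    rc=0
    "$BOGOFILTER" < "$1" > /dev/null || rc=$?
    case "$rc" in
        0) echo spam ;;
        1) echo ham ;;
        2) echo unsure ;;
        *) echo error; return 1 ;;
    esac
}

# Correct a wrong registration.
#   $1: the class the message really belongs to (spam or ham)
#   $2: the message file
reclassify() {
    case "$1" in
        spam) "$BOGOFILTER" -Ns < "$2" ;;  # unregister as ham, register as spam
        ham)  "$BOGOFILTER" -Sn < "$2" ;;  # unregister as spam, register as ham
        *)    echo "usage: reclassify spam|ham FILE" >&2; return 2 ;;
    esac
}
```

A delivery script could file each message by the word classify prints, and bogofilter's -u option can be added at classification time to cover item 2 by registering the message as whatever it was judged to be.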
jik
On Thursday 04 June 2009 13:21:19 Jonathan Kamens wrote:
<snip>
For those who run their own IMAP server and use procmail, the following explains how to set up bogofilter to work within procmail, automatically filtering into ham, spam and unsure folders. It makes life simple :-)
http://userbase.kde.org/KMail/FAQs_Hints_and_Tips#Spam_filtering_on_an_IMAP_...
Anne