I’m currently trying to get the Bayesian filter included in NethServer’s mail server to work. Here are some notes and questions:
First of all, it should be written somewhere in the doc that SpamAssassin’s Bayes database has to learn at least 200 spams AND 200 hams before it starts filtering.
To check the status of the Bayes database, log in as amavis (su -- amavis), then run spamassassin -D --lint and look for Bayes-related entries:
Dec 9 14:07:37.639 [30801] dbg: bayes: found bayes db version 3
Dec 9 14:07:37.640 [30801] dbg: bayes: DB journal sync: last sync: 0
Dec 9 14:07:37.640 [30801] dbg: bayes: not available for scanning, only 4 ham(s) in bayes DB < 200
Dec 9 14:07:37.640 [30801] dbg: bayes: untie-ing
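The same counts can also be read without the debug output, as a rough sketch: `sa-learn --dump magic` prints the number of learned spams and hams on its `nspam` and `nham` lines, so a small helper can tell whether both have reached 200. The function name `bayes_ready` is made up for illustration, and it assumes the usual layout of the dump output, with the counter in the third column.

```shell
#!/bin/sh
# Hypothetical helper: reads `sa-learn --dump magic` output on stdin,
# prints both counters, and succeeds only if each reached the
# threshold (default 200). Assumes the counter is the third column.
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam >= min && nham >= min)
        }'
}

# Usage (as the amavis user):
#   sa-learn --dump magic | bayes_ready && echo "Bayes is active"
```

With only 4 hams learned, as in the log above, the helper would exit non-zero.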
Training for spam is not too difficult: just mark 200 spams using the “mark as spam” function of your mail client (copying a mail into the junk folder is not possible). Making the spam filter learn 200 hams looks more difficult, since (according to the documentation) the only way to mark a mail as ham is to move it out of the junk folder. I seriously doubt that any end user will accept 200 hams landing in their spam box without yelling at the sysadmin.
One could try to copy 200 hams (from the inbox) to the spam folder and then move them back to the inbox, but that’s counterintuitive, and I’m not sure it would work anyway (mail learned as spam, then as ham).
My proposal: configure the INBOX folder as ham, plain and simple. That’s how Fastmail does it. They even provide a way for the user to decide which folders should be learned as spam or not spam.
First I ran the learning on the whole INBOX folder. It took a long time, and then I noticed timeouts in maillog whenever Postfix received mail.
Therefore I simply took the last 200 mails from my INBOX, copied them into a HAM folder and let SpamAssassin do its thing on this folder. It works perfectly.
The results are immediately noticeable, the filtering looks much more effective.
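As a rough sketch of that approach (not necessarily the exact commands used here): pick the most recent messages from a Maildir, copy them to a staging directory, and feed that directory to `sa-learn --ham`, which accepts a directory of messages. The helper name `newest_mails`, the staging directory and the Maildir path are placeholders. The helper relies on GNU find and assumes file names contain no tabs or newlines.

```shell
#!/bin/sh
# Hypothetical helper: print the paths of the N most recently modified
# files in a directory (GNU find; assumes no tabs/newlines in names).
newest_mails() {
    dir=$1; count=${2:-200}
    find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\n' \
        | sort -rn | head -n "$count" | cut -f2-
}

# Usage sketch (paths are placeholders; run as the user that owns the
# Bayes database, amavis in the setup described above):
#   mkdir -p /tmp/ham-sample
#   newest_mails /path/to/Maildir/cur 200 | while IFS= read -r f; do
#       cp -- "$f" /tmp/ham-sample/
#   done
#   sa-learn --ham /tmp/ham-sample
```

Learning from a small copy rather than the live INBOX keeps the run short, which is presumably what avoided the timeouts.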
I believe there should be a “Bayesian filter” section in the mail filter tab of the GUI that helps users do this without too much hassle.
Please forgive my laziness… Is the Bayesian filter, and how to use it, clearly described in the documentation?
Maybe some procedures for “enable spam classification” from Webtop could be useful…
As a matter of fact, the prerequisites (200 spams AND 200 hams) aren’t written anywhere.
Marking 200 hams must be done from the command line. I’ll update the documentation anyway.
Only in the junk folder. Users have to manually mark 200 messages as ham. That means that in practice they have to mark 200 mails as spam and then mark them back as not spam. It is counterintuitive. But I updated the documentation.
Well, the command rspamc learn_ham /var/lib/nethserver/maildir/user@domain/Maildir/cur
should be added to the documentation as soon as possible, to help with mail domain migration…
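A sketch of how that command might be applied to several mailboxes during a migration. The loop, the helper name `ham_dirs` and the assumption that every mailbox follows the `<base>/<user@domain>/Maildir/cur` layout are extrapolated from the single path above. `rspamc learn_ham` and `rspamc stat` are the only rspamc commands used.

```shell
#!/bin/sh
# Hypothetical helper: list each mailbox's cur/ directory under a base
# path laid out like <base>/<user@domain>/Maildir/cur (layout assumed).
ham_dirs() {
    for d in "$1"/*/Maildir/cur; do
        [ -d "$d" ] && printf '%s\n' "$d"
    done
    return 0
}

# Usage sketch (review the list first, and only learn from mailboxes
# you trust to contain no spam):
#   ham_dirs /var/lib/nethserver/maildir | while IFS= read -r d; do
#       rspamc learn_ham "$d"
#   done
#   rspamc stat    # reports how many messages each statfile has learned
```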
Maybe a “Best Practice” page or a “checklist for a nice migration” could be the right way to summarize how to migrate or create a good mail server.
Unfortunately my English is not sleek and catchy enough to be concise, clear and effective.
This should include (IMVHO):
Data collection, to gather all the information needed
Steps to install, add modules, and configure data, users and aliases
Steps to migrate mailboxes and preferences (import from files, POP3/IMAP collection, IMAP transfer via client)
Steps to protect the server (antivirus check, fail2ban setup, password setup)
Steps to open the server to the real world (DNS, MX record, port forwarding)
Steps to improve reputation so as not to be marked as spam (TLS certificate, SPF, DKIM, DMARC)
Steps to improve spam detection
This should be a “scenario” detailed enough to give conscientious and sufficiently skilled sysadmins (even junior ones) the full toolbox to adapt it to their environment.
It should also be concise enough to fit on four printed A4 sheets (plus one for data collection) to kickstart any installation.
(The document should be independent of groupware or webmail, but should also suggest looking at both before choosing one direction or another.)