SPAM training threshold

rolf · December 28, 2018, 11:51am

Hi all,

I use rspamd for spam filtering, but I’m not happy with how well it works.
Before rspamd I never had any false positives and only rarely false negatives, but since switching to rspamd both happen very often. (False positives are the worst, since my (family) users don’t check their spam folder very often.)

Of course, I try to teach rspamd by moving mail to and from the spam folder.
In the docs I read:

The Bayesian tests are not active until it has received enough information. This includes a minimum of 200 spams AND 200 hams (false positives).
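As a minimal, purely illustrative Python sketch, the rule quoted above is a two-sided gate: both counts must reach the minimum, so plenty of ham cannot make up for too little spam. The 200 value comes from the quoted docs; as far as I know rspamd's Bayes classifier exposes it as a `min_learns` setting, but treat that name as something to verify in your own configuration. `bayes_is_active` is a hypothetical helper, not rspamd code.

```python
# Hypothetical sketch of the activation rule quoted from the docs:
# the Bayes classifier stays inactive until BOTH classes have enough learns.
MIN_LEARNS = 200  # "a minimum of 200 spams AND 200 hams" (quoted docs)


def bayes_is_active(learned_spam: int, learned_ham: int,
                    min_learns: int = MIN_LEARNS) -> bool:
    """Return True only when spam AND ham have each reached min_learns."""
    return learned_spam >= min_learns and learned_ham >= min_learns


if __name__ == "__main__":
    print(bayes_is_active(250, 300))  # both classes trained enough
    print(bayes_is_active(150, 300))  # too few spam learns, still inactive
```

Using `>=` follows the docs' wording "a minimum of 200": exactly 200 learns per class is enough.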

The rspamd web interface shows me a counter of learned mails, but doesn’t break it down into spam and ham. Is there a way to check whether I already meet the 200+200 requirement?

And will the learning make a big difference? Or is it just a very minor adjustment?

Thanks,
Rolf

filippo_carletti · December 28, 2018, 2:03pm

The common feedback is the opposite: rspamd works quite well.
Anyway, Bayesian filters adapt to your own mail, so training may prove really helpful.
You can use rspamc stat to see how many spam and ham messages have been learned (see the Statfile lines at the end of its output).
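If you want to pull those numbers out programmatically, a small Python sketch like this one could parse the Statfile lines of `rspamc stat`. It assumes the line format shown later in this thread (`Statfile: BAYES_SPAM ... learned: 128; ...`); `learned_per_statfile` is a hypothetical helper, not part of rspamd, and other rspamd versions may format the output differently.

```python
import re

# Matches "Statfile:" lines as printed by `rspamc stat`, e.g.
# "Statfile: BAYES_SPAM type: redis; ...; learned: 128; users: 1; ..."
# (format assumed from the output posted in this thread)
STATFILE_RE = re.compile(r"^Statfile:\s*(\S+)\s.*?\blearned:\s*(\d+)")


def learned_per_statfile(stat_output: str) -> dict[str, int]:
    """Map each statfile symbol (e.g. BAYES_SPAM) to its learned count."""
    counts: dict[str, int] = {}
    for line in stat_output.splitlines():
        match = STATFILE_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts
```

On the server you could feed it `subprocess.run(["rspamc", "stat"], capture_output=True, text=True).stdout`. Other lines such as "Messages learned" are ignored, since only the per-class Statfile counts matter for the 200+200 rule.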

You could also look at the headers of misclassified emails to see whether some rules fire frequently.
Finally, though not the easiest tool, rspamd_stats </var/log/maillog could help you understand what’s happening.

stephdl · December 28, 2018, 2:57pm

I tend to agree, but I could be biased. What are your settings, please?

config show rspamd

rolf · December 28, 2018, 9:52pm

I haven’t changed much from the defaults:

root@helium:~> $ config show rspamd
rspamd=service
BlockAttachmentClassList=Exec
BlockAttachmentCustomList=doc,odt
BlockAttachmentCustomStatus=disabled
BlockAttachmentStatus=enabled
Password=
RecipientWhiteList=
SenderBlackList=
SenderWhiteList=
SpamCheckStatus=enabled
SpamGreyLevel=
SpamKillLevel=20
SpamSubjectPrefixStatus=enabled
SpamSubjectPrefixString=[SPAM]
SpamTag2Level=5
VirusAction=reject
VirusCheckStatus=enabled
VirusScanOnlyAttachment=false
VirusScanSize=20000000
status=enabled
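To reason about where false positives land with these settings, here is a hypothetical Python sketch mapping a message score to an action. ASSUMPTION, not confirmed by the thread: SpamKillLevel acts as the reject threshold, SpamTag2Level as the "rewrite subject" threshold (since SpamSubjectPrefixStatus is enabled), and an empty SpamGreyLevel means greylisting is off. Under that reading, anything scoring from 5 up to (but not including) 20 gets the [SPAM] prefix rather than being rejected.

```python
from typing import Optional

# Hypothetical mapping of the posted settings to actions (ASSUMPTION):
#   SpamKillLevel=20 -> reject, SpamTag2Level=5 -> rewrite subject,
#   empty SpamGreyLevel -> no greylisting.


def action_for_score(score: float,
                     kill_level: float = 20.0,
                     tag2_level: float = 5.0,
                     grey_level: Optional[float] = None) -> str:
    """Return the most severe action whose threshold the score reaches."""
    if score >= kill_level:
        return "reject"
    if score >= tag2_level:
        return "rewrite subject"  # subject gets the [SPAM] prefix
    if grey_level is not None and score >= grey_level:
        return "greylist"
    return "no action"
```

Checking actions from the most severe down mirrors the idea that a message receives a single action, the strongest one its score qualifies for.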

I’ve reached the 200+ hams, but not the 200+ spams, so I’m still not getting the benefit of the Bayesian filter:

root@helium:~> $ rspamc stat
Results for command: stat (0.030 seconds)
Messages scanned: 6411
Messages with action reject: 2441, 38.08%
Messages with action soft reject: 0, 0.00%
Messages with action rewrite subject: 826, 12.88%
Messages with action add header: 3, 0.05%
Messages with action greylist: 0, 0.00%
Messages with action no action: 3141, 48.99%
Messages treated as spam: 3270, 51.01%
Messages treated as ham: 3141, 48.99%
Messages learned: 401
Connections count: 0
Control connections count: 122
Pools allocated: 9462
Pools freed: 9430
Bytes allocated: 18.92M
Memory chunks allocated: 300
Shared chunks allocated: 18
Chunks freed: 0
Oversized chunks: 113
Fuzzy hashes in storage “local”: 0
Fuzzy hashes in storage “rspamd.com”: 423377288
Fuzzy hashes stored: 423377288
Statfile: BAYES_SPAM type: redis; length: 2.08M; free blocks: 0; total blocks: 54.84k; free: 0.00%; learned: 128; users: 1; languages: 0
Statfile: BAYES_HAM type: redis; length: 6.12M; free blocks: 0; total blocks: 161.00k; free: 0.00%; learned: 274; users: 1; languages: 0
Total learns: 402
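Plugging the two Statfile counts above into the 200-per-class rule shows how far off each side is, assuming the threshold from the quoted docs is unchanged. A short Python sketch (`remaining_learns` is a hypothetical helper):

```python
MIN_LEARNS = 200  # minimum per class, from the quoted docs

# Learned counts copied from the Statfile lines of the rspamc stat output above.
learned = {"BAYES_SPAM": 128, "BAYES_HAM": 274}


def remaining_learns(counts: dict[str, int], min_learns: int = MIN_LEARNS) -> dict[str, int]:
    """How many more learns each class needs before reaching min_learns."""
    return {name: max(0, min_learns - n) for name, n in counts.items()}


print(remaining_learns(learned))  # {'BAYES_SPAM': 72, 'BAYES_HAM': 0}
```

So 72 more spam messages (200 - 128) would need to be learned; the ham side (274) is already past the minimum.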