Clamav: result - FAILED with error: "failed to scan and retransmits exceed - score: 0"

Hi there,

Since April 18, clamd seems to choke on a particular email, resulting in rspamd soft rejecting it.

I see that nethserver’s antivirus was auto-updated on April 14, but cannot be sure it’s linked.

The maillog contains this :

Apr 22 11:49:30 cloud rspamd[2397]: <375578>; proxy; rspamd_add_passthrough_result: <6d16dc09-ce12-7e40-4519-f9c96e55cba8@gmail.com>: set pre-result to 'soft reject' (no score): 'Cannot validate the message now. Try again later' from force_actions(0)
Apr 22 11:49:30 cloud rspamd[2397]: <375578>; proxy; rspamd_task_write_log: id: <6d16dc09-ce12-7e40-4519-f9c96e55cba8@gmail.com>, qid: <00CB89E24D>, ip: 209.85.208.48, from: <sender@hidden.be>, (default: F (soft reject): [0.00/20.00] [DATE_IN_PAST(1.00){35;},MANY_INVISIBLE_PARTS(1.00){10;},SPF_REPUTATION_HAM(-0.69){-0.69741185240321;},DMARC_POLICY_ALLOW(-0.50){gmail.com;none;},IP_REPUTATION_HAM(-0.28){asn: 15169(-0.28), country: US(-0.01), ip: 209.85.208.48(0.00);},R_DKIM_ALLOW(-0.20){gmail.com:s=20210112;},R_SPF_ALLOW(-0.20){+ip4:209.85.128.0/17;},MIME_GOOD(-0.10){multipart/mixed;multipart/alternative;text/plain;multipart/related;},MX_GOOD(-0.01){},ASN(0.00){asn:15169, ipnet:209.85.128.0/17, country:US;},CLAM_VIRUS_FAIL(0.00){failed to scan and retransmits exceed;},DKIM_TRACE(0.00){gmail.com:+;},DWL_DNSWL_NONE(0.00){gmail.com:dkim;},FORCE_ACTION_CLAM_VIRUS_FAIL(0.00){soft reject;},FREEMAIL_ENVFROM(0.00){gmail.com;},FREEMAIL_FROM(0.00){gmail.com;},FREEMAIL_TO(0.00){hotmail.com;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},HAS_ATTACHMENT(0.00){},HAS_ORG_HEADER(0.00){},MID_RHS_MATCH_FROM(0.00){},MIME_TRACE(0.00){0:+;1:+;2:+;3:+;4:~;5:~;6:~;7:~;...;},PREVIOUSLY_DELIVERED(0.00){recipient@hidden.be;},RCPT_COUNT_THREE(0.00){4;},RCVD_COUNT_THREE(0.00){3;},RCVD_IN_DNSWL_NONE(0.00){209.85.208.48:from;},RCVD_TLS_ALL(0.00){},RCVD_VIA_SMTP_AUTH(0.00){},RECEIVED_SPAMHAUS_PBL(0.00){62.235.236.122:received;},RWL_MAILSPIKE_POSSIBLE(0.00){209.85.208.48:from;},TO_DN_ALL(0.00){},TO_MATCH_ENVRCPT_SOME(0.00){}]), len: 19492102, time: 15276.793ms, dns req: 67, digest: <d008ac0ba39ad244fda1bbb762c76196>, rcpts: <recipient@hidden.be>, mime_rcpts: <(redacted)>, forced: soft reject "Cannot validate the message now. Try again later"; score=nan (set by force_actions)
Apr 22 11:49:30 cloud rspamd[2397]: <375578>; proxy; rspamd_protocol_http_reply: regexp statistics: 0 pcre regexps scanned, 9 regexps matched, 175 regexps total, 79 regexps cached, 0B scanned using pcre, 172.32KiB scanned total
Apr 22 11:49:30 cloud postfix/cleanup[18437]: 00CB89E24D: milter-reject: END-OF-MESSAGE from mail-ed1-f48.google.com[209.85.208.48]: 4.7.1 Cannot validate the message now. Try again later; from=<redacted> to=<redacted> proto=ESMTP helo=<mail-ed1-f48.google.com>
Apr 22 11:49:30 cloud postfix/smtpd[18430]: disconnect from mail-ed1-f48.google.com[209.85.208.48]
Apr 22 11:49:30 cloud rspamd[2397]: <f25f75>; proxy; proxy_milter_finish_handler: finished milter connection
Apr 22 11:50:01 cloud postfix/smtpd[18430]: connect from localhost[127.0.0.1]
Apr 22 11:50:01 cloud rspamd[2397]: <759002>; proxy; proxy_accept_socket: accepted milter connection from /var/run/rspamd/worker-proxy port 0
Apr 22 11:50:01 cloud postfix/smtpd[18430]: NOQUEUE: reject: RCPT from localhost[127.0.0.1]: 504 5.5.2 <localhost>: Helo command rejected: need fully-qualified hostname; from=<redacted> to=<redacted> proto=ESMTP helo=<localhost>
Apr 22 11:50:01 cloud postfix/smtpd[18430]: disconnect from localhost[127.0.0.1]
Apr 22 11:50:01 cloud rspamd[2397]: <759002>; milter; rspamd_milter_process_command: got connection from 127.0.0.1:40162
Apr 22 11:50:01 cloud rspamd[2397]: <759002>; proxy; proxy_milter_finish_handler: finished milter connection
Apr 22 11:50:01 cloud postfix/smtpd[18430]: connect from localhost[127.0.0.1]
Apr 22 11:50:01 cloud rspamd[2397]: <4c62a7>; proxy; proxy_accept_socket: accepted milter connection from /var/run/rspamd/worker-proxy port 0

Gmail sends the mail again every couple of hour. I set rspamd’s log level to debug, hoping to get more information on the problem. I’ll keep you updated.

restarting the server did not help.

There is an issue opened recently with the same kind of problem : Clamav failed to scan, maximum retransmits exceed · Issue #2321 · Mailu/Mailu · GitHub

There are also past discussions on this subject (cc @stephdl)

Meanwhile, I’d like to examine the email in question but cannot find it on the filesystem. Where should I look ?

Thanks for your help.

The offending mail just came again but the debug trace did not reveal anything more. Actually I believe that some lines are missing, there lot of lines “suppressed” by journald. I tried with “info” level.

I noticed that the mail is quite big (20 MB). Maybe clamav takes too much time ? Where can I extend the 15s timeout ?

hum yes if you cannot verify the email by clamav, then you send a try later again

Emails with big attachments that trigger the clamav timeout should be received after a few minutes when the sender retries delivery.
When rspamd stops waiting for an answer from clamav, clamd continues the scan in the background.
After a few minutes, when rspamd asks again to clamav on sender’s retry, it immediately receives an answer.

Having said that, I think that we should increase the default timeout: waiting a few more seconds on first delivery it’s probably better than waiting for the second retry from the sender (some servers retry after 15 minutes or more).

Now the timeout is 15 seconds, I propose 60 or even 90 (SMTP sessions timeout are between 2 and 10 minutes).

If you would like to test a bigger timeout modify /etc/rspamd/local.d/antivirus.conf, line 36, change from:
#timeout = 5;
removing the leading # to:
timeout = 20;

Note: 20 seconds give a timeout of 60, due to the retry number being 2 (see line 37).

Please, let me how it goes.

1 Like

Turns out my assumption was correct : I extended the timeout to 30s / 2 retries (thanks @filippo_carletti) in /etc/rspamd/local.d/antivirus.conf and it looks ok.

Not sure this file isn’t going to be overwritten by the templating system ?

The sender’s server was sensing the mail again and again since some days but it looks like clam never has the time to finish.

The file will be overwritten, see the first line.
Please, use a template-custom, but remember to remove it when the updated package will be released.

This is a bit strange. Can you measure how long it takes to rspamd to check the email?
Save the email to the server and run:

rspamc -t 120 <mail.eml

Here a big email gives:
Results for file: stdin (8.929 seconds)

That’s strange : at first launch, the process timeouts after roughly 20s (didn’t count) :

rspamc -t 120 < 1650637514.M101457P28206.cloud.ecowez.be,S=19230179,W=19494103:2,Scd
Results for file: stdin (20.466 seconds)
[Metric: default]
Action: soft reject
Spam: true
Score: -1.71 / 20.00
Symbol: BAYES_HAM (-3.00)[100.00%]
Symbol: DATE_IN_PAST (1.00)[41]
Symbol: DKIM_TRACE (0.00)[gmail.com:+]
Symbol: DMARC_POLICY_ALLOW (-0.50)[gmail.com, none]
Symbol: DWL_DNSWL_NONE (0.00)[gmail.com:dkim]
Symbol: FREEMAIL_ENVFROM (0.00)[gmail.com]
Symbol: FREEMAIL_ENVRCPT (0.00)[hotmail.com]
Symbol: FREEMAIL_FROM (0.00)[gmail.com]
Symbol: FREEMAIL_TO (0.00)[hotmail.com]
Symbol: FROM_EQ_ENVFROM (0.00)
Symbol: FROM_HAS_DN (0.00)
Symbol: HAS_ATTACHMENT (0.00)
Symbol: HAS_ORG_HEADER (0.00)
Symbol: MANY_INVISIBLE_PARTS (1.00)[10]
Symbol: MID_RHS_MATCH_FROM (0.00)
Symbol: MIME_GOOD (-0.10)[multipart/mixed, multipart/alternative, text/plain, multipart/related]
Symbol: MIME_TRACE (0.00)[0:+, 1:+, 2:+, 3:+, 4:~, 5:~, 6:~, 7:~, 8:~, 9:~, 10:~, 11:~, 12:~, 13:~, 14:~, 15:~, 16:~, 17:~, 18:~, 19:~, 20:~, 21:~, 22:~, 23:~, 24:~, 25:~, 26:~, 27:~, 28:~, 29:~, 30:~, 31:~, 32:~, 33:~, 34:~, 35:~, 36:~, 37:~, 38:~, 39:~]
Symbol: MX_GOOD (-0.01)[]
Symbol: PREVIOUSLY_DELIVERED (0.00)[redacted]
Symbol: RCPT_COUNT_THREE (0.00)[4]
Symbol: RCVD_COUNT_THREE (0.00)[4]
Symbol: RCVD_IN_DNSWL_NONE (0.00)[209.85.218.44:received]
Symbol: RCVD_NO_TLS_LAST (0.10)
Symbol: RCVD_VIA_SMTP_AUTH (0.00)
Symbol: RECEIVED_SPAMHAUS_PBL (0.00)[62.235.236.122:received]
Symbol: R_DKIM_ALLOW (-0.20)[gmail.com:s=20210112]
Symbol: TO_DN_ALL (0.00)
Message-ID: 6d16dc09-ce12-7e40-4519-f9c96e55cba8@gmail.com
Urls: ["www.televesdre.eu","jmdelhaye.wix.com","www.natpro.be","annonces2.wix.com","architectejmdelhaye.twiza.org","batacc.be","www.facebook.com"]
Emails: ["redacted"]
Message - smtp_message: **timeout processing message**

Running the same command a second time result of a much faster process time with a successful result.

# time rspamc -t 120 < 1650637514.M101457P28206.cloud.ecowez.be,S=19230179,W=19494103:2,Scd
Results for file: stdin (0.421 seconds)
[Metric: default]
Action: no action
Spam: false
Score: -1.71 / 20.00

Running it one more time after a reboot gives roughly the same result (0.730 seconds)

Trying with other big emails (up to 38 megs) gives fast result.

Looks like that particular email (containing pictures) is problematic but… not all the time.

Additionally in the recent past the server has correctly handled bigger emails.

Hardware problem ? That’s a CPX21 VPS from Hetzner (3 vCPU / 4 gigs / SSD).

Tried again with the same offending mail :

time rspamc -t 120 < 1650643189.M451463P3342.cloud.ecowez.be,S=19234339,W=19498341:2,Scd
Results for file: stdin (17.372 seconds)
[Metric: default]
Action: no action
Spam: false
Score: -1.71 / 20.00
Symbol: BAYES_HAM (-3.00)[100.00%]
Symbol: DATE_IN_PAST (1.00)[29]
Symbol: DKIM_TRACE (0.00)[gmail.com:+]
Symbol: DMARC_POLICY_ALLOW (-0.50)[gmail.com, none]
Symbol: DWL_DNSWL_NONE (0.00)[gmail.com:dkim]
Symbol: FREEMAIL_ENVFROM (0.00)[gmail.com]
Symbol: FREEMAIL_ENVRCPT (0.00)[gmail.com]
Symbol: FREEMAIL_FROM (0.00)[gmail.com]
Symbol: FREEMAIL_TO (0.00)[ecowez.be, gmail.com]
Symbol: FROM_EQ_ENVFROM (0.00)
Symbol: FROM_HAS_DN (0.00)
Symbol: HAS_ATTACHMENT (0.00)
Symbol: HAS_ORG_HEADER (0.00)
Symbol: MANY_INVISIBLE_PARTS (1.00)[10]
Symbol: MID_RHS_MATCH_FROM (0.00)
Symbol: MIME_GOOD (-0.10)[multipart/mixed, multipart/alternative, text/plain, multipart/related]
Symbol: MIME_TRACE (0.00)[0:+, 1:+, 2:+, 3:+, 4:~, 5:~, 6:~, 7:~, 8:~, 9:~, 10:~, 11:~, 12:~, 13:~, 14:~, 15:~, 16:~, 17:~, 18:~, 19:~, 20:~, 21:~, 22:~, 23:~, 24:~, 25:~, 26:~, 27:~, 28:~, 29:~, 30:~, 31:~, 32:~, 33:~, 34:~, 35:~, 36:~, 37:~, 38:~, 39:~]
Symbol: MX_GOOD (-0.01)[]
Symbol: PREVIOUSLY_DELIVERED (0.00)[--@ecowez.be]
Symbol: RCPT_COUNT_TWO (0.00)[2]
Symbol: RCVD_COUNT_THREE (0.00)[4]
Symbol: RCVD_IN_DNSWL_NONE (0.00)[209.85.221.48:received]
Symbol: RCVD_NO_TLS_LAST (0.10)
Symbol: RCVD_VIA_SMTP_AUTH (0.00)
Symbol: RECEIVED_SPAMHAUS_PBL (0.00)[62.235.236.122:received]
Symbol: R_DKIM_ALLOW (-0.20)[gmail.com:s=20210112]
Symbol: TO_DN_ALL (0.00)
Message-ID: 44d8daa4-0de7-65a3-184b-e5542504cf42@gmail.com
Urls: ["www.televesdre.eu","jmdelhaye.wix.com","www.natpro.be","annonces2.wix.com","architectejmdelhaye.twiza.org","batacc.be","www.facebook.com"]
Emails: ["-----"]


real    0m17.413s
user    0m0.011s
sys     0m0.027s

Takes a lot of time again but passes.

4gigs… if it goes short on RAM, goes on swap. Which is faster on SSD, but… not as fast as RAM.

I run NS on a 2gigs hetzner vps machine for 3 years. Mail server, nextcloud, some simple web apps and even Plex. It works perfectly provided you disable the “default” clam rules (don’t remind exactly the name they use, that’s an option in cockpit).

Salut @pagaille

Run? Correct english would be:
I let a NS crawl on a 2GB Hetzner VPS…

A NethServer with 4 GB or less RAM really doesn’t “run”, it crawls…


I also use Hetzner - not VPS - but:

The Proxmox is a fully NVME based Hypervisor… (Extremly fast IO!)
Here, the term “running” is more appropriate…

:slight_smile:

My 2 cents
Andy

Both clamd and rspamd have caches.
Clamd in ram, expired on restart or rule update.
Rspamd in redis, expired by number of messages (there is a reset button in rspam UI).

Right. But why does rspamc timeout when the process is exceeding 20s (as far as I can see), since I set it to 30s / 2 retries ?

That’s still an open question. From my observation, it doesn’t work like you describe. The clamav timeout generates a 451 4.7.1 Cannot validate the message now. Try again later received by the sender.

On the 4 gigs machine :

image

On the 2 gigs machine :

image

It is true that the latter is swapping but I really can’t see any performance issue for my needs.
(BTW I just remarked that there was a hung process (cpu 100%), thank you :blush: )

@pagaille

If the 2GB mashine hits 2 GB Swap (= all RAM & Swap used up), processes will be terminated immediately, resulting in possible data loss. What process is terminated, is more or less chance… :frowning:

Don’t forget, that these mashines probably use a lot more RAM during backups, where Databases are dumped, entire folders are compressed (zipped), etc.

That’s when you should be checking usage, etc…

My 2 cents
Andy

I can’t reproduce.
See my logs, I searched for the msg id in my big test email: 1302271294.5012.1650457645290@nethservice.nethesis.it

First try (soft reject):

Apr 23 15:42:40 ns7 rspamd[6056]: <a6b3e4>; proxy; rspamd_task_write_log: id: <1302271294.5012.1650457645290@nethservice.nethesis.it>, qid: <CF50C1B0F69>, ip: 192.168.122.1, from: <no-reply@neth.net>, (default: F (soft reject): [10.94/20.00]

Second try 30 seconds later, I was monitoring clamd, I knew it completed scanning:

Apr 23 15:43:10 ns7 rspamd[6056]: <c64661>; proxy; rspamd_task_write_log: id: <1302271294.5012.1650457645290@nethservice.nethesis.it>, qid: <BDF631B0F69>, ip: 192.168.122.1, from: <no-reply@neth.net>, (default: T (add header): [10.94/20.00]

:confused:

That’s a complete standard installation with no customisation except the timeout delay I changed yesterday.

Let me know if I can do anything to help debugging. I send you a full maillog in private just in case.

Check why you have a do hight load on your 2gigas machine, 7 loads on your 1 cpu vm is too hight