Restic backup failed

Elleni · February 25, 2021, 11:30pm

restic 0.11.0 compiled with go1.15.3 on linux/amd64

Edit: a second check with --read-data, followed by another prune, no longer shows any error message.

Elleni · October 4, 2021, 6:48am

Sorry to bring this up again. Occasionally the restic job fails and we get a warning email.

Backup: job_name
Backup started at 2021-10-03 23:45:01
Pre backup scripts status: SUCCESS
using parent snapshot xyz
Save(<data/xyz>) returned error, retrying after 552.330144ms: Sync: sync /mnt/backupname/data/xyz: input/output error
Save(<data/xyz>) returned error, retrying after 1.080381816s: OpenFile: open /mnt/backupname/data/xyz: file exists (repeated x times)
Save(<data/xyz>) returned error, retrying after 507.606314ms: Sync: sync /mnt/backupname/data/xyz: resource temporarily unavailable (repeated 2x)
Save(<data/xyz>) returned error, retrying after 1.080381816s: OpenFile: open /mnt/backupname/data/xyz: file exists (repeated x times)
Fatal: unable to save snapshot: OpenFile: open /mnt/backupname/data/xyz: file exists
Backup failed
Action 'backup_job_name': FAIL
Backup status: FAIL

Log file: /var/log/backup/backup_job_name-202110032345.log

It does not bother me too much: when I see this mail in the morning and restart the job manually in the web interface, it finishes successfully.

On the other hand, it would be nice to avoid such failure mails. What do we need in order to analyse the source of these failures and eliminate them?

Looking through my mail history, the job fails approximately once a month…
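
As a starting point for that analysis, it may help to summarise which operations and error reasons appear in each failed log, so that the monthly failures can be compared with each other. The following is only a minimal Python sketch, assuming the log lines have the same shape as the ones quoted above; the function name summarize_restic_errors and the regular expressions are hypothetical and not part of restic.

```python
import re
from collections import Counter

# Assumed line shape (taken from the log excerpt in this thread):
#   Save(<...>) returned error, retrying after <delay>: <Op>: <detail>: <reason> (repeated ...)
RETRY_RE = re.compile(
    r"returned error, retrying after [^:]+: (?P<op>\w+): .*: "
    r"(?P<reason>[^:(]+?)\s*(?:\(repeated.*\))?$"
)
FATAL_RE = re.compile(r"^Fatal: (?P<msg>.+)$")


def summarize_restic_errors(lines):
    """Count (operation, reason) pairs in retry lines and capture the last Fatal message."""
    counts = Counter()
    fatal = None
    for raw in lines:
        line = raw.strip()
        match = RETRY_RE.search(line)
        if match:
            counts[(match.group("op"), match.group("reason").strip())] += 1
            continue
        match = FATAL_RE.match(line)
        if match:
            fatal = match.group("msg")
    return counts, fatal
```

Run against the log file named in the mail (for example by passing open(path) as lines), this would show whether the first error is always the input/output error on Sync and the "file exists" errors only follow as a consequence of the retry.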

Elleni · October 4, 2021, 7:01am

Some further digging in /var/log/messages on the NethServer providing the share, around the time the backup ran and failed, shows:

Oct 2 23:45:00 hostname qemu-ga: info: guest-ping called
Oct 2 23:45:00 hostname qemu-ga: info: guest-fsfreeze called
Oct 2 23:45:00 hostname qemu-ga: info: executing fsfreeze hook with arg 'freeze'
Oct 2 23:45:01 hostname qemu-ga: info: executing fsfreeze hook with arg 'thaw'
Oct 2 23:45:33 hostname smbd_audit: [2021/10/02 23:45:33.453443, 0] ../../lib/param/loadparm.c:784(lpcfg_map_parameter)
Oct 2 23:45:33 hostname smbd_audit: Unknown parameter encountered: "profile acls"
Oct 2 23:45:33 hostname smbd_audit: [2021/10/02 23:45:33.453486, 0] ../../lib/param/loadparm.c:1843(lpcfg_do_service_parameter)
Oct 2 23:45:33 hostname smbd_audit: Ignoring unknown parameter "profile acls"

I don’t know how related these are, though, since the qemu-ga entries occur every 5 minutes and the smbd_audit messages appear regularly too…

I hope this gives us some hint as to why it fails once a month.
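
To separate the regular entries from anything unusual, the server log could be narrowed to a window around the backup start and then compared between a failed night and a good one. This is a rough Python sketch under a few assumptions: timestamps use the classic syslog form seen above ("Oct 2 23:45:00", without a year), month names are in English, and the year is taken from the backup start time. The function name lines_near and the default window sizes are hypothetical.

```python
from datetime import datetime, timedelta


def lines_near(lines, start, before=timedelta(minutes=1), after=timedelta(minutes=10)):
    """Yield syslog lines whose timestamp lies in [start - before, start + after]."""
    for raw in lines:
        # split() with no separator also copes with the padded form "Oct  2".
        parts = raw.split(None, 3)
        if len(parts) < 4:
            continue
        try:
            # Assumption: syslog lines carry no year, so borrow it from `start`.
            # This is wrong for a window that spans New Year.
            stamp = datetime.strptime(
                f"{start.year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # not a timestamped line
        if start - before <= stamp <= start + after:
            yield raw.rstrip("\n")
```

For example, list(lines_near(open("/var/log/messages"), datetime(2021, 10, 2, 23, 45, 1))) would return only the entries from 23:44:01 to 23:55:01 on that night.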

mrmarkuz · October 4, 2021, 9:07am

To me it looks like a connection/network issue.
Maybe an automatic update restarting the Samba server, or heavy load on the backup server?