Restic backup failed

NethServer Version: NethServer 7.9.2009
Module: restic

Yesterday the backup failed.
Corresponding log:
Pre backup scripts status: SUCCESS
umount: /mnt/path_to_destination: not mounted
Backup directory is not mounted
Can’t initialize restic repository
Action ‘backup-data-restic job_name’: FAIL
Backup status: FAIL

Since it's possible that I rebooted the server during the backup, I tried to start the job again today, but it failed with this log:

Backup: job_name
Backup started at 2021-02-25 08:08:44
Pre backup scripts status: SUCCESS
Save(<data/55b38741ee>) returned error, retrying after 552.330144ms: Sync: sync /mnt/backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: input/output error
Save(<data/55b38741ee>) returned error, retrying after 1.080381816s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Save(<data/55b38741ee>) returned error, retrying after 1.31013006s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Save(<data/55b38741ee>) returned error, retrying after 1.582392691s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Save(<data/55b38741ee>) returned error, retrying after 2.340488664s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Save(<data/55b38741ee>) returned error, retrying after 4.506218855s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Save(<data/55b38741ee>) returned error, retrying after 3.221479586s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Save(<data/55b38741ee>) returned error, retrying after 5.608623477s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Save(<data/55b38741ee>) returned error, retrying after 7.649837917s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Save(<data/55b38741ee>) returned error, retrying after 15.394871241s: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Fatal: unable to save snapshot: OpenFile: open /mnt/backup-backup-job_name/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: file exists
Backup failed
Action ‘backup-data-restic backup-job_name’: FAIL
Backup status: FAIL

What would you suggest to fix this? I am quite sure that deleting the whole restic folder and the job, and recreating everything from scratch, would fix it, but I am not sure if that's necessary. On the other hand, as it is an important production server, I also want to be sure that the backups are consistent.

Thanks in advance for your valuable advice :slight_smile:

@Elleni

Hi

As it seems that some mounts were left behind (not unmounted correctly), you probably just need a simple reboot to fix your issue…

The mounts will become “clean” again.
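
To see whether a stale mount is still hanging around before (or after) the reboot, something like this will show it (just a sketch, using the mount point naming from this thread):

# list any leftover backup mounts
mount | grep /mnt/backup-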

My 2 cents
Andy


Hi Andy :slight_smile:

A reboot was the first thing I did before posting. But I can try again - not now, but in the evening, as it's a production server. What would my options be in case the problem persists?

Manually clean that folder completely (if possible) before rebooting.
All folders inside are created by the backup jobs (temporarily) and normally removed afterwards, AFAIK…


Hi Andy, that's what I meant in my opening post - that folder is where the whole backup is stored, about 100 GB in our case. So it's the backup itself, not only temporary files, right? And I would like to avoid losing all backups taken since mid-September last year. But I will do that and create a new full backup as a last resort if there is no other solution that can ensure a consistent backup.

@Elleni

AFAIK, the backups are stored externally, when that folder is mounted.
A backup stored locally is NOT a backup!
But the folder fills up with data temporarily, i.e. locally, because the share could not be mounted. Are you using restic with rsync?

The backups must be somewhere else besides on the NethServer itself - on a NAS?

True, the path is not local but external - it is another NethServer's Samba share, and this is the path configured as the destination for the restic backup job.

And if it can't mount, the local folder gets filled instead.
You could rename those files and move them "outside" of that mount folder…

But it is mounted… I double-checked by creating a folder from the backed-up NethServer, then browsing the destination file share on a Windows client and seeing the created folder.

mount shows, among other things:
//destination_other_nethserver/backupedserver_backup_jobname on /mnt/backup-backupedserver_restic type cifs (rw,relatime,vers=default,cache=strict,username=backupuser,domain=OURDOMAIN,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.57.21,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

Yes, but not by the latest backup job.
Probably by the previous job, and it was never unmounted.

The job demands to be able to mount the share…
If it’s already mounted, that fails…

Unmount it by hand (you must be outside of the folder), check that it's unmounted, and test the backup…
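
A minimal sketch of those steps, assuming the mount point shown in the mount output above:

# step out of the mount point first, then unmount it by hand
cd /
umount /mnt/backup-backupedserver_restic
# confirm nothing is mounted there anymore before re-running the backup
mountpoint /mnt/backup-backupedserver_restic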


Done. Now the /mnt/job folder shows:
ls -l /mnt/backup-hostname_restic/
total 4
drwx------ 2 root root 4096 8. Nov 01:33 locks
[root@hostname ~]# ls -l /mnt/backup-hostname_restic/locks/
total 56
-rw------- 1 root root 158 8. Nov 01:23 0a19009e7351ca710f338770718116941ab0fd0bfeea30627cfe46ae7da0ea06
-rw------- 1 root root 158 8. Nov 00:43 10730912bd62053d2029ae8003bbab163e99a6aae09315690947a331edae009a
-rw------- 1 root root 158 8. Nov 00:48 1ce40ab9f5ddfe0fee5b6b41c79d6c6d1000f09baab429179dc2e77072cab514
-rw------- 1 root root 158 8. Nov 00:33 3a1edec117fc6ee5a67175c8c335e3a0e9b44b75186c705451e0f4c571205342
-rw------- 1 root root 158 8. Nov 01:03 4a558a312c7ede38b8c7e3d157f2034df671fa109cdf4ea44f4fb096817c8461
-rw------- 1 root root 158 8. Nov 00:58 4c3ef186da4ecd80d73153a0708b26e56ad495111ca26a1e3cbd85e3332ea9fa
-rw------- 1 root root 158 8. Nov 01:13 68d5b4870070d3f0685b0328256a357723164042c3b194f3ae6ce1f4863ae2d1
-rw------- 1 root root 158 8. Nov 00:28 7b81299ee587e48ac8ee10f7aacbae36629ac13792bc82eae1cd012ed2d61e18
-rw------- 1 root root 158 8. Nov 01:33 7d3361eecb706b7e48b837c6c05761234731cf04415098b6618bc96178d5ec0e
-rw------- 1 root root 158 8. Nov 01:08 836709a1f92d39124cd14f3548ae95aac623fe6cc9df1ba65b237ca65e15c542
-rw------- 1 root root 158 8. Nov 00:38 b6ddad653aa30995a40e1544c73e3dd484f59abadce3d8fcc4a67a57930b7f6f
-rw------- 1 root root 158 8. Nov 01:18 d1d1516d42912656bc1fe67ccc69a5983779d9def3f94312eba656b58f267770
-rw------- 1 root root 158 8. Nov 00:53 e4c23cc9381757948099d69f93c11a48f237cd2fd6f7f61b40a8ce0baf9e2330
-rw------- 1 root root 158 8. Nov 01:28 ea258298061d2a042430937444000ba17278d7e45229ea24b89b00d4530b54cb

Do these files have to remain there, or should the locks folder be deleted?

Anything older than the present can be deleted…
(check that it's unmounted first!)
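
For example (a sketch only - the path is the one from the listing above; double-check it is not mounted before removing anything):

# abort if the path is still a mount point
mountpoint /mnt/backup-hostname_restic && echo "still mounted - stop here"
# otherwise remove the stale lock files left behind locally
rm -f /mnt/backup-hostname_restic/locks/*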


@Elleni

PS: If possible, use NFS instead of SMB/CIFS - it's faster for UNIX-to-UNIX transfers and uses fewer CPU resources…
Neth can do that - there is a plugin from Stephdl…

You also answer faster in Swiss German than in English! :slight_smile:

NFS is a native language for UNIX, SMB is a foreign language…

Hi Andy. It is unmounted, no worries. And all those files are dated 8 November…

About NFS - I would also like to use it; it was considered, but it is not an option, as I want to avoid community packages on a production server. I can't answer faster as I am supporting / phoning with our clients. :slight_smile:

You can do it by hand; simply editing the NFS share is enough… No community stuff needed. And NFS is provided by RHEL/CentOS…
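
Roughly like this (a sketch only - the export path is illustrative, and the addresses are taken from the mount output earlier in the thread):

# on the destination server: export a directory via NFS (path is hypothetical)
echo '/var/lib/backups 192.168.57.0/24(rw,no_root_squash)' >> /etc/exports
exportfs -ra
# on the backed-up server: mount the export as the backup destination
mount -t nfs 192.168.57.21:/var/lib/backups /mnt/backup-job_name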

Yep, but don't forget I have an inexperienced (Linux) colleague, and as it is not configurable within Cockpit, we thought about it and finally decided against NFS. No problem though - I can live with CIFS :slight_smile:

The backup finished successfully now, but I still doubt that the backup is consistent. There is the possibility to check its integrity - otherwise I may decide to drop the backup history and run a new full backup during the weekend to make sure it really is consistent. I am asking because when I had to do a restore some time ago, I had integrity problems: for example, Nextcloud calendars were not restored, and so on. Fortunately I still had the old server around and did a new full restic backup. The second restore was successful, and all calendars and everything else was restored correctly - well, almost: some files in the users' mailboxes had the wrong ownership (root instead of vmail), but that was easy to correct.
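
For the record, that ownership fix was along these lines (sketch - /var/lib/nethserver/vmail is the usual NethServer mail store location; adjust if yours differs):

# give the restored mailboxes back to the vmail user
chown -R vmail:vmail /var/lib/nethserver/vmail/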

So, to summarize, the problem was that the share was still mounted? I don't quite get it, as I had rebooted the server in between. Anyway, I deleted the content of locks in the unmounted mount point, but not the locks folder itself. Is that correct, or can the locks folder itself be removed too?

However - thanks for the pointer that the already-mounted destination was blocking the backup. I would still like to check the integrity of the backed-up data, though. I read here that I could run restic -r /mnt/share check --read-data, but it demands a password, and it's apparently not the backup user password that is used to mount the CIFS share. I had to find the password for the repository; found it here: /var/lib/nethserver/secrets/
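
Put together, the check looks roughly like this (a sketch - the repository path and the exact secret file name depend on the backup job name):

# structural check of the repository metadata
restic -r /mnt/backup-job_name check --password-file /var/lib/nethserver/secrets/backup-job_name
# full verification that reads back every pack file (much slower)
restic -r /mnt/backup-job_name check --read-data --password-file /var/lib/nethserver/secrets/backup-job_name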

I get some of these: pack 9b4663f2: not referenced in any index
100 additional files were found in the repo, which likely contain duplicate data.

I guess that's normal and prune will take care of this as configured in the job, right? Still waiting for the check to finish. Could take a while :slight_smile:
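
If I end up running the prune by hand instead of via the job, it would be something like this (same assumptions about repository path and password file as above):

restic -r /mnt/backup-job_name prune --password-file /var/lib/nethserver/secrets/backup-job_name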

Edit: That sounds promising:

check snapshots, trees and blobs
no errors were found

Now the second restic check is running, this time with --read-data.


100 additional files were found in the repo, which likely contain duplicate data.
You can run restic prune to correct this.
check snapshots, trees and blobs
read all data
[3:08:15] 100.00% 20955 / 20955 packs
no errors were found

So I guess I am good, and there is no need to mistrust this restic repo.

While pruning I got:
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 552.330144ms: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 1.080381816s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 1.31013006s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 1.582392691s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 2.340488664s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 4.506218855s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 3.221479586s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 5.608623477s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 7.649837917s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 15.394871241s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
pack file cannot be listed 55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory

[22:06] 100.00% 21055 / 21055 packs
repository contains 21054 packs (368112 blobs) with 100.310 GiB
processed 368112 blobs: 1913 duplicate blobs, 343.250 MiB duplicate
load all snapshots
find data that is still in use for 111 snapshots
[3:50] 100.00% 111 / 111 snapshots
found 365846 of 368112 data blobs still in use, removing 2266 blobs
will remove 0 invalid files
will delete 7 packs and rewrite 186 packs, this frees 500.621 MiB
[3:11] 100.00% 186 / 186 packs rewritten
counting files in repo
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 507.606314ms: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 985.229971ms: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 803.546856ms: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 1.486109007s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 2.070709754s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 3.67875363s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 4.459624189s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 6.775444383s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 15.10932531s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
Load(<data/55b38741ee>, 591, 4240132) returned error, retrying after 13.811796615s: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
pack file cannot be listed 55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: open /mnt/mountpoint/data/55/55b38741eececdff8c386dbe6c22668e339a889996a19e5c4bb2ac31bf5756bd: no such file or directory
[10:02] 100.00% 20955 / 20955 packs
finding old index files
saved new indexes as [5a80d41b 6d499386 bcb56941 d0cecb96 e9fa51a5 346a23aa 39203b1a]
remove 217 old index files
[0:11] 100.00% 217 / 217 files deleted
remove 193 old packs
[0:07] 100.00% 193 / 193 files deleted
done

Do I have to worry, and/or would it be better to delete everything and do a new full backup?

What is the output of:

restic version