SFTP backup is deleted on disabling when backup target is unreachable

Hm. My data-backup configuration vanished. The standard data-backup config was used to back up data to a remote host.
I restored the machine with another configuration, which wrote with duplicity to a USB stick.

data-backup had sometimes been missing before while restoring the server, but after the last configuration restore it was there (and deactivated).
Now I tried to activate it. That threw an error and the complete configuration vanished.

Please help me restore the data-backup config. I would like to resume my restic backups at the same location as before.

Logs: I think this is the moment backup-data disappeared.

Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth /usr/libexec/nethserver/api/system-backup/update[5313]: /var/lib/nethserver/db/backups: OLD backup-data=restic|B2AccountId||B2AccountKey||B2Bucket||BackupTime|30 0 * * *|CleanupOlderThan|364D|FullDay|0|NFSHost||NFSShare||Notify|error|NotifyFrom||NotifyTo|root@localhost|Program|duplicity|Prune|0|S3AccessKey||S3Bucket||S3Host|s3.amazonaws.com|S3SecretKey||SMBHost||SMBLogin||SMBPassword||SMBShare||SftpDirectory|/bergPool-/Backup/xxx/restic/|SftpHost|remoteBackup|SftpPort|9822|SftpUser|gsrbackup|Type|incremental|USBLabel||VFSType|sftp|VolSize|2|WebDAVLogin||WebDAVPassword||WebDAVUrl||status|disabled
Apr 12 22:36:03 neth /usr/libexec/nethserver/api/system-backup/update[5313]: /var/lib/nethserver/db/backups: NEW backup-data=|B2AccountId||B2AccountKey||B2Bucket||BackupTime|30 0 * * *|CleanupOlderThan|364D|FullDay|0|NFSHost||NFSShare||Notify|error|NotifyFrom||NotifyTo|root@localhost|Program|duplicity|Prune|0|S3AccessKey||S3Bucket||S3Host|s3.amazonaws.com|S3SecretKey||SMBHost||SMBLogin||SMBPassword||SMBShare||SftpDirectory|/bergPool-/Backup/xx/restic/|SftpHost|remoteBackup|SftpPort|9822|SftpUser|gsrbackup|Type|incremental|USBLabel||VFSType|sftp|VolSize|2|WebDAVLogin||WebDAVPassword||WebDAVUrl||status|disabled
Apr 12 22:36:03 neth cockpit-bridge: Use of uninitialized value in print at /usr/libexec/nethserver/api/lib/backup_functions.pl line 50.
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth /usr/libexec/nethserver/api/system-backup/update[5313]: /var/lib/nethserver/db/backups: OLD backup-data=|B2AccountId||B2AccountKey||B2Bucket||BackupTime|30 0 * * *|CleanupOlderThan|364D|FullDay|0|NFSHost||NFSShare||Notify|error|NotifyFrom||NotifyTo|root@localhost|Program|duplicity|Prune|0|S3AccessKey||S3Bucket||S3Host|s3.amazonaws.com|S3SecretKey||SMBHost||SMBLogin||SMBPassword||SMBShare||SftpDirectory|/bergPool-/Backup/xxx/restic/|SftpHost|remoteBackup|SftpPort|9822|SftpUser|gsrbackup|Type|incremental|USBLabel||VFSType|sftp|VolSize|2|WebDAVLogin||WebDAVPassword||WebDAVUrl||status|disabled
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth cockpit-bridge: ERROR: You should not set a config record without a type (key was backup-data).
Apr 12 22:36:03 neth /usr/libexec/nethserver/api/system-backup/update[5313]: /var/lib/nethserver/db/backups: NEW backup-data=restic|B2AccountId||B2AccountKey||B2Bucket||BackupTime|30 0 * * *|CleanupOlderThan|364D|FullDay|0|NFSHost||NFSShare||Notify|error|NotifyFrom||NotifyTo|root@localhost|Program|duplicity|Prune|0|S3AccessKey||S3Bucket||S3Host|s3.amazonaws.com|S3SecretKey||SMBHost||SMBLogin||SMBPassword||SMBShare||SftpDirectory|/bergPool-/Backup/xxx/restic/|SftpHost|remoteBackup|SftpPort|9822|SftpUser|gsrbackup|Type|incremental|USBLabel||VFSType|sftp|VolSize|2|WebDAVLogin||WebDAVPassword||WebDAVUrl||status|disabled
Apr 12 22:36:03 neth /usr/libexec/nethserver/api/system-backup/update[5313]: /var/lib/nethserver/db/backups: OLD backup-data=restic|B2AccountId||B2AccountKey||B2Bucket||BackupTime|30 0 * * *|CleanupOlderThan|364D|FullDay|0|NFSHost||NFSShare||Notify|error|NotifyFrom||NotifyTo|root@localhost|Program|duplicity|Prune|0|S3AccessKey||S3Bucket||S3Host|s3.amazonaws.com|S3SecretKey||SMBHost||SMBLogin||SMBPassword||SMBShare||SftpDirectory|/bergPool-/Backup/xxx/restic/|SftpHost|remoteBackup|SftpPort|9822|SftpUser|gsrbackup|Type|incremental|USBLabel||VFSType|sftp|VolSize|2|WebDAVLogin||WebDAVPassword||WebDAVUrl||status|disabled
Apr 12 22:36:03 neth /usr/libexec/nethserver/api/system-backup/update[5313]: /var/lib/nethserver/db/backups: NEW backup-data=restic|B2AccountId||B2AccountKey||B2Bucket||BackupTime|30 0 * * *|CleanupOlderThan|364D|FullDay|0|NFSHost||NFSShare||Notify|error|NotifyFrom||NotifyTo|root@localhost|Program|duplicity|Prune|0|S3AccessKey||S3Bucket||S3Host|s3.amazonaws.com|S3SecretKey||SMBHost||SMBLogin||SMBPassword||SMBShare||SftpDirectory|/bergPool-/Backup/xxx/restic/|SftpHost|remoteBackup|SftpPort|9822|SftpUser|gsrbackup|Type|incremental|USBLabel||VFSType|sftp|VolSize|2|WebDAVLogin||WebDAVPassword||WebDAVUrl||status|enabled
Apr 12 22:36:03 neth esmith::event[5314]: Event: nethserver-backup-data-save backup-data /tmp/vszfnzZRhP
Apr 12 22:36:03 neth esmith::event[5314]: expanding /etc/backup-data.d/nethserver-backup-data.include
Apr 12 22:36:03 neth esmith::event[5314]: expanding /etc/cron.d/backup-data
Apr 12 22:36:03 neth esmith::event[5314]: expanding /etc/davfs2/davfs2.conf
Apr 12 22:36:03 neth esmith::event[5314]: expanding /etc/davfs2/secrets
Apr 12 22:36:03 neth esmith::event[5314]: expanding /etc/ssh/ssh_config
Apr 12 22:36:03 neth esmith::event[5314]: Action: /etc/e-smith/events/actions/generic_template_expand SUCCESS [0.104597]
Apr 12 22:36:04 neth esmith::event[5314]: Action: /etc/e-smith/events/nethserver-backup-data-save/S30nethserver-restore-data-clean-list SUCCESS [0.043259]
Apr 12 22:36:04 neth esmith::event[5314]: /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/var/lib/nethserver/backup/backup.key.pub"

After connection time-out, the backup-data configuration gets deleted

Apr 12 22:40:18 neth esmith::event[5314]: 
Apr 12 22:40:18 neth esmith::event[5314]: /usr/bin/ssh-copy-id: ERROR: ssh: connect to host xxx.xxx.org port 9822: Connection timed out
Apr 12 22:40:18 neth esmith::event[5314]: 
Apr 12 22:40:18 neth esmith::event[5314]: Action: /etc/e-smith/events/nethserver-backup-data-save/S40nethserver-backup-data-ssh FAILED: 1 [254.565762]
Apr 12 22:40:18 neth esmith::event[5314]: Event: nethserver-backup-data-save FAILED
Apr 12 22:40:18 neth /usr/libexec/nethserver/api/system-backup/update[5313]: /var/lib/nethserver/db/backups: DELETE backup-data=restic|B2AccountId||B2AccountKey||B2Bucket||BackupTime|30 0 * * *|CleanupOlderThan|364D|FullDay|0|NFSHost||NFSShare||Notify|error|NotifyFrom||NotifyTo|root@localhost|Program|duplicity|Prune|0|S3AccessKey||S3Bucket||S3Host|s3.amazonaws.com|S3SecretKey||SMBHost||SMBLogin||SMBPassword||SMBShare||SftpDirectory|/bergPool-/Backup/xxx/restic/|SftpHost|remoteBackup|SftpPort|9822|SftpUser|gsrbackup|Type|incremental|USBLabel||VFSType|sftp|VolSize|2|WebDAVLogin||WebDAVPassword||WebDAVUrl||status|enabled
Apr 12 22:40:18 neth esmith::event[6096]: Event: nethserver-backup-data-save backup-data
Apr 12 22:40:18 neth esmith::event[6096]: expanding /etc/backup-data.d/nethserver-backup-data.include
Apr 12 22:40:18 neth esmith::event[6096]: expanding /etc/cron.d/backup-data
Apr 12 22:40:18 neth esmith::event[6096]: expanding /etc/davfs2/davfs2.conf
Apr 12 22:40:18 neth esmith::event[6096]: expanding /etc/davfs2/secrets
Apr 12 22:40:18 neth esmith::event[6096]: expanding /etc/ssh/ssh_config
Apr 12 22:40:18 neth esmith::event[6096]: Action: /etc/e-smith/events/actions/generic_template_expand SUCCESS [0.09804]
Apr 12 22:40:18 neth esmith::event[6096]: Action: /etc/e-smith/events/nethserver-backup-data-save/S30nethserver-restore-data-clean-list SUCCESS [0.040818]
Apr 12 22:40:18 neth esmith::event[6096]: Action: /etc/e-smith/events/nethserver-backup-data-save/S40nethserver-backup-data-ssh SUCCESS [0.001946]
Apr 12 22:40:18 neth esmith::event[6096]: Event: nethserver-backup-data-save SUCCESS

When I do

db backups show backup-data

I get no output at all; it is all wiped out.

I think this is a bug. The configuration should not be deleted when a connection error occurs. It should simply be (kept) deactivated.

Maybe the restic data backup was done without having current configuration backup(s) of the restic backup settings?
So the configuration backups restored by the data backup didn’t include the restic backup settings, and it’s not possible to recover them.
Did you try to recreate the restic data backup manually?

I have not recreated the restic data backup yet.

The restic data backup settings had been restored. You can see in the logs that the record existed with custom settings. I assume that restoring the restic backup settings worked.
Its status was “disabled”, and that was correct as I disabled this backup before saving the server configuration for the migration.

The bug I ran into happened when I tried to activate this backup on the new machine. Activation failed because the ovpn tunnel it relies on wasn’t connected yet. Now, instead of keeping this backup disabled, it got deleted.

If it is a bug, you should be able to reproduce it by

  • creating a restic-ssh backup configuration (connection must be working on creation)
  • deactivating this backup in cockpit
  • (physically) disconnecting the ssh target destination
  • activating the backup in cockpit

I can imagine that it is not about restoring, restic or ssh at all, but generally about the wrong command (DELETE) being issued instead of setting the backup status to disabled whenever activation of a backup strategy fails.

The bold part is reported only in this post or… am I wrong?

I couldn’t reproduce.
But maybe we’re missing a step, like restore config/data…

Ok, good if it is not a general bug, but of course harder to track.

The backup-data settings shown several times in the logs look fine to me, except that weird “missing type” error. But I cannot say if this is enough proof for a successful restore.

@mrmarkuz: What were the differences in your log when you tried to reproduce? What happened after ssh-copy-id failed?

@pike: yes, I had to dig into it. Ironically I had deactivated the backup before migration because I wanted to have the ovpn tunnel alive before resuming backups…

@sternkrabbe I can understand and respect the goal of having the tunnel up and the backup on the other side of the tunnel…
  • It’s an offsite backup (so the installation can be nuked without too much hassle)
  • It delivers an encrypted connection (so data won’t be sniffed via the internet)
  • It saves data or power consumption (which can save a lot of money if the installation is hardware hosted or VM hosted outside the premises).

This is a really nice workaround for not publishing a WebDAV/SFTP/SMB server if that is not desired. But

IMVHO a sysadmin (or a supposed one like me) should have better notes and homework done for managing their own installation. It is a PITA to write them down, update, review and correct them if they are wrong. But every minute spent on that kind of documentation about your setup counts up to 10x when it’s time to troubleshoot (and you cannot guess the obvious/immediate reason), dig up a disaster recovery, or restore some important data.

In some mission-critical environments, activities like simulating a failure (shutting something down) and evaluating the procedures and the time spent restoring the failed piece are scheduled, audited, reported and evaluated/corrected for improvement.

The funk-up fairy (typo quite intended) is always waiting for a drone to cast the spell… The links above are my proof of dumbness. Hoping that someone will laugh at my bad experiences and learn something new :wink:

Tip: consider a bandwidth rule with timing, so there is enough bandwidth for the restore… and enough bandwidth for everything else when someone is using that connection (the one hosting the backups)

I accidentally tested CIFS restic backup before, now I tested with SFTP and I could reproduce the issue. I moved the relevant posts to a new thread.
The rsync SFTP backup is affected too.

  • Create SFTP restic/rsync backup
  • Stop SSH service on backup machine
  • Disable backup - throws error
  • Refresh browser

The backup isn’t shown in cockpit anymore, and “db backups show” also shows nothing.

I found the following old issue. It’s about deleting backups when the save event fails, but in our case an already existing backup is deleted when disabling it.

After commenting out the relevant code at lines 60-62 in /usr/libexec/nethserver/api/lib/backup_functions.pl, the backup is still there after disabling. But we should still delete failed backups, so I don’t know how to fix it. Maybe export a variable or create a file on disabling, so the save event can check whether an existing backup was disabled and not delete it?

        # rollback: delete backup record and clean expanded templates
        # my $b = $db->get($name);
        # $b->delete();
        # system("/sbin/e-smith/signal-event nethserver-backup-data-save $name");

@giacomo do you have an idea?

My question is… why? Why should a “failed backup” be deleted?
The bug you linked refers to the record in the DB… so the backup procedure in the database.

A little experience…
A customer does not wake up their fileserver/backup server (a NAS) when the premises are not supposed to be open or occupied. This is the case on Saturday and Sunday.
I cannot pick “only these days per week” as a schedule. So I decided to go daily, full on Friday… and eat the errors/failures on Saturday and Sunday. So if any failure deleted the backup procedure, I would not be able to arrange anything better than once per week. And this is not good enough…

IMVHO a disabled backup procedure… won’t be scheduled in cron. But that’s the end of the game.

I think I expressed myself badly… it’s not the backup files themselves that get deleted, only a faulty backup config.

If you create/edit/disable a backup, the save event (signal-event nethserver-backup-data-save) deletes the backup in the backups db (and respectively in cockpit) if the event fails. This is ok and wanted in most cases except if the backup already existed before.
In other words: If you create a new backup and there is an error in save event it’s absolutely ok to remove this backup.

It’s more a question of how to code it to cover both cases in one event.
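
A minimal sketch of the two cases described above, written in shell for illustration only. The real rollback lives in the Perl file backup_functions.pl; the function name rollback_action and its two arguments are hypothetical and not part of any NethServer API. It assumes the caller remembers whether the backup record already existed before the save event ran:

```shell
# Hypothetical helper: decide what to do once the save event has finished.
#   $1 = "yes" if the backup record existed before this save, otherwise "no"
#   $2 = exit code of the save event
# Prints one of: keep, restore-previous, delete
rollback_action() {
    existed_before="$1"
    event_exit="$2"

    if [ "$event_exit" -eq 0 ]; then
        echo "keep"              # event succeeded, nothing to roll back
    elif [ "$existed_before" = "yes" ]; then
        echo "restore-previous"  # keep the record and put the old props back
    else
        echo "delete"            # brand-new record that never worked
    fi
}

rollback_action yes 1   # existing backup, save event failed
rollback_action no 1    # new backup, save event failed
rollback_action yes 0   # save event succeeded
```

With a split like this, a failed save on an existing backup would put the previous properties back (for example status disabled) instead of deleting the record, while a newly created backup that fails would still be removed.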

I think there isn’t a clean fix for it.
I propose to revert my PR: “backup api: rollback in case of error” by gsanchietti (Pull Request #158, NethServer/nethserver-cockpit on GitHub).
I’m pretty sure this was a request from the support team because of multiple tickets, but I also know that the current behavior has raised some tickets too.

Personal opinion: I’d prefer to keep the configuration database untouched and see backups keep failing until the admin fixes it.

Since this is a support-related problem, let’s also ask @nrauso and @filippo_carletti

I agree with @giacomo: it’s better to keep the database untouched and let the backup fail until the admin fixes it.
This is the only scenario in which the configured backup disappears when the backend is unreachable; in any other possible backup configuration this does not happen.
So I have a doubt: there must have been a good reason why we chose this behaviour in this specific case, but I cannot work out what it was.
What am I missing? :thinking:

Avoiding the creation of a “non-working” backup? Having it deleted if the admin cannot configure a working backup?

But why delete a non-working-ssh-backup and preserve a non-working-cifs-backup? That’s the question… (nrauso said: “in this specific case?”)

Forcibly disabling the backup is IMHO the best option for an admin who made a minor mistake and now has to fix it. It is then clear that this backup is not working, and no false assumptions are made.

And talking about ssh to a remote host, a connection could fail at any time, without the admin being involved or aware at all.

Furthermore the save-event is not only called on creation but also on editing, disabling and enabling of a backup. So it affects setups that are known to have been working before.

The code doesn’t check the backup type, but only the save event exit code.
So maybe with CIFS the event is not failing.

I have no idea :smiley: Let’s hope for @filippo_carletti’s memory!

Vain hope

What about editing /etc/e-smith/events/actions/nethserver-backup-data-ssh line 44 to:

sshpass -v -f $PASSWORD /usr/bin/ssh-copy-id -i $KEY".pub" -p $PORT $USER@$HOST || true

This way that line throws no error anymore and therefore the save event succeeds.
And just SSH backup is affected by the change.

It’s true, but then you will not be able to catch the error anymore.
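
A possible middle ground between reverting and silencing, sketched under assumptions: wrap the call so that a failure is still reported on stderr, but the action returns 0 and the save event does not fail. The wrapper name run_nonfatal is hypothetical, and `false` stands in here for the failing ssh-copy-id call:

```shell
# Hypothetical wrapper: run a command, report a failure on stderr,
# but always return 0 so the calling event is not marked as failed.
run_nonfatal() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "WARNING: '$*' failed with exit code $rc" >&2
    fi
    return 0
}

run_nonfatal false    # stand-in for the failing ssh-copy-id call
echo "exit status seen by the caller: $?"
```

Unlike a bare `|| true`, the failure stays visible to whoever reads the output, although the event itself would still be reported as successful.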

What do you prefer @filippo_carletti @nrauso ? Revert or silence the error?

What is “the error”?
Failed to run?
Failed to register the backup operation?