Several issues or strange behavior when restoring files from "rsync" backups

yummiweb · February 9, 2022, 2:29am

I’ve noticed several issues or strange behavior when restoring files from “rsync” backups.

The problems occur almost identically on different machines.

A NethServer release 7.9.2009 (final) is used in each case
with kernel version 3.10.0-1160.53.1.el7.x86_64

Basic condition:
SFTP/ rsync backup set up via “System / Backup”.
The SFTP user on the backup host has limited rights but full access to the backup directory.

Error 1:
The backups initially run without errors. After several days, the NethServer backup log (sent by email) contains only errors saying that files in the root user’s directory cannot be deleted or updated.

“rm: unable to remove '/mnt/usb_backup/zdvasrv_zdva_de/pushed_rsync/2022-02-04-023109/root/$some_file_from_root': no permission”

This is the first suspected bug (but not the biggest problem).

Error 2:
Now a file should be restored from one of the regular rsync backups via “Applications / Restore data”. The corresponding backup configuration is selected under “Choose backup”. Only one backup is shown under “Choose date”, although a backup goes through every day (perhaps the second error).

This backup is selected and a file is searched for from a home directory using the search function. If the found file is marked and restored, the following happens:

Error 3:
If the restore is performed without checking the “overwrite” checkbox, the restored file (with the prefix “.restore-” followed by a date string) is owned by the user and group “root”. The file permissions are now “-rw-r--r--” instead of “-rwxrwx---” as before. There are no other abnormalities.

Error 4:
If the restore is carried out with the “overwrite” checkbox ticked, the effects are completely different. The recovered file is owned by a user and group that do not exist on the system, so only the numeric IDs are displayed. These IDs are identical to the IDs of the SFTP user on the backup host. Interestingly, the file restored in this way (overwritten) at least has the correct access rights “-rwxrwx---”.

Error 5:
However, the recovery problem is not limited to the recovered file! The entire home directory into which the file was restored is now also owned by the SFTP user and group, which do not exist on the system. As a result, the user can no longer access their own files.

An attempt with other files from file shares or e-mail folders was not carried out.

ADDITION:
As I see now, ALL folders on the path to the user home folders (“/var/”, “/var/lib/”, “/var/lib/nethserver”, “/var/lib/nethserver/home/”) have the wrong ownership after restoring a single file from backup!!!
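To see how far such an ownership change reaches, the owner and group of a restored file and of every directory above it can be listed, flagging numeric IDs that have no matching account on the local system. The following is only a minimal Python sketch and not part of NethServer; the function names are made up for illustration.

```python
import grp
import os
import pwd
from pathlib import Path


def owner_name(uid):
    """Return the user name for uid, or None if no passwd entry exists."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def group_name(gid):
    """Return the group name for gid, or None if no group entry exists."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return None


def ownership_along_path(target):
    """Yield (path, uid, gid, user, group) for target and each parent up to /."""
    path = Path(target).resolve()
    for p in [path, *path.parents]:
        st = os.lstat(p)
        yield (str(p), st.st_uid, st.st_gid,
               owner_name(st.st_uid), group_name(st.st_gid))


def orphaned(target):
    """Return only the rows whose uid or gid has no local account."""
    return [row for row in ownership_along_path(target)
            if row[3] is None or row[4] is None]


def report(target):
    """Return one printable line per path component, marking unknown IDs."""
    lines = []
    for path, uid, gid, user, group in ownership_along_path(target):
        flag = "" if user and group else "  <-- no matching user/group"
        lines.append(f"{uid}:{gid}\t{user or '?'}:{group or '?'}\t{path}{flag}")
    return lines
```

Calling `print("\n".join(report(path)))` with the path of a restored file (run as root, so that every directory can be read) would show at a glance which levels carry the IDs of the SFTP user from the backup host.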

The problem first appeared on a machine that had previously had recovery issues.

On a completely different machine - different location, different network, different backup host but comparable target system and comparable restrictions of the SFTP user - the problems could be reproduced identically.

I hope my description was useful.

Greetings Yummiweb

mrmarkuz · February 9, 2022, 3:52am

I think you need unrestricted SSH access on the backup target machine to get a working SFTP backup, see also Backup over SFTP - #9 by pagaille

yummiweb · February 10, 2022, 2:35am

Mhhh…

Does that mean the backing up SFTP account needs greater rights on the target machine than just full access to the target path? I do not understand that.

On the one hand, because my previous manual rsync setup had an SMB share as the target, and the associated SMB user could save the user and group properties on the target even without elevated or root rights.
EDIT:
This aspect is not correct, I must have remembered that incorrectly, I’m sorry.

On the other hand, because that would mean that you could only back up one Nethserver per backup host, because other machines would otherwise have full access to the backups of the other machines - or vice versa. That would be very impractical.

If root rights should actually be required on the target in order to save the ownership correctly, then that would mean that the ownership is derived from the ownership of the backup when restoring. Is that right?

In this case I don’t understand the principle either. On the one hand permissions are read from the ACL file and restored, on the other hand from the ownership of the backed up files?

In the previous backup with limited access to the backup target, all target files get the same ownership. I would understand if the current procedure is then hampered by this.
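As background on the rsync side: its `-o` / `--owner` option is documented as “super-user only” on the receiving end, because an unprivileged process is not allowed to give files away to another user. With a restricted SFTP account, everything written to the backup therefore ends up owned by that account. A small Python sketch of the underlying restriction, assuming a Linux system (the function names are illustrative only):

```python
import os
import tempfile


def can_change_owner(path, uid, gid):
    """Try to chown path to uid:gid; return False if the kernel refuses."""
    try:
        os.chown(path, uid, gid)
        return True
    except PermissionError:
        return False


def demo():
    """Return (same_owner_ok, give_to_root_ok) for a fresh temporary file."""
    with tempfile.NamedTemporaryFile() as tmp:
        st = os.stat(tmp.name)
        # Re-applying the current owner and group is always permitted.
        same_owner_ok = can_change_owner(tmp.name, st.st_uid, st.st_gid)
        # Handing the file to root (0:0) needs privileges on the receiving side.
        give_to_root_ok = can_change_owner(tmp.name, 0, 0)
        return same_owner_ok, give_to_root_ok
```

Run as an ordinary user, `demo()` is expected to return `(True, False)`; run as root, both attempts succeed. This is the same limit an unprivileged rsync receiver runs into when it tries to preserve the original owner.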

But I still don’t understand why - in my case - not only the files restored in the user directory get the ownership from the backup, but the entire path from the root to the file is compared with the ownership from the backup? What is it good for?

And I also don’t understand why the method behind it works differently depending on whether the recovered file is made as copy or original. If the rights and ownership are derived from the backup, it shouldn’t make any difference on the target, except that the file in the copy gets a prefix.

Can someone please explain to me how the restore routines work in terms of restoring original ownership?

The way it is logical for me at the moment, something is still not working as it should.

Greetings Yummiweb

yummiweb · February 10, 2022, 11:55am

I can think of a few options, I’d like to know what you think of them:

Set the setuid bit on the rsync binary on the backup host.
It works technically, but then everyone who is allowed to use rsync on the backup host (including other SSH accounts) still has access to all files on the backup host and thus to the files of others. Can this be restricted in any way? I guess not.
Allow specific SSH users to use sudo for rsync.
Technically works, but without further restrictions it has the same problems as above (and of course it would also have to be incorporated into the rsync command line of the NethServer backup routine).

Possibly one could limit the “sudo” permission for rsync on the target host to a very special rsync command. Of course, you would have to know this command string in order to be able to enter it in /etc/sudoers on the target host. The only question is what exactly this command looks like on the target host? It’s probably not identical to the command string on the source host?
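For illustration: with rsync over SSH, the process started on the target host is normally not the client command line but an `rsync --server <options> . <path>` invocation, whose exact option string depends on the rsync version and the client flags. One way to think about restricting it is a check like the following Python sketch, which accepts only such server invocations whose paths stay below one allowed directory. Everything here is an assumption for illustration (function name, example option string, directory layout), and the rm error quoted in the first post suggests the backup wrapper also runs other commands on the backup host, so a filter for rsync alone would probably not be sufficient.

```python
import os
import shlex


def is_allowed_rsync_command(command, allowed_root):
    """Return True if command is an 'rsync --server' call confined to allowed_root."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes or similar
    if len(argv) < 4 or argv[0] != "rsync" or argv[1] != "--server":
        return False
    if "." not in argv[2:]:
        return False
    # In server mode the options are followed by a "." placeholder, then the paths.
    dot = argv.index(".", 2)
    paths = argv[dot + 1:]
    if not paths:
        return False
    root = os.path.realpath(allowed_root)
    for p in paths:
        real = os.path.realpath(p)
        # Reject anything that resolves outside the allowed directory.
        if os.path.commonpath([root, real]) != root:
            return False
    return True
```

A check of this kind could sit in a wrapper script that the SSH key or a sudoers entry is limited to, so that each NethServer can only reach its own backup directory.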

Do you have any ideas?

Andy_Wismer · February 10, 2022, 12:09pm

Hi @yummiweb

Why should NethServer cater for very bad planning?

Can you give ANY valid reason why, in an SME environment, “anyone” or any non-admin user should have access to rsync (not a common end-user tool) at all on a “backup server”?

NethServer is specifically for SME environments…

My 2 cents
Andy

yummiweb · February 10, 2022, 12:46pm

Well, in the event of a third-party takeover (i.e. a worst case), the backup client would not only have access to “its” backup (and could manipulate or delete it accordingly) but would also have full access to the entire backup host, which would also compromise it. Even if the backup host is appropriately separated so that it is not suitable as a stepping stone, the backups should also be considered compromised. In my opinion, that’s reason enough.

The question of having several different nethservers backed up on one backup host arises specifically for me as a (remote) backup target for several nethserver instances.

Of course I also use “pull” strategies for backups. But it somehow doesn’t make sense to run a separate backup host for each backup type. There are already several backup targets anyway, because “one backup is not a backup”. The backup strategies “built into” the NethServer are interesting because they can (should) be used to restore individual files and, if in doubt, the entire system (intentionally or unintentionally). I hope I haven’t been rude so far; if so, it wasn’t intentional and I apologize. Due to time constraints, my language skills are currently limited to those of Google.

Andy_Wismer · February 10, 2022, 12:58pm

Hi @yummiweb

Your reasoning for the backup scenarios is quite acceptable, although some of them go beyond the intended scope of NethServer. A bunch of SMEs with NethServers is not really a typical SME environment…

And to be honest, your English skills aren’t as bad as you claim. I’ve seen worse “Babelfish” translations - hardly any coherent sentences or grammar…

But it may be time to look into running your servers virtualized, like I do for my 30 odd clients…
All run on Proxmox, PBS gives a very good reliable and very fast backup for disaster recovery and single file restore (Or anything in between…).

Permissions? The Proxmox default template is scaled for large enterprises, and it works extremely well if you take the short time needed to read up on it. Your use cases would all be covered, and if everything goes over VPN, like I am doing, there’s hardly any chance of compromising the backup host!

A completely hardware-independent restore is possible on almost any hardware suitable for Proxmox / Debian within about 20 minutes plus the restore of the backup. NethServer’s own backup/restore will not go quite this far, as CentOS 7 is somewhat aged compared to Debian 11, the basis of Proxmox. Even switching between AMD and Intel CPUs does not make a difference.

Other, interesting features include:

Full live backups for any OS, no additional costs.
Fast Migration Cluster to full HA Cluster, no additional costs.
ZFS or CEPH Cluster from boot.
Fast and extremely reliable, solid!
Much more…

My 2 cents
Andy

giacomo · February 10, 2022, 1:13pm

Yes. Rsync preserves the name of the owner.

ACLs are extra metadata on top of normal UNIX permissions. This is why there is a separate dump for them. Also, not all filesystems support them.

I’ve never seen such behavior: usually permissions are restored correctly.

It depends on the backup engine. After the data have been restored, the post-restore-data event is called. Among other things, it also restores the ACLs on Samba file shares.

Rsync is launched using the rsync-time-backup wrapper (GitHub - laurent22/rsync-time-backup: Time Machine style backup with rsync).
You can find more info on the linked repo.

yummiweb · February 10, 2022, 1:22pm

Thanks for your recommendations. Apart from the PBS, I’m running pretty much the same setup here (and elsewhere). The PBS is definitely the next target, but I still have to look at it before I “roll it out” somewhere else.

I have always been quite satisfied with my previous backup strategy. So far, the PVE has backed up the machines as a whole, the backup host (in my case this is the PVE) uses rsync to get file backups from the machines (e.g. nethserver).

As a precaution against hardware problems etc., virtualization is a fine thing, also for taking snapshots before major upgrades. Unfortunately, the backups from PVE are quite large, so they can hardly be distributed remotely (I don’t know if the PBS can do that).

So far I have always used rsync (pulled) for the most distributed backup possible at different locations. The restoration of a network server, for example, is not necessarily easy, but it is feasible in the event of a real fault.

For the quick recovery “in between” of e.g. individual files I have been happy to rely on the Nethserver’s own methods, e.g. Duplicity. Unfortunately, Duplicity also generates very large backups and is hardly suitable for remote backup (and it regularly throws errors for me when I rotate the local backup disks).

So I was quite happy that rsync via SFTP is now also possible on the Nethserver, which suited me very well. However, I did not expect the limitations observed.

On closer inspection, perhaps the problems were to be expected (if you know the process of restoring), but I didn’t expect restoring a file to leave such changes in the system. So I thought it was important to write this “error message” here.

Unfortunately I don’t have enough knowledge to be able to contribute as a developer to this really great project “Nethserver”. But I can describe my observations and in this way contribute to the project, which I like to do.

But somehow I have the feeling that my bug reports are not well received anymore. Do I find too much?

Andy_Wismer · February 10, 2022, 1:33pm

You REALLY need to look at PBS…!!!

It gives you:

Fast, incremental Backups!
A backup taking an hour on Proxmox without PBS will take about the same time on its first run. The next backup will take a minute or less!

Due to compression and deduplication, it’s easy to store a years worth of backups in a very small space… Deduplication factors of 20-30x aren’t rare!

Offsite Storage for Disaster Recovery
A PBS can be configured to “collect” Proxmox Backups on another PBS. PBS uses “pull”… Using filtering, keeping versions down, this is practically only needed for disaster recovery, and then you want the latest one!
As this is also “incremental”, offsite backups of large VMs are finally a reality…
A PBS with 1.2 TB of Backups (In the cloud) is downloaded twice to my home - this takes about 15-30 minutes each run…
My Home Internet is cable with about 800 down and 100 up. For this PBS offsite I only need download, and I have limited this to a tenth of my available bandwidth - I don’t notice any delays in Internet!

A client has a 1.2 TB NethServer VM. There are other VMs, including a Windows ERP server.
The PBS has for VM Backups 2 x 4 TB SATA Disks. We have about 35 Versions of the NethServer alone, going back 7-8 months now. Planned retention is for one year.

Retention Options in PBS:

My site in the “Cloud”:

A “Global” view. The site “APU” on the right represents my home, where a dedicated PBS collects this stuff. I have another PBS for my Home / Lab.

A good friend of mine works for a company. They also use Proxmox, with PBS, all servers equipped with SSDs. The main Windows VM fileserver, a member of NethServer’s AD, is about 1.6 TB, about 1 TB used. The hourly backup takes 10 seconds!!!

As to this:
“But somehow I have the feeling that my bug reports are not well received anymore. Do I find too much?”

Don’t worry too much, your bug reports are welcome. However, it is a busy time, Fosdem will be replaced by a chat end of month (NethServer 8), and there’s so many things going on…

My 2 cents
Andy

Good to know about PBS:

For my first PBS (My own first test) I used an 8 year old Intel i5 Dualcore PC with 8 GB RAM, equipped with a 120 GB SSD (System) and two 2 TB Disks all in ZFS. The test went on to be productive at home for nearly a year. It’s replaced now with a box with bigger and faster Disks…
This is now a HP Proliant Microserver Gen 8, 16 GB RAM, 120 GB SSD (System), 2 x 4TB SATA Disks in ZFS mirror for Backups…

yummiweb · February 10, 2022, 3:52pm

Thank you for your valuable tips! I am sure that I will deal with it shortly.

Regarding my error message, you should take a closer look at my observation that restoring individual files differs in their permissions depending on the “overwrite” or “as a copy” restore option. The permissions are independent of the owner, so in my opinion this difference shouldn’t occur unless it’s intentional.

Good luck with your projects!

dnutan · February 10, 2022, 8:52pm

Not sure it is relevant for the issue you describe but maybe gives some tip:

github.com/NethServer/dev

Restore of a whole directory

opened 09:32AM - 04 Dec 19 UTC

closed 02:01PM - 05 Dec 19 UTC

filippocarletti

bug verified

Restoring (overwrite) a directory from a restic backup leads to wrong permissions for the directory. See https://community.nethserver.org/t/cockpit-backup-and-restore-data-not-working-correctly/13850/28 for details.

**Steps to reproduce**
- Delete a directory from an ibay
- Restore it from a restic backup selecting overwrite

**Expected behavior**
Directory permissions set like before deletion.

**Actual behavior**
Dir perms are root:root

```
[filippo@neth.net@ns77-com ~]$ ls -la dir/
total 4
drwxr-xr-x 2 filippo@neth.net locals@neth.net 18 Dec 4 10:20 .
drwx------ 4 filippo@neth.net locals@neth.net 254 Dec 4 10:20 ..
-rw-r--r-- 1 filippo@neth.net locals@neth.net 5 Dec 4 10:20 file
[filippo@neth.net@ns77-com ~]$ rm -rf dir/
[filippo@neth.net@ns77-com ~]$ ls -la dir/
ls: cannot open directory dir/: Permission denied
[filippo@neth.net@ns77-com ~]$ logout
[root@ns77-com filippo]# ls -la dir/
total 4
drwx------ 2 root root 18 Dec 4 10:30 .
drwx------ 4 filippo@neth.net locals@neth.net 254 Dec 4 10:30 ..
-rw-r--r-- 1 filippo@neth.net locals@neth.net 5 Dec 4 10:20 file
```

**Components**
restic-0.9.6-1.ns7.x86_64

----

Thanks to Mario (@trentatre) and Marc (@dnutan)