Duplicity replacement --> rsync Time machine-like backups

backup

(Matthieu Gaillet) #1

Preliminary note : this is work in progress and comes without any warranty.

Why

While I like the general concept of nethserver’s backup strategy, I found that duplicity was not the best engine, especially when nethserver is used as a file server with multiple terabytes of files to back up:

  • Heavy CPU load
  • Awful data throughput
  • Files stored in multi-piece archives, making them hard to restore manually and more prone to breakage if a single piece of the archive gets corrupted
  • Necessity (why?) to make a full backup at least every week
  • Backing up 2 TB of data takes more than 24 hours, during which the server is on its knees.

What

Coming initially from the OSX world, I found a nice script that mimics Apple Time Machine’s way of doing backups.

  • Each backup is in its own folder named after the current timestamp. Files can be copied and restored directly, without any intermediate tool.
  • It is possible to back up to/from remote destinations over SSH.
  • Files that haven’t changed from one backup to the next are hard-linked to the previous backup, so they take very little extra space.
  • Old backups are purged automatically: within 24 hours, all backups are kept. Within one month, the most recent backup for each day is kept. For all previous backups, the most recent of each month is kept.
  • The script is mature and very nicely written. There are some helpers that make it possible to have multiple backup profiles / destinations.

All in all, I personally can only see advantages over duplicity, which was “never built to handle big volumes of data” (sic!).

I wanted to implement this script as a simple drop-in replacement for duplicity as the backup engine, without breaking the rest of nethserver’s backup logic. Thanks to the modular design of nethserver and its backup-data module, this wasn’t difficult.

No existing file has to be modified. Only one configuration property has to be changed.

How

  1. Install the backup-data module and configure it as needed, then disable it.

  2. Download the main backup engine script. I chose to put it in /root:
    cd /root
    git clone https://github.com/laurent22/rsync-time-backup.git

  3. Download my script used to interface nethserver with the main script:
    cd /etc/e-smith/events/actions/
    git clone https://github.com/pagaille/nethserver-backup-data-rsync_tmbackup.git
    Review the script to ensure that the options fit your configuration (they should).

  4. Modify the configuration db to make nethserver use the new script
    db configuration setprop backup-data Program rsync_tmbackup

  5. Add a cron job to launch the backup as often as you want, by adding something like 0 23 * * * /usr/sbin/e-smith/backup-data to the file /var/spool/cron/root
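
For the cron step, a minimal sketch of adding the entry without duplicating it on a second run. The helper name add_cron_entry is hypothetical, and the demo writes to a scratch file rather than the real crontab.

```shell
# Hypothetical helper: append a cron entry to a file only if the exact line is not there yet.
add_cron_entry() {
    cron_file=$1
    entry=$2
    touch "$cron_file"
    # -x matches whole lines, -F treats the entry as a fixed string (the asterisks are literal).
    grep -qxF -- "$entry" "$cron_file" || printf '%s\n' "$entry" >> "$cron_file"
}

# Demo against a scratch file; on the server the target would be /var/spool/cron/root.
cron_file=$(mktemp)
add_cron_entry "$cron_file" '0 23 * * * /usr/sbin/e-smith/backup-data'
add_cron_entry "$cron_file" '0 23 * * * /usr/sbin/e-smith/backup-data'
cat "$cron_file"
```

Calling the helper twice leaves a single line in the file, so it is safe to rerun. Running crontab -e as root is an alternative to editing the spool file directly.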

That’s all. You may launch backup-data and watch the magic happen through the usual log files (see backup module documentation). Everything works except the GUI : pre-backup tasks (sql dumps, config-backup, etc), post-backup tasks, custom files included or excluded, mail notifications, and even the dashboard’s backup status gets updated.

Performance Comparison with duplicity

Tests were run on a very basic 2 x Intel® Core™2 Duo CPU E6550 @ 2.33GHz with a USB 2 hard disk attached for backups.

Runs almost twice as fast on big files :

Requires roughly as much CPU power, but less of it is IO-related, which makes it possible to run the process as “nice” (low priority) :

What’s next / TO DO

  • A GUI should obviously be created, e.g. to handle the ssh ability of rsync for remote storage. Sadly I’m far from even thinking about doing this myself.
  • The restore script has to be modified to ensure automated restoration works. Right now I guess that simply copying the saved files from the latest folder using something like rsync -aP /path/to/backup/latest/ / onto a reinstalled nethserver should do it (note the trailing slash on the source, so that the folder’s contents are copied rather than the folder itself).
  • The “restore” tab for the GUI should also be updated.
  • Currently the backup script backs up everything each time it is launched. That requires a great deal of resources to dump the sql tables, etc. It could be interesting to make two cron jobs: one to make a full configuration backup, and one to back up only the file shares.
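
On the manual restore idea above, one rsync detail is worth checking before running it against /: whether the source path ends with a slash. A minimal sketch in a scratch directory (all paths and file names are made up for the demo):

```shell
# Sketch only: compare restoring with and without a trailing slash on the source.
work=$(mktemp -d)
mkdir -p "$work/backup/latest/etc" "$work/restore_a" "$work/restore_b"
echo "setting=1" > "$work/backup/latest/etc/app.conf"

# Without a trailing slash, rsync copies the directory itself into the target.
rsync -a "$work/backup/latest" "$work/restore_a/"

# With a trailing slash, rsync copies the directory's contents into the target.
rsync -a "$work/backup/latest/" "$work/restore_b/"

# List what ended up where.
find "$work/restore_a" "$work/restore_b" -type f
```

The first form produces restore_a/latest/etc/app.conf, the second restore_b/etc/app.conf. For a real restore onto /, the second form is the one that puts files back in their original places. Adding --dry-run (-n) first shows what rsync would transfer without touching anything.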

(Rob Bosch) #2

I love initiatives like this. Making NethServer stronger every day!


(Matthieu Gaillet) #3

Thanks Robb. I hope that it will get some attention from the @dev_team :slight_smile: I really believe that this rsync-based script is a real plus for those that use Nethserver as a (big) file server.


(Alessio Fattorini) #4

How can we help? Sounds like a great initiative. What do you need?


(Matthieu Gaillet) #5

Well, everything I cannot do alone :slight_smile:

  1. Discuss and possibly recognise the advantages of that rsync script. Why was duplicity chosen at dev time? What are its advantages over a well-thought-out rsync script?

  2. Develop a new UI. Actually it may be easier than I first thought, there is a lot that could be reused. I’ll check.

  3. Test the full backup / restore scenario extensively before shipping


(Filippo Carletti) #6

Backup is a complex subject. :slight_smile:
rsync needs a filesystem and as much space as the data to back up
duplicity compresses the backup
duplicity can encrypt
duplicity has integrity checks
Restoring old files is easier with duplicity.

I’d like to replace duplicity, but with something like borg (https://github.com/borgbackup/borg).


(Dan) #7

This is pretty important. Something with some parity would be nice too.


(Matthieu Gaillet) #8

Not sure it’s a real issue / feature nowadays. BTW, rsync DOES compress, but only between the local and remote machine, to save bandwidth when a remote target is used.

Sure. But (target) file systems can too. There are multiple ways to do this, the easiest probably being to set up an encrypted LVM volume using cryptsetup.

rsync too ! rsync always uses checksums to verify that a file was transferred correctly.

Really, I don’t see anything easier than cp "/mnt/backup/machine/YYYYMMDDHHMM/my lost.file" /my/destination

I really feel like rsync is the easiest, most efficient, all-round solution for a standard backup like nethserver’s backup-data module.

I’ll investigate that borg thing. The name itself makes it interesting :slight_smile:


(Markus Neuberger) #9

I like rsync, but rsync alone is “just” a syncing tool. The backup work (versioning, encryption, …) is not included, so you need scripts like rsync-time-backup for that.
But the main feature of borg seems to be deduplication.

Installed borgbackup, just yum install borgbackup:

[root@testserver ~]# borg create -v --stats /root/repo::test1 /usr/share
Enter passphrase for key /root/repo:
------------------------------------------------------------------------------
Archive name: test1
Archive fingerprint: 738bb5e66e389f8a8f3209f3e1781fc8180bbd545713b2465311528016391aca
Time (start): Fri, 2018-01-05 23:11:43
Time (end):   Fri, 2018-01-05 23:12:17
Duration: 34.02 seconds
Number of files: 32659
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              486.52 MB            290.14 MB            283.61 MB
All archives:              559.50 MB            327.39 MB            302.28 MB

                       Unique chunks         Total chunks
Chunk index:                   37430                45084
------------------------------------------------------------------------------
[root@testserver ~]# borg create -v --stats /root/repo::test2 /usr/share
Enter passphrase for key /root/repo:
------------------------------------------------------------------------------
Archive name: test2
Archive fingerprint: 400a1a1919ce001fa0d6d7432707da1a2221eef7db143d0c378f572039b53386
Time (start): Fri, 2018-01-05 23:12:37
Time (end):   Fri, 2018-01-05 23:12:46
Duration: 9.87 seconds
Number of files: 32659
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              486.52 MB            290.14 MB              1.86 MB
All archives:                1.05 GB            617.54 MB            304.14 MB

                       Unique chunks         Total chunks
Chunk index:                   37464                77803
------------------------------------------------------------------------------

(Rob Bosch) #10

If you are using ‘some’ cloud service to backup to, it is VERY important that the backup client encrypts before it copies the data to the backup storage.


(Giacomo Sanchietti) #11

I agree with @filippo_carletti about the pros and cons of rsync.
I still like the idea of using rsync for backups, but probably only experienced admins will be able to use it.

I personally back up almost all of our infrastructure machines with custom rsync scripts :slight_smile:

I’d start with a package with a couple of props before going deep into the UI.
But first, let’s see how many people would use the how-to.


(Saito Benkei) #12

Me too, and my customers’ servers too (rsync on removable disks/RDX or via smb share or via ssh on another server).


(Matthieu Gaillet) #13

I don’t understand. With a proper UI, wannabe admins could enjoy its power, or at least the way it is currently implemented: a simple, reliable and all-round configuration.


(Rob Bosch) #14

I agree with @pagaille. If there is a good (web) GUI around the script, you don’t need to have too much knowledge to be able to use it.
That still leaves the need for pre-encrypted backup data. If that can be accomplished through rsync-time-backup I would love to see it in NethServer.


(Markus Neuberger) #15

What about using BackupPC, it supports rsync AFAIK?

https://wiki.nethserver.org/doku.php?id=module:backuppc


(Matthieu Gaillet) #16

Well, I don’t know. I’m far from a security expert, but if the data is encrypted end to end by rsync and then written to an encrypted medium, where is the flaw?

BackupPC is made to backup remote workstations, not the server itself, isn’t it ?


(Markus Neuberger) #17

You are right, but you may change direction and use a remote repo, though only a Unix-like one such as NFS. But it does the whole versioning and recovering stuff. I was just looking for a solution that is already there, but there are also some rsync/rsnapshot web UIs:

http://furier.github.io/websync/

I’ll have to test the timesyncscript you recommended and if there is an easy way of implementation…


(Matthieu Gaillet) #18

There is. I did it :smiley:


(Markus Neuberger) #19

Awesome, it’s only the web UI part missing. I’ll test it asap…


(Giacomo Sanchietti) #20

Don’t get me wrong, I don’t want to stop the development of anything! Contributions like this are always very welcome! :smiley:

I was just saying that it will be a little hard to develop such a web interface, especially for the restore part, but I will gladly try it out!