Well… may I say I'm puzzled by this?
Using local hardware storage for backup is understandable, and having multiple devices for storing data outside the container makes sense from a systems point of view (bigger and cheaper storage for space-hungry applications), but on the other hand…
Can you afford to lose that data if the storage fails?
Are you going to use a RAID solution for your MinIO endpoint?
When you move the container from your current host to the next one… what will the migration path be?
The paradigm shift from a single server running services to a container orchestrator increases the need for system design and sysadmin-level tasks for the end user… so why isn't a multi-location container setup already part of your current environment?
One for speed intensive containers.
One for space intensive containers.
That is exactly what I want to accomplish. As this storage can be mounted remotely from a storage server, I fail to understand your concerns regarding backup (a RAID storage server with snapshots, backups, etc. can be used) or migration of the container to another host (just mount the storage on the other host). Why wouldn't that work?
Still, I am asking for this as an option, so no one would be forced to use it. But in certain scenarios (a storage server is already there, for example), I think the feature could be beneficial.
Ok, what I want to achieve w.r.t the user data (e.g. many TBs of data in nextcloud) is the following backup concept:
one copy of the data on a machine A, including previous versions (e.g. snapshots). Might contain the live data.
another copy on a machine B, also including previous versions. This copy is a pure backup and should not contain any live data
an offsite backup of important files/directories
With my current setup, I do the following:
live data is on a NAS (machine A, e.g. Truenas), mounted via NFS to the server running my services such as nextcloud. It is automatically snapshotted on the NAS, so I have a version history.
the complete data is mirrored on another NAS, machine B, including the snapshots.
machine B does an offsite backup of crucial contents.
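On ZFS-based systems like TrueNAS, this scheme is usually implemented with periodic snapshots plus incremental replication. A minimal sketch, assuming a dataset named tank/userdata and a reachable backup host (both names are placeholders, and TrueNAS normally wraps this in its own snapshot/replication tasks):

```shell
#!/bin/sh
# Sketch of the snapshot + replication scheme described above.
# Dataset name, snapshot labels and backup host are placeholders.
SNAP="tank/userdata@daily-$(date +%Y-%m-%d)"

# Machine A: take a read-only point-in-time snapshot of the live data
zfs snapshot "$SNAP"

# Machine A -> machine B: replicate incrementally, keeping the snapshot
# history (-R includes all snapshots, -i sends only the delta since the
# previous snapshot)
zfs send -R -i tank/userdata@daily-yesterday "$SNAP" | \
  ssh backup-host zfs receive -F backup/userdata
```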
Storage requirements are thus 2x the user data (1x on machine A, 1x on machine B) plus the demands for the snapshots.
With NS8, I need to do the following:
Set up a NethServer machine/VM with enough storage to hold all user data in /home.
do a full backup of NS to another machine (machine A). This backup can hold previous versions of the data.
do a copy of this backup to a machine B.
As a consequence, I need to provide storage capacity for 3x the user data (1x for the nethserver, 1x for machine A, 1x for machine B), plus the demands for the snapshots.
Additionally, restore seems much more complex, since in my current setup, if e.g. a user accidentally deletes a file in nextcloud, I can directly restore it from the zfs snapshots on machine A. In case of the nethserver solution, a full restore of the S3 backup seems to be required in order to get access to the single file. In case of several TB of data, this is not an efficient process (and requires storage space for yet another full copy of the data…).
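For comparison, restoring a single deleted file from a ZFS snapshot on machine A is just a copy, because every snapshot is exposed read-only under the hidden .zfs/snapshot directory (dataset and file paths below are illustrative):

```shell
# List the available snapshots of the dataset holding the user data
zfs list -t snapshot tank/userdata

# Each snapshot is browsable read-only under .zfs/snapshot, so a
# single-file restore is a plain copy (paths are examples):
cp /mnt/tank/userdata/.zfs/snapshot/daily-2024-01-15/alice/report.odt \
   /mnt/tank/userdata/alice/report.odt
```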
To summarize, I see that with NS8
storage demands increase (3x capacity for all user data needed, compared to 2x if live data is directly on e.g. Truenas)
usability gets worse
Maybe I just did not really understand your concept, but to me it looks like the backup/data-handling concept in NS8 comes with a lot of drawbacks and added complexity. Can you give advice on how this is meant to work in NS8?
PS: I am aware that I did not talk about config/state data of the services like e.g. the nextcloud db. I left it out, because a) this data is rather small and I do not care so much about additional copies, and b) I want to understand the concept of NS8 first before going into too much detail.
I think a copy to machine B is not strictly required, as you already have a local backup copy in machine A and an off-site backup copy. Restic repositories provide a configurable number of snapshots. It does not look so different from your current setup and does not need a remote filesystem.
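For reference, the configurable snapshot retention of a Restic repository is usually expressed with restic forget; the repository path and keep counts here are only examples:

```shell
# Keep a rolling window of snapshots in the repository and reclaim
# the space of everything that falls outside it (--prune).
restic -r /srv/backup/ns8-repo forget \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
  --prune
```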
I think comparing ZFS with a backup software is not really fair.
Yes, we are missing a tool to do a selective restore with ease. We know the Restic engine can list backup contents and do a selective restore. However, in practice this kind of feature is difficult to handle: I must know what I want to restore and where it was stored at the filesystem level. This requires knowledge of how the application organizes data in the filesystem, so it cannot solve the general problem easily.
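For anyone who does know the path, a hedged sketch of what such a selective restore looks like with plain restic (the repository and file paths are illustrative, not the actual NS8 layout):

```shell
# Browse the contents of the most recent snapshot
restic -r /srv/backup/ns8-repo ls latest

# Restore only a subtree of it into a scratch directory
# (the --include path must match how the app laid out its data)
restic -r /srv/backup/ns8-repo restore latest \
  --include /data/files/alice \
  --target /tmp/selective-restore
```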
As described in the Backup and restore — NS8 documentation page, the NS8 backup supports multiple repositories (locations). For each repository you can schedule one or more backups with a specific snapshot retention policy.
The full state of every application can be backed up and restored separately. It is not designed to work on single files. For this purpose, at least for the Mail module, Steph is working on a mail archiving prototype based on Piler.
From my point of view, ZFS snapshots are part of the competition NethServer has to face. When it comes to backing up and restoring files for file-sharing services (such as Samba or Nextcloud), ZFS does this really well.
Of course, the snapshots do not include the server config etc., which has to be backed up separately, so it is not a full backup solution. But the config data is small compared to the volume of the files in the file-sharing services, and it changes less frequently. That's why I would like to separate config backups from data backups. And I would really like to use snapshot technology (which today basically any NAS provides) for the data part, because it is efficient.
Snapshot is a tool. A really powerful one.
Whether it is made by the filesystem, the database, or the whole application, it can be used to create a polaroid of the system/file/software to revert to if needed. When the time comes, it needs to be correctly managed (consolidated), or it will eat up storage space if snapshots are kept forever.
But like RAID, virtualization, and containerization, it is not backup, because it cannot be “extracted” from the environment it needs in order to work as intended.
If this strategy (snapshots) and its requirements (ZFS) serve you as you wish and design, that's fine. But just because you find a lathe an effective tool for opening food cans does not mean that can openers are worthless because they cannot do all the things a lathe is useful for.
All I am asking is that NethServer considers providing alternatives, because there are use cases where alternatives are valuable. It really confuses me that this leads to such long discussions.
What is wrong with providing such an optional feature, which NS8 already has for MinIO, for other apps where it could be useful?
Well, I dug into the Podman documentation and found a recipe that should work for any rootless module, more or less.
I assume the disk where we want to store the module data has already been formatted, configured in /etc/fstab, and mounted on /mnt/disk00.
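For completeness, such an /etc/fstab entry could look like this (the UUID and filesystem type are placeholders for your actual disk):

```
# mount the dedicated module-data disk at boot; nofail avoids
# blocking the boot if the disk is missing
UUID=<disk-uuid>  /mnt/disk00  ext4  defaults,nofail  0  2
```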
In general, after installation (creation), module instances are in a stopped state. They require an additional configuration step to start. In this state they have not yet created the volumes where persistent data is stored.
This makes it possible to create the expected volume in advance, with a configuration that bind-mounts an arbitrary path of the node.
Let’s make an example with Dokuwiki. When it is started for the first time it creates a dokuwiki-data volume. Let’s bind it to /mnt/disk00.
# module must have full access to the disk, like its home directory:
chown dokuwiki1:dokuwiki1 /mnt/disk00
chmod 700 /mnt/disk00
# create the named volume, with the name Dokuwiki wants
runagent -m dokuwiki1 podman volume create --opt=device=/mnt/disk00/ --opt=type=bind dokuwiki-data
Now complete the configuration of Dokuwiki from the UI as usual.
It seems straightforward so far, but what happens if I have data in the disk and I want to attach it to the container? For instance, data coming from another Dokuwiki?
In this case there can be a mismatch of uid/gid numbers in the filesystem, and a full remap of file ownership is required. This is a common problem with containers because of uid/gid namespaces, and it is an open issue in this scenario.
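A possible way to do that remap is podman unshare, which enters the module's user namespace. This assumes the container process runs as root inside its namespace (some images use another uid, so check the image first), and reuses the module name and path from the Dokuwiki example:

```shell
# Inside the module's user namespace, uid/gid 0 map to the module's
# subordinate ids on the host, so this rewrites ownership of the
# pre-existing data to match what the rootless container expects.
runagent -m dokuwiki1 podman unshare chown -R 0:0 /mnt/disk00
```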
Thank you for the information! I tried it today and it seems to work.
Do you have an idea how the app's volumes can be displayed while it is unconfigured, so that it becomes clear which volumes can be created as binds to another location?
I tried runagent -m APPNAME podman volume ls, but it only works after the app has been configured and is running.
After the configuration is done, it can be checked with runagent -m APPNAME podman volume inspect VOLUMENAME, which displays the bind volume. The NethServer UI displays the volume as normal.
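To see only the bind target instead of the full JSON, podman volume inspect also accepts a Go template via --format (volume and module names as in the Dokuwiki example above):

```shell
# Print just the host path the named bind volume points at
runagent -m dokuwiki1 podman volume inspect dokuwiki-data \
  --format '{{ .Options.device }}'
```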