NethServer rsync-based procedures

davidep · April 16, 2018, 4:03pm

I’d like to propose unifying the available methods for synchronizing two ns7 instances and for upgrading ns6 to ns7 via rsync.

We developed both the Upgrade with rsync and the HotSync modules on top of Disaster recovery and the rsync command. Looking at how they are used and how they are implemented, I see they have a lot in common, and I think we can merge them into one. For instance, HotSync is said to be designed to quickly recover from a hardware failure:

HotSync aims to reduce downtime in case of failure, syncing your NethServer with another one, that will be manually activated in case of master server failure.

…However, I see it is also used as a tool to migrate to different hardware. This is the same use case as the ns6 upgrade via rsync procedure: migrating an ns6 system to a new ns7 system on different (newer) hardware. This is what the manual says about the rsync upgrade procedure:

The process is much faster than a traditional backup and restore, also it minimizes the downtime for the users.

Furthermore, if the ns6 hardware is still good, I see some people use the two procedures together, following a two-step upgrade path:

temporarily upgrade ns6 to ns7 via rsync on a spare system
use hotsync to move the spare ns7 system back to the ns6 hardware

This is actually equivalent to a live upgrade of ns6 to ns7, which is still under development.

By unifying the two procedures, we can simplify both of them and reduce the risk of errors caused by the large number of manual operations and checks they require today. I think we’re still too far from “simplifying the sysadmin’s life” in this field.

What do you think? Can it be useful? /cc @dev_team

stephdl · April 16, 2018, 10:16pm

The Affa-like “rise” feature is really nice, I love it. I also like the rsync way to migrate a server from ns6 to ns7: I guess we would have fewer problems (at the dev level) with this migration path than with a migration script, where a lot of parameters must be analysed and debugged.

At the moment I don’t know enough to help you decide which one should stay alive; my heart would probably go to hotsync, for two reasons:

nethserver-cockpit-hotsync
rise feature

davidep · April 17, 2018, 8:21am

Thanks for your feedback Stéphane, really appreciated!

I don’t know Affa; it seems to be an old project. Is it still maintained?

I like a separate command for the “raise” operation, too, like hotsync does. I’d like a clear distinction between the two main operations:

sync (runs multiple times)
raise (runs once)
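A minimal Python sketch of that split, assuming a hypothetical marker file (`/var/lib/hypothetical-sync/raised`) and hypothetical function names; it is not how HotSync or ns6upgrade work today. `sync` can be repeated freely, while `raise` runs once and then blocks any further sync that could overwrite the promoted host.

```python
import argparse
from pathlib import Path

# Hypothetical marker file: once it exists, this host has been raised
# and must not be overwritten by further sync runs.
RAISED_MARKER = Path("/var/lib/hypothetical-sync/raised")


def do_sync(marker: Path = RAISED_MARKER) -> str:
    """Sync phase: safe to repeat any number of times before the raise."""
    if marker.exists():
        raise RuntimeError("host already raised: refusing to sync over it")
    # ... pull data from the source host here (e.g. rsync over SSH) ...
    return "synced"


def do_raise(marker: Path = RAISED_MARKER) -> str:
    """Raise phase: runs exactly once, then blocks both sync and raise."""
    if marker.exists():
        raise RuntimeError("host already raised")
    # ... pre-condition checks, network setup, restore, signal events ...
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return "raised"


def main(argv: list) -> None:
    """Tiny CLI front end: `sync` or `raise`."""
    parser = argparse.ArgumentParser(description="sync/raise sketch")
    parser.add_argument("command", choices=["sync", "raise"])
    args = parser.parse_args(argv)
    print(do_sync() if args.command == "sync" else do_raise())
```

Wired to an entry point as `main(sys.argv[1:])`, the admin would run `sync` as often as needed (for example from cron) and `raise` only at switch-over time. Keeping the guard on disk rather than in memory is what makes "runs once" survive a reboot.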

Both ns6upgrade and HotSync are really complex procedures that deal with running systems. They aim to minimize service downtime. They are run in a critical situation for a sysadmin: system downtime. I think our mission is to ease the sysadmin’s life in such critical situations with a simple and reliable tool, no matter how complex it is from the developer’s point of view.

I want to address the specific situations that make life difficult during the raise operation with automation, pre-condition checks, and additional setup.

IP duplication: during “raise” the source and destination have the same IP. As a result, the destination fails to set up the network.
Network card renaming: because of different kernel releases, network devices have different names on 6 and 7, and names can also differ because of different hardware. Manual intervention is required.
(specific to ns6upgrade): the Accounts provider may require a manual AD setup during the “raise” procedure. Many people forget it!
The post-backup-data event must be manually signaled at the end of the procedure. Many people forget it!
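For the network card renaming point, one possible pre-condition check is to match old and new devices by MAC address instead of by name. This only helps when the hardware stays the same (for example, moving the spare ns7 system back to the ns6 hardware); on new hardware every NIC ends up in the "manual" list. The sketch is hypothetical: in a real tool the inputs would come from the source configuration and from the destination's network devices.

```python
from typing import Dict, List, Tuple


def map_interfaces(
    source: Dict[str, str],
    destination: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Map old NIC names to new NIC names by MAC address.

    source:      old NIC name -> MAC (as configured on the source system)
    destination: new NIC name -> MAC (as detected on the destination)

    Returns (old name -> new name, old names that need a manual choice).
    """
    # Index destination NICs by normalized (lower-case) MAC.
    by_mac = {mac.lower(): name for name, mac in destination.items()}
    mapping: Dict[str, str] = {}
    unmatched: List[str] = []
    for old_name, mac in sorted(source.items()):
        new_name = by_mac.get(mac.lower())
        if new_name is None:
            # MAC not present: different hardware, the admin must pick a NIC.
            unmatched.append(old_name)
        else:
            mapping[old_name] = new_name
    return mapping, unmatched
```

The unmatched list is what a pre-raise check would show to the admin before touching the network, instead of letting the raise fail halfway.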

Most of the issues above also affect the Disaster recovery procedure. Maybe we can start by improving it!

About the sync phase, I prefer the ns6upgrade approach over the HotSync one. The destination (aka “slave”) host controls everything, and no additional setup is required on the source system. The only exception is installing an SSH key. I don’t know why HotSync runs a separate rsyncd service. I heard from @Stll0 and @alep that HotSync originally ran over SSH too. I’d like to use SSH. What do you think?
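A sketch of the destination-driven, pull-over-SSH approach, in Python. The rsync and ssh options used (`-aHAX`, `--delete`, `--numeric-ids`, `-e`, `BatchMode=yes`) are standard, but the key path, the `/srv/data` example directory and the function name are assumptions, not what ns6upgrade actually runs.

```python
import shlex
from typing import List


def build_pull_command(
    source_host: str,
    path: str,
    key_file: str = "/root/.ssh/id_rsa",  # assumed key location
    dry_run: bool = False,
) -> List[str]:
    """Build an rsync command, run on the destination, that pulls `path`
    from the source over SSH. Only an SSH key is needed on the source.
    """
    # Trailing slash: copy the directory *contents* into the same path.
    src = path.rstrip("/") + "/"
    cmd = [
        "rsync",
        "-aHAX",          # archive mode + hard links, ACLs, xattrs
        "--delete",       # drop files that disappeared on the source
        "--numeric-ids",  # keep uid/gid numbers, don't map by name
        "-e", f"ssh -i {shlex.quote(key_file)} -o BatchMode=yes",
        f"root@{source_host}:{src}",
        src,
    ]
    if dry_run:
        cmd.insert(1, "--dry-run")  # show what would change, copy nothing
    return cmd
```

On the destination, `subprocess.run(build_pull_command("192.0.2.10", "/srv/data"), check=True)` would perform one sync round. A real tool would pull an explicit list of directories; pointing `--delete` at `/` would be dangerous.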

stephdl · April 17, 2018, 8:33am

Affa is no longer maintained; it is a huge rsync program like hotsync. At the time it was really well implemented in SME Server, hence the rise feature.

Stop the master, launch the rise feature on the backup, and in a few minutes your server is back and functional… the dream of any sysadmin. However, the project now has a ton of issues because the main developer dropped it. A French developer has taken it over, but I don’t know much about it today.

Stop the server before the rise… ideally you want the same IP, so shut down the master.

m.traeumner · April 17, 2018, 8:37am

A question from a non developer:
Can’t you give the new server another IP and, after shutting down the old one, run a script that changes the new server’s IP to the old one?

It would also be nice for permanent syncing:

Create a script with IP settings of the first server
Copy it to the second one
If the first server dies, the admin only has to run the script and it does the changes.
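A minimal sketch of that idea in Python: save the first server's network settings as JSON and render a takeover script for the second one. The `db networks setprop` and `signal-event interface-update` commands are my assumption of how NethServer 7 applies network changes from the CLI and should be checked against the documentation; the interface names must also match the second server's NICs, which may differ.

```python
import json
import shlex
from typing import Dict


def save_settings(settings: Dict[str, Dict[str, str]]) -> str:
    """Serialize the first server's per-interface settings for copying."""
    return json.dumps(settings, indent=2, sort_keys=True)


def render_takeover_script(saved: str) -> str:
    """Turn saved settings into a shell script to run on the second server."""
    settings = json.loads(saved)
    lines = ["#!/bin/sh", "set -e"]
    for iface, props in sorted(settings.items()):
        args = " ".join(
            f"{shlex.quote(k)} {shlex.quote(v)}" for k, v in sorted(props.items())
        )
        # ASSUMPTION: interfaces live in the "networks" e-smith database.
        lines.append(f"db networks setprop {shlex.quote(iface)} {args}")
    # ASSUMPTION: this event re-applies the network configuration.
    lines.append("signal-event interface-update")
    return "\n".join(lines) + "\n"
```

The script is generated ahead of time and copied over, so in the failure case the admin only has to run it after confirming the first server is really off.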

danb35 · April 17, 2018, 9:11am

Not necessarily–there’s a very limited set of circumstances where this is the case, and a much larger set where your target machine is going to have a different IP. Hotsync (or whatever it becomes) really needs to account for this.

davidep · April 17, 2018, 9:39am

To avoid a conflict, a simple IP ping check could do the job… As an alternative, disable the free IP check in the CentOS “network” script (is it possible?)

This is what the current procedures already do! We must ensure the old server is down before changing the IP.
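One way to make sure the old server is down before changing the IP, sketched in Python. Instead of an ICMP ping (which needs raw sockets, usually root), this swaps in a TCP probe to port 22: both an accepted and a refused connection prove that some host owns the address. The function names and the retry policy are hypothetical.

```python
import socket
import time
from typing import Callable


def tcp_probe(ip: str, port: int = 22, timeout: float = 1.0) -> bool:
    """Return True if some host answers at ip:port (open or refused)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        return True   # a TCP reset still means a host owns that IP
    except OSError:
        return False  # timeout or unreachable: nobody answered


def wait_until_down(
    ip: str,
    probe: Callable[[str], bool] = tcp_probe,
    attempts: int = 3,
    delay: float = 1.0,
) -> bool:
    """Require `attempts` consecutive silent probes before declaring ip free."""
    for i in range(attempts):
        if probe(ip):
            return False  # old server still answers: do NOT take its IP
        if i < attempts - 1:
            time.sleep(delay)
    return True
```

A firewall that silently drops packets makes a live host look down, so this check lowers the risk but cannot replace actually powering off the old server.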

Can you give some examples?

filippo_carletti · April 17, 2018, 9:56am

Yes.
ARPCHECK=no
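`ARPCHECK=no` goes in the interface's ifcfg file and tells the CentOS ifup scripts to skip the duplicate-address (arping) check. A small Python helper that sets it in ifcfg-style text might look like this; the function name is hypothetical, and on NethServer the ifcfg files are probably regenerated from templates, so a real implementation would likely need a template fragment instead of a direct edit.

```python
def set_ifcfg_option(text: str, key: str, value: str) -> str:
    """Set KEY=value in ifcfg-style text, replacing any existing KEY line."""
    out = []
    found = False
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            if not found:
                out.append(f"{key}={value}")  # replace the first occurrence
                found = True
            # later duplicates of KEY are dropped
        else:
            out.append(line)  # keep every other line, comments included
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

Disabling the check removes the network scripts' own safety net against IP duplication, so it is best paired with an explicit check that the old server is really down.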

danb35 · April 17, 2018, 12:59pm

Sure, my own setup for example. My Neth server is on a VPS instance, and it hotsyncs to a physical machine at my location. The IP for the VPS won’t work at home. The same is the case if I use the hotsync script to migrate to a different VPS provider.

davidep · April 17, 2018, 1:10pm

I never thought about using HotSync in the cloud. Nice idea, though…

Did you ever try to promote the slave?

giacomo · April 17, 2018, 1:14pm

And this is an unsupported and bad usage scenario; we shouldn’t push it further.

I agree that manual steps should be removed in favor of a wizard inside the UI which examines the machines and takes the preliminary steps. Something like: what IP do you want for the target? Do you want to install AD on the target? Etc…

I don’t agree at all with this: more complexity means more bugs, and more bugs mean that users think the project is not stable enough. We already made the same mistake by merging ns6upgrade into the backup/restore procedure; I don’t want to do it again.

Because different scenarios use different solutions. We had a previous implementation using SSH, but it was very tricky and much, much less secure.

I’d prefer a very different approach:

add a better UI to the hotsync procedure with a list of checks before configuring the server
get rid of rsync-upgrade and implement a minimal in-place upgrade

davidep · April 17, 2018, 1:20pm

Did you mean “ns6upgrade inside backup/restore”?

giacomo · April 17, 2018, 1:28pm

Yes, thank you! Edited

danb35 · April 17, 2018, 1:30pm

How is it a bad scenario? It’s literally what hotsync is designed to do–move the running Neth instance (in as close to real time as possible) onto a different machine.

giacomo · April 17, 2018, 1:57pm

HotSync is designed for NS 7 to NS 7 only, and for a restricted scenario. For example, green + red configurations are not really supported by hotsync.
It supports a limited usage scenario, narrower than the NS 6 upgrade procedure.

Every time I saw it used for an NS 6 migration, users didn’t think about the differences between the procedures, and they failed.
Of course you can use hotsync for any kind of crazy stuff, but the user must have very high skills :(

This is why I prefer to leave things separated.

davidep · April 18, 2018, 11:17am

I think we are in a situation similar to this picture

We shall improve it!

This is the core of the problem, and it’s outside of rsync-based procedures. It affects their core: backup/restore, which also implements the restore from ns6 (upgrade).

The ns6upgrade from backup was discussed here: Upgrade paths to ns7

If we implement a live-upgrade path, we’ll rely on that anyway.

In other words, a development effort is required on backup/restore to simplify both ns7-ns7 and ns6-ns7 procedures.

We recently tried to improve the gateway case:

It’s surely more critical than a standalone server, but I expect it’s widespread because NS is “all-in-one”.

Anything other than cloning by backup/restore amounts to making an “instance of” a template: a new system with slightly different settings (IP? hostname?).

I don’t see the need to support this kind of transformation if we have a procedure that correctly clones a system. It can be tweaked manually at a later step. In other words, if backup/restore (also by rsync) gives back a running system, its “details” can be adjusted manually.

Am I missing something?