Hi mates
I recently had a new wave of curiosity about Proxmox and how to avoid a SPOF (single point of failure). Of course we can back up or rsync our data, but the time needed to bring up a new server could be really long. That is the problem.
I have been a Proxmox user for years now. I run two instances: one at home for dev purposes (even though I have since moved to VirtualBox for dev, it is really fast on SSD) and one online hosting some VMs (Debian, SME Server, NethServer). This runs smoothly, but even with some backups and rsync scripts to get the data out, a recovery could take days.
The power of Proxmox right now is snapshots, and I trust them: before an upgrade, take a snapshot, and if something goes wrong, roll back to the previous state. It is incredible, but I am not sure many people use it professionally. I worry that IT guys still install NethServer directly on bare metal.
So my goal was to test some HA with Proxmox, during my holidays of course.
This is what I read about or tested myself:
- Poor man's HA (easy)
Use two Proxmox nodes in a cluster, each on a ZFS pool, and replicate the VM to the other node. You can schedule the sync every minute if you want. What you will lose is the data written since the last sync. The downside is that if you lose the node running the VM, HA will not kick in and the VM on the replica host will not start by itself, because HA needs three votes to avoid split-brain (the same VM running simultaneously on two nodes). So either you issue three commands to start the replicated VM manually, or you use a QDevice (Debian based, because of corosync 3 in Proxmox 6): a third host, which could probably even be a Raspberry Pi, that provides the extra quorum vote. You can also add a third Proxmox node, but the cost will increase.
- Use a shared storage (good but not satisfying)
Use three Proxmox nodes with remote shared storage over the network, such as NFS. This introduces a single point of failure: even though HA can live-migrate the VM, if you lose the storage, everything is down. The network is also the bottleneck.
- Use a distributed storage like Ceph or DRBD (needs good skills)
This is real HA but the cost increases massively: you need at least three real servers with two NICs each, and the storage NIC should be 10 Gbit/s because the storage is distributed across all hosts. It is a kind of network RAID: if you lose one host, the VM can still run on the other nodes and you have not lost your data. This is really interesting, but you need a deep understanding of what you are doing and good sysadmin training. Ceph is well integrated in Proxmox, and DRBD is supported too, by its own developers.
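To make the quorum point from the poor man's HA option concrete, here is a small Python sketch. It only does the vote arithmetic (a strict majority of the expected votes) and prints the manual failover steps as I understand them. The node names `pve1`/`pve2` and VMID `100` are made up, and the three commands are my assumption of what the manual recovery looks like on a two-node replicated setup, so check them against the Proxmox documentation before running anything.

```python
from typing import List


def votes_needed(total_votes: int) -> int:
    """Strict majority of the expected votes: floor(n / 2) + 1."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_online: int) -> bool:
    """True if the surviving members still hold a majority."""
    return votes_online >= votes_needed(total_votes)


def manual_failover_commands(dead_node: str, live_node: str, vmid: int) -> List[str]:
    """Commands to run on the surviving node (assumed procedure, not official).

    Only sensible once the other node is confirmed powered off, otherwise
    the same VM could end up running twice (split-brain).
    """
    conf = f"qemu-server/{vmid}.conf"
    return [
        # Tell corosync that a single vote is enough for now.
        "pvecm expected 1",
        # Move the VM config to the surviving node so it owns the VM.
        f"mv /etc/pve/nodes/{dead_node}/{conf} /etc/pve/nodes/{live_node}/{conf}",
        # Start the VM from the last replicated ZFS state.
        f"qm start {vmid}",
    ]


if __name__ == "__main__":
    # Two nodes, one lost: 1 vote out of 2 is not a majority.
    print(has_quorum(total_votes=2, votes_online=1))  # False
    # Two nodes plus a QDevice (3 votes), one node lost: 2 of 3 is a majority.
    print(has_quorum(total_votes=3, votes_online=2))  # True
    for cmd in manual_failover_commands("pve1", "pve2", 100):
        print(cmd)
```

This is why the third vote matters: with only two votes the survivor can never tell a dead peer from a broken link, so it refuses to start the VM on its own.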
I am not a sysadmin guy. How do you use Proxmox to avoid a SPOF and get HA?