Proxmox HA chat

stephdl · January 1, 2020, 4:05pm

Hi mates

I recently had a new wave of curiosity about Proxmox and how to avoid a SPOF (single point of failure). We can of course back up or rsync our data, but the time to bring up a new server could be really long. That is the real problem.
I have been a PROXMOX user for years. I usually run two instances: one at home for dev purposes (even though I migrated to VirtualBox for dev, it is really fast on SSD) and one online hosting some VMs (Debian, SME Server, NethServer). This runs smoothly, but even with some backups and some rsync scripts to get the data out, a recovery could take days.

The power of PROXMOX right now is snapshots, and I trust them: before an upgrade, take a snapshot, and if something goes wrong, roll back to the former state. It is incredible, but I am not sure many people use it professionally; I bet/worry/fear that IT guys still install NethServer directly on bare metal.

So my goal was to test some HA with Proxmox, during my holidays of course.

This is what I have read or tested myself:

Poor man’s HA (easy)
Use two Proxmox nodes in a cluster on a ZFS array and replicate the VM to the other node. You can set up the sync to run every minute if you want; what you will lose is the data written since the last sync. The downside is that if you lose the node with the running VM, HA cannot kick in and the VM on the replicated host won’t start on its own, because HA must run with three nodes to avoid split-brain (the same VM running simultaneously on two nodes). So either you issue three commands to start the replicated VM by hand, or you use a QDevice (Debian based, due to corosync 3 in Proxmox 6): a third host, which you could probably even run on a Raspberry Pi, that provides the missing quorum vote. You can also add a third Proxmox node, but the cost will increase.
Use shared storage (good but not satisfying)
Use three Proxmox nodes with remote storage shared over the network, like NFS. This introduces a single point of failure: even if HA can migrate the VM online, if you lose the storage, everything is down. The network is the bottleneck.
Use distributed storage like Ceph or DRBD (needs good skills)
This is real HA but the cost increases massively: you need at least three real servers with two NICs each, and the storage NIC should be 10 Gbit/s, because the storage is distributed between all hosts. It is a kind of network RAID: you lose one host, the VM can still run on the other nodes, and you have not lost your data. This is really interesting, but you need a deep understanding of what you are doing and good sysadmin training. Ceph is well integrated in Proxmox; DRBD is too, by its own developers.
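
For the first option, the moving parts could look roughly like the sketch below. It is only an illustration, not a tested procedure: the VMID, node names and QDevice address are made-up placeholders, the three failover commands are my reading of the "three commands" mentioned above, and the wrapper just prints each command unless DRY_RUN=0 is set. The QDevice step also assumes corosync-qnetd is installed on the external host and corosync-qdevice on both nodes.

```shell
# Hypothetical values, adjust to your own cluster.
VMID=100                # guest to protect (assumed)
TARGET_NODE=pve2        # node receiving the replica (assumed name)
FAILED_NODE=pve1        # node that was running the guest (assumed name)
QDEVICE_IP=192.0.2.10   # external host running corosync-qnetd (assumed)

# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

setup_poor_mans_ha() {
    # Replicate the guest's ZFS volumes to the other node every 15 minutes.
    run pvesr create-local-job "${VMID}-0" "$TARGET_NODE" --schedule "*/15"
    # Add the external quorum vote so one surviving node still has a majority.
    run pvecm qdevice setup "$QDEVICE_IP"
}

manual_failover() {
    # Without a third vote: force quorum, move the guest config, start the replica.
    run pvecm expected 1
    run mv "/etc/pve/nodes/$FAILED_NODE/qemu-server/$VMID.conf" "/etc/pve/nodes/$TARGET_NODE/qemu-server/$VMID.conf"
    run qm start "$VMID"
}

setup_poor_mans_ha
manual_failover
```

Run as-is, it only lists the commands, so you can review them before trying anything on a real node.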

I am not a sysadmin guy. How do you use Proxmox to avoid a SPOF and get HA?

robb · January 1, 2020, 6:27pm

When going down the cluster path for Proxmox, we enter the more professional setups for a virtualization platform. And I like that!
As soon as you have multiple nodes for HA purposes, you also need quorum data storage. This quorum disk is absolutely necessary if you want to do live migration of VMs from one server to another.
Reading a bit on the Proxmox forums and wiki, it shows their documentation needs some serious updating…
This discussion might be of value here: https://forum.proxmox.com/threads/pve5-and-quorum-device.37183/
Unfortunately I can’t try or test HA since I don’t have a 2nd server… :-/

fausp · January 1, 2020, 7:52pm

You could install a virtual 3-node HA cluster with Ceph storage (HCI), just for testing. I did it for my A-level exam… You have to configure nested virtualization on the one Proxmox host that runs the three other Proxmox servers (the cluster)…

edit: PROXMOX VE WITH CEPH – HYPER-CONVERGENCE

flatspin · January 2, 2020, 10:15am

I absolutely appreciate this topic, as I’m also using Proxmox.
I’m using 2 real servers. One is a little Fujitsu T100 and the other is a newer X200 S8.
The slave runs on the T100 and the production machine on the X200.
Sync interval is 15 min. Losing the last 15 min is OK in our case.

I had to do a disaster recovery and had some trouble, but within about 2 hours I had a rudimentary working environment back on the T100. No data was lost. Some hours later, and with a little help, I also had vmail, SOGo and some other stuff back. That environment worked for 3-4 days until I had the X200 repaired. Then I synced everything back and got the main machine back in production.
Real downtime was only about 2 hours. For my use this is really OK in relation to the costs.

But some time ago @wahmed was active in our community. I think he can give the most profound insight into this topic, as he is the author of “Mastering Proxmox”.

robb · January 2, 2020, 10:33am

On the Proxmox forums and wiki I saw they used an RPi as 3rd server so that full-blown failover services could be added…

https://pve.proxmox.com/wiki/Raspberry_Pi_as_third_node

stephdl · January 2, 2020, 12:09pm

Yes, but the warning not to use a Raspberry Pi in a production environment is not pushing me that way.

Like @fausp wrote, everything can be tested on Proxmox itself; the server must support nested virtualisation and all drivers/hardware must be virtio.

I know that Proxmox has made a cluster simulator that you can install to test it.

wahmed · January 7, 2020, 12:29am

@flatspin, I still keep tabs on the NethServer forum although I don’t say much.

Proxmox HA has come a very long way since I first started using Proxmox years ago. It is so much easier now. But I have to say I am personally not a fan of it in a large environment. It causes too much unnecessary shifting of VMs. Of course I am strictly talking about the Proxmox HA feature. If I am not mistaken, with the latest Proxmox release there is no need for a 3rd node for fencing.

As for general storage HA, which @stephdl is talking about if I am understanding it right, Ceph is in my opinion the best distributed, shared storage for VMs. Again, it works really great in a large environment. For a small environment, down to a 2-node Proxmox cluster, I believe Replication works great. No need for shared storage, extremely budget friendly. Downtime is also minimal and it works out of the box. Since the replication can be scheduled every 5 minutes, the amount of lost work is kept very small. Using RAIDZ2/Z3 locally on 2 nodes plus Replication, SPOF is greatly reduced.

We use Proxmox snapshots strictly for versioning purposes, as they never get backed up with a Proxmox full backup. Mostly for testing updates/patches on live VMs.

flatspin · January 7, 2020, 8:00am

Thanks for your reply. Good to know that you’re still around and hearing us.

fausp · February 29, 2020, 11:57am

Hi,
A few days ago I asked about a 2-node cluster (2 servers and 1 NAS) for small networks/budgets…
For HA you always need 3 nodes, but the 3rd node doesn’t have to be a full server. It just has to run the corosync-qnetd daemon for the quorum.
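
To make the vote counting concrete, here is a small sketch. It assumes a plain majority rule with one vote per node (and one for the QDevice), which is my understanding of the default setup; custom vote weights would change the numbers.

```shell
# Votes needed for quorum under a simple majority rule: floor(n/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# How many votes can disappear before quorum is lost.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for votes in 2 3 4 5; do
    echo "votes=$votes quorum=$(quorum_needed "$votes") tolerated=$(tolerated_failures "$votes")"
done
```

With only two votes, losing either one breaks quorum; two servers plus the QDevice vote give three votes, of which two are enough.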

Another thing is the single point of failure when the VM disks are on the NAS. It is possible to use replication between the two servers, but you have to use ZFS on them. I hope my translation from German to English is good enough.

Please have a look:
Cluster mit 2 nodes? (“Cluster with 2 nodes?”)

stephdl · February 29, 2020, 12:10pm

Yes, Ceph is the key, but it takes three real servers.

fausp · February 29, 2020, 12:20pm

Yes, a 3rd server and the knowledge about Ceph are what keep me at a distance…

stephdl · February 29, 2020, 12:33pm

In that case I think a cluster of two Proxmox nodes with a ZFS file system and replication is the easier way; you can even find an old desktop/laptop to make the second node.

Andy_Wismer · February 29, 2020, 12:50pm

Hello Guys

As most know, I am a longtime user of Proxmox by now. I use it mostly for my clients; all my NethServers out there are acting as AD, and all are now running in Proxmox. None are running natively. I also have one at home for daily use, and one for dev stuff.

Just to warn people:

The 3 Node count for Proxmox HA is still valid - and that means three nodes up and running at least. For a node to be able to fail, and still have full HA abilities, you NEED 4 nodes!

With three nodes, if one fails - or you shut it down for maintenance (increasing RAM), you’ll find out that your HA Cluster has no Quorum anymore, and NO modifications are allowed anymore.
You can’t start any VM, nor can you stop any.
To do so, you need to use the sledgehammer method:

pvecm expected 1

This sets the expected Quorum count to one, which makes that host usable again.
However, your cluster will now be split - the other nodes will not get any information anymore.

Proxmox uses a cluster filesystem in cluster mode. All nodes have exactly the same contents, status etc. This cluster filesystem is now only current on one Host.

Anyone can imagine that is not really where you want to go…

Proxmox HA Stats:

Running a full 4-node cluster on 7-year-old HP ProLiant ML110 G7s. Local disks are ONLY for the Proxmox OS; all storage is on a NAS (Synology DS1819+ with 4 disks in RAID10 - WD Red Pro).
The NAS is connected with BONDING; the 4 Proxmox hosts only use one NIC each. Not ideal, but OK for a productive test.

A node failover (For a Guest OS running in Proxmox) takes roughly 90 Seconds!

This can be Windows, Linux or almost anything.

The real issue for migration/failover is the amount of RAM allocated to that VM, and whether it’s running (RAM content must be moved between both Proxmox hosts). These values can be massively improved by using more current hardware, SSDs in the NAS (and server, though they’re only used for the Proxmox OS there…), BONDING the main LAN connection of Proxmox, separating the cluster network and maybe adding a separate storage network. Faster 10 Gbit NICs and BONDING would be really nice!
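
A back-of-the-envelope sketch of why RAM size and link speed matter for live migration. This is only a rough lower bound under simplifying assumptions: it ignores protocol overhead, memory pages that change during the copy, and anything storage related, and the function name is made up for the example.

```shell
# Rough lower bound in whole seconds:
# RAM in GB * 8 bits per byte / link speed in Gbit/s.
ram_copy_seconds() {
    ram_gb=$1
    link_gbit=$2
    echo $(( ram_gb * 8 / link_gbit ))
}

echo "8 GB over 1 Gbit/s:  $(ram_copy_seconds 8 1) s"
echo "8 GB over 10 Gbit/s: $(ram_copy_seconds 8 10) s"
```

On paper, the same 8 GB guest drops from about a minute to a few seconds when moving from a single 1 Gbit link to 10 Gbit.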

Cluster or HA Cluster?

I would always build a cluster if there is more than one Proxmox server, and if at least 4 are available, I’d use an HA Cluster.

Proxmox is great, and using it with or without HA has massively improved my clients’ uptime!

PS: NAS storage can also be replicated; Synology has HA built in as well. That would eliminate the storage SPOF on the NAS (both units use the same IP).
However, I’d personally prefer a ZFS or CEPH Cluster. I’d fear file corruption on a NAS which doesn’t “know” WHAT those files are being used for!

My 2 cents
Andy

stephdl · February 29, 2020, 5:24pm

Yes, the single point of failure of your configuration is the NAS (whether Synology-like or FreeNAS). I always wonder which issue I would actually solve with HA, and the hardware issue is probably not the worst one. I mean, with four nodes to run the VMs I can lose one, but if I lose the NAS, and moreover the data through a failing array, everything goes down. In French: touché coulé (hit and sunk).

So for me real HA needs distributed storage. Did you implement it?

However, your testimonial is really interesting; you did fun things with Proxmox.

Andy_Wismer · February 29, 2020, 5:50pm

Hi Stéphane

I have two clients who use modern, fast and powerful servers: 3 HP Proliant DL380 G10, two of them have 256 GB RAM, one has 128 GB RAM.

At the time, CEPH was still flaky, so I used ZFS.

The Proxmox Backup is done via a Storage Network to a Synology NAS, which itself backs up to another NAS and to local USB drives. Backup is at least 7 daily generations, and on three separate pieces of hardware each night, so Backup is covered! Next step would be VM Backups also Out of House, like the Data Backup.

There is a separate Cluster Network. We have two large managed Switches, redundant Main Power and two UPS, controlled by two Raspberries.
Each UPS powers half a server…

We’ve had hardware issues, but NO headaches. One Server had the second batch of RAM delivered later, we could shut that server down without thinking, put in RAM and fire it back up.
We did have to move back the VMs manually, but with 60 seconds live migration times, no one notices.

The following may answer a lot of questions users have when running Proxmox…

When using Proxmox and shared storage, what happens when one of them reboots (say a Synology with automatic updates…)?

About 15 clients of mine use this kind of setup: one or two, rarely three Proxmox, storage on a Synology, Backup on another Synology. Both Synologys are prepared to take over “Prime Storage”. And both have automatic updates…

So what actually happens?
I mean: the VMs’ hard disks just disappear for a minute or two?

Yes, but I can assure everyone here: Proxmox caches the Data locally VERY intensively. Depends on RAM and Swap of that server, too. The short 2-3 minutes a Synology takes to update and reboot is NO Problem!!!

Even SQL Servers are consistent.

But don’t try rebooting a VM at the moment your Storage is down!
Then it’s a goner, and you get to verify how good your backups are!

In Short: 3 Years with Synology as Shared Storage, roughly 3-4 Updates and Reboots / Year and NEVER any Problems!

I’d NEVER want to miss Live Migration again!
Even without HA, it’s worth making a cluster!

You have NO idea!!!

But it’s amazing WHAT one can do!
Didn’t succeed with setting up an old Novell Netware 6.5, the last version which runs very well on ESXi and Xen. But as it seems, no one did so far…

My 2 cents
Andy

danb35 · February 29, 2020, 6:30pm

Do you know of a “for dummies”-level guide on this? I’ve been running Proxmox for a few years now on two separate systems (well, two blades of a Dell C6100 system), but it’s just two separate instances, both using a combination of local ZFS storage and NFS to a FreeNAS box. It’d be nice to have them working together.

Andy_Wismer · February 29, 2020, 6:45pm

@danb35

Hi Dan

No guide is actually needed!

The real advantage is fast live migration; a second one is that Storage and things like Backups only need to be defined once. And a redundant Web Interface: you can log in to any Proxmox and have full command of your cluster!

Actually, you only need to set up the cluster. There are no further requirements, except that both Proxmox have to “see” each other on the network. A form of Shared Storage helps to achieve fast live migration times, and also for Backups…

Here’s the code needed, according to my personal “cheat sheet” for Proxmox:

Note: VMs on the first cluster member can remain. Move all VMs off node2 via backup first, or read below…

Create the cluster

Log in via ssh to the first Proxmox VE node. Use a unique name for your cluster; this name cannot be changed later.

Create:

hp1# pvecm create YOUR-CLUSTER-NAME

To check the state of cluster:

hp1# pvecm status

Adding nodes to the Cluster

Log in via ssh to the other Proxmox VE nodes. Please note, the nodes cannot hold any VM. (If they do, you will get conflicts with identical VMIDs - as a workaround, use vzdump to back up, and restore to a different VMID after the cluster configuration.)

WARNING: Adding a node to the cluster will delete its current /etc/pve/storage.cfg. If you have VMs stored on the node, be prepared to add back your storage locations if necessary. Even though the storage locations disappear from the GUI, your data is still there.
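
To spot the VMID conflicts mentioned above before joining, something like this sketch could compare the ID lists of two nodes. The file names and the helper function are made up for the example; the commented command shows one assumed way to produce such a list on each node.

```shell
# Print every VMID that appears in both files (one VMID per line in each file).
vmid_conflicts() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# On real nodes the lists could be produced like this (VMs and containers):
#   { qm list; pct list; } | awk '$1 ~ /^[0-9]+$/ {print $1}' > node1.ids
# Demo with made-up IDs:
tmp=$(mktemp -d)
printf '101\n102\n201\n' > "$tmp/node1.ids"
printf '102\n201\n301\n' > "$tmp/node2.ids"
vmid_conflicts "$tmp/node1.ids" "$tmp/node2.ids"
rm -r "$tmp"
```

An empty result means the two nodes use disjoint VMIDs.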

Add the current node to the cluster:

hp2# pvecm add IP-ADDRESS-CLUSTER

For IP-ADDRESS-CLUSTER use an IP from an existing cluster node.

To check the state of cluster:

hp2# pvecm status
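
If you want to script that status check, a small helper along these lines might do. It assumes the status output contains a line starting with "Quorate:" followed by Yes or No, as I have seen it; the function name is made up.

```shell
# Succeeds (exit status 0) when the text on stdin contains a "Quorate: Yes" line.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# On a real node: pvecm status | is_quorate && echo "quorum OK"
# Demo with a hand-written excerpt instead of live output:
printf 'Nodes:            2\nQuorate:          Yes\n' | is_quorate && echo "quorum OK"
```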

A reboot of both nodes helps!
That’s it!

After reboot, you can click on ANY host, and use migrate…
Note: The target Proxmox should have RAM and space for the VM… (!)

The rest works as you’d expect from Proxmox.

Removing a node (defective, old age, replacement) should be done strictly according to the Proxmox manuals - not forgetting the “nodes” folder in /etc/pve…
(That causes the removed node to still show up in the GUI)

A few more tips:

This also should be set:
nano /etc/vzdump.conf
tmpdir: /tmp

→ Improves Backup times to NAS

This may be needed in case your cluster has no quorum (e.g. after removing a node…)

pvecm expected 1

If quorum gets lost, the cluster filesystem becomes read only.
After running pvecm expected 1 on the remaining node, the cluster filesystem becomes read/write again!

In Cluster mode, Backups “wait” for another backup running on the same host. Different hosts can do simultaneous backups - if your NAS is fast enough. Otherwise it still works, can be a bit slower, though FreeNAS should be ok from a performance view. Depends also on Disks or SSDs inside…

Another good idea is using say 1xx VMIDs for VMs, and 2xx VMIDs for Linux Containers.

My 2 cents
Andy

danb35 · February 29, 2020, 6:57pm

What if the VM IDs don’t overlap? The first system has IDs of 101+; the second start with 201.

Also, as I mentioned, most of my VMs are on local ZFS storage. I’m using the pve-zsync script to back them up to my FreeNAS box. If I implement/use “a form of shared storage”, I’m pretty sure I wouldn’t be able to continue backing up the VMs in that way.

And speaking of shared storage, given that I’m using a FreeNAS box for that storage, is there any good option other than NFS? As I understand it, an iSCSI extent can’t be safely used by more than one client machine at a time.

Andy_Wismer · February 29, 2020, 7:11pm

I’d do a backup of all VMs on node 2 BEFORE entering it into the cluster.
Better to have one backup too many, than one too little!

The storage config of node2 is lost, but all data are kept in situ, so you only need to redo the config files for the VMs. I do like being lazy here, and simply restore them with a VMID I can choose (often these installations weren’t planned for the extent they grow to; this gives an option to consolidate VMIDs).

NFS NAS connection:
I’d keep fingers away from “shared” iSCSI. It’s possible, but you’d need SAN grade Storage, not NAS. And I just don’t have that kind of budget for that!
Besides which:

A few stats for comparison, the so called “Reality check”!

At a doctor’s practice, we set up a new HP Server ML380 Gen10 with 64 GB RAM as main Proxmox; the old Supermicro with 32 GB RAM is a second node, not HA.
Storage is on a 4-Bay Synology, with 4 2TB SSDs in RAID10. (Cold Spare in cupboard) 8GB RAM.
Backups are on a 6-Bay Synology, in RAID5 with Hot-Spare. 32 GB RAM. This one also syncs Backups home to the doctors, where another Synology is waiting.
NAS are 2-NIC Bonded, the new HP has bonded NICs to the Switch. Switch is a managed Cisco.

We installed Win10 from an ISO Image just for fun (we had golden masters ready), simply because the system was so fast that I wanted to see HOW fast.
To be honest, it set up Win10 on the Storage NAS faster than a new Notebook with local SSD storage!!!
OK, the Notebook didn’t have an ISO Image via NFS, it had a DVD in the local drive.

As usual: It’s NEVER a good idea to have Storage and Backups on the same Disks!
But if I have 2 NAS, I prepare both for Storage AND Backups.
With more than one Proxmox and NAS, Backups can be parallelized…

As to clustering your Home Proxmox setup: I’d say invest those 15 minutes! Set up your Cluster and enjoy a fast migration, even if it’s just for testing!

Don’t let Synology Backup touch the DiskImage folder! Use ONLY Proxmox Backup - if you value non-corrupt VM images!

NFS is actually quite performant…

My 2 cents
Andy

danb35 · February 29, 2020, 7:24pm

Well, it’ll be more than 15 minutes’ work in any event, but that doesn’t mean it can’t or won’t be done. Both systems have 10G network connections, as does the FreeNAS box.

But this leads to another follow-up question: Currently, these nodes run most of the VMs locally–the primary one has a two-disk ZFS mirror; the second node has a single-disk ZFS pool. If I’m going to cluster them, and put all the virtual disks on shared storage, I don’t have any need for that much local storage, so I might as well (as you’ve done) put the OS on a SSD rather than on spinning rust. But if I’m going to do that, I’m thinking I’d be just as well off (if not better) backing up all the VMs, doing a clean install to SSD for node 1, then node 2, then restore. Thoughts on this idea?