Proxmox HA chat

@oneitonitram

Life is learning. As long as you live - ideally!
And there’s always room for improvement.

“Carved in stone” is a saying, and it’s a good approach for e.g. archives meant for long-term storage.
Our present digitized storage doesn’t last even ten years; the ancient Egyptians actually carved a lot of know-how in stone, and it’s still mostly all there. You only need to be able to read ancient Egyptian!

But other stuff in life is a constant adaptation and improvement…

My 2 cents
Andy

@oneitonitram

Snapshots: not necessarily on the main server. As @danb35 said, the snapshot is stored where the VM “lives”, and that can be, and often is, shared storage (NAS, SAN, Ceph…).

Also: shared storage for VMs is usually faster than storage for backups…

@oneitonitram

Live Migration on Proxmox:

  • 2 clicks in GUI
  • 90 seconds real time
  • Still accessible in that time - users don’t notice!
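For reference, the same live migration from the CLI (a minimal sketch; VM ID 100 and target node pve2 are placeholder values):

# Live-migrate VM 100 to node pve2 while it keeps running
qm migrate 100 pve2 --online

# Check the VM afterwards
qm status 100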

What Headache? :slight_smile:

@danb35 just confirmed it at his home setup!

Proxmox requires 1-2 reboots a year due to updates. With an HA setup, all VMs are migrated live, using rules, groups, fencing and whatever is available (it is quite easy and fast to set up).
No need to even think about rebooting a Proxmox host!

Now I’m reaching a point of ignorance–given that I have a three-node cluster, is there anything else I need to do to make HA work? Can I just reboot node 1 and have everything migrate off it before it reboots, and then back onto it after it’s up again? Or is there other configuration needed to make this happen?

@danb35

That’s what high availability is all about. Besides maintenance, the main reason is hardware failures. At 02:00 in the morning, when Murphy plans most outages, there’s usually no one at the helm. The system is on autopilot, so to say.

Autopilot needs a few simple rules to define what happens, and when, if a node goes down (live migration in the case of planned downtime).

You do need to think about it - more so if your Proxmox nodes differ in RAM / CPU power. In your case, all 3 are equal, which saves trouble and planning. Your fencing rules can be kept simple…

Fencing

Fencing is an essential part of Proxmox VE HA (version 2.0 and later); without fencing, HA will not work. REMEMBER: you NEED at least one fencing device for every node. Detailed steps to configure and test fencing can be found in the Proxmox wiki (see the link below).

Another option is grouping stuff, for example by priority or resource requirements…
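Roughly what those groups and rules look like on the command line (a sketch; the group name, node names and VM ID are made-up examples, and the same can be done in the GUI under Datacenter → HA):

# Define an HA group with node priorities (higher number = preferred node)
ha-manager groupadd prefer-node1 --nodes "pve1:2,pve2:1,pve3:1"

# Put VM 100 under HA management, pinned to that group
ha-manager add vm:100 --group prefer-node1 --state started

# See what the HA stack is doing
ha-manager status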

This is one of the most important settings for achieving this:

[Screenshot: start/shutdown order, delay and timeout settings]

The startup delay prevents overloading I/O at host boot time, and the shutdown timeout is for shutting down (think UPS!) when an app is hanging or blocking the shutdown…
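On the CLI this maps to the VM’s startup option; a minimal sketch with placeholder values (VM 100, start first, 30 s delay before the next VM starts, 180 s shutdown timeout):

# Start this VM first, wait 30s before the next one, allow 180s for a clean shutdown
qm set 100 --startup order=1,up=30,down=180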

Have a look at the options; I think you’d get along with this really fast.

You’re driving a Ferrari now, Dan, and it’s time to shift to fourth gear! :slight_smile:

See also:
https://pve.proxmox.com/wiki/High_Availability_Cluster
Note: It will work well with only one network, but a separate cluster network IS better!

1 Like

BIOS in Proxmox VMs

The BIOS option in Proxmox offers more than just the simple choice of MBR boot…

You can use a Proxmox VM as a test client for PXE boot environments; you can integrate any VM into FOG Project easily with this - if, e.g., imaging/cloning outside of Proxmox is needed.

The usual BIOS Options are available, if needed.

This works very well; I use a VM for testing and developing PXE environments…
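Making a VM PXE-boot is basically just putting the network device first in the boot order (a sketch; the VM ID and device names are placeholders, and the exact --boot syntax differs between older and newer Proxmox versions):

# Boot from network first, fall back to the first disk (newer "order=" syntax)
qm set 100 --boot "order=net0;scsi0"

# Older syntax: n = network, c = disk
# qm set 100 --boot nc --bootdisk scsi0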

Proxmox and HP ROK Servers (Or Dell)

If you buy a new HP or Dell server, there’s often the option to get a discounted MS Server license thrown in. If needed, this can be a good discount.

HP calls this ROK; Dell has its own name for it.
Proxmox can support this quite easily!

The trick is installing dmidecode:
apt-get install dmidecode

dmidecode > dmi.txt
(This gives you the hardware information the license is tied to, ready to copy and paste into Proxmox.)
TIP: search for UUID…

Handle 0x008A, DMI type 1, 27 bytes
System Information
Manufacturer: HPE
Product Name: ProLiant ML350 Gen10
Version: Not Specified
Serial Number: CZJ123456H
UUID: 36373738-3132-5A43-4A39-123456789548
Wake-up Type: Power Switch
SKU Number: 812345-123
Family: ProLiant

The info is all there and labeled.
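Proxmox lets you inject those values into the VM’s virtual SMBIOS (type 1) fields, which is what ROK licensing typically checks. A rough sketch using the example values above (VM ID 100 is a placeholder; note that newer PVE versions want string fields base64-encoded, with base64=1, when they contain spaces):

# Copy the host's DMI "System Information" into VM 100 (example values)
qm set 100 --smbios1 "uuid=36373738-3132-5A43-4A39-123456789548,manufacturer=HPE,product=ProLiant ML350 Gen10,serial=CZJ123456H,sku=812345-123,family=ProLiant"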

Note: this works very well!

1 Like

…and playing with the LXC containers has me thinking about having one or more exposed to the Internet, which has me playing with HAProxy on my pfSense box… So much to learn, so little time…

Best thing since sliced bread for me, with their wildcard Let’s Encrypt and ACME DNS-based authentication & auto-renewal (no exposed port 80), all from the web GUI. I’ve been introduced to tons of cool technology by Lawrence Systems, like XOA/XCP-NG, and he recently produced a video specifically about this subject.

1 Like

Yes, I know–I haven’t watched that one yet, but I have watched its predecessor. I tend to avoid wildcard certs, though (although I use DNS validation for almost everything), preferring individual certs for specific services–and the method he described in the prior video just didn’t do the job–but I think I have that working now. Just not clear why I can access one of the systems from inside my LAN, but not the other–if I VPN to somewhere outside my LAN, I can access both easily.

Even stranger: openssl s_client connects to both and negotiates TLS without a problem, but a web browser only connects to one.
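For reference, the check looks roughly like this (the hostname is a placeholder); -servername matters because it sends the same SNI a browser would, which is worth ruling out when a browser and s_client disagree:

# Test TLS with explicit SNI and show the certificate chain
openssl s_client -connect service.example.com:443 -servername service.example.com -showcerts </dev/null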

I really appreciate your work and that you share your experience about Proxmox (and much other stuff). I installed it some time ago due to your recommendation, and now I plan to replace my ESXi with another Proxmox to get a second node.

What do you think about opening a wiki page for your Proxmox tutorials? That way we would have all the information on one page, and you could easily add new things on demand.
If someone has a question, you can link to a wiki paragraph and, if needed, add a new one.

I am afraid that much of your expert knowledge gets split up and lost in the depths of the forum.

4 Likes

@mrmarkuz

Good idea!
As NethServer also uses KVM underneath, like Proxmox, a lot of info is also valid for NethServer Virtualization.

How to start?

Thx
Andy

I’d recommend a howto topic here or a page in the wiki. It depends on your favourite wiki syntax.

https://www.dokuwiki.org/wiki:syntax

1 Like

@mrmarkuz

Need to read through it a bit, but that’s for the morning, after coffee!

:slight_smile:

1 Like

@mrmarkuz

Hi Markus

Do put both Proxmox hosts into a cluster; that way, with two nodes and shared storage, you get a live-migration cluster.
Live migration time, depending on the RAM size of the VM: approx. 90 seconds!
Note: This value is for a Proxmox host with a single (or bonded) 1 GbE network, with the NAS (Synology) bonded. It gets better if using a separate cluster network.
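Forming the cluster itself is just two commands (a sketch; the cluster name and IP are placeholders - run the first on node 1, the second on node 2):

# On the first node: create the cluster
pvecm create homecluster

# On the second node: join it, pointing at node 1
pvecm add 192.168.1.10

# Verify membership and quorum
pvecm status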

Dan is now using a full Proxmox cluster; I think he loves the fast live migration and is in the process of moving to a full Proxmox HA cluster. :slight_smile:

My second Proxmox at home is in operation; it is dedicated to testing (like a second NethServer environment, where NethServer is my firewall…)

Andy

2 Likes

Hey Andy, hope you are having a good weekend. Based on what I am reading in the Ceph documentation, I believe I can prepare part of a disk using ceph-disk prepare, specifying just a partition, and tie the remaining part of the disk to the VM. I did run into something interesting on one of my nodes, though: I wasn’t seeing one of my SSDs when I went to create an OSD, but if I do an fdisk -l I see it, so I am looking into that.
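Roughly what I have in mind (a sketch; /dev/sdc2 is just an example partition - and note that in recent Ceph releases ceph-disk has been replaced by ceph-volume):

# Create an OSD on a single partition, leaving the rest of the disk for the VM
ceph-volume lvm create --data /dev/sdc2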

@sektor

Hi David

Doing a P2V migration right now - and since neither real nor virtual PCs have a crank to make them run faster, I’m just letting the 4 VMs “do their work”. Using Clonezilla for this project: step one, cloned to NAS; now 3 restores to Proxmox are done, and I’m removing unneeded drivers from the first 3. Number 4 is still restoring. As it’s nighttime here, the Proxmox hosts are also doing backups, so everything is a bit slow…

At your end you’ve now got Proxmox on all 3 hosts, and also a “virtualized” Proxmox (node 4)?

The weekend was great, only Sunday evening got a bit stormy - not much rain hereabouts yet, much more in the mountain areas!

Multitasking:

  • Resizing C Disk
  • Cleaning up unneeded drivers / software
  • Booting Win10
  • Restoring with Clonezilla

Andy

2 Likes

Aren’t P2V migrations fun! I am a little stumped right now: my physical host 2 saw all hard drives during install, but this one was marked as not usable when I went to create a ZFS pool on it, just as I had done with the other hard drives.

For some reason, nowhere in the Proxmox console is the drive usable. See below:

[Screenshot: /dev/sdc is missing]

Output from fdisk -l /dev/sdc
Disk /dev/sdc: 447.1 GiB, 480103981056 bytes, 937703088 sectors
Disk model: KINGSTON SUV5004
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Picture from the Disks section of the console. Notice how the usage shows “partitions”, but as noted above there is nothing on it.

try a reboot? :slight_smile:

Yes I did; I even reinstalled the whole OS. A night owl much? It is about 03:30 there, and we had bad weather early Friday morning.

Oh, just an FYI: I’ve had Proxmox on all 3 physical hosts and was working on installing it as a virtual machine, but then I ran into this conundrum with the one SSD on my 2nd host, so I reloaded Proxmox on that server - not only because of that, but because it was behaving a little weird.

Now this is interesting: I read on the Proxmox forum that someone had a similar issue, and this is what lsblk shows.

lsblk /dev/sdc
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdc 8:32 0 447.1G 0 disk
└─sdc2 8:34 0 4.3G 0 part

Figured it out: apparently it had an old Atari partition-table signature for some odd reason, so once I wrote to the disk with fdisk (without creating a partition), the disk was usable. Now on to figuring out Ceph; a decent document would be useful. LOL.
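For next time, such stale signatures can also be inspected and cleared directly (a sketch; double-check the device name, the second command is destructive):

# List whatever signatures are still on the disk
wipefs /dev/sdc

# Wipe all filesystem / partition-table signatures (destructive!)
wipefs --all /dev/sdc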