VM start fails after Proxmox update

Hello, I have installed the update to pve-manager/6.2-6/ee1d7754 on two nodes (Community Edition). Since then, the VMs won’t start anymore, neither automatically nor manually.
But in the task log the status is “OK”.

Node 1 with subscription remained untouched on pve-manager/6.2-4/9824574a.
What can I do to get the VMs running again?
Which information is still needed?

Best regards, Marko

You could boot into an older Proxmox kernel until you/we can find the problem?

@capote

Hi Marko

I have a few “No-Subscription” Proxmox installations around, and my official Home-Proxmox (with subscription). Versions are the same as yours: 6.2-6 on No-Subscription, 6.2-4 with subscription.

At the moment, all VMs start as expected.

If you move a non-starting VM (for testing) to the Proxmox 6.2-4 node, will it start without issues?

Are there any relevant log entries for Proxmox / QEMU?
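On a default install, these are the first places I would check (service names and paths assumed from a stock Debian-based PVE setup):

journalctl -u pvedaemon -u pveproxy -u pvestatd --since "1 hour ago"
journalctl -u pve-cluster --since "1 hour ago"

The per-task logs live under /var/log/pve/tasks/, and general messages end up in /var/log/syslog.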

My 2 cents
Andy

Hi Andy,
Simple and stupid question… I don’t know how to move a VM without migrating, and migrating is not possible.
I tried to delete the VM100 - no success and no error messages.
Then I tried to restore the VM100 (Zabbix-Server) VM from Backup. That produced the error “unable to restore VM 100 - can’t lock file ‘/var/lock/qemu-server/lock-100.conf’ - got timeout (500)”
The second attempt, immediately after the error message, restored the VM, and VM100 started fine.

In the next step I tried to restore VM200; it was finally successful, but with some hiccups:

restore vma archive: lzop -d -c /mnt/pve/VZDump-Backup/dump/vzdump-qemu-200-2020_06_16-00_00_03.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp4661.fifo - /var/tmp/vzdumptmp4661
CFG: size: 701 name: qemu-server.conf
DEV: dev_id=1 size: 34359738368 devname: drive-scsi0
CTIME: Tue Jun 16 00:00:04 2020
error during cfs-locked 'storage-Disk-Images' operation: got lock timeout - aborting command
Formatting '/mnt/pve/Disk-Images/images/200/vm-200-disk-0.qcow2', fmt=qcow2 size=34359738368 
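(For reference, the equivalent one-step restore from the shell would be something like the following - the archive path and storage name are taken from the log above, and --force is only needed because a VM with that ID already exists:)

qmrestore /mnt/pve/VZDump-Backup/dump/vzdump-qemu-200-2020_06_16-00_00_03.vma.lzo 200 --storage Disk-Images --force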

I am curious to know what the cause might have been, so I can prevent it from happening again.
Thank you, Marko

Hi fausp,
I operate my nodes headless. How can I remotely boot into another kernel?
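From what I have read it should be possible via GRUB without console access, roughly like this - untested, and it assumes the node boots via GRUB (not systemd-boot) and that GRUB_DEFAULT=saved is configured in /etc/default/grub:

grep -E "menuentry |submenu " /boot/grub/grub.cfg | cut -d "'" -f2
grub-reboot "SUBMENU-TITLE>OLDER-KERNEL-ENTRY"
reboot

(SUBMENU-TITLE and OLDER-KERNEL-ENTRY are placeholders for the titles printed by the first command; grub-reboot only changes the default for the next boot.)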

@capote

Hi Marko

It’s high time that you set this up…
-> Fast Migration Cluster

Requirement: all Proxmox nodes are in the same network (no other requirements!)

All three should be at the same update level, or at least very close.
Next step is to create the cluster.

Here is an excerpt from my personal Proxmox cheat list:

Create the cluster

Log in via SSH to the first Proxmox VE node. Use a unique name for your cluster; this name cannot be changed later.

Create:

hp1# pvecm create YOUR-CLUSTER-NAME

pvecm create PVE-CLUST

To check the state of the cluster:

hp1# pvecm status

Adding nodes to the Cluster

Log in via SSH to the other Proxmox VE nodes. Please note that the nodes must not hold any VMs yet. (If they do, you will get conflicts with identical VMIDs - as a workaround, use vzdump to back them up and restore them to a different VMID after the cluster configuration.)

WARNING: Adding a node to the cluster will delete its current /etc/pve/storage.cfg. If you have VMs stored on the node, be prepared to add your storage locations back if necessary. Even though the storage locations disappear from the GUI, your data is still there.
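To be on the safe side, keep a copy of the node’s storage definitions before joining, so you can re-add them afterwards (the target filename is just an example):

hp2# cp /etc/pve/storage.cfg /root/storage.cfg.before-join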

Add the current node to the cluster:

hp2# pvecm add IP-ADDRESS-CLUSTER

For IP-ADDRESS-CLUSTER use an IP from an existing cluster node.

To check the state of the cluster:

hp2# pvecm status

Do this for all cluster members.
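Afterwards, every member should list all nodes:

hp2# pvecm nodes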

If a VM is locked, you can unlock it with:

qm unlock VMID
(Number)
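For example, for the VM 100 from earlier in this thread:

qm unlock 100

Whether a lock is set at all can be seen in the VM config (qm config 100 shows a “lock:” line while the VM is locked).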

Now you have a cluster which can live migrate any VMs!
A live migration, depending on RAM, takes about 90 seconds if using shared storage (VMs are on NAS / SAN) !!!
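From the shell, the same can be triggered with, for example:

qm migrate VMID TARGET-NODE --online

(TARGET-NODE is the name of the destination cluster node; --online keeps the VM running during the move.)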

Try it, it’s that easy!

Andy


I did, actually:
root@proxmox:~# pvecm status
Cluster information
-------------------
Name: pvecluster
Config Version: 8
Transport: knet
Secure auth: on

Quorum information
------------------
Date:             Mon Jun 22 22:29:25 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.2c0
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      3
Quorum:           3  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.3.200 (local)
0x00000002          1 192.168.3.201
0x00000003          1 192.168.3.204
root@proxmox:~# 

I have also activated fencing (I guess…).

That would have been the decisive clue:
qm unlock VMID
(Number)

“Unfortunately” everything is back up and running now, but I’ll remember that.

Now you have a cluster which can live migrate any VMs!
A live migration, depending on RAM, takes about 90 seconds if using shared storage (VMs are on NAS / SAN) !!!

Yes, normally yes, but this time no migration could be triggered; nothing happened.
Thanks anyway, Marko

@capote

Hi Marko

This is also important to know in cluster operation:

Any time a node is not working in a three-node cluster, your cluster loses “Quorate” status, meaning the cluster is no longer synchronized across all nodes.
You can’t start any VM anymore, nor can you migrate…

Use this command:

pvecm expected 1

This sets the required votes for Quorate-Status to one.
Now you can edit locked files in the PVE cluster config, boot VMs, and migrate.
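A quick way to check whether the override is (still) needed is to look at the vote counts (the grep just shortens the output):

pvecm status | grep -E "Expected votes|Total votes|Quorum|Quorate"

If “Total votes” is at or above “Quorum”, the cluster is already Quorate and the override is not needed.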

My 2 cents
Andy

OK, I did it.
Now I get…

root@proxmox:~# pvecm status
Cluster information
-------------------
Name: pvecluster
Config Version: 8
Transport: knet
Secure auth: on

Quorum information
------------------
Date:             Mon Jun 22 22:41:32 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.2c0
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.3.200 (local)
0x00000002          1 192.168.3.201
0x00000003          1 192.168.3.204
root@proxmox:~#

What is the difference?

You ONLY need to use that command when you have a cluster problem, or one or more nodes are not working correctly.

If you need to edit files in /etc/pve, an unlock will not help, but “pvecm expected 1” will help!

This should only be used if the cluster is no longer “Quorate”.
See Proxmox “Datacenter” -> Summary.

In your current case, I’d reboot ALL 3 Proxmox nodes, just to make sure your cluster is working as it should, eliminating the “pvecm expected 1” override!

My 2 cents
Andy


OK, I understand now. Thank you very much.
But I hadn’t realized that my cluster was in a bad state.

It CAN happen (very rarely) during an update.

In January 2020, our national provider (think T-Online for Germany) had two major outages. Even emergency police / ambulance and other services were disrupted. The only service working halfway well was the mobile network. The police and other emergency organizations even published a mobile number, as the normal emergency number could not be reached…

Without knowing this (I found out the next day…), I did an update on the network shown (SHZG), and the NethServer and one Proxmox got corrupted updates.

The next day I had to go there and fix the problem…

Andy

Sorry for the late answer… I think you already got it :slight_smile:

@capote and @Andy_Wismer
Could you please also translate your German parts for non-German-speaking people? I think it’s interesting for others too.

@m.traeumner

Hi Michael

Corrected - sorry, I sometimes get carried away, forgetting we’re not in a private chat!
Sh*t happens!
At least I clean up my mess myself! :slight_smile:

Andy
