Creating a new cluster and restoring a cluster both fail

I tried several times to restore a cluster, and to create a new cluster, on a fresh NS8 install (Proxmox, qcow2 image). Every attempt results in the same error:

<7>sed -i -e '/cluster-localnode$/c\10.5.5.1 cluster-localnode' /etc/hosts
<7>systemctl stop wg-quick@wg0.service
<7>systemctl start wg-quick@wg0.service
Task cluster/add-module run failed: {'output': '', 'error': '<7>podman-pull-missing ghcr.io/nethserver/metrics:1.1.1\nTrying to pull ghcr.io/nethserver/metrics:1.1.1...\nGetting image source signatures\nCopying blob sha256:1cda4ccc42d0d050a730a462ecb56e14c725d623124297403b1741dc547adf19\nCopying config sha256:8e492aece7101cbaeaf6cce4efc175f693d772e25d2eca1106150736e1b161f2\nWriting manifest to image destination\n8e492aece7101cbaeaf6cce4efc175f693d772e25d2eca1106150736e1b161f2\n<7>extract-ui ghcr.io/nethserver/metrics:1.1.1\nExtracting container filesystem ui to /var/lib/nethserver/cluster/ui/apps/metrics1\nui/index.html\n157fde6a8ebb7c032ea96d93d70337199c5d353d15b11f19f0a507a03d90a862\nAssertion failed\n File "/var/lib/nethserver/cluster/actions/add-module/50update", line 196, in \n agent.assert_exp(create_module_result['exit_code'] == 0) # Ensure create-module is successful\n', 'exit_code': 2}
Assertion failed
File "/var/lib/nethserver/cluster/actions/create-cluster/50update", line 96, in
agent.assert_exp(add1_module_failures == 0)

=============

Add to module/metrics1 environment PROMETHEUS_IMAGE=quay.io/prometheus/prometheus:v3.3.1
Add to module/metrics1 environment ALERTMANAGER_IMAGE=quay.io/prometheus/alertmanager:v0.28.1
Add to module/metrics1 environment GRAFANA_IMAGE=docker.io/grafana/grafana:12.0.2
Add to module/metrics1 environment ALERT_PROXY_IMAGE=ghcr.io/nethserver/alert-proxy:1.1.1
<7>podman-pull-missing quay.io/prometheus/prometheus:v3.3.1 quay.io/prometheus/alertmanager:v0.28.1 docker.io/grafana/grafana:12.0.2 ghcr.io/nethserver/alert-proxy:1.1.1
Trying to pull quay.io/prometheus/prometheus:v3.3.1...
Error: initializing source docker://quay.io/prometheus/prometheus:v3.3.1: pinging container registry quay.io: received unexpected HTTP status: 504 Gateway Time-out
Traceback (most recent call last):
File "/usr/local/agent/bin/podman-pull-missing", line 35, in
subprocess.run(['podman', 'pull', image_url]).check_returncode()
File "/usr/lib64/python3.11/subprocess.py", line 502, in check_returncode
raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['podman', 'pull', 'quay.io/prometheus/prometheus:v3.3.1']' returned non-zero exit status 125.
Traceback (most recent call last):
File "/usr/local/agent/actions/create-module/05pullimages", line 48, in
agent.run_helper('podman-pull-missing', *images).check_returncode()
File "/usr/lib64/python3.11/subprocess.py", line 502, in check_returncode
raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '('podman-pull-missing', 'quay.io/prometheus/prometheus:v3.3.1', 'quay.io/prometheus/alertmanager:v0.28.1', 'docker.io/grafana/grafana:12.0.2', 'ghcr.io/nethserver/alert-proxy:1.1.1')' returned non-zero exit status 1.
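The traceback suggests that podman-pull-missing runs `podman pull` for each image and stops at the first failure: the 504 from quay.io makes podman exit with status 125, which then fails the whole create-module step. As a minimal sketch, assuming a registry 504 is a transient error worth retrying, a pull-missing helper could skip images already in local storage (`podman image exists`) and retry each pull with exponential backoff. `pull_missing`, `run_podman` and the retry parameters are hypothetical names for illustration; this is not the actual NS8 helper.

```python
from __future__ import annotations

import subprocess
import time
from typing import Callable, Sequence

# Hypothetical sketch, NOT the NS8 podman-pull-missing script.
Runner = Callable[[Sequence[str]], int]


def run_podman(args: Sequence[str]) -> int:
    """Run a podman subcommand and return its exit status."""
    return subprocess.run(["podman", *args]).returncode


def pull_missing(
    images: Sequence[str],
    attempts: int = 3,
    delay: float = 10.0,
    run: Runner = run_podman,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Pull every image that is not already present locally.

    Each failed pull is retried up to `attempts` times in total, waiting
    `delay` seconds after the first failure and doubling the wait after
    each further failure. Returns the images that could not be pulled.
    """
    failed: list[str] = []
    for image in images:
        # `podman image exists` exits 0 when the image is already in local storage
        if run(["image", "exists", image]) == 0:
            continue
        wait = delay
        for attempt in range(1, attempts + 1):
            if run(["pull", image]) == 0:
                break
            if attempt < attempts:
                sleep(wait)
                wait *= 2
        else:
            # the loop never hit `break`: every attempt failed
            failed.append(image)
    return failed
```

For example, `pull_missing(["quay.io/prometheus/prometheus:v3.3.1"], attempts=5)` returns the images that still could not be pulled, so the caller decides whether to abort. Retrying only smooths over short hiccups; during an outage like the one in this thread the pull keeps failing until the registry recovers.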

“Error: initializing source docker://quay.io/prometheus/prometheus:v3.3.1: pinging container registry quay.io: received unexpected HTTP status: 504 Gateway Time-out”

Is it unable to pull the Prometheus container image?

(I did this several times over the last few days without any issues.)

Yes, today there are issues with Quay and Docker Hub; see also the thread "Hey boss I cannot work, I return to home :D".

Quay status: https://status.redhat.com/

Docker hub status: https://www.dockerstatus.com/
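Besides the status pages, you can check a registry directly. The error above says podman failed while "pinging container registry quay.io", which looks like the version check on the `/v2/` endpoint defined by the Docker Registry HTTP API V2 (OCI Distribution spec). A healthy registry answers that request with 200, or 401 when it wants authentication, while a 5xx such as the 504 above points at the registry or its hosting rather than at your node. The sketch below is a hypothetical diagnostic (`ping_registry` and `classify` are illustrative names), and it assumes `registry-1.docker.io` as Docker Hub's API host.

```python
from __future__ import annotations

import urllib.error
import urllib.request

# Hypothetical diagnostic helpers, not part of NS8 or podman.


def classify(status: int) -> str:
    """Map the HTTP status of GET /v2/ to a rough verdict."""
    if status in (200, 401):
        # 401 is normal: the registry is up but expects a token
        return "ok"
    if 500 <= status < 600:
        # e.g. 504 Gateway Time-out: the problem is on the registry side
        return "registry error"
    return "unexpected"


def ping_registry(host: str, timeout: float = 10.0) -> tuple[int | None, str]:
    """Send GET https://<host>/v2/ and return (status, verdict)."""
    url = f"https://{host}/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status code is still useful
        status = err.code
    except (urllib.error.URLError, TimeoutError) as err:
        # DNS failure, refused connection, TLS error or timeout
        return None, f"unreachable: {err}"
    return status, classify(status)
```

For example, `ping_registry("registry-1.docker.io")` typically returns `(401, "ok")` when Docker Hub is healthy; a `"registry error"` verdict for `quay.io` would match the symptom in this thread, while `"unreachable"` suggests a local DNS or network problem instead.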

EDIT:

It seems AWS is down; see, for example, "Major AWS outage takes down Fortnite, Alexa, Snapchat, and more" on The Verge.

EDIT2:

Red Hat confirms the AWS outage:

Update - The impact of the incident is currently limited to the AWS us-east-1 region. Oct 20, 2025 - 09:28 UTC

Quay image downloads seem to be working again:

[root@node ~]# podman pull quay.io/prometheus/node-exporter:v1.9.0
Trying to pull quay.io/prometheus/node-exporter:v1.9.0...
Getting image source signatures
Copying blob 60a2ff285504 skipped: already exists
Copying blob 1617e25568b2 skipped: already exists
Copying blob 9fa9226be034 skipped: already exists
Copying config aaa0ee0c23 done
Writing manifest to image destination
aaa0ee0c2359a6ef3c49728250e0bddf4952f6f542348b013ffde053b39b5dfe

Docker image downloads also seem to be working again:

[root@node ~]# podman pull docker.io/library/node:lts
Trying to pull docker.io/library/node:lts...
Getting image source signatures
...

The services were still degraded for a while, with slower performance; downloads are now working again at normal speed.

EDIT3:

It seems it was a DNS issue that also affected Amazon DynamoDB; see also https://health.aws.amazon.com/health/status
