NethServer 8 cluster joining error

NethServer Version: NS8
Module: Cluster

Now that I have an NS8 with at least mail and WebTop working, I wanted to install a second NS8 and join it to the cluster to see the feature in action.
I installed the second NS8 on a new VM with an identical resource config, on Debian+BTRFS.

After the install I clicked on Join Cluster and copy/pasted the join code from the leader node. No matter how many times I tried or how long I waited for the task to finish, the screen just sits there with a spinning wheel.

I did not think I needed any further configuration on the leader node for the cluster feature to work. Have I missed something? I should mention that I tried to join the node with TLS validation both enabled and disabled, with the same result.

I really can’t understand why there are so many issues. Are the system requirements fulfilled?
I’m sorry to repeat myself but…in logs we trust. Please share logs so we’re able to check what’s going on.
Could you please also share some hardware information (RAM, CPU and disk type) and say whether the Proxmox VM uses any special (non-default) config?
Is this VM stored on Ceph?
Which Debian install image did you use? The installation should really be minimal, see Minimum Requirements To Run NethServer 8 - #6 by mrmarkuz

Some ideas:

I assume you disabled the firewall for the virtual network device of the NS8 VM in Proxmox.

Let’s check the nodes’ FQDNs:

hostname -f

Can the future worker node reach the cluster leader HTTPS port? (gives back “Found” when working)

curl -k https://nodename.domain.tld/cluster-admin
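The reachability check above can also be scripted so that it reports the HTTP status instead of the response body. This is only a minimal sketch: it assumes that any 2xx or 3xx status means the leader answered (the “Found” body suggests a redirect, but that is an inference, not something confirmed in this thread). `classify_status` and `check_leader` are hypothetical helper names, and `nodename.domain.tld` is the placeholder from the command above.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to a short verdict.
# Assumption: 2xx/3xx means the leader's HTTPS port answered.
classify_status() {
    case "$1" in
        2??|3??) echo "reachable" ;;
        000)     echo "no-connection" ;;   # curl reports 000 when no HTTP response arrived
        *)       echo "unexpected" ;;
    esac
}

# Hypothetical helper: probe the leader's cluster-admin URL.
# -k skips TLS validation, -s silences progress, -m 10 gives up after 10 seconds,
# -o /dev/null discards the body, -w prints only the status code.
check_leader() {
    code=$(curl -k -s -m 10 -o /dev/null -w '%{http_code}' "https://$1/cluster-admin")
    classify_status "$code"
}
```

Running `check_leader nodename.domain.tld` on the future worker node prints one of the three verdicts; `no-connection` would point at name resolution, routing or a firewall rather than at NS8 itself.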

I tested NS8 on Debian 12 with ext4 and xfs and there are no issues. I think that’s possibly the wrong direction as podman seems to work with most filesystems.
There are for example known issues regarding NFS mounts, see Ns8 merged error - #15 by davidep

No, I don’t think so, there’s no additional configuration needed.

Indeed, it is odd that my instance is having so many issues. I do not consider my virtual environment unique as far as architecture goes. Here are the details of the VM for NS8:
Hypervisor: Proxmox 7.4-13
VM Type: KVM
Cores/RAM/VM Disk Type: 4/8GB/VirtIO
Storage: Initially NS8 was on Ceph. Currently running on local ZFS storage
No special Proxmox configuration. No Proxmox firewall

Which logs would help find out why the new node cannot join? Logs from the leader or the slave?
If you need logs from the slave, where are they stored? Since I cannot log in to the GUI without first creating or joining a cluster, I cannot pull the logs via the cluster-admin GUI.

The new node just returns Found for the curl command with the FQDN of the leader.

Thanks for the info.

I guess both. To get the logs you could also use journalctl on CLI.

To export a time range of the logs to a file:

journalctl --since "2024-04-28 11:30" --until "2024-04-28 11:40:00" > mylogs.log
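Once a time range has been exported like this, the file can be narrowed down to the lines that look like failures. The sketch below is an illustration only: `export_logs` and `find_failures` are hypothetical helper names, and the search patterns (`Error:` and `status is "aborted"`) are assumptions about how failed tasks show up in the journal, so adjust them to what your logs actually contain.

```shell
#!/bin/sh
# Hypothetical helper: export a journal time range to a file.
# $1 = start time, $2 = end time, $3 = output file
export_logs() {
    journalctl --since "$1" --until "$2" > "$3"
}

# Hypothetical helper: print only the lines that look like failures.
# The patterns are assumptions, not an exhaustive list of NS8 error markers.
find_failures() {
    grep -E 'Error:|status is "aborted"' "$1"
}
```

For example, `export_logs "2024-04-28 11:30" "2024-04-28 11:40:00" mylogs.log` followed by `find_failures mylogs.log` on each node gives a short excerpt that is easier to paste into a reply than the full log.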

That’s good news.

After a few attempts I was able to move past the cluster joining page. The leader node now recognizes that a node was added to the cluster. However, it shows the node as offline, although the node is online and accessible.

Since then, my leader node GUI has been struggling. It has become very slow and mostly shows only gray placeholders.

I have included the logs from both nodes below. Hopefully they can shed some light.
NS8 leader node error

NS8 slave node error log

Apr 28 12:37:43 ca0401nth02 agent@cluster[7550]: Error: NAME_CONFLICT: new_service(): 'ns-wireguard'
Apr 28 12:37:43 ca0401nth02 agent@cluster[7550]: task/cluster/42a1ae52-d05d-43a6-a66a-1fb1d87d710e: action "join-node" status is "aborted" (26) at step 20wgboot

There’s a similar issue:

To list the services in the firewall:

firewall-cmd --list-services --zone public

To remove the wireguard firewall service before retrying to join:

firewall-cmd --remove-service=ns-wireguard
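The two commands above can be combined so that the service is only removed when it is actually listed. This is a rough sketch, not a confirmed fix: `has_service` and `remove_if_present` are hypothetical helper names, and the zone `public` is taken from the list command above. The error names `new_service()`, so the conflict may be with a stored service definition rather than with the zone; that is not confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if service name $1 appears as a whole word
# in the space-separated list $2 (the format printed by --list-services).
has_service() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: remove service $1 from zone public only if it is listed.
remove_if_present() {
    services=$(firewall-cmd --list-services --zone public)
    if has_service "$1" "$services"; then
        firewall-cmd --zone public --remove-service="$1"
    else
        echo "$1 is not listed in zone public"
    fi
}
```

Running `remove_if_present ns-wireguard` as root before retrying the join either removes the service or reports that it was not listed, which is itself a useful hint that the conflict lies elsewhere.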

I made one last attempt to get this to work.

I created 2 fresh VMs with identical specs (2 vCPU, 8GB RAM) and installed Debian 12.
I installed NS8 on both VMs following the guide, using the curl command.
On node 1, I created a new cluster and nothing else: no apps, no configuration. Even after waiting for hours and refreshing many times, the GUI never fully loads. Something is always missing and the page looks unfinished.

On node 2, I clicked on Join Cluster and copy/pasted the join code from node 1. The join never finishes. The GUI shows the spinning wheel indefinitely.

I am all out of ideas on this. How much more basic could I get than a clean, freshly installed OS?

Is there any Debian user out there who has a fully functioning NS8 that loads the GUI every time?