Task cluster/join-cluster failed

leader node NS8 - ns8.ad.domain.local
worker node NS8 - mail.ad.domain.local
https://docs.nethserver.org/projects/ns8/en/latest/cluster.html#add-a-node



I disabled the TLS certificate validation option before clicking the Join button.
Task cluster/join-cluster failed

/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'ns8.ad.domain.local'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Leader response is successful: the new node ID is node/7!
<3>leader_endpoint error: [Errno -2] Name or service not known DATA {'ip_address': '10.5.4.7', 'leader_endpoint': 'ldap.domain.local:55820', 'leader_ip_address': '10.5.4.1', 'leader_public_key': 'lkjlOErwZ39VpH2C0VcSDXG+N+CtRt9orOSYenNzSXM=', 'network': '10.5.4.0/24', 'node_id': 7}
<5>After the issue is solved, remove node 7 before running a new join attempt.
Traceback (most recent call last):
  File "/var/lib/nethserver/cluster/actions/join-cluster/50update", line 91, in <module>
    socket.getaddrinfo(peer_hostname, peer_port, proto=socket.IPPROTO_UDP)[0][4][0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known
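For reference, the failing step can be reproduced on the worker in isolation. This is a minimal sketch, not the actual 50update source. It assumes only what the log shows: the DATA payload is a Python dict literal, and the host part of leader_endpoint is resolved with socket.getaddrinfo(..., proto=socket.IPPROTO_UDP). The helper names parse_join_data, split_endpoint and resolve_endpoint are hypothetical.

```python
import ast
import socket


def parse_join_data(log_line: str) -> dict:
    """Extract the dict printed after 'DATA' in a join-cluster error line."""
    # Everything after the literal marker ' DATA ' is a Python dict literal.
    _, _, payload = log_line.partition(" DATA ")
    return ast.literal_eval(payload.strip())


def split_endpoint(endpoint: str) -> tuple[str, int]:
    """Split 'host:port' into (host, port); assumes no bracketed IPv6 address."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def resolve_endpoint(endpoint: str) -> str:
    """Resolve the endpoint host; raises socket.gaierror if the name is unknown."""
    host, port = split_endpoint(endpoint)
    # Same call shape as line 91 of 50update in the traceback above.
    return socket.getaddrinfo(host, port, proto=socket.IPPROTO_UDP)[0][4][0]
```

On the worker, resolve_endpoint('ldap.domain.local:55820') should raise the same socket.gaierror until ldap.domain.local resolves to the leader's address.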

Did you configure DNS entries for ns8.ad.domain.local and mail.ad.domain.local? See also the System requirements page of the NS8 documentation.

Yes.
Samba (samba5) is installed on ns8.
NS8 (leader): samba5, IP 192.168.0.16, DNS 192.168.0.14 (gateway)
Mail: IP 192.168.0.17, DNS 192.168.0.16

Both NS8 nodes should be able to resolve each other's names.

Does nslookup work on both NS8 nodes?

On ns8.ad.domain.local:

nslookup mail.ad.domain.local

On mail.ad.domain.local:

nslookup ns8.ad.domain.local
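If nslookup is not available, a similar check can be run with the Python standard library on both nodes. Unlike nslookup, which queries the DNS server directly, socket.getaddrinfo goes through the system resolver, so /etc/hosts entries are honored too; it is the same call the join script uses. A minimal sketch; check_names is a hypothetical helper.

```python
from __future__ import annotations

import socket


def check_names(names: list[str]) -> dict[str, list[str] | None]:
    """Map each name to its resolved addresses, or None if resolution fails."""
    results: dict[str, list[str] | None] = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
        except socket.gaierror:
            results[name] = None
            continue
        # getaddrinfo returns one entry per socket type; keep unique addresses in order.
        addrs: list[str] = []
        for info in infos:
            addr = info[4][0]
            if addr not in addrs:
                addrs.append(addr)
        results[name] = addrs
    return results
```

For example, run check_names(["ns8.ad.domain.local", "mail.ad.domain.local"]) on each node; a None value means the name does not resolve there.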

EDIT:

Here the leader endpoint shows ldap.domain.local, but shouldn't it be ns8.ad.domain.local?

The NS8 mail node was just installed and started for the first time:
[root@mail admin]# nslookup google.com
bash: nslookup: command not found

It seems nslookup is not installed.

You could also test using ping or curl.
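When only connectivity matters, a TCP-level reachability test can stand in for curl. A minimal sketch: it assumes the leader's web interface listens on port 443 and only checks that a TCP connection opens, without a TLS handshake or HTTP request. can_connect is a hypothetical helper.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the name first, so a DNS failure also returns False.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers socket.gaierror (name resolution), refused and timed-out connections.
        return False
```

From the MAIL node, can_connect("ns8.ad.domain.local") should return True.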

To install nslookup and dig on Rocky Linux:

dnf install bind-utils

EDIT:

Please also check the hostname on the NS8 leader:

hostname -f

On the NS8 leader:

[root@ns8 admin]# nslookup mail.ad.domain.local
Server:         192.168.0.16
Address:        192.168.0.16#53

Name:   mail.ad.domain.local
Address: 192.168.0.17
[root@ns8 admin]# hostname -f
ns8.ad.domain.local

On the MAIL node:

[root@mail admin]# ping ns8.ad.domain.local
PING ns8.ad.domain.local (192.168.0.16) 56(84) bytes of data.
64 bytes from 192.168.0.16 (192.168.0.16): icmp_seq=1 ttl=64 time=0.463 ms
64 bytes from 192.168.0.16 (192.168.0.16): icmp_seq=2 ttl=64 time=0.477 ms
64 bytes from 192.168.0.16 (192.168.0.16): icmp_seq=3 ttl=64 time=0.488 ms
[root@mail admin]# ping google.com
ping: google.com: Name or service not known

The new MAIL node does not resolve external addresses.
MAIL: IP 192.168.0.17, DNS 192.168.0.16 (NS8)
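To see which DNS servers the MAIL node actually queries, the nameserver lines of /etc/resolv.conf can be listed. A minimal sketch, assuming the resolver is configured through /etc/resolv.conf; the helper name nameservers is hypothetical. If 192.168.0.16 is the only entry, external names resolve only if that server answers or forwards them.

```python
def nameservers(resolv_conf: str) -> list[str]:
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers: list[str] = []
    for raw in resolv_conf.splitlines():
        # Drop comments introduced by '#' or ';', then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers
```

On the node: with open("/etc/resolv.conf") as f: print(nameservers(f.read()))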

/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'ns8.ad.domain.local'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Leader response is successful: the new node ID is node/8!
<3>leader_endpoint error: [Errno -2] Name or service not known DATA {'ip_address': '10.5.4.8', 'leader_endpoint': 'ldap.domain.local:55820', 'leader_ip_address': '10.5.4.1', 'leader_public_key': 'lkjlOErwZ39VpH2C0VcSDXG+N+CtRt9orOSYenNzSXM=', 'network': '10.5.4.0/24', 'node_id': 8}
<5>After the issue is solved, remove node 8 before running a new join attempt.
Traceback (most recent call last):
  File "/var/lib/nethserver/cluster/actions/join-cluster/50update", line 91, in <module>
    socket.getaddrinfo(peer_hostname, peer_port, proto=socket.IPPROTO_UDP)[0][4][0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known

Does it work when using the gateway DNS (192.168.0.14)?
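To compare the two resolvers directly (the equivalent of dig @192.168.0.14 versus dig @192.168.0.16 once bind-utils is installed), a raw DNS A query can be sent with only the standard library. A minimal sketch following the RFC 1035 message format; it reads only the reply header (RCODE and answer count) and does not parse the answers. build_query and query_a are hypothetical helpers.

```python
import random
import socket
import struct


def build_query(name: str, qid: int) -> bytes:
    """Build a DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def query_a(server: str, name: str, timeout: float = 3.0) -> tuple[int, int]:
    """Send an A query to server:53 over UDP; return (RCODE, ANCOUNT)."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, 53))
        reply, _ = sock.recvfrom(512)
    rid, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    if rid != qid:
        raise ValueError("reply ID does not match query ID")
    # RCODE is the low 4 bits of the flags: 0 = NOERROR, 3 = NXDOMAIN.
    return flags & 0x000F, ancount
```

For example, compare query_a("192.168.0.14", "google.com") with query_a("192.168.0.16", "google.com"), and do the same for ns8.ad.domain.local. An RCODE of 0 with a non-zero answer count means that server resolved the name; socket.timeout (TimeoutError on Python 3.10+) means no reply arrived.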

The leader node is known by different names, which could lead to issues.

EDIT:

After every failed join attempt, the node tries to join using a new VPN IP address.
Maybe it makes sense to start over, check that DNS and connectivity are working, and then join the cluster.

Yes, but in that case the domain addresses are not resolved.

On the NS8 leader:

[root@ns8 admin]# ping mail.ad.domain.local
PING mail.ad.domain.local (192.168.0.17) 56(84) bytes of data.
64 bytes from 192.168.0.17 (192.168.0.17): icmp_seq=1 ttl=64 time=0.612 ms
64 bytes from 192.168.0.17 (192.168.0.17): icmp_seq=2 ttl=64 time=0.503 ms

On the MAIL node:

[root@mail admin]# ping ns8.ad.domain.local
PING ns8.ad.domain.local (192.168.0.16) 56(84) bytes of data.
64 bytes from 192.168.0.16 (192.168.0.16): icmp_seq=1 ttl=64 time=0.463 ms
64 bytes from 192.168.0.16 (192.168.0.16): icmp_seq=2 ttl=64 time=0.477 ms

/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'ns8.ad.domain.local'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Leader response is successful: the new node ID is node/9!
<3>leader_endpoint error: [Errno -2] Name or service not known DATA {'ip_address': '10.5.4.9', 'leader_endpoint': 'ldap.domain.local:55820', 'leader_ip_address': '10.5.4.1', 'leader_public_key': 'lkjlOErwZ39VpH2C0VcSDXG+N+CtRt9orOSYenNzSXM=', 'network': '10.5.4.0/24', 'node_id': 9}
<5>After the issue is solved, remove node 9 before running a new join attempt.
Traceback (most recent call last):
  File "/var/lib/nethserver/cluster/actions/join-cluster/50update", line 91, in <module>
    socket.getaddrinfo(peer_hostname, peer_port, proto=socket.IPPROTO_UDP)[0][4][0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known

Where does the name ldap.domain.local come from? Is the name resolvable by DNS, and does it point to 192.168.0.16?

[root@ns8 etc]# cat /etc/hosts
::1     localhost       localhost.localdomain   localhost6      localhost6.localdomain6

127.0.0.1       localhost       localhost.localdomain   localhost4      localhost4.localdomain4
# commented by set-fqdn


# commented by set-fqdn

192.168.0.16 cluster-leader
10.5.4.1 cluster-localnode
127.0.1.1 ldap.domain.local ldap

I found ldap.domain.local in the hosts file.
I commented out the line 127.0.1.1 ldap.domain.local ldap, but the error is the same.

The name in /etc/hosts should match the result of hostname -f and the DNS entry on the DNS server.
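That consistency check can be partly scripted. A minimal sketch that parses hosts-file text and flags two things seen in this thread: the node FQDN (as reported by hostname -f) missing from /etc/hosts, and another fully qualified name mapped to a loopback address, like ldap.domain.local on 127.0.1.1. The helper names parse_hosts and hosts_problems are hypothetical, and the check does not query DNS.

```python
def parse_hosts(text: str) -> dict[str, list[str]]:
    """Map every hostname/alias in hosts-file text to the addresses listing it."""
    table: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), []).append(address)
    return table


def hosts_problems(fqdn: str, hosts_text: str) -> list[str]:
    """List hosts-file entries that disagree with the node's FQDN."""
    table = parse_hosts(hosts_text)
    fqdn = fqdn.lower()
    problems: list[str] = []
    if fqdn not in table:
        problems.append(f"{fqdn} is not listed in /etc/hosts")
    for name, addrs in table.items():
        # Another fully qualified name (not localhost*) on a loopback address is suspicious.
        if "." in name and name != fqdn and not name.startswith("localhost"):
            if any(a.startswith("127.") or a == "::1" for a in addrs):
                problems.append(f"{name} points to loopback {', '.join(addrs)}")
    return problems
```

On the leader, for example: hosts_problems("ns8.ad.domain.local", open("/etc/hosts").read())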

Please try to set just one FQDN for the leader that can be resolved by DNS.

You can change the FQDN on the Nodes page.
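Before entering the new FQDN, a quick syntax check can catch typos. A minimal sketch based on the general DNS hostname rules (labels of 1 to 63 letters, digits or hyphens, not starting or ending with a hyphen, at most 253 characters overall, and at least two labels for a fully qualified name); it does not reproduce NS8's own validation, which may differ. is_valid_fqdn is a hypothetical helper.

```python
import re

# One DNS label: 1-63 letters, digits or hyphens, no leading/trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_fqdn(name: str) -> bool:
    """Check hostname syntax for a fully qualified name (at least two labels)."""
    name = name.rstrip(".")  # a trailing root dot is allowed
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)
```

A name that passes this check still has to resolve by DNS to the leader's address on every node.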


I set a single FQDN for the leader and added MAIL to the cluster:

On the NS8 leader:

On the MAIL node, opening the web user interface results in an endless refresh loop:

To uninstall NS8 MAIL, execute:

bash /var/lib/nethserver/node/uninstall.sh

Start the installation procedure as root:

curl https://raw.githubusercontent.com/NethServer/ns8-core/ns8-stable/core/install.sh | bash
