LDAP stopped working

NethServer Version: 8
Module: ?

All of a sudden, my LDAP stopped working, preventing users from logging in to the system (SOGo, Nextcloud, etc.) and stopping “Domain users & groups” from working correctly.

When accessing Domain users & groups I get the following error:

Traceback (most recent call last):
  File "/var/lib/nethserver/cluster/actions/list-domain-groups/50list_groups", line 33, in <module>
    groups = Ldapclient.factory(**domain).list_groups()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/ldapclient/__init__.py", line 29, in factory
    return LdapclientAd(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/ldapclient/base.py", line 37, in __init__
    self.ldapconn = ldap3.Connection(self.ldapsrv,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/core/connection.py", line 363, in __init__
    self._do_auto_bind()
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/core/connection.py", line 389, in _do_auto_bind
    self.bind(read_server_info=True)
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/core/connection.py", line 607, in bind
    response = self.post_send_single_response(self.send('bindRequest', request, controls))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/strategy/sync.py", line 160, in post_send_single_response
    responses, result = self.get_response(message_id)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/strategy/base.py", line 370, in get_response
    raise LDAPSessionTerminatedByServerError(self.connection.last_error)
ldap3.core.exceptions.LDAPSessionTerminatedByServerError: session terminated by server

As well as:

Traceback (most recent call last):
  File "/var/lib/nethserver/cluster/actions/list-domain-users/50list_users", line 33, in <module>
    users = Ldapclient.factory(**domain).list_users(extra_info=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/ldapclient/__init__.py", line 29, in factory
    return LdapclientAd(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/ldapclient/base.py", line 37, in __init__
    self.ldapconn = ldap3.Connection(self.ldapsrv,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/core/connection.py", line 363, in __init__
    self._do_auto_bind()
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/core/connection.py", line 389, in _do_auto_bind
    self.bind(read_server_info=True)
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/core/connection.py", line 607, in bind
    response = self.post_send_single_response(self.send('bindRequest', request, controls))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/strategy/sync.py", line 160, in post_send_single_response
    responses, result = self.get_response(message_id)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/ldap3/strategy/base.py", line 370, in get_response
    raise LDAPSessionTerminatedByServerError(self.connection.last_error)
ldap3.core.exceptions.LDAPSessionTerminatedByServerError: session terminated by server

It seems the problem started before the last system update.

I'm using self-signed certificates; could it be that they have expired?

Is anybody else having the same issue?

The OpenLDAP domain provider doesn't use TLS, so we can exclude expired certificates.

Do you have a single-node cluster?

Try to restart the ldapproxy module session:

systemctl restart user@$(id -u ldapproxy1) 

Alternatively, simply reboot the node.

If the error persists, look at the Logs page for clues.


Correct, single-node cluster.

Restarting the ldapproxy module and rebooting the entire node didn't change anything.

Logs from ldapproxy:

2025-07-25T20:49:23+02:00 [1:ldapproxy1:ldapproxy] 2025/07/25 18:49:23 [info] 29#29: *2896539 client 127.0.0.1:59048 connected to 127.0.0.1:20002
2025-07-25T20:49:23+02:00 [1:ldapproxy1:ldapproxy] 2025/07/25 18:49:23 [info] 25#25: *2896541 client 127.0.0.1:59052 connected to 127.0.0.1:20002
2025-07-25T20:49:23+02:00 [1:ldapproxy1:ldapproxy] 2025/07/25 18:49:23 [error] 29#29: *2896539 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: 127.0.0.1:20002, upstream: "192.168.30.5:636", bytes from/to client:0/0, bytes from/to upstream:0/0
2025-07-25T20:49:23+02:00 [1:ldapproxy1:ldapproxy] 2025/07/25 18:49:23 [error] 25#25: *2896541 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: 127.0.0.1:20002, upstream: "192.168.30.5:636", bytes from/to client:0/0, bytes from/to upstream:0/0
2025-07-25T20:49:23+02:00 [1:ldapproxy1:ldapproxy] 2025/07/25 18:49:23 [info] 29#29: *2896543 client 127.0.0.1:59054 connected to 127.0.0.1:20002

Is 192.168.30.5 the IP of the NS8?
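
To double-check directly on the node, a generic iproute2 command like the following lists the addresses assigned to its interfaces (nothing NS8-specific, just a sanity check):

ip -br addr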

Please check the service status:

runagent -m ldapproxy1 systemctl --user status ldapproxy -l --no-pager

Let’s test OpenLDAP using curl:

root@home:~# runagent -m openldap1 grep LDAP_PORT environment
LDAP_PORT=20013
root@home:~# curl ldap://127.0.0.1:20013
DN:
	objectClass: top
	objectClass: OpenLDAProotDSE

Port 636… I bet it’s Samba AD with File server :smile:
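
A quick way to confirm whether anything on the node is actually listening on port 636 is a plain socket check (generic iproute2, not NS8-specific):

ss -ltn | grep :636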


Yes, it is the IP of the NS8.

Service status:

[root@ns1 netadmin]# runagent -m ldapproxy1 systemctl --user status ldapproxy -l --no-pager
● ldapproxy.service - ldapproxy1 LDAP account provider local proxy
     Loaded: loaded (/home/ldapproxy1/.config/systemd/user/ldapproxy.service; enabled; preset: disabled)
     Active: active (running) since Fri 2025-07-25 20:47:37 CEST; 12min ago
       Docs: man:podman-generate-systemd(1)
    Process: 27330 ExecStartPre=/bin/rm -f /run/user/1004/ldapproxy.pid /run/user/1004/ldapproxy.ctr-id (code=exited, status=0/SUCCESS)
    Process: 27339 ExecStartPre=/usr/local/bin/runagent update-conf (code=exited, status=0/SUCCESS)
    Process: 27346 ExecStart=/usr/bin/podman run --detach --env=NGINX_ENTRYPOINT_QUIET_LOGS=1 --conmon-pidfile=/run/user/1004/ldapproxy.pid --cidfile=/run/user/1004/ldapproxy.ctr-id --cgroups=no-conmon --network=host --replace --name=ldapproxy --volume=./nginx:/srv:z ${NGINX_IMAGE} nginx -g daemon off; -c /srv/nginx.conf (code=exited, status=0/SUCCESS)
   Main PID: 27378 (conmon)
      Tasks: 1 (limit: 100188)
     Memory: 536.0K
        CPU: 3min 12.880s
     CGroup: /user.slice/user-1004.slice/user@1004.service/app.slice/ldapproxy.service
             └─27378 /usr/bin/conmon --api-version 1 -c aa4f4d64d3ab3a1ff83a5c95d87fcfb0563c74fd014945076d0c41f070bc727a -u aa4f4d64d3ab3a1ff83a5c95d87fcfb0563c74fd014945076d0c41f070bc727a -r /usr/bin/crun -b /home/ldapproxy1/.local/share/containers/storage/overlay-containers/aa4f4d64d3ab3a1ff83a5c95d87fcfb0563c74fd014945076d0c41f070bc727a/userdata -p /run/user/1004/containers/overlay-containers/aa4f4d64d3ab3a1ff83a5c95d87fcfb0563c74fd014945076d0c41f070bc727a/userdata/pidfile -n ldapproxy --exit-dir /run/user/1004/libpod/tmp/exits --persist-dir /run/user/1004/libpod/tmp/persist/aa4f4d64d3ab3a1ff83a5c95d87fcfb0563c74fd014945076d0c41f070bc727a --full-attach -s -l journald --log-level warning --syslog --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/user/1004/containers/overlay-containers/aa4f4d64d3ab3a1ff83a5c95d87fcfb0563c74fd014945076d0c41f070bc727a/userdata/oci-log --conmon-pidfile /run/user/1004/ldapproxy.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/ldapproxy1/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1004/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1004/libpod/tmp --exit-command-arg --network-config-dir --exit-command-arg "" --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /home/ldapproxy1/.local/share/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg sqlite --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --stopped-only --exit-command-arg aa4f4d64d3ab3a1ff83a5c95d87fcfb0563c74fd014945076d0c41f070bc727a
[root@ns1 netadmin]# runagent -m openldap1 grep LDAP_PORT environment
runagent: [FATAL] Cannot find module openldap1 in the local node
[root@ns1 netadmin]#

You are right! :slightly_smiling_face:

In this case, run the curl command as suggested by Markus, but towards the Samba DC. Indeed, Samba AD does have a TLS certificate.

curl -v ldaps://192.168.30.5:636
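
If the connection succeeds but you still suspect an expired certificate, you could also print the certificate validity dates with openssl (a sketch; host and port taken from the curl command above):

openssl s_client -connect 192.168.30.5:636 </dev/null 2>/dev/null | openssl x509 -noout -dates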
[root@ns1 netadmin]# curl -v ldaps://192.168.30.5:636
*   Trying 192.168.30.5:636...
* connect to 192.168.30.5 port 636 failed: Connection refused
* Failed to connect to 192.168.30.5 port 636: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 192.168.30.5 port 636: Connection refused

Inspect the Samba module from the logs page.
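
An equivalent check from the command line, mirroring the ldapproxy status command above, would be something along these lines:

runagent -m samba1 systemctl --user status samba-dc -l --no-pager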

2025-07-25T20:30:00+02:00 [1:samba1:systemd] Starting Samba DC and File Server...
2025-07-25T20:30:02+02:00 [1:samba1:podman] Error: lsetxattr(label=system_u:object_r:container_file_t:s0) /home/samba1/.local/share/containers/storage/volumes/shares/_data/marketing/Campaigns: operation not permitted
2025-07-25T20:30:02+02:00 [1:samba1:systemd] samba-dc.service: Control process exited, code=exited, status=126/n/a
2025-07-25T20:30:02+02:00 [1:samba1:podman] fc4091973ba9f6e2e59b19184c3f151298ad64c56b88271e4dbe35416ac6b0b6
2025-07-25T20:30:02+02:00 [1:samba1:systemd] samba-dc.service: Failed with result 'exit-code'.
2025-07-25T20:30:02+02:00 [1:samba1:systemd] Failed to start Samba DC and File Server.
2025-07-25T20:30:03+02:00 [1:samba1:systemd] samba-dc.service: Scheduled restart job, restart counter is at 459.
2025-07-25T20:30:03+02:00 [1:samba1:systemd] Stopped Samba DC and File Server.
2025-07-25T20:30:03+02:00 [1:samba1:systemd] samba-dc.service: Start request repeated too quickly.
2025-07-25T20:30:03+02:00 [1:samba1:systemd] samba-dc.service: Failed with result 'exit-code'.
2025-07-25T20:30:03+02:00 [1:samba1:systemd] Failed to start Samba DC and File Server.

There’s a permission or ownership issue in the marketing shared folder. Did you transfer contents with rsync/SFTP/ssh or mount the share on a remote filesystem?
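
To see the ownership on the host, you could list the folder flagged in the log (path taken from the error message above; -n shows numeric IDs):

ls -ldn /home/samba1/.local/share/containers/storage/volumes/shares/_data/marketing/Campaigns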


I'm not certain whether the content was copied over using SSH or via Nextcloud (using a remote mount).

The odd thing is that the owner is root:root; I'm certain it used to be samba1:493315, like the other shared resources.


Samba startup fails for shared folder content owned by root:root: only root can handle those files, not Samba. Maybe the shell history provides some clue about how that happened.
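
To restore the ownership, assuming samba1:493315 is the correct owner on the host as seen on the other shares (per the previous post), a possible fix, adjusted to your setup, could be:

chown -R samba1:493315 /home/samba1/.local/share/containers/storage/volumes/shares/_data/marketing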

After fixing the content owner, restart Samba with:

runagent -m samba1 systemctl --user restart samba-dc

Changing the ownership and restarting the Samba service did the trick.
Thank you very much for the quick response.
