Problem Report: Broken Samba Secondary Module After Node Update Failure

NethServer Version: 8
Module: samba

Environment

  • Cluster setup: NS8 with multiple nodes
  • Primary Samba module: samba1 running on a separate machine (working fine)
  • Secondary Samba module: samba3 installed on node 2 (problematic)

What happened

  • During a system update, node 2 froze and had to be restarted manually.
  • After the reboot, the secondary Samba module (samba3) was corrupted.

Symptoms

  1. Provider configuration errors
  • When accessing Samba providers in Cockpit, the following error appears:
JSON Schema output validation aborted at step /home/samba3/.config/actions/get-password-policy/validate-output.json:
JSON unmarshal error: unexpected end of JSON input. Input data: []
  2. Removal/Reconfigure actions fail
  • Attempting to remove the app results in:
JSON Schema input validation aborted at step /home/samba3/.config/actions/leave-domain/validate-input.json:
Schema compile error for file ... EOF
  • The cluster reports:
agent.tasks.exceptions.TaskSubmissionCheckFailed: Client "module/samba3" was not found
  3. Podman container issues
  • Restarting the container fails with:
Error: unable to determine if ".../userdata/shm" is mounted: lstat ... no such file or directory
  • This suggests the container's storage directory was removed or corrupted.
  4. Agent service problems
  • On node 2, nethserver-agent.service is not loaded, so the cluster cannot manage apps there.
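
The two validation errors above ("unexpected end of JSON input" and "EOF") are consistent with schema files that are empty or were cut short when the node froze. Here is a minimal sketch for listing such files. It assumes the module's action schemas live under /home/samba3/.config/actions, as the error paths suggest. The function name find_broken_json is hypothetical and not part of NS8.

```python
import json
from pathlib import Path


def find_broken_json(root):
    """Return (path, reason) pairs for *.json files under root that are
    unreadable, empty, or not parsable as JSON.

    Hypothetical helper for illustration; not part of NS8.
    """
    broken = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            broken.append((path, f"unreadable: {exc}"))
            continue
        if not text.strip():
            # A zero-length or whitespace-only file is typical after an
            # interrupted write.
            broken.append((path, "empty file"))
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            broken.append((path, f"invalid JSON: {exc.msg}"))
    return broken
```

Calling find_broken_json("/home/samba3/.config/actions") as a user who can read that directory and printing the result shows which schema files are affected. It only checks that each file parses as JSON, not that it is a valid JSON Schema.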

What has been tried

  • Pulled the correct image manually:
podman pull ghcr.io/nethserver/samba:3.1.1
  • Cleaned Podman storage with podman system prune and podman volume prune.
  • Attempted to restart/remove/reconfigure the module via Cockpit and CLI, but all failed due to corrupted JSON schema files and missing agent client.
  • Tried to locate a JWT token for API calls, but the static file is not present in /var/lib/nethserver/node/state/.

Current status

  • The primary Samba (samba1) is healthy.
  • The secondary Samba (samba3) on node 2 is broken: no valid container, corrupted config files, and cannot be removed or reconfigured through Cockpit.
  • Backup restoration is blocked because the app cannot be removed first.

Request for help

  • How can I forcefully remove or repair the samba3 module on node 2 so that I can reinstall it and restore from backup?
  • Is there a supported way to clean up corrupted module references when the agent reports Client "module/samba3" was not found?
  • What is the correct procedure to obtain an API token (JWT) in NS8 for manual module removal when Cockpit actions fail?

Just to be able to reproduce it:

Is the samba1 instance a user domain with the file server enabled?

Is the samba3 instance a provider replica or a domain member file server?

The samba3 case turned out differently from goauthentik: I was able to force-uninstall the module and then restore the previous backup. Everything is working well again. The only change is that the Samba instance ID went from 3 to 4; even a backup restoration doesn’t preserve the ID.


Great, so this one is solved.

Yes, the ID (or instance number) is always increased, whether you install, restore, clone, or move an app.

I think the issue with goauthentik is that it was removed manually, so the app removal no longer finds specific files/entries and therefore fails. I’m going to check whether it can be fixed by adding pseudo entries/dirs/files…