Flush Dead Modules

There are instances where a module crashes completely and can no longer be updated.

In this scenario, even uninstalling the module is sometimes out of the question and usually produces errors like:

cluster/remove-module
Task ID: 48a9fc4a-bbfa-4fd7-9a04-c3482ca39102
Traceback (most recent call last):
  File "/var/lib/nethserver/cluster/actions/remove-module/50update", line 72, in <module>
    raise ex
  File "/var/lib/nethserver/cluster/actions/remove-module/50update", line 57, in <module>
    destroy_module_result = agent.tasks.run(
                            ^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 39, in run
    results = runp([taskrq], **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 50, in runp
    return asyncio.run(_runp(tasks, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 131, in _runp
    return await asyncio.gather(*runners, return_exceptions=(len(tasks) > 1))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 137, in _run_with_protocol
    return await run_redisclient(taskrq, **pconn)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 77, in run_redisclient
    await _task_submission_check_client_idle(rdb, taskrq, kwargs['check_idle_time'])
  File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 41, in _task_submission_check_client_idle
    raise TaskSubmissionCheckFailed(f"Client \"{taskrq['agent_id']}\" was not found")
agent.tasks.exceptions.TaskSubmissionCheckFailed: Client "module/erpnext19" was not found

These issues prompt a complete rm -rf nuking of the folder holding the module.

While the module folder and files have been removed in the backend and on the server, the UI still reports the module as being available.

Could we get the ability to completely flush these modules once the folder for the module has been wiped clean?

The closest thing we currently have is:

remove-module --no-preserve --force

The --force flag is specifically intended to proceed with the removal even when the application’s destroy-module action fails.

What you’re describing is a slightly different and, as far as I know, largely unexplored use case: the module homedir has already been manually removed or otherwise lost, leaving only cluster metadata behind.

One possible improvement would be to extend the --force behavior to explicitly handle the missing homedir case, avoiding hangs or failures when the application files are no longer present.

If someone wanted to improve the cleanup path for this scenario, I would start by looking at the implementations of cluster/remove-module and node/remove-module.

Assuming the module homedir is already gone, the remaining cleanup should mostly consist of:

  • Removing Redis references (see cluster/remove-module/50update)
  • Releasing allocated TCP/UDP ports (see node/remove-module/60ports)

In other words, the problem is probably not deleting files anymore, but cleaning up the cluster state that still references the removed module.
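
As a rough illustration of that idea, here is a minimal Python sketch that uses plain dictionaries as a stand-in for the cluster state. The key layout (`module/<id>/...`), the port table, and the `flush_module_state` helper are all hypothetical and invented for this example; they are not the real Redis schema or agent API. The actual logic lives in the two action steps mentioned above.

```python
def flush_module_state(redis_keys: dict, port_allocations: dict, module_id: str) -> dict:
    """Drop every reference to module_id from the (simulated) cluster state.

    redis_keys: stand-in for the Redis keyspace (hypothetical key layout).
    port_allocations: stand-in for per-module port ranges (hypothetical).
    Returns a summary of what was removed.
    """
    # Trailing slash so that "erpnext19" does not also match "erpnext190".
    prefix = f"module/{module_id}/"
    # Collect first, then delete, so the dict is not mutated while iterating.
    stale_keys = [key for key in redis_keys if key.startswith(prefix)]
    for key in stale_keys:
        del redis_keys[key]
    # Release the port range, if one was allocated to this module.
    released = port_allocations.pop(module_id, None)
    return {"removed_keys": sorted(stale_keys), "released_ports": released}


# Simulated leftovers for a module whose homedir is already gone.
state = {
    "module/erpnext19/environment": {"IMAGE": "example"},
    "module/erpnext19/flags": ["some-flag"],
    "module/n8n2/environment": {"IMAGE": "example"},
}
ports = {"erpnext19": (20001, 20003), "n8n2": (20004, 20004)}

summary = flush_module_state(state, ports, "erpnext19")
print(summary)
```

The point of the sketch is only that nothing in this step needs the module's files: it works purely on the metadata that still references the module, and other modules are left untouched.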

If this would perform the necessary cleanup, that would be great.

Within remove-module, could we not implement a handler so that, when the module directory is missing, it performs the cleanup instead?

Yes, I think that would fit well with the semantics of the existing --force flag.

Today --force means “continue the removal even if the application’s destroy-module action fails”. Extending it to also handle the case where the module homedir is already missing would make the behavior more consistent: if the application cannot perform its own cleanup for any reason, the framework should still remove the remaining cluster metadata and resource allocations.

In that scenario, remove-module --no-preserve --force could skip the application-specific cleanup and proceed directly with the framework cleanup steps, such as removing Redis references and releasing allocated ports.
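
To make the proposed control flow concrete, here is a small Python sketch. It is not the actual remove-module implementation: the `remove_module` function, its parameters, and the `DestroyModuleFailed` exception are hypothetical names chosen for illustration, and the two callbacks stand in for the application-specific and framework-level cleanup steps.

```python
from pathlib import Path
from typing import Callable


class DestroyModuleFailed(Exception):
    """Hypothetical error: the application-specific cleanup cannot complete."""


def remove_module(
    module_id: str,
    homedir: Path,
    destroy_module: Callable[[str], None],
    framework_cleanup: Callable[[str], None],
    force: bool = False,
) -> list[str]:
    """Run the removal and return the steps that were executed, in order."""
    steps = []
    if not homedir.is_dir():
        if not force:
            raise DestroyModuleFailed(f"homedir {homedir} is missing")
        # Proposed behavior: nothing is left for the application to clean up.
        steps.append("skip destroy-module (homedir missing)")
    else:
        try:
            destroy_module(module_id)
            steps.append("destroy-module")
        except DestroyModuleFailed:
            if not force:
                raise
            # Existing --force semantics: continue despite the failure.
            steps.append("destroy-module failed, continuing")
    # Framework cleanup (Redis references, ports) runs in every surviving path.
    framework_cleanup(module_id)
    steps.append("framework cleanup")
    return steps


cleaned = []
steps = remove_module(
    "erpnext19",
    Path("/nonexistent/home/erpnext19"),
    destroy_module=lambda module_id: None,
    framework_cleanup=cleaned.append,
    force=True,
)
print(steps)
```

Without `force=True` the same call raises instead of touching the cluster state, which keeps the default behavior conservative.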

I tested removing an app whose homedir was missing (moved away beforehand), and it already works with the --force option.

There’s an error in the logs about the missing homedir but the app is removed correctly without throwing an error in the UI:

2026-06-09T20:43:38+02:00 [1:n8n2:podman] cannot resolve /home/n8n2: lstat /home/n8n2: no such file or directory

The remove-module actions are completed:

2026-06-09T20:43:39+02:00 [1::agent@node] task/node/1/0e789400-f6aa-48bf-86c2-f956f1e84eeb: action "remove-module" status is "completed" (0) at step validate-output.json
2026-06-09T20:43:39+02:00 [1::agent@cluster] task/cluster/8ffc54af-c8a7-465b-a44c-aef19696f263: action "remove-module" status is "completed" (0) at step validate-output.json
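
If you want to check this from a script rather than by eye, the following Python sketch extracts the status of an action from log lines in the format quoted above. The regular expression is inferred only from these two sample lines, so treat it as an assumption about the log format, not a documented one; `action_statuses` is a hypothetical helper name.

```python
import re

# Pattern inferred from the two 'action ... status is ...' lines quoted above.
STATUS_RE = re.compile(
    r'task/(?P<agent>node/\d+|cluster)/(?P<task_id>[0-9a-f-]+): '
    r'action "(?P<action>[^"]+)" status is "(?P<status>[^"]+)" \((?P<code>\d+)\)'
)


def action_statuses(log_text: str, action: str) -> dict:
    """Map each agent ('node/1', 'cluster') to (status, number in parentheses)."""
    result = {}
    for match in STATUS_RE.finditer(log_text):
        if match["action"] == action:
            result[match["agent"]] = (match["status"], int(match["code"]))
    return result


log = '''\
2026-06-09T20:43:39+02:00 [1::agent@node] task/node/1/0e789400-f6aa-48bf-86c2-f956f1e84eeb: action "remove-module" status is "completed" (0) at step validate-output.json
2026-06-09T20:43:39+02:00 [1::agent@cluster] task/cluster/8ffc54af-c8a7-465b-a44c-aef19696f263: action "remove-module" status is "completed" (0) at step validate-output.json
'''

statuses = action_statuses(log, "remove-module")
print(statuses)
```

Both the node and the cluster entry should report "completed" for the removal to count as finished.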

Did you remove entries from the Redis DB manually?

Maybe related:

Not really. There is a bug in the current ERPNext version whereby navigating to the Build images page and saving the config ends up breaking the app, with no option to recover, update, or do anything else. A while back I saw a similar situation occur on a different app (I don't remember which one), and I have been trying to work out what causes these kinds of failures.

Attempts to remove the app by all known means have so far resulted in failure.

Let me try the force command in this regard and report the findings.

@kemboielvis22 since we have implemented the Force rebuild checkbox on the settings page, I think we need to remove the Build images page before it causes mayhem for someone.

Report:

Appending --force seems to solve the problem at hand.

The module was completely removed, including from the UI.

Thank you all.
