Sorry, I’m out of ideas.
@davidep do you have an idea how to clean up half-installed/removed apps?
Recap:
Nobody else?
How are you, bro? Try:
api-cli run remove-module --data '{"module_id":"goauthentik1", "force":true, "no_preserve":true}'
Have you checked if there are references, for example:
/var/lib/nethserver/cluster/*
/var/lib/nethserver/catalog/*
The error you saw (TaskSubmissionCheckFailed) indicates that the module registration is stuck in the task and agent management system, which relies on Redis. Manual removal involves deleting the Redis-related keys and metadata files.
Manipulating Redis directly can corrupt the NethServer 8 cluster state if not done correctly. Make a backup or make sure you understand the risk.
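Before changing anything, it may help to record what Redis currently holds for the instance. The sketch below only reads data. It is a plain-text record for later comparison, not a real backup of the cluster. REDIS_CLI and snapshot_module_state are hypothetical names introduced here, and it assumes redis-cli on the node already talks to the cluster Redis.

```shell
# REDIS_CLI is a hypothetical override so the command can be swapped for testing.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Print the environment HASH and the ACL entry of one module instance.
# $1 is the module instance id, e.g. goauthentik1. Read-only.
snapshot_module_state() {
    echo "# module/$1/environment"
    "$REDIS_CLI" hgetall "module/$1/environment"
    echo "# acl module/$1"
    "$REDIS_CLI" acl getuser "module/$1"
}
```

Usage could be: snapshot_module_state goauthentik1 > goauthentik1-redis-before.txt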
I found that a good old-fashioned full reboot of the server/node cleared the persistent, no-longer-wanted NS8 tasks from the node's memory.
I rebooted the server and the error persists.
Maybe this thread has useful information?
I have tried it before. It was not useful in my case.
Ping @davidep
There might still be some Redis values containing a module_id reference that need to be manually cleaned up. The module_id is used as a value inside complex structures like HASH and SET, so it’s not always an obvious part of the key. For example, see the HASH cluster/module_node.
You can refer to the Python code for the complete list: ns8-core/core/imageroot/var/lib/nethserver/cluster/actions/remove-module/50update at main · NethServer/ns8-core · GitHub
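As a starting point for finding leftovers, here is a minimal read-only sketch. It lists keys named after the instance and looks the instance up in the cluster/module_node HASH mentioned above. It does not cover every structure the linked Python code touches. REDIS_CLI and the two function names are hypothetical, and it assumes redis-cli on the node is already configured for the cluster Redis.

```shell
# REDIS_CLI is a hypothetical override so the command can be swapped for testing.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# List keys whose name starts with module/<id>/ using SCAN (does not block Redis).
# $1 is the module instance id, e.g. goauthentik1.
list_module_keys() {
    "$REDIS_CLI" --scan --pattern "module/$1/*"
}

# Show the value stored for the instance in the cluster/module_node HASH.
module_node_entry() {
    "$REDIS_CLI" hget cluster/module_node "$1"
}
```

For example, list_module_keys goauthentik1 followed by module_node_entry goauthentik1 shows whether the instance is still referenced in those two places. Values inside other HASH or SET structures still have to be checked by hand.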
Unfortunately, I am not able to fix it.
I don’t think I have any suggestions on how to fix it, but “I am not able to fix it” really doesn’t give the rest of us (including those who might have such suggestions) much to go on. Detail really matters here–what exactly have you tried, what exactly was the result (including any error messages, logs, etc.), any observed behavior of the system (particularly anything that seems out of the ordinary), etc.
So sorry, but when I said "unfortunately" it's because I have tried everything:
api-cli run remove-module --data '{"module_id": "goauthentik2", "force": true, "preserve_data": false}'
<3>Cannot retrieve the NODE_ID of goauthentik2
{"context":{"action":"remove-module","data":{"force":true,"module_id":"goauthentik2","preserve_data":false},"extra":{"description":"api-cli endpoint redis://cluster-leader","isNotificationHidden":false,"title":"cluster/remove-module"},"id":"d879b967-c97c-4fe2-ba67-6cfb82107ed7","parent":""},"status":"aborted","progress":0,"subTasks":[],"validated":false,"result":{"error":"<3>Cannot retrieve the NODE_ID of goauthentik2\n","exit_code":1,"file":"task/cluster/d879b967-c97c-4fe2-ba67-6cfb82107ed7","output":""}}
api-cli run update-module --data '{"module_url":"ghcr.io/geniusdynamics/goauthentik:latest","instances":["goauthentik1"],"force":true}'
Traceback (most recent call last):
File "/var/lib/nethserver/cluster/actions/update-module/50update", line 40, in <module>
ping_errors = agent.tasks.runp_brief([{"agent_id": f"module/{mid}", "action": "list-actions"} for mid in instances],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/run.py", line 61, in runp_brief
results = asyncio.run(_runp(tasks, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/run.py", line 120, in _runp
return await asyncio.gather(*runners, return_exceptions=(len(tasks) > 1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/run.py", line 127, in _run_with_protocol
return await run_redisclient(taskrq, **pconn)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 77, in run_redisclient
await _task_submission_check_client_idle(rdb, taskrq, kwargs['check_idle_time'])
File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 41, in _task_submission_check_client_idle
raise TaskSubmissionCheckFailed(f"Client \"{taskrq['agent_id']}\" was not found")
agent.tasks.exceptions.TaskSubmissionCheckFailed: Client "module/goauthentik1" was not found
""
Trying suggestions from: https://community.nethserver.org/t/core-update-failed-lamp-has-no-active-instances/25663
runagent -m goauthentik1 grep IMAGE_URL environment
runagent: [FATAL] Cannot find module goauthentik1 in the local node
redis-cli hget module/goauthentik2/environment IMAGE_URL
(nil)
api-cli run get-configuration --agent module/goauthentik1
TaskSubmissionCheckFailed: Client "module/goauthentik1" was not found
I could reproduce the issue, and the following commands removed the app and the log entries. Just change goauthentik1 to the app instance you want to remove.
APPTOREMOVE=goauthentik1
redis-cli hset module/${APPTOREMOVE}/environment NODE_ID "1"
redis-cli ACL SETUSER module/${APPTOREMOVE}
remove-module --no-preserve ${APPTOREMOVE}
We need this permanently documented somewhere, since we also had cases where, for whatever reason, we couldn't completely remove some apps.
Let’s wait for a confirmation that it really works.
I guess the same applies to the NODE_ID?
e.g:
APPTOREMOVE=goauthentik1
NODEIDNR=1 (not sure whether with or without quotes)
redis-cli hset module/${APPTOREMOVE}/environment NODE_ID ${NODEIDNR}
(or "${NODEIDNR}"?)
Just spinning thoughts.
I'm not sure whether it just needs any value or whether it has to be the right value. For a single-node cluster, "1" should usually work.
redis-cli hset module/${APPTOREMOVE}/environment NODE_ID "3"
(integer) 1
redis-cli ACL SETUSER module/${APPTOREMOVE}
OK
remove-module --no-preserve ${APPTOREMOVE}
Traceback (most recent call last):
File "/var/lib/nethserver/cluster/actions/remove-module/50update", line 72, in <module>
raise ex
File "/var/lib/nethserver/cluster/actions/remove-module/50update", line 57, in <module>
destroy_module_result = agent.tasks.run(
^^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/run.py", line 39, in run
results = runp([taskrq], **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/run.py", line 50, in runp
return asyncio.run(_runp(tasks, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/run.py", line 120, in _runp
return await asyncio.gather(*runners, return_exceptions=(len(tasks) > 1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/run.py", line 127, in _run_with_protocol
return await run_redisclient(taskrq, **pconn)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 77, in run_redisclient
await _task_submission_check_client_idle(rdb, taskrq, kwargs['check_idle_time'])
File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 41, in _task_submission_check_client_idle
raise TaskSubmissionCheckFailed(f"Client \"{taskrq['agent_id']}\" was not found")
agent.tasks.exceptions.TaskSubmissionCheckFailed: Client "module/goauthentik1" was not found
I re-ran it with the --force argument and it worked. I am watching the logs to see whether the errors come back and will report back soon.
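Putting the steps from this thread together, the sequence can be sketched as one function. This is not an official procedure. The position of --force on the remove-module command line is an assumption, as are the function name force_remove_module and the RUN dry-run switch. Use the node ID of the node that hosted the instance; "1" is only the usual value for a single-node cluster.

```shell
# RUN is a hypothetical dry-run switch: set RUN=echo to print the commands
# instead of executing them. Leave it empty to really run them.
RUN="${RUN:-}"

# $1 = module instance id (e.g. goauthentik1), $2 = node id (e.g. 1)
force_remove_module() {
    # Restore the NODE_ID field that remove-module could not retrieve.
    $RUN redis-cli hset "module/$1/environment" NODE_ID "$2" || return 1
    # Create the Redis ACL user for the module if it is missing.
    $RUN redis-cli ACL SETUSER "module/$1" || return 1
    # Remove the instance without preserving data; --force was needed in the last report.
    $RUN remove-module --no-preserve --force "$1"
}
```

A dry run such as RUN=echo followed by force_remove_module goauthentik1 1 prints the three commands, so they can be reviewed before anything is written to Redis.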