Authentik uninstall fails

Sorry, I’m out of ideas.
@davidep, do you have an idea how to clean up half-installed/half-removed apps?

Recap:

Nobody else?

How are you, bro? Try:

api-cli run remove-module --data '{"module_id": "goauthentik1", "force": true, "preserve_data": false}'
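
Whichever variant you try, the `--data` argument must be valid JSON wrapped in straight ASCII quotes. Curly quotes such as ‘ ’ “ ” (which forum software tends to substitute) are not quotes to the shell, so the JSON arrives broken. Below is a minimal Python sketch that builds the command safely. `build_remove_module_cmd` is a hypothetical helper, and the payload keys mirror the `remove-module` call posted later in this thread rather than a checked schema.

```python
import json
import shlex


def build_remove_module_cmd(module_id: str, force: bool = True, preserve_data: bool = False) -> str:
    """Return an api-cli command line for the cluster remove-module action.

    Hypothetical helper: the payload keys mirror the commands posted in
    this thread and are not validated against the action's input schema.
    """
    payload = {
        "module_id": module_id,
        "force": force,
        "preserve_data": preserve_data,
    }
    # json.dumps always emits straight double quotes, so the payload is valid JSON
    data = json.dumps(payload)
    # shlex.quote wraps the JSON in single quotes so the shell passes it unchanged
    return "api-cli run remove-module --data " + shlex.quote(data)


if __name__ == "__main__":
    print(build_remove_module_cmd("goauthentik1"))
```

It only prints the command; copy it into the shell after checking the module id.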

Have you checked whether any leftover references remain, for example in:

/var/lib/nethserver/cluster/*
/var/lib/nethserver/catalog/*
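
A read-only way to look for such references is a recursive search for the module id, similar to `grep -rl`. The Python sketch below is one hedged version of that; `find_references` is a hypothetical helper, and the two roots are simply the directories suggested above. It matches plain substrings, so review the hits before touching anything.

```python
import os

# Directories suggested in this thread; adjust as needed
SEARCH_ROOTS = ["/var/lib/nethserver/cluster", "/var/lib/nethserver/catalog"]


def find_references(module_id: str, roots: list[str]) -> list[str]:
    """Return the regular files under `roots` whose content contains `module_id`.

    Read-only sketch. It is a plain substring match, so "goauthentik1"
    also matches "goauthentik10".
    """
    needle = module_id.encode()
    hits = []
    for root in roots:
        # os.walk yields nothing for a missing root, so absent directories are harmless
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Only read regular files (skips sockets, FIFOs and broken symlinks)
                if not os.path.isfile(path):
                    continue
                try:
                    with open(path, "rb") as fh:
                        if needle in fh.read():
                            hits.append(path)
                except OSError:
                    # Skip files we are not allowed to read
                    continue
    return sorted(hits)


if __name__ == "__main__":
    for match in find_references("goauthentik1", SEARCH_ROOTS):
        print(match)
```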

The error you saw (TaskSubmissionCheckFailed) indicates that the module registration is stuck in the task and agent management system, which relies on Redis. Manual removal involves deleting the Redis-related keys and metadata files.

Manipulating Redis directly can corrupt the NethServer 8 cluster state if not done correctly. Make a backup first, and make sure you understand the risks.

I found that a good old-fashioned full reboot of the server/node cleared out the persistent, no-longer-wanted NS8 tasks held in the node's memory.

I rebooted the server and the error persists.

Maybe this thread has useful information?

I already tried that; it didn't help in my case.

Ping @davidep

There might still be some Redis values containing a module_id reference that need to be manually cleaned up. The module_id is used as a value inside complex structures like HASH and SET, so it’s not always an obvious part of the key. For example, see the HASH cluster/module_node.

You can refer to the Python code for the complete list: ns8-core/core/imageroot/var/lib/nethserver/cluster/actions/remove-module/50update (main branch of NethServer/ns8-core on GitHub).
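
To hunt for these leftovers, one option is to list keys with `redis-cli --scan`, read hashes with `HGETALL` and sets with `SMEMBERS`, and search the results for the module id. The sketch below covers only the search step: it assumes that dump has already been loaded into a Python dict (a string, a dict for a HASH, or a set for a SET) and never connects to Redis. `find_module_refs` is a hypothetical helper, and its matches are candidates to review, not keys to delete blindly.

```python
from typing import Union

# Snapshot of Redis data: plain strings, HASHes as dicts, SETs as sets
RedisValue = Union[str, dict[str, str], set[str]]


def _mentions(text: str, module_id: str) -> bool:
    # Matches the bare id ("goauthentik1") or a path segment ("module/goauthentik1")
    return module_id in text.split("/")


def find_module_refs(snapshot: dict[str, RedisValue], module_id: str) -> list[tuple[str, str]]:
    """Return (key, where) pairs for every place `module_id` appears.

    `where` is one of "key", "string", "hash-field", "hash-value", "set-member".
    """
    refs = []
    for key in sorted(snapshot):
        value = snapshot[key]
        if _mentions(key, module_id):
            refs.append((key, "key"))
        if isinstance(value, str):
            if _mentions(value, module_id):
                refs.append((key, "string"))
        elif isinstance(value, dict):
            # The id can be a hash field (as in cluster/module_node) or a hash value
            for field, val in value.items():
                if _mentions(field, module_id):
                    refs.append((key, "hash-field"))
                if _mentions(val, module_id):
                    refs.append((key, "hash-value"))
        elif isinstance(value, set):
            if any(_mentions(member, module_id) for member in value):
                refs.append((key, "set-member"))
    return refs
```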

Unfortunately, I'm not able to fix it.

I don’t think I have any suggestions on how to fix it, but “I am not able to fix it” really doesn’t give the rest of us (including those who might have such suggestions) much to go on. Detail really matters here: what exactly have you tried, what exactly was the result (including any error messages, logs, etc.), and what behavior have you observed (particularly anything that seems out of the ordinary)?

Sorry, but when I said “unfortunately” it was because I had tried everything:

api-cli run remove-module --data '{"module_id": "goauthentik2", "force": true, "preserve_data": false}'
<3>Cannot retrieve the NODE_ID of goauthentik2

{"context":{"action":"remove-module","data":{"force":true,"module_id":"goauthentik2","preserve_data":false},"extra":{"description":"api-cli endpoint redis://cluster-leader","isNotificationHidden":false,"title":"cluster/remove-module"},"id":"d879b967-c97c-4fe2-ba67-6cfb82107ed7","parent":""},"status":"aborted","progress":0,"subTasks":[],"validated":false,"result":{"error":"<3>Cannot retrieve the NODE_ID of goauthentik2\n","exit_code":1,"file":"task/cluster/d879b967-c97c-4fe2-ba67-6cfb82107ed7","output":""}}
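
The useful part of that JSON is in `status` and `result`. As a small sketch (not an official tool; the field names are read off the output above), this pulls out the action, status, exit code and error message, dropping a leading log-level marker such as `<3>`. `summarize_task_result` is a hypothetical helper; pass it the JSON text that `api-cli` printed.

```python
import json
import re


def summarize_task_result(raw: str) -> dict:
    """Pull the useful fields out of an api-cli task result (sketch).

    Field names are taken from the output posted in this thread.
    """
    task = json.loads(raw)
    result = task.get("result") or {}
    error = (result.get("error") or "").strip()
    # Drop a leading log-level marker such as "<3>" if present
    error = re.sub(r"^<\d>", "", error)
    return {
        "action": (task.get("context") or {}).get("action"),
        "status": task.get("status"),
        "exit_code": result.get("exit_code"),
        "error": error,
    }
```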

api-cli run update-module --data '{"module_url":"ghcr.io/geniusdynamics/goauthentik:latest","instances":["goauthentik1"],"force":true}'
Traceback (most recent call last):
  File "/var/lib/nethserver/cluster/actions/update-module/50update", line 40, in <module>
    ping_errors = agent.tasks.runp_brief([{"agent_id": f"module/{mid}", "action": "list-actions"} for mid in instances],
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 61, in runp_brief
    results = asyncio.run(_runp(tasks, **kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 120, in _runp
    return await asyncio.gather(*runners, return_exceptions=(len(tasks) > 1))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 127, in _run_with_protocol
    return await run_redisclient(taskrq, **pconn)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 77, in run_redisclient
    await _task_submission_check_client_idle(rdb, taskrq, kwargs['check_idle_time'])
  File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 41, in _task_submission_check_client_idle
    raise TaskSubmissionCheckFailed(f"Client \"{taskrq['agent_id']}\" was not found")
agent.tasks.exceptions.TaskSubmissionCheckFailed: Client "module/goauthentik1" was not found
""

Trying suggestions from: https://community.nethserver.org/t/core-update-failed-lamp-has-no-active-instances/25663

runagent -m goauthentik1 grep IMAGE_URL environment
runagent: [FATAL] Cannot find module goauthentik1 in the local node

redis-cli hget module/goauthentik2/environment IMAGE_URL
(nil)

api-cli run get-configuration --agent module/goauthentik1
TaskSubmissionCheckFailed: Client "module/goauthentik1" was not found

I could reproduce the issue, and the following commands removed the app and its log entries. Just replace goauthentik1 with the app instance you want to remove.

APPTOREMOVE=goauthentik1

redis-cli hset module/${APPTOREMOVE}/environment NODE_ID "1"
redis-cli ACL SETUSER module/${APPTOREMOVE}
remove-module --no-preserve ${APPTOREMOVE}
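
The three steps above can be wrapped in a small command builder so the module id and node id are typed only once. This is a hypothetical helper that only returns the command strings and runs nothing. A later reply in this thread reports that `remove-module` also needed `--force`; placing that flag right after `--no-preserve` is an assumption.

```python
import shlex


def recovery_commands(module_id: str, node_id: int = 1, force: bool = False) -> list[str]:
    """Build the workaround commands posted in this thread for one module.

    Hypothetical helper: it returns command strings and executes nothing.
    """
    env_key = shlex.quote(f"module/{module_id}/environment")
    acl_user = shlex.quote(f"module/{module_id}")
    remove = ["remove-module", "--no-preserve"]
    if force:
        # Reported later in the thread as necessary; flag position is assumed
        remove.append("--force")
    remove.append(shlex.quote(module_id))
    return [
        # Give remove-module a NODE_ID to look up again
        f"redis-cli hset {env_key} NODE_ID {node_id}",
        # Recreate the module's Redis ACL user, as in the original steps
        f"redis-cli ACL SETUSER {acl_user}",
        " ".join(remove),
    ]


if __name__ == "__main__":
    for line in recovery_commands("goauthentik1", node_id=1, force=True):
        print(line)
```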

We need this documented permanently somewhere, since we have also had cases where, for whatever reason, we couldn't completely remove some apps.

Let’s wait for a confirmation that it really works.

I guess the same applies to NODE_ID?

e.g.:
APPTOREMOVE=goauthentik1
NODEIDNR=1 (not sure whether with or without " ")

redis-cli hset module/${APPTOREMOVE}/environment NODE_ID ${NODEIDNR}

(or "${NODEIDNR}")?

Just spinning thoughts.
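
On the quoting question: the shell removes the double quotes before redis-cli sees the argument, so `NODE_ID "1"` and `NODE_ID 1` store the same value. Quoting `"${NODEIDNR}"` only matters if the value could contain spaces or glob characters. Curly quotes are not shell quotes at all and would end up inside the stored value. Python's `shlex.split` applies POSIX-style quoting rules (it does not expand `${...}`), so it can illustrate this; `shell_args` is just an illustrative wrapper.

```python
import shlex


def shell_args(command: str) -> list[str]:
    """Split a command line the way a POSIX shell would (quoting only, no ${...} expansion)."""
    return shlex.split(command)


plain = shell_args("redis-cli hset module/goauthentik1/environment NODE_ID 1")
double_quoted = shell_args('redis-cli hset module/goauthentik1/environment NODE_ID "1"')
# \u201c and \u201d are the curly quotes that forum software often substitutes
curly = shell_args("redis-cli hset module/goauthentik1/environment NODE_ID \u201c1\u201d")

print(plain == double_quoted)  # True: the shell strips the straight quotes
print(curly[-1] == "1")        # False: curly quotes stay in the value
```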

I’m not sure whether it just needs a value or needs the right value; for a single-node cluster, "1" should usually work.

redis-cli hset module/${APPTOREMOVE}/environment NODE_ID "3"
(integer) 1

redis-cli ACL SETUSER module/${APPTOREMOVE}
OK

remove-module --no-preserve ${APPTOREMOVE}
Traceback (most recent call last):
  File "/var/lib/nethserver/cluster/actions/remove-module/50update", line 72, in <module>
    raise ex
  File "/var/lib/nethserver/cluster/actions/remove-module/50update", line 57, in <module>
    destroy_module_result = agent.tasks.run(
                            ^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 39, in run
    results = runp([taskrq], **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 50, in runp
    return asyncio.run(_runp(tasks, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 120, in _runp
    return await asyncio.gather(*runners, return_exceptions=(len(tasks) > 1))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/run.py", line 127, in _run_with_protocol
    return await run_redisclient(taskrq, **pconn)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 77, in run_redisclient
    await _task_submission_check_client_idle(rdb, taskrq, kwargs['check_idle_time'])
  File "/usr/local/agent/pypkg/agent/tasks/redisclient.py", line 41, in _task_submission_check_client_idle
    raise TaskSubmissionCheckFailed(f"Client \"{taskrq['agent_id']}\" was not found")
agent.tasks.exceptions.TaskSubmissionCheckFailed: Client "module/goauthentik1" was not found

I re-ran it with the --force argument and it worked. I'm watching the logs to see whether the errors come back and will report back soon.
