Initial cluster creation failed when adding module

I’ve just installed NethServer 8 using the automated script, but when I try to create the first cluster, the “add-module” task for the Grafana Loki image fails, and I can’t proceed.

This is the generated trace.

cluster/create-cluster

Task ID: 0689abf5-4a21-4726-9e16-0213eb306db3

<7>sed -i '/- 127.0.0.1$/a\ \ \ \ \ \ \ \ - 10.5.4.0/24' /home/traefik1/.config/state/configs/_api.yml
<7>sed -i -e '/cluster-localnode$/c\10.5.4.1 cluster-localnode' /etc/hosts
<7>systemctl stop wg-quick@wg0.service
<7>systemctl start wg-quick@wg0.service
Task cluster/add-module run failed: {'output': '', 'error': '<7>podman-pull-missing ghcr.io/nethserver/loki:1.1.0\nTrying to pull ghcr.io/nethserver/loki:1.1.0...\nGetting image source signatures\nCopying blob sha256:37ad42a8b4872d3b048a3da1cb123239579000777565098fe52749b7a5e4f04d\nCopying config sha256:3ef6544dd0708381fb5db263a83dfa4593194cc73d2635e2f4eaff2b4b61d51b\nWriting manifest to image destination\n3ef6544dd0708381fb5db263a83dfa4593194cc73d2635e2f4eaff2b4b61d51b\n<7>extract-ui ghcr.io/nethserver/loki:1.1.0\nExtracting container filesystem ui to /var/lib/nethserver/cluster/ui/apps/loki1\nui/index.html\nbaadd0b8f8ce20c4425a774b01ae9897a0e4aa084e941fbdf40ed7541eac3b4b\nAssertion failed\n  File "/var/lib/nethserver/cluster/actions/add-module/50update", line 223, in <module>\n    agent.assert_exp(create_module_result[\'exit_code\'] == 0) # Ensure create-module is successful\n', 'exit_code': 2}
Assertion failed
  File "/var/lib/nethserver/cluster/actions/create-cluster/50update", line 99, in <module>
    agent.assert_exp(add1_module_failures == 0)

cluster/add-module

Task ID: 43445749-3ba3-451d-9a5a-6e56676ddb86

<7>podman-pull-missing ghcr.io/nethserver/loki:1.1.0
Trying to pull ghcr.io/nethserver/loki:1.1.0...
Getting image source signatures
Copying blob sha256:37ad42a8b4872d3b048a3da1cb123239579000777565098fe52749b7a5e4f04d
Copying config sha256:3ef6544dd0708381fb5db263a83dfa4593194cc73d2635e2f4eaff2b4b61d51b
Writing manifest to image destination
3ef6544dd0708381fb5db263a83dfa4593194cc73d2635e2f4eaff2b4b61d51b
<7>extract-ui ghcr.io/nethserver/loki:1.1.0
Extracting container filesystem ui to /var/lib/nethserver/cluster/ui/apps/loki1
ui/index.html
baadd0b8f8ce20c4425a774b01ae9897a0e4aa084e941fbdf40ed7541eac3b4b
Assertion failed
  File "/var/lib/nethserver/cluster/actions/add-module/50update", line 223, in <module>
    agent.assert_exp(create_module_result['exit_code'] == 0) # Ensure create-module is successful

cluster/add-module

Task ID: 20c32dc5-0395-4f81-92cf-3f9148bc254a

cluster/create-module

Task ID: 8dc97ae8-22b4-4393-a59d-5e346bb02b61

Add to module/loki1 environment TRAEFIK_IMAGE=docker.io/traefik:v2.4
Add to module/loki1 environment LOKI_IMAGE=docker.io/grafana/loki:2.2.1
<7>podman-pull-missing docker.io/traefik:v2.4 docker.io/grafana/loki:2.2.1
Trying to pull docker.io/library/traefik:v2.4...
Getting image source signatures
Copying blob sha256:ddad3d7c1e96adf9153f8921a7c9790f880a390163df453be1566e9ef0d546e0
Copying blob sha256:5f6722e60c2f6c55424dadebe886f88ba1b903df075b00048427439abb91b85a
Copying blob sha256:3abdcd3bb40ca29c232ad12d1f2cba6efcb28e8d8ab7e5787ad2771b4e3862b0
Copying blob sha256:fe4701c53ae539044a129428575f42a0e0aa5e923d04b97466915bf45f5df0e3
Copying config sha256:de1a7c9d5d63d8ab27b26f16474a74e78d252007d3a67ff08dcbad418eb335ae
Writing manifest to image destination
de1a7c9d5d63d8ab27b26f16474a74e78d252007d3a67ff08dcbad418eb335ae
Trying to pull docker.io/grafana/loki:2.2.1...
Getting image source signatures
Copying blob sha256:753793ea21f66410ed4b05c828216728b6521b13a1de5939203258636f11eed5
Copying blob sha256:31603596830fc7e56753139f9c2c6bd3759e48a850659506ebfb885d1cf3aef5
Copying blob sha256:e4f5d1b1114583aca9dafec1f0cc4f8a21ae15f5ea04b0ac236c148e09aa5f54
Copying blob sha256:8818f75b6e26a7de66dd46182944abe3a695bcafe01f1bf4cffcab489b34f960
Copying blob sha256:858024693c419c0dd99a6d6ea88967d0f74cb3bdace948a115a6b7584019f644
Copying blob sha256:e94f8da5a2cba4300c6e74d1eef04f2cc7f39cb1d08303f803f119fe3e06ca0f
Copying blob sha256:586f67fcd52ef2996deebfe0c50f19ca0b2c59660e6efa9c76e580580dd8a5f9
Copying config sha256:727c39682956d63917122ed8b23b916821f7c850bed426ace01fabfe81530790
Writing manifest to image destination
727c39682956d63917122ed8b23b916821f7c850bed426ace01fabfe81530790
1
Created symlink /home/loki1/.config/systemd/user/default.target.wants/loki.service → /home/loki1/.config/systemd/user/loki.service.
Job for loki.service failed because the control process exited with error code.
See "systemctl --user status loki.service" and "journalctl --user -xeu loki.service" for details.
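The create-cluster entry in the trace embeds the add-module failure as a Python dict literal, so its newlines and quotes show up escaped (`\n`, `\'`). The sketch below shows one way to turn that text back into a readable traceback with the standard library's `ast.literal_eval`. It assumes you paste the dict exactly as printed after "Task cluster/add-module run failed:". The `RAW` value here is an abridged copy of that line, and `unpack_task_failure` is a hypothetical helper, not part of NethServer.

```python
import ast

# Abridged copy of the dict printed after "Task cluster/add-module run failed:".
# Only the tail of the 'error' field is kept here for illustration.
RAW = r"""{'output': '', 'error': '<7>extract-ui ghcr.io/nethserver/loki:1.1.0\nAssertion failed\n  File "/var/lib/nethserver/cluster/actions/add-module/50update", line 223, in <module>\n    agent.assert_exp(create_module_result[\'exit_code\'] == 0) # Ensure create-module is successful\n', 'exit_code': 2}"""


def unpack_task_failure(raw: str) -> dict:
    """Turn a printed task-result dict back into a dict with real newlines.

    Hypothetical helper. ast.literal_eval only evaluates Python literals,
    so it is safe to use on text copied from a log.
    """
    result = ast.literal_eval(raw)
    if not isinstance(result, dict):
        raise ValueError("expected a dict literal")
    return result


if __name__ == "__main__":
    failure = unpack_task_failure(RAW)
    print("exit_code:", failure["exit_code"])
    # The escaped \n sequences are now real line breaks.
    print(failure["error"])
```

Read this way, the nested error is the same add-module traceback that appears further down in the trace, which confirms that create-cluster failed only because add-module failed first.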

I’m not sure, but it seems the main error is:

Assertion failed
  File "/var/lib/nethserver/cluster/actions/add-module/50update", line 223, in <module>
    agent.assert_exp(create_module_result['exit_code'] == 0) # Ensure create-module is successful

It points to this code in add-module/50update:

# Push the creation task for the new module.
create_module_result = agent.tasks.run(
    agent_id=f'module/{module_id}',
    action="create-module",
    data={
        'images': extra_images,
    },
    endpoint="redis://cluster-leader",
    check_idle_time=0, # disable idle check and wait until the agent is up
    progress_callback=agent.get_progress_callback(67,95),
)
agent.assert_exp(create_module_result['exit_code'] == 0) # Ensure create-module is successful

I can’t figure out why it doesn’t return 0.
Any thoughts on this matter?
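The assertion only tests `create_module_result['exit_code'] == 0`, so the reason for the failure is not in the assertion itself. As the first trace entry shows, a task result is a dict with `output`, `error` and `exit_code` keys, and the diagnostic text ends up in `error`. The sketch below is a hypothetical `check_task` helper (not part of the NethServer agent library) that checks the same condition but keeps the last lines of `error` in the exception message. The example result is illustrative: its exit code and text are assumptions shaped like the create-module entry in the trace.

```python
from typing import Any, Mapping


class TaskFailed(RuntimeError):
    """Raised when a task result reports a non-zero exit code."""


def check_task(result: Mapping[str, Any], name: str, tail_lines: int = 3) -> None:
    """Hypothetical alternative to a bare `exit_code == 0` assertion.

    Same condition, but on failure the exception carries the last lines
    of the task's 'error' field, where the real cause was recorded.
    """
    code = result.get("exit_code")
    if code == 0:
        return
    error_lines = str(result.get("error", "")).strip().splitlines()
    tail = "\n".join(error_lines[-tail_lines:])
    raise TaskFailed(f"task {name} exited with {code}:\n{tail}")


if __name__ == "__main__":
    # Illustrative result; the exit code value is an assumption.
    failed = {
        "output": "",
        "error": "Job for loki.service failed because the control process exited with error code.",
        "exit_code": 1,
    }
    try:
        check_task(failed, "module/loki1 create-module")
    except TaskFailed as exc:
        print(exc)
```

In this thread, following the `error` text down the chain (create-cluster, then add-module, then create-module) leads to the failed `loki.service` start, which is where the journal has to be checked.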

Hello @markfree and welcome to the NethServer community,

Could you kindly share with us what OS you're using for the install?

Hi.
Thanks for getting back to me.

Initially I used Oracle Linux 9, but the installation failed as I described above.
Then, I switched to Debian 12 and got exactly the same error.


Only now did I realize that the installation process created a loki1 user.
So, I switched to it and tried to see more details about the error.

The systemctl --user status loki.service command did not return anything useful:

loki1@nethserver:~$ systemctl --user status loki.service
Failed to connect to bus: No medium found
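A frequent cause of "Failed to connect to bus: No medium found" is that `XDG_RUNTIME_DIR` is not set in the shell, for example after switching user with a plain `su`, so `systemctl --user` cannot find the user's D-Bus socket at `$XDG_RUNTIME_DIR/bus`. That is an assumption here, since the post does not say how the switch to loki1 was done. The sketch below is a hypothetical `user_bus_hint` helper that only inspects the environment and the socket path; it changes nothing on the system.

```python
import os
from pathlib import Path
from typing import Optional


def user_bus_hint(uid: Optional[int] = None) -> str:
    """Explain why `systemctl --user` might not reach the user's D-Bus socket.

    Hypothetical diagnostic helper; it only reads the environment and the
    filesystem.
    """
    uid = os.getuid() if uid is None else uid
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR")
    if not runtime_dir:
        # Typical after a plain `su`: the variable is not carried over.
        return f"XDG_RUNTIME_DIR is unset; try: export XDG_RUNTIME_DIR=/run/user/{uid}"
    bus = Path(runtime_dir) / "bus"
    if not bus.exists():
        return f"no user bus socket at {bus}; is a user manager running for UID {uid}?"
    return f"user bus socket found at {bus}"


if __name__ == "__main__":
    print(user_bus_hint())
```

Note that `journalctl --user` can still work in this situation, which is consistent with what happened next in the thread.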

However, journalctl --user -xeu loki.service did show more details:

Jul 02 23:59:07 nethserver podman[13570]:
Jul 02 23:59:07 nethserver podman[13570]: 2024-07-02 23:59:07.985288252 +0000 UTC m=+0.115720134 container create 9f81b1e44e2ed15ac8457017ec8469e37e93155bd85b10bc15f88df922b8dcbb (image=localhost/podman-pause:4.3.1-0, name=1902bd4cd35e-infra, pod_id=1902bd4cd35ea04fc5823efa4c69616e97ba83f20c9b9cee7283357af052d245, PODMAN_SYSTEMD_UNIT=loki.service, io.buildah.version=1.28.2)
Jul 02 23:59:07 nethserver podman[13570]: 2024-07-02 23:59:07.989336526 +0000 UTC m=+0.119768419 pod create 1902bd4cd35ea04fc5823efa4c69616e97ba83f20c9b9cee7283357af052d245 (image=, name=loki)
Jul 02 23:59:07 nethserver podman[13570]: 1902bd4cd35ea04fc5823efa4c69616e97ba83f20c9b9cee7283357af052d245
Jul 02 23:59:09 nethserver podman[13581]: Error: starting container 9f81b1e44e2ed15ac8457017ec8469e37e93155bd85b10bc15f88df922b8dcbb: /usr/bin/slirp4netns failed: "open(\"/dev/net/tun\"): No such file or directory\nWARNING: Support for seccomp is experimental\nWARNING: Support for IPv6 is experimental\nchild failed(1)\nWARNING: Support for seccomp is experimental\nWARNING: Support for IPv6 is experimental\n"
Jul 02 23:59:09 nethserver systemd[12959]: loki.service: Control process exited, code=exited, status=125/n/a

(...)

Jul 02 23:59:09 nethserver podman[13615]: 2024-07-02 23:59:09.12696667 +0000 UTC m=+0.056576957 container remove 9f81b1e44e2ed15ac8457017ec8469e37e93155bd85b10bc15f88df922b8dcbb (image=localhost/podman-pause:4.3.1-0, name=1902bd4cd35e-infra, pod_id=1902bd4cd35ea04fc5823efa4c69616e97ba83f20c9b9cee7283357af052d245, PODMAN_SYSTEMD_UNIT=loki.service, io.buildah.version=1.28.2)
Jul 02 23:59:09 nethserver podman[13615]: 2024-07-02 23:59:09.132574359 +0000 UTC m=+0.062184646 pod remove 1902bd4cd35ea04fc5823efa4c69616e97ba83f20c9b9cee7283357af052d245 (image=, name=loki)
Jul 02 23:59:09 nethserver podman[13615]: 1902bd4cd35ea04fc5823efa4c69616e97ba83f20c9b9cee7283357af052d245
Jul 02 23:59:09 nethserver systemd[12959]: loki.service: Failed with result 'exit-code'.

Among all these lines, the key one shows that slirp4netns could not open the TUN device because /dev/net/tun does not exist:

/usr/bin/slirp4netns failed: "open(\"/dev/net/tun\"): No such file or directory
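slirp4netns provides the network for rootless Podman pods and needs the `/dev/net/tun` device node, so a missing node makes the Loki pod fail to start, as the journal shows. The sketch below is a hypothetical `tun_device_status` helper to verify that node before rerunning the installer. It only checks the symptom; the post does not say which VM options were changed, so it says nothing about the fix for a particular hypervisor.

```python
import os
import stat


def tun_device_status(path: str = "/dev/net/tun") -> str:
    """Report whether the TUN device node used by slirp4netns is usable.

    Hypothetical diagnostic helper. Opening the node without any ioctl
    does not create an interface, so the check is harmless.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"  # the condition reported in the journal
    if not stat.S_ISCHR(st.st_mode):
        return "not a character device"
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        return f"present but cannot be opened: {exc.strerror}"
    os.close(fd)
    return "ok"


if __name__ == "__main__":
    print("/dev/net/tun:", tun_device_status())
```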

So I changed some options in my VM and restarted the whole installation process. This time the cluster was created successfully.

I guess my problem was environmental after all.

Anyhow, I appreciate your time.