Signal-event interface-update hangs

djx · August 3, 2020, 12:54pm

NethServer Version: 7.8.2003
Module: networking

I run signal-event interface-update and it just hangs there (in a local console, not SSH) for over 10 minutes.

I’m trying to reconfigure some things and I notice that signal-event interface-update hangs indefinitely. So because of this, my network configuration fails to generate and the server is unavailable.

How do I debug this to see why it’s not completing successfully?

Andy_Wismer · August 3, 2020, 1:08pm

@djx

Hi

It would help a lot if you could add some information about your situation:

Is NethServer directly installed or virtualized?
Is it running AD? (makes reconfiguration more than tricky…)

Normally, a network reconfiguration is done through either Cockpit or the older dashboard, and neither involves a CLI command like signal-event interface-update. I am well aware of the e-smith template system…

So WHAT exactly did you change in which files?

My 2 cents
Andy

djx · August 3, 2020, 1:17pm

Yes it’s virtualized. Not running AD, just simple LDAP.

I’m trying to work around the static network interface issues with Hetzner, and these are the scripts that have been adjusted so far:

/etc/cloud/cloud.cfg.d/98-disable-network.cfg

network:
  config: disabled

/etc/e-smith/events/actions/hetzner-route

#!/bin/sh
# Make Hetzner route

cat <<EOF >> /etc/sysconfig/network-scripts/route-eth0
172.31.1.1/32 dev eth0
default via 172.31.1.1 dev eth0
EOF

chmod ug+x /etc/e-smith/events/actions/hetzner-route

ln -sf ../actions/hetzner-route /etc/e-smith/events/static-routes-save/S33hetzner-route
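One thing to watch with the action above: `>>` appends on every run, so if route-eth0 is not regenerated before the action fires, repeated events could leave duplicate route lines. A minimal sketch of an idempotent variant follows. The function names `add_route_line` and `write_hetzner_routes` are made up for illustration, and whether duplicates actually occur on a given system is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch: append each Hetzner route line only if it is not already present.
# The function names are illustrative; the route lines are the ones from the post.

add_route_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

write_hetzner_routes() {
    add_route_line "$1" "172.31.1.1/32 dev eth0"
    add_route_line "$1" "default via 172.31.1.1 dev eth0"
}
```

In the action itself the call would be `write_hetzner_routes /etc/sysconfig/network-scripts/route-eth0`; pointing it at a scratch file first is a cheap way to check the result.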

Andy_Wismer · August 3, 2020, 1:27pm

I suspected as much - at least the virtualized part…

I am a bit familiar with Hetzner, I helped a guy here set up his Proxmox server on Hetzner, and inside that, we installed NethServer…
We had a total of 8 usable IPs, we had to “waste” one as a router IP, for the rest of a subnet.
Hetzner knew we intended to install Proxmox (they even support that!), but the allocated IPs weren’t optimal. We even had to return one IP as unusable because of routing constraints…

Where does this come from? /etc/cloud/
Normally, there’s no such folder in NethServer.

I assume you installed CentOS 7 and then NethServer on top of that.

Does your server have one or two NICs?

If you prefer, you can send me the IPs involved in Hetzner’s config by PM, so as not to expose more information publicly than needed…
172.31.1.1/32

This does NOT seem logical, wrong subnet (/32 is a single host, no routing possible…)

Maybe this helps, it did for me (different case)…
https://wiki.nethserver.org/doku.php?id=virtual_network_interface&s[]=dummy

My 2 cents
Andy

pike · August 3, 2020, 3:19pm

two. Router and Broadcast, pal

Andy_Wismer · August 3, 2020, 3:24pm

@pike

No: besides the two obvious ones!

We had three isolated, routable IPs, and a small subnet of 8 (6 usable!).

Only one IP was directly usable. Hetzner defined that all traffic had to come from this IP - they only support “routed” networking in Proxmox, no MAC-Addr and no promiscuous mode…

That meant one of the three isolated IPs was per se unusable.
One we had to use for internal routing.

Andy

djx · August 3, 2020, 5:08pm

Thanks for the follow-up

Both of these come from Hetzner’s documentation on how to configure a static ip:
https://docs.hetzner.com/cloud/servers/static-configuration/

I asked them a more general question about whether they plan to support a more standard approach to virtualizing network cards instead of requiring special configuration. They simply said:

there are scripts installed which help configuring attached networks automatically. Those can be removed by removing the package “hc-utils”. If this is gone the network configuration is handled by cloudinit as usual.

Thanks to @mrmarkuz (in a separate thread), it was actually related to my DNS configuration pointing to 127.0.0.1 somehow. I’m not sure how it got set this way.

djx · August 5, 2020, 12:39pm

Looks like I marked this as closed prematurely.
After a restart I am getting this same issue again: interface-update seems to stop processing scripts after S30.

Some notes:

S30 is showing in /var/log/messages as completed successfully
There are no messages after this about other scripts failing
When I manually run the script for static routes, it generates the configuration file with no problems
ps -C shows signal-event sitting there with 0 CPU usage (what is it waiting for?!)
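For the "what is it waiting for" part, one option is to look at the state of the signal-event process and its direct children under /proc. This is only a sketch for a Linux system with procps installed. `describe_pid` and `describe_tree` are hypothetical helper names, not NethServer tools.

```shell
#!/bin/sh
# Sketch: print the kernel-reported state and command line of a process
# and of its direct children. Helper names are illustrative.

describe_pid() {
    pid=$1
    [ -r "/proc/$pid/status" ] || { echo "$pid: no such process"; return 1; }
    # State is a single letter: R running, S sleeping, D uninterruptible wait, Z zombie.
    state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
    # cmdline is NUL-separated; turn the separators into spaces for display.
    cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    echo "$pid state=$state cmd=$cmd"
}

describe_tree() {
    describe_pid "$1" || return 1
    # pgrep -P lists the direct children of the given parent PID.
    for child in $(pgrep -P "$1"); do
        describe_pid "$child"
    done
}
```

Something like `describe_tree "$(pgrep -o -f signal-event)"` would then show whether signal-event is sleeping on a child process, and which one. Attaching `strace -p` to that child is a possible next step.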

So the general question is: how can I do a more detailed step-through of a signal-event call to see which step it fails on without manually editing the template files to report their status to a log?
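One way to step through an event without touching the templates is to run its actions one at a time with a time limit. The sketch below assumes that the actions are the executable S* entries in the event directory, that they run in sorted order, and that each receives the event name as its first argument. `run_event_steps` is a made-up helper, and running actions by hand has the same side effects as the real event.

```shell
#!/bin/sh
# Sketch: run the S* actions of one event directory in order, each under
# a time limit, and stop at the first one that does not finish in time.

run_event_steps() {
    event_dir=$1      # e.g. /etc/e-smith/events/interface-update
    limit=${2:-60}    # seconds allowed per action (arbitrary default)
    event_name=$(basename "$event_dir")

    for action in "$event_dir"/S*; do
        # Skip anything that is not an executable file (or a link to one).
        [ -f "$action" ] && [ -x "$action" ] || continue
        name=$(basename "$action")
        echo "$name: starting"
        timeout "$limit" "$action" "$event_name"
        status=$?
        # coreutils timeout exits with 124 when the time limit was hit.
        if [ "$status" -eq 124 ]; then
            echo "$name: timed out after ${limit}s"
            return 1
        fi
        echo "$name: exit status $status"
    done
}
```

With `run_event_steps /etc/e-smith/events/interface-update 120`, the last "starting" line without a matching "exit status" line names the action that hangs.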

djx · August 8, 2020, 5:23pm

The only “solution” I found was to add some intermediate actions to the event, like “S31log”, that just use logger to write an “I’m here!” message to /var/log/messages.
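For anyone wanting to reproduce that workaround, here is a rough sketch of such a marker action. The action name `trace-marker` and the helper `install_marker` are invented for illustration, and it assumes the action is invoked through its symlink so that `$0` carries the S-number name.

```shell
#!/bin/sh
# Sketch: install a marker action that logs its own link name via logger.
# "trace-marker" and install_marker are illustrative names.

install_marker() {
    base=$1    # e.g. /etc/e-smith/events
    event=$2   # e.g. interface-update
    link=$3    # e.g. S31log

    mkdir -p "$base/actions" "$base/$event"
    # Quoted heredoc delimiter: nothing is expanded while writing the file.
    cat > "$base/actions/trace-marker" <<'EOF'
#!/bin/sh
# $0 is assumed to be the symlink path; $1 is assumed to be the event name.
logger -t trace-marker "reached $(basename "$0") in event ${1:-unknown}"
EOF
    chmod +x "$base/actions/trace-marker"
    ln -sf ../actions/trace-marker "$base/$event/$link"
}
```

Calling `install_marker /etc/e-smith/events interface-update S31log` several times with different link names (S31log, S41log, ...) brackets the real actions. The last marker that shows up in /var/log/messages narrows down where the event stops.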