NS7 to NS8 migration tool fails to connect

I used the IP directly and disabled TLS. I can telnet to the IP on the SSH port from NS7 without issue.
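As an alternative to telnet, the same reachability check can be scripted. This is a minimal Python sketch that assumes you pass the NS8 IP and the port you want to test. The function name `can_connect` is hypothetical.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name-resolution failures
        return False
```

For example, `can_connect("192.0.2.10", 22)` tests the SSH port. This only shows that a TCP handshake succeeds. It does not exercise TLS or whatever protocol the migration tool speaks on top.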

I can confirm I also have an issue; I’m looking into it.

OK, I started again on my real dev server. The difference from the laptop is that this server has a bridged IP on my network, and this time I succeeded in connecting to the NS8 and migrated a Mattermost app.

What applications do you have installed on NS7 to migrate?
What resources did you allocate to the VM, or is it real hardware?

I successfully migrated an NS7 to a Rocky Linux 9 NS8 with Mattermost and OpenLDAP (nethserver-directory), using both the IP and the FQDN.

Both machines are virtual machines.

The old server does have a few bridged connections, not to mention the typical Docker networks. The new machine doesn’t have any bridged connections (yet); I only see the loopback, WAN, LAN, and WireGuard interfaces.

There are many applications installed (including Nextcloud, Mail, Mattermost) and even more running in containers I plan to migrate manually.

I went to look again this weekend and now I see this error in the migration interface. Are there any known fixes for this before I start ripping things out of the new server?

Migrating ns7 to a working cluster could raise name conflicts. The user domain existence is checked, but other things like application HTTP routes are not.

If you really know what you’re doing, remove the conflicting user domain from the cluster and go on. Otherwise, migrate to a new cluster.

The cluster wasn’t “working”. It’s a fresh install that I made by following the migration steps: NethServer 7 migration — NS8 documentation

Now that Beta 2 is out I reset the new server and gave migration another try. The NS7 still had the new NS8 node configured, so I clicked the button to disconnect from that node. I waited until it confirmed it had been disconnected and then I tried again.

I’m still getting an error, and not even getting as far as the VPN connection setup. Here’s the error:

Odd number of elements in hash assignment at /usr/share/perl5/vendor_perl/esmith/db.pm line 273.
{
  "steps": 2,
  "pid": 16509,
  "args": "",
  "event": "nethserver-ns8-migration-save"
}
{
  "step": 1,
  "pid": 16509,
  "action": "S05generic_template_expand",
  "event": "nethserver-ns8-migration-save",
  "state": "running"
}
{
  "progress": "0.50",
  "time": "0.108469",
  "exit": 0,
  "event": "nethserver-ns8-migration-save",
  "state": "done",
  "step": 1,
  "pid": 16509,
  "action": "S05generic_template_expand"
}
{
  "step": 2,
  "pid": 16509,
  "action": "S90adjust-services",
  "event": "nethserver-ns8-migration-save",
  "state": "running"
}
{
  "progress": "1.00",
  "time": "0.7759",
  "exit": 256,
  "event": "nethserver-ns8-migration-save",
  "state": "done",
  "step": 2,
  "pid": 16509,
  "action": "S90adjust-services"
}
{
  "pid": 16509,
  "status": "failed",
  "event": "nethserver-ns8-migration-save"
}
Traceback (most recent call last):
  File "/usr/sbin/ns8-join", line 152, in <module>
    subprocess.run(['/sbin/e-smith/signal-event', '-j', 'nethserver-ns8-migration-save'], check=True)
  File "/usr/lib64/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/sbin/e-smith/signal-event', '-j', 'nethserver-ns8-migration-save']' returned non-zero exit status 1.
{
  "id": "1696202961",
  "type": "CommandFailed",
  "message": "See /var/log/messages"
}

I’m guessing it has something to do with this function, which splits a DB record into its type and key/value fields and, for some reason, ends up with a number of elements that doesn’t form proper key/value pairs:

sub _db_string_to_type_and_hash ($)
{
    my ($arg) = @_;
    return ('', ()) unless defined $arg;

    # The funky regex is to avoid escaped pipes.
    # With a negative limit, empty trailing fields are kept (not omitted).
    return split(/(?<!\\)\|/, $arg, -1);
}
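The behaviour of that split can be reproduced off the server to see how a record becomes key/value pairs. This Python sketch mirrors the quoted regex, which splits only on pipes not preceded by a backslash. It raises an error in the situation where Perl would warn about an odd number of elements. The function names are hypothetical, and e-smith itself is not involved.

```python
import re

# Split on "|" only when it is not preceded by a backslash, like the Perl
# regex /(?<!\\)\|/. Python's re.split keeps trailing empty fields, which
# matches Perl's split with a negative limit.
UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")

def split_record(raw):
    """Return (type, [key1, value1, key2, value2, ...]) for a raw record."""
    if raw is None:
        return "", []
    parts = UNESCAPED_PIPE.split(raw)
    return parts[0], parts[1:]

def record_to_props(raw):
    """Build a props dict; fail where Perl would warn about an odd count."""
    rec_type, fields = split_record(raw)
    if len(fields) % 2 != 0:
        raise ValueError(f"odd number of elements ({len(fields)}) after type {rec_type!r}")
    return rec_type, dict(zip(fields[0::2], fields[1::2]))

# A value containing an unescaped pipe adds one extra field, so
# record_to_props("configuration|Password|sec|ret|User|admin") raises ValueError.
```

The lookbehind only protects a value if whoever writes the record escapes the pipe as `\|`. An unescaped pipe inside a password shifts every following key and value by one position.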

Looking into this further, it appears the DB configuration is being saved in a way that this function doesn’t expect. Pulled as JSON, it looks sensible:

{"props":{"admin":"admin","":"","disabled":"disabled","User":"admin","TLSVerify":"disabled","LeaderIpAddress":"10.5.4.1","Password":"$PASSWORD","Host":"$HOSTNAME","enabled":"enabled","$SOME_WEIRD_STRING":"TLSVerify"},"name":"ns8","type":"configuration"}

but pulling it as a raw value (which is being passed in to the function above) returns this garbled mess:

configuration|||$SOME_WEIRD_STRING|TLSVerify|Host|$HOST|LeaderIpAddress|10.5.4.1|Password|$PASSWORD|TLSVerify|disabled|User|admin|admin|admin|disabled|disabled|enabled|enabled

A quick regex test shows that the first | in configuration||| gets caught by the regex.
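Running the raw value above through the same split shows how it maps onto the JSON dump. The `|||` right after the type produces an empty key with an empty value. The remaining fields then pair up into entries such as `admin: admin`. This is a minimal Python sketch. The placeholder strings are copied from the post, and `parse_props` is a hypothetical name.

```python
import re

UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")

RAW = ("configuration|||$SOME_WEIRD_STRING|TLSVerify|Host|$HOST|"
       "LeaderIpAddress|10.5.4.1|Password|$PASSWORD|TLSVerify|disabled|"
       "User|admin|admin|admin|disabled|disabled|enabled|enabled")

def parse_props(raw):
    """Split a raw e-smith-style record and pair up the fields after the type."""
    rec_type, *fields = UNESCAPED_PIPE.split(raw)
    pairs = list(zip(fields[0::2], fields[1::2]))
    return rec_type, pairs

rec_type, pairs = parse_props(RAW)
for key, value in pairs:
    print(f"{key!r:24} -> {value!r}")
```

The ten resulting pairs correspond to the keys in the JSON dump, including `""`, `admin`, `disabled` and `enabled`.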

@giacomo or @davidep - I see both of you in GitHub, perhaps you can provide some insight? :slight_smile:

I’m too nervous to play around with this RegEx in my prod instance, since I’m not sure if adjusting it will break other things.

Do you have a | (pipe) character in the NS8 admin password? The e-smith DB does not support strings containing it, and the UI validation logic may not protect the input data well enough.
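Until the UI validation covers this case, a value can be checked before it is saved. This is a minimal Python sketch that assumes the unescaped pipe is the only forbidden character. The function name is hypothetical and is not part of the migration tool.

```python
def check_esmith_value(name, value):
    """Raise ValueError if value cannot be stored safely in an e-smith DB prop.

    Assumption: only the "|" field separator is treated as unsafe here.
    """
    if "|" in value:
        raise ValueError(f"{name} must not contain '|' (e-smith DB field separator)")
    return value
```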

:face_with_raised_eyebrow: is this a… known fact?

Yes, it is a limitation (or bug) of e-smith that has never been fixed…

Thanks for sharing.
I’d bet this particular detail was never in the NethServer documentation?

No. Keep in mind that user input must be validated, and free-form strings such as passwords are not stored in the e-smith DB.

I’d suggest adding this detail to the migration procedure documentation.

I did think of the “|” in my password before, so I changed the password I was using, but the problem persists. After looking more closely at db get ns8 and db getjson ns8, I can see that the mangled information from the previous password is still there. I’m guessing the migration tool doesn’t wipe the record clean and instead keeps updating the configuration object, which leaves it in a non-working state.
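Leftover pairs like the ones in the `db getjson ns8` output can be flagged automatically. This heuristic Python sketch assumes the JSON shape shown earlier (`{"props": {...}, "name": ..., "type": ...}`). It treats an empty key, or a key identical to its value, as suspicious. That rule is an assumption, not an e-smith convention.

```python
import json

def suspicious_props(getjson_output):
    """Return prop names that look like leftovers of a mis-split record.

    Heuristic (assumption): an empty key, or a key identical to its value,
    is unlikely to be a real setting.
    """
    record = json.loads(getjson_output)
    props = record.get("props", {})
    return sorted(k for k, v in props.items() if k == "" or k == v)
```

Save the output of `db getjson ns8` to a file and pass its contents to the function. The `$SOME_WEIRD_STRING` entry does not match this rule, so treat the result as a hint rather than a complete list.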

I just ran db delete ns8 and confirmed the keys were gone. After trying again with a password that doesn’t contain |, it seems to work, but the migration tool won’t open.

I uninstalled the migration tool and then re-installed it, and now the configurations appear correct.

However, when I now try to connect, I see this error from WireGuard:
Oct 02 09:57:54 $NETH7_URI wg-quick[4045]: Name or service not known: '$COMPANY_NAME-main.$NETH8_IP:55820'
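The failing value is presumably the WireGuard `Endpoint`, which wg-quick must resolve as `host:port`. `$COMPANY_NAME-main.$NETH8_IP` is neither an IP address nor a resolvable name. This Python sketch reads the `Endpoint` lines from the text of a wg-quick config and reports whether each host is an IP literal. The function name is hypothetical, and the sketch performs no DNS lookups.

```python
import ipaddress

def endpoint_hosts(conf_text):
    """Yield (host, port, is_ip) for each Endpoint line in a wg-quick config."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or key.strip().lower() != "endpoint":
            continue  # not an Endpoint line (commented-out lines start with "#")
        host, _, port = value.strip().rpartition(":")
        host = host.strip("[]")  # IPv6 endpoints are written as [addr]:port
        try:
            ipaddress.ip_address(host)
            is_ip = True
        except ValueError:
            is_ip = False
        yield host, port, is_ip
```

Passing the contents of /etc/wireguard/wg0.conf to it shows right away whether the regenerated file points at a bare IP or at a composite name like the one in the error.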

Since uninstalling and re-installing the migration tool I did not type in $COMPANY_NAME-main in any dialog box, so I’m not sure where this string is coming from.

I manually edited /etc/wireguard/wg0.conf and removed this string so it just has the IP, but it gets overwritten again when trying to reconnect. I have a grep running now to try to find where this string is coming from, but I’m about out of time to debug this.
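A recursive search for the string can be scripted to skip unreadable, special and very large files. This is a minimal Python sketch. Which directories to scan is up to you, and the 5 MB size cap is an arbitrary assumption.

```python
import os

def find_string(roots, needle, max_bytes=5_000_000):
    """Return paths of regular files under roots whose contents contain needle."""
    needle_bytes = needle.encode()
    hits = []
    for root in roots:
        # os.walk does not follow symlinked directories by default
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # Skip symlinks, sockets, device files and very large files
                    if os.path.islink(path) or not os.path.isfile(path):
                        continue
                    if os.path.getsize(path) > max_bytes:
                        continue
                    with open(path, "rb") as fh:
                        if needle_bytes in fh.read():
                            hits.append(path)
                except OSError:
                    continue  # unreadable file: ignore it, like grep -s
    return hits
```

For example, `find_string(["/etc"], "mycompany-main")` lists the matching files under /etc. Replace the needle with the real string hidden behind the `$COMPANY_NAME-main` placeholder.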

Is there anything I can do to get some more support on this? I was thinking of using my existing NS7 to help test NS8 beta - I’m not afraid of a few bugs and a little hassle; but this bug happening before I can even attempt the migration has me concerned. I have a second server primed and ready to go, but if this is unlikely to get resolved soon I’d rather shut it down and save some money until then.