Lost users on NethServer v6.10 -> v7.8 upgrade

NethServer Version: NethServer release 7.8.2003 (final)
Module: sssd + AD/Samba possibly.

Hi,

I hope there’s someone out there who can help. I’ve been using NethServer for quite a few years, and I’ve trawled through all the similar posts I could find, but I haven’t come across a solution yet.

I decided to take the plunge and upgrade from v6 to v7. I’d staged the first part of the upgrade and was waiting for an opportune moment to do the reboot. However, today we had a power outage that forced my hand, and during it the power cycled several times before I got to the power switch…

It mostly seemed to go OK, and I fixed the httpd problem that stopped the GUI from starting, but I’m now stuck with a box that has lost all its user info. The biggest problem is that Postfix bounces incoming emails with “User does not exist”, and I can’t log in as myself over IMAP to collect email. As you can imagine, this is quite a headache, as all our personal emails are delivered to this box.

In v6 it was running as a Samba workstation, with no NT4 domain or AD configuration. The br0 bridge and the extra AD.DOMAIN.COM IP address seem to have been added OK during the first phase of the upgrade.

The primary cause seems to be sssd, as its config file is completely empty. How is this file supposed to be generated? I also see the following errors repeatedly in the log:

Jun  2 14:15:01 server esmith::event[17617]: [ERROR] could not connect to Samba Domain Controller
Jun  2 14:15:01 server esmith::event[17617]: Action: /etc/e-smith/events/nethserver-dc-save/S95nethserver-dc-waitstart FAILED: 1 [904.857944]
Jun  2 14:15:03 server esmith::event[17617]: Log to /var/spool/createldapservice-AFDBCa.log
Jun  2 14:15:03 server esmith::event[17617]: + errors=0
Jun  2 14:15:03 server esmith::event[17617]: ++ get_dn ldapservice
Jun  2 14:15:03 server esmith::event[17617]: ++ /usr/bin/ldbsearch -H /var/lib/samba/private/sam.ldb sAMAccountName=ldapservice dn
Jun  2 14:15:03 server esmith::event[17617]: ++ sed -n '/^dn: / { s/\r// ; p ; q }'
Jun  2 14:15:03 server esmith::event[17617]: ltdb: tdb(/var/lib/samba/private/sam.ldb): tdb_open_ex: could not open file /var/lib/samba/private/sam.ldb: No such file or directory
Jun  2 14:15:03 server esmith::event[17617]: 
Jun  2 14:15:03 server esmith::event[17617]: Unable to open tdb '/var/lib/samba/private/sam.ldb': No such file or directory
Jun  2 14:15:03 server esmith::event[17617]: Failed to connect to '/var/lib/samba/private/sam.ldb' with backend 'tdb': Unable to open tdb '/var/lib/samba/private/sam.ldb': No such file or directory
Jun  2 14:15:03 server esmith::event[17617]: Failed to connect to /var/lib/samba/private/sam.ldb - Unable to open tdb '/var/lib/samba/private/sam.ldb': No such file or directory
Jun  2 14:15:03 server esmith::event[17617]: + [[ -z '' ]]
Jun  2 14:15:03 server esmith::event[17617]: + samba-tool user create ldapservice --random-password --must-change-at-next-login --login-shell=/usr/bin/false '--given-name=NethServer LDAP simple auth identity' --use-username-as-cn
Jun  2 14:15:03 server esmith::event[17617]: ltdb: tdb(/var/lib/samba/private/sam.ldb): tdb_open_ex: could not open file /var/lib/samba/private/sam.ldb: No such file or directory
Jun  2 14:15:03 server esmith::event[17617]: 
Jun  2 14:15:03 server esmith::event[17617]: Unable to open tdb '/var/lib/samba/private/sam.ldb': No such file or directory
Jun  2 14:15:03 server esmith::event[17617]: Failed to connect to 'tdb:///var/lib/samba/private/sam.ldb' with backend 'tdb': Unable to open tdb '/var/lib/samba/private/sam.ldb': No such file or directory

So, with my Sherlock cap on, I’m guessing that whatever script generated the necessary AD/Samba/sssd config didn’t get to do its work.

The basic IPv4 NAT, DNS, DHCP, etc. is working fine but the higher level identity and authentication has failed.
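A minimal sketch for re-checking the two symptoms described above (the empty sssd config and the missing sam.ldb) after each repair attempt. It only reports whether each file exists and is non-empty; it does not fix anything. check_nonempty and identity_check are hypothetical helper names, not NethServer tools. /etc/sssd/sssd.conf is the usual sssd location, but the sam.ldb host path in the usage line is an assumption: the DC on NethServer 7 is assumed to run inside the nsdc container, so the path shown in the log may sit under /var/lib/machines/nsdc when seen from the host.

```shell
#!/bin/sh
# Sketch: report whether the identity-related files exist and are non-empty.
# Helper names are hypothetical; nothing here is a NethServer tool.

# check_nonempty LABEL PATH
# Prints OK or PROBLEM for one file; returns 1 if it is missing or empty.
check_nonempty() {
    if [ -s "$2" ]; then
        echo "OK: $1 ($2)"
        return 0
    fi
    echo "PROBLEM: $1 ($2) is missing or empty"
    return 1
}

# identity_check SSSD_CONF SAM_LDB
# Checks both files, prints a summary and returns 0 only if both look fine.
# The SAM_LDB host path is an ASSUMPTION (DC assumed to run in the nsdc
# container), e.g. /var/lib/machines/nsdc/var/lib/samba/private/sam.ldb
identity_check() {
    failed=0
    check_nonempty "sssd configuration" "$1" || failed=$((failed + 1))
    check_nonempty "Samba AD database" "$2" || failed=$((failed + 1))
    echo "$failed problem(s) found"
    [ "$failed" -eq 0 ]
}
```

Source the file and call it with your own paths, e.g. `identity_check /etc/sssd/sssd.conf /var/lib/machines/nsdc/var/lib/samba/private/sam.ldb`, where the second path is the assumed host location.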

I can obviously add any relevant log information but I didn’t want to fill my 1st post with unnecessary spam.

I have already tried running yum reinstall on sssd and nethserver-dc, but this hasn’t helped.

I’m hoping there’s someone out there that can!

Cheers.

@camski

Hello Dylan
and welcome to the NethServer Community!

A bit of clarification to understand your situation…

Did the power outage trigger the migration, or did it happen during the migration, or even after it?

Are any backups of the old / new system available?

What kind of hardware are you running on?

My 2 cents
Andy

Hi Andy,

Thanks for replying. As I’d already done the first stage of “priming” the upgrade (the download), the outage both triggered the migration and then possibly interrupted it with the subsequent on/off cycles before I managed to hit the power switch. It’s probably the worst-case scenario…

I have system and data backups and I’m in the process now of building a VM with NS6.10 to try to perform a clean migration. It’s a PITA and too much like my day job…

The hardware is a Gigabyte board with an integrated Celeron N3150 CPU, 4GB RAM, dual Ethernet and a 240GB SSD.

Any suggestions? I don’t know enough about how the upgrade works, or whether it makes file backups at each stage that I could roll back to before re-running the steps by hand.

One thing I forgot to mention earlier: when I go to the Users and Groups menu, I just get a permanently spinning twizzler, which tells me it can’t find any users…

Cheers.

I think you’re right: that’s one of the worst cases I can imagine in a migration where the hardware isn’t at fault. As the saying goes, Sh*t happens! (But it still needs to be cleaned up!)

Well, using the right virtualization can also speed things up!
You’re going the way I would: you have backups of the old system!

Personally, I’d suggest Proxmox virtualization as a basis for your migration. It takes 20 minutes to install on almost any hardware, even an almost 10-year-old Mac Mini… And NethServer 6.x and 7.x both run extremely well on Proxmox…

But which virtualization you use doesn’t really matter: Proxmox, ESXi, VirtualBox or whatever. Xen can be a bit of a PITA in your situation…
As long as you have enough space, doing a “virtual” migration is quite OK. Then use Backup / Restore to get back onto the real hardware…

I can help you in the disaster recovery…

A 240 GB disk should not take more than 1-2 hours to migrate, even with a Celeron.

My 2 cents
Andy

Hi Andy,

Thanks for your offer of help, much appreciated.

It’s a matter of balancing three options: tackling the technical challenge of fixing it as it stands; putting the old version on a VM, restoring the backup, then upgrading and migrating; or doing a new vanilla install and repopulating the info and emails.

As is typical, I’m currently doing a bit of all three: I’m building a VirtualBox VM of the old version, I’m still tinkering with the broken upgrade, and I have a new SSD (meant for my NAS box upgrade) that I can use to build a new v7.8 box. I know from bitter experience at one job that upgrades can be evil things: they can break not just because of power outages but also because of hidden faults that only show up 3 or 4 steps down the line. So I may well just abandon the old instance and build a new one; then, as long as I’m careful, I can copy back the old emails via IMAP.

At least this isn’t a commercial server, it’s at home; but with lockdown I can’t escape the boss, as she wants her emails! :slight_smile:

If I have a eureka moment, I’ll let you know.

Cheers.

@camski

Hi Dylan

I wouldn’t bother repairing the screwed-up system: too many variables, and too much risk of it being compromised by the botched migration.

Restore the old running system, and then start a new migration, e.g. via a backup from the restored VirtualBox VM to your existing hardware with a freshly installed NethServer.

If it’s enough for you to copy back your data (files & mails), you could do a fresh installation, create your users (as they were on the old system), and copy over the storage folders in
/var/lib/nethserver/vmail
/var/lib/nethserver/home
/var/lib/nethserver/ibays
to the newly installed server.
After that you’d need to correct the permissions (chown…) on the subfolders, but luckily, on a home system, there aren’t too many users…

This would work… (I’ve done this before.)
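For the vmail part of that permissions step, a sketch like the following could help. It assumes, and this is an assumption to verify rather than a documented fact, that mailboxes on the new NethServer 7 install are owned by vmail:vmail; compare with `ls -ld` on a mailbox the new server created for a test user before running it. It only touches directories whose names contain @ (user@domain mailboxes) and does not cover home or ibays, which have their own per-user and per-group owners. fix_mailbox_owner is a hypothetical helper; DRY_RUN=1 prints the chown commands instead of running them.

```shell
#!/bin/sh
# Sketch: reset ownership on copied mailbox directories.
# ASSUMPTION: the new mail store is owned by vmail:vmail. Verify first,
# e.g. with: ls -ld /var/lib/nethserver/vmail/*
# fix_mailbox_owner is a hypothetical helper, not part of NethServer.

# fix_mailbox_owner VMAIL_DIR [OWNER]
# Recursively chowns every directory whose name contains "@" under
# VMAIL_DIR. With DRY_RUN=1 it only prints the chown commands.
fix_mailbox_owner() {
    vmail_dir=${1%/}
    owner=${2:-vmail:vmail}
    for box in "$vmail_dir"/*@*; do
        [ -d "$box" ] || continue    # unmatched glob or not a directory
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo chown -R "$owner" "$box"
        else
            chown -R "$owner" "$box" || return 1
        fi
    done
}
```

A cautious order would be `DRY_RUN=1 fix_mailbox_owner /var/lib/nethserver/vmail` to review the list, then the same call as root without DRY_RUN.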

Your mileage may vary, but in any case good luck and no more outages, 'til you’re back up & running!

Sometimes the WAF factor is more important than the hardware… :slight_smile:
(WAF = Wife Acceptance Factor)

My 2 cents
Andy

Hi Andy,

Thanks again for your reply.

I’m very tempted by going down the route of a fresh install onto a new drive and copying the files from the locations you listed. Nicely hacky but without too much risk :slight_smile: I’m always on the lookout for a shortcut that will save some time.

Would I then need to run any sort of ‘config’ or ‘db’ command afterwards?

As long as MS Teams, Ebay and emails are accessible, the WAF-factor is manageable!

Cheers

If you’re doing a new install of NS 7.x and have manually created the needed users / groups / ibays, only the permissions should need to be set…
No config or db commands should be needed; everything is where the server expects it!

Reboot first, BEFORE accessing the server for testing!

This route also clears out old ballast, like old test installs that were never completely purged… :slight_smile:

Good Luck

Ping again if you need more tips or are stuck!

Andy

@camski

Hi

As a small additional tip:

I make a list of ALL the folders I want transferred. If anything is missing, I can still transfer it later…

I often use rsync in these cases…

A sample list:

rsync -avzu -e ssh --delete /mnt/old-b/etc/ root@192.168.209.20:/AAB/etc/
rsync -avzu -e ssh --delete /mnt/old-b/root/ root@192.168.209.20:/AAB/root/
rsync -avzu -e ssh --delete /mnt/old-b/usr/ root@192.168.209.20:/usr/
rsync -avzu -e ssh --delete /mnt/old-b/srv/ root@192.168.209.20:/srv/
rsync -avzu -e ssh /mnt/old-b/var/ root@192.168.209.20:/var/
rsync -avzu -e ssh --delete /mnt/old-b/var/lib/pgsql/ root@192.168.209.20:/var/lib/pgsql/
rsync -avzu -e ssh --delete /mnt/old-b/var/lib/mysql/ root@192.168.209.20:/var/lib/mysql/
rsync -avzu -e ssh --delete /mnt/old-b/var/www/ root@192.168.209.20:/var/www/
rsync -avzu -e ssh --delete /mnt/old-b/usr/share/zabbix/ root@192.168.209.20:/usr/share/zabbix/ 

In this case I actually moved a whole “screwed up” NethServer (power outage, no UPS, a friend’s home server) to a new VM on a new Proxmox server. The old server was 8 years old at the time…
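The rsync list above can also be kept as a plain list file that a small loop feeds to rsync, so a forgotten folder is just one more line. A minimal sketch, assuming a hypothetical list format of `SOURCE DEST [delete]` per line; transfer_one, transfer_list, PRINT_ONLY and DRY_RUN are illustrative names. DRY_RUN=1 adds rsync’s own `--dry-run` option so rsync reports what would change without copying; PRINT_ONLY=1 just prints the commands.

```shell
#!/bin/sh
# Sketch: drive rsync from a list file instead of typing each command.
# List format (hypothetical): SOURCE DEST [delete], one entry per line,
# ending with a newline; blank lines and lines starting with # are skipped.
# PRINT_ONLY=1 prints the commands; DRY_RUN=1 adds rsync's --dry-run.

# transfer_one SOURCE DEST SSH_TARGET [delete]
transfer_one() {
    t_src=${1%/}/            # trailing slash: copy the directory's contents
    t_dest=${2%/}/
    t_target=$3
    t_mode=$4
    set -- -avzu -e ssh
    if [ "${DRY_RUN:-0}" = 1 ]; then set -- "$@" --dry-run; fi
    if [ "$t_mode" = delete ]; then set -- "$@" --delete; fi
    set -- "$@" "$t_src" "$t_target:$t_dest"
    if [ "${PRINT_ONLY:-0}" = 1 ]; then
        echo rsync "$@"
    else
        rsync "$@" < /dev/null   # don't let rsync inherit the list on stdin
    fi
}

# transfer_list LIST_FILE SSH_TARGET   (e.g. root@192.168.209.20)
transfer_list() {
    while read -r l_src l_dest l_mode; do
        case $l_src in ''|'#'*) continue ;; esac
        transfer_one "$l_src" "$l_dest" "$2" "$l_mode" || return 1
    done < "$1"
}
```

For example, a list line `/mnt/old-b/var/www /var/www delete` run through `PRINT_ONLY=1 transfer_list folders.txt root@192.168.209.20` prints the same command as the /var/www line in the list above.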

Andy

Hi @Andy_Wismer,

It’s now been up and running overnight and all the emails, etc. are working. I just need to add whatever firewall port-forwarding rules, email whitelist domains, DNS and DHCP host settings that I can remember.

I’ll just add the following steps I did after the install in case anyone else needs them in future:

  1. Add whatever software applications you previously had.
  2. If you’re running the email service, unplug the Red/Internet interface. Otherwise, Postfix will bounce incoming emails as unknown user or domain, instead of the relaying MTA backing off and retrying later.
  3. Add all the users you previously had.
  4. Add the mail domains you previously had.
  5. If you have the contents of the relevant user-related data directories, now’s the time to add them. I only needed the contents of /var/lib/nethserver/vmail, and I only copied the directories with @ in their names, as the rest seem to be ‘system’ accounts.
  6. If you want (and can), now’s the time to check that you can connect to the user accounts. For me it was just email, so I tested with Thunderbird. I got SSL certificate warnings because the certificates had changed; once I accepted them, I could get to my emails. Run ‘tail -f /var/log/imap’ to monitor if you like.
  7. Start a ‘tail -f /var/log/maillog’, plug the Red/Internet interface back in and watch the emails start to come in, hopefully without any further problem!
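Step 5 (copying only the mailbox directories with @ in their names) can be scripted along these lines. A minimal sketch: copy_user_mailboxes is a hypothetical helper, both paths are placeholders for wherever the old vmail tree is mounted and where the new one lives, and `cp -a` keeps modes and timestamps (and ownership only when run as root). Ownership may still need adjusting afterwards to match the new server’s mail user.

```shell
#!/bin/sh
# Sketch of step 5: copy only the user@domain mailbox directories.
# copy_user_mailboxes is a hypothetical helper; paths are placeholders.

# copy_user_mailboxes OLD_VMAIL_DIR NEW_VMAIL_DIR
copy_user_mailboxes() {
    n=0
    for box in "${1%/}"/*@*; do
        [ -d "$box" ] || continue            # skip files and an unmatched glob
        cp -a "$box" "${2%/}/" || return 1   # -a keeps modes and timestamps
        n=$((n + 1))
    done
    echo "mailbox directories copied: $n"
}
```

For example: `copy_user_mailboxes /mnt/old/var/lib/nethserver/vmail /var/lib/nethserver/vmail`, where /mnt/old is a placeholder mount point for the old disk or restored backup.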

Thanks, Andy, for the pointer to the vmail directory and for the general help. Now I just need to work out how to mark this as Resolved.

Cheers.

Hi

Glad it worked for you!
I think you’ve already marked this as solved!

Cheers!
Andy