Rsync-upgrade stops with error, have to reboot

jaapvdv · June 3, 2017, 4:43am

NethServer Version: 7.3.1611

rsync-upgrade/NethServer crashes with the following messages:

rsync: write failed on “/var/lib/nethserver/vmail/username/Maildir/.foldername/cur/1461802417.M554647P25724.mail.eisergon.nl,S=2216,W=2273:2,Sab”: Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(322) [receiver=3.0.9]
rsync: connection unexpectedly closed (46 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
/sbin/e-smith/rsync-upgrade: line 40: rm: command not found

[screen caught signal 7. (core dumped)]

The system responds to every command in console with:

[root@hostname ~]# ls
-bash: ls: command not found

After (force) rebooting the VM (Proxmax VE) - the first time - I was able to restart rsync-upgrade (apparently picking up where it left) eventually crashing the machine again.

The second time the following error occurs and keeps occurring (also after reboot):

Cannot exec ‘rsync-upgrade’ No such file or directory

[screen is terminating]

As far as I can tell the logs provide no indication to what is going wrong.

I uninstalled NethServer-Backup, rebooted and reinstalled. Had some httpd related errors on uninstall and install.

After reboot no web gui . . . .

davidep · June 3, 2017, 8:40am

We should investigate this.

The package uninstall triggered a full system reconfiguration. As the rsync was partial we expect an inconsistent system state. I’d say to reinstall from scratch.

@giacomo what do you think?

jaapvdv · June 3, 2017, 8:52am

Full reinstall is already in progress

Anything I can do, VM settings?? , to prevent this from happening again. More memory, cores . . . ?

davidep · June 3, 2017, 9:02am

The “command not found” error makes me think the rsync wipes out unwanted dirs. But I’m sure Giacomo can shed some light on this.

giacomo · June 5, 2017, 9:58am

The rsync command should synchronize only directories listed inside the backup configuration.

If something goes wrong, please try to see what rsync is doing and check your original config.

jaapvdv · June 6, 2017, 1:30pm

OK, I’ll try again and report back!

jaapvdv · June 8, 2017, 5:26pm

Well, I did a complete new install off NS7 in ProxMox (2 processor cores / 5GB memory / virtual HD of 1TB)

Updated all components in software center

Installed the following packages:

Backup
Dutch language
Email
NextCloud
SOGo groupware

Configured the new server (hostname, certificate and accounts provider-LDAP)

Followed the upgrade with rsync instructions in the documentation:

screen rsync-upgrade sourceserver port

Had to rerun 2 or 3 times because of errors (timeouts and such - probably because destination and source server are running respectively on ProxMox and ESXi using the same NFS server).

Finally finished the rsync-upgrade, rebooted for good measure.

Issued the ‘final’ command:

screen rsync-upgrade -u sourceserver port

All seemed well until an error message occurred telling me that the ‘post-restore-config’ did not complete successfully . . . .

Contents of restore-config.log:

2017-06-08 16:44:32 - START - Restore config started
2017-06-08 16:44:33 - STEP - pre-restore-config done
2017-06-08 16:44:38 - STEP - restore-config-execute done
2017-06-08 16:45:40 - STEP - restore-config-reinstall done
2017-06-08 16:54:01 - ERROR - Event post-restore-config failed - 256

After this message, the network interfaces on the destination server were down, obvious since source and destination have the seem IP-address after the upgrade, but probably something to mention in the admin manual.

/var/log/messages does not seem to reveal anything suspicious

After the upgrade IMAP/SMTP seemed to be fine, SOGo is not.

/var/log/sogo/sogo.log:

Jun 08 16:00:05 sogod [1628]: vmem size check enabled: shutting down app when vmem > 512 MB. Currently at 312 MB
Jun 08 16:00:06 sogod [1628]: <0x0x7ff225ec93d0[SOGoProductLoader]> SOGo products loaded from ‘/usr/lib64/GNUstep/SOGo’:
Jun 08 16:00:06 sogod [1628]: <0x0x7ff225ec93d0[SOGoProductLoader]> AdministrationUI.SOGo, Appointments.SOGo, CommonUI.SOGo, Contacts.SOGo, ContactsUI.SOGo, MailPartViewers.SOGo, Mailer.SOGo, MailerUI.SOGo, MainUI.SOGo, PreferencesUI.SOGo, SchedulerUI.SOGo, ActiveSync.SOGo
Jun 08 16:00:07 sogod [1628]: All products loaded - current memory usage at 357 MB
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> listening on 127.0.0.1:20000
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> watchdog process pid: 1628
Jun 08 16:00:07 sogod [1628]: <0x0x7ff223208980[WOWatchDogChild]> watchdog request timeout set to 60 minutes
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> preparing 10 children
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1667
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1668
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1669
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1670
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1671
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1672
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1673
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1674
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1675
Jun 08 16:00:07 sogod [1628]: <0x0x7ff225fbfe80[WOWatchDog]> child spawned with pid 1676
2017-06-08 16:00:07.891 sogod[1673:1673] ERROR: could not open MySQL4 connection to database ‘sogo’: Can’t connect to local MySQL server through socket ‘/var/lib/mysql/mysql.sock’ (2)
Jun 08 16:00:07 sogod [1673]: [ERROR] <0x0x7ff225f916d0[GCSChannelManager]> could not open channel <MySQL4Channel[0x0x7ff226023f10] connection=0x(null)> for mysql://localhost/sogo/sogo_user_profile
Jun 08 16:00:07 sogod [1673]: [WARN] <0x0x7ff225f916d0[GCSChannelManager]> will prevent opening of this channel 5 seconds after 2017-06-08 16:00:07 +0200

So, my second try at upgrading to NS7 wasn’t successful either so far.

giacomo · June 9, 2017, 7:39am

It seems something went wrong with mysql.
Check what is failed:

grep FAILED /var/log/messages

jaapvdv · June 9, 2017, 9:14am

Jun 8 16:47:06 customer-213-233-226-123 esmith::event[2622]: Action: /etc/e-smith/events/interface-update/S70network-start FAILED: 1 [1.901757]
Jun 8 16:48:49 customer-213-233-226-123 esmith::event[3186]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [99.049997]
Jun 8 16:51:50 customer-213-233-226-123 esmith::event[3186]: Event: trusted-networks-modify FAILED
Jun 8 16:51:50 customer-213-233-226-123 esmith::event[2622]: Action: /etc/e-smith/events/interface-update/S95trusted-networks-modify FAILED: 1 [280.305252]
Jun 8 16:51:50 customer-213-233-226-123 esmith::event[2622]: Event: interface-update FAILED
Jun 8 16:51:50 customer-213-233-226-123 esmith::event[2388]: Action: /etc/e-smith/events/post-restore-config/S70interface-update-cond FAILED: 1 [325.626031]
Jun 8 16:52:05 customer-213-233-226-123 esmith::event[3624]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [1.24129]
Jun 8 16:52:05 customer-213-233-226-123 esmith::event[3624]: Event: nethserver-httpd-update FAILED
Jun 8 16:53:08 customer-213-233-226-123 esmith::event[4270]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [0.284688]
Jun 8 16:53:08 customer-213-233-226-123 esmith::event[4270]: Event: nethserver-nextcloud-update FAILED
Jun 8 16:53:13 customer-213-233-226-123 esmith::event[4843]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [4.24589]
Jun 8 16:53:13 customer-213-233-226-123 esmith::event[4843]: Event: nethserver-sogo-update FAILED
Jun 8 16:53:13 customer-213-233-226-123 esmith::event[4240]: Action: /etc/e-smith/events/nethserver-sssd-save/S80nethserver-sssd-notifyclients FAILED: 1 [14.467475]
Jun 8 16:53:24 customer-213-233-226-123 esmith::event[4240]: Event: nethserver-sssd-save FAILED
Jun 8 16:53:24 customer-213-233-226-123 esmith::event[4122]: Action: /etc/e-smith/events/nethserver-directory-update/S90nethserver-directory-sssd FAILED: 1 [26.851308]
Jun 8 16:53:24 customer-213-233-226-123 esmith::event[4122]: Event: nethserver-directory-update FAILED
Jun 8 16:53:33 customer-213-233-226-123 esmith::event[4979]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [0.279061]
Jun 8 16:53:33 customer-213-233-226-123 esmith::event[4979]: Event: nethserver-nextcloud-update FAILED
Jun 8 16:53:34 customer-213-233-226-123 esmith::event[5552]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [0.230778]
Jun 8 16:53:34 customer-213-233-226-123 esmith::event[5552]: Event: nethserver-letsencrypt-update FAILED
Jun 8 16:53:57 customer-213-233-226-123 esmith::event[6078]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [0.236056]
Jun 8 16:53:57 customer-213-233-226-123 esmith::event[6078]: Event: nethserver-httpd-admin-update FAILED
Jun 8 16:53:58 customer-213-233-226-123 esmith::event[6103]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [0.572373]
Jun 8 16:53:58 customer-213-233-226-123 esmith::event[6103]: Event: nethserver-sogo-update FAILED
Jun 8 16:54:01 customer-213-233-226-123 esmith::event[6259]: Action: /etc/e-smith/events/runlevel-adjust/S20runlevel-adjust FAILED: 1 [1.610924]
Jun 8 16:54:01 customer-213-233-226-123 esmith::event[6259]: Event: runlevel-adjust FAILED
Jun 8 16:54:01 customer-213-233-226-123 esmith::event[2388]: Action: /etc/e-smith/events/post-restore-config/S70system-adjust FAILED: 1 [130.367836]
Jun 8 16:54:01 customer-213-233-226-123 esmith::event[2388]: Event: post-restore-config FAILED
Jun 8 16:54:33 customer-213-233-226-123 esmith::event[6602]: Action: /etc/e-smith/events/post-restore-data/S02mysql-load-tables FAILED: 1 [32.299063]
Jun 8 16:54:47 customer-213-233-226-123 esmith::event[6602]: Event: post-restore-data FAILED
Jun 8 17:57:57 customer-213-233-226-123 esmith::event[5965]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [0.78471]
Jun 8 17:57:59 customer-213-233-226-123 esmith::event[5965]: Event: trusted-networks-modify FAILED
Jun 8 17:57:59 customer-213-233-226-123 esmith::event[4541]: Action: /etc/e-smith/events/interface-update/S95trusted-networks-modify FAILED: 1 [2.703886]
Jun 8 17:57:59 customer-213-233-226-123 esmith::event[4541]: Event: interface-update FAILED
Jun 8 17:58:47 customer-213-233-226-123 esmith::event[7691]: Action: /etc/e-smith/events/actions/adjust-services FAILED: 1 [0.761838]
Jun 8 17:58:49 customer-213-233-226-123 esmith::event[7691]: Event: trusted-networks-modify FAILED
Jun 8 17:58:49 customer-213-233-226-123 esmith::event[6498]: Action: /etc/e-smith/events/interface-update/S95trusted-networks-modify FAILED: 1 [3.448995]
Jun 8 17:58:49 customer-213-233-226-123 esmith::event[6498]: Event: interface-update FAILED

giacomo · June 9, 2017, 10:28am

A lot of things have failed …

You need to inspect each update event.

jaapvdv · June 9, 2017, 1:06pm

Question:
Do I need to install more packages on the destination than Backup before upgrading?

Does the upgrade script take care of installing the missing packages?

giacomo · June 9, 2017, 1:45pm

All packages are automatically installed because the script calls restore-config command.

jaapvdv · June 9, 2017, 2:13pm

The behavior displayed above (/var/log/messages) could be caused by having the packages setup prior to the upgrade?

giacomo · June 12, 2017, 7:44am

No, we did the same thing when migrating our production environment.

jaapvdv · June 15, 2017, 9:15am

I am getting close now. This is what I did:

Installed NethServer 7;
Installed the available updates;
Installed backup;
Configured Accounts provider LDAP local;
Issued command:
screen rsync-upgrade ip-address port;
After this was finished successfully I issued the command:
screen rsync-upgrade -u ip-address port;
In my case it took quite a while after I saw the on-screen message Finished! before the upgrade continued.
After the upgrade completely finished (console returned to regular prompt) I shut down the old server;
On the new server - still in console, no network - I entered the command:
systemctl restart network.

The only problem that is still left is SOGo; address book and calendar are fine (both in the wegbui as in my Calendar and Addressbook software).

Mail however only shows empty -standard- folders, the folder structure actually on the server is not available in the wegbui (in a mailclient there is no problem).

When clicking on a folder, for instance Drafts, a message is shown:

An error occured while communicating with the mail server

I am however able to access my mail account (on the same NethServer installation) by adding an IMAP account using the same credentials used in a mail client.

All other users on the server have the same problem.

/var/log/sogo/sogo.log contains the following:

2017-06-15 07:23:14.881 sogod[1506:1506] ERROR: could not open MySQL4 connection to database ‘sogo’: Can’t connect to local MySQL server through socket ‘/var/lib/mysql/mysql.sock’ (2)
Jun 15 07:23:14 sogod [1506]: [ERROR] <0x0x7fec9fc82450[GCSChannelManager]> could not open channel <MySQL4Channel[0x0x7fec9fd53f10] connection=0x(null)> for mysql://localhost/sogo/sogo_user_profile
Jun 15 07:23:14 sogod [1506]: [WARN] <0x0x7fec9fc82450[GCSChannelManager]> will prevent opening of this channel 5 seconds after 2017-06-15 07:23:14 +0200

How do I correct this?

Continued:

Had a more thorough look at /var/log/messages:
Event: interface-update FAILED -> ip-address in use by other device on network (correct = source server, how to prevent this)
Event: nethserver-httpd-update FAILED
Event: nethserver-sogo-update FAILED
Event: nethserver-sssd-save FAILED
Event: nethserver-directory-update FAILED
Event: nethserver-letsencrypt-update FAILED
Event: nethserver-httpd-admin-update FAILED
Event: nethserver-sogo-update FAILED
Event: runlevel-adjust FAILED
Event: post-restore-config FAILED

It seems the MySQL error is gone, I am also able to connect to the sogo database using the credentials in /etc/sogo/sogo.conf.

When accessing the Drafts sogo.log shows the following:
Jun 15 11:56:46 sogod [15059]: [ERROR] <0x0x7f8f7a8daeb0[NGImap4Connection]> could not select URL: imap://digitaal%40eisergon.nl@localhost/Drafts/: {RawResponse = "{ResponseResult = {description = \"Mailbox doesn't exist: Drafts\"; result = no; tagId = 6; }; }"; reason = "Mailbox doesn't exist: Drafts"; result = 0; } Jun 15 11:56:46 sogod [15059]: 10.9.8.1, 10.9.8.160 "POST /SOGo/so/jaapvdv/Mail/0/folderDrafts/view HTTP/1.1" 500 72/45 1.198 - - 0

When accessing the a shared folder sogo.log shows the following (both in the ‘standard’ account as in the separately added account):
Jun 15 11:58:11 sogod [15059]: [ERROR] <0x0x7f8f7a8b21b0[NGImap4Connection]> could not select URL: imap://digitaal%40eisergon.nl@localhost/Public/: {RawResponse = "{ResponseResult = {description = \" Permission denied\"; flag = NOPERM; result = no; tagId = 6; }; }"; access = NOPERM; reason = " Permission denied"; result = 0; } Jun 15 11:58:11 sogod [15059]: 10.9.8.1, 10.9.8.160 "POST /SOGo/so/jaapvdv/Mail/0/folderPublic/view HTTP/1.1" 500 72/45 0.969 - - 0

jaapvdv · June 16, 2017, 11:36am

I noticed that SOGo is using the first mail alias (mail) mentioned in the LDAP directory for any user as the userid to connect to the mailserver. It should be (at least in my situation); uid followed by @domain.tld.

As far as I can tell previously (Nethserver 6.9) )only the uid was needed!

Can anyone tell me how to correct this behavior?

So far I have been able to set the uid as the username (by modifying /ext/sogo/sogo.conf), no clue how to add @domain.tld.