SSSD ldap error: Cannot contact any KDC

fasttech · November 3, 2017, 11:16pm

Sorry @compsos I didn’t mean to ping you when I copied your log.

I’m having this issue on my problem production server. My other servers, all test machines, are on kernel 693.5.2; does anyone know if this kernel corrects the issue? I ran into this while trying to update the DC, which failed with an auth error banner and this log:

Oct 28 16:48:21 server7c [sssd[ldap_child[17207]]]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Cannot contact any KDC for realm 'burbledo.COM'. Unable to create GSSAPI-encrypted LDAP connection.
Oct 28 16:48:21 server7c [sssd[ldap_child[17207]]]: Cannot contact any KDC for realm 'burbledo.COM'
Oct 28 16:48:22 server7c logger: Shorewall reloaded
Oct 28 16:48:22 server7c esmith::event[17079]: [NOTICE] Shorewall restart
Oct 28 16:48:22 server7c esmith::event[17079]: Action: /etc/e-smith/events/nethserver-firewall-base-save/S89nethserver-shorewall-restart SUCCESS [4.233884]
Oct 28 16:48:22 server7c systemd: Reloading.
Oct 28 16:48:22 server7c esmith::event[17079]: [INFO] service lsm is disabled: skipped
Oct 28 16:48:22 server7c esmith::event[17079]: Action: /etc/e-smith/events/actions/adjust-services SUCCESS [0.422545]
Oct 28 16:48:22 server7c esmith::event[17079]: Event: nethserver-firewall-base-save SUCCESS
Oct 28 16:48:22 server7c esmith::event[17078]: Action: /etc/e-smith/events/firewall-adjust/S20firewall-adjust SUCCESS [6.719383]
Oct 28 16:48:22 server7c esmith::event[17078]: Event: firewall-adjust SUCCESS
Oct 28 16:48:57 server7c httpd: [EXCEPTION] RuntimeException 1405610072: Nethgui\Model\SystemTasks: Socket read error (in /usr/share/nethesis/Nethgui/Model/SystemTasks.php:166)
Oct 28 16:49:04 server7c [sssd[ldap_child[17304]]]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Cannot contact any KDC for realm 'burbledo.COM'. Unable to create GSSAPI-encrypted LDAP connection.
Oct 28 16:49:04 server7c [sssd[ldap_child[17304]]]: Cannot contact any KDC for realm 'burbledo.COM'
Oct 28 16:49:27 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/20admin-user exit code 9
Oct 28 16:49:36 server7c httpd: [ERROR] NethServer\Tool\GroupProvider: Account provider generic error: SSSD exit code 1
Oct 28 16:49:36 server7c httpd: [ERROR] (1) SASL:[GSSAPI]: Failed to start authentication backend: NT_STATUS_INTERNAL_ERROR at /usr/share/perl5/vendor_perl/NethServer/LdapClient.pm line 126.
Oct 28 16:49:38 server7c sshd[17431]: Did not receive identification string from 192.168.124.107 port 51649
Oct 28 16:49:38 server7c [sssd[ldap_child[17438]]]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Cannot contact any KDC for realm 'burbledo.COM'. Unable to create GSSAPI-encrypted LDAP connection.
Oct 28 16:49:38 server7c [sssd[ldap_child[17438]]]: Cannot contact any KDC for realm 'burbledo.COM'
Oct 28 16:49:39 server7c admin-todos: (1) SASL:[GSSAPI]: Failed to start authentication backend: NT_STATUS_INTERNAL_ERROR at /usr/share/perl5/vendor_perl/NethServer/LdapClient.pm line 126.
Oct 28 16:50:21 server7c httpd: [ERROR] NethServer\Tool\GroupProvider: Account provider generic error: SSSD exit code 1
Oct 28 16:50:21 server7c httpd: [ERROR] (1) SASL:[GSSAPI]: Failed to start authentication backend: NT_STATUS_INTERNAL_ERROR at /usr/share/perl5/vendor_perl/NethServer/LdapClient.pm line 126.
Oct 28 16:50:23 server7c admin-todos: (1) SASL:[GSSAPI]: Failed to start authentication backend: NT_STATUS_INTERNAL_ERROR at /usr/share/perl5/vendor_perl/NethServer/LdapClient.pm line 126.

Since this is a production server I reverted back to the snapshot but then this issue came up and it took 2 reboots for proper operation… on 693.2.2, I haven’t backed down the kernel.

 update failed: SERVFAIL
 ; TSIG error with server: tsig verify failure
 update failed: SERVFAIL
 ; TSIG error with server: tsig verify failure
 update failed: SERVFAIL
 ; TSIG error with server: tsig verify failure
 update failed: SERVFAIL
: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.
: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.
: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.

So, if 693.5.2 is good, I can update that and then update the dc I guess.

fasttech · November 3, 2017, 11:27pm

I read through this but it doesn’t list all the bug fixes;
https://www.redhat.com/archives/rhsa-announce/2017-October/msg00025.html

and I don’t have subscriber access to get to the bug fixes on redhat’s site.

mrmarkuz · November 4, 2017, 9:49am

Hi @fasttech,

I checked my servers for your errors just now and found all of the entries except for the “NT_STATUS_INTERNAL_ERROR” ones. The other entries are “one-timers”, related to other things and, I think, not relevant in this case.

So this may be the really bad one:

NT_STATUS_INTERNAL_ERROR at /usr/share/perl5/vendor_perl/NethServer/LdapClient.pm line 126.

@Andy_Wismer has exactly the same error, both servers have in common that they are VMs:

Maybe perl related?

You may try:

Update to the current kernel and update the DC container
Change IP of container
Reinstall AD
Backup/recovery procedure

Here is another thread of the same problem, just to summarize:

dnutan · November 4, 2017, 11:33am

Could the problem be realm’s letter case?

Ctek · November 5, 2017, 8:37am

Excellent catch @dnutan.
The realm should always be in upper case.
Here is an excerpt from the MIT docs:

Realm name

Although your Kerberos realm can be any ASCII string, convention is to make it the same as your domain name, in upper-case letters.

For example, hosts in the domain example.com would be in the Kerberos realm:

EXAMPLE.COM

If you need multiple Kerberos realms, MIT recommends that you use descriptive names which end with your domain name, such as:

BOSTON.EXAMPLE.COM
HOUSTON.EXAMPLE.COM

More info can be found here: https://web.mit.edu/kerberos/krb5-1.12/doc/admin/realm_config.html
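
The convention quoted above (realm = domain name in upper case) can be sketched in shell. The helper names `domain_to_realm` and `realm_is_upper` are made up for illustration and are not part of Kerberos or NethServer.

```shell
# Hypothetical helper: derive the conventional realm name from a DNS domain.
domain_to_realm() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Hypothetical helper: succeeds if the realm is non-empty and has no
# lower-case ASCII letters (letters are spelled out to avoid locale surprises).
realm_is_upper() {
    case $1 in
        ''|*[abcdefghijklmnopqrstuvwxyz]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, `domain_to_realm example.com` prints `EXAMPLE.COM`, and `realm_is_upper 'burbledo.COM'` fails because of the lower-case part.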

BR
B.

fasttech · November 5, 2017, 10:23pm

@dnutan @Ctek I’m sorry, that lower case domain was just me obfuscating the public posting of the logs… that’s not the domain.

fasttech · November 5, 2017, 10:26pm

I can’t believe I have to deal with this on the one production server a dozen people need all the time and none of my other servers have this problem. grrrrrrr. Hulk smash.

davidep · November 6, 2017, 8:02am

Did you already send the /etc/krb5.conf of this server in another thread? Could you paste it here again?

Also the output of

config show sssd
config show dns
config show nsdc
cat /etc/hosts

Edit: attach also the output of

journalctl -M nsdc -u samba | grep 'krb5_init_context failed'

If the grep matches, this could be a workaround:

cp -v /var/lib/machines/nsdc/var/lib/samba/private/krb5.conf /var/lib/machines/nsdc/etc/krb5.conf 
systemctl -M nsdc restart samba
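
The check-then-fix sequence above could be wrapped in a small script along these lines. This is only a sketch: the function names and the `NSDC_ROOT` variable are hypothetical, while the paths and commands are the ones quoted above. It would have to run as root on the host, not inside the container.

```shell
# Hypothetical variable: root of the nsdc container filesystem on the host.
NSDC_ROOT=/var/lib/machines/nsdc

# Reads samba journal text on stdin; succeeds if the KDC init failure appears.
needs_krb5_fix() {
    grep -q 'krb5_init_context failed'
}

# Copies the krb5.conf kept under Samba's private directory into the
# container's /etc, then restarts samba inside the container.
apply_krb5_workaround() {
    cp -v "$NSDC_ROOT/var/lib/samba/private/krb5.conf" "$NSDC_ROOT/etc/krb5.conf" &&
        systemctl -M nsdc restart samba
}

# Usage (as root on the host):
#   journalctl -M nsdc -u samba | needs_krb5_fix && apply_krb5_workaround
```

Note that the copy destination is the container's `/etc/krb5.conf`; the host's own `/etc/krb5.conf` is not touched by this.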

Also ensure the domain/realm is present in /etc/krb5.conf, as in this example:

# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

includedir /var/lib/sss/pubconf/krb5.include.d/
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 dns_lookup_realm = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
# default_realm = EXAMPLE.COM
 default_ccache_name = KEYRING:persistent:%{uid}

 default_realm = AD.MYDOM.COM
[realms]
# EXAMPLE.COM = {
#  kdc = kerberos.example.com
#  admin_server = kerberos.example.com
# }

 AD.MYDOM.COM = {
 }

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM
 ad.mydom.com = AD.MYDOM.COM
 .ad.mydom.com = AD.MYDOM.COM
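
A quick way to test whether a realm entry like the one above is present is sketched here. It assumes `awk` is available; `krb5_has_realm` is a hypothetical helper name. It looks only at the file it is given (it does not follow `includedir` lines) and only at the `[realms]` section, skipping commented-out entries.

```shell
# Hypothetical helper: succeeds if FILE has "REALM = ..." in its [realms] section.
# Usage: krb5_has_realm FILE REALM
krb5_has_realm() {
    awk -v realm="$2" '
        # A line starting with "[" opens a new section; remember if it is [realms].
        /^\[/ { in_realms = ($0 ~ /^\[realms\]/); next }
        # Skip commented-out lines such as "# EXAMPLE.COM = {".
        /^[ \t]*#/ { next }
        in_realms && $1 == realm && $2 == "=" { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage:
#   krb5_has_realm /etc/krb5.conf AD.MYDOM.COM && echo "realm present"
```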

fasttech · November 6, 2017, 7:51pm

@davidep As it runs right now:
domain is not present…

[root@server7c ~]# cat /etc/krb5.conf
# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 dns_lookup_realm = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
# default_realm = EXAMPLE.COM
 default_ccache_name = KEYRING:persistent:%{uid}

[realms]
# EXAMPLE.COM = {
#  kdc = kerberos.example.com
#  admin_server = kerberos.example.com
# }

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM

[root@server7c ~]# config show sssd
sssd=service
    AdDns=192.168.124.228
    LdapURI=
    Provider=ad
    Realm=MYDOMAIN.COM
    Workgroup=MYDOMAIN
    status=enabled

[root@server7c ~]# config show dns
dns=configuration
    NameServers=192.168.124.2

[root@server7c ~]# config show nsdc
nsdc=service
    IpAddress=192.168.124.228
    ProvisionType=newdomain
    bridge=br0
    status=enabled

[root@server7c ~]# cat /etc/hosts

127.0.0.1 localhost localhost.localdomain
192.168.124.227 server7c.mydomain.com server7c approach-server.adomain.local sync-server.adomain.local

No output from the journalctl command.

davidep · November 6, 2017, 8:11pm

I’ve found another installation with the same error message and missing lines in /etc/krb5.conf but couldn’t reproduce the problem.

Please follow the commands and instructions above.

fasttech · November 6, 2017, 8:17pm

Ok, it’ll have to wait for a window so I can snapshot it.

Hmmmm… usually I shut it down to snapshot it in an off state, but when I bring it back up it can take 3 reboots before auth works (using the services gui and restarting sssd doesn’t help)… maybe I should live snapshot it… that always throws the time off though… eh.

davidep · November 6, 2017, 8:22pm

You already know it: it’s a bad idea

Ctek · November 6, 2017, 11:00pm

In my opinion the realm should be present in the config. Maybe I’m wrong, but you should try it and see whether you get the same consistent behaviour.

fasttech · November 8, 2017, 1:18am

@davidep The copy command you posted didn’t change /etc/krb5.conf, even after allowing the overwrite and running the samba restart. I tried it a couple of times, and I verified that the file being copied contained the correct domain.

Then I went ahead and tried an update which was successful, meaning clients could browse shares and no error banner on the dashboard;

Nov 07 17:50:23 Updated: nethserver-base-3.1.1-1.ns7.noarch
Nov 07 17:50:25 Updated: 1:grub2-common-2.02-0.65.el7.centos.2.noarch
Nov 07 17:50:25 Installed: 1:grub2-tools-minimal-2.02-0.65.el7.centos.2.x86_64
Nov 07 17:50:26 Installed: 1:grub2-tools-2.02-0.65.el7.centos.2.x86_64
Nov 07 17:50:27 Updated: nethserver-mysql-1.1.3-1.ns7.noarch
Nov 07 17:50:27 Updated: nethserver-sssd-1.3.2-1.ns7.noarch
Nov 07 17:50:28 Installed: 1:grub2-tools-extra-2.02-0.65.el7.centos.2.x86_64
Nov 07 17:50:29 Updated: 1:grub2-pc-modules-2.02-0.65.el7.centos.2.noarch
Nov 07 17:50:30 Updated: 1:grub2-pc-2.02-0.65.el7.centos.2.x86_64
Nov 07 17:50:31 Updated: kernel-tools-libs-3.10.0-693.5.2.el7.x86_64
Nov 07 17:52:19 Updated: nextcloud-12.0.3-1.el7.noarch
Nov 07 17:52:21 Updated: python2-acme-0.19.0-1.el7.noarch
Nov 07 17:52:26 Updated: python2-certbot-0.19.0-1.el7.noarch
Nov 07 17:52:35 Updated: certbot-0.19.0-1.el7.noarch
Nov 07 17:52:35 Updated: nethserver-nextcloud-1.1.8-1.ns7.noarch
Nov 07 17:52:37 Updated: kernel-tools-3.10.0-693.5.2.el7.x86_64
Nov 07 17:52:37 Installed: 1:grub2-2.02-0.65.el7.centos.2.x86_64
Nov 07 17:52:38 Updated: nethserver-dc-1.3.1-1.ns7.x86_64
Nov 07 17:52:38 Updated: nethserver-samba-audit-1.1.3-1.ns7.noarch
Nov 07 17:52:39 Updated: nethserver-firewall-base-3.2.7-1.ns7.noarch
Nov 07 17:52:39 Updated: nethserver-duc-1.4.3-1.ns7.noarch
Nov 07 17:52:40 Updated: nethserver-release-7-5.ns7.noarch
Nov 07 17:52:40 Updated: python2-keyring-5.0-3.el7.noarch
Nov 07 17:52:41 Updated: python-perf-3.10.0-693.5.2.el7.x86_64
Nov 07 17:52:44 Updated: tzdata-2017c-1.el7.noarch
Nov 07 17:52:45 Updated: wget-1.14-15.el7_4.1.x86_64
Nov 07 17:52:45 Updated: epel-release-7-11.noarch
Nov 07 17:53:17 Installed: kernel-3.10.0-693.5.2.el7.x86_64
Nov 07 17:53:17 Updated: nethserver-lang-en-1.2.3-1.ns7.noarch
Nov 07 17:53:28 Erased: 1:grub2-tools-efi-2.02-0.64.el7.centos.x86_64

this error was in messages;

Nov  7 18:09:32 server7c sssd: ; TSIG error with server: tsig verify failure
Nov  7 18:09:32 server7c sssd: update failed: SERVFAIL
Nov  7 18:09:32 server7c sssd: ; TSIG error with server: tsig verify failure
Nov  7 18:09:32 server7c sssd: update failed: SERVFAIL
Nov  7 18:09:32 server7c sssd: ; TSIG error with server: tsig verify failure
Nov  7 18:09:32 server7c sssd: update failed: SERVFAIL
Nov  7 18:09:32 server7c sssd: ; TSIG error with server: tsig verify failure
Nov  7 18:09:32 server7c sssd: update failed: SERVFAIL
Nov  7 18:09:32 server7c sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.
Nov  7 18:09:32 server7c sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.
Nov  7 18:09:32 server7c sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.

Rebooted to snapshot for the container update, but when I went to the accounts provider page there was no option to reboot and the Samba version was 4.6.8. I know you guys stated you were going to set the container to auto-update after the last updates.

but, post reboot I get this… a long list of rrd errors… but I still have successful share auth and nextcloud works.

Nov  7 18:18:01 server7c sssd: ; TSIG error with server: tsig verify failure
Nov  7 18:18:01 server7c sssd: update failed: SERVFAIL
Nov  7 18:18:01 server7c sssd: ; TSIG error with server: tsig verify failure
Nov  7 18:18:01 server7c sssd: update failed: SERVFAIL
Nov  7 18:18:02 server7c sssd: ; TSIG error with server: tsig verify failure
Nov  7 18:18:02 server7c sssd: update failed: SERVFAIL
Nov  7 18:18:02 server7c sssd: ; TSIG error with server: tsig verify failure
Nov  7 18:18:02 server7c sssd: update failed: SERVFAIL
Nov  7 18:18:02 server7c sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.
Nov  7 18:18:02 server7c sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.
Nov  7 18:18:02 server7c sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure.  Minor code may provide more information, Minor = Server not found in Kerberos database.
Nov  7 18:18:03 server7c collectd[993]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-used.rrd) failed: /var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-used.rrd: illegal attempt to update using time 1510103793 when last update time is 1510103793 (minimum one second step)
Nov  7 18:18:03 server7c collectd[993]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-buffered.rrd) failed: /var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-buffered.rrd: illegal attempt to update using time 1510103793 when last update time is 1510103793 (minimum one second step)
Nov  7 18:18:03 server7c collectd[993]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-cached.rrd) failed: /var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-cached.rrd: illegal attempt to update using time 1510103793 when last update time is 1510103793 (minimum one second step)
Nov  7 18:18:03 server7c collectd[993]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-free.rrd) failed: /var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-free.rrd: illegal attempt to update using time 1510103793 when last update time is 1510103793 (minimum one second step)
Nov  7 18:18:03 server7c collectd[993]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-slab_unrecl.rrd) failed: /var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-slab_unrecl.rrd: illegal attempt to update using time 1510103793 when last update time is 1510103793 (minimum one second step)
Nov  7 18:18:03 server7c collectd[993]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-slab_recl.rrd) failed: /var/lib/collectd/rrd/server7c.mydomain.com/memory/memory-slab_recl.rrd: illegal attempt to update using time 1510103793 when last update time is 1510103793 (minimum one second step)

Now what? This is that problematic production server.

And… after all this… the /etc/krb5.conf is still the same as I originally posted… it does not have the domain written in it.

From the GUI, the domain accounts page looks good, the accounts provider page looks right and there are no error banners on the dashboard. Shares are accessible by domain\user and the nextcloud client connects fine. I’m still scared though.

davidep · November 8, 2017, 8:17am

The command does not change /etc/krb5.conf. Note the copy destination is /var/lib/machines/nsdc/etc/krb5.conf.

I’ve seen the same error somewhere, and - as you said - it seems harmless.

SSSD tries to send a DDNS update query I suppose. We should investigate its origin. I suppose the latest SSSD version changed some behavior and now we see that error message.

I’d fix it, as explained above.

fasttech · November 8, 2017, 3:54pm

I understood that; my guess was that during the samba restart krb5.conf would be rewritten from that file… how does it get written? If I edit it, I’m assuming it won’t get overwritten on reboot?

davidep · November 8, 2017, 4:41pm

Yes, it is left untouched because it is not a template.

davidep · November 9, 2017, 10:02am

@fasttech did you run the restore config procedure on this server?

I’ve found an issue with the restore config procedure: it deletes the file /var/lib/machines/nsdc/etc/krb5.conf without restoring the good one… Maybe it reproduces your error condition!

fasttech · November 9, 2017, 3:48pm

No.

davidep · November 15, 2017, 5:08pm

There’s a fix for the restore-config procedure that prevents this from happening. However, it’s really difficult to hit this bug on real-world servers.

github.com/NethServer/dev

Cannot contact any KDC for realm (sssd)

opened 05:04PM - 14 Nov 17 UTC

closed 04:42PM - 24 Nov 17 UTC

DavidePrincipi

bug verified

Under some circumstances, the Samba DC container loses the krb5.conf file state… and samba DC fails to start the KDC services. The service seems to run but ports 88 (kerberos-sec) and 464 (kpasswd5) are closed and some services fail to authenticate correctly (say SSSD, account listings...).

**Steps to reproduce**

Real world:

- Configure a local AD accounts provider
- Create a config backup
- Restore the config backup on a clean 7.4
- Update/Reinstall krb5-libs in nsdc container

```text
yum --installroot=/var/lib/machines/nsdc reinstall krb5-libs
```

- Restart samba service in nsdc container

```text
systemctl -M nsdc restart samba
```

Testing environment shortcut:

- Configure a local AD accounts provider
- Remove ``/var/lib/machines/nsdc/var/lib/samba/private/krb5.conf``
- Restart samba service in nsdc container
- Get users list ``/usr/libexec/nethserver/list-users``

**Expected behavior**

Get the list of users

**Actual behavior**

Error message:

```text
(1) SASL:[GSSAPI]: Failed to start authentication backend: NT_STATUS_INTERNAL_ERROR at /usr/share/perl5/vendor_perl/NethServer/LdapClient.pm line 126.
```

In sssd journal ``journalctl -u sssd``:

```text
Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Cannot contact any KDC for realm 'AD.DPNET.NETHESIS.IT'. Unable to create GSSAPI-encrypted LDAP connection.
```

In samba dc journal ``journalctl -M nsdc -u samba``:

```text
task_server_terminate: [kdc: krb5_init_context failed]
```

**Components**

nethserver-sssd-1.3.3-1.ns7.noarch
nethserver-dc-1.3.1-1.ns7.x86_64

**See also**

https://community.nethserver.org/t/sssd-ldap-error-cannot-contact-any-kdc/8204

----

Thanks to @fasttech and André Wismer
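
The issue notes that the service seems to run while ports 88 and 464 are closed. A rough way to probe those two ports from the host is sketched below. It assumes `bash` and coreutils `timeout` are installed; the function names are hypothetical, and only TCP is checked, so a UDP-only problem would not show up here.

```shell
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT opens within
# 2 seconds. Returns 2 when an argument is missing.
tcp_port_open() {
    [ -n "$1" ] && [ -n "$2" ] || return 2
    # bash's /dev/tcp pseudo-device opens a TCP socket; $0 and $1 inside the
    # inner script are the host and port passed after the script string.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Probes the two ports named in the issue: 88 (kerberos-sec) and 464 (kpasswd5).
# Succeeds only if both accept a connection.
kdc_ports_open() {
    host=$1
    rc=0
    for port in 88 464; do
        if tcp_port_open "$host" "$port"; then
            echo "port $port open"
        else
            echo "port $port closed"
            rc=1
        fi
    done
    return $rc
}

# Usage, with the nsdc container address as the argument:
#   kdc_ports_open 192.168.124.228
```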

Opened (and closed as “wontfix”) an issue here. It is tracked by an upstream bug; as said it can be ignored. More info here:

github.com/NethServer/dev

sssd: tkey query failed (dyndns_update)

opened 05:15PM - 14 Nov 17 UTC

closed 05:06PM - 15 Nov 17 UTC

DavidePrincipi

bug invalid

On a running NethServer 7.4 with local AD accounts provider an error message is …logged to the journal every day at the same hour.

**Steps to reproduce**

```text
journalctl -u sssd | grep 'tkey query'
```

**Expected behavior**

No error in the journal

**Actual behavior**

The query matches the same error line in the same hour every day and when sssd is restarted

**Components**

nethserver-sssd-1.3.3-1.ns7.noarch
nethserver-dc-1.3.1-1.ns7.x86_64

**See also**

- upstream bug https://bugzilla.redhat.com/show_bug.cgi?id=1394320
- https://community.nethserver.org/t/sssd-ldap-error-cannot-contact-any-kdc/8204/14

----

Thanks to @fasttech and André Wismer
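
To see the "same hour every day" pattern at a glance, the grep from the issue can be extended to count matches per hour. This is a sketch that assumes syslog-style timestamps (`Nov  7 18:09:32 ...`), as in the logs pasted earlier in the thread; `tkey_failures_per_hour` is a hypothetical name.

```shell
# Reads log lines on stdin and prints "<count> <month> <day> <hour>" for each
# hour that contains at least one "tkey query failed" line.
tkey_failures_per_hour() {
    grep 'tkey query failed' |
        awk '{ split($3, t, ":"); print $1, $2, t[1] }' |   # keep month, day, hour
        sort | uniq -c |
        awk '{ print $1, $2, $3, $4 }'                       # drop uniq padding
}

# Usage:
#   journalctl -u sssd | tkey_failures_per_hour
```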