Hi all,
while moving our 389ds server to another machine (and another version), I noticed performance issues during ldapsearch.
Normally a query is quite quick (about 20ms), but sometimes (like every five seconds) it hangs for one or even several seconds.
I test this with:
time ldapsearch -h localhost ...
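Since the stalls are intermittent, a single `time ldapsearch` can miss them; wrapping the same query in a loop and logging each run makes the spikes visible. A minimal sketch (the `true` placeholder stands in for the real ldapsearch invocation, and GNU date is assumed for nanosecond timestamps):

```shell
#!/bin/sh
# Hedged sketch: time the same query repeatedly to catch intermittent stalls.
# 'true' is a placeholder -- substitute the real query, e.g.:
#   ldapsearch -x -h localhost -b "dc=example,dc=org" "(uid=someuser)"
: > times.log
i=1
while [ "$i" -le 20 ]; do
  start=$(date +%s%N)                  # GNU date: nanoseconds since epoch
  true                                 # <-- real ldapsearch goes here
  end=$(date +%s%N)
  echo "run $i: $(( (end - start) / 1000000 )) ms" >> times.log
  i=$((i + 1))
done
sort -t: -k2 -n times.log | tail -3    # show the three slowest runs
```

If the reported hang happens roughly every five seconds, the slowest runs should stand out clearly against the ~20ms baseline.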
Since the new server should be a lot faster (CPU, I/O), I'm wondering what can cause this.
There is replication between three servers, in a supplier-supplier configuration between each pair. We have about 50 databases, but each with at most a few hundred records; most of them are smaller.
I looked at the cache configuration, but it is similar to the old server, and I get an entrycachehitratio of about 99%.
Any idea for further debugging? Regards Jan
Hi all,
we are still suffering from this poor performance problem. Any hints for further debugging are welcome.
As this 389ds installation is part of a Kolab multidomain setup, we have about 50 separate database backends, each replicated with two other servers.
The symptom is, depending on the queries: sometimes everything is lagging for several seconds (up to 3 or 4, and worse).
389ds is version 1.3.3.5-4 and runs on a Xen 4.8 domU.
Load is not high, and neither is disk I/O. Everything else on the machine and in the other Xen domains is running fine.
Any ideas? Regards Jan
On 28.05.2018 at 17:17, Jan Kowalsky wrote:
Hi all,
while moving our 389ds server to another machine (and another version), I noticed performance issues during ldapsearch.
Normally a query is quite quick (about 20ms), but sometimes (like every five seconds) it hangs for one or even several seconds.
I test this with:
time ldapsearch -h localhost ...
Since the new server should be a lot faster (CPU, I/O), I'm wondering what can cause this.
There is replication between three servers, in a supplier-supplier configuration between each pair. We have about 50 databases, but each with at most a few hundred records; most of them are smaller.
I looked at the cache configuration, but it is similar to the old server, and I get an entrycachehitratio of about 99%.
Any idea for further debugging?

Regards Jan
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject....
Can we see your access log showing the slow searches? Are they unindexed? If you have unindexed searches they will bog down the entire server. Do you see high cpu for ns-slapd?
Can you also run logconv.pl?
logconv.pl -ulatn /var/log/dirsrv/slapd-YOUR_INSTANCE/access*
Thanks, Mark
On 06/14/2018 04:22 PM, Jan Kowalsky wrote:
Hi all,
we are still suffering from this poor performance problem. Any hints for further debugging are welcome.
As this 389ds installation is part of a Kolab multidomain setup, we have about 50 separate database backends, each replicated with two other servers.
The symptom is, depending on the queries: sometimes everything is lagging for several seconds (up to 3 or 4, and worse).
389ds is version 1.3.3.5-4 and runs on a Xen 4.8 domU.
Load is not high, and neither is disk I/O. Everything else on the machine and in the other Xen domains is running fine.
Any ideas? Regards Jan
On 28.05.2018 at 17:17, Jan Kowalsky wrote:
Hi all,
while moving our 389ds server to another machine (and another version), I noticed performance issues during ldapsearch.
Normally a query is quite quick (about 20ms), but sometimes (like every five seconds) it hangs for one or even several seconds.
I test this with:
time ldapsearch -h localhost ...
Since the new server should be a lot faster (CPU, I/O), I'm wondering what can cause this.
There is replication between three servers, in a supplier-supplier configuration between each pair. We have about 50 databases, but each with at most a few hundred records; most of them are smaller.
I looked at the cache configuration, but it is similar to the old server, and I get an entrycachehitratio of about 99%.
Any idea for further debugging? Regards Jan
Hi Mark,
thanks for the answer, and the hint about indexes.
On 15.06.2018 at 00:53, Mark Reynolds wrote:
Can we see your access log showing the slow searches? Are they unindexed? If you have unindexed searches they will bog down the entire
Sharing the whole log is difficult for privacy reasons, since it's a production environment with real user data.
What I can see is a lot of unindexed component queries, most of them like:
[15/Jun/2018:21:51:14 +0200] conn=462 op=31251 SRCH base="ou=Domains,dc=example,dc=org" scope=2 filter="(&(objectClass=domainrelatedobject)(associatedDomain=example.net))" attrs="associatedDomain inetDomainBaseDN"
[15/Jun/2018:21:51:14 +0200] conn=462 op=31251 RESULT err=0 tag=101 nentries=1 etime=0 notes=U
but I'm not sure whether they are really a problem, since there are only about 60 entries with domainrelatedobject.
More relevant could be that there is no index for alias:
[15/Jun/2018:21:50:21 +0200] conn=106426 op=1 SRCH base="dc=example,dc=net" scope=2 filter="(&(objectClass=kolabInetOrgPerson)(|(mail=test.user@example.net)(alias=test.user@example.net)(uid=test.user@example.net)))" attrs="distinguishedName"
[15/Jun/2018:21:50:21 +0200] conn=106426 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U
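If alias really has no index, an equality index can be added per backend. A hedged sketch (the backend name "userRoot" and the eq/pres index types are assumptions; check the actual backend names under cn=ldbm database,cn=plugins,cn=config):

```ldif
# Hypothetical: add an equality/presence index for "alias" to the userRoot backend.
dn: cn=alias,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add
objectClass: top
objectClass: nsIndex
cn: alias
nsSystemIndex: false
nsIndexType: eq
nsIndexType: pres
```

Applied with ldapmodify as Directory Manager; the index then has to be rebuilt for existing entries (on 1.3.x, via the db2index script or a cn=index task), otherwise the attribute remains effectively unindexed.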
server. Do you see high cpu for ns-slapd?
No, not at all: CPU average about 7%, load average about 0.4 with a max of 0.6, and no I/O wait.
The specific phenomenon is: for a couple of seconds everything is smooth, but suddenly it hangs for about 1 to 4 or more seconds, and this seems to affect all queries at that moment.
Can you also run logconv.pl?
logconv.pl -ulatn /var/log/dirsrv/slapd-YOUR_INSTANCE/access*
Access Log Analyzer 8.0
Command: logconv.pl /var/log/dirsrv/slapd-ldap0/access
Processing 1 Access Log(s)...

[001] /var/log/dirsrv/slapd-ldap0/access size (bytes): 4715242
25000 Lines Processed
3430177 of 4715242 bytes (72.747%)

Total Log Lines Analysed: 33372

----------- Access Log Output ------------

Start of Logs: 15/Jun/2018:19:06:52
End of Logs: 15/Jun/2018:19:26:39
Processed Log Time: 0 Hours, 19 Minutes, 47 Seconds

Restarts: 0
Total Connections: 2225
 - StartTLS Connections: 0
 - LDAPS Connections: 0
 - LDAPI Connections: 0
Peak Concurrent Connections: 11
Total Operations: 13362
Total Results: 13389
Overall Performance: 100.0%

Searches: 10248 (8.63/sec) (518.01/min)
Modifications: 0 (0.00/sec) (0.00/min)
Adds: 0 (0.00/sec) (0.00/min)
Deletes: 0 (0.00/sec) (0.00/min)
Mod RDNs: 0 (0.00/sec) (0.00/min)
Compares: 0 (0.00/sec) (0.00/min)
Binds: 3038 (2.56/sec) (153.56/min)

Proxied Auth Operations: 0
Persistent Searches: 0
Internal Operations: 0
Entry Operations: 0
Extended Operations: 76
Abandoned Requests: 0
Smart Referrals Received: 0

VLV Operations: 0
VLV Unindexed Searches: 0
VLV Unindexed Components: 0
SORT Operations: 0

Entire Search Base Queries: 806
Paged Searches: 82
Unindexed Searches: 0
Unindexed Components: 8896
Regards Jan
On 6/15/2018 2:04 PM, Jan Kowalsky wrote:
What I can see is a lot of unindexed component queries, most of them like:
[15/Jun/2018:21:51:14 +0200] conn=462 op=31251 SRCH base="ou=Domains,dc=example,dc=org" scope=2 filter="(&(objectClass=domainrelatedobject)(associatedDomain=example.net))" attrs="associatedDomain inetDomainBaseDN"
[15/Jun/2018:21:51:14 +0200] conn=462 op=31251 RESULT err=0 tag=101 nentries=1 etime=0 notes=U
The "etime=0" implies that this is not the operation you are looking for.
Looking back at your original question, I am wondering : when you say that some searches are very slow do you mean "searches with certain properties in terms of filter and so on" or do you mean " for the exact same kind of search, submitted over and over, some responses are very slow but most are very fast"?
Hi David,
thanks for the answer.
On 15.06.2018 at 22:15, David Boreham wrote:
On 6/15/2018 2:04 PM, Jan Kowalsky wrote:
What I can see is a lot of unindexed component queries, most of them like:
[15/Jun/2018:21:51:14 +0200] conn=462 op=31251 SRCH base="ou=Domains,dc=example,dc=org" scope=2 filter="(&(objectClass=domainrelatedobject)(associatedDomain=example.net))"
attrs="associatedDomain inetDomainBaseDN" [15/Jun/2018:21:51:14 +0200] conn=462 op=31251 RESULT err=0 tag=101 nentries=1 etime=0 notes=U
The "etime=0" implies that this is not the operation you are looking for.
The maximum I ever had was etime=4
[15/Jun/2018:20:22:43 +0200] conn=95644 op=17 SRCH base="dc=example,dc=de" scope=2 filter="(&(objectClass=inetorgperson)(mail=test@example.net))" attrs="entryuuid nsUniqueId objectguid guid ipauniqueid distinguishedName uid samaccountname memberOf dknOwncloudQuota mail mail jpegPhoto thumbnailphoto"
[15/Jun/2018:20:22:47 +0200] conn=95644 op=17 RESULT err=0 tag=101 nentries=0 etime=4
And they result from a Nextcloud query with a lot of base DNs configured...
Looking back at your original question, I am wondering : when you say that some searches are very slow do you mean "searches with certain properties in terms of filter and so on" or do you mean " for the exact same kind of search, submitted over and over, some responses are very slow but most are very fast"?
The latter. The same queries are sometimes fast and sometimes hang.
Regards Jan
On 06/15/2018 05:17 PM, Jan Kowalsky wrote:
Hi David,
thanks for the answer.
On 15.06.2018 at 22:15, David Boreham wrote:
On 6/15/2018 2:04 PM, Jan Kowalsky wrote:
What I can see is a lot of unindexed component queries, most of them like:
[15/Jun/2018:21:51:14 +0200] conn=462 op=31251 SRCH base="ou=Domains,dc=example,dc=org" scope=2 filter="(&(objectClass=domainrelatedobject)(associatedDomain=example.net))"
attrs="associatedDomain inetDomainBaseDN" [15/Jun/2018:21:51:14 +0200] conn=462 op=31251 RESULT err=0 tag=101 nentries=1 etime=0 notes=U
The "etime=0" implies that this is not the operation you are looking for.
The maximum I ever had was etime=4
[15/Jun/2018:20:22:43 +0200] conn=95644 op=17 SRCH base="dc=example,dc=de" scope=2 filter="(&(objectClass=inetorgperson)(mail=test@example.net))" attrs="entryuuid nsUniqueId objectguid guid ipauniqueid distinguishedName uid samaccountname memberOf dknOwncloudQuota mail mail jpegPhoto thumbnailphoto"
[15/Jun/2018:20:22:47 +0200] conn=95644 op=17 RESULT err=0 tag=101 nentries=0 etime=4
Jan, you also stated before that you have 50 backends. Are they sub-backends of dc=example,dc=de? If they are sub-backends, or children, of dc=example,dc=de, then this subtree search is scanning all 50 backends. There will be some overhead in that case, and it "could" be the reason for these random delays. If you could share the layout/design of these backends, it would be helpful.
Thanks, Mark
And they result from a Nextcloud query with a lot of base DNs configured...
Looking back at your original question, I am wondering : when you say that some searches are very slow do you mean "searches with certain properties in terms of filter and so on" or do you mean " for the exact same kind of search, submitted over and over, some responses are very slow but most are very fast"?
The latter. The same queries are sometimes fast and sometimes hang.
Regards Jan
On 06/15/2018 04:04 PM, Jan Kowalsky wrote:
Hi Mark,
thanks for the answer, and the hint about indexes.
On 15.06.2018 at 00:53, Mark Reynolds wrote:
Can we see your access log showing the slow searches? Are they unindexed? If you have unindexed searches they will bog down the entire
Sharing the whole log is difficult for privacy reasons, since it's a production environment with real user data.
What I can see is a lot of unindexed component queries, most of them like:
[15/Jun/2018:21:51:14 +0200] conn=462 op=31251 SRCH base="ou=Domains,dc=example,dc=org" scope=2 filter="(&(objectClass=domainrelatedobject)(associatedDomain=example.net))" attrs="associatedDomain inetDomainBaseDN"
[15/Jun/2018:21:51:14 +0200] conn=462 op=31251 RESULT err=0 tag=101 nentries=1 etime=0 notes=U
but I'm not sure whether they are really a problem, since there are only about 60 entries with domainrelatedobject.
More relevant could be that there is no index for alias:
[15/Jun/2018:21:50:21 +0200] conn=106426 op=1 SRCH base="dc=example,dc=net" scope=2 filter="(&(objectClass=kolabInetOrgPerson)(|(mail=test.user@example.net)(alias=test.user@example.net)(uid=test.user@example.net)))" attrs="distinguishedName"
[15/Jun/2018:21:50:21 +0200] conn=106426 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U
server. Do you see high cpu for ns-slapd?
No, not at all: CPU average about 7%, load average about 0.4 with a max of 0.6, and no I/O wait.
The specific phenomenon is: for a couple of seconds everything is smooth, but suddenly it hangs for about 1 to 4 or more seconds, and this seems to affect all queries at that moment.
Can you also run logconv.pl?
logconv.pl -ulatn /var/log/dirsrv/slapd-YOUR_INSTANCE/access*
Access Log Analyzer 8.0
Command: logconv.pl /var/log/dirsrv/slapd-ldap0/access
Processing 1 Access Log(s)...

[001] /var/log/dirsrv/slapd-ldap0/access size (bytes): 4715242
25000 Lines Processed
3430177 of 4715242 bytes (72.747%)

Total Log Lines Analysed: 33372

----------- Access Log Output ------------

Start of Logs: 15/Jun/2018:19:06:52
End of Logs: 15/Jun/2018:19:26:39
Processed Log Time: 0 Hours, 19 Minutes, 47 Seconds

Restarts: 0
Total Connections: 2225
 - StartTLS Connections: 0
 - LDAPS Connections: 0
 - LDAPI Connections: 0
Peak Concurrent Connections: 11
Total Operations: 13362
Total Results: 13389
Overall Performance: 100.0%

Searches: 10248 (8.63/sec) (518.01/min)
Modifications: 0 (0.00/sec) (0.00/min)
Adds: 0 (0.00/sec) (0.00/min)
Deletes: 0 (0.00/sec) (0.00/min)
Mod RDNs: 0 (0.00/sec) (0.00/min)
Compares: 0 (0.00/sec) (0.00/min)
Binds: 3038 (2.56/sec) (153.56/min)

Proxied Auth Operations: 0
Persistent Searches: 0
Internal Operations: 0
Entry Operations: 0
Extended Operations: 76
Abandoned Requests: 0
Smart Referrals Received: 0

VLV Operations: 0
VLV Unindexed Searches: 0
VLV Unindexed Components: 0
SORT Operations: 0

Entire Search Base Queries: 806
Paged Searches: 82
Unindexed Searches: 0
Unindexed Components: 8896
You did not run logconv.pl the way I requested; can you please run it again this way:
logconv.pl -ulatn <access logs>
There should be much more important information returned by this tool. As David pointed out, the unindexed search you reported above was in fact harmless, but that doesn't mean they all are. The command above will print detailed information about each potentially unindexed search.
I think the next step, besides gathering the right logconv output, would be to get several pstacks of the ns-slapd process at the time the "hang" occurs. Then we can at least see what all the threads are doing and hopefully get better information. I would try to get the 389-ds-base-debuginfo package installed first, though (this should give us more readable pstacks).
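Capturing several stack snapshots in quick succession can be scripted; a hedged sketch, assuming gdb and the matching debuginfo packages are installed (the optional second argument is only there so the loop can be exercised without attaching to a live process):

```shell
#!/bin/sh
# Sketch: grab three whole-process thread backtraces, one second apart,
# while the hang is happening. Run it the moment a query stalls.
snap_stacks() {
  pid=$1
  dbg=${2:-gdb}    # normally gdb; overridable for dry runs
  n=1
  while [ "$n" -le 3 ]; do
    "$dbg" -p "$pid" -batch -ex "thread apply all bt" > "stacks.$n.txt" 2>&1
    sleep 1
    n=$((n + 1))
  done
  echo "wrote stacks.1.txt .. stacks.3.txt"
}
# real use: snap_stacks "$(pidof ns-slapd)"
```

Comparing the three snapshots shows whether the same thread is stuck on the same lock across the whole hang or whether the threads are making progress.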
Thanks, Mark
Regards Jan
Hi Mark,
thanks for the help.
On 15.06.2018 at 22:50, Mark Reynolds wrote:
You did not run logconv.pl the way I requested; can you please run it again this way:
logconv.pl -ulatn <access logs>
I omitted the detailed searches because there is user data in them...
but this doesn't look like a problem:
----- Top 20 Most Frequent etimes -----
64902  etime=0
  157  etime=1
   34  etime=3
   27  etime=2
   18  etime=4
I think the next step, besides gathering the right logconv output, would be to get several pstacks of the ns-slapd process at the time the "hang" occurs. Then we can at least see what all the threads are doing and hopefully get better information. I would try to get the 389-ds-base-debuginfo package installed first, though (this should give us more readable pstacks).
hm, I installed 389-ds-base-dbg and 389-ds-base-libs-dbg. What else do I have to do?
pstack gives me:
pstack 13463
13463: /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
(No symbols found)
0x7f808a7717bc: ???? (7f803c03cf20, 0, 7f803c0421f0, 0, 7f808c0dc5d6, 5615f1d4bc80) + ffffd695834bb170
0x300000157: ???? (0, 200000, 0, 31, 3, 0) + ffffa9ea0c848bd3
crawl: Input/output error
Error tracing through process 13463
Does this help:
If everything runs fast:
root@ldap0:~# cat /proc/13463/stack
[<ffffffff810eaa9a>] hrtimer_start_range_ns+0x1aa/0x3d0
[<ffffffff810ea660>] hrtimer_init+0x110/0x110
[<ffffffff8121ce26>] poll_schedule_timeout+0x46/0x70
[<ffffffff8121e50f>] do_sys_poll+0x44f/0x560
[<ffffffff814aab95>] loopback_xmit+0x65/0xa0
[<ffffffff815785e5>] tcp_rate_gen+0x105/0x180
[<ffffffff81565fc7>] tcp_ack+0xd97/0x16e0
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff81209bcd>] __fput+0x17d/0x220
[<ffffffff8102f530>] recalibrate_cpu_khz+0x10/0x10
[<ffffffff810efbf6>] ktime_get_ts64+0x46/0xf0
[<ffffffff8121e707>] SyS_poll+0x67/0x120
[<ffffffff81003bd1>] do_syscall_64+0x91/0x1a0
[<ffffffff816137ce>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[<ffffffffffffffff>] 0xffffffffffffffff
When it hangs:
root@ldap0:~# cat /proc/13463/stack
[<ffffffff810fa433>] futex_wait_queue_me+0xd3/0x140
[<ffffffff810fb2cc>] futex_wait+0xfc/0x260
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff810fce17>] do_futex+0x2b7/0xb40
[<ffffffff810e81e7>] lock_timer_base+0x67/0x80
[<ffffffff810e839a>] internal_add_timer+0x1a/0x70
[<ffffffff81202d3b>] __check_object_size+0x10b/0x1dc
[<ffffffff814eeece>] move_addr_to_user+0xbe/0xe0
[<ffffffff814ef160>] SYSC_getsockname+0x80/0xd0
[<ffffffff810fd723>] SyS_futex+0x83/0x180
[<ffffffff814efb17>] SyS_setsockopt+0xb7/0xf0
[<ffffffff81003bd1>] do_syscall_64+0x91/0x1a0
[<ffffffff816137ce>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[<ffffffffffffffff>] 0xffffffffffffffff
This is reproducible.
Regards Jan
On 06/15/2018 05:41 PM, Jan Kowalsky wrote:
Hi Mark,
thanks for the help.
On 15.06.2018 at 22:50, Mark Reynolds wrote:
You did not run logconv.pl the way I requested; can you please run it again this way:
logconv.pl -ulatn <access logs>
I omitted the detailed searches because there is user data in them...
but this doesn't look like a problem:
----- Top 20 Most Frequent etimes -----
64902  etime=0
  157  etime=1
   34  etime=3
   27  etime=2
   18  etime=4
I think the next step, besides gathering the right logconv output, would be to get several pstacks of the ns-slapd process at the time the "hang" occurs. Then we can at least see what all the threads are doing and hopefully get better information. I would try to get the 389-ds-base-debuginfo package installed first, though (this should give us more readable pstacks).
hm, I installed 389-ds-base-dbg and 389-ds-base-libs-dbg. What else do I have to do?
pstack gives me:
pstack 13463
13463: /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
(No symbols found)
0x7f808a7717bc: ???? (7f803c03cf20, 0, 7f803c0421f0, 0, 7f808c0dc5d6, 5615f1d4bc80) + ffffd695834bb170
0x300000157: ???? (0, 200000, 0, 31, 3, 0) + ffffa9ea0c848bd3
crawl: Input/output error
Error tracing through process 13463
Does this help:
Are you sure the "debuginfo" packages are the same version as 389-ds-base/libs?
Run: rpm -qa | grep 389-ds-base
The other option is attach gdb to the process and get a stack trace that way:
# gdb -p PID
(gdb) thread apply all bt
(gdb) quit
If everything runs fast:
root@ldap0:~# cat /proc/13463/stack
[<ffffffff810eaa9a>] hrtimer_start_range_ns+0x1aa/0x3d0
[<ffffffff810ea660>] hrtimer_init+0x110/0x110
[<ffffffff8121ce26>] poll_schedule_timeout+0x46/0x70
[<ffffffff8121e50f>] do_sys_poll+0x44f/0x560
[<ffffffff814aab95>] loopback_xmit+0x65/0xa0
[<ffffffff815785e5>] tcp_rate_gen+0x105/0x180
[<ffffffff81565fc7>] tcp_ack+0xd97/0x16e0
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff81209bcd>] __fput+0x17d/0x220
[<ffffffff8102f530>] recalibrate_cpu_khz+0x10/0x10
[<ffffffff810efbf6>] ktime_get_ts64+0x46/0xf0
[<ffffffff8121e707>] SyS_poll+0x67/0x120
[<ffffffff81003bd1>] do_syscall_64+0x91/0x1a0
[<ffffffff816137ce>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[<ffffffffffffffff>] 0xffffffffffffffff
When it hangs:
root@ldap0:~# cat /proc/13463/stack
[<ffffffff810fa433>] futex_wait_queue_me+0xd3/0x140
[<ffffffff810fb2cc>] futex_wait+0xfc/0x260
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff810fce17>] do_futex+0x2b7/0xb40
[<ffffffff810e81e7>] lock_timer_base+0x67/0x80
[<ffffffff810e839a>] internal_add_timer+0x1a/0x70
[<ffffffff81202d3b>] __check_object_size+0x10b/0x1dc
[<ffffffff814eeece>] move_addr_to_user+0xbe/0xe0
[<ffffffff814ef160>] SYSC_getsockname+0x80/0xd0
[<ffffffff810fd723>] SyS_futex+0x83/0x180
[<ffffffff814efb17>] SyS_setsockopt+0xb7/0xf0
[<ffffffff81003bd1>] do_syscall_64+0x91/0x1a0
[<ffffffff816137ce>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[<ffffffffffffffff>] 0xffffffffffffffff
It's hard to say for sure whether this relates to the issue or not. This looks to be more of an OS issue (polling) than a DS issue, but I can't be sure. Let's try to get some useful stack traces using gdb and go from there.
Thanks, Mark
This is reproducible.
Regards Jan
On 06/15/2018 11:50 PM, Mark Reynolds wrote:
On 06/15/2018 05:41 PM, Jan Kowalsky wrote:
Hi Mark,
thanks for the help.
On 15.06.2018 at 22:50, Mark Reynolds wrote:
You did not run logconv.pl the way I requested; can you please run it again this way:
logconv.pl -ulatn <access logs>
I omitted the detailed searches because there is user data in them...
but this doesn't look like a problem:
----- Top 20 Most Frequent etimes -----
64902  etime=0
  157  etime=1
   34  etime=3
   27  etime=2
   18  etime=4
I think the next step, besides gathering the right logconv output, would be to get several pstacks of the ns-slapd process at the time the "hang" occurs. Then we can at least see what all the threads are doing and hopefully get better information. I would try to get the 389-ds-base-debuginfo package installed first, though (this should give us more readable pstacks).
hm, I installed 389-ds-base-dbg and 389-ds-base-libs-dbg. What else do I have to do?
pstack gives me:
pstack 13463
13463: /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
(No symbols found)
0x7f808a7717bc: ???? (7f803c03cf20, 0, 7f803c0421f0, 0, 7f808c0dc5d6, 5615f1d4bc80) + ffffd695834bb170
0x300000157: ???? (0, 200000, 0, 31, 3, 0) + ffffa9ea0c848bd3
crawl: Input/output error
Error tracing through process 13463
Does this help:
Are you sure the "debuginfo" packages are the same version as 389-ds-base/libs?
Run: rpm -qa | grep 389-ds-base
The other option is attach gdb to the process and get a stack trace that way:
# gdb -p PID
(gdb) thread apply all bt
(gdb) quit
If everything runs fast:
root@ldap0:~# cat /proc/13463/stack
[<ffffffff810eaa9a>] hrtimer_start_range_ns+0x1aa/0x3d0
[<ffffffff810ea660>] hrtimer_init+0x110/0x110
[<ffffffff8121ce26>] poll_schedule_timeout+0x46/0x70
[<ffffffff8121e50f>] do_sys_poll+0x44f/0x560
[<ffffffff814aab95>] loopback_xmit+0x65/0xa0
[<ffffffff815785e5>] tcp_rate_gen+0x105/0x180
[<ffffffff81565fc7>] tcp_ack+0xd97/0x16e0
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff81209bcd>] __fput+0x17d/0x220
[<ffffffff8102f530>] recalibrate_cpu_khz+0x10/0x10
[<ffffffff810efbf6>] ktime_get_ts64+0x46/0xf0
[<ffffffff8121e707>] SyS_poll+0x67/0x120
[<ffffffff81003bd1>] do_syscall_64+0x91/0x1a0
[<ffffffff816137ce>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[<ffffffffffffffff>] 0xffffffffffffffff
When it hangs:
root@ldap0:~# cat /proc/13463/stack
[<ffffffff810fa433>] futex_wait_queue_me+0xd3/0x140
[<ffffffff810fb2cc>] futex_wait+0xfc/0x260
[<ffffffff8121cff0>] poll_select_copy_remaining+0x150/0x150
[<ffffffff810fce17>] do_futex+0x2b7/0xb40
[<ffffffff810e81e7>] lock_timer_base+0x67/0x80
[<ffffffff810e839a>] internal_add_timer+0x1a/0x70
[<ffffffff81202d3b>] __check_object_size+0x10b/0x1dc
[<ffffffff814eeece>] move_addr_to_user+0xbe/0xe0
[<ffffffff814ef160>] SYSC_getsockname+0x80/0xd0
[<ffffffff810fd723>] SyS_futex+0x83/0x180
[<ffffffff814efb17>] SyS_setsockopt+0xb7/0xf0
[<ffffffff81003bd1>] do_syscall_64+0x91/0x1a0
[<ffffffff816137ce>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[<ffffffffffffffff>] 0xffffffffffffffff
It's hard to say for sure whether this relates to the issue or not. This looks to be more of an OS issue (polling) than a DS issue, but I can't be sure. Let's try to get some useful stack traces using gdb and go from there.
These stacks don't look like slapd stacks; also, the ns-slapd process should have many threads. Are you sure you ran pstack on the right process?
Thanks, Mark
This is reproducible.
Regards Jan