Oops, the mail wasn't ready yet - I sent it by accident.
Hi all,
I'm running a set of three 389-ds servers with about 50 databases and replication configured on each server.
Now I'm encountering a constant, very high disk write rate (about 300 write IO/sec on average).
In the audit log there is nothing that would explain this. But in iotop I see a lot of threads like:
1621 be/4 dirsrv 0.00 B/s 3.95 K/s 0.00 % 0.46 % ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
1628 be/4 dirsrv 0.00 B/s 7.90 K/s 0.00 % 0.46 % ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
....
1631 be/4 dirsrv 0.00 B/s 892.18 K/s 0.00 % 0.00 % ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
1463 be/4 dirsrv 0.00 B/s 580.31 K/s 0.00 % 0.00 % ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
1462 be/4 dirsrv 0.00 B/s 363.19 K/s 0.00 % 0.00 % ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
I configured caching and have an entry cache hit ratio of about 99% - but anyway, that would only have an impact on read operations.
What's conspicuous: one of the three servers has a significantly higher write rate than the others. Looking at our Munin stats, the other two are about 10 times lower, at 40 to 60 write operations per second. And one of the servers suddenly dropped its write rate from more than 100 to an average of 50.
Any ideas?

Regards,
Jan
In the audit log there is nothing that would explain this. But in iotop I see a lot of threads like:
The audit log itself (and the search log) will generate IO themselves :)
1621 be/4 dirsrv 0.00 B/s 3.95 K/s 0.00 % 0.46 % ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
Sadly this doesn't tell us much :(
I configured caching and have an entry cache hit ratio of about 99% - but anyway, that would only have an impact on read operations.
DBcache would help with writes.
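(A quick way to look at this, sketched with the bind credentials used elsewhere in this thread: nsslapd-dbcachesize lives on the global ldbm config entry, and its monitor entry should expose the BDB cache hit ratio. Raising the size is done with ldapmodify on the same entry; a restart is usually needed for it to take effect.)

ldapsearch -h localhost -D "cn=directory manager" -w secret -s base -b "cn=config,cn=ldbm database,cn=plugins,cn=config" nsslapd-dbcachesize
ldapsearch -h localhost -D "cn=directory manager" -w secret -s base -b "cn=monitor,cn=ldbm database,cn=plugins,cn=config" dbcachehitratio dbcachepagein dbcachepageout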
What's conspicuous: one of the three servers has a significantly higher write rate than the others. Looking at our Munin stats, the other two are about 10 times lower, at 40 to 60 write operations per second. And one of the servers suddenly dropped its write rate from more than 100 to an average of 50.
It would be good to see the ldbm config here I think. It would also be good to know what kind of work load you have. In some configurations of the network, things like load balancers could be sending more read/write traffic to one server, rather than over all of them.
Another spot to check is your log buffer size, as it may be small (causing the threads to all be writing small amounts). The logging subsystem is slated for a rework in the future to make it more scalable.
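(As a quick check - a sketch using standard cn=config attribute names - the current log locations and whether access log buffering is enabled can be read with:)

ldapsearch -h localhost -D "cn=directory manager" -w secret -s base -b "cn=config" nsslapd-accesslog nsslapd-auditlog nsslapd-accesslog-logbuffering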
Hope that helps,
On 08/09/2018 01:53 AM, William Brown wrote:
In the audit log there is nothing that would explain this. But in iotop I see a lot of threads like:
The audit log itself (and the search log) will generate IO themselves :)
1621 be/4 dirsrv 0.00 B/s 3.95 K/s 0.00 % 0.46 % ns-slapd -D /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w /var/run/dirsrv/slapd-ldap0.startpid
Sadly this doesn't tell us much :(
We could get a pstack along with iotop to see which threads do the IO, regular mods or the BDB regulars like trickle, checkpointing ....
Hi all, thanks for answering,
it took some time until I got back to this problem.
On 09.08.2018 at 10:44, Ludwig Krispenz wrote:
On 08/09/2018 01:53 AM, William Brown wrote:
Sadly this doesn't tell us much :(
we could get a pstack along with iotop to see which threads do the IO,
Unfortunately pstack doesn't give any usable output, even with the debugging symbols installed:
ii  389-ds-base           1.3.3.5-4+deb8u1  amd64  389 Directory Server suite - server
ii  389-ds-base-dbg       1.3.3.5-4+deb8u1  amd64  389 Directory Server suite - server debugging symbols
ii  389-ds-base-libs      1.3.3.5-4+deb8u1  amd64  389 Directory Server suite - libraries
ii  389-ds-base-libs-dbg  1.3.3.5-4+deb8u1  amd64  389 Directory Server suite - library debugging symbols
pstack 718
718: /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap1 -i /var/run/dirsrv/slapd-ldap1.pid -w /var/run/dirsrv/slapd-ldap1.startpid
(No symbols found)
 0x7f69b42b37bc: ???? (7f69a0018270, 0, 7f69a00824a0, 0, 7f69b5c1e616, 564c254998e0) + ffffd6e2892ce4a0
 0x3000001b3: ???? (0, 200000, 0, 31, 3, 564c252df680) + ffffa9b3da9bbfe3
crawl: Input/output error
Error tracing through process 718
 0x564c242aaa00: ????
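(If pstack keeps failing like this, one common fallback - not something suggested in the thread, just assuming gdb is installed - is to pull all thread stacks with gdb instead; note this briefly stops the process:)

gdb -p 718 -batch -ex "thread apply all bt" > /tmp/ns-slapd-stacks.txt 2>&1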
regular mods or the BDB regulars like trickle, checkpointing ....
It would be good to see the ldbm config here I think. It would also be good to know what kind of work load you have. In some configurations of the network, things like load balancers could be sending more
The workload is generally OK. With two cores the load is ~0.80, I/O wait is low, but the CPU is sometimes at 100% or more (see here: https://nextcloud.datenkollektiv.net/s/XEHmCyscEt4j9cn).
read/write traffic to one server, rather than over all of them.
dn: cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject
cn: ldbm database
nsslapd-pluginPath: libback-ldbm
nsslapd-pluginInitfunc: ldbm_back_init
nsslapd-pluginType: database
nsslapd-pluginEnabled: on
nsslapd-plugin-depends-on-type: Syntax
nsslapd-plugin-depends-on-type: matchingRule
nsslapd-pluginId: ldbm-backend
nsslapd-pluginVersion: 1.3.3.5
nsslapd-pluginVendor: 389 Project
nsslapd-pluginDescription: high-performance LDAP backend database plugin
numSubordinates: 58
Example of one backend database config:
dn: cn=example_net,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: extensibleobject
objectClass: nsbackendinstance
cn: notraces_net
creatorsName: cn=directory manager
modifiersName: cn=directory manager
createTimestamp: 20171214123209Z
modifyTimestamp: 20171214123209Z
numSubordinates: 4
nsslapd-suffix: dc=notraces,dc=net
nsslapd-cachesize: -1
nsslapd-cachememsize: 1346371
nsslapd-readonly: off
nsslapd-require-index: off
nsslapd-directory: /var/lib/dirsrv/slapd-ldap0/db/notraces_net
nsslapd-dncachememsize: 512000
The issue: even if I set nsslapd-dncachememsize to a higher value (while dirsrv is stopped), the values are changed back by dirsrv after it starts.
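(For what it's worth, a hedged sketch of trying the same change online instead of editing dse.ldif while the server is down - backend DN taken from the snippet above; whether the server keeps the value is exactly the open question here:)

ldapmodify -h localhost -D "cn=directory manager" -w secret <<EOF
dn: cn=example_net,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-dncachememsize
nsslapd-dncachememsize: 10485760
EOF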
Another spot to check is your log buffer size, as it may be small (causing the threads to all be writing small amounts). The logging subsystem is slated for a rework in the future to make it more scalable.
The access log is off. Only the audit log is enabled - and there is not much in it.
Probably the special setup with a lot of separate databases (one per email domain, more than 50 DBs) is responsible.
Maybe the high write rates are not really responsible for the poor LDAP performance (see my thread on this list titled "ldapsearch performance problem" from June).
It's still the same. Sometimes (a few times per minute) the LDAP server becomes unresponsive for one to several seconds. I tested it with a loop like:
while true; do time ldapsearch -h localhost -o ldif-wrap=no -D "cn=directory manager" -w secret -s sub -b "dc=datenkollektiv.net,dc=net" 'objectclass=*' dn > /dev/null; done
Interesting: if one of these queries hangs for some time, every other query from other hosts also hangs for exactly the same time.
We captured an strace alongside the queries (output here: https://nextcloud.datenkollektiv.net/s/XEHmCyscEt4j9cn).
So I see, e.g., for a query which took about 5.489 seconds (the timestamp is printed above it):
in query.log:
12:55:39.745018658
real    0m5.489s
user    0m0.024s
sys     0m0.040s
12:55:45.237697886
real    0m1.625s
user    0m0.004s
sys     0m0.008s
12:55:46.869872903
in strace.log:
[pid 8088] 12:55:39.739539 poll([{fd=435, events=POLLOUT}], 1, 1800000 <unfinished ...>
[pid 8058] 12:55:39.739573 <... write resumed> ) = 1 <0.000087>
[pid 8088] 12:55:39.739723 <... poll resumed> ) = 1 ([{fd=435, revents=POLLOUT}]) <0.000168>
[pid 8058] 12:55:39.739754 write(426, "dn: cn=ntU"..., 344 <unfinished ...>
[pid 8088] 12:55:39.739824 sendto(435, "0<\2\1\2d7\0043u"..., 62, 0, NULL, 0 <unfinished ...>
[pid 8058] 12:55:39.739858 <... write resumed> ) = 344 <0.000086>
[pid 8088] 12:55:39.739917 <... sendto resumed> ) = 62 <0.000077>
[pid 8058] 12:55:39.739944 write(426, "\n", 1 <unfinished ...>
[pid 8088] 12:55:39.740032 clock_gettime(CLOCK_MONOTONIC, <unfinished ...>
[pid 8058] 12:55:39.740061 <... write resumed> ) = 1 <0.000100>
[pid 8088] 12:55:39.740191 <... clock_gettime resumed> {1083541, 905025629}) = 0 <0.000149>
[pid 8058] 12:55:39.740223 write(426, "dn: cn=ntU"..., 352 <unfinished ...>
[pid 8088] 12:55:39.740296 poll([{fd=435, events=POLLOUT}], 1, 1800000 <unfinished ...>
[pid 8058] 12:55:39.740331 <... write resumed> ) = 352 <0.000090>
[pid 8088] 12:55:39.740391 <... poll resumed> ) = 1 ([{fd=435, revents=POLLOUT}]) <0.000079>
[pid 8058] 12:55:39.740423 write(426, "\n", 1 <unfinished ...>
[pid 13776] 12:55:39.740502 <... select resumed> ) = 0 (Timeout) <0.129345>
[pid 8088] 12:55:39.740528 sendto(435, "0\f\2\1\2e\7\n\1\0"..., 14, 0, NULL, 0 <unfinished ...>
[pid 8058] 12:55:39.740562 <... write resumed> ) = 1 <0.000123>
[pid 13776] 12:55:39.740705 gettimeofday( <unfinished ...>
[pid 8088] 12:55:39.740739 <... sendto resumed> ) = 14 <0.000191>
[pid 8058] 12:55:39.740760 write(426, "dn: cn=num"..., 355 <unfinished ...>
[pid 13776] 12:55:39.740952 <... gettimeofday resumed> {1534244139, 740898}, NULL) = 0 <0.000229>
[pid 8088] 12:55:39.740986 setsockopt(435, SOL_TCP, TCP_CORK, [0], 4 <unfinished ...>
[pid 8058] 12:55:39.741019 <... write resumed> ) = 355 <0.000240>
[pid 13776] 12:55:39.742191 poll([{fd=436, events=POLLIN|POLLPRI}], 1, 0 <unfinished ...>
[pid 8088] 12:55:39.742258 <... setsockopt resumed> ) = 0 <0.001252>
[pid 8058] 12:55:39.742289 write(426, "\n", 1 <unfinished ...>
[pid 13776] 12:55:39.743439 <... poll resumed> ) = 0 (Timeout) <0.001206>
[pid 8088] 12:55:39.743487 write(409, "\0", 1 <unfinished ...>
[pid 8058] 12:55:39.743529 <... write resumed> ) = 1 <0.001219>
[pid 13776] 12:55:39.748812 select(0, NULL, NULL, NULL, {0, 256000} <unfinished ...>
[pid 13772] 12:55:39.748893 <... select resumed> ) = 0 (Timeout) <0.258082>
[pid 13772] 12:55:39.748934 gettimeofday({1534244139, 748952}, NULL) = 0 <0.000019>
[pid 13772] 12:55:39.749001 poll([{fd=434, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000022>
[pid 13772] 12:55:39.749072 select(0, NULL, NULL, NULL, {0, 512000} <unfinished ...>
[pid 8088] 12:55:39.749148 <... write resumed> ) = 1 <0.005636>
[pid 8058] 12:55:39.749189 write(426, "dn: cn=obj"..., 345 <unfinished ...>
[pid 8088] 12:55:39.749254 futex(0x564c333401dc, FUTEX_WAIT_PRIVATE, 6381055, NULL <unfinished ...>
[pid 8058] 12:55:39.749314 <... write resumed> ) = 345 <0.000103>
[pid 8058] 12:55:39.749360 write(426, "\n", 1) = 1 <0.000036>
[pid 8058] 12:55:39.749638 write(426, "dn: cn=own"..., 332) = 332 <0.000035>
[pid 8058] 12:55:39.749729 write(426, "\n", 1) = 1 <0.000033>
[pid 8058] 12:55:39.749956 write(426, "dn: cn=par"..., 337) = 337 <0.000036>
[pid 8058] 12:55:39.750049 write(426, "\n", 1) = 1 <0.000035>
[pid 8058] 12:55:39.750214 write(426, "dn: cn=see"..., 336) = 336 <0.000034>
[pid 8058] 12:55:39.750303 write(426, "\n", 1) = 1 <0.000032>
[pid 8058] 12:55:39.750446 write(426, "dn: cn=sn,"..., 361) = 361 <0.000034>
[pid 8058] 12:55:39.750532 write(426, "\n", 1) = 1 <0.000032>
[pid 8058] 12:55:39.750663 write(426, "dn: cn=tar"..., 352) = 352 <0.000033>
[pid 8058] 12:55:39.750747 write(426, "\n", 1) = 1 <0.000032>
[pid 8058] 12:55:39.750877 write(426, "dn: cn=tel"..., 389) = 389 <0.000033>
[pid 8058] 12:55:39.750963 write(426, "\n", 1) = 1 <0.000035>
[pid 8058] 12:55:39.751111 write(426, "dn: cn=uid"..., 328) = 328 <0.000034>
[pid 8058] 12:55:39.751199 write(426, "\n", 1) = 1 <0.000032>
....
But this is not a regular pattern.
We still don't have a clue whether the performance problem is caused by Xen virtualisation, is a dirsrv performance issue due to our configuration, or comes from replication (three masters replicating to each other). But replication doesn't seem to be the problem, since the issue still exists with replication disabled.
Before moving to another (more powerful) machine we didn't have these problems (but with replication only between two servers).
Debian Jessie 8.0
Kernel 4.9.0-0
Xen version: 4.8.4-pre
Setup for Kolab Groupware

Thanks for any ideas.
Jan
in strace.log:
[pid 8088] 12:55:39.739539 poll([{fd=435, events=POLLOUT}], 1, 1800000 <unfinished ...>
[pid 8058] 12:55:39.739573 <... write resumed> ) = 1 <0.000087>
[pid 8088] 12:55:39.739723 <... poll resumed> ) = 1 ([{fd=435, revents=POLLOUT}]) <0.000168>
[pid 8058] 12:55:39.739754 write(426, "dn: cn=ntU"..., 344 <unfinished ...>
[pid 8088] 12:55:39.739824 sendto(435, "0<\2\1\2d7\0043u"..., 62, 0, NULL, 0 <unfinished ...>
Usually you need to do a few iterations on the strace data to get to the bottom of things.
I took a look at the full file you uploaded. Couple of things :
1. With a truncated strace it helps to know the identity of the FDs. You can use "lsof" or dump the data out of /proc /sys...somewhere.
2. You can tell strace to only output certain classes of syscalls. For this problem, where filesystem I/O is the target, you can use : strace -e trace=file and -e trace=desc. More details here : https://linux-audit.com/the-ultimate-strace-cheat-sheet/ and also in the man page. This will remove the extra noise (gettimeofday() etc) in the output.
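(A short sketch of both suggestions, using the FD number from the trace above and a placeholder for the ns-slapd PID:)

# map FD 426 back to a file (the FDs are shared by all ns-slapd threads)
ls -l /proc/<ns-slapd pid>/fd/426
lsof -a -p <ns-slapd pid> -d 426

# re-capture only file/descriptor syscalls, with timestamps and per-call durations
strace -f -tt -T -e trace=file,desc -p <ns-slapd pid> -o /tmp/slapd-io.strace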
That said, I saw this, which looks possibly like a smoking gun:
[pid 8058] 12:55:40.563393 open("/etc/dirsrv/slapd-ldap0/dse.ldif.tmp", O_RDWR|O_CREAT|O_TRUNC, 0600 <unfinished ...>
[pid 3186] 12:55:40.563826 <... select resumed> ) = 0 (Timeout) <0.009394>
[pid 8058] 12:55:40.563855 <... open resumed> ) = 426 <0.000431>
It looks to have opened dse.ldif, then assuming that FD = 426 is from the open() call, it goes on to write a bunch of data to that file :
[pid 8058] 12:55:40.567465 write(426, "dn: cn=mon"..., 413) = 413 <0.000209>
[pid 8058] 12:55:40.567826 write(426, "\n", 1) = 1 <0.000356>
[pid 8058] 12:55:40.568601 write(426, "dn: cn=cha"..., 318) = 318 <0.000058>
[pid 8058] 12:55:40.568727 write(426, "\n", 1) = 1 <0.000045>
[pid 8058] 12:55:40.568937 write(426, "dn: cn=enc"..., 321) = 321 <0.000042>
[pid 8058] 12:55:40.569040 write(426, "\n", 1) = 1 <0.000041>
[pid 8058] 12:55:40.569182 write(426, "dn: cn=fea"..., 100) = 100 <0.000042>
[pid 8058] 12:55:40.569281 write(426, "\n", 1) = 1 <0.000040>
[pid 8058] 12:55:40.569427 write(426, "dn: cn=kol"..., 409) = 409 <0.000042>
[pid 8058] 12:55:40.569528 write(426, "\n", 1) = 1 <0.000041>
dse.ldif is supposed to hold seldom-changing config data for the server.
So...one theory is that something the server is doing, or you are unknowingly asking it to do, is causing dse.ldif to be re-written constantly.
As you point out, the high filesystem I/O may not be the cause for the poor search performance, but certainly the server should not be constantly re-writing dse.ldif.
On 08/15/2018 10:13 AM, David Boreham wrote:
dse.ldif is supposed to hold seldom-changing config data for the server.
So...one theory is that something the server is doing, or you are unknowingly asking it to do, is causing dse.ldif to be re-written constantly.
Updating the csn generator and the uuid generator will cause a lot of churn in dse.ldif. There are other housekeeping tasks which will write dse.ldif
On 8/15/2018 10:36 AM, Rich Megginson wrote:
Updating the csn generator and the uuid generator will cause a lot of churn in dse.ldif. There are other housekeeping tasks which will write dse.ldif
But if those things were being done so frequently that the resulting filesystem I/O showed up on the radar as a potential system-wide performance issue, that would mean something was wrong somewhere, right?
On 08/15/2018 10:56 AM, David Boreham wrote:
On 8/15/2018 10:36 AM, Rich Megginson wrote:
Updating the csn generator and the uuid generator will cause a lot of churn in dse.ldif. There are other housekeeping tasks which will write dse.ldif
But if those things were being done so frequently that the resulting filesystem I/O showed up on the radar as a potential system-wide performance issue, that would mean something was wrong somewhere, right?
I would think so. Then I suppose the first step would be to measure the dse.ldif churn on a "normal" system to get a baseline.
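(One simple way to do that, assuming inotify-tools is installed and the instance config directory from earlier in this thread: count how often dse.ldif / dse.ldif.tmp get written.)

inotifywait -m -e close_write -e moved_to --timefmt '%T' --format '%T %w%f %e' /etc/dirsrv/slapd-ldap0/ | grep dse.ldif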
On Wed, 2018-08-15 at 11:03 -0600, Rich Megginson wrote:
On 08/15/2018 10:56 AM, David Boreham wrote:
On 8/15/2018 10:36 AM, Rich Megginson wrote:
Updating the csn generator and the uuid generator will cause a lot of churn in dse.ldif. There are other housekeeping tasks which will write dse.ldif
But if those things were being done so frequently that the resulting filesystem I/O showed up on the radar as a potential system-wide performance issue, that would mean something was wrong somewhere, right?
I would think so. Then I suppose the first step would be to measure the dse.ldif churn on a "normal" system to get a baseline.
We do have some poor locking strategies in some parts of the codebase that sadly I just never had time to finish fixing. Access logging comes to mind as a culprit for bottlenecking the server over locks; it needs a complete rewrite .... However, he says the access log is off. I'm not sure that means the locking around it is disabled, though.
Other areas are the operation struct reuse (it locks, and it has unbounded growth with high operation counts), the cn=config locking on many variables (even for reads), which is frequently locked/unlocked (some are at least atomics now, but they still cause stalls on CPU syncs), and locking in some plugins. These could all be parts of the issue.
I also had plans to add profiling into access logs so we could really narrow down the time hogs in searches/writes, but given the current access log design it's hard to do. We need more visibility into the server state when these queries come in, and today we just don't have it :(
Finally, it could come down to simpler things like indexes or db locking for certain queries ...
I think we need the access log enabled to get a better idea, with highres etime enabled. I would not be surprised to see a pattern like :
100 fast searches
1 slow search
100 fast searches
1 slow search
That would indicate issues in the logging lock.
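(A hedged example of doing that via cn=config: nsslapd-accesslog-logging-enabled is a standard attribute, while nsslapd-logging-hr-timestamps-enabled only exists on releases newer than the 1.3.3 reported earlier, so that part of the change may be rejected there.)

ldapmodify -h localhost -D "cn=directory manager" -w secret <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-accesslog-logging-enabled
nsslapd-accesslog-logging-enabled: on
-
replace: nsslapd-logging-hr-timestamps-enabled
nsslapd-logging-hr-timestamps-enabled: on
EOF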
To me, this entire issue indicates we need better profiling information in our logs, because tools like strace just don't explain what's going on. We need to know *why* and *where* threads are getting stuck, how long each plugin is taking, and where the stalls are. Without investment into our server's internal state, these issues will always remain elusive to our users and to us as a team.
----- long description of current access log issues -----
The current access log is protected by a single mutex and has an in-memory buffer. During their operations, threads write to this buffer, contending on that mutex. Normally this is "reasonably fast", until the buffer fills. At that point the buffer needs to be flushed. The next search thread that finds the buffer approaching this limit *does the flushing itself* while holding the log mutex.
At this point, one search is busy writing out the whole access log buffer, while every other thread builds up behind it, waiting to write into the buffer themselves. The (poor, unlucky) operation thread that was writing the buffer is now stuck doing disk IO for some period of time, while everyone else waits. It may not have even begun to send results to the client yet! Finally, once done, it unlocks and can complete the operation. This is commonly what causes the "burst" behaviour of the server.
Temporary fixes are to lower the buffer size (so it's written in smaller, more frequent amounts), but really, the only solutions today are to put the log on a ramdisk/SSD so that the write finishes faster, or to use the syslog interface with async enabled.
Hi all,
I just wanted to say that the high I/O rates went away more or less spontaneously. They were probably related to multi-master replication and to writing data to the changelog DB, but we could not track down the reason. After one restart, the high I/O rates of one of the three suppliers in the multi-supplier environment came down to the rates of the other ones.
Since it's a production environment, it's hard to control all parameters to reproduce it.
The poor performance and hangs still exist and are not related to the I/O rates.
Regards,
Jan