Hello Gurus,
I'm looking for an explanation of the following performance behaviour.
My env: 389-ds-base-1.3.5.15-1.fc24.x86_64 in multimaster fractional replication.
Running rsearch for 5 minutes with 1 thread, I see spikes for a basic read using the uid index.
Running the same search with 10 threads, the average ms/op figures are much better, with no major spikes/bursts.
Any explanation would be much appreciated.
See below for the 1-thread run and its spikes/bursts:
T 300 -t 1
rsearch: 1 threads launched.
T1 min= 0ms, max= 5ms, count = 54710
T1 min= 0ms, max= 42ms, count = 64930
T1 min= 0ms, max= 2ms, count = 65174
T1 min= 0ms, max= 2ms, count = 65110
T1 min= 0ms, max= 44ms, count = 64966
T1 min= 0ms, max= 1ms, count = 65101
T1 min= 0ms, max= 22ms, count = 65056
T1 min= 0ms, max= 32ms, count = 64981
T1 min= 0ms, max= 1ms, count = 65145
T1 min= 0ms, max= 1ms, count = 65223
T1 min= 0ms, max= 27ms, count = 65015
T1 min= 0ms, max= 1ms, count = 65182
T1 min= 0ms, max= 3ms, count = 65213
T1 min= 0ms, max= 23ms, count = 64760
T1 min= 0ms, max= 2ms, count = 64214
T1 min= 0ms, max= 3ms, count = 52279
T1 min= 0ms, max= 11ms, count = 64914
T1 min= 0ms, max= 1ms, count = 65118
T1 min= 0ms, max= 5ms, count = 64852
T1 min= 0ms, max= 91ms, count = 64180
T1 min= 0ms, max= 4ms, count = 64746
T1 min= 0ms, max= 1ms, count = 65080
T1 min= 0ms, max= 12ms, count = 65110
T1 min= 0ms, max= 702ms, count = 59243
T1 min= 0ms, max= 1ms, count = 65082
T1 min= 0ms, max= 89ms, count = 64331
T1 min= 0ms, max= 23ms, count = 64647
T1 min= 0ms, max= 5ms, count = 64818
T1 min= 0ms, max= 55ms, count = 64374
T1 min= 0ms, max= 8ms, count = 64713
300 sec >= 300
Final Average rate: 6394.22/sec = 0.1564msec/op, total: 64713
And the final average rate for 10 threads, with no significant spikes/bursts at this thread count:
20180905 14:07:23 - Rate: 16962.10/thr (16962.10/sec = 0.0590ms/op), total:169621 (10 thr)
300 sec >= 300
Final Average rate: 17420.40/sec = 0.0574msec/op, total:169621
Hi,
It could be the flushing of logs (the access log) to disk, which happens more often when server load is higher. You could use iostat or dstat to see what is happening on the disk.
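For example, a minimal sketch (run on the server while rsearch is going; the interval and options are just a starting point):

iostat -x 1   (extended per-device I/O statistics, refreshed every second)
dstat -cd 1   (CPU and disk activity side by side, refreshed every second)

Write bursts that line up with the latency spikes would point at log flushing.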
Regards, Andrey
De: "Ghiurea, Isabella" Isabella.Ghiurea@nrc-cnrc.gc.ca À: 389-users@lists.fedoraproject.org Envoyé: Mercredi 5 Septembre 2018 23:14:24 Objet: [389-users] ldap perfomance
Hello Gurus,
looking for an answer to the following performance behavior my env: 389-ds-base-1.3.5.15-1.fc24.x86_64 in multimaster fractional replication
running rsearch for 5 min with 1 thread seeing spikes for a basic read using index uid
And running with 10 threads same search the avg ms/ops performance are much better with no major spike/burst
Any explanation much appreciate it
see bellow for 1 thread and the spike/burst
T 300 -t 1 rsearch: 1 threads launched. T1 min= 0ms, max= 5ms, count = 54710 T1 min= 0ms, max= 42ms, count = 64930 T1 min= 0ms, max= 2ms, count = 65174 T1 min= 0ms, max= 2ms, count = 65110 T1 min= 0ms, max= 44ms, count = 64966 T1 min= 0ms, max= 1ms, count = 65101 T1 min= 0ms, max= 22ms, count = 65056 T1 min= 0ms, max= 32ms, count = 64981 T1 min= 0ms, max= 1ms, count = 65145 T1 min= 0ms, max= 1ms, count = 65223 T1 min= 0ms, max= 27ms, count = 65015 T1 min= 0ms, max= 1ms, count = 65182 T1 min= 0ms, max= 3ms, count = 65213 T1 min= 0ms, max= 23ms, count = 64760 T1 min= 0ms, max= 2ms, count = 64214 T1 min= 0ms, max= 3ms, count = 52279 T1 min= 0ms, max= 11ms, count = 64914 T1 min= 0ms, max= 1ms, count = 65118 T1 min= 0ms, max= 5ms, count = 64852 T1 min= 0ms, max= 91ms, count = 64180 T1 min= 0ms, max= 4ms, count = 64746 T1 min= 0ms, max= 1ms, count = 65080 T1 min= 0ms, max= 12ms, count = 65110 T1 min= 0ms, max= 702ms, count = 59243 T1 min= 0ms, max= 1ms, count = 65082 T1 min= 0ms, max= 89ms, count = 64331 T1 min= 0ms, max= 23ms, count = 64647 T1 min= 0ms, max= 5ms, count = 64818 T1 min= 0ms, max= 55ms, count = 64374 T1 min= 0ms, max= 8ms, count = 64713
T1 min= 0ms, max= 8ms, count = 64713 300 sec >= 300 Final Average rate: 6394.22/sec = 0.1564msec/op, total: 64713
And final avg rate for 10 threads, no significant spike/burst for this num of threads
20180905 14:07:23 - Rate: 16962.10/thr (16962.10/sec = 0.0590ms/op), total:169621 (10 thr) 300 sec >= 300 Final Average rate: 17420.40/sec = 0.0574msec/op, total:169621
389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject....
A more detailed discussion about it: http://www.port389.org/docs/389ds/design/logging-performance-improvement.htm...
You could also disable logging and see whether the spikes disappear to be sure of their source: http://www.port389.org/docs/389ds/howto/howto-logsystemperf.html
This does not explain it: running 1 thread takes 0.1564 msec/op while running 10 threads takes 0.0590 ms/op, and the 10-thread run should require the access log to be flushed more frequently, yet with 10 threads I do not see the spikes in execution time that show up with 1 thread. Maybe something else?
On 9/6/2018 8:50 AM, isabella.ghiurea@nrc-cnrc.gc.ca wrote:
I think, looking at the data you posted, the question you're asking is "why, when I subject my server to a continuous search operation load, do some operations have much longer latency than others?".
This isn't a performance issue per se, imho, because overall performance is acceptable. The problem is that rsearch reports a maximum search response time that is quite high. That maximum could have been measured on only _one_ operation, though.
On Thu, 2018-09-06 at 08:55 -0600, David Boreham wrote:
I think, looking at the data you posted, the question you're asking is "why, when I subject my server to a continuous search operation load, do some operations have much longer latency than others?".
If they are doing the same operation repeatedly, then it *is* an issue, because it shows that within the server architecture there is a flaw that causes non-deterministic behaviour.
I have observed this behaviour myself, and traced it to logging (I did a server build with log_* as a stub function).
But there are plenty of other places in the code which could be issues - I know plenty of them, but time is the enemy of all things :)
On 9/6/2018 6:41 PM, William Brown wrote:
If they are doing the same operation repeatedly, then it *is* an issue because it shows that within the server architecture there is a flaw that causes non deterministic behaviour.
I didn't say it wasn't a problem -- just attempting to clarify the specific scope of the problem.
On 09/06/2018 07:50 AM, isabella.ghiurea@nrc-cnrc.gc.ca wrote:
This does not explain it: running 1 thread takes 0.1564 msec/op while running 10 threads takes 0.0590 ms/op
Yes, but that's an average. Running 10 threads doesn't make individual searches take less time.
When you're running just one thread, you're getting the result of one CPU core. When you increase the number of threads in rsearch, the server will use more CPU cores. You'll see more searches completed in the same amount of time, which rsearch will report as a lower average time/op. When you increase threads enough to saturate the CPU cores in the server, then average time will start to increase again.
and the 10-thread run should require the access log to be flushed more frequently, yet with 10 threads I do not see the spikes in execution time that show up with 1 thread. Maybe something else?
With access logging enabled, I see similar spikes in search time reported by rsearch. When I turn access logging off, I don't. Try it yourself and see if you can reproduce the behavior you reported:
# ldapmodify -h localhost -D 'cn=Directory Manager' -W
dn: cn=config
changetype: modify
replace: nsslapd-accesslog-logging-enabled
nsslapd-accesslog-logging-enabled: off
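Afterwards you can turn access logging back on by setting nsslapd-accesslog-logging-enabled to on with the same kind of ldapmodify.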
Hi there,
I have seen this behaviour due to an issue in the design of access log buffer flushing. During a buffer flush all other threads are delayed, which can cause this spike.
You can confirm this by changing your access log buffer size up or down.
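As a rough sketch of that kind of test (assuming the standard cn=config buffering switch, nsslapd-accesslog-logbuffering, is what controls this on your version), you can also turn log buffering off entirely and re-run the same rsearch to see how the spike pattern changes:

ldapmodify -h localhost -D 'cn=Directory Manager' -W
dn: cn=config
changetype: modify
replace: nsslapd-accesslog-logbuffering
nsslapd-accesslog-logbuffering: off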
Sadly this problem cannot be fixed simply without a complete rewrite of the logging subsystem - which I would love to do, but I am not employed to work on 389 at this time, so I lack the time.
Hope that helps.
On Wed, 2018-09-05 at 21:14 +0000, Ghiurea, Isabella wrote:
On 9/6/2018 6:37 PM, William Brown wrote:
Interesting to see this, because when I first ran performance tests on the DS, around 20 years ago, I quickly discovered that the logging code was a bottleneck. So I re-wrote it with in-memory ping-pong buffering for the operation threads' log output, and removed any synchronous filesystem writes. I expect that in the years since, perhaps concurrency bugs led to that code being removed, or it could be that it no longer performs well with modern hardware.
On Fri, 2018-09-07 at 13:38 -0600, David Boreham wrote:
Could be both. Modern hardware is especially bad at handling any situation where locking gets involved, due to the highly asynchronous nature of modern CPUs and their cache layers.
There is a lot on my "todo list" to rearchitect the code so that it scales on modern systems. A lot of what I want to do is directly inspired by Erlang's actor model and concurrent readability.