I expect the errors are rather benign... Like:
Jul 27 08:19:37 eldaph22 ns-slapd: [27/Jul/2020:08:19:37.743355076 -0700] - WARN -
NSACLPlugin - acl_parse - The ACL target cn=blah,dc=ubc,dc=ca does not exist
"Fix" was to either create cn=blah,dc=ubc,dc=ca or delete the ACI.
Thank you for the instructions on how to turn on ACI logging. I've stashed them for
when we might need them. 😊
From: Mark Reynolds <mreynolds(a)redhat.com>
Sent: July 28, 2020 10:48 AM
To: General discussion list for the 389 Directory server project.
<389-users(a)lists.fedoraproject.org>; Winstanley, Anthony <winstan(a)cs.ubc.ca>
Subject: Re: [389-users] Re: Limitations with large numbers of ACIs?
On 7/28/20 12:30 PM, Winstanley, Anthony wrote:
We're running with 458 ACIs right now (verified the same number
on all nodes), running on RHEL 7 with:
nsslapd-aclpb-max-selected-acl is set to 2000. (I'm sure I set it long ago, though
my memory is fuzzy; thanks for the reminder. I'll put a note somewhere I'll find
it for next time...)
So really, we should not be experiencing any ACI-related issue, given that 2000 > 458.
I'll note that daemon restarts of the affected nodes cleared up any issues.
I'm feeling better about this now... One final question:
What about ACL failures, where a single ACI fails (say, because its target has been removed)?
Is there any chance that the failure of the ACL plugin to load one ACI would affect the
loading of other valid ACIs?
(I took the opportunity to fix that sort of issue reported in the logs by the ACL plugin
and have no idea if that affected anything but the actual failed ACIs themselves. Again, a
restart fixed things, but the restart was after my cleanup...)
First, there is an ACI cache that could be corrupted, which explains why
a restart fixes the issue. Second, you are saying you "cleaned" some
things up. What exactly did you do? And what were the exact error
messages you saw that led you to the "clean up"?
If the ACIs start failing again then there is a good chance you found a
bug in the ACL cache. What would be interesting to see would be the
error log with ACI logging enabled, where you have the output when
it works, and then the output once it starts failing. Then we
would have something to compare. With so many ACIs this is going to be
tedious, but it might be the only option besides identifying a
reproducible test case.
Use ldapmodify and set the level to "128" - this will impact server
performance so only enable it for brief tests, then set it back to "0".
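A minimal sketch of toggling that level, assuming ACI logging is controlled via nsslapd-errorlog-level on cn=config (adjust -H and -D for your environment; the localhost URL and Directory Manager bind are placeholders):

```shell
# Turn ACL logging on (level 128) for a brief test window.
ldapmodify -H ldap://localhost -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 128
EOF

# ...reproduce the problem while watching the errors log, then
# set the level back to 0 to avoid the performance hit.
ldapmodify -H ldap://localhost -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 0
EOF
```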