I want to start a brief discussion about a major problem we have with backend
transaction plugins and the entry cache. I'm finding that when we get
into a nested state of BE TXN plugins and one of the later plugins in the
chain fails, then while we don't commit the disk changes (they are
aborted/rolled back), we DO keep the entry cache changes!
For example, a modrdn operation triggers the referential integrity
plugin, which renames the member attribute in some group and changes that
group's entry in the cache, but then later on the memberOf plugin fails
for some reason. The database transaction is aborted, but the entry
cache changes the RI plugin made are still present :-( I have also
found other entry cache issues with modrdn and BE TXN plugins, and we
know of other, currently non-reproducible, entry cache crashes as well,
related to mishandling of cache entries after failed operations.
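The failure mode above can be illustrated with a minimal, hypothetical model (plain Python, nothing here is actual 389 DS code): the database transaction works on a private copy that is simply discarded on abort, but the entry cache is mutated in place, so a failing plugin leaves the cache out of sync with the disk.

```python
# Hypothetical minimal model of the bug: the DB txn rolls back cleanly,
# but the in-memory entry cache keeps the aborted change.

class FakeDB:
    """Stand-in for the backend transaction: all-or-nothing on commit."""
    def __init__(self):
        self.committed = {}

    def begin_txn(self):
        # Work happens on a copy; only an explicit commit would publish it.
        return dict(self.committed)

entry_cache = {"cn=group": {"member": "uid=old"}}
db = FakeDB()

def modrdn_with_betxn_plugins():
    txn = db.begin_txn()
    # Referential integrity plugin: updates the DB txn AND the live cache.
    txn["cn=group"] = {"member": "uid=new"}
    entry_cache["cn=group"]["member"] = "uid=new"   # cache mutated in place!
    # memberOf plugin fails -> the txn copy is discarded (rollback)...
    raise RuntimeError("memberOf plugin failed")

try:
    modrdn_with_betxn_plugins()
except RuntimeError:
    pass

assert db.committed == {}                                # disk change rolled back
assert entry_cache["cn=group"]["member"] == "uid=new"    # ...cache change survived
```

The last two assertions are the bug in a nutshell: the persistent store and the cache disagree after the abort.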
It's time to rework how we use the entry cache. We basically need a
transaction-style caching mechanism: we should not commit any entry
cache changes until the original operation has fully succeeded.
Unfortunately, given how the entry cache is currently designed and used,
this will be a major change.
William wrote up this doc:
But this does not currently cover the nested plugin scenario either
(not yet). I do not know how difficult it would be to implement
William's proposal, or how difficult it would be to incorporate the
transaction-style caching into his design. What kind of time frame could
this even be implemented in? William, what are your thoughts?
If William's design is too big a change to implement safely in a
reasonable time, then perhaps we need to look into revising the existing
cache design, where we would use "cache_add_tentative" style functions and
only apply the changes at the end of the op. This is also not a trivial change.
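The "tentative" approach could look something like this sketch. The name cache_add_tentative comes from the mail; the rest of the API (operation IDs, commit/abort entry points) is a hypothetical illustration, not existing code:

```python
# Sketch of a tentative entry cache: writes are staged per operation and
# only published to the shared cache when the whole operation succeeds.

class TxnEntryCache:
    def __init__(self):
        self._entries = {}     # the committed, shared cache: dn -> entry
        self._tentative = {}   # op_id -> staged {dn: entry}

    def get(self, dn):
        return self._entries.get(dn)

    def add_tentative(self, op_id, dn, entry):
        # Plugins stage changes here instead of mutating the cache directly.
        self._tentative.setdefault(op_id, {})[dn] = entry

    def commit(self, op_id):
        # Whole operation succeeded: publish the staged changes atomically.
        self._entries.update(self._tentative.pop(op_id, {}))

    def abort(self, op_id):
        # A plugin failed: discard the staged changes; shared cache untouched.
        self._tentative.pop(op_id, None)

cache = TxnEntryCache()
cache.add_tentative("op1", "cn=group", {"member": "uid=new"})
cache.abort("op1")               # plugin failure: nothing leaks into the cache
assert cache.get("cn=group") is None
cache.add_tentative("op2", "cn=group", {"member": "uid=new"})
cache.commit("op2")              # operation fully succeeded
assert cache.get("cn=group") == {"member": "uid=new"}
```

The key property is that between add_tentative and commit, no other thread's reads can observe the staged change, mirroring the all-or-nothing behavior of the database transaction.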
And what impact would changing the entry cache have on Ludwig's pluggable
Anyway, we need to start thinking about redesigning the entry cache, no
matter what approach we take. If anyone has any ideas or
comments, please share them; but given the severity of this flaw, I think
redesigning the entry cache should be one of our next major goals in DS.
We have recently implemented Filter and Anonymous in lib389, but it seems
like Filter does not work with an Anonymous connection.
It actually does not work with any kind of connection, whether the ACIs
allow it or not, except for root.
My suspicion is that it is related to this issue, which is not yet fixed:
Please check the attached test case.
> Date: Tue, 26 Feb 2019 17:21:50 -0700
> From: Rich Megginson <rmeggins(a)redhat.com>
> Message-ID: <d40bde83-1e88-b34f-9b5d-d2b320468f14(a)redhat.com>
> On 2/26/19 4:26 PM, William Brown wrote:
>>> I think the recursive/nested transactions on the database level are not the problem; we do this correctly already: either all or no change becomes persistent.
>>> What we do not manage is the modifications we make in parallel on in-memory structures like the entry cache. Changes to the EC are not managed by any txn, and I do not see how any of the database txn models would help: they do not know about the EC and cannot abort its changes.
>>> We would need to incorporate the EC into a generic txn model, or have a way to flag EC entries as garbage if a txn is aborted.
>> The issue is we allow parallel writes, which breaks the consistency guarantees of the EC anyway. LMDB won’t allow parallel writes (it’s single write - concurrent parallel readers), and most other modern kv stores take this approach too, so we should be remodelling our transactions to match this IMO. It will make the process of how we reason about the EC much much simpler I think.
> Some sort of in-memory data structure with fast lookup and transactional semantics (modify operations are stored as mvcc/cow so each read of the database with a given txn handle sees its own
> view of the ec, a txn commit updates the parent txn ec view, or the global ec view if no parent, from the copy, a txn abort deletes the txn's copy of the ec) is needed. A quick google search
> turns up several hits. I'm not sure if the B+Tree proposed at http://www.port389.org/docs/389ds/design/cache_redesign.html has transactional semantics, or if such code could be added to its
> With LMDB, if we could make the on-disk entry representation the same as the in-memory entry representation, then we could use LMDB as the entry cache too - the database would be the entry
> cache as well.
Exactly. This was the original design goal for back-mdb and LMDB in OpenLDAP.
Note that the back-mdb in OpenLDAP 2.4 is a compromise from this original design; we still
have a slight deserialization pass when reading entries from the DB. But it's much simpler
and faster than what we used to do with back-bdb/hdb.
Ultimately - if your local persistence layer is so slow that it needs an in-memory cache,
that local persistence layer is broken. This conclusion is inescapable, after many years of
working with BerkeleyDB.
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
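Rich's MVCC/COW idea above (each txn sees its own view of the EC; commit folds changes into the parent txn, or into the global view if there is no parent; abort just drops the txn's copy) can be sketched as follows. This is a hypothetical illustration in Python, not a proposal for the actual C data structure:

```python
# Sketch of a copy-on-write, nestable cache transaction: reads fall
# through txn -> parent txn -> global view; commit merges upward; abort
# is just dropping the CacheTxn object, since nothing was shared.

class CacheTxn:
    def __init__(self, base, parent=None):
        self.base = base          # the global cache dict
        self.parent = parent      # enclosing CacheTxn, or None
        self.changes = {}         # this txn's private COW overlay

    def get(self, dn):
        if dn in self.changes:
            return self.changes[dn]
        return self.parent.get(dn) if self.parent else self.base.get(dn)

    def put(self, dn, entry):
        self.changes[dn] = entry

    def commit(self):
        if self.parent:
            self.parent.changes.update(self.changes)   # visible to parent only
        else:
            self.base.update(self.changes)             # visible globally

global_cache = {"cn=group": {"member": "uid=old"}}
outer = CacheTxn(global_cache)
inner = CacheTxn(global_cache, parent=outer)
inner.put("cn=group", {"member": "uid=new"})
inner.commit()                            # parent txn sees it...
assert outer.get("cn=group") == {"member": "uid=new"}
assert global_cache["cn=group"] == {"member": "uid=old"}   # ...globally unchanged
outer.commit()                            # now it is globally visible
assert global_cache["cn=group"] == {"member": "uid=new"}
```

Note this sketch sidesteps concurrency entirely; as William points out, moving to a single-writer model (as LMDB enforces) is what would make reasoning about such a structure tractable.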
I did a merge for a PR, and now everything on pagure seems to have locked up.
I don't know what the next step is to investigate or resolve this, but it looks like most git-related web UI actions are frozen for now.
Software Engineer, 389 Directory Server