On Mon, 2011-10-03 at 10:08 -0400, Stephen Gallagher wrote:
On Mon, 2011-10-03 at 14:33 +0200, Jan Zelený wrote:
> > On Thu, 2011-09-29 at 16:56 +0200, Jan Zelený wrote:
> > > > Jan Zelený <jzeleny(a)redhat.com> writes:
> > > > >> > As you described it, it looks like issue similar to what Stephen
> > > > >> > found earlier and is now fixed. It probably isn't caused by the
> > > > >> > size of groups but rather membership structure. Could you be more
> > > > >> > specific how the structure is organized? Do you use rfc2307 or
> > > > >> > rfc2307bis? If the latter is the case, how deep your membership
> > > > >> > structure goes and how does it approximately look like?
> > > > >>
> > > > >> This is rfc2307.
> > > > >
> > > > > There must be something else going on. I tested 2307 on a group with
> > > > > ~1000 users and I found no problem.
> > > > >
> > > > > Could you provide me a backtrace of the place where the process
> > > > > hangs? Any other information you might have will be useful as well.
> > > >
> > > > I don't have a backtrace, since I compiled this as an rpm. However, an
> > > > strace of sssd_be hopefully holds a clue or two:
> > > > PS. I forgot to mention that I'm using sssd 1.5.12 + memberof patch on
> > > > rhel6 x86_64, since it required the least amount of effort to compile
> > > > and install.
> > > >
> > > > Regards,
> > >
> > > I managed to reproduce this issue, however I'm not sure it was related to
> > > the plugin itself. What seemed to resolve it was an update to the latest
> > > packages in F15 and a restart. However I did a small memory optimization,
> > > so I'm sending the corrected patch. Please let me know if everything
> > > works for you now. If not, try attaching gdb to the sssd_be process *after*
> > > it freezes and send me a backtrace.
> >
> > Still seeing issues with this latest round of patches (by the way,
> > please always send all patches in a series when updating, it makes it
> > easier to keep track of them).
> >
>
> Sorry about that. I meant to send all of them, but I somehow forgot.
>
> > Please note that the issue is that we're calling talloc_free(msg) on msg
> > at a time when msg = 0x1. This can only happen in two error conditions:
> >
> > 1) member_name = ldb_msg_find_attr_as_string(member, DB_NAME, NULL);
> > has returned NULL for member_name, or
> >
> > 2) entry_is_user_object() returned something other than LDB_SUCCESS or
> > LDB_ERR_NO_SUCH_ATTRIBUTE
> >
> > So there are two bugs here. 1) We need to initialize msg to NULL. 2) We
> > need to figure out which of the above two failures is happening, and
> > why.
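
To illustrate the first bug, here is a minimal self-contained sketch of the
pattern. It uses plain malloc()/free() in place of talloc and ldb so that it
compiles on its own; struct message and lookup_member() are hypothetical
names, not taken from the patch. The only point is that a pointer released on
a shared error path has to start out as NULL, otherwise an early failure frees
whatever garbage was on the stack (0x1 in the crash above):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an ldb message; not the real type. */
struct message {
    char name[64];
};

/* Returns 0 on success and hands the message to the caller via *out,
 * or -1 on failure, in which case *out is left untouched. */
static int lookup_member(const char *member_name, struct message **out)
{
    struct message *msg = NULL; /* without this, the early goto frees garbage */
    int ret = -1;

    assert(out != NULL);

    /* Mirrors error condition 1: the name lookup returned NULL. */
    if (member_name == NULL) {
        goto done;
    }

    msg = malloc(sizeof(*msg));
    if (msg == NULL) {
        goto done;
    }
    snprintf(msg->name, sizeof(msg->name), "%s", member_name);

    *out = msg;
    msg = NULL; /* ownership moved to the caller; nothing left to free here */
    ret = 0;

done:
    free(msg); /* free(NULL) is a no-op, so every path through here is safe */
    return ret;
}
```

The same shape should carry over to the talloc version, since talloc_free()
also accepts NULL without crashing.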
>
> I'm sending a corrected set of patches; both issues have been addressed. However,
> I was unable to test them properly: I'm seeing a strange error on my machine
> which simply can't happen. I suspect that something is wrong with my machine,
> but I'm unable to troubleshoot it. Please let me know if these patches work
> for you.
>
> Thanks
> Jan
Testing is ongoing, but something else occurred to me. We need to handle
upgrades to this new approach properly. Please add a new
sysdb_upgrade_NN() function to src/db/sysdb.c and bump the internal
database version. When we upgrade to this version of the DB, we need to
ensure that we run a full recompute across the whole sysdb so that all
of the memberships are updated using the new mechanism.
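
Roughly what I have in mind, as a standalone sketch rather than real sysdb
code: struct cache_db, the integer version numbers and the upgrade_to_N()
helpers below are all made up for illustration, and the actual
sysdb_upgrade_NN() functions work against ldb and look different. The idea is
just a chain of single-step upgrades, where the newest step also triggers the
full membership recompute:

```c
#include <assert.h>
#include <stddef.h>

#define DB_VERSION_LATEST 3 /* hypothetical version numbering */

/* Hypothetical in-memory stand-in for the cache database. */
struct cache_db {
    int version;
    int memberships_recomputed; /* number of full recompute passes run */
};

typedef int (*upgrade_fn)(struct cache_db *db);

static int upgrade_to_2(struct cache_db *db)
{
    db->version = 2;
    return 0;
}

/* The new step: recompute every membership, then bump the version. */
static int upgrade_to_3(struct cache_db *db)
{
    db->memberships_recomputed++; /* placeholder for the real recompute */
    db->version = 3;
    return 0;
}

/* upgrade_steps[i] upgrades a database from version i + 1 to i + 2. */
static const upgrade_fn upgrade_steps[] = { upgrade_to_2, upgrade_to_3 };

/* Runs every missing step in order. Returns 0 on success, -1 on error. */
static int upgrade_db(struct cache_db *db)
{
    assert(db != NULL);

    if (db->version < 1 || db->version > DB_VERSION_LATEST) {
        return -1; /* unknown or newer-than-supported database */
    }

    while (db->version < DB_VERSION_LATEST) {
        int before = db->version;
        int ret = upgrade_steps[before - 1](db);
        if (ret != 0) {
            return ret;
        }
        if (db->version != before + 1) {
            return -1; /* a step must advance by exactly one version */
        }
    }
    return 0;
}
```

A database that is already at the latest version skips the loop entirely, so
the recompute only runs once, on the first start after the upgrade.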

Another question: why are you using hash_create() directly instead of
sss_hash_create()? The latter has the added bonus of being able to
maintain memory with talloc (so talloc_free(table) will work cleanly).