On Thu, Jan 28, 2016 at 8:18 AM, Bolke de Bruin <bdbruin(a)gmail.com> wrote:
> As mentioned in another thread one of the Hadoop components (Ranger)
> syncs all users and groups (including GIDs) on a regular basis to
> provide authorization.
Unfortunately, that is the problem. :-(
Apache Ranger assumes that the back-end NSS database for the passwd/group
services supports enumeration. That holds for the "files" database,
but is not guaranteed for other databases (such as "sss").
More simply put: there is no guarantee that getpwent()/getgrent() will
enumerate all users/groups (respectively) known to the passwd/group
services.
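To make the distinction concrete, here is a minimal C sketch. It assumes a glibc- or musl-style libc, and the helper names (user_is_enumerated, user_resolves, group_is_enumerated, group_resolves) are hypothetical, for illustration only. It contrasts the enumeration pattern described above (walk everything getpwent()/getgrent() return) with per-name lookups via getpwnam()/getgrnam(), which do not depend on the back end being able to enumerate.

```c
#define _DEFAULT_SOURCE /* expose setpwent()/getpwent()/endpwent() etc. */
#include <grp.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: enumeration-based check.
 * Walks every entry getpwent() returns and reports whether `name`
 * was among them (1) or not (0). If a back end does not enumerate,
 * this can return 0 for a user that does exist. */
int user_is_enumerated(const char *name)
{
    struct passwd *pw;
    int found = 0;

    setpwent(); /* rewind to the first passwd entry */
    while ((pw = getpwent()) != NULL) {
        if (strcmp(pw->pw_name, name) == 0) {
            found = 1;
            break;
        }
    }
    endpwent(); /* close the passwd database after enumerating */
    return found;
}

/* Hypothetical helper: per-name lookup.
 * Asks the passwd service directly for `name`; no enumeration needed. */
int user_resolves(const char *name)
{
    return getpwnam(name) != NULL;
}

/* Hypothetical helper: enumeration-based check for groups. */
int group_is_enumerated(const char *name)
{
    struct group *gr;
    int found = 0;

    setgrent(); /* rewind to the first group entry */
    while ((gr = getgrent()) != NULL) {
        if (strcmp(gr->gr_name, name) == 0) {
            found = 1;
            break;
        }
    }
    endgrent(); /* close the group database after enumerating */
    return found;
}

/* Hypothetical helper: per-name lookup for groups. */
int group_resolves(const char *name)
{
    return getgrnam(name) != NULL;
}
```

Calling user_resolves("alice") and user_is_enumerated("alice") for a user known only to a non-enumerating back end ("alice" is a made-up name) is one way to see the mismatch: the first can succeed while the second fails. The actual result depends entirely on the local NSS configuration.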
At our site, we have a team that uses Hadoop, and they encountered
this issue when we first deployed sssd. Their workaround was to
manually create local passwd/group entries for the users/groups they
wanted to be visible within Hadoop. That worked for them, because
their Hadoop cluster was for only a handful of users, but that
solution isn't going to work for a production Hadoop cluster of any
significant size.
I asked the developers on our Hadoop team to file a bug against Apache
Ranger, but I don't know if they ever did.