On 28 Jan 2016, at 20:42, James Ralston ralston@pobox.com wrote:
On Thu, Jan 28, 2016 at 8:18 AM, Bolke de Bruin bdbruin@gmail.com wrote:
As mentioned in another thread one of the Hadoop components (Ranger) syncs all users and groups (including GIDs) on a regular basis to provide authorization.
Unfortunately, that is the problem. :-(
Apache Ranger assumes that the back-end database for the passwd/group services is capable of enumeration. That is true for the "files" database, but is not guaranteed to be true for other databases.
More simply put: there is no guarantee that getpwent()/getgrent() will enumerate all users/groups (respectively) known to the passwd/group services.
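To illustrate the difference between enumeration and direct lookup, here is a minimal sketch in Python, whose pwd module wraps the same libc calls: pwd.getpwall() iterates with getpwent(), while pwd.getpwnam() performs a direct lookup by name. The helper name missing_from_enumeration is made up for this example. On a host whose back end does not enumerate (sssd leaves enumeration off by default), it would be expected to return the directory users that resolve by name but never show up in the full listing.

```python
import pwd


def missing_from_enumeration(names):
    """Return names that resolve by direct lookup but are absent from enumeration.

    Hypothetical helper for illustration only.
    """
    # pwd.getpwall() walks the passwd database via getpwent().
    enumerated = {entry.pw_name for entry in pwd.getpwall()}

    missing = []
    for name in names:
        if name in enumerated:
            continue
        try:
            # pwd.getpwnam() asks the passwd service for one specific user.
            pwd.getpwnam(name)
        except KeyError:
            # Not known to any configured back end at all.
            continue
        missing.append(name)
    return missing
```

Calling it with a list of account names you know exist in the directory shows which of them an enumeration-based sync would never see.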
At our site, we have a team that uses Hadoop, and they encountered this issue when we first deployed sssd. Their work-around was to manually create local passwd/group entries for the users/groups they wanted to be visible within Hadoop. That worked for them, because their Hadoop cluster was for only a handful of users, but that solution isn't going to work for a production Hadoop cluster of any significant size.
I asked the developers on our Hadoop team to file a bug against Apache Ranger, but I don't know if they ever did.
Ranger is actually even worse. It currently reads /etc/passwd and /etc/group directly, so it bypasses NSS entirely. I have a patch in the works that addresses this by using getent instead. Moreover, I am adding some config parameters that make it possible to sync/enumerate specific groups that Ranger otherwise doesn’t see. It might help your guys in the future.
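To give an idea of the approach (this is only a rough Python sketch, not the actual Ranger patch), looking up an explicit list of groups through getent goes through NSS, so it works even when the back end refuses to enumerate. The function names parse_group_line and lookup_groups are invented for this example; the only external command used is "getent group <name>", which prints one name:password:gid:members line and exits non-zero when the group is not found.

```python
import subprocess


def parse_group_line(line):
    """Parse one 'name:password:gid:member1,member2' line as printed by getent."""
    name, _password, gid, members = line.rstrip("\n").split(":", 3)
    return {
        "name": name,
        "gid": int(gid),
        # An empty member field yields an empty list, not [''].
        "members": [m for m in members.split(",") if m],
    }


def lookup_groups(names):
    """Resolve each configured group name through NSS via 'getent group <name>'.

    Hypothetical helper: groups that cannot be resolved are silently skipped.
    """
    groups = []
    for name in names:
        result = subprocess.run(
            ["getent", "group", name],
            capture_output=True,
            text=True,
        )
        # getent exits non-zero when the key is not found in the database.
        if result.returncode != 0 or not result.stdout.strip():
            continue
        groups.append(parse_group_line(result.stdout.splitlines()[0]))
    return groups
```

The list of names passed to lookup_groups would come from a configuration parameter, so only the groups that matter get synced instead of the whole directory.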
Still, I think Ranger is a load of crap: enumerating all users is no fun with over 50,000 of them in our corp directory. I am just trying to make it a little bit more manageable.