On Thu, Jan 28, 2016 at 8:18 AM, Bolke de Bruin <bdbruin@gmail.com> wrote:
As mentioned in another thread, one of the Hadoop components (Ranger) syncs all users and groups (including GIDs) on a regular basis to provide authorization.
Unfortunately, that is the problem. :-(
Apache Ranger assumes that the back-end database for the passwd/group services is capable of enumeration. That is true for the "files" database, but is not guaranteed to be true for other databases.
More simply put: there is no guarantee that getpwent()/getgrent() will enumerate all users/groups (respectively) known to the passwd/group services.
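To make the difference concrete, here is a minimal sketch in Python that compares enumeration against direct lookup. It relies on CPython's pwd.getpwall(), which walks the passwd database with setpwent()/getpwent()/endpwent(), and on pwd.getpwnam(), which does a by-name lookup. The helper name hidden_from_enumeration and the user name in the usage note are hypothetical, chosen only for illustration.

```python
import pwd


def hidden_from_enumeration(names):
    """Return the names that resolve by direct lookup but are not enumerated.

    Hypothetical helper, for illustration only.
    """
    # Enumeration path: only sees what the back-end is willing to list.
    enumerated = {entry.pw_name for entry in pwd.getpwall()}

    hidden = []
    for name in names:
        try:
            # Direct lookup path: asks the passwd service for one specific name.
            pwd.getpwnam(name)
        except KeyError:
            # Unknown to the passwd service entirely, so not relevant here.
            continue
        if name not in enumerated:
            hidden.append(name)
    return hidden
```

Calling hidden_from_enumeration(["alice"]) with a directory-backed account (the name is a placeholder) would return that name on a host whose back-end does not enumerate, and an empty list on a host where every account lives in the "files" database. The same comparison can be made for groups with grp.getgrall() and grp.getgrnam().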
At our site, we have a team that uses Hadoop, and they ran into this issue when we first deployed sssd, which does not enumerate users and groups by default. Their workaround was to manually create local passwd/group entries for the users/groups they wanted to be visible within Hadoop. That worked for them because their Hadoop cluster served only a handful of users, but it isn't going to work for a production Hadoop cluster of any significant size.
I asked the developers on our Hadoop team to file a bug against Apache Ranger, but I don't know if they ever did.