
Apache Ranger Authorization On HDFS

Author: Arjun More.


Authorization is the function of specifying access rights to resources and is a core part of information security.
Once a user is successfully authenticated, authorization determines what that user can or cannot do inside the Hadoop cluster. In HDFS this is primarily governed by file permissions, which are very similar to BSD file permissions. If you have ever run 'hadoop fs -ls' in a directory, you will have seen something like the following:

ranger_blog1
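
As a rough illustration of that output (the directories, owner, group and timestamps below are made up for this example), a listing typically looks like this:

hadoop fs -ls /
Found 2 items
drwxrwxrwt   - hdfs hdfs          0 2016-05-10 11:45 /tmp
drwxr-xr-x   - hdfs hdfs          0 2016-05-10 11:45 /user

The first column is the permission string, followed by the replication factor ('-' for directories), the owner, the group, the size, the modification time and the path.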

Let us see how to create a directory in HDFS named '/directory' and give permissions to it using the following commands.
hadoop fs -mkdir /directory
hadoop fs -chmod 700 /directory
hadoop fs -chown arjun:hdfs /directory

ranger_blog2
As one can see in the above diagram, the 'chmod 700 /directory' command gives rwx permissions to 'arjun' only; the 'hdfs' group and all other users get no permissions.

Let us first try to write into /directory (i.e. create '/directory/dir1') as the user 'arjun'. This succeeds, because 'arjun' has all the permissions on /directory.

Now let the user 'hdfs' try the same operation. This results in an error with the following message:

mkdir: Permission denied: user=hdfs, access=EXECUTE, inode="/directory/dir1": arjun:hdfs:drwx------

This happened because no permissions were granted to any user other than 'arjun'.
ACLs (Access Control Lists) can also be used to provide finer-grained security on HDFS, by granting permissions to specific users or groups beyond the owner/group/other model.
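
As a minimal sketch (reusing the user 'hdfs' and the path from the example above, and assuming ACLs are enabled on the NameNode via dfs.namenode.acls.enabled=true), an ACL entry can grant a second user access to /directory without changing its base permissions:

# grant the user 'hdfs' read and execute access through an ACL entry
hdfs dfs -setfacl -m user:hdfs:r-x /directory
# verify the resulting ACL
hdfs dfs -getfacl /directory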

Drawbacks of the above approach

Using the command line for all these activities is time-consuming and involves a lot of manual work and many redundant commands. It also does not provide any centralized view of user permissions: each command only returns output for one specific user or path. To overcome these drawbacks, one can use Apache Ranger authorization.

Introduction to Apache Ranger:

Apache Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. It is a "single pane of glass" for the administrator, providing centralized security administration, fine-grained access control and detailed auditing of user access within Hadoop.

How Apache Ranger Policies Work for HDFS

Apache Ranger offers a federated authorization model for HDFS. The Ranger plugin for HDFS first checks for a matching Ranger policy; if such a policy exists, access is granted or denied based on it. If no policy matches in Ranger, the plugin falls back to the native HDFS permission model (POSIX permissions or HDFS ACLs). This federated model applies to the HDFS and YARN services in Ranger.
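
As a rough illustration of this fall-back behaviour (the path '/data', the user 'ali' and the ownership shown are hypothetical):

# A Ranger policy grants 'ali' read access on /data, so this succeeds even
# though the POSIX permissions alone (drwx------, owned by hdfs) would deny it:
sudo -u ali hadoop fs -ls /data

# With that Ranger policy disabled, the same command falls back to the native
# HDFS permission check and is denied:
sudo -u ali hadoop fs -ls /data
# ls: Permission denied: user=ali, access=READ_EXECUTE, inode="/data":hdfs:hdfs:drwx------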

ranger_blog3

Here is how to configure a policy in Ranger.

In the Ranger UI, add the 'admins' group to the default policy to give access to the root HDFS directory:

  • Ranger -> Access Manager -> HDFS -> (clustername)_hadoop
  • Click Policy ID # 1
  • Under "Select Group", add "admins"
  • Save

ranger_blog4
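
To check the change outside the UI, the Ranger Admin REST API can return the policy definition. This is only a sketch: it assumes Ranger Admin listens on port 6080, that the default HDFS policy still has ID 1, and that admin credentials are available.

curl -u admin:<password> "http://<ranger-host>:6080/service/public/v2/api/policy/1"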

Similarly, create another policy for the /ranger/audit directory in HDFS, where the audits will also be written.

ranger_blog5

# run as root: create a test directory owned by hdfs with 700 permissions
sudo -u hdfs hadoop fs -mkdir /rangerdemo
sudo -u hdfs hadoop fs -chmod 700 /rangerdemo
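
Before running the exercises, it may help to confirm the setup (the output line is illustrative):

hadoop fs -ls -d /rangerdemo
# drwx------   - hdfs hdfs          0 2016-05-10 11:45 /rangerdemo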

  • One will notice the HDFS agent show up in the Ranger UI under Audit > Agents. Under the Audit > Access tab, one can also see an audit trail of which user accessed what in HDFS, at what time and with what result.
  • Confirm that HDFS audits are appearing in Ranger: http://(your hostname):6080/index.html#!/reports/audit/bigData
  • Confirm that Audits are appearing in HDFS (if configured with the above steps)
    sudo -u hdfs hadoop fs -ls /ranger/audit/hdfs
  • Confirm that Audits are appearing in Solr (if configured above):
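    As a sketch (assuming audits are written to the default 'ranger_audits' Solr collection and Solr listens on port 8983), a quick query such as the following should return recent audit documents:

    curl "http://<solr-host>:8983/solr/ranger_audits/select?q=*:*&rows=5&wt=json"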

HDFS Audit Exercises in Ranger:
sudo su - ali
hadoop fs -ls /rangerdemo
# this should fail with “Permission denied”

ranger_blog6

Notice the audit report and filter on "SERVICE TYPE" = "HDFS" and "USER" = "ali" to see how the denied request was logged.

Add a policy in Ranger: Policy Manager -> hdfs_sandbox -> Add New Policy

  • Policy name: /rangerdemo
  • Resource path: /rangerdemo
  • Recursive: True
  • User: ali
  • Permissions: Read, Write, Execute
  • Save > OK and wait about 30 seconds

ranger_blog7
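
The same policy can also be created programmatically through the Ranger Admin REST API. The following is only a sketch: it assumes the HDFS service in Ranger is named 'hdfs_sandbox', that Ranger Admin listens on port 6080, and that admin credentials are available; the field names follow the public v2 policy API.

curl -u admin:<password> -X POST -H "Content-Type: application/json" \
  "http://<ranger-host>:6080/service/public/v2/api/policy" \
  -d '{
        "service": "hdfs_sandbox",
        "name": "/rangerdemo",
        "resources": { "path": { "values": ["/rangerdemo"], "isRecursive": true } },
        "policyItems": [ {
          "users": ["ali"],
          "accesses": [ { "type": "read",    "isAllowed": true },
                        { "type": "write",   "isAllowed": true },
                        { "type": "execute", "isAllowed": true } ]
        } ]
      }'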

  • Now HDFS access should be granted, and the user 'ali' can access the directory /rangerdemo.
    hadoop fs -ls /rangerdemo
  • By filtering on "SERVICE TYPE" = "HDFS" and "USER" = "ali" in the audit report, one can see how the result changes after the policy update.

Attempt to access the directory before/after adding a group-level Ranger HDFS policy:
sudo su - hr
hadoop fs -ls /rangerdemo

# This will fail with a "Permission denied" message. View the audit page for the new activity.

Add ‘hr’ group to existing policy in Ranger:

  • Under the Policy Manager tab, click the "/rangerdemo" link
  • Under "Select Group", add "hr" and give Read, Write, Execute
  • Save > OK and wait about 30 seconds. While you wait, you can review the summary of policies under the Access Manager -> Reports tab in Ranger

ranger_blog8

  • Now HDFS access should be granted to the 'hr' group, and users belonging to the 'hr' group will be able to access the directory /rangerdemo.
    hadoop fs -ls /rangerdemo
  • This access is also audited: filtering the audit report as before will show which users of the 'hr' group accessed /rangerdemo and with what result.
