
Installation of the Latest Version of Cloudera CDH 5.4.0

Cloudera was the first, and is currently the leading, provider and supporter of Apache Hadoop for the enterprise. Cloudera offers software for business-critical data challenges including storage, access, management, analysis, security and search.

There are two ways to install CDH (Cloudera's Distribution including Apache Hadoop): the first is a fully automated installation using Cloudera Manager, and the second is installing CDH without Cloudera Manager. In this post, I have explained the step-by-step procedure to install CDH 5.4.0 without Cloudera Manager.

Prerequisites

  • CentOS 6.X

  • JDK 1.7.x is needed in order to get CDH working. If you have a lower version of the JDK, uninstall it and install JDK 1.7.x

  • Master machine – master.hadoop.com (192.168.111.130)
    1. Daemons that we are going to install on the master are:
      1. Namenode
      2. HistoryServer
  • Slave machine – slave.hadoop.com (192.168.111.131)
    1. Daemons that we are going to install on the slave are:
      1. Resource Manager (Yarn)
      2. Node-manager
      3. Secondary Namenode
      4. Datanode

 

  • Add the hostname and IP information of both hosts to the /etc/hosts file on each host

  • Your /etc/hosts file should look like below:

[root@master ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.111.130 master.hadoop.com
192.168.111.131 slave.hadoop.com

[root@slave ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.111.130 master.hadoop.com
192.168.111.131 slave.hadoop.com
  • Verify that both hosts can ping each other
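
    For example, a quick check from each node (using the hostnames added to /etc/hosts above):

    [root@master ~]# ping -c 3 slave.hadoop.com
    [root@slave ~]# ping -c 3 master.hadoop.com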

  • Stop the firewall and disable SELinux

To stop the firewall in CentOS:

  • service iptables stop && chkconfig iptables off

To disable SELinux:

  • vim /etc/selinux/config

  • Once the file is open, ensure that "SELINUX=disabled" is set
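
To confirm the current SELinux mode (getenforce reports Enforcing, Permissive, or Disabled; the config file change takes full effect only after a reboot):

  • getenforce
  • setenforce 0 (optional: switches to permissive mode immediately, without waiting for a reboot)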

Installation Methodology:

  1. Date should be in sync

Make sure that the master and slave machines' dates are in sync; if not, configure NTP to keep them synchronized, for example as shown below.
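
A minimal NTP setup on CentOS 6 could look like the following (this assumes internet access and the default public pool servers; adjust for your environment):

    yum -y install ntp
    chkconfig ntpd on
    ntpdate pool.ntp.org
    service ntpd start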

  2. Passwordless SSH must be set up from master -> slave

To setup passwordless ssh follow the below procedure:

  1. Generate rsa key pair using ssh-keygen command

    [root@master conf]# ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa):
    /root/.ssh/id_rsa already exists.
    Overwrite (y/n)?
  2. Copy generated public key to slave.hadoop.com
    [root@master conf]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave.hadoop.com
    Now try logging into the machine, with "ssh 'root@slave.hadoop.com'", and check in: .ssh/authorized_keys to make sure we haven't added extra keys that you weren't expecting.
  3. Now try connecting to slave.hadoop.com using ssh

    [root@master conf]# ssh root@slave.hadoop.com
    Last login: Fri Apr 24 14:20:43 2015 from master.hadoop.com
    [root@slave ~]# logout
    Connection to slave.hadoop.com closed.
    [root@master conf]#

That’s it! You have successfully configured passwordless ssh between master and slave node.

  3. Internet connection

    Make sure that you have a working internet connection to download the CDH packages in the next steps.

  4. Install the CDH repo

    1. Download cdh repo rpm

      [root@master ~]# wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm
    2. Install cdh repo downloaded in above step

      [root@master ~]# yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
      Loaded plugins: fastestmirror, refresh-packagekit, security
      Setting up Local Package Process ……….
      Complete!
    3. Do the same steps on the slave node

      [root@slave ~]# wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm
      [root@slave ~]# yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
      Loaded plugins: fastestmirror, refresh-packagekit, security
      Setting up Local Package Process
      ……
      Complete!
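
      Note: the --nogpgcheck flag skips RPM signature verification. If you would rather verify the packages, you can import Cloudera's repository GPG key on both nodes before installing (the URL below follows the standard CDH 5 repository layout; adjust it if you use a local mirror):

      rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera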
  5. Install and deploy ZooKeeper

    [root@master ~]# yum -y install zookeeper-server
    Loaded plugins: fastestmirror, refresh-packagekit, security
    Setting up Install Process
    …..
    Complete!
    1. Create zookeeper dir and apply permissions

      [root@master ~]# mkdir -p /var/lib/zookeeper
      [root@master ~]# chown -R zookeeper /var/lib/zookeeper/
    2. Init zookeeper and start the service

      [root@master ~]# service zookeeper-server init
      No myid provided, be sure to specify it in /var/lib/zookeeper/myid if using non-standalone
      [root@master ~]# service zookeeper-server start
      JMX enabled by default
      Using config: /etc/zookeeper/conf/zoo.cfg
      Starting zookeeper … STARTED
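
      As a quick sanity check, ZooKeeper's four-letter command "ruok" should answer "imok" on the default client port 2181 (this assumes nc/netcat is available):

      echo ruok | nc localhost 2181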
  6. Install the NameNode on the master machine

    yum -y install hadoop-hdfs-namenode
  7. Install the Secondary NameNode on the slave machine

    yum -y install hadoop-hdfs-secondarynamenode
  8. Install the ResourceManager on the slave machine

    yum -y install hadoop-yarn-resourcemanager
  9. Install the NodeManager, DataNode & MapReduce on the slave node

    yum -y install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
  10. Install the JobHistory Server and YARN proxy server on the master machine

    yum -y install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
  11. On both machines, install the hadoop-client package

    yum -y install hadoop-client
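
    As a quick sanity check on either node, hadoop version should now report the freshly installed CDH build:

    hadoop version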

Now that we are done with the installation, it's time to deploy HDFS!

HDFS Deployment:

  1. On each node, execute below commands:

    [root@master ~]# cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
    [root@master ~]# alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    [root@master ~]# alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster

    [root@slave ~]# cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
    [root@slave ~]# alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    [root@slave ~]# alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
  2. Let’s configure hdfs properties now:

    Go to the /etc/hadoop/conf/ directory on the master node and edit the below property files:

      1. vim /etc/hadoop/conf/core-site.xml

    Add the below lines to it under the <configuration> tag:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master.hadoop.com:8020</value>
    </property>
      2. vim /etc/hadoop/conf/hdfs-site.xml

      <property>
        <name>dfs.permissions.superusergroup</name>
        <value>hadoop</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
      </property>
      <property>
        <name>dfs.namenode.http-address</name>
        <value>192.168.111.130:50070</value>
        <description>The address and the base port on which the dfs NameNode Web UI will listen.</description>
      </property>
  3. scp core-site.xml and hdfs-site.xml to slave.hadoop.com at /etc/hadoop/conf/

    [root@master conf]# scp core-site.xml hdfs-site.xml slave.hadoop.com:/etc/hadoop/conf/
    core-site.xml 100% 1001 1.0KB/s 00:00
    hdfs-site.xml 100% 1669 1.6KB/s 00:00
    [root@master conf]#
  4. Create local directories
    On the master host, run the below commands:

    mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
    chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn
    chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
    chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn

    On the slave host, run the below commands:

    mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
    chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
  5. Format the NameNode:

    sudo -u hdfs hdfs namenode -format
  6. Start HDFS services
    Run the below commands on both master and slave:

    for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do service $x start ; done
  7. Create the HDFS /tmp directory
    Run on any of the Hadoop nodes:

    [root@slave ~]# sudo -u hdfs hadoop fs -mkdir /tmp
    [root@slave ~]# sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

    Congratulations! You have deployed HDFS successfully
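
    Before moving on, it is worth confirming that the HDFS daemons are actually running, for example by listing the Java processes on each node and asking the NameNode for a cluster report (jps ships with the JDK):

    [root@master ~]# sudo jps
    [root@slave ~]# sudo jps
    [root@master ~]# sudo -u hdfs hdfs dfsadmin -report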

Yarn Deployment:

  1. Prepare yarn configuration properties

    Replace your /etc/hadoop/conf/mapred-site.xml with the below contents on the master host:

    [root@master conf]# cat mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
      </property>
    </configuration>
  2. Replace your /etc/hadoop/conf/yarn-site.xml with the below contents on the master host:

    [root@master conf]# cat yarn-site.xml


    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <property>
        <description>List of directories to store localized files in.</description>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
      </property>
      <property>
        <description>Where to store container logs.</description>
        <name>yarn.nodemanager.log-dirs</name>
        <value>file:///var/log/hadoop-yarn/containers</value>
      </property>
      <property>
        <description>Where to aggregate logs to.</description>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>hdfs://master.hadoop.com:8020/var/log/hadoop-yarn/apps</value>
      </property>
      <property>
        <description>Classpath for typical applications.</description>
        <name>yarn.application.classpath</name>
        <value>
          $HADOOP_CONF_DIR,
          $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
          $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
          $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
          $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
        </value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>slave.hadoop.com</value>
      </property>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
      </property>
    </configuration>
  3. Copy modified files to slave machine

    [root@master conf]# scp mapred-site.xml yarn-site.xml slave.hadoop.com:/etc/hadoop/conf/
    mapred-site.xml 100% 1086 1.1KB/s 00:00
    yarn-site.xml 100% 2787 2.7KB/s 00:00
    [root@master conf]#
  4. Configure local directories for yarn

    To be done on the YARN machine, i.e. slave.hadoop.com in our case:

    [root@slave ~]# mkdir -p /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
    [root@slave ~]# mkdir -p /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
    [root@slave ~]# chown -R yarn:yarn /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
    [root@slave ~]# chown -R yarn:yarn /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
  5. Configure the history server

    Add the below properties to mapred-site.xml:

    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>master.hadoop.com:10020</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>master.hadoop.com:19888</value>
    </property>
  6. Configure proxy settings for history server

    Add the below properties to /etc/hadoop/conf/core-site.xml:

    <property>
      <name>hadoop.proxyuser.mapred.groups</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.mapred.hosts</name>
      <value>*</value>
    </property>
  7. Copy the modified files to slave.hadoop.com

    [root@master conf]# scp mapred-site.xml core-site.xml slave.hadoop.com:/etc/hadoop/conf/
    mapred-site.xml 100% 1299 1.3KB/s 00:00
    core-site.xml 100% 1174 1.2KB/s 00:00
    [root@master conf]#
  8. Create history directories and set permissions

    [root@master conf]# sudo -u hdfs hadoop fs -mkdir -p /user/history
    [root@master conf]# sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
    [root@master conf]# sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history
  9. Create log directories and set permissions

    [root@master conf]# sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
    [root@master conf]# sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
  10. Verify hdfs file structure

    [root@master conf]# sudo -u hdfs hadoop fs -ls -R /
    drwxrwxrwt – hdfs hadoop 0 2015-04-25 01:16 /tmp
    drwxr-xr-x – hdfs hadoop 0 2015-04-25 02:52 /user
    drwxrwxrwt – mapred hadoop 0 2015-04-25 02:52 /user/history
    drwxr-xr-x – hdfs hadoop 0 2015-04-25 02:53 /var
    drwxr-xr-x – hdfs hadoop 0 2015-04-25 02:53 /var/log
    drwxr-xr-x – yarn mapred 0 2015-04-25 02:53 /var/log/hadoop-yarn
    [root@master conf]#
  11. Start yarn and Jobhistory server

    On slave.hadoop.com

    [root@slave ~]# sudo service hadoop-yarn-resourcemanager start
    starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-slave.hadoop.com.out
    Started Hadoop resourcemanager: [ OK ]
    [root@slave ~]# sudo service hadoop-yarn-nodemanager start
    starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-slave.hadoop.com.out
    Started Hadoop nodemanager: [ OK ]
    [root@slave ~]#

    On master.hadoop.com

    [root@master conf]# sudo service hadoop-mapreduce-historyserver start
    starting historyserver, logging to /var/log/hadoop-mapreduce/mapred-mapred-historyserver-master.hadoop.com.out
    15/04/25 02:56:01 INFO hs.JobHistoryServer: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting JobHistoryServer
    STARTUP_MSG: host = master.hadoop.com/192.168.111.130
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 2.6.0-cdh5.4.0
    STARTUP_MSG: classpath =
    STARTUP_MSG: build = http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271; compiled by ‘jenkins’ on 2015-04-21T19:18Z
    STARTUP_MSG: java = 1.7.0_79



    ************************************************************/
    Started Hadoop historyserver: [ OK ]
    [root@master conf]#
  12. Create user for running mapreduce jobs

    [root@master conf]# sudo -u hdfs hadoop fs -mkdir /user/kuldeep
    [root@master conf]# sudo -u hdfs hadoop fs -chown kuldeep /user/kuldeep
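
    With everything up, a small smoke test run as the new user confirms the YARN and JobHistory wiring end to end (a sketch; it assumes the corresponding OS user exists, and the examples jar path below is where the CDH hadoop-mapreduce package typically places it):

    [root@master conf]# sudo -u kuldeep hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10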

Important Note: Don't forget to set the core Hadoop services to auto-start when the OS boots up.

On master.hadoop.com

[root@master conf]# sudo chkconfig hadoop-hdfs-namenode on
[root@master conf]# sudo chkconfig hadoop-mapreduce-historyserver on

On slave.hadoop.com

[root@slave ~]# sudo chkconfig hadoop-yarn-resourcemanager on
[root@slave ~]# sudo chkconfig hadoop-hdfs-secondarynamenode on
[root@slave ~]# sudo chkconfig hadoop-yarn-nodemanager on
[root@slave ~]# sudo chkconfig hadoop-hdfs-datanode on
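
To confirm the services are registered for auto-start, you can list the runlevel settings on each node:

[root@master conf]# chkconfig --list | grep hadoop
[root@slave ~]# chkconfig --list | grep hadoop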

Final step: Check UIs

  • Namenode UI

    [Screenshot: NameNode web UI]

  • Datanode UI

    [Screenshot: DataNode web UI]

  • Secondary Namenode UI

    [Screenshot: Secondary NameNode web UI]

  • Yarn UI

    [Screenshot: YARN ResourceManager web UI]
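
You can also confirm each UI is responding from the command line with curl; the ports below are the stock Hadoop 2.x defaults (the NameNode and JobHistory ports match the values configured earlier), so adjust them if your setup differs:

curl -s -o /dev/null -w "%{http_code}\n" http://master.hadoop.com:50070   # NameNode UI
curl -s -o /dev/null -w "%{http_code}\n" http://slave.hadoop.com:50075    # DataNode UI
curl -s -o /dev/null -w "%{http_code}\n" http://slave.hadoop.com:50090    # Secondary NameNode UI
curl -s -o /dev/null -w "%{http_code}\n" http://slave.hadoop.com:8088     # YARN ResourceManager UI
curl -s -o /dev/null -w "%{http_code}\n" http://master.hadoop.com:19888   # JobHistory Server UI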

Kuldeep Kulkarni

Sr. Engineer

Big Data Platform

DataMetica
