YARN stands for "Yet Another Resource Negotiator." It is the resource management and job scheduling framework of Apache Hadoop, introduced in Hadoop 2.x to replace the earlier MapReduce-specific resource management system.
Because resource management is separated from the processing model, YARN gives a Hadoop cluster a flexible and scalable platform for various data processing applications, not only MapReduce. The steps below install Hadoop 2.2.0 with YARN on a single node.
Step 1: Download Apache Hadoop
Download a Hadoop distribution from the Hadoop website (http://hadoop.apache.org/). This guide uses release 2.2.0; for example, as root do the following:
# cd /root
# wget http://mirrors.ibiblio.org/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
Next, create /opt/yarn and extract the package there:
# mkdir -p /opt/yarn
# cd /opt/yarn
# tar xvzf /root/hadoop-2.2.0.tar.gz
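The extraction can be wrapped so that re-running the step does not unpack over an existing tree. This is a minimal sketch: the function name extract_once is hypothetical, and it assumes the archive unpacks into a directory named after the tarball (hadoop-2.2.0.tar.gz into hadoop-2.2.0).

```shell
# extract_once TARBALL DEST
# Unpack TARBALL under DEST unless the expected top-level directory exists.
# Assumption: the archive's top-level directory is the tarball name
# without its .tar.gz suffix.
extract_once() {
    tarball="$1"
    dest="$2"
    top="$(basename "$tarball" .tar.gz)"
    if [ -d "$dest/$top" ]; then
        echo "already extracted: $dest/$top"
        return 0
    fi
    mkdir -p "$dest" && tar xzf "$tarball" -C "$dest"
}
```

For the paths used here, the call would be extract_once /root/hadoop-2.2.0.tar.gz /opt/yarn.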
Step 2: Set JAVA_HOME
# echo "export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/" > /etc/profile.d/java.sh
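A wrong JAVA_HOME usually shows up only later, when a daemon fails to start, so it can help to check the path right away. This is a minimal sketch: check_java_home is a hypothetical helper that only tests for an executable bin/java under the given directory.

```shell
# check_java_home DIR
# Succeeds only if DIR contains an executable bin/java.
check_java_home() {
    dir="${1%/}"    # drop a trailing slash, if any
    if [ -x "$dir/bin/java" ]; then
        echo "JAVA_HOME ok: $dir"
    else
        echo "JAVA_HOME invalid: no executable at $dir/bin/java" >&2
        return 1
    fi
}
```

After sourcing /etc/profile.d/java.sh in the current shell, run check_java_home "$JAVA_HOME".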
Step 3: Create Users and Groups
It is best to run the various daemons with separate accounts. Three accounts (yarn, hdfs, mapred) in the group hadoop can be created as follows:
# groupadd hadoop
# useradd -g hadoop yarn
# useradd -g hadoop hdfs
# useradd -g hadoop mapred
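To confirm that the accounts exist and belong to the hadoop group, the output of id can be inspected. This is a minimal sketch: in_group is a hypothetical helper built on id -nG, and it assumes group names contain no spaces.

```shell
# in_group USER GROUP
# Succeeds if USER exists and GROUP is among its groups (per `id -nG`).
in_group() {
    user_groups="$(id -nG "$1" 2>/dev/null)" || return 1
    for g in $user_groups; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Report on the three accounts created in this step.
for account in yarn hdfs mapred; do
    if in_group "$account" hadoop; then
        echo "$account: in group hadoop"
    else
        echo "$account: missing or not in group hadoop"
    fi
done
```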
Step 4: Make Data and Log Directories
# mkdir -p /var/data/hadoop/hdfs/nn
# mkdir -p /var/data/hadoop/hdfs/snn
# mkdir -p /var/data/hadoop/hdfs/dn
# chown hdfs:hadoop /var/data/hadoop/hdfs -R
# mkdir -p /var/log/hadoop/yarn
# chown yarn:hadoop /var/log/hadoop/yarn -R
Next, move to the YARN installation root, create the logs directory, and set its owner and group as follows:
# cd /opt/yarn/hadoop-2.2.0
# mkdir logs
# chmod g+w logs
# chown yarn:hadoop . -R
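The data and log directories from this step can also be created by one function, which makes the layout easy to rehearse in a scratch location first. This is a sketch only: make_hadoop_dirs and its optional PREFIX argument are hypothetical. Ownership is changed only when run as root with the hdfs and yarn accounts present, and the function assumes the hadoop group exists in that case.

```shell
# make_hadoop_dirs [PREFIX]
# Create the HDFS data and YARN log directories used in this guide.
# PREFIX (default: empty) relocates the whole tree, e.g. for a dry run.
make_hadoop_dirs() {
    prefix="${1:-}"
    for d in hdfs/nn hdfs/snn hdfs/dn; do
        mkdir -p "$prefix/var/data/hadoop/$d" || return 1
    done
    mkdir -p "$prefix/var/log/hadoop/yarn" || return 1
    # Ownership changes need root and the service accounts; skip otherwise.
    if [ "$(id -u)" -eq 0 ] && id hdfs >/dev/null 2>&1 && id yarn >/dev/null 2>&1; then
        chown -R hdfs:hadoop "$prefix/var/data/hadoop/hdfs"
        chown -R yarn:hadoop "$prefix/var/log/hadoop/yarn"
    fi
}
```

Calling make_hadoop_dirs with no argument targets /var/data/hadoop and /var/log/hadoop directly. Calling it with a temporary directory as the prefix shows the layout without touching the system.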
Step 5: Configure core-site.xml
From the base of the Hadoop installation path (e.g., /opt/yarn/hadoop-2.2.0), edit the etc/hadoop/core-site.xml file.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>hdfs</value>
</property>
</configuration>
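After editing a site file, it is useful to read a value back and confirm it is what was intended. This is a minimal sketch: get_property is a hypothetical helper, and it is a line-oriented scan rather than an XML parser. It assumes each name and value element sits on its own line, as in the listing above.

```shell
# get_property FILE NAME
# Print the <value> that follows <name>NAME</name> in a Hadoop *-site.xml file.
# Assumption: one <name> or <value> element per line.
get_property() {
    awk -v want="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n); found = (n == want) }
        /<value>/ { if (found) { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v); print v; exit } }
    ' "$1"
}
```

For example, get_property etc/hadoop/core-site.xml fs.default.name prints hdfs://localhost:9000 for the file shown in this step. Once Hadoop itself is runnable, hdfs getconf -confKey fs.default.name should report the value Hadoop actually resolves.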
Step 6: Configure hdfs-site.xml
From the base of the Hadoop installation path, edit the etc/hadoop/hdfs-site.xml file.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/var/data/hadoop/hdfs/nn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:/var/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:/var/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/var/data/hadoop/hdfs/dn</value>
</property>
</configuration>
Step 7: Configure mapred-site.xml
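From the base of the Hadoop installation, edit the etc/hadoop/mapred-site.xml file. The 2.2.0 distribution ships only a template, so first copy etc/hadoop/mapred-site.xml.template to etc/hadoop/mapred-site.xml.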
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step 8: Configure yarn-site.xml
From the base of the Hadoop installation, edit the etc/hadoop/yarn-site.xml file.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Step 9: Modify Java Heap Sizes
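The Hadoop installation uses several environment variables that determine the heap sizes for each Hadoop process. They are defined in the etc/hadoop/*-env.sh files: HADOOP_HEAPSIZE and HADOOP_NAMENODE_INIT_HEAPSIZE in hadoop-env.sh, HADOOP_JOB_HISTORYSERVER_HEAPSIZE in mapred-env.sh, and JAVA_HEAP_MAX and YARN_HEAPSIZE in yarn-env.sh. For a small single-node installation, reduce them as follows (sizes in megabytes):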
HADOOP_HEAPSIZE="500"
HADOOP_NAMENODE_INIT_HEAPSIZE="500"
HADOOP_JOB_HISTORYSERVER_HEAPSIZE=250
JAVA_HEAP_MAX=-Xmx500m
YARN_HEAPSIZE=500
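Editing the environment files by hand works, but a small helper makes the change repeatable. This is a sketch under stated assumptions: set_env_var is a hypothetical function that removes any existing uncommented assignment of the variable and appends a new export line at the end of the file. Because the new line goes last, it takes effect after any earlier logic in the same file.

```shell
# set_env_var FILE NAME VALUE
# Ensure FILE ends up with exactly one uncommented assignment of NAME,
# written as "export NAME=VALUE" on the last line.
set_env_var() {
    file="$1"; name="$2"; value="$3"
    tmp="$(mktemp)" || return 1
    if [ -f "$file" ]; then
        # Keep every line that does not assign NAME (with or without "export").
        # grep exits non-zero when nothing is kept, which is not an error here.
        grep -v -E "^[[:space:]]*(export[[:space:]]+)?$name=" "$file" > "$tmp" || true
    fi
    printf 'export %s=%s\n' "$name" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

From the installation root, a call such as set_env_var etc/hadoop/hadoop-env.sh HADOOP_HEAPSIZE 500 applies the first setting above. Running it twice leaves a single line.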
Step 10: Format HDFS
Before the HDFS NameNode can start, the directory where it will hold its data must be initialized. As user hdfs, format it as follows:
# su - hdfs
$ cd /opt/yarn/hadoop-2.2.0/bin
$ ./hdfs namenode -format
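Formatting erases any existing filesystem metadata, so a guard against formatting twice is worth having in a scripted install. A formatted NameNode directory contains a current/VERSION file, which the sketch below checks for. The helper names is_formatted and format_once are hypothetical; the hdfs command is passed in as an argument.

```shell
# is_formatted NAME_DIR
# A formatted NameNode directory contains current/VERSION.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# format_once NAME_DIR HDFS_CMD
# Run "HDFS_CMD namenode -format" only if NAME_DIR is not yet formatted.
format_once() {
    if is_formatted "$1"; then
        echo "skipping format: $1 already holds a filesystem image"
    else
        "$2" namenode -format
    fi
}
```

With the paths from this guide, user hdfs would run format_once /var/data/hadoop/hdfs/nn /opt/yarn/hadoop-2.2.0/bin/hdfs.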
Step 11: Start the HDFS Services
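The jps listing in this step presupposes that the three HDFS daemons are already running. A minimal sketch of starting them follows, assuming the hadoop-daemon.sh script that Hadoop 2.x ships in its sbin directory. The wrapper function start_hdfs_daemons is a hypothetical helper, not part of Hadoop.

```shell
# start_hdfs_daemons DAEMON_SCRIPT
# Start the NameNode, SecondaryNameNode, and DataNode in turn using
# DAEMON_SCRIPT (normally the path to sbin/hadoop-daemon.sh).
# Stops at the first daemon that fails to start.
start_hdfs_daemons() {
    for daemon in namenode secondarynamenode datanode; do
        "$1" start "$daemon" || return 1
    done
}
```

Run it as user hdfs, for example start_hdfs_daemons /opt/yarn/hadoop-2.2.0/sbin/hadoop-daemon.sh. Once the daemons are up, jps should list them: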
$ jps
15140 SecondaryNameNode
15015 NameNode
15335 Jps
15214 DataNode
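All HDFS services can be stopped with the hadoop-daemon.sh script in the sbin directory. For example, to stop the DataNode:
$ ./hadoop-daemon.sh stop datanode
The same can be done for the NameNode and SecondaryNameNode.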
Step 12: Start YARN Services
# su - yarn
$ cd /opt/yarn/hadoop-2.2.0/sbin
$ ./yarn-daemon.sh start resourcemanager
$ ./yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/yarn/hadoop-2.2.0/logs/yarn-yarn-nodemanager-limulus.out
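As with HDFS, jps can confirm that the YARN daemons came up. The check can be scripted by comparing the jps listing against the expected process names. This is a minimal sketch: missing_daemons is a hypothetical helper that reads "PID Name" lines on standard input.

```shell
# missing_daemons NAME...
# Read jps output ("PID Name" per line) on stdin and print every expected
# NAME that is absent. Returns non-zero if at least one is missing.
missing_daemons() {
    listing="$(cat)"
    rc=0
    for name in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }'; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}
```

As user yarn, jps | missing_daemons ResourceManager NodeManager prints nothing when both services are running. Note that jps typically lists only the JVMs owned by the invoking user, so the HDFS daemons are best checked separately as user hdfs.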