Installing HBase 0.94.x

on a Multi-node cluster with Ubuntu 14.04

Martin Brugnara martin.brugnara@gmail.com
Sabeur Aridhi sabeur.aridhi@gmail.com

Introduction

This tutorial is a sequel to Matteo Lissandrini's "Installing HDFS and Hadoop 2.X on a Multi-node cluster with Ubuntu 14.04".

That guide can also be used to install Hadoop 1.x (with little or no modification); in what follows we assume that you have completed that tutorial and have a working Hadoop 1.x and HDFS installation.

Even though HBase 0.94.x can run against both Hadoop 1.x and 2.x (see the HBase 0.94 book), we highly recommend pairing Hadoop 1.x with HBase 0.94.x, and Hadoop 2.x with HBase 1.x and 2.x.

This tutorial, too, can be applied to HBase 1.x and 2.x with little or no modification.

Installing

The following steps are needed only once. Download a stable HBase 0.94.x release: open the List of Mirrors, select one, and decide which version to download. For the sake of simplicity, from now on we will assume you have chosen version 0.94.27.
For example wget can be used:

# from eu
wget https://www.eu.apache.org/dist/hbase/hbase-0.94.27/hbase-0.94.27.tar.gz
# from us
wget https://www.us.apache.org/dist/hbase/hbase-0.94.27/hbase-0.94.27.tar.gz

Then extract the tar to the final installation directory, fix the permissions, and create a version-agnostic symlink.
In this tutorial we will use the standard /usr/local/ as the installation directory, but you are of course free to choose another.

# extract & copy
sudo tar -zxf hbase-0.94.27.tar.gz -C /usr/local/
# fix permission
sudo chown -R hduser:hadoop /usr/local/hbase-0.94.27/
# create symlink
sudo ln -s /usr/local/hbase-0.94.27/ /usr/local/hbase

Configuration

Now write the main HBase configuration file, conf/hbase-site.xml.
In the template that follows you will find a few variables that you have to substitute with the corresponding values from your setup.

$HADOOP_MASTER = IP address of the Hadoop master node (Namenode)
$HBASE_MASTER = IP address of the HBase master node (HMaster)
$ZOOKEEPER_QUORUM = Comma-separated list of the IPs that belong to the ZooKeeper quorum.

# FQDN can be used instead of IP
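As a sketch of the substitution step, the placeholders can be filled in with sed. The addresses below are made-up examples, not values from a real cluster:

```shell
# Placeholder addresses -- replace them with the real ones from your cluster.
HADOOP_MASTER=10.0.0.1
ZOOKEEPER_QUORUM=10.0.0.2,10.0.0.3,10.0.0.4

# Substitute the variables; here we pipe two sample lines through sed,
# in practice you would run the same command over conf/hbase-site.xml.
printf '%s\n' \
    '<value>hdfs://$HADOOP_MASTER:8020/hbase</value>' \
    '<value>$ZOOKEEPER_QUORUM</value>' |
sed -e "s|\$HADOOP_MASTER|$HADOOP_MASTER|g" \
    -e "s|\$ZOOKEEPER_QUORUM|$ZOOKEEPER_QUORUM|g"
# prints:
#   <value>hdfs://10.0.0.1:8020/hbase</value>
#   <value>10.0.0.2,10.0.0.3,10.0.0.4</value>
```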

This template was created following the official HBase 0.94 book and a few web articles and posts. More configuration options are available; an exhaustive list can be found in the official book.

<configuration>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://$HADOOP_MASTER:8020/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>$ZOOKEEPER_QUORUM</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/usr/local/zookeeper</value>
    </property>
    <property>
        <name>hbase.master</name>
        <value>$HBASE_MASTER:16000</value>
        <description>The host and port that the HBase master runs at.</description>
    </property>
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
    </property>
    <property>
        <name>hbase.master.info.port</name>
        <value>16010</value>
        <description>web ui port</description>
    </property>
    <property>
        <name>hbase.regionserver.port</name>
        <value>16099</value>
    </property>
    <property>
        <name>hbase.regionserver.info.port</name>
        <value>16030</value>
    </property>

<!-- http://hbase-perf-optimization.blogspot.it/2013/03/hbase-configuration-optimization.html -->
    <property>
        <name>hbase.regionserver.lease.period</name>
        <value>1200000</value>
    </property>
    <property>
        <name>hbase.rpc.timeout</name>
        <value>1200000</value>
    </property>
    <property>
        <name>zookeeper.session.timeout</name>
        <value>20000</value>
    </property>
    <property>
        <name>hbase.regionserver.handler.count</name>
        <value>50</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.maxClientCnxns</name>
        <value>1000</value>
    </property>
    <property>
        <name>hbase.client.scanner.caching</name>
        <value>100</value>
    </property>
    <property>
        <name>hbase.hregion.max.filesize</name>
        <value>10737418240</value>
    </property>
    <property>
        <name>hbase.hregion.majorcompaction</name>
        <value>0</value>
    </property>
    <property>
        <name>hbase.hregion.memstore.flush.size</name>
        <value>134217728</value>
    </property>
    <property>
        <name>hbase.hregion.memstore.block.multiplier</name>
        <value>4</value>
    </property>
    <property>
        <name>hbase.hstore.blockingStoreFiles</name>
        <value>30</value>
    </property>
</configuration>
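Before moving on, it is worth checking that the file you wrote is well-formed XML, since a missing closing tag breaks HBase startup. A minimal sketch using python3 (preinstalled on Ubuntu 14.04): point the parser at your real conf/hbase-site.xml; here a small sample fragment is checked for illustration.

```shell
# Write a small sample fragment (a stand-in for conf/hbase-site.xml).
cat > /tmp/hbase-site-sample.xml <<'EOF'
<configuration>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
</configuration>
EOF

# The parser exits non-zero on malformed XML, so && only prints on success.
python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' \
    /tmp/hbase-site-sample.xml && echo 'well-formed XML'
```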

One last step: tell HBase which Java installation to use. To do that, edit the configuration file conf/hbase-env.sh so that the line specifying JAVA_HOME looks like this:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

# one line sed command:
sed -i.bak 's@.*export JAVA_HOME=.*@export JAVA_HOME=\$(readlink -f /usr/bin/java | sed "s:bin/java::")@g' /usr/local/hbase/conf/hbase-env.sh

Bonus: in order to use the HBase coprocessor framework, the coprocessor classes have to be loaded into HBase; this can be done either via the shell or via the configuration file. Going for the second option, also edit HBASE_CLASSPATH in conf/hbase-env.sh to include the compiled jar.

# example
export HBASE_CLASSPATH=/srv/myapp/MyCoprocessors.jar

Nodes Setup

Finally, configure and initialize the other cluster nodes. List the machines that will act as region servers in conf/regionservers, one address per line.
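For example, a three-node conf/regionservers could be written like this (the addresses are placeholders; the file is just a plain list, one host per line):

```shell
# Placeholder region server IPs -- use your own hostnames or addresses.
# The real file lives at /usr/local/hbase/conf/regionservers.
cat > regionservers <<'EOF'
10.0.0.3
10.0.0.4
10.0.0.5
EOF
```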

If needed, update /etc/hosts according to the hints in the Hadoop tutorial.

Once done, propagate the setup through the cluster:

#!/bin/bash

# Build the configured HBase tar.
mkdir -p /tmp/distr/
tar -czf /tmp/distr/hbase.tgz /usr/local/hbase-0.94.27

# Distribute to each region node
while IFS='' read -r node_ip; do
    scp /etc/hosts hduser@$node_ip:~/
    scp ~/.profile ~/.vimrc hduser@$node_ip:~/

    scp /tmp/distr/hbase.tgz hduser@$node_ip:~/

    # The heredoc terminator must start at column 0; \$HOME is escaped
    # so that it expands on the remote node, not locally.
    ssh -o StrictHostKeyChecking=no -tt hduser@$node_ip <<EOF
sudo mv \$HOME/hosts /etc/

# Install & link & fix permission
sudo tar -zxf \$HOME/hbase.tgz -C /
sudo ln -s /usr/local/hbase-0.94.27 /usr/local/hbase
sudo chown -R hduser:hadoop /usr/local/hbase*

# Create zookeeper directory (even if not needed)
sudo mkdir -p /usr/local/zookeeper
# Fix permission
sudo chown -R hduser:hadoop /usr/local/zookeeper

# Raise the limit for max opened files (DB srv)
sudo sysctl -w fs.file-max=100000

# Required due to -tt option
exit
EOF
done < /usr/local/hbase/conf/regionservers

Start

That's the end of the journey: enjoy your new HBase cluster!

Start it by running bin/start-hbase.sh on the master node.
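A quick way to sanity-check the cluster after start-up, assuming the paths used in this tutorial (these commands need the live cluster, so they are shown only as a sketch):

```shell
# On the HBase master node, as hduser:
/usr/local/hbase/bin/start-hbase.sh

# jps should list HMaster here, and HRegionServer on every
# node named in conf/regionservers.
jps

# Basic smoke test: ask the cluster for its status.
echo "status" | /usr/local/hbase/bin/hbase shell
```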