This tutorial is a sequel to Matteo Lissandrini's "Installing HDFS and Hadoop 2.X on a Multi-node cluster with Ubuntu 14.04".
That guide can also be used to install Hadoop 1.x (with minor, if any, modifications); in this work we will assume that you have followed that tutorial and have installed Hadoop 1.x and HDFS.
Even though HBase 0.94.x can run against both Hadoop 1.x and 2.x (see the HBase 0.94 book), we highly recommend using Hadoop 1.x for HBase 0.x and Hadoop 2.x for HBase 1.x and 2.x.
Note that this tutorial too can be applied to HBase 1.x and 2.x (again with minor, if any, modifications).
The following steps are needed only once. Download HBase 0.94.X stable: navigate to the List of Mirrors, select one, and decide which version to download. For the sake of simplicity, from now on we will assume you have chosen version 0.94.27.
For example, wget can be used:
# from eu
wget https://www.eu.apache.org/dist/hbase/hbase-0.94.27/hbase-0.94.27.tar.gz

# from us
wget https://www.us.apache.org/dist/hbase/hbase-0.94.27/hbase-0.94.27.tar.gz
Then extract the tar to the final installation directory, fix permissions, and create a version-agnostic symlink.
In this tutorial we will use the standard /usr/local/ as the installation directory, but you are of course free to choose the one you prefer.
# extract & copy sudo tar -zxf hbase-0.94.27.tar.gz -C /usr/local/ # fix permission sudo chown -R hduser:hadoop /usr/local/hbase-hbase-0.94.27/ # create symlink sudo ln -s /usr/local/hbase-0.94.27/ /usr/local/hbase
Now write the main HBase configuration file, conf/hbase-site.xml.
In the template that follows, you will find a few variables that you have to substitute with the corresponding values from your setup:
$HADOOP_MASTER     = IP address of the Hadoop master node (Namenode)
$HBASE_MASTER      = IP address of the HBase master node (HMaster)
$ZOOKEEPER_QUORUM  = Comma-separated list of IPs that belong to the ZooKeeper quorum
# FQDNs can be used instead of IPs
This template was created following the official HBase 0.94 book and a few web articles and posts. More configuration options are available; an exhaustive list can be found in the official book.
<configuration> <property> <name>hbase.cluster.distributed</name> <value>true</value> </property> <property> <name>hbase.rootdir</name> <value>hdfs://$HADOOP_MASTER:8020/hbase</value> </property> <property> <name>hbase.zookeeper.quorum</name> <value>$ZOOKEEPER_QUORUM</value> </property> <property> <name>hbase.zookeeper.property.dataDir</name> <value>/usr/local/zookeeper</value> </property> <property> <name>hbase.master</name> <value>$HBASE_MASTER:60000</value> <description>The host and port that the HBase master runs at.</description> </property> <property> <name>hbase.master.port</name> <value>16000</value> </property> <property> <name>hbase.master.info.port</name> <value>16010</value> <description>web ui port</description> </property> <property> <name>hbase.regionserver.port</name> <value>16099</value> </property> <property> <name>hbase.regionserver.info.port</name> <value>16030</value> </property> <!-- http://hbase-perf-optimization.blogspot.it/2013/03/hbase-configuration-optimization.html --> <property> <name>hbase.regionserver.lease.period</name> <value>1200000</value> </property> <property> <name>hbase.rpc.timeout</name> <value>1200000</value> </property> <property> <name>zookeeper.session.timeout</name> <value>20000</value> </property> <property> <name>hbase.regionserver.handler.count</name> <value>50</value> </property> <property> <name>hbase.zookeeper.property.maxClientCnxns</name> <value>1000</value> </property> <property> <name>hbase.client.scanner.caching</name> <value>100</value> </property> <property> <name>hbase.hregion.max.filesize</name> <value>10737418240</value> </property> <property> <name>hbase.hregion.majorcompaction</name> <value>0</value> </property> <property> <name>hbase.hregion.memstore.flush.size</name> <value>134217728</value> </property> <property> <name>hbase.hregion.memstore.block.multiplier</name> <value>4</value> </property> <property> <name>hbase.hstore.blockingStoreFiles</name> <value>30</value> </property> </configuration>
One last step: tell HBase which Java installation should be used.
To do that, modify the configuration file conf/hbase-env.sh so that the line specifying JAVA_HOME looks like the following:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

# equivalent one-line sed command:
sed -i.bak 's@.*export JAVA_HOME=.*@export JAVA_HOME=\$(readlink -f /usr/bin/java | sed "s:bin/java::")@g' /usr/local/hbase/conf/hbase-env.sh
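As a quick sanity check (again assuming the /usr/local/hbase symlink), you can ask HBase to print its version, which also confirms that the JVM is found:

/usr/local/hbase/bin/hbase version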
Bonus: in order to use the HBase coprocessor framework, the coprocessor classes have to be loaded into HBase; this can be done either via the shell or via the configuration files.
Going for the second option, also edit HBASE_CLASSPATH in conf/hbase-env.sh to include the compiled jar.
# example
export HBASE_CLASSPATH=/srv/myapp/MyCoprocessors.jar
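The coprocessor classes themselves can then be registered in conf/hbase-site.xml. Below is a hypothetical example for a region observer; org.myapp.MyRegionObserver is a placeholder for your own class packaged in the jar above:

<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.myapp.MyRegionObserver</value>
</property>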
Finally, configure and initialize the other cluster nodes.
List the machines that will act as region servers in conf/regionservers, one address per line.
If needed, update /etc/hosts according to the hints in the Hadoop tutorial.
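For example, a three-node setup could use a conf/regionservers like the following (the addresses are of course placeholders for your own):

10.0.0.11
10.0.0.12
10.0.0.13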
Once done, propagate the setup through the cluster:
#!/bin/bash
# Build configured HBase tar.
mkdir -p /tmp/distr/
tar -czf /tmp/distr/hbase.tgz /usr/local/hbase-0.94.27

# Distribute to each region node
while IFS='' read -r node_ip; do
    scp /etc/hosts hduser@$node_ip:~/
    scp ~/.profile ~/.vimrc hduser@$node_ip:~/
    scp /tmp/distr/hbase.tgz hduser@$node_ip:~/
    ssh -o StrictHostKeyChecking=no -tt hduser@$node_ip <<EOF
sudo mv \$HOME/hosts /etc/
# Install & link & fix permission
sudo tar -zxf \$HOME/hbase.tgz -C /
sudo ln -s /usr/local/hbase-0.94.27 /usr/local/hbase
sudo chown -R hduser:hadoop /usr/local/hbase*
# Create zookeeper directory (even if not needed)
sudo mkdir -p /usr/local/zookeeper
# Fix permission
sudo chown -R hduser:hadoop /usr/local/zookeeper
# Raise the limit for max opened files (DB srv)
sudo sysctl -w fs.file-max=100000
# Required due to -tt option
exit
EOF
done < /usr/local/hbase/conf/regionservers
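Once the script has run, a quick way to verify that every node received the installation is to list the symlink on each of them (this assumes passwordless SSH for hduser, as set up in the Hadoop tutorial):

while IFS='' read -r node_ip; do
    ssh hduser@$node_ip 'ls -ld /usr/local/hbase'
done < /usr/local/hbase/conf/regionservers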
That's the end of the journey: enjoy your new HBase cluster!
Start it by running start-hbase.sh.
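For example, from the HBase master as hduser (status is a built-in command of the HBase shell):

# start the HMaster and, via SSH, all the region servers
/usr/local/hbase/bin/start-hbase.sh

# confirm the cluster state from the HBase shell
/usr/local/hbase/bin/hbase shell
hbase(main):001:0> status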