HBase has a native export/import feature, which is actually two MapReduce (MR) jobs. These jobs live in the HBase codebase (in hbase-0.xx.x.jar, as the Import and Export classes), so, as with any normal MR job, you simply need to configure the Hadoop classpath to include the jars these jobs need, and then run them.
The easiest way to alter Hadoop’s classpath is to edit hadoop-env.sh. By default, hadoop-env.sh has a commented-out line near the top that looks like:
# Extra Java CLASSPATH elements. Optional.
#export HADOOP_CLASSPATH=
Using this information, I updated hadoop-env.sh to:
export HBASE_HOME=/path/to/apache-hbase
export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.20.3.jar:$HBASE_HOME:$HBASE_HOME/lib/zookeeper-3.2.2.jar:$HBASE_HOME/conf
Note: once you update hadoop-env.sh, you will need to restart the TaskTracker and JobTracker, which is most easily done by running stop-mapred.sh and then start-mapred.sh, both located in /path/to/apache-hadoop/bin.
Note: if this is your first time trying this, don’t set HBASE_HOME or HADOOP_CLASSPATH in your local terminal profile; stick to configuring them in hadoop-env.sh. The reason is that Hadoop starts its processes through SSH sessions, and depending on your system’s settings, your local terminal settings may or may not take effect in those sessions.
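Before restarting anything, it can help to check that every classpath entry actually exists on disk. The following is a minimal sketch, not part of Hadoop or HBase: the function names hbase_classpath and missing_entries are made up for illustration, and the HBase and ZooKeeper versions default to the ones used in this post.

```shell
#!/bin/sh
# Sketch: assemble the HADOOP_CLASSPATH used in this post and report
# entries that do not exist. HBASE_HOME and the versions are placeholders.

# hbase_classpath HBASE_HOME [HBASE_VERSION] [ZK_VERSION]
# Prints the colon-separated classpath (no existence checks here).
hbase_classpath() {
    home=$1
    hbase_version=${2:-0.20.3}
    zk_version=${3:-3.2.2}
    printf '%s:%s:%s:%s\n' \
        "$home/hbase-$hbase_version.jar" \
        "$home" \
        "$home/lib/zookeeper-$zk_version.jar" \
        "$home/conf"
}

# missing_entries CLASSPATH
# Prints each colon-separated entry that is not present on disk.
missing_entries() {
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        [ -e "$entry" ] || echo "$entry"
    done
    IFS=$old_ifs
}

HBASE_HOME=${HBASE_HOME:-/path/to/apache-hbase}
HADOOP_CLASSPATH=$(hbase_classpath "$HBASE_HOME")
echo "HADOOP_CLASSPATH=$HADOOP_CLASSPATH"
echo "Missing entries (if any):"
missing_entries "$HADOOP_CLASSPATH"
```

If the last section of the output lists any paths, fix those in hadoop-env.sh before restarting the MapReduce daemons.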
Ok, now you are ready to export your table. To see all of the export options, you can run:
bin/hadoop jar /path/to/hbase-0.20.3.jar export
Usage: Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
To export the current version of the entire table, just supply tablename and outputdir. Note that outputdir is a path in HDFS, not on your local filesystem. In my case, the HBase cluster that I was exporting the table from had no direct IP access to my destination HBase cluster, so I had to get the files out of HDFS and into a local directory so that I could move them manually.
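If you only want part of the table, the optional arguments in the usage line come into play. Here is a small sketch that builds such a command line and prints it for review. The helper name export_cmd is hypothetical, and I am assuming that starttime and endtime are given in milliseconds since the epoch, like HBase cell timestamps; check this against your HBase version before relying on it.

```shell
#!/bin/sh
# Sketch: build an Export command line with the optional arguments.
# HBASE_JAR, the table name and the output directory are placeholders.
# Assumption: starttime/endtime are epoch milliseconds.

HBASE_JAR=${HBASE_JAR:-/path/to/hbase-0.20.3.jar}

# export_cmd <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
# Prints the command instead of running it, so it can be reviewed first.
export_cmd() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 5 ]; then
        echo "usage: export_cmd <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]" >&2
        return 1
    fi
    echo "bin/hadoop jar $HBASE_JAR export $*"
}

# Example: up to 3 versions per cell, limited to the last 24 hours.
now_ms=$(( $(date +%s) * 1000 ))
start_ms=$(( now_ms - 24 * 60 * 60 * 1000 ))
export_cmd your_table /export/your_table 3 "$start_ms" "$now_ms"
```

Because the optional arguments are positional, a start time can only be given if versions is given too, which is why the example passes all three.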
So, let’s export:
bin/hadoop jar /path/to/hbase-0.20.3.jar export your_table /export/your_table
The exported table is now in HDFS under /export/your_table. Let’s copy the entire export to the local filesystem (hopefully your HBase table isn’t huge!) by running:
bin/hadoop fs -copyToLocal /export/your_table /somewhere/local
I then used scp -r to copy /somewhere/local to my local dev station (where my other HBase ‘cluster’ is). With the files on my dev system, we need to get them back into HDFS and then run the import MR job.
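Copy from your system’s hard drive to HDFS:
bin/hadoop fs -copyFromLocal /somewhere/local /import/your_table
Then run the import job (the destination table needs to exist already, with the same column families):
bin/hadoop jar /path/to/hbase-0.20.3.jar import your_table /import/your_table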
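The whole manual transfer can be collected into one small script. This is only a sketch under the same placeholder paths used above; the run helper and the source/dest split are my own illustration, and the HDFS copy steps go through the hadoop fs subcommand. By default it only prints the commands; set RUN=1 to actually execute them from the Hadoop install directory.

```shell
#!/bin/sh
# Sketch of the manual transfer described in this post.
# Commands are only printed unless RUN=1 is set.
# HBASE_JAR, TABLE and LOCAL_DIR are placeholders.

HBASE_JAR=${HBASE_JAR:-/path/to/hbase-0.20.3.jar}
TABLE=${TABLE:-your_table}
LOCAL_DIR=${LOCAL_DIR:-/somewhere/local}

# Print the command; execute it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# On the source cluster: export to HDFS, then pull the files to local disk.
source_steps() {
    run bin/hadoop jar "$HBASE_JAR" export "$TABLE" "/export/$TABLE" || return 1
    run bin/hadoop fs -copyToLocal "/export/$TABLE" "$LOCAL_DIR" || return 1
}

# On the destination cluster, after LOCAL_DIR has been copied over (scp -r):
# push the files into HDFS, then run the import job.
dest_steps() {
    run bin/hadoop fs -copyFromLocal "$LOCAL_DIR" "/import/$TABLE" || return 1
    run bin/hadoop jar "$HBASE_JAR" import "$TABLE" "/import/$TABLE" || return 1
}

case "${1:-}" in
    source) source_steps ;;
    dest)   dest_steps ;;
    *)      echo "usage: $0 source|dest" >&2 ;;
esac
```

Run it with the argument source on the exporting side, move the directory across by hand, then run it with dest on the importing side.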
Note: if you are copying large amounts of data between clusters, SCP probably isn’t a very good solution. Hadoop has a feature called distcp, which uses a MapReduce job to copy large amounts of data between clusters in parallel. You can read more about it in the Hadoop DistCp documentation.
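For reference, a distcp invocation for this case might look roughly like the sketch below. It assumes the two clusters can reach each other over the network (which mine could not), and the namenode host names and port are invented placeholders. The helper name distcp_cmd is hypothetical; it only prints the command.

```shell
#!/bin/sh
# Sketch: print a distcp command that copies an exported table straight
# from the source cluster's HDFS to the destination cluster's HDFS.
# Namenode addresses and the table name are placeholders.

# distcp_cmd SRC_NAMENODE DEST_NAMENODE TABLE
distcp_cmd() {
    echo "bin/hadoop distcp hdfs://$1/export/$3 hdfs://$2/import/$3"
}

distcp_cmd src-namenode:8020 dest-namenode:8020 your_table
```

With the export copied this way, the copyToLocal, scp and copyFromLocal steps are no longer needed, and the import job can read directly from /import/your_table on the destination cluster.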
Finally, I want to give a huge amount of thanks to jdcryans, who literally bootstrapped me through the entire process. Without him and his constant help in IRC, I don’t know what I’d do (and many others, I’m sure).