Import NOAA data into Apache HBase
Apache HBase provides tools to import data from flat files into
HBase tables. We recently used these tools to load NOAA station data into
HBase. The HBase table is called “station” and has one column family, “d”.
The NOAA station data is in the input file 201212station.txt. It has 15
fields separated by “|”. The first field, the station ID, is used as the
row key in HBase. The remaining 14 fields are added as HBase columns. The
following is a walk-through of the steps.
1. Create an HBase table
hbase> create 'station', 'd'
2. Create an HDFS folder to hold the temporary data for bulk load
$ hdfs dfs -mkdir /user/john/hbase
3. Run importtsv to generate temporary data
$ hadoop jar /usr/lib/hbase/hbase.jar importtsv '-Dimporttsv.separator=|' -Dimporttsv.bulk.output=/user/john/hbase/tmp -Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2,d:c3,d:c4,d:c5,d:c6,d:c7,d:c8,d:c9,d:c10,d:c11,d:c12,d:c13,d:c14 station /user/john/noaa/201212station.txt
4. Change the temporary folder permission
$ hdfs dfs -chmod -R +rwx /user/john/hbase
5. Run bulk load
$ hadoop jar /usr/lib/hbase/hbase.jar completebulkload /user/john/hbase/tmp station
The station data is now loaded from the file into the HBase table. To confirm, run an HBase shell command that gets the station information for station ID 94994.
hbase> get 'station', '94994'
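The column mapping passed to importtsv in step 3 is long and easy to mistype. As a minimal sketch, it can be derived from the input file instead: count the “|”-separated fields on the first line, map the first one to HBASE_ROW_KEY, and name the rest d:c1, d:c2, and so on. The sample line and the variable names (sample_file, field_count, columns) are assumptions made for illustration; the line contains placeholder values, not real NOAA data.

```shell
# Hypothetical stand-in for one record of 201212station.txt:
# 15 placeholder fields separated by "|" (not real NOAA data).
sample_file=$(mktemp)
printf '%s\n' 'id|f1|f2|f3|f4|f5|f6|f7|f8|f9|f10|f11|f12|f13|f14' > "$sample_file"

# Count the "|"-separated fields on the first line of the file.
field_count=$(head -n 1 "$sample_file" | awk -F'|' '{ print NF }')

# The first field becomes the row key; each remaining field becomes
# a column in family "d" named c1, c2, ...
columns="HBASE_ROW_KEY"
i=1
while [ "$i" -lt "$field_count" ]; do
  columns="${columns},d:c${i}"
  i=$((i + 1))
done

echo "fields: $field_count"
echo "-Dimporttsv.columns=$columns"

# Clean up the temporary sample file.
rm -f "$sample_file"
```

To use this against the real file, point the field count at a local copy of 201212station.txt rather than the sample. Checking the count first also catches a file whose layout differs from the expected 15 fields before any data is written to HDFS.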