Step-by-Step Guide: Hive Integration with HBase

9/11/2025

Hive integration with HBase allows users to query and manipulate data stored in HBase tables using Hive's SQL-like interface. This integration bridges the gap between Hive's analytical capabilities and HBase's real-time, low-latency data access.


Step 1: Why Integrate Hive with HBase?

  • HBase is a NoSQL database for real-time read/write of large datasets.

  • Hive provides an SQL-like interface for querying large datasets.

  • Integration allows you to query HBase data using HiveQL.


Step 2: Prerequisites

  • Apache Hive and Apache HBase installed

  • Hadoop and HDFS configured

  • HBase and Hive services running

  • The hive-hbase-handler JAR on Hive's classpath


Step 3: Configure Hive to Connect with HBase

  • Point Hive at the ZooKeeper quorum used by HBase by adding the following property to hive-site.xml (replace localhost with your ZooKeeper hosts):

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
  • Ensure hbase-site.xml is available in Hive’s conf directory so that Hive picks up the remaining HBase client settings.
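
Properties in hive-site.xml live inside a single <configuration> root element. As a quick sanity check, the following Python sketch parses such a file and lists its properties. The XML content and the function name read_properties are illustrative assumptions, and hbase.zookeeper.quorum is used here as the setting the HBase client reads to find ZooKeeper.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml content; "localhost" is a placeholder value.
HIVE_SITE = """<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> under <configuration>."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = read_properties(HIVE_SITE)
if "hbase.zookeeper.quorum" not in props:
    print("hbase.zookeeper.quorum is not set")
else:
    print("ZooKeeper quorum:", props["hbase.zookeeper.quorum"])
```

To check a real installation, read the file from Hive's conf directory and pass its text to read_properties.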


Step 4: Create an HBase Table

Use the HBase shell:

hbase shell
create 'employees', 'info'
put 'employees','1','info:name','John'
put 'employees','1','info:salary','5000'

Step 5: Create a Hive Table over HBase Table

Use the HBase storage handler in Hive:

CREATE EXTERNAL TABLE hbase_employees(
  key STRING,
  name STRING,
  salary STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,info:name,info:salary"
)
TBLPROPERTIES("hbase.table.name" = "employees");
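
The entries in hbase.columns.mapping are matched to the Hive columns by position: the first entry goes with the first Hive column, and so on. The Python sketch below illustrates that pairing for the table above. It is a simplified model, not Hive's actual parser, and the function names are hypothetical.

```python
def parse_columns_mapping(mapping):
    """Split a mapping string into (family, qualifier) pairs.

    The row key entry ':key' is returned as (None, 'key').
    Entries are not stripped: whitespace in the mapping string would
    end up as part of a column name, so it should be avoided.
    """
    entries = []
    for entry in mapping.split(","):
        if entry == ":key":
            entries.append((None, "key"))
            continue
        family, sep, qualifier = entry.partition(":")
        if not sep or not family:
            raise ValueError(f"bad mapping entry: {entry!r}")
        entries.append((family, qualifier))
    return entries


def pair_with_hive_columns(hive_columns, mapping):
    """Pair Hive column names with mapping entries by position."""
    entries = parse_columns_mapping(mapping)
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    return dict(zip(hive_columns, entries))


pairs = pair_with_hive_columns(
    ["key", "name", "salary"], ":key,info:name,info:salary"
)
for hive_column, target in pairs.items():
    print(hive_column, "->", target)
```

A mismatch between the number of Hive columns and the number of mapping entries is a common source of table creation errors, which is why the sketch checks it explicitly.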

Step 6: Query HBase Data from Hive

SELECT * FROM hbase_employees;

Output Example:

key name salary
1 John 5000
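
To see how HBase cells become Hive rows, the Python sketch below models the HBase table as a plain dictionary and projects it through the column mapping. The in-memory data, the extra row '3', and the function name to_hive_rows are assumptions made for illustration. A cell that is missing in HBase appears as NULL in Hive, represented here by None.

```python
# Toy stand-in for the HBase table: row key -> {"family:qualifier": value}.
hbase_rows = {
    "1": {"info:name": "John", "info:salary": "5000"},
    "3": {"info:name": "Maria"},  # hypothetical row with no info:salary cell
}

# Same order as the Hive columns: key, name, salary.
MAPPING = [":key", "info:name", "info:salary"]


def to_hive_rows(rows, mapping):
    """Project HBase-style rows into tuples ordered like the Hive columns."""
    result = []
    for row_key in sorted(rows):  # HBase keeps rows ordered by row key
        cells = rows[row_key]
        result.append(
            tuple(row_key if m == ":key" else cells.get(m) for m in mapping)
        )
    return result


for row in to_hive_rows(hbase_rows, MAPPING):
    print(row)
```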

Step 7: Insert Data into HBase via Hive

INSERT INTO TABLE hbase_employees VALUES('2','Alice','7000');

Hive writes this row to the HBase table employees under row key 2, with the values stored in the info:name and info:salary columns.
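
Conceptually, the INSERT above corresponds to one HBase put per mapped column. The Python sketch below generates the equivalent HBase shell commands for a Hive row. The function name row_to_puts is hypothetical, and skipping NULL values is an assumption of this sketch rather than a statement about Hive's internals.

```python
def row_to_puts(table, mapping, row):
    """Return HBase shell put commands equivalent to one Hive row."""
    row_key = row[mapping.index(":key")]
    puts = []
    for column, value in zip(mapping, row):
        # The row key is not a cell; None models a Hive NULL (assumed skipped).
        if column == ":key" or value is None:
            continue
        puts.append(f"put '{table}','{row_key}','{column}','{value}'")
    return puts


commands = row_to_puts(
    "employees",
    [":key", "info:name", "info:salary"],
    ("2", "Alice", "7000"),
)
for command in commands:
    print(command)
```

After running the INSERT in Hive, get 'employees','2' in the HBase shell should show the same two cells.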


Step 8: Best Practices

  • Use external tables: dropping an external table in Hive removes only the Hive metadata, whereas dropping a managed table also deletes the underlying HBase table.

  • Keep HBase schema flat for easier mapping.

  • For large datasets, filter on the row key where possible, since a query without a key filter scans the entire HBase table.

  • Ensure ZooKeeper is running before querying HBase from Hive, since the HBase client locates the cluster through it.


Summary

  • Use HBaseStorageHandler to connect Hive with HBase.

  • Query and insert HBase data from Hive.

  • Combine Hive’s SQL power with HBase’s real-time storage.


This tutorial helps you integrate Apache Hive with Apache HBase to leverage SQL querying on real-time NoSQL data.
