Step-by-Step Guide to Creating Tables in Hive
This guide walks you through creating Hive tables from zero to production-ready, with commands you can copy-paste into Hive CLI or Beeline.
Before creating tables, understand the two main types:
Managed (Internal) Table — Hive owns the data and metadata. Dropping the table deletes both.
External Table — Hive stores only metadata; data remains at the original location even if the table is dropped.
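A table's type can also be switched after creation through a table property; a minimal sketch, assuming a table named some_table (the name is illustrative):
ALTER TABLE some_table SET TBLPROPERTIES ('EXTERNAL'='TRUE');   -- managed to external
ALTER TABLE some_table SET TBLPROPERTIES ('EXTERNAL'='FALSE');  -- external back to managed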
Prerequisites:
Hadoop HDFS running and accessible to the Hive user.
Hive installed and Hive Metastore configured.
Java and required services running on your cluster.
Proper HDFS permissions for any LOCATION you use.
Connect with either client:
Hive CLI (legacy):
hive
Beeline (recommended):
beeline -u "jdbc:hive2://<host>:10000/default" -n <user> -p <password>
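Beeline can also run a saved script non-interactively; a sketch, assuming the HiveQL lives in create_tables.hql (the file name is a placeholder):
beeline -u "jdbc:hive2://<host>:10000/default" -n <user> -p <password> -f create_tables.hql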
Create a database and switch to it:
CREATE DATABASE IF NOT EXISTS analytics;
USE analytics;
Create the table (managed):
CREATE TABLE employees (
id INT,
name STRING,
department STRING,
salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
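If employees.csv begins with a header row, a variant of the definition can skip it via a table property (the table name is illustrative; omit the property for headerless files):
CREATE TABLE employees_with_header (
id INT,
name STRING,
department STRING,
salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');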
Stage the data file in HDFS and load it (LOAD DATA INPATH moves the file into the table's warehouse directory):
hdfs dfs -mkdir -p /user/hive/warehouse/employees
hdfs dfs -put employees.csv /user/hive/warehouse/employees/
LOAD DATA INPATH '/user/hive/warehouse/employees/employees.csv' INTO TABLE employees;
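If the CSV is still on the client's local filesystem, LOAD DATA LOCAL INPATH copies it in one step (the path is illustrative); note that plain INPATH moves the HDFS file rather than copying it:
LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE employees;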
Verify:
SELECT COUNT(*) FROM employees;
For an external table, upload the files to a shared HDFS location:
hdfs dfs -mkdir -p /data/customers
hdfs dfs -put customers.csv /data/customers/
Create the external table:
CREATE EXTERNAL TABLE customers (
id INT,
name STRING,
email STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/customers';
Verify and note behavior on DROP:
SELECT * FROM customers LIMIT 10;
DROP TABLE customers; removes metadata only — files remain at /data/customers.
Create a table from a query (CTAS), stored as Parquet:
CREATE TABLE sales_summary
STORED AS PARQUET
AS
SELECT store_id, SUM(amount) AS total_sales
FROM sales
GROUP BY store_id;
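To copy only a table's schema, without any data, CREATE TABLE ... LIKE works as well (the target name is illustrative):
CREATE TABLE sales_summary_empty LIKE sales_summary;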
Create a partitioned table (here, partitioned by date):
CREATE TABLE logs (
event_time STRING,
event_type STRING,
message STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
Static load into partition:
LOAD DATA INPATH '/data/logs/2025-09-11/logs.tsv' INTO TABLE logs PARTITION (dt='2025-09-11');
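If a partition's files are already in HDFS, the partition can be registered without moving anything; a sketch with an illustrative date and path:
ALTER TABLE logs ADD IF NOT EXISTS PARTITION (dt='2025-09-12') LOCATION '/data/logs/2025-09-12';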
Dynamic partitioning insert:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT INTO TABLE logs PARTITION (dt)
SELECT event_time, event_type, message, dt
FROM staging_logs;
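Inserts that create many partitions can hit Hive's default dynamic-partition limits; they can be raised per session if needed (the values shown are illustrative):
SET hive.exec.max.dynamic.partitions = 5000;
SET hive.exec.max.dynamic.partitions.pernode = 1000;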
Create a bucketed table:
CREATE TABLE users_bucketed (
id INT,
name STRING,
email STRING
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC;
Enable bucketing enforcement and insert (on Hive 2.x and later this setting has been removed and bucketing is always enforced):
SET hive.enforce.bucketing = true;
INSERT INTO TABLE users_bucketed SELECT id, name, email FROM users;
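One payoff of bucketing is cheap sampling: TABLESAMPLE can read a single bucket instead of scanning the whole table:
SELECT * FROM users_bucketed TABLESAMPLE(BUCKET 1 OUT OF 8 ON id);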
Parquet/ORC for analytics:
CREATE TABLE events_parquet (id INT, type STRING, payload STRING) STORED AS PARQUET;
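ORC pairs well with compression; a sketch that enables Snappy through a table property (the table name is illustrative):
CREATE TABLE events_orc (id INT, type STRING, payload STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');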
JSON SerDe example:
CREATE EXTERNAL TABLE events_json (id INT, type STRING, payload STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/events/json';
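The openx SerDe above is a third-party library, so its jar must be added to the session first (the jar path is a placeholder); Hive also provides org.apache.hive.hcatalog.data.JsonSerDe as a built-in alternative (the table name below is illustrative):
ADD JAR /path/to/json-serde.jar;
CREATE EXTERNAL TABLE events_json_builtin (id INT, type STRING, payload STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/data/events/json';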
Inspect schema and location:
DESCRIBE FORMATTED table_name;
SHOW CREATE TABLE table_name;
SHOW PARTITIONS table_name;
Repair partitions:
MSCK REPAIR TABLE table_name;
ALTER TABLE table_name RECOVER PARTITIONS; (equivalent command on Amazon EMR Hive)
Add a column:
ALTER TABLE employees ADD COLUMNS (birth_date STRING);
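Renaming a column or changing its declared type uses CHANGE COLUMN; this only updates metadata, the underlying files are not rewritten (the new name is illustrative):
ALTER TABLE employees CHANGE COLUMN birth_date date_of_birth STRING;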
Change location:
ALTER TABLE customers SET LOCATION '/new/path/customers';
Drop behavior:
DROP TABLE employees; (managed) deletes data and metadata.
DROP TABLE customers; (external) deletes metadata only.
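When HDFS trash is enabled, dropped managed-table data goes to the trash first; PURGE skips the trash so the data cannot be recovered:
DROP TABLE IF EXISTS employees PURGE;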
Common issues:
Permission issues: use hdfs dfs -chown and hdfs dfs -chmod appropriately.
Parsing errors: verify delimiters and SerDe settings.
Missing partitions: run MSCK REPAIR TABLE.
Small files: use INSERT OVERWRITE or compaction to reduce file count.
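For the small-files item above, a common compaction pattern is to rewrite a partition onto itself with merge settings enabled (settings and values are illustrative and engine-dependent):
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.smallfiles.avgsize = 134217728;
INSERT OVERWRITE TABLE logs PARTITION (dt='2025-09-11')
SELECT event_time, event_type, message FROM logs WHERE dt='2025-09-11';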
Best practices:
Use columnar formats (Parquet/ORC) with compression for analytics.
Partition sensibly (dates, regions) but avoid too many tiny partitions.
Use bucketing where it benefits joins.
Backup Hive Metastore and test schema evolution.