Step-by-Step Guide: Handling Text Files in Hive

9/11/2025

In this article, we explain how to work with text files in Apache Hive. Text files are a fundamental and widely supported file format in Hive, and a simple, human-readable way to store data.

Step 1: Understanding TextFile Format in Hive

  • The TextFile format is the default storage format in Hive.

  • Data is stored as plain text, typically CSV or TSV.

  • Each line represents one row, and columns are separated by a delimiter, such as a comma (,) or a tab (\t).

Pros: Easy to create and read
Cons: Not optimized for performance or compression
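
To see which format a given installation actually uses by default, the session setting can be inspected. This is a minimal sketch that assumes a stock Apache Hive configuration; some distributions override the default, so the reported value may differ. The table name format_demo is hypothetical and used only for illustration.

```sql
-- Print the current default storage format for new tables.
-- On a stock configuration this is expected to report TextFile.
SET hive.default.fileformat;

-- Hypothetical table: with no STORED AS clause,
-- Hive falls back to the default format reported above.
CREATE TABLE format_demo (
  id INT,
  label STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
```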


Step 2: Creating a Table with TextFile Format

Use the STORED AS TEXTFILE clause when creating a table.

CREATE TABLE employees (
  id INT,
  name STRING,
  department STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

  • FIELDS TERMINATED BY ',' specifies the column delimiter.

  • Because this is a managed table with no LOCATION clause, its data is stored under the Hive warehouse directory (set by hive.metastore.warehouse.dir).
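
One way to confirm where the table lives and how it is stored is to inspect its metadata. The sketch below assumes the employees table from this step already exists.

```sql
-- Show storage details for the table created above.
-- Look for the Location, InputFormat, and SerDe Library rows in the output.
DESCRIBE FORMATTED employees;

-- Print the configured warehouse root directory.
SET hive.metastore.warehouse.dir;
```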


Step 3: Loading Text Data into Hive

Use the LOAD DATA command to load text data from HDFS or, with the LOCAL keyword, from the local file system.

LOAD DATA INPATH '/user/hive/input/employees.csv'
INTO TABLE employees;
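
  • The file should be plain text (CSV/TSV) without a header row; otherwise Hive reads the header as a data row.

  • For an HDFS path, Hive moves (rather than copies) the file into the table’s directory. With LOCAL, the file is copied instead.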
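
The statement above covers the HDFS case. A sketch of the local-file variant, along with one way to cope with files that do carry a header row, might look like the following. The path /tmp/employees.csv is a hypothetical example, and with Beeline the "local" file system is generally that of the HiveServer2 host rather than your own machine.

```sql
-- Load from the local file system (hypothetical path).
-- LOCAL copies the file; OVERWRITE replaces any data already in the table.
LOAD DATA LOCAL INPATH '/tmp/employees.csv'
OVERWRITE INTO TABLE employees;

-- If the files do contain a header row, ask Hive to skip
-- the first line of each file when reading the table.
ALTER TABLE employees SET TBLPROPERTIES ('skip.header.line.count' = '1');
```

Without OVERWRITE, LOAD DATA appends the new file to whatever the table already contains.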


Step 4: Creating an External Table for Text Files

If your text files already exist in HDFS and are shared with other systems, use an external table.

CREATE EXTERNAL TABLE customers (
  id INT,
  name STRING,
  email STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/external/customers';

  • Dropping this table removes only its metadata; the actual files under LOCATION are not deleted.
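
The behavior on drop can be checked with a short sequence like the one below. It is a sketch that assumes the customers table from this step and a session where the dfs command is permitted (it may be restricted under some authorization setups).

```sql
-- Confirm the table type; the output should list it as EXTERNAL_TABLE.
DESCRIBE FORMATTED customers;

-- Remove the table definition from the metastore only.
DROP TABLE customers;

-- List the directory to confirm the data files are still in place.
dfs -ls /user/hive/external/customers;
```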


Step 5: Querying Text Data

You can query text data like any other Hive table:

SELECT name, department FROM employees WHERE salary > 50000;

Step 6: Handling Complex Delimiters

You can define a custom single-character delimiter, such as a pipe, which is useful when the data itself contains commas.

CREATE TABLE logs (
  id INT,
  message STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
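
Two further cases come up often with text data: nested values inside a field, and CSV files with quoted fields. The sketches below use hypothetical table names (logs_tagged, logs_quoted) and an invented sample line. The second relies on the OpenCSVSerde that ships with Hive, which reads every column as STRING.

```sql
-- Nested types. A matching line could look like:
--   1|login failed|auth;security|host:web01;region:eu
-- The collection delimiter separates array elements and map entries;
-- the map key delimiter separates each key from its value.
CREATE TABLE logs_tagged (
  id INT,
  message STRING,
  tags ARRAY<STRING>,
  attributes MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ';'
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Quoted CSV, where a field may itself contain the separator.
-- OpenCSVSerde treats all columns as STRING; cast in queries as needed.
CREATE TABLE logs_quoted (
  id STRING,
  message STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"',
  'escapeChar' = '\\'
)
STORED AS TEXTFILE;
```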

Step 7: Converting TextFile to Optimized Format

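Since TextFile is not optimized for performance, you can copy the data into a columnar format such as ORC or Parquet for faster queries. A CREATE TABLE ... AS SELECT statement creates the new table and fills it in one step.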

CREATE TABLE employees_orc STORED AS ORC AS
SELECT * FROM employees;
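
The same pattern works for Parquet, and a compression codec can be set through a table property. This is a sketch; the table names employees_parquet and employees_orc_snappy are hypothetical, and the best codec depends on your workload.

```sql
-- Same conversion, targeting Parquet instead of ORC.
CREATE TABLE employees_parquet STORED AS PARQUET AS
SELECT * FROM employees;

-- ORC with an explicit compression codec set as a table property.
CREATE TABLE employees_orc_snappy
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY')
AS
SELECT * FROM employees;
```

The original text table is left untouched, so it can be kept as the raw copy or dropped once the converted table has been checked.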

✅ Best Practices

  • Use TextFile for small, raw, or initial data ingestion.

  • Convert to ORC/Parquet for production workloads.

  • Ensure consistent delimiters in text files.

  • Avoid loading a single very large text file, especially a gzip-compressed one, which cannot be split across parallel tasks; split it into smaller files if needed.

  • Validate data formats before loading.
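
Hive does not reject malformed rows when a text file is loaded; with the default delimited format, a value that cannot be parsed as the declared column type is read back as NULL. A rough post-load check, assuming the employees table from Step 2, is to count those NULLs. Note that a NULL may also come from a genuinely empty field, so treat the counts as a prompt to inspect the data rather than as proof of an error.

```sql
-- Count rows whose numeric columns could not be parsed (or were empty).
SELECT
  COUNT(*) AS total_rows,
  SUM(CASE WHEN id IS NULL THEN 1 ELSE 0 END) AS null_id_rows,
  SUM(CASE WHEN salary IS NULL THEN 1 ELSE 0 END) AS null_salary_rows
FROM employees;
```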


This step-by-step tutorial helps you handle text files efficiently in Hive for smooth data ingestion and querying.
