Step-by-Step Guide: Handling Text Files in Hive
Text Files in Hive
In this article, we explain how to work with text files in Apache Hive. Text files are a fundamental and widely supported file format in Hive, offering a simple, human-readable way to store data.
The TextFile format is the default storage format in Hive.
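Data is stored as plain text, typically as comma-separated (CSV) or tab-separated (TSV) values.
Each line represents one row, and columns are separated by a delimiter such as a comma (,) or a tab (\t). If no delimiter is specified, Hive uses the non-printing Ctrl-A character (\001).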
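To make the layout concrete, here is a minimal Python sketch that writes rows the way a table created with STORED AS TEXTFILE and no ROW FORMAT clause expects them: Ctrl-A (\001) between fields, a newline after each row, and NULL written as the two characters \N (Hive's default null marker for text). The helper name to_hive_default_text is hypothetical, and values are formatted with Python's str(), which may not match Hive's own formatting for every type (booleans, for example).

```python
def to_hive_default_text(rows):
    """Serialize rows using Hive's default text layout (Ctrl-A between fields, newline per row)."""
    lines = []
    for row in rows:
        # None becomes \N, the default NULL marker in Hive text files.
        fields = ["\\N" if value is None else str(value) for value in row]
        lines.append("\x01".join(fields))
    # Every row, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)
```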
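Pros: Easy to create, read, and inspect with ordinary tools
Cons: Not optimized for performance or compression: every query reads and parses whole rows, and gzip-compressed text files cannot be split across parallel tasks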
Use the STORED AS TEXTFILE clause when creating a table.
CREATE TABLE employees (
id INT,
name STRING,
department STRING,
salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
FIELDS TERMINATED BY ',' specifies the column delimiter.
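Hive applies the column types when it reads each line, not when the file is loaded. The Python sketch below approximates what happens to one line of employees.csv; parse_row and EMPLOYEES_SCHEMA are hypothetical names used only for illustration. It mimics the common behavior of Hive's text format rather than reproducing it exactly: a missing trailing field or a value that cannot be converted to the column's type becomes None (NULL), and extra fields are ignored.

```python
# Column names and converters mirroring the employees table (id INT, name STRING,
# department STRING, salary DOUBLE); this mapping is an illustrative assumption.
EMPLOYEES_SCHEMA = [
    ("id", int),
    ("name", str),
    ("department", str),
    ("salary", float),
]


def parse_row(line, schema=EMPLOYEES_SCHEMA, delimiter=","):
    """Split one text line into typed values, approximating how Hive reads it."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for index, (column, convert) in enumerate(schema):
        if index >= len(fields):
            row[column] = None  # missing trailing field is read as NULL
            continue
        try:
            row[column] = convert(fields[index])
        except ValueError:
            row[column] = None  # value that does not fit the column type is read as NULL
    return row  # fields beyond the last column are ignored
```

For example, parse_row("2,Bob") returns id 2 and name "Bob" with department and salary set to None.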
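Because the table has no LOCATION clause, it is a managed table, and its data is stored in the Hive warehouse directory (set by hive.metastore.warehouse.dir, /user/hive/warehouse by default).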
Use the LOAD DATA command to load text data from HDFS, or add the LOCAL keyword (LOAD DATA LOCAL INPATH) to load from the local file system.
LOAD DATA INPATH '/user/hive/input/employees.csv'
INTO TABLE employees;
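The file should be plain text (CSV/TSV) without a header row, and its fields must use the delimiter declared in the table definition (a comma here). Hive does not validate the file during the load, so a header line would simply be read as a data row.
Hive moves the file from its HDFS location into the table's directory, so it no longer exists at the original path. With LOAD DATA LOCAL, the file is copied instead.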
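If an exported CSV starts with a header line, one option is to remove it before loading. The following is a minimal Python sketch; strip_header is a hypothetical helper and the file paths are placeholders. As an alternative, newer Hive versions can skip header lines at read time through the table property TBLPROPERTIES ("skip.header.line.count"="1").

```python
def strip_header(src_path, dst_path, header_lines=1):
    """Copy src_path to dst_path without its first header_lines lines; return rows written."""
    written = 0
    # newline="" keeps the original line endings unchanged.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line_number, line in enumerate(src):
            if line_number < header_lines:
                continue  # skip the header line(s)
            dst.write(line)
            written += 1
    return written
```

The headerless copy can then be uploaded to HDFS and loaded with LOAD DATA INPATH as shown above.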
If your text files already exist in HDFS and are shared with other systems, use an external table.
CREATE EXTERNAL TABLE customers (
id INT,
name STRING,
email STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/external/customers';
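Dropping this table removes only its metadata; the files under /user/hive/external/customers are not deleted. By contrast, dropping a managed table such as employees also deletes its data.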
You can query text data like any other Hive table:
SELECT name, department FROM employees WHERE salary > 50000;
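Because a text file has no columnar layout or embedded indexes, Hive answers a query like this by reading and splitting every line and then applying the filter. The Python function below is a hypothetical, single-machine illustration of that row-by-row scan, not Hive's actual execution engine (which distributes the scan across parallel tasks). It assumes the comma-delimited employees layout.

```python
def select_high_earners(lines, threshold=50000.0, delimiter=","):
    """Emulate SELECT name, department FROM employees WHERE salary > threshold on raw lines."""
    results = []
    for line in lines:  # every line is read and split, matching or not
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < 4:
            continue  # salary would be NULL, and NULL > threshold is not true
        try:
            salary = float(fields[3])
        except ValueError:
            continue  # an unparsable salary reads as NULL, so the row is filtered out
        if salary > threshold:
            results.append((fields[1], fields[2]))
    return results
```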
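You can define a custom delimiter when field values themselves may contain commas, such as free-text log messages. Here fields are separated by a pipe (|):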
CREATE TABLE logs (
id INT,
message STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
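ROW FORMAT DELIMITED does not interpret quotes, so a quoted CSV value such as "Disk full, retrying" would be split at its comma. A minimal Python sketch of re-encoding quoted CSV into pipe-delimited text for the logs table follows; csv_to_pipe_delimited is a hypothetical helper. It refuses values that contain the new delimiter or a newline, since those would still break the row structure. (Hive also ships an OpenCSVSerde that understands quoting, at the cost of reading every column as STRING.)

```python
import csv
import io


def csv_to_pipe_delimited(csv_text, delimiter="|"):
    """Re-encode quoted CSV as plain delimiter-separated text for ROW FORMAT DELIMITED."""
    out_lines = []
    for record in csv.reader(io.StringIO(csv_text)):
        for value in record:
            # A value containing the delimiter or a newline would split the row in Hive.
            if delimiter in value or "\n" in value:
                raise ValueError(f"value {value!r} contains the delimiter or a newline")
        out_lines.append(delimiter.join(record))
    return "".join(line + "\n" for line in out_lines)
```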
Since TextFile is not optimized for performance, you can convert a text table to a columnar format such as ORC or Parquet, which compresses well and lets queries read only the columns they need:
CREATE TABLE employees_orc STORED AS ORC AS
SELECT * FROM employees;
Use TextFile for small, raw, or initial data ingestion.
Convert to ORC/Parquet for production workloads.
Ensure consistent delimiters in text files.
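Avoid loading very large single files directly, especially gzip-compressed ones, which cannot be split across parallel tasks; split them into several smaller files if needed.
Validate data formats before loading: Hive does not check text files at load time, and values that do not match a column's type are returned as NULL when queried.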
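The delimiter and format checks in this list can be automated before a load. The Python sketch below is one possible pre-load validator, assuming the employees layout (four comma-separated fields, an integer id in field 0, a numeric salary in field 3); validate_lines and its parameters are hypothetical. It reports each problem line instead of stopping at the first one.

```python
def validate_lines(lines, expected_fields=4, delimiter=",",
                   int_columns=(0,), float_columns=(3,)):
    """Return (line_number, problem) pairs for lines that would not load cleanly."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        # A wrong field count usually means an inconsistent or missing delimiter.
        if len(fields) != expected_fields:
            problems.append((number, f"expected {expected_fields} fields, found {len(fields)}"))
            continue
        for index in int_columns:
            try:
                int(fields[index])
            except ValueError:
                problems.append((number, f"field {index} is not an integer: {fields[index]!r}"))
        for index in float_columns:
            try:
                float(fields[index])
            except ValueError:
                problems.append((number, f"field {index} is not a number: {fields[index]!r}"))
    return problems
```

An empty result means every line has the expected shape and types; otherwise the reported line numbers point to the rows that would otherwise show up as NULLs in Hive.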
This step-by-step tutorial helps you handle text files efficiently in Hive for smooth data ingestion and querying.