Step-by-Step Guide: Built-in Functions in Hive

9/11/2025
All Articles

Built-in Functions in Hive

Step-by-Step Guide: Built-in Functions in Hive

Step-by-Step Guide: Built-in Functions in Hive

Apache Hive provides a rich set of built-in functions to perform various operations on data. These functions are categorized for easier understanding and usage.

Hive Built-in Functions are predefined functions readily available within Apache Hive for performing various operations on data stored in Hadoop Distributed File System (HDFS). These functions allow users to process and manipulate data using SQL-like queries without needing to write complex custom code.

 


Step 1: What are Built-in Functions in Hive?

  • Hive provides a wide range of built-in functions to manipulate and analyze data.

  • These functions simplify operations like string processing, date calculations, mathematical computations, and aggregations.


Step 2: String Functions

Common string functions used to manipulate text data.

SELECT CONCAT('Hello', ' ', 'World');           -- Output: Hello World
SELECT LENGTH('Hive');                         -- Output: 4
SELECT LOWER('HIVE');                          -- Output: hive
SELECT UPPER('hive');                          -- Output: HIVE
SELECT SUBSTR('Hadoop', 2, 4);                 -- Output: adoo
SELECT TRIM('  data  ');                       -- Output: data
Function Description Example Output
CONCAT Concatenate strings CONCAT('Hello','World') HelloWorld
LENGTH Returns string length LENGTH('Hive') 4
LOWER Converts to lowercase LOWER('HIVE') hive
UPPER Converts to uppercase UPPER('hive') HIVE
SUBSTR Extract substring SUBSTR('Hadoop',2,4) adoo
TRIM Remove leading/trailing spaces TRIM(' data ') data

Step 3: Date Functions

Useful for working with date and time data.

SELECT CURRENT_DATE();                          -- Returns current date
SELECT CURRENT_TIMESTAMP();                     -- Returns current timestamp
SELECT YEAR('2025-09-12');                      -- Output: 2025
SELECT MONTH('2025-09-12');                     -- Output: 9
SELECT DAY('2025-09-12');                       -- Output: 12
SELECT DATEDIFF('2025-09-12','2025-09-01');     -- Output: 11
Function Description Example Output
CURRENT_DATE Returns current system date CURRENT_DATE() 2025-09-12
CURRENT_TIMESTAMP Returns current system timestamp CURRENT_TIMESTAMP() 2025-09-12 12:00:00
YEAR Extract year from date YEAR('2025-09-12') 2025
MONTH Extract month from date MONTH('2025-09-12') 9
DAY Extract day from date DAY('2025-09-12') 12
DATEDIFF Difference between two dates DATEDIFF('2025-09-12','2025-09-01') 11

Step 4: Mathematical Functions

Perform arithmetic and advanced mathematical calculations.

SELECT ROUND(123.456, 2);                       -- Output: 123.46
SELECT CEIL(4.3);                               -- Output: 5
SELECT FLOOR(4.9);                              -- Output: 4
SELECT SQRT(25);                                -- Output: 5
SELECT POWER(2, 3);                             -- Output: 8
SELECT ABS(-15);                                -- Output: 15
Function Description Example Output
ROUND Rounds number to decimals ROUND(123.456,2) 123.46
CEIL Ceiling value CEIL(4.3) 5
FLOOR Floor value FLOOR(4.9) 4
SQRT Square root SQRT(25) 5
POWER Exponentiation POWER(2,3) 8
ABS Absolute value ABS(-15) 15

Step 5: Conditional Functions

Used to apply conditional logic inside queries.

SELECT IF(10 > 5, 'TRUE', 'FALSE');              -- Output: TRUE
SELECT COALESCE(NULL, 'Hive', 'Spark');           -- Output: Hive
SELECT NVL(NULL, 'default');                      -- Output: default
SELECT CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END FROM employees;
Function Description Example Output
IF Conditional check IF(10>5,'TRUE','FALSE') TRUE
COALESCE Returns first non-null value COALESCE(NULL,'Hive') Hive
NVL Replaces NULL with given value NVL(NULL,'default') default
CASE Multiple condition check CASE WHEN salary>50000 THEN 'High' ELSE 'Low' END High/Low

Step 6: Aggregation Functions

Used to summarize data across rows.

SELECT COUNT(*) FROM employees;                   -- Returns total rows
SELECT SUM(salary) FROM employees;                -- Returns total salary
SELECT AVG(salary) FROM employees;                -- Returns average salary
SELECT MAX(salary) FROM employees;                -- Returns highest salary
SELECT MIN(salary) FROM employees;                -- Returns lowest salary
Function Description Example Output
COUNT Counts total rows COUNT(*) 100
SUM Sum of values SUM(salary) 500000
AVG Average of values AVG(salary) 5000
MAX Maximum value MAX(salary) 90000
MIN Minimum value MIN(salary) 10000

Best Practices

  • Use appropriate data types to avoid type conversion errors.

  • Apply functions on filtered data to improve performance.

  • Use built-in functions in SELECT, WHERE, and GROUP BY clauses for efficiency.

  • Refer to SHOW FUNCTIONS; in Hive to see all available functions.


This tutorial helps you understand and use Hive’s built-in functions effectively to clean, transform, and analyze your data.

Article