Step-by-Step Guide: Built-in Functions in Hive
Built-in Functions in Hive
Apache Hive provides a rich set of built-in functions to perform various operations on data. These functions are categorized for easier understanding and usage.
Hive Built-in Functions are predefined functions readily available within Apache Hive for performing various operations on data stored in Hadoop Distributed File System (HDFS). These functions allow users to process and manipulate data using SQL-like queries without needing to write complex custom code.
Hive provides a wide range of built-in functions to manipulate and analyze data.
These functions simplify operations like string processing, date calculations, mathematical computations, and aggregations.
Common string functions used to manipulate text data.
SELECT CONCAT('Hello', ' ', 'World'); -- Output: Hello World
SELECT LENGTH('Hive'); -- Output: 4
SELECT LOWER('HIVE'); -- Output: hive
SELECT UPPER('hive'); -- Output: HIVE
SELECT SUBSTR('Hadoop', 2, 4); -- Output: adoo
SELECT TRIM(' data '); -- Output: data
| Function | Description | Example | Output |
|---|---|---|---|
| CONCAT | Concatenate strings |
CONCAT('Hello','World') |
HelloWorld |
| LENGTH | Returns string length |
LENGTH('Hive') |
4 |
| LOWER | Converts to lowercase |
LOWER('HIVE') |
hive |
| UPPER | Converts to uppercase |
UPPER('hive') |
HIVE |
| SUBSTR | Extract substring |
SUBSTR('Hadoop',2,4) |
adoo |
| TRIM | Remove leading/trailing spaces |
TRIM(' data ') |
data |
Useful for working with date and time data.
SELECT CURRENT_DATE(); -- Returns current date
SELECT CURRENT_TIMESTAMP(); -- Returns current timestamp
SELECT YEAR('2025-09-12'); -- Output: 2025
SELECT MONTH('2025-09-12'); -- Output: 9
SELECT DAY('2025-09-12'); -- Output: 12
SELECT DATEDIFF('2025-09-12','2025-09-01'); -- Output: 11
| Function | Description | Example | Output |
|---|---|---|---|
| CURRENT_DATE | Returns current system date |
CURRENT_DATE() |
2025-09-12 |
| CURRENT_TIMESTAMP | Returns current system timestamp |
CURRENT_TIMESTAMP() |
2025-09-12 12:00:00 |
| YEAR | Extract year from date |
YEAR('2025-09-12') |
2025 |
| MONTH | Extract month from date |
MONTH('2025-09-12') |
9 |
| DAY | Extract day from date |
DAY('2025-09-12') |
12 |
| DATEDIFF | Difference between two dates |
DATEDIFF('2025-09-12','2025-09-01') |
11 |
Perform arithmetic and advanced mathematical calculations.
SELECT ROUND(123.456, 2); -- Output: 123.46
SELECT CEIL(4.3); -- Output: 5
SELECT FLOOR(4.9); -- Output: 4
SELECT SQRT(25); -- Output: 5
SELECT POWER(2, 3); -- Output: 8
SELECT ABS(-15); -- Output: 15
| Function | Description | Example | Output |
|---|---|---|---|
| ROUND | Rounds number to decimals |
ROUND(123.456,2) |
123.46 |
| CEIL | Ceiling value |
CEIL(4.3) |
5 |
| FLOOR | Floor value |
FLOOR(4.9) |
4 |
| SQRT | Square root |
SQRT(25) |
5 |
| POWER | Exponentiation |
POWER(2,3) |
8 |
| ABS | Absolute value |
ABS(-15) |
15 |
Used to apply conditional logic inside queries.
SELECT IF(10 > 5, 'TRUE', 'FALSE'); -- Output: TRUE
SELECT COALESCE(NULL, 'Hive', 'Spark'); -- Output: Hive
SELECT NVL(NULL, 'default'); -- Output: default
SELECT CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END FROM employees;
| Function | Description | Example | Output |
|---|---|---|---|
| IF | Conditional check |
IF(10>5,'TRUE','FALSE') |
TRUE |
| COALESCE | Returns first non-null value |
COALESCE(NULL,'Hive') |
Hive |
| NVL | Replaces NULL with given value |
NVL(NULL,'default') |
default |
| CASE | Multiple condition check |
CASE WHEN salary>50000 THEN 'High' ELSE 'Low' END |
High/Low |
Used to summarize data across rows.
SELECT COUNT(*) FROM employees; -- Returns total rows
SELECT SUM(salary) FROM employees; -- Returns total salary
SELECT AVG(salary) FROM employees; -- Returns average salary
SELECT MAX(salary) FROM employees; -- Returns highest salary
SELECT MIN(salary) FROM employees; -- Returns lowest salary
| Function | Description | Example | Output |
|---|---|---|---|
| COUNT | Counts total rows |
COUNT(*) |
100 |
| SUM | Sum of values |
SUM(salary) |
500000 |
| AVG | Average of values |
AVG(salary) |
5000 |
| MAX | Maximum value |
MAX(salary) |
90000 |
| MIN | Minimum value |
MIN(salary) |
10000 |
Use appropriate data types to avoid type conversion errors.
Apply functions on filtered data to improve performance.
Use built-in functions in SELECT, WHERE, and GROUP BY clauses for efficiency.
Refer to SHOW FUNCTIONS; in Hive to see all available functions.
This tutorial helps you understand and use Hive’s built-in functions effectively to clean, transform, and analyze your data.