Step-by-Step Guide: Aggregate Functions in Hive
Step-by-Step Guide: Aggregate Functions in Hive
Hive aggregate functions operate on a set of rows and return a single summary value. These functions are fundamental for data analysis and reporting in Hive. They are often used in conjunction with the GROUP BY clause to perform aggregations on specific groups of data.
Aggregate functions in Hive are used to perform calculations on multiple rows and return a single result.
These are commonly used for summarizing data like totals, averages, and counts.
Counts the number of rows.
SELECT COUNT(*) AS total_employees
FROM employees;
Output Example: 100
Calculates the total of numeric column values.
SELECT SUM(salary) AS total_salary
FROM employees;
Output Example: 500000
Returns the average value of a numeric column.
SELECT AVG(salary) AS avg_salary
FROM employees;
Output Example: 5000
Finds the smallest and largest values.
SELECT MIN(salary) AS min_salary, MAX(salary) AS max_salary
FROM employees;
Output Example: min_salary=10000, max_salary=90000
GROUP BY is used to apply aggregates on groups of rows.
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
Output Example:
| department | emp_count | avg_salary |
|---|---|---|
| HR | 10 | 3000 |
| IT | 20 | 7000 |
HAVING filters groups after aggregation.
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 5000;
Output Example: Departments with average salary greater than 5000.
Aggregate functions ignore NULL values.
They are often used with GROUP BY to get meaningful summaries.
You can combine multiple aggregate functions in a single query.
Use GROUP BY for logical grouping before applying aggregate functions.
Avoid using aggregate functions on non-numeric fields unless counting.
Use aliases for better readability of result sets.
Ensure data is cleaned (no unexpected NULLs) before aggregation.
This tutorial helps you efficiently summarize and analyze data using Hive’s aggregate functions.