Step-by-Step Guide: Aggregate Functions in Hive

9/11/2025
All Articles

Step-by-Step Guide: Aggregate Functions in Hive

Step-by-Step Guide: Aggregate Functions in Hive

Step-by-Step Guide: Aggregate Functions in Hive

Hive aggregate functions operate on a set of rows and return a single summary value. These functions are fundamental for data analysis and reporting in Hive. They are often used in conjunction with the GROUP BY clause to perform aggregations on specific groups of data.


Step 1: What Are Aggregate Functions?

  • Aggregate functions in Hive are used to perform calculations on multiple rows and return a single result.

  • These are commonly used for summarizing data like totals, averages, and counts.


Step 2: COUNT()

Counts the number of rows.

SELECT COUNT(*) AS total_employees
FROM employees;

Output Example: 100


Step 3: SUM()

Calculates the total of numeric column values.

SELECT SUM(salary) AS total_salary
FROM employees;

Output Example: 500000


Step 4: AVG()

Returns the average value of a numeric column.

SELECT AVG(salary) AS avg_salary
FROM employees;

Output Example: 5000


Step 5: MIN() and MAX()

Finds the smallest and largest values.

SELECT MIN(salary) AS min_salary, MAX(salary) AS max_salary
FROM employees;

Output Example: min_salary=10000, max_salary=90000


Step 6: Using Aggregate Functions with GROUP BY

  • GROUP BY is used to apply aggregates on groups of rows.

SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

Output Example:

department emp_count avg_salary
HR 10 3000
IT 20 7000

Step 7: Using HAVING with Aggregate Functions

  • HAVING filters groups after aggregation.

SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 5000;

Output Example: Departments with average salary greater than 5000.


Step 8: Important Notes

  • Aggregate functions ignore NULL values.

  • They are often used with GROUP BY to get meaningful summaries.

  • You can combine multiple aggregate functions in a single query.


Best Practices

  • Use GROUP BY for logical grouping before applying aggregate functions.

  • Avoid using aggregate functions on non-numeric fields unless counting.

  • Use aliases for better readability of result sets.

  • Ensure data is cleaned (no unexpected NULLs) before aggregation.


This tutorial helps you efficiently summarize and analyze data using Hive’s aggregate functions.

Article