Step-by-Step Guide: User-Defined Functions (UDFs) in Hive

9/11/2025
All Articles

User-Defined Functions (UDFs) in Hive

Step-by-Step Guide: User-Defined Functions (UDFs) in Hive

Step-by-Step Guide: User-Defined Functions (UDFs) in Hive

User-Defined Functions (UDFs) in Apache Hive allow users to extend Hive's functionality by creating custom functions to perform specific data transformations or calculations that are not available through built-in functions. These UDFs can be written in various programming languages, most commonly Java, and then integrated into Hive queries. You can follow below step :-


Step 1: What Are User-Defined Functions (UDFs)?

  • Hive provides many built-in functions, but you may need custom logic.

  • User-Defined Functions (UDFs) allow developers to extend Hive’s capabilities using Java.


Step 2: Types of UDFs in Hive

  • UDF (Simple): Operates on a single row and returns a single value.

  • UDAF (User-Defined Aggregate Function): Works on multiple rows and returns a single aggregated value.

  • UDTF (User-Defined Table-Generating Function): Takes a single row and outputs multiple rows.


Step 3: Create a Simple UDF in Java

  1. Create a Java class extending org.apache.hadoop.hive.ql.exec.UDF.

  2. Implement the evaluate() method.

package com.example;
import org.apache.hadoop.hive.ql.exec.UDF;

public class UpperCaseUDF extends UDF {
    public String evaluate(String input) {
        if (input == null) return null;
        return input.toUpperCase();
    }
}

Step 4: Compile and Create JAR

  • Compile your Java file and create a JAR file.

javac -cp $(hive --auxpath) UpperCaseUDF.java
jar -cf UpperCaseUDF.jar com/example/UpperCaseUDF.class

Step 5: Register the UDF in Hive

  • Add the JAR to the Hive session and create a temporary function.

ADD JAR /path/to/UpperCaseUDF.jar;
CREATE TEMPORARY FUNCTION to_upper AS 'com.example.UpperCaseUDF';

Step 6: Use the UDF in Queries

SELECT to_upper(name) FROM employees;

Output Example: Converts all names to uppercase.


Step 7: Creating a Permanent Function

  • To make the function available for all sessions:

CREATE FUNCTION to_upper AS 'com.example.UpperCaseUDF'
USING JAR 'hdfs:///user/hive/udfs/UpperCaseUDF.jar';

Step 8: Best Practices

  • Test your UDF thoroughly before using in production.

  • Handle null and invalid inputs gracefully.

  • Use appropriate data types to avoid type conversion errors.

  • Maintain versioned JARs for easy updates.


Summary

  • UDFs extend Hive’s functionality using Java.

  • They can be temporary or permanent.

  • Register and use them like built-in functions.


This tutorial helps you build and use User-Defined Functions (UDFs) to add custom logic to your Hive queries.

Article