Step-by-Step Guide: User-Defined Functions (UDFs) in Hive
User-Defined Functions (UDFs) in Apache Hive let you extend Hive with custom functions for data transformations or calculations that the built-in functions do not cover. UDFs are most commonly written in Java, packaged as a JAR, and then called from Hive queries like any other function. The steps below walk through the process.
Hive provides many built-in functions, but you may need custom logic.
User-Defined Functions (UDFs) allow developers to extend Hive’s capabilities using Java.
UDF (Simple): Operates on a single row and returns a single value.
UDAF (User-Defined Aggregate Function): Works on multiple rows and returns a single aggregated value.
UDTF (User-Defined Table-Generating Function): Takes a single row and outputs multiple rows.
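The three function types differ mainly in how many rows go in and how many come out. The following is a plain-Java analogy of those shapes, not Hive API code: the class name FunctionShapes and its methods are hypothetical and exist only to illustrate the one-to-one, many-to-one, and one-to-many contracts.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java analogy only: none of these methods use the Hive API.
class FunctionShapes {
    // UDF shape: one input value in, one output value out.
    static String upper(String value) {
        return value == null ? null : value.toUpperCase();
    }

    // UDAF shape: many input rows in, one aggregated value out.
    static long countNonNull(List<String> rows) {
        long count = 0;
        for (String row : rows) {
            if (row != null) {
                count++;
            }
        }
        return count;
    }

    // UDTF shape: one input row in, zero or more output rows out.
    static List<String> explodeCsv(String row) {
        if (row == null || row.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(upper("alice"));                              // ALICE
        System.out.println(countNonNull(Arrays.asList("a", null, "b"))); // 2
        System.out.println(explodeCsv("x,y,z"));                         // [x, y, z]
    }
}
```

In Hive itself, aggregate and table-generating functions are built on separate base classes with their own lifecycle methods, so this sketch shows only the input and output shapes.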
Create a Java class extending org.apache.hadoop.hive.ql.exec.UDF.
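Implement one or more public evaluate() methods. Hive locates them by reflection and calls the overload whose parameter types match the arguments in the query.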
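package com.example;

import org.apache.hadoop.hive.ql.exec.UDF;

// Simple UDF that upper-cases a string value.
public class UpperCaseUDF extends UDF {
    // Called once per row; a null input is passed through as NULL.
    public String evaluate(String input) {
        if (input == null) return null;
        return input.toUpperCase();
    }
}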
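Because a simple UDF exposes its logic through methods named evaluate() rather than through a fixed interface method, a class may declare several overloads. The sketch below imitates that lookup with standard Java reflection. It is an illustration under stated assumptions, not Hive's actual resolver: UpperCaseLogic and EvaluateLookup are hypothetical names, and the stand-in class deliberately does not extend Hive's UDF so the example runs without Hive on the classpath.

```java
import java.lang.reflect.Method;

// Stand-in for a UDF class. It does not extend Hive's UDF base class,
// so this sketch compiles and runs without Hive on the classpath.
class UpperCaseLogic {
    public String evaluate(String input) {
        return input == null ? null : input.toUpperCase();
    }

    // A second overload, chosen when the argument type is Integer.
    public Integer evaluate(Integer input) {
        return input == null ? null : Math.abs(input);
    }
}

class EvaluateLookup {
    // Hypothetical helper (not a Hive API): look up evaluate(argType)
    // by name and parameter type, then invoke it reflectively.
    static Object call(Object udf, Class<?> argType, Object arg) {
        try {
            Method method = udf.getClass().getMethod("evaluate", argType);
            return method.invoke(udf, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "no usable evaluate(" + argType.getSimpleName() + ")", e);
        }
    }

    public static void main(String[] args) {
        UpperCaseLogic udf = new UpperCaseLogic();
        System.out.println(call(udf, String.class, "alice")); // ALICE
        System.out.println(call(udf, Integer.class, -7));     // 7
        System.out.println(call(udf, String.class, null));    // null
    }
}
```

The practical consequence is that a misspelled or non-public evaluate() method compiles without complaint and only fails when a query calls the function.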
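Compile your Java file against the Hadoop and Hive libraries and package it as a JAR. The command below assumes HIVE_HOME points to your Hive installation; adjust the classpath for your environment. The -d . option writes the class file into com/example/ so that it matches the package name.
javac -cp "$(hadoop classpath):$HIVE_HOME/lib/*" -d . UpperCaseUDF.java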
jar -cf UpperCaseUDF.jar com/example/UpperCaseUDF.class
Add the JAR to the Hive session and create a temporary function.
ADD JAR /path/to/UpperCaseUDF.jar;
CREATE TEMPORARY FUNCTION to_upper AS 'com.example.UpperCaseUDF';
SELECT to_upper(name) FROM employees;
Output: each value in the name column is returned in uppercase (for example, alice becomes ALICE).
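A temporary function disappears when the session ends. To make the function available in all sessions, create a permanent function, which is registered in the metastore, and point it at a JAR in a shared location such as HDFS: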
CREATE FUNCTION to_upper AS 'com.example.UpperCaseUDF'
USING JAR 'hdfs:///user/hive/udfs/UpperCaseUDF.jar';
Test your UDF thoroughly before using it in production.
Handle null and invalid inputs gracefully.
Use appropriate data types to avoid type conversion errors.
Maintain versioned JARs for easy updates.
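One way to follow the first two practices is to keep the evaluate() logic free of Hive types, so it can be checked on a plain JVM before the JAR is built. The sketch below assumes a hypothetical parsing function (ParseIntLogic is not part of Hive) that returns null for null, blank, or non-numeric input instead of throwing. Returning null is an assumption about the desired behavior, chosen so that bad values surface as NULL in query results rather than failing the query.

```java
// Hypothetical UDF body, kept free of Hive types so it can be
// exercised on a plain JVM before being wrapped in a UDF class.
class ParseIntLogic {
    // Returns the parsed integer, or null for null, blank,
    // non-numeric, or out-of-range input.
    public Integer evaluate(String input) {
        if (input == null) {
            return null;
        }
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        try {
            return Integer.valueOf(trimmed);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}

class ParseIntLogicCheck {
    // Minimal assertion helper so the checks run without a test framework.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        ParseIntLogic udf = new ParseIntLogic();
        check(Integer.valueOf(42).equals(udf.evaluate(" 42 ")), "valid input with spaces");
        check(udf.evaluate(null) == null, "null input");
        check(udf.evaluate("   ") == null, "blank input");
        check(udf.evaluate("abc") == null, "non-numeric input");
        check(udf.evaluate("99999999999") == null, "out-of-range input");
        System.out.println("all checks passed");
    }
}
```

In a real project the same checks would normally live in a unit-test framework, and the tested method would then be called from the class that extends UDF.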
UDFs extend Hive’s functionality using Java.
They can be temporary or permanent.
Register and use them like built-in functions.
This tutorial helps you build and use User-Defined Functions (UDFs) to add custom logic to your Hive queries.