Apache Hive Customization Tutorial Series

Hire me to supercharge your Hadoop and Spark projects

I help businesses improve their return on investment from big data projects. I do everything from software architecture to staff training. Learn More

Apache Hive is a SQL-on-Hadoop framework that leverages both MapReduce and Tez to execute queries. It is possible to extend Hive with your own code. Hive has a very flexible API, so you can write code to do a whole bunch of things. Unfortunately, the flexibility comes at the expense of complexity.

There are three types of function APIs in Hive: UDF, UDTF, and UDAF, which all do very different things. Only by having a solid grasp of all of them will you truly be able to bend Hive to your will. Below are links to tutorials for each function type.

Hive Tutorials

Normal Functions (UDF)

Normal functions take inputs from a single row and output a single value. Examples of built-in functions include unix_timestamp(), round(), and cos().
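
To make the "one row in, one value out" contract concrete, here is a minimal sketch in plain Java. It deliberately does not use Hive's API: the class name `ScalarFunctionSketch` and its methods are hypothetical, and the rounding rule (half up) is an assumption chosen for illustration rather than a statement about how Hive's own round() behaves. In a real UDF the per-row logic would live in the function's evaluate method instead.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models the shape of a normal (scalar) function.
// This is NOT Hive's API; it only illustrates the row-level contract.
final class ScalarFunctionSketch {

    // Called once per row with that row's values; returns exactly one value.
    static Double roundTo(Double value, int places) {
        if (value == null) {
            return null; // SQL-style convention: NULL in, NULL out
        }
        return BigDecimal.valueOf(value)
                .setScale(places, RoundingMode.HALF_UP) // assumed rounding rule
                .doubleValue();
    }

    // Applies the function row by row, the way a query engine would:
    // the output has exactly as many rows as the input.
    static List<Double> applyToColumn(List<Double> column, int places) {
        List<Double> out = new ArrayList<>();
        for (Double v : column) {
            out.add(roundTo(v, places));
        }
        return out;
    }
}
```

For example, `ScalarFunctionSketch.roundTo(3.14159, 2)` returns 3.14, and a column of N input rows always produces N output rows.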

Click here for my UDF tutorial

Table Functions (UDTF)

Table functions are similar to normal functions, but they can output both multiple columns AND multiple rows of data (which is pretty nifty). Examples of built-in table functions include explode(), json_tuple(), and inline().
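
The difference from a normal function is easiest to see in code. The sketch below is plain Java, not Hive's API: `TableFunctionSketch`, `explodePairs`, and the `key=value;key=value` input format are all hypothetical and chosen only for illustration. It takes one input value and emits several rows, each with two columns. In Hive's GenericUDTF API, the equivalent step is calling forward() once per output row from inside process().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: models the shape of a table function.
// This is NOT Hive's API; it only shows one input row becoming many output rows.
final class TableFunctionSketch {

    // Input: one packed string such as "a=1;b=2" (an assumed format).
    // Output: zero or more rows, each with two columns (key, value).
    static List<List<String>> explodePairs(String packed) {
        List<List<String>> rows = new ArrayList<>();
        if (packed == null || packed.isEmpty()) {
            return rows; // a table function may emit no rows at all
        }
        for (String pair : packed.split(";")) {
            String[] parts = pair.split("=", 2);
            String key = parts[0];
            String value = parts.length > 1 ? parts[1] : null; // missing value becomes NULL
            rows.add(Arrays.asList(key, value));
        }
        return rows;
    }
}
```

Calling `TableFunctionSketch.explodePairs("a=1;b=2")` yields two rows, (a, 1) and (b, 2), from a single input value.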

Click here for my UDTF tutorial

Aggregate Functions (UDAF)

Aggregate functions operate over many rows at once (an entire table, or each group in a GROUP BY) and reduce them to a single value. This sounds confusing, but it’s very useful in practice. Examples of built-in aggregate functions include sum(), count(), min(), and histogram_numeric().
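
An aggregate is less confusing once you see the state it carries between rows. The following is a minimal plain-Java sketch of an averaging aggregator, not Hive's API: the class `AverageAggregatorSketch` is hypothetical, and its method names are only loosely modelled on the iterate, merge, and terminate steps that Hive's aggregate evaluators use. The merge step is there because, on a cluster, different workers each see only part of the data and their partial results have to be combined.

```java
// Hypothetical sketch: models the shape of an aggregate function.
// This is NOT Hive's API; it only shows many rows collapsing into one value.
final class AverageAggregatorSketch {
    private long count;  // number of non-NULL values seen so far
    private double sum;  // running total of those values

    // Called once per input row; updates the running state.
    void iterate(Double value) {
        if (value != null) { // SQL-style convention: NULLs are skipped
            count++;
            sum += value;
        }
    }

    // Combines a partial result computed elsewhere (e.g. by another worker).
    void merge(AverageAggregatorSketch other) {
        count += other.count;
        sum += other.sum;
    }

    // Produces the single final value, or null if no rows contributed.
    Double terminate() {
        if (count == 0) {
            return null;
        }
        return sum / count;
    }
}
```

If one aggregator sees the values 1.0 and 2.0, another sees 3.0, and the first merges the second, then terminate() on the first returns 2.0, the same answer as averaging all three rows in one place.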

Click here for my UDAF tutorial

Matthew Rathbone

Consultant Big Data Infrastructure Engineer at Rathbone Labs. British. Data Nerd. Lucky husband and father.
