Apache Hive Customization Tutorial Series
Apache Hive is a SQL-on-Hadoop framework that leverages MapReduce or Tez to execute queries. You can extend Hive with your own code. Hive has a very flexible API, so you can write code to do a wide range of things. Unfortunately, that flexibility comes at the expense of complexity.
Hive has three types of function APIs, each of which does something very different: UDF (user-defined function), UDTF (user-defined table-generating function), and UDAF (user-defined aggregate function). Only by having a solid grasp of all three will you truly be able to bend Hive to your will. Below are links to tutorials for each function type.
Normal Functions (UDF)
Normal functions take inputs from a single row, and output a single value. Examples of built-in functions include
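To make the one-row-in, one-value-out contract concrete, here is a minimal sketch in plain Java. The class name ToUpperSketch is hypothetical, and the code deliberately does not use the Hive API so that it runs without Hive on the classpath. A real UDF would be built against Hive's own classes (for example by extending org.apache.hadoop.hive.ql.exec.UDF and supplying an evaluate method), and the details there depend on your Hive version.

```java
import java.util.Locale;

// Plain-Java sketch of UDF semantics: one value from one row in, one value out.
// ToUpperSketch is a hypothetical name; this is not the Hive API itself.
class ToUpperSketch {
    // Assumption: the function takes a single string column.
    // SQL NULL is modeled here as Java null and is passed through unchanged.
    static String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return input.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Each "row" is processed independently of every other row.
        String[] rows = {"hive", "tez", null};
        for (String row : rows) {
            System.out.println(evaluate(row));
        }
    }
}
```

The point to notice is that evaluate never sees more than one row at a time, which is what separates a normal function from the table and aggregate functions.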
Table Functions (UDTF)
Table functions are similar to UDFs in that they take inputs from a single row, but they can output multiple columns AND multiple rows of data (which is pretty nifty). Examples of built-in table functions include
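The difference from a normal function is easiest to see in code. The following is a rough sketch in plain Java, not Hive code: the class SplitToRowsSketch and its process method are hypothetical names, and the comma delimiter is an assumption made for the example. In Hive itself, a table function is typically written by extending GenericUDTF, where each output row is handed back with forward rather than collected in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of UDTF semantics: one input row in, zero or more output
// rows out, each with more than one column.
// SplitToRowsSketch is a hypothetical name; this is not the Hive API itself.
class SplitToRowsSketch {
    // Takes one comma-delimited value from a single input row and returns one
    // output row per item. Each output row has two columns: (position, item).
    static List<Object[]> process(String input) {
        List<Object[]> outputRows = new ArrayList<>();
        if (input == null || input.isEmpty()) {
            return outputRows; // a table function may emit no rows at all
        }
        String[] items = input.split(",");
        for (int i = 0; i < items.length; i++) {
            outputRows.add(new Object[] {i, items[i]});
        }
        return outputRows;
    }

    public static void main(String[] args) {
        // A single input row fans out into several two-column output rows.
        for (Object[] row : process("a,b,c")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Returning a list keeps the sketch easy to test. A streaming design that emits rows one at a time, as Hive does, avoids holding every output row in memory.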
Aggregate Functions (UDAF)
Aggregate functions operate over many rows at once, either an entire table or each group produced by a GROUP BY, and reduce them to a single value. This sounds confusing, but it’s very useful in practice. Examples of built-in aggregate functions include
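Part of what makes aggregates feel confusing is that the rows are usually spread across several tasks, so each task builds a partial result and the partials are combined afterwards. Here is a minimal sketch of that idea in plain Java, computing an average. The class AverageSketch and its Buffer type are hypothetical, and the method names iterate, merge, and terminate are borrowed from the steps of Hive's UDAF evaluator API purely for illustration. This is not the Hive API itself.

```java
// Plain-Java sketch of UDAF semantics: many rows in, one value out, computed
// through partial results that can be merged.
// AverageSketch and Buffer are hypothetical names; this is not the Hive API.
class AverageSketch {
    // Partial aggregation state. An average cannot be merged directly,
    // so the sketch carries the running count and sum instead.
    static final class Buffer {
        long count;
        double sum;
    }

    // Fold one input row into a partial result. NULL values are skipped.
    static void iterate(Buffer buffer, Double value) {
        if (value != null) {
            buffer.count++;
            buffer.sum += value;
        }
    }

    // Combine a partial result from another task into this one.
    static void merge(Buffer buffer, Buffer other) {
        buffer.count += other.count;
        buffer.sum += other.sum;
    }

    // Produce the final value, or null when no rows were aggregated.
    static Double terminate(Buffer buffer) {
        return buffer.count == 0 ? null : buffer.sum / buffer.count;
    }

    public static void main(String[] args) {
        // Two buffers stand in for two tasks that each saw part of the table.
        Buffer first = new Buffer();
        Buffer second = new Buffer();
        iterate(first, 1.0);
        iterate(first, 2.0);
        iterate(second, 6.0);
        merge(first, second);
        System.out.println(terminate(first));
    }
}
```

Keeping count and sum in the buffer, rather than a running average, is the design choice that makes merge correct no matter how the rows were split between tasks.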