Adding JARs to Hive without using ADD JAR


Say you've built a library you want to use in Hive, or even in Hadoop. If the library is a UDF for use in Hive queries, you can load it like this:

ADD JAR 's3n://matthewsbucket/superudf.jar';

CREATE TEMPORARY FUNCTION super as 'com.matthewrathbone.SuperFunction';

If you're creating a lot of these, you don't want to run ADD JAR every single time you need a function. You want the JAR to be in Hive's library path already.

To do that, put the JAR in hive/lib or hadoop/lib on every node. If you're using Elastic MapReduce, you can do this in a bootstrap script:

sudo apt-get install -y wget

wget -O /home/hadoop/lib/super.jar http://somewhere.com/superudf.jar
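The two commands above can be wrapped into a slightly more defensive bootstrap script. This is a minimal sketch, not the author's actual script: the install_udf_jar function name is hypothetical, and it assumes (as the post does) that /home/hadoop/lib is already on the Hive/Hadoop classpath of your EMR image and that the node is Debian-based so apt-get is available. It installs wget only when it's missing, creates the target directory, and stops on the first failing command.

```shell
#!/usr/bin/env bash
# Sketch of an EMR bootstrap action that drops a UDF JAR into a lib directory.
# Assumptions: it runs on every node, and the destination directory is one
# that Hive/Hadoop already put on their classpath (the post uses /home/hadoop/lib).
set -euo pipefail

# Hypothetical helper: download the JAR at URL into DEST_DIR and print the
# path it was saved to.
install_udf_jar() {
  local url="$1" dest_dir="$2" dest
  dest="$dest_dir/$(basename "$url")"

  # Only install wget if it is missing (apt-get assumes a Debian-based image).
  if ! command -v wget >/dev/null 2>&1; then
    sudo apt-get install -y wget
  fi

  mkdir -p "$dest_dir"
  # Capital -O sets the output file; lowercase -o would only write wget's log there.
  wget -q -O "$dest" "$url"
  echo "$dest"
}

# In the real bootstrap action, finish with a call such as:
#   install_udf_jar http://somewhere.com/superudf.jar /home/hadoop/lib
```

This keeps the file's original name (superudf.jar here) instead of renaming it; as far as Hive is concerned the name shouldn't matter, as long as the file ends in .jar and sits in a directory on the classpath.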

Now you can skip the ADD JAR step when creating the function (which is also much faster, because Hive no longer has to fetch the JAR first):

CREATE TEMPORARY FUNCTION super as 'com.matthewrathbone.SuperFunction';


Matthew Rathbone

CEO of Beekeeper Data. British. Data Nerd. Lucky husband and father.
