Adding JARs to Hive without using ADD JAR

Hire me to supercharge your Hadoop and Spark projects

I help businesses improve their return on investment from big data projects. I do everything from software architecture to staff training. Learn More

Say you’ve built a library that you want to use in Hive, or even in Hadoop more generally. If the library contains a UDF for use in Hive queries, you can load it like this:

ADD JAR s3n://matthewsbucket/superudf.jar;

CREATE TEMPORARY FUNCTION super AS 'com.matthewrathbone.SuperFunction';

If you’re creating a bunch of these, you don’t want to have to ADD JAR every single time you need a function. You want the JAR to be on Hive’s classpath already.

To do that, put the JAR in either hive/lib or hadoop/lib on all the nodes. If you’re using Elastic MapReduce, you can do this in a bootstrap script:

sudo apt-get install -y wget

wget -O /home/hadoop/lib/super.jar http://somewhere.com/superudf.jar
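For a slightly more defensive version, the download can be wrapped in a small function. This is only a sketch under a few assumptions: the function name install_jar is made up for illustration, wget is assumed to be installed already, and the download-to-a-temp-file-then-move step is an addition of mine so that a failed download never leaves a half-written JAR in the lib directory. The URL and destination path are the ones from the example above and are passed in as arguments.

```shell
#!/bin/bash
# Sketch of a bootstrap script that installs a UDF JAR into a lib directory.
# install_jar is a hypothetical helper name, not part of Hive or EMR.
set -eu

# install_jar URL LIB_DIR FILE_NAME
# Downloads URL into LIB_DIR/FILE_NAME, going through a temp file first.
install_jar() {
  local url="$1"
  local lib_dir="$2"
  local name="$3"
  local tmp

  mkdir -p "$lib_dir"
  tmp="$(mktemp "$lib_dir/.$name.XXXXXX")"

  # Capital -O names the output file; lowercase -o would name wget's log file.
  if wget -q -O "$tmp" "$url"; then
    chmod 644 "$tmp"            # mktemp creates the file readable by owner only
    mv "$tmp" "$lib_dir/$name"  # move into place only after a complete download
  else
    rm -f "$tmp"                # clean up the partial file on failure
    return 1
  fi
}

# Run only when a URL is given as the first argument.
if [ "$#" -gt 0 ]; then
  install_jar "$1" "${2:-/home/hadoop/lib}" "${3:-super.jar}"
fi
```

Saved under a name of your choosing (say, install_udf_jar.sh), it could be invoked as: bash install_udf_jar.sh http://somewhere.com/superudf.jar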

Now you can skip the ADD JAR step when creating the function (which is much faster, by the way):

CREATE TEMPORARY FUNCTION super AS 'com.matthewrathbone.SuperFunction';


Matthew Rathbone

Consultant Big Data Infrastructure Engineer at Rathbone Labs. British. Data Nerd. Lucky husband and father.
