Adding JARs to Hive without using ADD JAR

Say you’ve built a library that you want to use in Hive, or even in Hadoop. If the library contains a UDF for use in Hive queries, you can load it like this:

ADD JAR s3n://matthewsbucket/superudf.jar;

CREATE TEMPORARY FUNCTION super AS 'com.matthewrathbone.SuperFunction';

If you’re creating a bunch of these, you don’t want to have to ADD JAR every single time you want the function. You want the jar to be on the classpath already.

To do that, put the jar in either hive/lib or hadoop/lib on all the nodes. If you’re using Elastic MapReduce, you can do this in a bootstrap script:

sudo apt-get install -y wget

wget -O /home/hadoop/lib/super.jar http://somewhere.com/superudf.jar
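If you have more than one jar to install, the two commands above can be wrapped in a small reusable bootstrap script. What follows is only a sketch under a few assumptions: the function names install_jar and jar_name are made up for illustration, the jar URL is passed as the first argument to the bootstrap action, and the file keeps the name it has in the URL (so superudf.jar rather than super.jar). It downloads to a temporary file first, so that a failed download does not leave a truncated jar in the lib directory.

```shell
#!/bin/sh
# Sketch of a bootstrap script that installs a jar into a lib directory.
# install_jar and jar_name are hypothetical helper names, not EMR features.
set -eu

# Derive the local file name from the last path segment of the URL.
jar_name() {
  basename "$1"
}

# Download to a temporary file, then move it into place.
# $1 = jar URL, $2 = destination directory
install_jar() {
  ij_url="$1"
  ij_dir="$2"
  ij_tmp="$(mktemp)"
  # Capital -O sets the output file; lowercase -o would set wget's log file.
  wget -q -O "$ij_tmp" "$ij_url"
  mkdir -p "$ij_dir"
  chmod 644 "$ij_tmp"
  mv "$ij_tmp" "$ij_dir/$(jar_name "$ij_url")"
}

# Usage: bootstrap.sh <jar-url> [lib-dir]
# The default lib directory is the one used in the example above.
if [ "$#" -ge 1 ]; then
  install_jar "$1" "${2:-/home/hadoop/lib}"
fi
```

Called as `bootstrap.sh http://somewhere.com/superudf.jar`, this would place the file at /home/hadoop/lib/superudf.jar on the node where it runs.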

Now you can skip the ADD JAR step when creating the function (which is much faster, by the way, since the jar no longer has to be fetched for each session):

CREATE TEMPORARY FUNCTION super AS 'com.matthewrathbone.SuperFunction';
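If Hive complains that it cannot find the class, one quick sanity check is to confirm that the jar on the node really contains it. The sketch below assumes the unzip command is available on the node; class_to_entry and jar_has_class are hypothetical helper names used only for this example.

```shell
#!/bin/sh
# Sketch: check whether a jar contains a given fully qualified class.
set -eu

# Convert a class name such as com.example.Foo to com/example/Foo.class.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeeds if the jar's file listing mentions the class file.
# $1 = path to jar, $2 = fully qualified class name
jar_has_class() {
  unzip -l "$1" | grep -q -F "$(class_to_entry "$2")"
}

# Usage: check_jar.sh <jar> <class>
if [ "$#" -eq 2 ]; then
  if jar_has_class "$1" "$2"; then
    echo "found $2 in $1"
  else
    echo "missing $2 in $1" >&2
    exit 1
  fi
fi
```

For the example in this post, that would be `check_jar.sh /home/hadoop/lib/super.jar com.matthewrathbone.SuperFunction`.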

Matthew Rathbone

CEO of Beekeeper Data. British. Data Nerd. Lucky husband and father.