Adding JARs to Hive without using ADD JAR
Say you’ve built a library you want to use in Hive, or even in Hadoop. If this library is a UDF for use in Hive queries, you can load it like this:
ADD JAR s3n://matthewsbucket/superudf.jar;
CREATE TEMPORARY FUNCTION super AS 'com.matthewrathbone.SuperFunction';
If you’re creating a bunch of these, you don’t want to run ADD JAR every single time you need the function. You want the JAR to be on Hive’s classpath already.
To do that, put the JAR in either hive/lib or hadoop/lib on all the nodes. If you’re using Elastic MapReduce, you can do this in a bootstrap script:
sudo apt-get install -y wget
# Capital -O sets the output file; lowercase -o would write wget's log there instead
wget -O /home/hadoop/lib/super.jar http://somewhere.com/superudf.jar
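If you want the bootstrap step to be a bit more defensive, here is a rough sketch of one way to do it. It downloads to a temporary file first and only moves the file into the lib directory if the download produced something, so a failed or empty download never ends up on the classpath. The function names place_jar and fetch_and_place_jar are made up for this example, and the URL and paths are the same placeholders used above; treat it as a starting point rather than a tested EMR script.

```shell
#!/bin/bash
# Hypothetical helpers for a bootstrap script (names are illustrative only).

# Move an already-downloaded file ($1) into lib directory ($2) under name ($3).
# Refuses to install a missing or empty file.
place_jar() {
  local src="$1" lib_dir="$2" name="$3"
  if [ ! -s "$src" ]; then
    echo "empty or missing download: $src" >&2
    return 1
  fi
  mkdir -p "$lib_dir" && mv "$src" "$lib_dir/$name"
}

# Download URL ($1) to a temp file, then install it via place_jar.
# Cleans up the temp file if either step fails.
fetch_and_place_jar() {
  local url="$1" lib_dir="$2" name="$3" tmp
  tmp="$(mktemp)" || return 1
  if wget -q -O "$tmp" "$url" && place_jar "$tmp" "$lib_dir" "$name"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}

# Example call (placeholder URL and path from the post):
# fetch_and_place_jar http://somewhere.com/superudf.jar /home/hadoop/lib super.jar
```

The split into two functions is just so the "move into place" part can be checked without a network connection.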
Now you can skip the ADD JAR step when creating the function (which is much faster, by the way):
CREATE TEMPORARY FUNCTION super AS 'com.matthewrathbone.SuperFunction';