Some fun Hadoop and Hive Bugs
If you’re running Hadoop 0.20 with Hive 0.7, here are a couple of bugs that are useful to know about:
NullPointerException
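If a query against an external partitioned table fails with a NullPointerException, you may have forgotten to recover the partitions before running the query. RECOVER PARTITIONS is the Amazon EMR syntax; stock Apache Hive uses MSCK REPAIR TABLE for the same purpose. On EMR: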
ALTER TABLE sample RECOVER PARTITIONS;
MR jobs hanging on 0/0 completed map tasks
Creating an external table that points to an empty location will cause Hive to generate MapReduce jobs that hang *forever*. The job reports 0/0 completed map tasks, so it stays at 0% complete and never finishes.
There is a Hadoop patch for this, provided you are able to patch your cluster, and the fix should already be included in Hadoop 0.21.
Bonus:
If you have delimited data (e.g., tab-delimited) in a Hive external table and you want to find all records where a particular string field is missing, you need to test for an empty string, not NULL:
SELECT * FROM events WHERE venue IS NULL;  -- won’t work
SELECT * FROM events WHERE venue = '';     -- will work
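The reason is easier to see with a toy model of how a delimited text row gets read. The Python sketch below is not Hive code, just a simplified illustration. It assumes the table uses Hive's default NULL marker for text files (the two characters \N) and that venue is a string column. The parse_row helper and the sample rows are made up for the example.

```python
# Simplified model of reading one row from a delimited text file.
# Assumption: the table uses Hive's default NULL marker, the two characters \N.
NULL_MARKER = r"\N"


def parse_row(line, delimiter="\t"):
    """Split one delimited line; only the NULL marker becomes None."""
    fields = line.rstrip("\n").split(delimiter)
    return [None if field == NULL_MARKER else field for field in fields]


# Hypothetical rows for an events table with columns (id, venue, city).
missing_venue = parse_row("1\t\tLondon\n")    # venue field left empty
explicit_null = parse_row("2\t\\N\tParis\n")  # venue field written as \N

print(missing_venue)  # venue is the empty string, not None
print(explicit_null)  # venue is None
```

An empty field between two delimiters comes back as an empty string, so only the venue = '' test matches it. If the data might contain both empty fields and explicit \N markers, venue IS NULL OR venue = '' covers both cases.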