tip.top.blog

the blog of matthew rathbone


Some fun Hadoop and Hive Bugs

If you’re running Hadoop 0.20 with Hive 0.7, here are a couple of bugs that are useful to know about:

NullPointerException

If you get a NullPointerException when querying an external partitioned table, you may have forgotten to recover the partitions before running the query:

ALTER TABLE sample RECOVER PARTITIONS;
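RECOVER PARTITIONS may not be available in every Hive build (I believe it is a distribution-specific extension, so treat that as an assumption). If yours lacks it, each partition directory can be registered by hand with ALTER TABLE ... ADD PARTITION. Here is a minimal Python sketch that builds those statements from Hive-style key=value directory names. The helper name, the bucket path and the dt/country partition columns are all hypothetical.

```python
from typing import Iterable, List


def add_partition_statements(table: str, base_location: str,
                             partition_dirs: Iterable[str]) -> List[str]:
    """Build one ALTER TABLE ... ADD PARTITION statement per directory.

    Each entry in partition_dirs is a relative path in Hive's key=value
    layout, such as "dt=2011-01-01" or "dt=2011-01-01/country=us".
    """
    statements = []
    for rel in partition_dirs:
        rel = rel.strip("/")
        # Turn "k1=v1/k2=v2" into ["k1='v1'", "k2='v2'"]
        specs = []
        for part in rel.split("/"):
            key, sep, value = part.partition("=")
            if not sep or not key:
                raise ValueError(f"not a key=value partition directory: {part!r}")
            specs.append(f"{key}='{value}'")
        location = base_location.rstrip("/") + "/" + rel
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION ({', '.join(specs)}) "
            f"LOCATION '{location}';"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table location and partition directories
    for stmt in add_partition_statements(
        "sample", "s3://my-bucket/sample/", ["dt=2011-01-01", "dt=2011-01-02"]
    ):
        print(stmt)
```

This only formats strings; it doesn’t talk to the metastore, so you’d still paste the output into the Hive CLI.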

MR jobs hanging on 0/0 completed map tasks

Querying an external table that points to an empty location will cause Hive to generate MapReduce jobs that hang *forever*. With no input there are no map tasks, so the job sits at 0% complete (0/0 completed) and never finishes.

There is a Hadoop patch for this (if you’re able to patch your cluster), and the fix should already be included in Hadoop 0.21.
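Until your cluster has the patch, one defensive option is to check that a table’s location actually contains data before querying it. The sketch below uses a local directory as a stand-in for the HDFS or S3 path (on a real cluster you would list the path with hadoop fs -ls or Hadoop’s FileSystem API instead), and location_has_data is a hypothetical helper. It skips names starting with _ or ., which Hadoop’s FileInputFormat treats as hidden, so a lone _SUCCESS marker doesn’t count as input.

```python
import os


def location_has_data(path: str) -> bool:
    """Return True if any visible, non-empty file exists under path."""
    for root, dirs, files in os.walk(path):
        # Don't descend into hidden directories (names starting with _ or .)
        dirs[:] = [d for d in dirs if not d.startswith(("_", "."))]
        for name in files:
            if name.startswith(("_", ".")):
                continue  # e.g. _SUCCESS markers are not treated as input
            full = os.path.join(root, name)
            if os.path.isfile(full) and os.path.getsize(full) > 0:
                return True
    # Also reached when path doesn't exist: os.walk yields nothing
    return False


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    if location_has_data(target):
        print(f"{target}: has input data")
    else:
        print(f"{target}: empty, a query here may hang at 0/0 map tasks")
```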

Bonus:

If you have delimited data (e.g., tab-delimited) in a Hive external table and you want to find all records where a particular string field is empty, you need to test for the empty string, not NULL, because an empty field in a text file is read as an empty string:

select * from events where venue IS NULL;   -- won’t work

select * from events where venue = '';   -- will work
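To see why the IS NULL test misses these rows, here is a small Python sketch that mimics how a string column is read from a tab-delimited text file. It assumes Hive’s default text-format NULL marker, the literal \N: only that marker becomes NULL, while an empty field stays an empty string. The events rows and venue column below are made-up sample data.

```python
from typing import List, Optional

# Assumed: Hive's default NULL marker for delimited text files is the literal \N
NULL_MARKER = r"\N"


def parse_row(line: str, sep: str = "\t") -> List[Optional[str]]:
    """Split one delimited line into string fields.

    An empty field stays "" and only the NULL marker becomes None (NULL).
    """
    return [None if field == NULL_MARKER else field
            for field in line.rstrip("\n").split(sep)]


# Hypothetical events data: id <TAB> venue
lines = ["1\tMSG\n", "2\t\n", "3\t\\N\n"]
rows = [parse_row(line) for line in lines]

# venue IS NULL only matches the explicit \N row
print([row for row in rows if row[1] is None])  # [['3', None]]
# venue = '' matches the row whose venue field was simply left empty
print([row for row in rows if row[1] == ""])    # [['2', '']]
```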