Some fun Hadoop and Hive Bugs

If you’re running Hadoop 0.20 with Hive 0.7, here are a couple of bugs worth knowing about:

NullPointerException

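If you hit this on an external partitioned table, you probably forgot to recover the partitions before running the query. Hive only sees partitions that are registered in its metastore, so data files sitting in the table location are not enough. (RECOVER PARTITIONS is the Amazon EMR syntax; stock Apache Hive uses MSCK REPAIR TABLE for the same job.) Recover them first: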

ALTER TABLE sample RECOVER PARTITIONS;

MR jobs hanging on 0/0 completed map tasks

Creating an external table that points to an empty location will cause Hive to generate MapReduce jobs that hang *forever*. With no input there are no map tasks to run, so the job stays at 0% complete (0/0 completed map tasks) and never finishes.

There is a Hadoop patch for this (provided you are able to patch your cluster), and it should already be integrated into Hadoop 0.21.

Bonus:

If you have delimited data (e.g., tab-delimited) in a Hive external table and you want to find all records where a particular string field is missing, you need to test for the empty string, not NULL:

select * from events where venue IS NULL;   <= Won’t work

select * from events where venue = "";   <= Will work
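
To see why the empty-string test is the one that matches, here is a minimal Python sketch of what a delimited row looks like once it is split into string fields. It is only an approximation of how Hive's default text format treats string columns, not Hive's own code. The `parse_row` helper and the sample rows are hypothetical, and the `\N` marker is assumed to be the NULL marker in use (it is Hive's default for text files, but it can be changed per table).

```python
# Sketch: how a tab-delimited row looks once split into string fields.
# Assumption: NULL is written as the literal two characters \N
# (Hive's default text-format marker); anything else is a real string.
HIVE_NULL_MARKER = r"\N"


def parse_row(line, delimiter="\t"):
    """Split one delimited line; only the explicit marker becomes None."""
    fields = line.rstrip("\n").split(delimiter)
    return [None if field == HIVE_NULL_MARKER else field for field in fields]


# Hypothetical rows for an `events` table with columns (id, venue, date).
missing_venue = parse_row("1\t\t2011-06-01\n")    # venue left blank
explicit_null = parse_row("2\t\\N\t2011-06-02\n")  # venue written as \N

print(missing_venue)  # the blank venue comes back as '' rather than None
print(explicit_null)  # only the explicit marker comes back as None
```

Two adjacent delimiters produce an empty string, not a missing value, so a blank venue only matches the empty-string comparison. A query that has to catch both cases can combine the two tests with OR.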

Matthew Rathbone

CEO of Beekeeper Data. British. Data Nerd. Lucky husband and father.