Some fun Hadoop and Hive Bugs

By Matthew Rathbone on September 20, 2011

If you’re running Hadoop 0.20 with Hive 0.7, here are a couple of bugs worth knowing about:

NullPointerException

If a query against an external partitioned table fails with a NullPointerException, you may have forgotten to recover the partitions before running it:

ALTER TABLE sample RECOVER PARTITIONS;
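For reference, RECOVER PARTITIONS is the form used by Amazon EMR's build of Hive. On a stock Apache Hive install the usual equivalent is MSCK REPAIR TABLE, and a single partition can also be registered by hand. A rough sketch, assuming the table is partitioned by a hypothetical dt column and the data sits at a made-up HDFS path:

```sql
-- Apache Hive equivalent of RECOVER PARTITIONS
-- (availability depends on your Hive version)
MSCK REPAIR TABLE sample;

-- Or register one partition explicitly.
-- The dt column and the location are hypothetical, for illustration only.
ALTER TABLE sample ADD PARTITION (dt = '2011-09-20')
LOCATION '/data/sample/dt=2011-09-20';

-- Check which partitions the metastore now knows about
SHOW PARTITIONS sample;
```

If SHOW PARTITIONS comes back empty for a table that has data on disk, the metastore has not been told about the partitions yet.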

MR jobs hanging on 0/0 completed map tasks

Creating an external table that points to an empty location causes Hive to generate MapReduce jobs that hang *forever*, because the map tasks stay at 0% complete (0/0 completed).

There is a Hadoop patch for this (provided you are able to patch your cluster), and the fix should already be included in Hadoop 0.21.

Bonus:

If you have some sort of delimited data (e.g., tab-delimited) in a Hive external table, and you want to find all records where a particular string field is missing, you need to test for the empty string rather than NULL, because an empty field between two delimiters is read as an empty string:

select * from events where venue IS NULL  <= Won’t work

select * from events where venue = ''  <= Will work
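If the data might contain both real NULLs and empty strings, a query can test for both. This is only a sketch against the events table above. The second statement relies on the serialization.null.format SerDe property, which Hive's default delimited-text SerDe should honour; treat that as an assumption and check it against your Hive version before relying on it.

```sql
-- Match rows where venue is either NULL or an empty string
SELECT * FROM events WHERE venue IS NULL OR venue = '';

-- Optionally ask the SerDe to read empty fields as NULL,
-- so that IS NULL alone matches them
-- (assumed to be supported by your Hive version)
ALTER TABLE events SET SERDEPROPERTIES ('serialization.null.format' = '');
```

The first query is the safer choice, since it changes nothing about how the table is read.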
