NullPointerException when performing a join with Hadoop Hive 0.50

Hire me to supercharge your Hadoop and Spark projects

I help businesses improve their return on investment from big data projects. I do everything from software architecture to staff training. Learn More

If you get a NullPointerException when joining two tables in Hadoop Hive, the problem may be that the join key value in one of the two tables is “” (an empty string).

For example, if you’re running this query:

select users.id, locations.address from users left outer join locations on users.location_id = locations.id;

and users.location_id happens to be “” somewhere, then you will get this error.

(Sometimes I’ve even had it happen because another, non-join column was “”.)
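
To check whether blank join keys are actually present before changing anything, a quick count can help. This is only a sketch against the users table from the example query; it checks the join column only, so a blank non-join column would need its own check.

```sql
-- Count the rows in users whose join key is an empty string.
select count(1) from users where location_id = '';
```

A non-zero count means the join above will hit at least one blank key.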

An easy workaround is to create a temporary table holding only the users whose location_id isn’t “”, with only the columns absolutely needed to process the job, and to join against that table instead.
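
Here is a rough sketch of that workaround. The table name users_with_location and the string column types are my assumptions for illustration, so adjust them to match your own schema.

```sql
-- Hypothetical staging table: the name and the string column types are assumptions.
create table users_with_location (id string, location_id string);

-- Copy only the columns the join needs, skipping rows with a blank join key.
insert overwrite table users_with_location
select id, location_id
from users
where location_id <> '';

-- Run the original join against the filtered table instead of users.
select u.id, locations.address
from users_with_location u
left outer join locations on (u.location_id = locations.id);

-- Clean up once the job has finished.
drop table users_with_location;
```

Note that users with a blank (or null) location_id no longer appear in the result at all. If you need them, query them separately from the users table.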

This drove me crazy for hours today, so hopefully it won’t happen again.

Matthew Rathbone

Consultant Big Data Infrastructure Engineer at Rathbone Labs. British. Data Nerd. Lucky husband and father.
