NullPointerException when performing a join with Hadoop Hive 0.50
If you get a NullPointerException when joining two tables in Hadoop Hive, the problem may be that the join key in one of the two tables is “” (an empty string).
For example, if you’re running this query:
select users.id, locations.address from users left outer join locations on users.location_id = locations.id;
and users.location_id happens to be “” somewhere, then you will get this error.
(Sometimes I’ve even had it happen because another, non-join column was “”.)
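A quick way to check whether this is what you're hitting is to count the rows where the join key is an empty string. This is only a sketch against the users and locations tables from the query above; swap in your own table and column names.

```sql
-- Count rows on each side of the join whose key is an empty string.
-- A non-zero count means the table is a candidate for the problem described here.
SELECT COUNT(1) FROM users WHERE location_id = '';
SELECT COUNT(1) FROM locations WHERE id = '';
```

If the non-join columns are the suspects, the same kind of check can be run against each of them.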
An easy workaround is to create a temporary table holding only the users whose location_id isn’t “”, with only the columns absolutely needed to process the job, and then join against that table instead.
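Here is roughly what that workaround could look like. Treat it as a sketch: the table name users_with_location is made up for this example, and it assumes your Hive version supports CREATE TABLE ... AS SELECT. It uses an ordinary table that gets dropped at the end rather than any special temporary-table feature.

```sql
-- Hypothetical staging table: only users with a non-empty join key,
-- and only the columns the join actually needs.
CREATE TABLE users_with_location AS
SELECT id, location_id
FROM users
WHERE location_id <> '';

-- Same join as before, but against the filtered table.
SELECT u.id, l.address
FROM users_with_location u
LEFT OUTER JOIN locations l ON u.location_id = l.id;

-- Clean up the staging table once the job is done.
DROP TABLE users_with_location;
```

One thing to keep in mind: the original left outer join would have returned the users with an empty location_id (with a NULL address), and this version drops them. The filter also drops users whose location_id is NULL. If you need those rows in the result, you'll have to add them back separately.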
This drove me crazy for hours today, so hopefully it won’t happen again.