15+ Great Books for Hadoop

Hire me to supercharge your Hadoop and Spark projects

I help businesses improve their return on investment from big data projects. I do everything from software architecture to staff training. Learn More

If you want to learn more about Hadoop, there are many resources at your disposal, and books are one of them. I keep a private list of Hadoop books, so I thought I’d put it online to save other people from having to do the same research.

Feb 22nd 2014 - Updated

3 new books added to the list!

Books for Hadoop & Map Reduce

  • Hadoop: The Definitive Guide by Tom White

    The Definitive Guide is in some ways the ‘Hadoop bible’ and can be an excellent reference when working on Hadoop, but do not expect it to provide a simple getting-started tutorial for writing a Map Reduce job. This book is great for really understanding how everything works and how all the systems fit together.

  • Hadoop Operations by Eric Sammer

    This is the book if you need to know the ins and outs of prototyping, deploying, configuring, optimizing, and tweaking a production Hadoop system. Eric Sammer is a very knowledgeable engineer, so this book is chock full of goodies.

  • MapReduce Design Patterns by Donald Miner and Adam Shook

    Design Patterns is a great resource to get some insight into how to do non-trivial things with Hadoop. This book goes into useful detail on how to design specific types of algorithms, outlines why they should be designed that way, and provides examples.

  • Hadoop in Action by Chuck Lam

    Published by Manning rather than O’Reilly, Hadoop in Action is similar to The Definitive Guide in that it provides a good reference for what Hadoop is and how to use it. It seems to offer a gentler introduction to Hadoop than the other books on this list.

  • Hadoop in Practice by Alex Holmes

    A slightly more advanced guide to running Hadoop. It includes chapters that detail how to best move data around, how to think in Map Reduce, and (importantly) how to debug and optimize your jobs.

  • Pro Hadoop by Jason Venner

    This Apress book claims it will guide you through the initial Hadoop setup while also helping you avoid many of the pitfalls that Hadoop novices usually encounter. Again, it is similar in content to Hadoop in Action and The Definitive Guide.

  • Hadoop Essentials: A Quantitative Approach by Henry Liu

    Another introductory Hadoop book, Hadoop Essentials focuses on providing a more practical introduction to Hadoop, which seems ideal for a CS classroom setting.

  • Hadoop Real-World Solutions Cookbook by Jonathan Owens, Brian Femiano & Jon Lentz

    A book that aims to provide real-world examples of common Hadoop problems. It also covers building integrated solutions using surrounding tools (Hive, Pig, Giraph, etc.).

  • Hadoop MapReduce Cookbook by Srinath Perera and Thilina Gunarathne

    The cookbook provides an introduction to installing and configuring Hadoop, along with ‘more than 50 ready-to-use Hadoop MapReduce recipes’.

  • Enterprise Data Workflows with Cascading by Paco Nathan

    Released in July 2013, this book promises to guide readers through writing and testing Cascading-based workflows. This is one of the few books written about higher-level Map Reduce frameworks, so I’m excited to give it a read.

  • Apache Hadoop YARN by Arun Murthy et al.

    A front-to-back guide to YARN, the next-generation resource management layer for Hadoop. The book is co-written by the founder of the YARN project and its project lead.

  • Instant MapReduce Patterns by Srinath Perera

    This book is built around seven Map Reduce ‘recipes’ to learn from. It aims to be a concise, practical guide to get you coding.

Bonus

  • Agile Data Science by Russell Jurney

    Russell introduces his own version of an agile tool-set for data analysis and exploration. The book covers both investigative tools (like Apache Pig) and visualization tools (like D3). His pitch is pretty compelling.

That’s It

There are many, many books on more general topics such as big data, data science, and analytics, but I think I’ve covered the main books that focus specifically on Hadoop and related projects. Please email me or tweet me if I’ve missed anything!


Matthew Rathbone

Consultant Big Data Infrastructure Engineer at Rathbone Labs. British. Data Nerd. Lucky husband and father.

