Recent Articles

Should you use Parquet?

Parquet provides significant benefits for sparse reads of large datasets, but is it always the file format to use?

Beginners Guide to Columnar File Formats in Spark and Hadoop

File formats can be confusing, so let's delve into columnar file formats (like Parquet) and explain why they're different from regular row-based formats (like CSV, JSON, or Avro).

A Quick Guide to Concurrency in Scala

I'll talk through the basics of Threads, Akka, Futures, and Timers in this quick overview of concurrency for Scala. Great for those building apps in Scala.

4 Fun and Useful Things to Know about Scala's apply() functions

Scala's apply functions are commonly seen alongside case classes, but they can do so much more. Here are 4 fun ways they are used in Scala.

10+ Great Books and Resources for Learning and Perfecting Scala

While Scala is amazing, it has an overwhelming number of features. These books and online resources will help you learn and perfect Scala, whether you're coming from Java, Python, Ruby, or any other language.

10+ Great Books for Apache Spark

Apache Spark is a powerful technology with some fantastic books. I'll help you choose which book to buy with my guide to the top 10+ Spark books on the market.