Spark

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

My Take: When it comes to doing analytics and data science at scale, it seems like you can either have...

Choosing Your Data Science Architecture

We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...

What’s a Hadoop, Anyway?

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...

Understanding the Core of Hadoop: the MapReduce Algorithm

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...