Startups

Introducing ETL Markup Toolkit (EMT)

TL;DR – I developed an open source toolkit for writing Spark-native ETL using configurations in a highly sub-scriptable and transparent...

On Real Data Science and the Future of the Business Analyst

What is “real data science” anyway? tl;dr: most data scientists at Facebook are business analysts, and that’s perfectly fine. One...

Data Science and Product Management are like Chocolate and Peanut Butter

“Hey, you got chocolate in my peanut butter!” “You got peanut butter in my chocolate!” “Delicious!” So goes the old...

Choosing Your Data Science Architecture

We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...

What’s a Hadoop, Anyway?

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...

Understanding the Core of Hadoop: the MapReduce Algorithm

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...

Structuring Your Data Science Workflow

Now What? Congratulations, you’ve successfully recruited and hired a few data scientists, positioned them in the right place in your...

Academic Profiles for Data Science

If you’re interested in Data Science (DS) as a field and have read enough job postings, you start to pick...

Lessons to be Learned from the Most Recent Facebook Hack

A few days ago, Orange Tsai, who works with Devcore, a group of Taiwanese hackers (in the white-hat sense), posted...

Placing Data Scientists Within Your Organization

There are a number of different ways in which Data Science (DS) teams can be structured, but if your organization...