Data

Introducing ETL Markup Toolkit (EMT)

TL;DR – I developed an open source toolkit for writing Spark-native ETL using configurations in a highly sub-scriptable and transparent...

On Real Data Science and the Future of the Business Analyst

What is “real data science” anyway? tl;dr: most data scientists at Facebook are business analysts, and that’s perfectly fine. One...

Data Science and Product Management are like Chocolate and Peanut Butter

“Hey, you got chocolate in my peanut butter!” “You got peanut butter in my chocolate!” “Delicious!” So goes the old...

Using Machine Learning to Classify the Quality of Wine

Using an open-source dataset, I’ve written up a Jupyter notebook below that explores the performance of several commonly used decision...

Recognizing the Limits of Visualization

Visualization (viz) is an incredibly hot topic in the business analytics/data science (DS) world right now. In every job description,...

Choosing Your Data Science Architecture

We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...

What’s a Hadoop, Anyway?

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...

Academic Profiles for Data Science

If you’re interested in Data Science (DS) as a field and have read enough job postings, you start to pick...

Measuring the Economics of Skyrim

Recently, I’ve started on a new play-through of Skyrim, the excellent 2011 game that is the most recent single-player...

Placing Data Scientists Within Your Organization

There are a number of different ways in which Data Science (DS) teams can be structured, but if your organization...