Introducing ETL Markup Toolkit (EMT)
TL;DR – I developed an open source toolkit for writing Spark-native ETL using configurations in a highly sub-scriptable and transparent...
What is “real data science” anyway? tl;dr: most data scientists at Facebook are business analysts, and that’s perfectly fine. One...
“Hey, you got chocolate in my peanut butter!” “You got peanut butter in my chocolate!” “Delicious!” So goes the old...
Using an open-source dataset, I’ve written up a Jupyter notebook below that explores the performance of several commonly used decision...
Visualization (viz) is an incredibly hot topic in the business analytics/data science (DS) world right now. In every job description,...
We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...
To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...
If you’re interested in Data Science (DS) as a field and have read enough job postings, you start to pick...
Recently, I’ve started on a new play-through of Skyrim, the excellent 2011 game that is the most recent single player...
There are a number of different ways in which Data Science (DS) teams can be structured, but if your organization...