Introducing ETL Markup Toolkit (EMT)
TL;DR – I developed an open source toolkit for writing Spark-native ETL using configurations in a highly sub-scriptable and transparent...
What is “real data science” anyway? tl;dr: most data scientists at Facebook are business analysts and that’s perfectly fine. One...
Using an open-source dataset, I’ve written up a Jupyter notebook below that explores the performance of several commonly used decision...
Visualization (viz) is an incredibly hot topic in the business analytics/data science (DS) world right now. In every job description,...
We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...
To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...
To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...
Now What? Congratulations, you’ve successfully recruited and hired a few data scientists, positioned them in the right place in your...
If you’re interested in Data Science (DS) as a field and have read enough job postings, you start to pick...
Recently, I’ve started on a new play-through of Skyrim, the excellent 2011 game that is the most recent single-player...