Introducing ETL Markup Toolkit (EMT)
TL;DR – I developed an open source toolkit for writing Spark-native ETL using configurations in a highly sub-scriptable and transparent...
What is “real data science” anyway? tl;dr: most data scientists at Facebook are business analysts and that’s perfectly fine One...
There’s a term in engineering called a “1% Solution”. 1% solutions solve a problem that only 1% of the population...
We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...
To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...
To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...
Now What? Congratulations, you’ve successfully recruited and hired a few data scientists, positioned them in the right place in your...
If you’re interested in Data Science (DS) as a field and have read enough job postings, you start to pick...
Recently, I’ve started on a new play-through of Skyrim, the excellent 2011 game that is the most recent single-player...
A few days ago, Orange Tsai, who works with Devcore, a group of Taiwanese hackers (in the white-hat sense), posted...