Introducing ETL Markup Toolkit (EMT)
TL;DR – I developed an open source toolkit for writing Spark-native ETL using configurations in a highly sub-scriptable and transparent...
What is “real data science” anyway? tl;dr: most data scientists at Facebook are business analysts, and that’s perfectly fine. One...
“Hey, you got chocolate in my peanut butter!” “You got peanut butter in my chocolate!” “Delicious!” So goes the old...
We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...
To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...
Now What? Congratulations, you’ve successfully recruited and hired a few data scientists, positioned them in the right place in your...
If you’re interested in Data Science (DS) as a field and have read enough job postings, you start to pick...
A few days ago, Orange Tsai, who works with Devcore, a group of Taiwanese hackers (in the white-hat sense), posted...
There are a number of different ways in which Data Science (DS) teams can be structured, but if your organization...