Technology

Introducing ETL Markup Toolkit (EMT)

TL;DR – I developed an open source toolkit for writing Spark-native ETL using configurations in a highly sub-scriptable and transparent...

On Real Data Science and the Future of the Business Analyst

What is “real data science” anyway? tl;dr: most data scientists at Facebook are business analysts and that’s perfectly fine. One...

Gear Review: Happy Hacking Keyboard Professional 2 (HHKB2)

There’s a term in engineering called a “1% Solution”. 1% solutions solve a problem that only 1% of the population...

Choosing Your Data Science Architecture

We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...

What’s a Hadoop, Anyway?

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...

Understanding the Core of Hadoop: the MapReduce Algorithm

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...

Structuring Your Data Science Workflow

Now What? Congratulations, you’ve successfully recruited and hired a few data scientists, positioned them in the right place in your...

Academic Profiles for Data Science

If you’re interested in Data Science (DS) as a field and have read enough job postings, you start to pick...

Measuring the Economics of Skyrim

Recently, I’ve started on a new play-through of Skyrim, the excellent 2011 game that is the most recent single player...

Lessons to be Learned from the Most Recent Facebook Hack

A few days ago, Orange Tsai, who works with Devcore, a group of Taiwanese hackers (in the white-hat sense), posted...