Data science

Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department


My Take StitchFix is one of those companies where the algorithm is the product. Anyone can sell you clothes, but...

An overview of proxy-label approaches for semi-supervised learning

My Take The relative expense and unavailability of labelled datasets is a major detractor from the utility of supervised learning...

Using Machine Learning to Classify the Quality of Wine

Using an open-source dataset, I’ve written up a Jupyter notebook below that explores the performance of several commonly used decision...

Choosing Your Data Science Architecture

We data people love our architecture. We obsess over it. Every time Apache announces a new top-level project, we fawn...

What’s a Hadoop, Anyway?

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...

Understanding the Core of Hadoop: the MapReduce Algorithm

To Hadoop and Beyond is a series dedicated to exploring the basics of distributed computing as it stands today, and to...

Structuring Your Data Science Workflow

Now What? Congratulations, you’ve successfully recruited and hired a few data scientists, positioned them in the right place in your...

Academic Profiles for Data Science

If you’re interested in Data Science (DS) as a field and have read enough job postings, you start to pick...

The Different Types of Data Scientists

Previously, we established that the definition of the term Data Science requires you to understand the difference between the key...

What is Data Science Anyway?

One of the biggest enduring problems with the way that Data Science (DS) is managed within institutions of all sizes...