Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department

My Take

StitchFix is one of those companies where the algorithm is the product. Anyone can sell you clothes, but only StitchFix can sell you clothes with their (very good) algorithm. So it should come as no surprise that they’ve got the pipeline to production for their algorithms down pat. One of the persistent questions in the data science field is “Should Data Scientists be Engineers?” At a functional level, the reason they SHOULD be engineers is obvious: it gets things done. However, there’s an argument to be made that comparative advantage dictates that those responsibilities should be specialized, i.e., that they should NOT be engineers.

This piece answers the question in the affirmative: Data Scientists should be engineers, because writing ETL and pipelines for data science workflows is pretty thankless work from an engineer’s perspective. The authors believe that any comparative advantage is outweighed by the ability of Data Science teams to act independently and with ownership.

Their Take

If you read the recruiting propaganda of data science and algorithm development departments in the valley, you might be convinced that the relationship between data scientists and engineers is highly collaborative, organic, and creative. Just like peas and carrots.

However, it’s not a well kept secret that this is seldom the case. Most shops foster a relationship between engineers and scientists that lies somewhere in the spectrum between non-existent and highly dysfunctional.

https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
