What is Data Science Anyway?
One of the most enduring problems with the way that Data Science (DS) is managed within institutions of all sizes has to do with how it’s defined. If you think about it, the term “Data Science” has no intrinsic meaning. That is, the phrase itself conveys nothing about what the work actually involves. As a result, the absence of a rigorous definition for what Data Science entails leads to inconsistent or unachievable expectations – if you don’t know what you’re trying to do, the default might just be “everything”.
The simple expansion of the term “Data Science” indicates that its practitioners use data to “do science”. What does that even mean? Contrast that with, say, “Database Engineering” or “Predictive Analytics”. In each of these terms, the meaning is encapsulated within the phrase itself. Database Engineering involves building applications that rely on databases, and Predictive Analytics uses analytical methods to make predictions.
On the other hand, the most that might be said for the term “Data Science” is that its practitioners apply scientific methods to the use of data. Sounds good, but there’s nothing particularly special about that – any informed and prudent inquiry requires the scientific method, or else its results may not be accurate or generally applicable.
What is Data Science?
Let’s think of Data Science as the unified product of a framework. A framework is a basic structure that underlies a larger concept. In this case, we can identify the components of “Data Science” and see how they combine to form a practice in its own right. It’s similar to how a typical “Finance” practice has many different aspects – accountants, financial analysts, bankers, cash managers, and so on.
In the same way, there are multiple roles or skillsets that must be filled to have a functioning DS practice – think of these as the specialties, or flavors, of the individuals who make up your practice, if you must call them Data Scientists (nobody in finance has the actual job title of “Financier”, except maybe in Hollywood). In no particular order…
Predictive Analytics
One component of the Data Science framework is Predictive Analytics. Predictive Analytics involves the use of statistical methods to model the relationships within historical datasets in order to make predictions about future outcomes. Traditionally, the actuarial field has used these techniques heavily. However, as such analytical methods have become more accessible, their use has expanded into fields such as marketing, retail, travel, healthcare, and communications. Statistical techniques such as regression, some kinds of decision-tree modeling, and time series analysis fall into this category.
One important feature of predictive analytics is that its outputs are typically deterministic rather than stochastic (or probabilistic). A deterministic model, when given the same input multiple times in a row, produces the same result each time.
Machine Learning
Another component of the DS framework is Machine Learning. Predictive Analytics and Machine Learning are quite similar, so much so that people usually group them together as the same class of techniques. However, it is useful to treat them separately, because they have very different implications for business. Whereas predictive analytics typically produces models of a deterministic nature, machine learning may produce stochastic (or probabilistic) models, where the output is not guaranteed to be the same for the same input. In theory, repeated outputs for the same input should be similar in nature, but they are not guaranteed to be identical.
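To make the deterministic-versus-stochastic distinction concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a method prescribed by this article: the toy `history` data, the function names, and especially the use of bootstrap resampling as a stand-in for a model whose output can vary from run to run.

```python
import random

def fit_line(points):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict_deterministic(points, x):
    """Same data and same input always yield the same prediction."""
    intercept, slope = fit_line(points)
    return intercept + slope * x

def predict_stochastic(points, x, rng):
    """Fit on a random bootstrap resample, so repeated calls may disagree."""
    while True:
        sample = [rng.choice(points) for _ in points]
        # A line needs at least two distinct x values; redraw otherwise.
        if len({px for px, _ in sample}) > 1:
            break
    intercept, slope = fit_line(sample)
    return intercept + slope * x

# Hypothetical historical data: (period, revenue) pairs.
history = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]

rng = random.Random()  # unseeded on purpose: results vary between runs
det_runs = [predict_deterministic(history, 6) for _ in range(3)]
sto_runs = [predict_stochastic(history, 6, rng) for _ in range(3)]

print("deterministic:", det_runs)  # three identical values
print("stochastic:   ", sto_runs)  # values will usually differ
```

The stochastic predictions cluster around the deterministic one without matching it exactly, which is the "similar but not guaranteed to be the same" behavior described above. Seeding the generator (for example `random.Random(42)`) makes a stochastic procedure repeatable, which is one common way teams reconcile such models with audit requirements.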
This is an important distinction because, while a probabilistic model can have superior predictive power, there are business situations in which it is not appropriate. Credit modeling, for example, requires (by law) that a lender be able to explain exactly why a prospective borrower was denied a loan, which may not be possible when using some machine learning methods.
Business Intelligence
The final component of Data Science is Business Intelligence. This term is by now about as widely understood as the other two, but it does suffer from the same problem as the term “Data Science” in that its meaning is not neatly contained within itself. Alternative terms might be business analytics or management analytics. In short, Business Intelligence (or BI) involves the application of statistical techniques to answer questions about the state and trajectory of the business. Is revenue increasing? Will it keep doing so? Why is it behaving this way?
It’s worth noting that BI generally requires less of a technical touch than predictive analytics or machine learning do. This does not necessarily make it any less important.
That Last Bit of Je-Ne-Sais-Quoi
This is all well and good, but none of it alone is particularly dazzling, and Data Science as a field seems to have a peculiar shine to it. Some go so far as to equate the search for a “true” Data Scientist to searching for a “Unicorn”. It’s an interesting possibility, but it doesn’t play out in reality. Practitioners in the above three fields have been solving business problems with increasing competence for years, so clearly these people actually exist (unlike Unicorns).
I submit that the missing “secret ingredient” to Data Science is actually Interdisciplinary Communication. To the degree that this is a binary feature (it’s not, but let’s think of it as one for now), the ability to communicate across disciplines and competencies is incredibly important for technically minded people. The notion that business people should simply trust what their analysts are saying is outmoded – these days, we expect to be shown, and to understand at least at some level, the technical concepts with which we work.
Intuitively, a system is more robust and less prone to error when its actors know why they are doing the things that they are doing. The ability of technical analysts to explain to non-technical people how they achieved their result, and why it makes sense, is the “secret ingredient” to data science. These days, there are more and more educational programs that purport to teach data science. Many of them are instead teaching predictive analytics and/or machine learning, and it remains to be seen whether this core skill can be taught, or if it’s something that people have to develop on their own.
Applying the Framework in Management
Now that we understand what Data Science actually is, we can use that understanding to improve the way that we hire, manage, and develop data science talent. If you haven’t already, read some of the other pieces that I have written on the topic of managing data scientists, or get in touch if you’re interested in exchanging some ideas.