On Real Data Science and the Future of the Business Analyst

What is “real data science” anyway?

tl;dr: most data scientists at Facebook are business analysts and that’s perfectly fine

One of the things that most frustrates data scientists and others who work in and around the field of data science (DS) is how nebulous the contours of the field are. If you look at job descriptions for DS jobs you’ll find a broad variety of responsibilities and experience requirements. Sometimes these requirements don’t even make sense – imagine asking for 10 years of experience with Spark when Spark, as a framework, was first open sourced in 2010! (Yes, this has actually happened.)

The nebulousness of where DS starts is worth looking into. Our collective inability to understand what truly constitutes DS hampers advancement in the field and makes it harder to find sustainable and repeatable career paths for practitioners.

One of the most common things I hear about any given job is that it’s not “real data science”. I think Facebook is most commonly called out as hiring data scientists to do things that are not “real data science”. To that end, I’ve come to better understand what exactly Facebook is trying to achieve when they hire “data scientists”, and I think their doing so is understandable in the context of the industry at large.

What Facebook understands that we don’t

To wit – Facebook has come to understand that products and product managers are better when they are developed hand-in-hand with people who are comfortable with statistics and thinking algorithmically, even when those products are not themselves inherently algorithmic.

Think about the triad of core competencies that make up a good product manager – you’re looking for business acumen, soft skills, and technical competence. This is something of an iron triangle – trade-offs and constraints are a necessary part of the equation.

Given that product managers tend to have predictably unpredictable backgrounds, we can understand these constraints through that lens. Perhaps candidates with business acumen and soft skills come from a marketing background, or are fresh MBA grads. On the other hand, product managers with technical competence and soft-skills might come from a software engineering management background.

To the degree that any element of the iron triangle is more important than the others, business acumen is intuitively the most important. However, given that the iron triangle is exactly that, you have to be willing to accept some kind of trade-off. I submit that Facebook’s innovation in this field is to pair their product managers with resources that can supplement the technical competence component of the triangle, while preferring to hire product managers with stronger expressions of business acumen and soft skills.

The implication of this innovation is that these technical resources, whom they call “Data Scientists”, tend to focus substantially all of their time and energy on helping product managers think algorithmically, using sound statistical thinking. There are other people at Facebook who can spend their time implementing algorithms and designing new ones (the former comprises their “machine learning” group, and the latter their “Core Data Science” group, which admittedly is a bit confusing).

Environmental and Strategic Considerations

So, if the responsibilities of this product-focused position are substantially different from what a prototypical Data Scientist might do, why call it that? To some degree, I think this is explainable by looking at how the DS industry on the whole has changed, but we can also discuss this strategically from a game theoretic perspective, looking at labor as a marketplace.

The Ground Beneath Us

To the first point, it is undeniable that the volume of traditional DS-type work to be done over time has been constant or slightly increasing. There is no reason to believe that it has decreased, or that it will decrease at some point in the foreseeable future. On the other hand, the number of people called “Data Scientists” has mushroomed. This is because the title has, to some degree, subsumed titles such as “Statistician” and “Actuarial Analyst” (itself a title of ill repute – many people with actuarial certifications just happen to work at insurers, but do not actually use them for actuarial purposes).

However, the term “Data Science” is irreparably rooted in unrealistic expectations. Consider how we now tend to facetiously refer to a billion-dollar startup as a “Unicorn”. The term “Unicorn” implies scarcity and rarity, and the title of “Data Scientist” used to have the same connotation. However, commoditization is inevitable with this kind of thing (if you think about it, the modern tech industry is wholly built on commoditizing things that previously used to be rare and hard to get) – in this case, the professionalization of data science drives the expansion in scope and size of the population, much like an abundance of VC money has driven the glut of unicorns.

Psychologically, I also think that it is very convenient to have a single subsuming title for the wide variety of analytical, mathematical, and statistical things that we do. This is valuable for workers, of course – whereas previously we might have been “Analytics Professional in the Telecom Industry”, we are now “Data Scientist with domain experience in Telecom”, the implication being that the skills are largely transferable across industries, and that domain experience can be taught. Understanding analytical skills to be generalizable is of course a boon to employers as well, providing a larger pool of talent from which to recruit.

Enter the Business Analyst

From a game theoretic perspective, consider the strategic equilibrium for messaging job seekers in a recruitment context. Alvin Roth at Stanford characterizes a healthy marketplace as exhibiting thickness (having an abundance of participants on both sides), low congestion (little latency or noise in conveying intent), and safety (commitments made in the marketplace are likely to be honored). Facebook’s approach to data science is primarily focused on bolstering the thickness of the market.

First, by using data science in an industry agnostic fashion they are able to expand the world of candidates beyond just that strictly defined in the scope of the role – in other words, a data scientist for an ad product need not have prior domain experience.

Facebook’s definition of DS within a product development context also helps to bolster thickness. Traditionally, providing analytical support for product managers is either outsourced to a centralized BI function or delegated to an embedded business analyst. In some (fairly antiquated) waterfall methods of software development, the business analyst is even responsible for eliciting and documenting product requirements.

The problem is that in most waterfall models there’s not a huge amount of upward mobility for business analysts. Ideation comes from the top down and execution is delegated to business managers, rather than a product function. As a result, I think you see specialization pressures for business analysts – they might learn to code and become engineers, sharpen their soft skills and go into management, or leave the organization and join a consulting firm.

From Facebook’s perspective, business analysts looking for upward mobility make a ton of sense for filling their product manager support function. They are already relatively good at working with technology and understanding business requirements, with strong analytical skills. Like peanut butter and chocolate, this is a great pairing with a product manager who has a strong vision and excellent business acumen and soft skills!

Finally, when it comes time for Facebook to hire for this function it makes a lot of sense to call it “Data Science”. Recall that business analysts have relatively strong technical competence, so the growth of machine learning as a functional feature of modern products means that there is plenty of interest in working in the field. Substantively, the work they’re doing is not dissimilar from the traditional business analyst role, albeit with a little more ownership. And of course, from a comp perspective (at least according to Glassdoor), machine learning and data science roles at Facebook don’t actually pay that differently, so you have an outlet for the upwardly-mobile section of business analysts, especially those with less interest in going into engineering.

What should we do about the Uber-Analyst?

In short – the failure of waterfall models to provide compelling growth opportunities to business analysts opens the door for companies like Facebook to expand the scope of Data Science beyond simply machine learning implementations and academic research. Indeed, the majority of data scientists at Facebook play this sort of advanced business analyst role, most certainly because there is simply more work to be done in this domain than in the implementation scope.

Is this a good thing for the Data Science world as a whole? Personally, I’m of two minds on this topic. On the one hand, the notion that in order to competently build algorithmic products you have to have academic credentials is patently ridiculous. That is unnecessary gatekeeping, pure and simple. The modern tech industry, for all of its problems, does get one thing extremely right – the emphasis on meritocracy means that if you can do your job well then we don’t care where you went to school, or how you got here, or what you used to be doing.

On the other hand, as a relentless taxonomist, I can’t help but think that maybe we can come up with a better, more descriptive name not just for what the uber-business analyst function is doing, but also for many of the functional roles within data science in general. DS as a term is vague and domain agnostic, but I think only the latter actually provides value for the field as a whole. Perhaps it is possible to define DS roles in such a way that they have specific functionality, but can still be generalized across industries?

I think we see a model for this in the software engineering world – database engineers can work in any domain, as can front end or back end developers… My earlier thoughts do imply that the phenomenon of the “full stack developer” might be misguided in the long term, which, I think, is not an unheard-of opinion. From the DS perspective, specialization would help to address the difference in expectations that a candidate used to, say, a startup DS position would experience when they take the same position at Facebook.

To some degree, we do see this in action today – many of the highest and most academic positions in DS are now “machine learning scientists” or “business operations researchers”. Intuitively, it seems that the next logical step is to redefine the uber-business analyst role in the same way. While specialization inherently trades off against the meritocratic environment that is worth preserving, perhaps it can be done in a way that minimizes that trade-off.

If we can treat the uber-analyst who helps the product manager to build better products with the same reverence as we do the “unicorns” that build our news feeds and recommender systems, then we will have successfully retained some of that value.

Time (and our efforts) will tell if that is possible.
