The One Skill That Data Scientists Are Not Being Trained On

At the Toronto Machine Learning Micro-Summit this past week, one theme came up repeatedly during the presentations: communicate with the business team early and often, or you'll need to go back and redo your work.

There was the story of an insurance company that created a model to recommend whether to replace or fix a car after a damage claim. It sounded great: the Data Scientists got a prototype up and running and had business team buy-in. The problem was that the model wasn't very accurate. Usually when that happens, it means that your data is noisy or the algorithm isn't powerful enough. But when they went back to their business team, it turned out that they had missed 2 key features: the age of the vehicle and whether it's a luxury model.

Another example was a telecom that built a model to optimize call center efficiency. The data science team spent a month building the model, and everyone was excited to get it into production. Then they were told that the call center runs on an outdated application. It turned out that integrating with the application would cost more than the project was expected to return.

I think these situations are happening for 2 reasons: (i) companies are still learning to develop machine learning as a core competency and (ii) they don't always have a clear agenda because they don't know what's feasible. As a result, Data Scientists are being hired for their laundry list of hard skills and educational background, but don't always have the domain expertise to understand the business. Even in my Data Science Certificate courses, the focus is on the programming tools, algorithms, and statistics. So we're seeing Data Scientists join companies, actively look for problems where they can apply machine learning, and then jump into building models too quickly so that they can show traction to management.

The one skill that Data Scientists are not being trained on is Product Discovery - the ability to validate ideas in the cheapest, fastest way possible. It's about prototyping: starting with low fidelity and getting feedback at every stage. Stakeholder feedback and buy-in are just as much a part of the solution as the output of a model.

I can relate. 

As a Product Manager, I spend much of my time evangelizing the products that I'm building and educating stakeholders about them. I'm trying to understand how my work impacts them. We are all inherently visual, so at the very beginning I use the most basic prototype: a flow diagram. It's just so much easier to walk someone through a diagram on a call than to describe a solution in words. And after almost every call, I get asked if I can send over my diagram so that they can look at it again.

I think the problems I described earlier will become less of an issue as the field of machine learning matures and Data Scientists gain more domain expertise. It does go to show, though, why soft skills and communication are still the most important skills in a workplace.