People Skills for Data Science Projects: Lessons Learned
- Know that DS and modelling are just tools
- Ground decisions in data
- Understand the business problem and how your work will be used
- Provide alternatives
- Get a V0 (version zero) out as soon as possible
- Provide lots of visualizations and examples
WIP Alert: This is a work in progress. Current information is correct but more content may be added in the future.
Know that DS and modelling are just tools
The business problem is what is important.
It may turn out that you can solve your clients' problems with a set of rules that require no modelling at all. Maybe they don't know that. It is your job to say it.
- "What if we handled these easy cases with hard rules instead of modelling?"
- TODO more examples
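A minimal sketch of the rules-first idea, assuming a hypothetical approval workflow: hard rules settle the easy cases, and only what is left over would need a model. The field names, cutoffs, and the `apply_hard_rules` and `route` helpers are invented for illustration.

```python
from typing import Optional


def apply_hard_rules(case: dict) -> Optional[str]:
    """Return a decision for easy cases, or None when no rule applies.

    The fields and cutoffs are hypothetical placeholders.
    """
    if case["amount"] <= 10:
        return "approve"  # tiny amounts: not worth modelling
    if case["customer_blocked"]:
        return "reject"  # policy decision, no prediction needed
    return None


def route(cases: list) -> dict:
    """Split cases into those settled by rules and those left over."""
    decided, undecided = [], []
    for case in cases:
        decision = apply_hard_rules(case)
        if decision is None:
            undecided.append(case)
        else:
            decided.append((case["id"], decision))
    return {"decided": decided, "undecided": undecided}


cases = [
    {"id": 1, "amount": 5, "customer_blocked": False},
    {"id": 2, "amount": 500, "customer_blocked": True},
    {"id": 3, "amount": 500, "customer_blocked": False},
]
result = route(cases)
coverage = len(result["decided"]) / len(cases)
print(f"Rules settle {coverage:.0%} of cases")
```

The share of cases that rules alone can settle is a useful number to bring to the client: if it is high, modelling may not be needed at all.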
Ground decisions in data
- Give stakeholders more insight into how you arrived at a decision point (threshold, cutoff, sample size, etc)
- Provide an opportunity for stakeholders to provide business insight and other feedback
- Help stakeholders see that your work is based on data, not magic
- "Given the performance of the model for group A, we decided not to use that and focus on group B for now."
- "We suggest the policy should be at a threshold over 0.5 because "
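One way to back up a threshold suggestion is a small table of precision and recall at a few candidate cutoffs. The sketch below assumes you have validation scores and true labels; the numbers and the `precision_recall_at` helper are made up for illustration.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    true_positives = sum(flagged)
    total_positives = sum(labels)
    precision = true_positives / len(flagged) if flagged else None
    recall = true_positives / total_positives if total_positives else None
    return precision, recall


# Made-up validation scores and true labels (1 = positive).
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

table = {t: precision_recall_at(scores, labels, t) for t in (0.3, 0.5, 0.7)}
for threshold, (precision, recall) in table.items():
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

A table like this gives stakeholders something concrete to react to: they can say which trade-off between missed cases and false alarms the business can live with.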
Understand the business problem and how your work will be used
What specific business problem needs to be solved? How is it currently handled?
If you build a model, will it be used in an online or batch setting?
If scores generated by your model are going to be used directly by humans, who are these people?
Provide alternatives
When facing a problem, don't say it can't be done. There are always alternatives that are not perfect but can still help stakeholders.
- You are not paid to show off Python skills in a notebook. You're paid to solve business problems.
- "There's not enough data to train a model for all classes. Would it help to train a model for the top 5 most common classes instead?"
- "The model's accuracy is not very good. But maybe we can add a post-processing step to fine-tune the outputs with known business rules."
- "We can't give you a model that predicts the correct class, but we can train a model that outputs the probabilities for the top 3 classes. Is that OK?"
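The "top 3 classes" alternative is cheap to prototype. This is a sketch only, assuming the model already returns one probability per class; the class names, the probabilities, and the `top_k_classes` helper are hypothetical.

```python
def top_k_classes(probabilities: dict, k: int = 3) -> list:
    """Return the k most probable (class, probability) pairs, most likely first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical per-class probabilities for a single input.
probabilities = {
    "billing": 0.41,
    "shipping": 0.33,
    "returns": 0.15,
    "fraud": 0.07,
    "other": 0.04,
}
shortlist = top_k_classes(probabilities, k=3)
for name, probability in shortlist:
    print(f"{name}: {probability:.0%}")
```

Showing stakeholders a shortlist like this on a handful of real inputs is often enough for them to say whether the alternative is acceptable.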
Get a V0 (version zero) out as soon as possible
- TODO de-risk, get feedback from users, surface miscommunications and false assumptions
- If you are building a realtime model, have an API serving a dummy version of the model ASAP
- If you are building a batch model, have it run end-to-end with a dummy version of the model ASAP
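A dummy model can be as small as a class exposing the same interface the real model will have. The sketch below assumes a batch job that scores rows and writes one record per row; `DummyModel`, `run_batch`, and the row layout are illustrative names, not part of any library.

```python
class DummyModel:
    """Placeholder with the predict() interface the real model is expected to have.

    It ignores the features and returns the same score for every row.
    """

    def __init__(self, constant_score: float = 0.5):
        self.constant_score = constant_score

    def predict(self, rows: list) -> list:
        return [self.constant_score for _ in rows]


def run_batch(model, rows: list) -> list:
    """End-to-end batch step: take input rows, score them, emit output records."""
    scores = model.predict(rows)
    return [{"id": row["id"], "score": score} for row, score in zip(rows, scores)]


# Hypothetical input rows; in a real pipeline these would come from storage.
rows = [{"id": "a", "feature": 1.2}, {"id": "b", "feature": 3.4}]
output = run_batch(DummyModel(), rows)
print(output)
```

Because the pipeline only depends on `predict()`, the real model can replace the dummy later without touching the surrounding code, and downstream users can start consuming the output format right away.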
Provide lots of visualizations and examples
Always break visualizations down over time
Look. at. the. data. Not just aggregates but also at individual points. Weird patterns are extra useful here.
- TODO it's a way to force interaction with the customers and clients. Like a V0 of a solution, but you have no solution.
- Plot distribution of classes over time (TODO add link to normalized stacked bar plot here)
- Plot raw counts over time.
- Plot distribution of subpopulations over time (same link as above)
- Show individual data points that represent the "common scenario"
- Show individual data points that are "pathological" and "weird"
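The numbers behind a class-distribution-over-time plot can be computed with the standard library alone. This is a minimal sketch, assuming each record is a `(period, class)` pair; the data and the `class_shares_by_period` helper are made up for illustration, and the resulting table can be passed to whatever plotting tool you use.

```python
from collections import Counter, defaultdict


def class_shares_by_period(records):
    """Map each period to {class: share of that period's records}."""
    counts = defaultdict(Counter)
    for period, label in records:
        counts[period][label] += 1
    shares = {}
    for period in sorted(counts):
        total = sum(counts[period].values())
        shares[period] = {label: n / total for label, n in counts[period].items()}
    return shares


# Made-up (period, class) records.
records = [
    ("2024-01", "A"), ("2024-01", "A"), ("2024-01", "B"), ("2024-01", "B"),
    ("2024-02", "A"), ("2024-02", "B"), ("2024-02", "B"), ("2024-02", "B"),
]
shares = class_shares_by_period(records)
for period, distribution in shares.items():
    print(period, distribution)
```

Here class B grows from half of the records to three quarters between the two months, which is the kind of shift that an aggregate over the whole period would hide.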