Original content: Linear Digressions Podcast: Disciplined Data Science
Here are a couple of principles to keep in mind if you want to avoid getting stuck in low-level data analytics in any organization, but particularly in larger ones.
These principles come from an episode of the Linear Digressions podcast, in which the hosts suggest applying them to add more "method" or "discipline" to the practice of data science.
You need to make your work as a data scientist easily available and findable by other teams, otherwise people may not fully benefit from what you do, or worse, someone may redo work you have already done.
A few easy ways to make your work available are project wikis, APIs, and dashboards.
Once you've solved a problem, automate it (e.g. Cron Jobs, Workflows) so that you can move on to other tasks.
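As a rough illustration of what "automate it" can look like, here is a minimal sketch of an analysis wrapped in a re-runnable script that a scheduler could call. The function names, the `value` column, and the file layout are all assumptions made for this example, not something taken from the podcast.

```python
import csv
import datetime as dt
import sys
from pathlib import Path


def summarize(rows):
    """Return the row count and the mean of the (hypothetical) 'value' column."""
    values = [float(r["value"]) for r in rows]
    mean = sum(values) / len(values) if values else 0.0
    return {"rows": len(values), "mean_value": mean}


def run_daily_report(input_csv, output_dir, day=None):
    """Read the input, compute the summary, and write one dated report file.

    Re-running for the same day overwrites that day's file, so the job is
    safe to retry.
    """
    day = day or dt.date.today()
    with open(input_csv, newline="") as f:
        summary = summarize(list(csv.DictReader(f)))

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"report_{day.isoformat()}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["rows", "mean_value"])
        writer.writeheader()
        writer.writerow(summary)
    return out_path


if __name__ == "__main__":
    # Expected call: python3 daily_report.py <input.csv> <output_dir>
    if len(sys.argv) == 3:
        print(run_daily_report(sys.argv[1], sys.argv[2]))
```

Assuming the script is saved as `daily_report.py`, a crontab entry along the lines of `0 6 * * * python3 /path/to/daily_report.py /path/to/data.csv /path/to/reports` would run it every day at 06:00. The paths are placeholders.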
Data science is very often a team sport. Knowing how to collaborate with other team members (across roles) is very important.
Use tools (for data science, machine learning, etc.) that are widely used by the community, to minimize risk and make it easier for other teams to verify your work or help you out.
People have expectations about how accurate and reliable your work is.
When you publish or otherwise make your work available as a data product, be careful not to let them down, so that they know they can trust you.
The points above were taken from the podcast I mentioned at the beginning of the post.
Here are a couple of tips of my own:
Data science code is software too
- That is, all the established good practices from software engineering still apply (low coupling, high cohesion, testable code, comments, and documentation).
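To make "testable code" concrete, here is a small sketch: a preprocessing step written as a pure function, next to a test that pins down its behavior. The function name `min_max_scale` and the choice of min-max scaling are just illustrative assumptions.

```python
def min_max_scale(values):
    """Scale numbers linearly to the range [0, 1].

    An empty input returns an empty list, and a constant input maps to all
    zeros (which avoids dividing by zero).
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_min_max_scale():
    # Typical case, constant input, and empty input.
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]
    assert min_max_scale([5, 5]) == [0.0, 0.0]
    assert min_max_scale([]) == []
```

Because the function takes plain inputs and returns plain outputs, with no hidden reads from files or notebooks, a test runner such as pytest can check it in isolation.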
Focus on business needs
- Data science is only as good as the business problems it solves. Sometimes a simple exploratory analysis or simple model is a cost-effective solution to a simple problem. There's no place for egos here.
This short post is part of the Data Newsletter.