Helping Data Science Projects Succeed: 5 Tips on how to Avoid Becoming a Statistic

WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.

Data science projects are complex and very often fail.

What does it mean for a project to fail?

One way to tell whether a project has failed is to ask the client: "Would you have done this project if you had known how it would end?" If they answer no, the project has failed.

Reasons projects fail

  • Data scientists/project managers did not resolve open questions before the project started

  • Impossible expectations from clients

  • Out of control complexity

  • Mismatch between training time and inference time in data access and in how the solution functions

  • Unmet assumptions

Avoiding failures

So what can team members, in any role, do to increase the chances of success?

The following tips can help you avoid failure in data science projects, whether you are a practitioner or a project lead:

Understand why and how your solution will be used

It is critical that you understand the scenarios under which the solution you are building will be used in practice.

You need to understand:

  • What business problem are you trying to solve? How does this help the customer make money? How is this process done currently?

  • How will the solution be used, technically (realtime APIs? batch runs? something else?)

  • Who will consume the output your solution produces (Other systems? Humans? What is their level of expertise?)

Make sure there is data

Very often, clients have problems to solve, but the data they have is not suitable for building solutions on.

It is very important to understand what data you will have and how good its quality is. Here are some questions to ask:

  • How far back does our data go?

  • How was it collected? Does it cover all cases, or are some cases missing?

  • What does it look like in terms of distributions, etc.?

When building models, you will likely need a couple of months of historical data before you can model anything.

  • So if you have to start collecting data you don't have today, you will have to wait a couple of months before you can build a model on it.
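The questions above can often be answered with a quick script before any modeling starts. The following is a minimal sketch, not a definitive implementation: it assumes records are dictionaries with hypothetical `date` and `category` fields, and the sample data and the `expected_categories` set are made up for illustration.

```python
from collections import Counter
from datetime import date


def summarize_coverage(records, expected_categories):
    """Report how far back the data goes and which expected cases are missing.

    `records` is assumed to be a non-empty list of dicts with the
    hypothetical keys "date" (datetime.date) and "category" (str).
    """
    dates = [r["date"] for r in records]
    counts = Counter(r["category"] for r in records)
    return {
        "first_date": min(dates),
        "last_date": max(dates),
        "days_covered": (max(dates) - min(dates)).days,
        # A rough look at the distribution: how many records per case.
        "category_counts": dict(counts),
        # Cases the client expects to exist but that never appear in the data.
        "missing_categories": sorted(set(expected_categories) - set(counts)),
    }


# Made-up sample data, for illustration only.
sample = [
    {"date": date(2023, 1, 5), "category": "refund"},
    {"date": date(2023, 2, 10), "category": "purchase"},
    {"date": date(2023, 3, 1), "category": "purchase"},
]
report = summarize_coverage(sample, {"refund", "purchase", "chargeback"})
print(report)
```

In this sample, the report shows that the data spans less than two months and that one expected case never appears, which is the kind of finding worth raising with the client before committing to a model.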

Avoid misunderstandings

Clients (non-technical or otherwise) will often misunderstand or assume all sorts of things before and during a project. Be very clear when communicating with them:

  • Reframe concepts and ideas

    One way to help avoid misunderstandings is to describe things in another way, i.e. to reframe a concept or idea.

    In data science projects we usually deal with complex and highly abstract concepts.

    Examples of ways to do this are: "So what you mean is...", "Would I be right in saying that..."

  • Draw things

    TODO diagrams

  • Give examples

    TODO

Have people explain how they work to you

Have you ever had someone ask you for help and, once they had finished explaining their problem, say: "Never mind, I understand it now. Thanks for listening"?

This happens because the mere act of saying things out loud helps the speaker better understand what they are thinking.

Very often you need to create solutions for tasks that don't have a well-defined process, so you have to dig for information.

Whenever possible, ask clients to explain their work to you. Here are some questions you may want to ask:

  • What exactly do you do in case XYZ happens?

  • Walk me through an average task. What things do you look at, what tools do you use, etc.

  • What is the difference between X and Y?

Push an MVP ASAP

"Can't we do a simpler version first?"

This can be anything, for example an initial version of the model built from hardcoded rules, just so clients can see what the output will look like.
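As a rough illustration, a rule-based stand-in might look like the sketch below. The churn scenario, the function name `predict_churn_risk`, the input fields, and the thresholds are all hypothetical and chosen only for this example; the point is that the output has the same shape the eventual model is expected to produce.

```python
def predict_churn_risk(customer):
    """Hardcoded-rule stand-in for a future ML model.

    Returns a label plus a score between 0 and 1, which is the output
    shape assumed for the real model. The thresholds and scores are
    arbitrary placeholders, not values learned from data.
    """
    if customer["days_since_last_login"] > 60:
        return {"label": "high_risk", "score": 0.9}
    if customer["support_tickets_last_month"] >= 3:
        return {"label": "medium_risk", "score": 0.6}
    return {"label": "low_risk", "score": 0.1}


# Made-up customer record, for illustration only.
example = {"days_since_last_login": 75, "support_tickets_last_month": 0}
print(predict_churn_risk(example))
```

Because the rules sit behind the same function signature the model will eventually use, clients can react to realistic output early, and the rules can later be swapped for a trained model without changing the code that consumes the result.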