# 11 Types of Data Products, with Examples

- 1) Sort items into predefined classes
- 2) Estimate a numeric value at a specific time
- 3) Predict the behaviour of a value in the future
- 4) Sort items into similar groups
- 5) Recommend items to users
- 6) Generate artificial text
- 7) Choose from alternative strategies, acting on feedback
- 8) Choose from alternative strategies, acting on existing data
- 9) Outlier detection
- 10) Estimate the probability of an event happening
- 11) Rank items to prioritize human action

When you have data but it is not clear which business objectives it can serve, it helps to know the common ways businesses use their data to achieve their objectives.

These are **11** types of **business problems** you can solve with data products:

## 1) Sort items into predefined classes

This is what is usually called *classification* in machine learning circles.

**Example:** Predict which genre a movie is in (thriller, drama, comedy, etc.) based on its review text data.

**Suggested Techniques:**

- Gradient Boosting (easy to use, good performance)
- Neural networks (for unstructured data such as text, audio and images)

## 2) Estimate a numeric value at a specific time

This is usually called *regression*. You want to predict a *quantity* (integer or real-valued number) based on some features.

**Example:** Predict the market price for a house based on characteristics such as its area, location, number of bedrooms, etc.

**Suggested Techniques:**

- Gradient Boosting
- Neural networks (for unstructured data such as text, audio and images)
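
As a rough illustration of gradient boosting for regression, the sketch below repeatedly fits one-split trees ("stumps") to the residuals of the current prediction, using a single feature and squared error. The areas and prices are made up, and the names `fit_stump`, `fit_boosted_stumps` and `predict` are hypothetical; a real project would normally rely on a library implementation rather than hand-rolled code.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    thresholds = sorted(set(xs))
    for lo, hi in zip(thresholds, thresholds[1:]):
        split = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then add shrunken stumps fitted to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        split, left_mean, right_mean = fit_stump(xs, residuals)
        left_step = learning_rate * left_mean
        right_step = learning_rate * right_mean
        stumps.append((split, left_step, right_step))
        predictions = [p + (left_step if x <= split else right_step)
                       for x, p in zip(xs, predictions)]
    return base, stumps


def predict(model, x):
    """Sum the base value and the contribution of every stump."""
    base, stumps = model
    return base + sum(left if x <= split else right
                      for split, left, right in stumps)


# Made-up training data: area in square metres, price in thousands.
areas = [50, 70, 90, 120, 150, 200]
prices = [150, 200, 260, 330, 400, 520]
model = fit_boosted_stumps(areas, prices)
estimate = predict(model, 100)
```

Each round only corrects part of the remaining error (the `learning_rate`), which is what makes the ensemble improve gradually instead of memorising the data in one step.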

## 3) Predict the behaviour of a value in the future

This is different from point **2** because here we are interested in predicting multiple points in the future, not just a single one.

This is usually called *time-series analysis* or *forecasting* in scientific papers.

**Example:** Predict how the price of a stock will perform each day over the next two months.

**Suggested Techniques:**

- ARIMA models and variations
- Linear Models
- Recurrent Neural Networks (RNNs)
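
To make the "multiple points in the future" idea concrete, here is a minimal sketch of the simplest option in the list, a linear model: it fits a straight-line trend by ordinary least squares and extrapolates it several steps ahead. The series is invented and the function names are hypothetical; real series usually need seasonality and noise handling that this ignores.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] ~ intercept + slope * t."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, horizon):
    """Extrapolate the fitted trend for the next `horizon` time steps."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + step) for step in range(horizon)]


# Made-up daily closing prices.
history = [10.0, 12.0, 14.0, 16.0, 18.0]
next_three_days = forecast(history, horizon=3)
```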

## 4) Sort items into similar groups

This is usually called *clustering* in the data-science field. It is an example of an *unsupervised* learning problem because you do not need annotated data (labels) to do it.

**Example:** Group users into similar groups using data such as age, gender, country of birth and annual income so that you can target them with different email campaigns.

**Suggested Techniques:**

- K-means and variations
- Expectation-Maximization (EM)
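
A minimal K-means sketch, assuming only two numeric features (age and annual income in thousands) and made-up users. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The helper names are hypothetical, and in practice features would be scaled first so that no single one dominates the distance.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = [0] * len(points)
    for _ in range(iterations):
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Made-up users: (age, annual income in thousands).
users = [(22, 25), (25, 30), (24, 28), (58, 95), (61, 100), (60, 90)]
centroids, segments = kmeans(users, k=2)
```

Each value in `segments` is the cluster index of the corresponding user, which could then drive the choice of email campaign.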

## 5) Recommend items to users

You have a selection of resources (films, products, images, etc.) and you want to show users the items they are most likely to interact with, to maximize engagement.

**Example:** Suggest movies for people to watch based on what people similar to them have enjoyed watching in the past.

**Suggested Techniques:**

- Collaborative Filtering
- Model-based approaches (train an ML model based on user features)
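
The movie example can be sketched with user-based collaborative filtering: score each unseen item by the ratings of other users, weighted by how similar those users are to the target user. The ratings, user names and function names below are made up for illustration, and cosine similarity is just one common choice of similarity measure.

```python
from math import sqrt

# Made-up ratings on a 1-5 scale.
ratings = {
    "ana": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 5, "Heat": 5, "Jaws": 4},
    "cy": {"Up": 5, "Coco": 4, "Alien": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


suggestions = recommend("ana", ratings)
```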

## 6) Generate artificial text

Cases where you need a model to generate coherent text given some context (or *prompt*).

**Example:** Using a chatbot to interact with users, interpret messages and help them solve simple problems, while delegating more complex cases to human operators.

**Suggested Techniques:**

- Recurrent Neural Nets (RNNs, LSTMs, etc.)
- Transformer models

## 7) Choose from alternative strategies, acting on feedback

You need to work out a *strategy* or policy to achieve a particular goal, adapting as you get feedback on the actions.

**Example:** Define an optimal sequence of contact strategies (e.g. letter at the beginning of the month + follow-up call after 5 days) to increase the conversion of media campaigns for a car dealership.

**Suggested Techniques:**

- A/B tests
- Multi-armed Bandits
- Reinforcement learning
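
As one possible illustration of acting on feedback, the sketch below simulates an epsilon-greedy multi-armed bandit: most of the time it uses the contact strategy with the best observed conversion rate, and occasionally it tries a random one. The conversion rates are invented and only used to simulate feedback; in a real campaign they would be unknown and the feedback would come from actual customers.

```python
import random


def epsilon_greedy(conversion_rates, rounds=5000, epsilon=0.1, seed=42):
    """Simulate an epsilon-greedy bandit; returns pulls and wins per strategy."""
    rng = random.Random(seed)
    n_arms = len(conversion_rates)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    for _ in range(rounds):
        # Explore with probability epsilon, or while some strategy is untried.
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: wins[a] / pulls[a])
        converted = rng.random() < conversion_rates[arm]  # simulated feedback
        pulls[arm] += 1
        wins[arm] += int(converted)
    return pulls, wins


# Hypothetical true conversion rates of three contact strategies.
pulls, wins = epsilon_greedy([0.02, 0.05, 0.30])
```

Unlike a fixed A/B test, the bandit shifts traffic toward the better strategy while the experiment is still running.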

## 8) Choose from alternative strategies, acting on existing data

Find the strategy that maximizes or minimizes some metric, based on historical data.

This is called *Mathematical Optimization* or *Mathematical Programming* in academic circles.

**Example:** Define the optimal policy<sup>2</sup> to give out credit cards based on past data, such that the profit after 5 years is maximized.

**Suggested Techniques:**

- Curve fitting
- Simulations
- Solvers (MS Excel, Simplex Algorithm, etc)
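
When the policy has a single parameter, the optimization can be as simple as replaying historical data for each candidate value and keeping the best one. The sketch below does this brute-force search for a score threshold; it is not a solver or the Simplex algorithm, and the history, the gain per good customer and the loss per default are all made-up assumptions.

```python
# Made-up history: (model risk score, whether the customer later defaulted).
history = [
    (0.02, False), (0.05, False), (0.08, False), (0.12, True),
    (0.15, False), (0.30, True), (0.45, True),
]


def profit_at_threshold(history, threshold, gain=100, loss=500):
    """Profit if cards had been given to everyone scoring at or below threshold."""
    profit = 0
    for score, defaulted in history:
        if score <= threshold:
            profit += -loss if defaulted else gain
    return profit


candidate_thresholds = [0.05, 0.10, 0.20, 0.50]
best_threshold = max(
    candidate_thresholds, key=lambda t: profit_at_threshold(history, t)
)
best_profit = profit_at_threshold(history, best_threshold)
```

With more parameters or constraints, exhaustive search stops being practical and a proper solver becomes the better tool.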

## 9) Outlier detection

In outlier detection, you want to automatically separate data points that seem different or "weird" in some way, from those that represent typical or normal behaviour.

**Example:** You have clickstream<sup>1</sup> data for multiple users who visited your website. You want to automatically detect user sessions that seem to be different from the norm, so that human analysts can look at them and see if they look like automated access or fraud attempts.

**Suggested Techniques:**

- +/- 3 Standard deviations (for one-dimensional data)
- K-means clustering (for simple data)
- Isolation Forest (for more complex data)
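
The simplest technique in the list, flagging values more than 3 standard deviations from the mean, fits in a few lines. The page-view counts below are made up and the function name is hypothetical. Note that an extreme value inflates the standard deviation itself, so this rule needs a reasonable number of observations to flag anything.

```python
from statistics import mean, stdev


def three_sigma_outliers(values):
    """Return the values lying more than 3 standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) > 3 * sigma]


# Made-up page views per session; the last session looks automated.
page_views = [10, 12, 9, 11, 10, 13, 8, 10, 11, 9,
              12, 10, 11, 9, 10, 12, 8, 11, 10, 500]
suspicious = three_sigma_outliers(page_views)
```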

## 10) Estimate the probability of an event happening

Estimate the probability of an event so that the business can reason about the probabilities and define a threshold to act on the scores. This is usually called *calibrated regression* in the literature.

**Example:** Scoring default risk to choose who to give credit cards to, then defining a risk threshold below which you will give out credit cards.

**Suggested Techniques:**

- Logistic regression
- Any calibrated regression
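
A minimal sketch of the score-then-threshold idea, assuming a single made-up feature (debt-to-income ratio) and a hand-written logistic regression trained by gradient descent. The function names, the data and the 0.3 risk threshold are all hypothetical, and on real data the predicted probabilities should be checked for calibration on held-out examples before a threshold is set.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, epochs=5000):
    """One-feature logistic regression fitted by batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias


# Made-up applicants: debt-to-income ratio and whether they defaulted (1) or not (0).
debt_ratio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
defaulted = [0, 0, 0, 1, 0, 1, 1, 1]
weight, bias = fit_logistic(debt_ratio, defaulted)


def default_probability(ratio):
    return sigmoid(weight * ratio + bias)


def approve(ratio, max_risk=0.3):
    """Hypothetical policy: approve only if estimated default risk is low enough."""
    return default_probability(ratio) <= max_risk
```

Keeping the model (`default_probability`) separate from the policy (`approve`) lets the business move the threshold without retraining anything.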

## 11) Rank items to prioritize human action

Cases where you have a finite amount of human resources and you want to assign them to the most valuable tasks possible.

**Example:** Ranking online purchases according to the likelihood of each being a fraud incident, so that human analysts can analyze only the top 10 riskiest cases each day.

**Suggested Techniques:**

- Any regression or probabilistic classification
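
Once a model has produced a score per item, the ranking step itself is small. The sketch below assumes each purchase already carries a `fraud_score` from some upstream model; the field names, order ids and scores are made up.

```python
import heapq


def top_k_riskiest(purchases, k=10):
    """Return the k purchases with the highest fraud score, riskiest first."""
    return heapq.nlargest(k, purchases, key=lambda p: p["fraud_score"])


# Made-up purchases scored by a hypothetical fraud model.
purchases = [
    {"order_id": "A1", "fraud_score": 0.02},
    {"order_id": "A2", "fraud_score": 0.40},
    {"order_id": "A3", "fraud_score": 0.91},
    {"order_id": "A4", "fraud_score": 0.75},
]
review_queue = top_k_riskiest(purchases, k=2)
```

Only the order of the scores matters here, so the model does not need to be calibrated as long as it ranks risky cases above safe ones.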

### Other uses

There are some other uses that (in my opinion) fall slightly outside the realm of what is generally called **Machine Learning** in the literature. These include:

- Establishing causality between events based on historical data
- Numerical simulations

### Footnotes

1: **Clickstream data** refers to logs of steps a given user has gone through while navigating a website or app. May include which pages they viewed, buttons they interacted with, at what time, etc.

2: A **policy** is a business rule that takes a model score and defines which decisions are to be made with it. Example: *"give out cards to applicants whose calibrated model score is equal to or below 0.1"* is a credit underwriting policy.