# Evaluation Metrics for Classification Problems: Quick Examples + References

- Metric: Accuracy
- Metric: Precision
- Metric: Recall
- Metric: F1
- Metric: AUC (Area under ROC Curve)
- Metric: Gini Impurity Index

**True Positives (TP)**: should be TRUE, you predicted TRUE

**False Positives (FP)**: should be FALSE, you predicted TRUE

**True Negative (TN)**: should be FALSE, you predicted FALSE

**False Negatives (FN)**: should be TRUE, you predicted FALSE

All machine learning toolkits provide these model evaluation metrics.
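As a quick sanity check, the four counts can also be computed by hand from a list of labels and predictions (the data below is made up for illustration):

```python
# Hypothetical labels and predictions, just for illustration.
y_true = [True, True, True, False, False, True, False, False]
y_pred = [True, False, True, True, False, True, False, True]

tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)          # is TRUE, predicted TRUE
fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)      # is FALSE, predicted TRUE
tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)  # is FALSE, predicted FALSE
fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)      # is TRUE, predicted FALSE

print(tp, fp, tn, fn)  # 3 2 2 1
```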

## Metric: Accuracy

"What percentage of my predictions are correct?"

```
Accuracy = (TP + TN) / (TP + FP + TN + FN)
```

Good for single label, binary classification.

Not good for imbalanced datasets.

- If, in the dataset, 99% of samples are TRUE and you blindly predict TRUE for everything, you'll have 0.99 accuracy, but you haven't actually learned anything.
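A minimal sketch of that degenerate case, using the same 99%-TRUE split described above:

```python
# 99 TRUE samples, 1 FALSE sample, and a "model" that blindly predicts TRUE.
y_true = [True] * 99 + [False]
y_pred = [True] * 100

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(accuracy)  # 0.99, even though the model learned nothing
```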

## Metric: Precision

"Of the points that I predicted TRUE, how many are actually TRUE?"

```
Precision = TP / (TP + FP)
```

Good for multi-label / multi-class classification and information retrieval

Good for unbalanced datasets
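A sketch of the formula applied to raw counts (the counts are hypothetical):

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)

# e.g. 8 points predicted TRUE, of which 6 are actually TRUE
print(precision(tp=6, fp=2))  # 0.75
```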

## Metric: Recall

"Of all the points that are actually TRUE, how many did I correctly predict?"

```
Recall = TP / (TP + FN)
```

Good for multi-label / multi-class classification and information retrieval

Good for unbalanced datasets
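The same kind of sketch for recall (again with hypothetical counts):

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)

# e.g. 10 points that are actually TRUE, of which 6 were predicted TRUE
print(recall(tp=6, fn=4))  # 0.6
```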

## Metric: F1

"Can you give me a single metric that balances precision and recall?"

```
F1 = 2 * (Precision * Recall) / (Precision + Recall)
```

Gives equal weight to precision and recall

Good for unbalanced datasets
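F1 is the harmonic mean of precision and recall; here is a sketch with hypothetical precision and recall values:

```python
def f1_score(precision, recall):
    """F1 = harmonic mean of precision and recall."""
    return 2 * (precision * recall) / (precision + recall)

print(f1_score(0.75, 0.6))  # ~0.667
```

Note that because it is a harmonic mean, F1 is dragged down sharply when either precision or recall is low.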

## Metric: AUC (Area under ROC Curve)

"Is my model better than just random guessing?"

- The ROC curve is obtained by plotting your model's true-positive rate against its false-positive rate at different classification thresholds.

*If your model scores 0.5 AUC, it's no better than random guessing; below 0.5, it's actually worse than random.*

Source: http://gim.unmc.edu/dxtests/roc3.htm


- Good for cases when you need to estimate how well your model **discriminates** TRUE from FALSE values.
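One way to see the "discrimination" interpretation: AUC equals the probability that a randomly chosen TRUE sample gets a higher model score than a randomly chosen FALSE sample (ties count half). A minimal sketch of that computation, with made-up model scores:

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive sample is scored
    above a random negative sample (ties count as 0.5)."""
    pairs = [(p, n) for p in scores_pos for n in scores_neg]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

# Hypothetical scores: positives mostly (but not always) rank above negatives.
pos = [0.9, 0.8, 0.35]
neg = [0.7, 0.3, 0.1]
print(auc(pos, neg))  # ~0.889 (8 of 9 pairs ranked correctly)
```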

## Metric: Gini Impurity Index

Gini impurity is a different kind of metric because it does not apply to machine learning methods in general but mostly to **decision trees**, where it is used as a split criterion (i.e. the Gini index can be used to decide what the best split is at a given tree level).

In short, the Gini impurity index measures how **diverse** your dataset is.

Examples:

If your dataset has two classes and 50% of the samples belong to one class and 50% to the other, diversity is at its maximum, which means the Gini index is also at its maximum (gini = 0.5 for two classes)

If your dataset has two classes and all instances belong to a single class, you have no diversity, and the Gini Index is at a minimum (gini=0)

*The Gini index is at its maximum when 50% of samples have class=True and 50% have class=False; conversely, it is at its minimum when all samples are of the same class (either True or False).*
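The standard formula is gini = 1 - Σ pᵢ², where pᵢ is the proportion of samples in class i. A minimal sketch covering both extremes above:

```python
def gini_impurity(class_counts):
    """Gini impurity = 1 - sum(p_i^2) over the class proportions p_i."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini_impurity([50, 50]))  # 0.5 -> maximum diversity for two classes
print(gini_impurity([100, 0]))  # 0.0 -> no diversity, a "pure" node
```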

