Evaluation Metrics for Classification Problems: Quick Examples + References

True Positives (TP): should be TRUE, you predicted TRUE

False Positives (FP): should be FALSE, you predicted TRUE

True Negatives (TN): should be FALSE, you predicted FALSE

False Negatives (FN): should be TRUE, you predicted FALSE

Most machine learning toolkits provide these model evaluation metrics out of the box.
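
For example, here's how you'd pull the four counts out of scikit-learn, one such toolkit (a minimal sketch; the labels are toy data made up for illustration):

```python
from sklearn.metrics import confusion_matrix

# Toy labels, purely for illustration: 1 = TRUE, 0 = FALSE
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, tn, fn)  # 2 1 3 2
```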

Metric: Accuracy

"What percentage of my predictions are correct?"

Accuracy = (TP + TN) / (TP + FP + TN + FN)
  • Good for single-label, binary classification.

  • Not good for imbalanced datasets.

    • If 99% of the samples in the dataset are TRUE and you blindly predict TRUE for everything, you'll get 0.99 accuracy without having actually learned anything (see the sketch below).
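
A quick check with scikit-learn, including the imbalance trap from the bullet above (toy data, for illustration only):

```python
from sklearn.metrics import accuracy_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # toy labels: 1 = TRUE, 0 = FALSE
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# (TP + TN) / (TP + FP + TN + FN) = (2 + 3) / 8
print(accuracy_score(y_true, y_pred))  # 0.625

# The imbalance trap: 99% of samples are TRUE, predict TRUE blindly
y_true_skewed = [1] * 99 + [0]
y_pred_blind = [1] * 100
print(accuracy_score(y_true_skewed, y_pred_blind))  # 0.99
```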

Metric: Precision

"Of the points that I predicted TRUE, how many are actually TRUE?"

Precision = TP / (TP + FP)
  • Good for multi-label / multi-class classification and information retrieval

  • Good for unbalanced datasets
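
Same toy data as above, this time through scikit-learn's precision_score (a minimal sketch):

```python
from sklearn.metrics import precision_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # toy labels: 1 = TRUE, 0 = FALSE
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# TP / (TP + FP) = 2 / (2 + 1)
# (binary case; multi-class needs an average= argument)
print(precision_score(y_true, y_pred))  # 0.666...
```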

Metric: Recall

"Of all the points that are actually TRUE, how many did I correctly predict?"

Recall = TP / (TP + FN)
  • Good for multi-label / multi-class classification and information retrieval

  • Good for unbalanced datasets
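
And the mirror image with recall_score, on the same toy data (a minimal sketch):

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # toy labels: 1 = TRUE, 0 = FALSE
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# TP / (TP + FN) = 2 / (2 + 2)
print(recall_score(y_true, y_pred))  # 0.5
```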

Metric: F1

"Can you give me a single metric that balances precision and recall?"

F1 = 2 * (Precision * Recall) / (Precision + Recall)
  • Gives equal weight to precision and recall

  • Good for unbalanced datasets
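
F1 is the harmonic mean of the two, so it punishes a model that is strong on one and weak on the other. On the same toy data (a minimal sketch):

```python
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # toy labels: 1 = TRUE, 0 = FALSE
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# 2 * (precision * recall) / (precision + recall)
# = 2 * (0.666 * 0.5) / (0.666 + 0.5)
print(f1_score(y_true, y_pred))  # 0.571...
```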

Metric: AUC (Area under ROC Curve)

"Is my model better than just random guessing?"

  • The ROC curve is obtained by plotting your model's true-positive rate against its false-positive rate at different classification thresholds.

If your model scores 0.5 AUC, it's no better than random guessing (and below 0.5, it's systematically worse than random); the closer the AUC gets to 1.0, the better your model separates the classes.

Source: http://gim.unmc.edu/dxtests/roc3.htm

  • Good when you need to estimate how well your model discriminates TRUE from FALSE values.
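
Note that AUC is computed from scores (e.g. predicted probabilities), not hard labels, since the curve is traced by sweeping a decision threshold over those scores. A minimal sketch with scikit-learn; the scores are made up for illustration:

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # toy labels: 1 = TRUE, 0 = FALSE
y_score = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.1]  # made-up P(TRUE) scores

# Equivalent to the probability that a randomly chosen TRUE sample
# scores higher than a randomly chosen FALSE sample
print(roc_auc_score(y_true, y_score))  # 0.875
```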


