Data Visualization for Machine Learning: Examples with Scikit-learn and Matplotlib

Last updated:
Table of Contents

Plot ROC Curve and calculate AUC for scikit-learn model

For more detailed information on the ROC curve see AUC and Calibrated models

The ROC curve and the AUC (the Area Under the Curve) are simple ways to view the results of a classifier.

The ROC curve is good for viewing how your model behaves on different levels of false-positive rates and the AUC is useful when you need to report a single number to indicate how good your model is.

import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# load data
X_train, X_test, y_train, y_test = load_my_data()

# model can be any trained classifier that supports predict_proba()
clf = LogisticRegression(), y_train)

y_preds = clf.predict_proba(X_test)

# take the second column because the classifier outputs scores for
# the 0 class as well
preds = y_preds[:,1]

# fpr means false-positive-rate
# tpr means true-positive-rate
fpr, tpr, _ = metrics.roc_curve(y_test, preds)

auc_score = metrics.auc(fpr, tpr)

plt.title('ROC Curve')
plt.plot(fpr, tpr, label='AUC = {:.2f}'.format(auc_score))

# it's helpful to add a diagonal to indicate where chance 
# scores lie (i.e. just flipping a coin)

plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')

plt.legend(loc='lower right')

roc-auc-score Results may vary, especially in real-life problems.
This is a dummy dataset.

TODO using subplots and looping

Dialogue & Discussion