# Data Visualization for Machine Learning: Examples with Scikit-learn and Matplotlib

## Plot ROC Curve and calculate AUC for scikit-learn model

For more detailed information on the ROC curve, see AUC and Calibrated models.

The ROC curve and the AUC (Area Under the Curve) are simple ways to summarize how well a binary classifier separates the two classes.

The ROC curve shows how the true-positive rate of your model changes with the false-positive rate as the decision threshold varies, while the AUC is useful when you need to report a single number to indicate how good your model is.
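To make the threshold idea concrete, here is a minimal sketch that computes a few ROC points by hand. The labels and scores are made up for illustration, and `roc_point` is a hypothetical helper, not part of scikit-learn; only `metrics.roc_auc_score` is a library call.

```python
import numpy as np
from sklearn import metrics

# Toy labels and scores, made up purely for illustration
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])


def roc_point(y_true, scores, threshold):
    """Return (fpr, tpr) when every score >= threshold is predicted positive."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    return fpr, tpr


# lowering the threshold moves the point up and to the right on the curve
for threshold in [0.8, 0.4, 0.35, 0.1]:
    fpr, tpr = roc_point(y_true, scores, threshold)
    print('threshold={:.2f}  fpr={:.2f}  tpr={:.2f}'.format(threshold, fpr, tpr))

# the AUC collapses the whole curve into a single number
auc_score = metrics.roc_auc_score(y_true, scores)
```

Each threshold yields one (FPR, TPR) point; `metrics.roc_curve` does the same sweep over all distinct score values for you.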

import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# load_my_data() is a placeholder: substitute your own train/test split
X_train, X_test, y_train, y_test = load_my_data()

# clf can be any classifier that supports predict_proba()
clf = LogisticRegression()
clf.fit(X_train, y_train)

y_preds = clf.predict_proba(X_test)

# take the second column (scores for the 1 class) because the
# classifier outputs scores for the 0 class as well
preds = y_preds[:, 1]

# fpr means false-positive-rate
# tpr means true-positive-rate
fpr, tpr, _ = metrics.roc_curve(y_test, preds)

auc_score = metrics.auc(fpr, tpr)

plt.title('ROC Curve')
plt.plot(fpr, tpr, label='AUC = {:.2f}'.format(auc_score))

# the dashed diagonal shows where chance-level scores lie
# (i.e. just flipping a coin)
plt.plot([0, 1], [0, 1], 'r--')

plt.xlim([-0.1,1.1])
plt.ylim([-0.1,1.1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')

plt.legend(loc='lower right')
plt.show()


This example uses a dummy dataset; results may vary, especially in real-life problems.

TODO using subplots and looping
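One possible way to approach the subplots-and-looping idea is sketched below: fit several classifiers in a loop and draw each ROC curve on its own axis. This is only an assumption about what the TODO intends. The synthetic data from `make_classification` stands in for `load_my_data()`, and the three models are arbitrary picks that support `predict_proba()`.

```python
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# synthetic stand-in for load_my_data(); not the data used above
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# arbitrary example models; any classifier with predict_proba() works
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Naive Bayes': GaussianNB(),
    'Random Forest': RandomForestClassifier(random_state=0),
}

# one subplot per model, sharing the y axis so curves are comparable
fig, axes = plt.subplots(1, len(models), figsize=(5 * len(models), 4), sharey=True)
auc_scores = {}

for ax, (name, clf) in zip(axes, models.items()):
    clf.fit(X_train, y_train)
    preds = clf.predict_proba(X_test)[:, 1]

    fpr, tpr, _ = metrics.roc_curve(y_test, preds)
    auc_scores[name] = metrics.auc(fpr, tpr)

    ax.plot(fpr, tpr, label='AUC = {:.2f}'.format(auc_scores[name]))
    ax.plot([0, 1], [0, 1], 'r--')  # chance-level diagonal
    ax.set_title(name)
    ax.set_xlabel('False Positive Rate')
    ax.legend(loc='lower right')

axes[0].set_ylabel('True Positive Rate')
fig.tight_layout()
```

Call `plt.show()` (or `fig.savefig(...)`) afterwards to display or save the figure.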