Cross-Validation Examples with Scikit-Learn

WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.

K-Fold: Manual Splits

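import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# 5 folds, shuffled once with a fixed seed so the splits are reproducible
kf = KFold(n_splits=5, shuffle=True, random_state=42)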

# these are your training data points:
# features and targets (NumPy arrays, so they can be indexed with the fold indices)
X = ....
y = ....

# collects the accuracy of each fold
accuracies = []

for train_index, test_index in kf.split(X):

    data_train   = X[train_index]
    target_train = y[train_index]

    data_test    = X[test_index]
    target_test  = y[test_index]

    # if needed, do preprocessing here
    # (fit it on the training fold only, then apply it to the test fold)

    # train a fresh model on the training fold
    clf = LogisticRegression().fit(data_train, target_train)

    preds = clf.predict(data_test)

    # accuracy for the current fold only
    accuracy = accuracy_score(target_test, preds)
    accuracies.append(accuracy)


# this is the average accuracy over all folds
average_accuracy = np.mean(accuracies)
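
Because X and y are left as placeholders above, here is a minimal, self-contained sketch of the same manual loop that can be run as-is. The synthetic dataset from make_classification and the names manual_scores and auto_scores are assumptions made for illustration only; they are not part of the original example. As a sanity check, the sketch also passes the same KFold object to cross_val_score, which should reproduce the per-fold accuracies of the manual loop.

```python
import numpy as np

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption): replace with your own X and y
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Fixed random_state, so every call to kf.split yields the same folds
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Manual loop: one model per fold, scored on the held-out part
manual_scores = []
for train_index, test_index in kf.split(X):
    clf = LogisticRegression().fit(X[train_index], y[train_index])
    preds = clf.predict(X[test_index])
    manual_scores.append(accuracy_score(y[test_index], preds))

# Same folds, handled by scikit-learn's helper
auto_scores = cross_val_score(
    LogisticRegression(), X, y, cv=kf, scoring="accuracy"
)

print("manual mean accuracy:", np.mean(manual_scores))
print("cross_val_score mean accuracy:", auto_scores.mean())
```

The manual loop is the better fit when each fold needs custom steps (for example, preprocessing fitted on the training fold only); when no such steps are needed, cross_val_score gives the same numbers with less code.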