Cross-Validation Examples with Scikit-Learn


WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.

K-Fold: Manual Splits

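import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold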

kf = KFold(n_splits=5,random_state=42,shuffle=True)


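# these are your training data points:
# features and targets (NumPy arrays, so they can be indexed
# with the index arrays returned by kf.split; for pandas objects use .iloc)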
X = ....
y = ....


# collects the accuracy of each fold
accuracies = []

for train_index, test_index in kf.split(X):

    data_train   = X[train_index]
    target_train = y[train_index]

    data_test    = X[test_index]
    target_test  = y[test_index]

    # if needed, do preprocessing here

    clf = LogisticRegression()
    clf.fit(data_train,target_train)

    preds = clf.predict(data_test)

    # accuracy for the current fold only    
    accuracy = accuracy_score(target_test,preds)

    accuracies.append(accuracy)

# this is the average accuracy over all folds
average_accuracy = np.mean(accuracies)
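
A runnable sketch of the same loop may help when trying this out. The synthetic data from make_classification and the StandardScaler step are assumptions added for illustration, not part of the snippet above. The scaler stands in for the "do preprocessing here" comment: it is fit on the training fold only and then applied to both folds, so no information from the test fold leaks into training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

# synthetic stand-in data (an assumption; replace with your own arrays)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
accuracies = []

for train_index, test_index in kf.split(X):
    data_train, target_train = X[train_index], y[train_index]
    data_test, target_test = X[test_index], y[test_index]

    # preprocessing: fit on the training fold only, apply to both folds
    scaler = StandardScaler().fit(data_train)
    data_train = scaler.transform(data_train)
    data_test = scaler.transform(data_test)

    clf = LogisticRegression()
    clf.fit(data_train, target_train)

    preds = clf.predict(data_test)

    # accuracy for the current fold only
    accuracies.append(accuracy_score(target_test, preds))

# average accuracy over all folds
average_accuracy = np.mean(accuracies)
print(f"per-fold: {np.round(accuracies, 3)}, mean: {average_accuracy:.3f}")
```

With 200 samples and 5 splits, each test fold holds 40 samples and each training fold 160.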