Cross-Validation Examples with Scikit-Learn
WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.
K-Fold: Manual Splits
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# these are your training data points: features and targets
# (as NumPy arrays, so they can be indexed with the fold indices)
X = ...
y = ...

accuracies = []
for train_index, test_index in kf.split(X):
    data_train = X[train_index]
    target_train = y[train_index]
    data_test = X[test_index]
    target_test = y[test_index]

    # if needed, do preprocessing here (fit it on the training fold only)

    clf = LogisticRegression()
    clf.fit(data_train, target_train)
    preds = clf.predict(data_test)

    # accuracy for the current fold only
    accuracy = accuracy_score(target_test, preds)
    accuracies.append(accuracy)

# this is the average accuracy over all folds
average_accuracy = np.mean(accuracies)
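
The manual loop above can usually be condensed with scikit-learn's cross_val_score helper, which runs the fit/predict/score cycle for every fold. The sketch below assumes the same X, y, and 5-fold setup as above; the StandardScaler step is only an illustrative stand-in for "do preprocessing here", wrapped in a pipeline so it is refit on each training fold.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# same 5-fold setup as in the manual example
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# example pipeline: preprocessing + classifier; the scaler is refit on each
# training fold, so no information leaks in from the test fold
model = make_pipeline(StandardScaler(), LogisticRegression())

# X and y are the same arrays as in the manual example above
fold_accuracies = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
average_accuracy = np.mean(fold_accuracies)

This gives one accuracy per fold, so averaging them matches the manual computation above.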