K-fold cross validation is used in the field of machine learning to estimate how accurately a learning algorithm will predict data that it was not trained on. When using the k-fold method, the training dataset is randomly partitioned into k groups (folds). The learning algorithm is then trained k times; on the i-th run it is trained on all of the training data points except those in the i-th fold. The form of the algorithm is as follows:
- Divide the training set into k partitions (folds).
- For each fold i, where i = 1, ..., k:
  - Let T be the dataset containing all training data points except those in the i-th fold.
  - Train the algorithm using T as the training set.
  - Test the trained algorithm using the i-th fold as the test set, and record the error.
- Report the mean error over all k test folds.
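The steps above can be sketched in plain Python. The function and helper names here (`k_fold_cv`, `fit_mean`) are illustrative, not from any particular library; the "learner" in the example is a trivial predict-the-mean model, chosen only to keep the sketch self-contained.

```python
import random

def k_fold_cv(data, k, train, error):
    """Estimate prediction error by k-fold cross validation.

    data  -- list of (x, y) examples
    train -- fits a model on a list of examples, returns a predictor
    error -- scores a single prediction against the true value
    """
    data = data[:]
    random.shuffle(data)                      # random partition of the data
    folds = [data[i::k] for i in range(k)]    # k roughly equal folds
    fold_errors = []
    for i in range(k):
        # T = every training point except those in the i-th fold
        T = [pt for j, fold in enumerate(folds) if j != i for pt in fold]
        model = train(T)
        # test on the held-out i-th fold and record the mean error
        errs = [error(model(x), y) for x, y in folds[i]]
        fold_errors.append(sum(errs) / len(errs))
    # report the mean error over all k test folds
    return sum(fold_errors) / k

# Example learner: always predict the mean of the training targets.
def fit_mean(examples):
    m = sum(y for _, y in examples) / len(examples)
    return lambda x: m

cv_error = k_fold_cv([(x, 1.0) for x in range(20)], 5,
                     fit_mean, lambda pred, y: (pred - y) ** 2)
```

On the constant-target toy data above, the mean predictor is exact, so the reported cross validation error is zero; on any data with varying targets it would be positive.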
K-fold cross validation is extremely useful if an appropriate value of k is chosen. It is less 'wasteful' of data than simple held-out test set validation, and less 'expensive' than leave-one-out cross validation. In general, with a well-chosen k, k-fold cross validation provides the best estimate of the true generalization error.
Unfortunately, there is no theoretically 'perfect' way of determining the appropriate value of k. Using k = 10 is a good rule of thumb, although the true best value differs for each algorithm and each dataset. It is interesting to note that when k is allowed to increase until it equals the size of the dataset, k-fold cross validation behaves identically to leave-one-out cross validation.
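The limiting case can be seen directly from the partitioning step: a quick sketch on a made-up toy dataset, assuming the same slice-based fold construction used informally above.

```python
# With k equal to the number of examples, each fold contains exactly
# one point, so every training run leaves out a single example --
# precisely leave-one-out cross validation.
data = list(range(12))                    # toy dataset of 12 points
n = len(data)
folds = [data[i::n] for i in range(n)]    # k = n partitions
single = all(len(fold) == 1 for fold in folds)
# each training set therefore has n - 1 points:
train_sizes = {n - len(fold) for fold in folds}
```

Every fold is a singleton, and every training set has n - 1 points, which is exactly the leave-one-out setup.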