K-Folds Algorithm

The K-Folds Algorithm is a widely used cross-validation technique in machine learning and statistical modeling, primarily for assessing the performance of a predictive model or the accuracy of an estimation method. The main objective of cross-validation is to evaluate the model's ability to generalize to unseen data, which is crucial for ensuring the model's robustness and reliability. The K-Folds Algorithm achieves this by splitting the dataset into K roughly equal-sized subsets, or "folds", and iteratively training and testing the model on these subsets.

In each iteration, the model is trained on K-1 folds and tested on the remaining fold. This process is repeated K times, with each fold serving as the test set exactly once, and the model's performance is estimated by averaging the results across the K iterations. Because the entire dataset is used for both training and testing, the evaluation is not biased by a single train-test split. Averaging over multiple splits also reduces the variance of the performance estimate, yielding a more reliable measure of the model's generalization ability.
# K-fold cross-validation is a core technique in machine learning.
# The createFolds() function in the caret package already does this;
# here we write our own version.

get_k_folds <- function(y = c(), k = 10, isList = TRUE, seed = 123){
  set.seed(seed)
  n <- length(y)
  if(k < 2 || k > n) stop("k must be between 2 and length(y)")
  
  # Randomly assign each observation a fold label; rep(..., length.out = n)
  # keeps the fold sizes as equal as possible (they differ by at most one)
  fold_id <- sample(rep(1:k, length.out = n))
  
  if(isList){
    # Return a list of index vectors, one element per fold
    value <- split(seq_len(n), fold_id)
    names(value) <- paste("Folds", 1:k, sep = "")
  }else{
    # Return a vector of fold labels, one per observation
    value <- fold_id
  }
  
  return(value)
}
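As a sketch of how the folds drive a full cross-validation loop, the snippet below runs 10-fold CV for a simple linear model on simulated data. The data frame, column names, and use of `lm()` are illustrative assumptions, not part of the original; only `get_k_folds()` (defined above) comes from this document.

```r
# Illustrative example: 10-fold cross-validation of a linear model
# on simulated data (the data and model here are made up for this sketch)
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- 2 * dat$x + rnorm(100)

# List of 10 index vectors, one per fold
folds <- get_k_folds(dat$y, k = 10)

# For each fold: train on the other K-1 folds, test on the held-out fold
mse <- sapply(folds, function(test_idx){
  fit  <- lm(y ~ x, data = dat[-test_idx, ])
  pred <- predict(fit, newdata = dat[test_idx, ])
  mean((dat$y[test_idx] - pred)^2)
})

mean(mse)  # average test MSE over the K iterations
```

Averaging the per-fold MSE values gives the cross-validated estimate of generalization error described in the text above.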
