LightGBM Algorithm

LightGBM (Light Gradient Boosting Machine) is a scalable and efficient gradient boosting framework developed by Microsoft that can be used for various machine learning applications, including regression, classification, and ranking tasks. It is based on the principle of gradient boosting, which combines the predictions of multiple weak learners (typically shallow decision trees) into a strong learner that generalizes well and provides accurate predictions.

LightGBM distinguishes itself from other gradient boosting implementations in two main ways. First, it grows trees leaf-wise rather than level-wise, expanding the leaf with the largest loss reduction. Second, it supports a sampling technique called Gradient-based One-Side Sampling (GOSS), which significantly reduces memory consumption and training time, making it particularly suitable for large-scale datasets. The core idea behind GOSS is to retain the data instances with large gradients, as these contribute most to the learning process and are the most informative for determining the direction of the gradient descent, while randomly sampling from the instances with small gradients. By focusing on the high-gradient instances, LightGBM reduces the overall computation cost without materially compromising the model's performance.

Additionally, the algorithm employs Exclusive Feature Bundling (EFB), which further reduces computation cost by compressing the feature space: mutually exclusive features, i.e. features that rarely take nonzero values at the same time, are bundled into a single feature. This is particularly useful when dealing with one-hot-encoded categorical variables or other sparse features where many values are zero. Overall, LightGBM provides a powerful and flexible solution for machine learning practitioners who seek to tackle large-scale problems with limited resources, without compromising accuracy or efficiency.
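Both GOSS and EFB are exposed through ordinary configuration parameters. The snippet below is an illustrative sketch rather than part of the worked example that follows; it assumes the standard LightGBM parameter names boosting, top_rate, other_rate, and enable_bundle, and the resulting list could be passed to the booster in the same way as the config list in the example.

# Sketch: requesting GOSS and EFB via configuration (assumed parameter names)
config.goss <- list(objective = "binary",
                    boosting = "goss",       # Gradient-based One-Side Sampling
                    top_rate = 0.2,          # keep the 20% of rows with the largest gradients
                    other_rate = 0.1,        # randomly sample 10% of the remaining rows
                    enable_bundle = "true")  # Exclusive Feature Bundling (on by default)

Note that GOSS replaces row bagging, so it should not be combined with the bagging_fraction and bagging_freq parameters used below.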
library(RLightGBM)
data(example.binary)
# Parameters
num_iterations <- 100
config <- list(objective = "binary",               # binary classification
               metric = "binary_logloss,auc",      # evaluation metrics
               learning_rate = 0.1,
               num_leaves = 63,                    # maximum number of leaves per tree
               tree_learner = "serial",
               feature_fraction = 0.8,             # fraction of features used per tree
               bagging_freq = 5,                   # re-sample rows every 5 iterations
               bagging_fraction = 0.8,             # fraction of rows used per bag
               min_data_in_leaf = 50,
               min_sum_hessian_in_leaf = 5.0)

# Create the data handle and booster
handle.data <- lgbm.data.create(x)
lgbm.data.setField(handle.data, "label", y)
handle.booster <- lgbm.booster.create(handle.data, lapply(config, as.character))

# Train for num_iterations iterations, evaluating every 5 steps
lgbm.booster.train(handle.booster, num_iterations, 5)

# Predict on the test set
pred <- lgbm.booster.predict(handle.booster, x.test)

# Test accuracy (fraction of correct predictions at a 0.5 threshold)
sum(y.test == (pred > 0.5)) / length(y.test)
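Since the configuration also tracks AUC as a metric, it can be useful to compute it on the held-out set as well. The following is a sketch under the assumption that the pROC package is installed; it is not part of the original example.

# AUC on the test set (sketch; assumes the pROC package is available)
library(pROC)
auc(y.test, pred)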

# Save model (it can be loaded again via lgbm.booster.load(filename))
lgbm.booster.save(handle.booster, filename = "/tmp/model.txt")
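To reuse the saved model in a later session, it can be reloaded and applied to new data. A minimal sketch, assuming lgbm.booster.load() returns a booster handle usable like the one above, as the comment suggests:

# Reload the saved model and predict again
handle.loaded <- lgbm.booster.load(filename = "/tmp/model.txt")
pred.loaded <- lgbm.booster.predict(handle.loaded, x.test)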
