Supervised Learning: Regression & Classification #1

Gradient Boosted Trees: XGBoost, LightGBM, CatBoost, GBM

Best for: High-performing tabular prediction

Aliases: XGBoost, LightGBM, CatBoost, GBM

How it works

$$F_m(x)=F_{m-1}(x)+\nu\,h_m(x)$$

Builds an additive model stage by stage. At stage $m$, a new tree $h_m$ is fit by least squares to the pseudo-residuals $r_i=-\left[\frac{\partial L(y_i,F(x_i))}{\partial F(x_i)}\right]_{F=F_{m-1}}$, i.e. the negative gradient of the loss evaluated at the current prediction. The value in each leaf is then set by a line search that minimises $\sum_i L\bigl(y_i,F_{m-1}(x_i)+h_m(x_i)\bigr)$ over the samples in that leaf, and the tree is added to the ensemble scaled by the learning rate $\nu\in(0,1]$ (shrinkage). For squared-error loss $L=\tfrac12(y-F)^2$ the pseudo-residuals are simply the ordinary residuals $y_i-F_{m-1}(x_i)$.
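The stage-wise update above can be sketched in a few lines. This is a minimal illustration, not how XGBoost, LightGBM, or CatBoost are implemented: it assumes squared-error loss, a single numeric feature, and depth-1 trees (stumps). The names `fit_stump`, `stump_predict`, `boost`, and `predict` are hypothetical helpers made up for this example. With squared error, the mean residual in each leaf is already the loss-minimising leaf value, so no separate line search is needed.

```python
def fit_stump(xs, residuals):
    """Least-squares fit of a one-split tree to the residuals.

    Assumes xs contains at least two distinct values.
    Returns (threshold, left_value, right_value).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None  # (sse, threshold, left_value, right_value)
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no threshold fits between equal feature values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, left_value, right_value)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=50, nu=0.1):
    f0 = sum(ys) / len(ys)  # F_0: the constant that minimises squared error
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For L = (y - F)^2 / 2 the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # F_m(x) = F_{m-1}(x) + nu * h_m(x)
        preds = [p + nu * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return f0, nu, stumps


def predict(model, x):
    f0, nu, stumps = model
    return f0 + nu * sum(stump_predict(s, x) for s in stumps)


# Toy usage: fit y = x^2 on a small grid and compare with the mean baseline.
xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]
mean_y = sum(ys) / len(ys)
baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)
model = boost(xs, ys, n_trees=200, nu=0.1)
mse = sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"baseline MSE {baseline:.3f}, boosted MSE {mse:.3f}")
```

A smaller `nu` makes each tree contribute less, so more trees are needed to reach the same training error, which is the usual trade-off behind shrinkage.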

When to use

Tabular regression and classification tasks where predictive performance matters more than interpretability; often the strongest off-the-shelf choice on structured data.

Watch out

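Overfits without early stopping and regularization (shrinkage, shallow trees, subsampling); slower to train than linear models; predictions are piecewise constant, so they stay flat outside the feature range seen in training and cannot extrapolate a trend.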
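Two of these caveats can be made concrete with a small, self-contained sketch. It assumes squared-error loss, one numeric feature, and depth-1 trees; `fit_stump`, `leaf`, `boost_early_stop`, and `predict` are hypothetical names for this illustration, and the `patience` rule is one simple way to do early stopping, not the exact rule any particular library uses. The synthetic data follows a linear trend, so the flat prediction far outside the training range is easy to spot.

```python
import random


def fit_stump(xs, residuals):
    """Best single split by squared error; returns (threshold, left, right).

    Assumes xs contains at least two distinct values.
    """
    pairs = sorted(zip(xs, residuals))
    n, total = len(pairs), sum(residuals)
    best, left_sum = None, 0.0
    for k in range(1, n):
        left_sum += pairs[k - 1][1]
        if pairs[k - 1][0] == pairs[k][0]:
            continue
        left_value, right_value = left_sum / k, (total - left_sum) / (n - k)
        # Maximising this score is equivalent to minimising the split's SSE.
        score = k * left_value ** 2 + (n - k) * right_value ** 2
        if best is None or score > best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (score, threshold, left_value, right_value)
    return best[1:]


def leaf(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost_early_stop(train, valid, nu=0.1, max_trees=300, patience=10):
    xs, ys = [x for x, _ in train], [y for _, y in train]
    f0 = sum(ys) / len(ys)
    train_pred = [f0] * len(train)
    valid_pred = [f0] * len(valid)
    stumps, best_loss, best_size = [], float("inf"), 0
    for m in range(1, max_trees + 1):
        stump = fit_stump(xs, [y - p for y, p in zip(ys, train_pred)])
        stumps.append(stump)
        train_pred = [p + nu * leaf(stump, x) for p, x in zip(train_pred, xs)]
        valid_pred = [p + nu * leaf(stump, x)
                      for p, (x, _) in zip(valid_pred, valid)]
        loss = sum((y - p) ** 2 for p, (_, y) in zip(valid_pred, valid)) / len(valid)
        if loss < best_loss:
            best_loss, best_size = loss, m
        elif m - best_size >= patience:
            break  # validation loss has not improved for `patience` rounds
    return f0, nu, stumps[:best_size]  # keep only the best-scoring prefix


def predict(model, x):
    f0, nu, stumps = model
    return f0 + nu * sum(leaf(s, x) for s in stumps)


# Synthetic data: y = 2x + noise, with x drawn from [0, 10].
random.seed(0)
data = []
for _ in range(200):
    x = random.uniform(0, 10)
    data.append((x, 2 * x + random.gauss(0, 1)))
train, valid = data[:160], data[160:]

model = boost_early_stop(train, valid)
edge = max(x for x, _ in train)
print("trees kept:", len(model[2]))
print("prediction at largest training x:", predict(model, edge))
print("prediction at x = 1000:", predict(model, 1000.0))  # true trend is 2000
```

Every split threshold lies inside the training range, so any input beyond the largest training `x` falls into the same leaves as that largest value, and the two printed predictions coincide.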

Common fields

Finance · insurance · fraud detection · pricing · marketing · medicine · churn prediction