Regularization is any technique that discourages a model from becoming too complex, so that it generalizes from its training data instead of memorizing it. The “Regularization for Deep Learning” chapter of Goodfellow, Bengio, and Courville’s “Deep Learning” defines it broadly as any modification made to a learning algorithm intended to reduce its generalization error but not its training error.
The classic forms add a penalty to the loss function based on the size of the model’s weights. L2 regularization (known as ridge regression when applied to linear regression) penalizes the sum of squared weights, shrinking them toward zero, usually without making any of them exactly zero, so influence stays spread across many features. L1 regularization (the lasso, in the same setting) penalizes the sum of the weights’ absolute values, which can drive some of them to exactly zero and so acts as automatic feature selection. Deep learning adds further methods such as dropout, which randomly switches off units during training, and early stopping, which halts training once performance on held-out validation data stops improving, before the model begins to fit noise.
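To see why the two penalties behave differently, it helps to look at the simplest possible case: a single weight whose best unpenalized value is already known. The sketch below is an illustration under that assumption, not a general-purpose solver. The function names `ridge_shrink` and `lasso_shrink`, the sample weights, and the penalty strength `lam` are all invented for the example.

```python
def ridge_shrink(w_unpenalized, lam):
    """Weight minimizing 0.5*(w - w_unpenalized)**2 + 0.5*lam*w**2 (L2 penalty)."""
    # The squared penalty rescales the weight; it reaches zero only if it started there.
    return w_unpenalized / (1.0 + lam)


def lasso_shrink(w_unpenalized, lam):
    """Weight minimizing 0.5*(w - w_unpenalized)**2 + lam*abs(w) (L1 penalty)."""
    # The absolute-value penalty subtracts a fixed amount (soft-thresholding),
    # so any weight smaller than lam in magnitude lands on exactly zero.
    magnitude = max(abs(w_unpenalized) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if w_unpenalized > 0 else -magnitude


# Hypothetical unpenalized weights and penalty strength, chosen for illustration.
unpenalized = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

ridge_weights = [ridge_shrink(w, lam) for w in unpenalized]
lasso_weights = [lasso_shrink(w, lam) for w in unpenalized]

print("ridge:", ridge_weights)
print("lasso:", lasso_weights)
```

With `lam = 0.5`, the ridge version scales every weight by the same factor and leaves all four nonzero, while the lasso version subtracts a fixed amount from each and sets the two smallest weights to exactly zero.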
The common thread is a deliberate trade: accept slightly worse fit on the training data in exchange for better performance on data the model has never seen. The penalty strength, which sets how much to regularize, is one of the central knobs in practical machine learning, and it is usually tuned by checking performance on held-out validation data.
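The trade can be made concrete with a small sketch of penalty-strength tuning, assuming a one-feature ridge model fitted through the origin and a handful of made-up data points. The helper names (`fit_ridge_slope`, `mean_squared_error`), the data, and the candidate strengths are all hypothetical; real projects typically use a library and cross-validation rather than a single validation split.

```python
def fit_ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 for a line through the origin."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mean_squared_error(xs, ys, w):
    """Average squared prediction error of the line y = w*x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data invented for illustration: a small noisy training set
# and a separate validation set the model never trains on.
train_x, train_y = [1.0, 2.0, 3.0], [2.0, 1.0, 4.5]
valid_x, valid_y = [1.5, 2.5], [1.5, 2.4]
candidate_lams = [0.0, 0.1, 1.0, 4.0, 10.0, 100.0]

results = []
for lam in candidate_lams:
    w = fit_ridge_slope(train_x, train_y, lam)
    train_mse = mean_squared_error(train_x, train_y, w)
    valid_mse = mean_squared_error(valid_x, valid_y, w)
    results.append((lam, train_mse, valid_mse))
    print(f"lam={lam:<6} slope={w:.3f} train_mse={train_mse:.3f} valid_mse={valid_mse:.3f}")

# Pick the strength with the lowest error on data not used for fitting.
best_lam, best_train_mse, best_valid_mse = min(results, key=lambda r: r[2])
print("best lam by validation error:", best_lam)
```

On this toy data the training error rises steadily as `lam` grows, yet the validation error is lowest at `lam = 4.0` rather than at zero: a slightly worse fit to the training points buys a better fit to the unseen ones. Too large a penalty, such as `lam = 100.0`, hurts both.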
Why business readers should care: regularization is one of the main reasons a model trained on last year’s data can still work on this year’s customers. Under-regularized models look impressive on the data they were trained on and then disappoint in production.