Introduction
Linear regression predicts a numeric target as an intercept $w_0$ plus a weighted sum of $p$ input features $x_1, \dots, x_p$:
$$ \hat{y} = w_0 + w_1x_1 + \dots + w_px_p $$
It is simple, interpretable, and still useful as a baseline. Even when a more complex model wins, the linear fit shows how much of the signal a simple additive relationship already captures, and which features carry it.
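The formula is an intercept plus a dot product. Here is a minimal NumPy sketch. The weights and feature values are made up purely for illustration.

```python
import numpy as np

# Hypothetical fitted parameters: intercept w_0 and weights w_1..w_p (p = 3 here).
w0 = 1.5
w = np.array([2.0, -0.5, 0.25])

# Two example rows, each holding the feature values x_1..x_p.
X = np.array([
    [1.0, 2.0, 4.0],
    [0.0, 1.0, 8.0],
])

# y_hat = w_0 + w_1*x_1 + ... + w_p*x_p, computed for every row at once.
y_hat = w0 + X @ w
print(y_hat)  # values 3.5 and 3.0
```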
Ordinary Least Squares
Ordinary least squares (OLS) chooses the coefficients that minimize the sum of squared errors over the training examples:
$$ \sum_i (y_i - \hat{y}_i)^2 $$
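Because each residual is squared, large mistakes dominate the sum: an error of 3 contributes 9, while an error of 1 contributes 1. This also makes OLS sensitive to outliers.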
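The following is a sketch of what OLS computes. It fits synthetic data, generated here and not taken from any real dataset, with NumPy's `np.linalg.lstsq`, which returns the least-squares solution. A column of ones stands in for the intercept $w_0$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption for illustration): y = 4 + 3*x1 - 2*x2 + noise.
n = 500
X = rng.normal(size=(n, 2))
y = 4.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Prepend a column of ones so the first coefficient plays the role of w_0.
X_design = np.column_stack([np.ones(n), X])

# lstsq returns the coefficients minimizing sum_i (y_i - y_hat_i)^2.
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
w0, w1, w2 = coef

residuals = y - X_design @ coef
sse = np.sum(residuals ** 2)
print(f"w0={w0:.2f}, w1={w1:.2f}, w2={w2:.2f}, SSE={sse:.1f}")
```

The recovered coefficients land close to 4, 3 and -2. Nudging any of them away from the returned values raises the sum of squared errors.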
Assumptions
The classic assumptions include:
- Linear relationship between features and target.
- Independent errors.
- Constant error variance.
- No severe multicollinearity.
- Approximately normal errors, needed mainly for inference such as confidence intervals and p-values.
Prediction can still work when assumptions are imperfect, but inference and interpretation become riskier.
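Some of these assumptions can be checked from the data. For multicollinearity, one common diagnostic is the variance inflation factor. It is not named above, so treat it as a swapped-in example. The definition is $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing feature $j$ on the other features. Below is a minimal NumPy sketch on synthetic data in which one feature nearly duplicates another. The helper `variance_inflation_factors` is hypothetical and written for this example.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing feature j on the others."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # The remaining features plus an intercept column.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.05 * rng.normal(size=300)  # c is almost a copy of a
X = np.column_stack([a, b, c])

vifs = variance_inflation_factors(X)
print(vifs.round(1))
```

Rules of thumb commonly flag VIF values above roughly 5 or 10. Here `a` and `c` should show very large values, while `b` stays near 1.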
Regularization
Regularization discourages overly large coefficients and can reduce overfitting.
Ridge Regression
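Ridge adds an L2 penalty on the weights $w_1, \dots, w_p$, scaled by $\lambda \ge 0$ (the intercept $w_0$ is conventionally left unpenalized):
$$ \sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} w_j^2 $$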
Ridge shrinks coefficients but usually does not make them exactly zero.
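Ridge has a closed-form solution. Centering the features and the target keeps the intercept out of the penalty. The weights then solve $(X_c^\top X_c + \lambda I)\,w = X_c^\top y_c$, where $X_c$ and $y_c$ are the centered features and target. The following is a minimal NumPy sketch on synthetic data. The helper `ridge_fit` is hypothetical and written for this example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize sum_i (y_i - y_hat_i)^2 + lam * sum_j w_j^2, leaving w_0 unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering lets w_0 be recovered afterwards
    p = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    w0 = y_mean - x_mean @ w
    return w0, w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

norms = []
for lam in [0.0, 10.0, 100.0, 1000.0]:
    w0, w = ridge_fit(X, y, lam)
    norms.append(np.linalg.norm(w))
    print(f"lambda={lam:>6}: w={np.round(w, 3)}")
```

As $\lambda$ grows, the length of the weight vector shrinks steadily, yet no weight becomes exactly zero. For this objective, scikit-learn's `Ridge(alpha=lam)` should return the same weights, because its `alpha` multiplies the same squared-L2 penalty.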
Lasso Regression
Lasso adds an L1 penalty on $w_1, \dots, w_p$:
$$ \sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |w_j| $$
Lasso can set some coefficients exactly to zero, so it also performs feature selection.
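To see the feature-selection effect, the sketch below fits scikit-learn's `Lasso` to synthetic data in which only the first two of ten features influence the target. scikit-learn's `alpha` plays the role of $\lambda$. Its objective, however, divides the squared-error term by $2n$, so the numbers are not interchangeable with the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data (assumption): only features 0 and 1 affect y; features 2-9 are noise.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
print(np.round(lasso.coef_, 3))
print("zeroed features:", np.flatnonzero(lasso.coef_ == 0))
```

The eight irrelevant features should come out exactly zero. The two real ones survive, with somewhat shrunken magnitudes.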
Elastic Net
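Elastic net adds both the L1 and the L2 penalty. It is useful when features are correlated. In that situation lasso tends to keep one feature from a correlated group and drop the rest somewhat arbitrarily, while the L2 part encourages correlated features to share weight.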
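One common way to write the elastic net objective extends the ridge and lasso formulas above with two penalty weights, $\lambda_1$ and $\lambda_2$. These symbols are notation chosen here, not a library convention:
$$ \sum_i (y_i - \hat{y}_i)^2 + \lambda_1 \sum_{j=1}^{p} |w_j| + \lambda_2 \sum_{j=1}^{p} w_j^2 $$
Setting $\lambda_2 = 0$ recovers lasso, and setting $\lambda_1 = 0$ recovers ridge. scikit-learn's `ElasticNet` parameterizes the same idea differently. It uses a single `alpha` and a mixing weight `l1_ratio`, minimizing $\frac{1}{2n}\sum_i (y_i - \hat{y}_i)^2 + \alpha\,\rho \sum_j |w_j| + \frac{\alpha(1-\rho)}{2} \sum_j w_j^2$ with $\rho$ = `l1_ratio`.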
Feature Scaling
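Regularized regression usually needs feature scaling. A coefficient's size depends on its feature's units. Without scaling, the penalty therefore shrinks some features more than others because of how they are measured rather than how important they are.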
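```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# x_train and y_train are assumed to hold the training features and target.
# The scaler is fit inside the pipeline, so scaling statistics come from training data only.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))  # alpha plays the role of lambda
model.fit(x_train, y_train)
```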
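To illustrate why scaling matters, the sketch below records the same feature in metres and in kilometres. The data is synthetic, and the distance feature and its two units are invented for the example. Without scaling, Ridge barely shrinks the metre coefficient, which sits on a feature with a large spread. It does shrink the kilometre coefficient noticeably. With `StandardScaler` in the pipeline, the choice of unit stops mattering. The value `alpha=200.0` is deliberately large to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ALPHA = 200.0
rng = np.random.default_rng(0)

# Synthetic data (assumption): one distance feature, recorded in metres and in kilometres.
dist_m = rng.normal(loc=5000.0, scale=1000.0, size=200)
y = 0.002 * dist_m + rng.normal(scale=0.5, size=200)
X_m = dist_m.reshape(-1, 1)
X_km = X_m / 1000.0

# Without scaling, the units change how hard the penalty bites.
slope_m = Ridge(alpha=ALPHA).fit(X_m, y).coef_[0]            # per metre
slope_km = Ridge(alpha=ALPHA).fit(X_km, y).coef_[0] / 1000   # per km, converted to per metre
print(f"unscaled slopes per metre: {slope_m:.6f} (m) vs {slope_km:.6f} (km)")

# With StandardScaler in the pipeline, both unit choices give the same predictions.
pred_m = make_pipeline(StandardScaler(), Ridge(alpha=ALPHA)).fit(X_m, y).predict(X_m)
pred_km = make_pipeline(StandardScaler(), Ridge(alpha=ALPHA)).fit(X_km, y).predict(X_km)
print("scaled predictions match:", np.allclose(pred_m, pred_km))
```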
Evaluation
Useful evaluation tools:
- MAE for interpretable average error.
- RMSE when large errors matter more.
- Residual plots for systematic patterns.
- Cross-validation when the data is too small for a reliable single train/test split.
- Slice evaluation to check whether errors differ by group.
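The following sketches these checks with scikit-learn on synthetic data. The `group` label and its noise levels are invented for the example. Every metric is computed on out-of-fold predictions from `cross_val_predict`, so it reflects data the model did not train on. The toy error vector at the top shows why RMSE reacts more than MAE to a single large error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_val_predict

# Toy errors: three small mistakes and one large one.
errors = np.array([1.0, 1.0, 1.0, 5.0])
mae = np.mean(np.abs(errors))          # (1 + 1 + 1 + 5) / 4 = 2.0
rmse = np.sqrt(np.mean(errors ** 2))   # sqrt((1 + 1 + 1 + 25) / 4) = sqrt(7), about 2.65

# Synthetic data (assumption): group 1 is noisier than group 0 by construction.
rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2))
noise_scale = np.where(group == 1, 3.0, 0.5)
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n) * noise_scale

# Out-of-fold predictions from 5-fold cross-validation.
pred = cross_val_predict(LinearRegression(), X, y, cv=5)
model_mae = mean_absolute_error(y, pred)
model_rmse = np.sqrt(mean_squared_error(y, pred))
print(f"MAE={model_mae:.3f}, RMSE={model_rmse:.3f}")

# Slice evaluation: the same metric computed separately for each group.
group_mae = {g: mean_absolute_error(y[group == g], pred[group == g]) for g in (0, 1)}
print(group_mae)
```

A residual plot would chart `y - pred` against `pred`, for example with matplotlib. Here the slice metrics should already reveal that group 1 has much larger errors.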
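Linear regression is valuable because its mistakes are often easy to inspect: each prediction is the intercept plus one coefficient-times-feature term per feature. Use that transparency to understand the errors before moving to a black-box model.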
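Here is one way to use that transparency, sketched with scikit-learn on synthetic data. The feature names are hypothetical. A prediction splits exactly into the intercept plus one coefficient-times-feature term per feature, so a bad prediction can be traced to the terms that drove it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
feature_names = ["size", "age", "distance"]  # hypothetical feature names
X = rng.normal(size=(100, 3))
y = 10.0 + X @ np.array([4.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=100)

model = LinearRegression().fit(X, y)

# Break one prediction into intercept + coefficient * feature terms.
row = X[0]
contributions = model.coef_ * row
for name, value in zip(feature_names, contributions):
    print(f"{name:>9}: {value:+.3f}")
print(f"intercept: {model.intercept_:+.3f}")
print(f"prediction: {model.predict(row.reshape(1, -1))[0]:.3f}")
```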