# Regression Models: GAM, GLM and GLMM

### Overview

The generalized linear model (GLM) addresses some issues posed by ordinary linear regression. In the well-known linear regression model, we often assume $Y = X\beta + \epsilon$ with normally distributed errors $\epsilon \sim N(0, \sigma^2 I)$, so the response is continuous and its mean is a linear function of the covariates. GLM relaxes these assumptions: the response may follow any exponential-family distribution, and its mean is tied to the linear predictor through a link function.

### GLM

We note that GLM has three major parts:

- An exponential family of probability distributions for the response $Y$; some examples include:
  - normal
  - exponential
  - gamma
  - chi-squared
  - beta
  - Dirichlet
  - Bernoulli
  - categorical
  - Poisson
- A function of the predictors (in GLM it is the linear predictor $\eta = X\beta$; in extended models it can be other things, see GAM and GLMM), whose parameters we can estimate via *maximum likelihood* or Bayesian methods like the *Laplace approximation* and *Gibbs sampling*, etc.
- A link function $g$ such that $E(Y \mid X) = \mu = g^{-1}(X\beta)$ (sometimes we may also have a tractable form for the variance, $\operatorname{Var}(Y) = V(\mu)$)
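
To make the three parts concrete, here is a minimal sketch of a Poisson GLM with its canonical log link in `statsmodels`; the data are synthetic and the coefficients (0.3, 0.5, -0.2) are made up for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic data: a count response whose log-mean is linear in x1 and x2.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["counts"] = rng.poisson(np.exp(0.3 + 0.5 * df["x1"] - 0.2 * df["x2"]))

# The three GLM parts: Poisson family, linear predictor x1 + x2,
# and the canonical log link (statsmodels' default for Poisson).
model = smf.glm("counts ~ x1 + x2", data=df, family=sm.families.Poisson())
result = model.fit()  # maximum likelihood fit
print(result.summary())
```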

#### 1. Pros and Cons for GLM and GLMM

Pros:

- Easy to interpret
- Easy to grasp
- Coefficients can be further used in numerical models
- Easy to extend: link functions, fixed and random effects, correlation structures

Cons:

- Not good for dynamic models (when the underlying process is not linear, a transformation may not help or would lose information)

### Generalized additive models (GAMs)

- GAMs are extensions to GLMs in which the linear predictor $\eta$ is not restricted to be linear in the covariates but is the sum of smoothing functions applied to each $x_i$. For example, $\eta = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p)$
- Useful if the relationship between Y and X is likely to be non-linear but we **don't have any theory or any mechanistic model** to suggest a particular functional form
- Each $x_i$ is linked with $y$ by a **smoothing function** instead of a coefficient
- GAMs are **data-driven** rather than model-driven; that is, the resulting fitted values do not come from an a priori model (non-parametric)
- **All of the distribution families** allowed with GLM are available with GAM
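
As an illustrative sketch (not from the original post), `statsmodels` can fit a GAM with B-spline smoothers; the synthetic data and the penalty weights `alpha` below are made-up assumptions:

```python
import numpy as np
import pandas as pd
from statsmodels.gam.api import GLMGam, BSplines

# Synthetic data with smooth nonlinear effects of x1 and x2 (made up).
rng = np.random.default_rng(0)
data = pd.DataFrame({"x1": rng.uniform(-2, 2, 300), "x2": rng.uniform(-2, 2, 300)})
data["y"] = np.sin(data["x1"]) + 0.5 * data["x2"] ** 2 + rng.normal(scale=0.3, size=300)

# One B-spline smoother per covariate: `df` is the basis size, degree 3 = cubic.
bs = BSplines(data[["x1", "x2"]], df=[10, 10], degree=[3, 3])

# Gaussian GAM; `alpha` penalizes wiggliness of each smooth (values chosen ad hoc).
gam = GLMGam.from_formula("y ~ 1", data=data, smoother=bs, alpha=[1.0, 1.0])
res = gam.fit()
print(res.summary())
```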

#### 1. Pros and Cons for GAM

**Pros**:

- By combining the basis functions, GAMs can represent a large number of functional relationships (to do so they rely on the assumption that the true relationship is likely to be smooth, rather than wiggly)
- Particularly useful for uncovering nonlinear effects of numerical covariates, and for doing so in an "automatic" fashion
- More flexible, as each sample's Y is now associated with its X by a smoothing function instead of a coefficient

**Cons**:

- Interpretability suffers: the effect of each covariate is a fitted smooth $f_i$ that must be assessed graphically rather than read off as a coefficient
- Coefficients are not easily transferable to other datasets and parameterizations
- Very sensitive to gaps in the data and outliers
- Lack of underlying theory for the use of hypothesis tests; one solution is to do bootstrapping and aggregate the results for more reliable confidence bands

#### 2. Examples of GAM (different predictor representation functions):

Loess (Locally weighted regression smoothing)

- The key factor is the **span width** (usually set to be a proportion of the data set; 0.5 is a standard starting point)
- Main idea: split the data into separate blobs using sliding windows and fit a linear regression in each blob/interval
- Pros:
  - Easily interpretable: at each test case, a local linear model is fit, so predictions are ultimately explained by linear behaviour
  - A popular way to see smooth trends on scatterplots (see the sketch below)
- Cons:
  - If there are a lot of data points, fitting a LOESS over the entire range of the predictor can be slow, because so many local linear regressions must be fit
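
As a quick illustration, `statsmodels` ships a LOWESS smoother; this minimal sketch on made-up data shows the role of the span via the `frac` argument:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Made-up noisy nonlinear data.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# `frac` is the span width: the fraction of the data used for each local fit.
smoothed = lowess(y, x, frac=0.5)  # returns an array of (x, fitted y) pairs
print(smoothed[:5])
```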

Regression Splines (piecewise polynomials, usually over a finite range)

- Main constraint is that the splines must remain smooth and continuous at knots
- To avoid overfitting of splines, penalty terms are added
- The penalty term also reflects the **degree of smoothness** of the regression: the less smooth the fitted function is (after fitting the spline terms), the higher the penalty
- Pros:
  - Splines **cover all sorts of nonlinear trends** and are **computationally very attractive**, because spline terms fit exactly into a least squares linear regression framework, and least squares models are very easy to fit computationally

- Cons:
  - It is possible to create multidimensional splines by creating interactions between spline terms for different predictors, but this suffers from the **curse of dimensionality**, like KNN, because we are trying to **estimate a wavy surface in a high-dimensional (many-variable) space where data points only sparsely cover the many, many regions of the space**

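To see how spline terms drop into an ordinary least squares fit, here is a minimal sketch using the B-spline transform `bs()` that `patsy` exposes inside `statsmodels` formulas; the data are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data with a nonlinear trend.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.linspace(0, 10, 200)})
df["y"] = np.sin(df["x"]) + rng.normal(scale=0.3, size=200)

# `bs(x, df=6)` expands x into a cubic B-spline basis, so the
# spline fit is just least squares on the basis columns.
fit = smf.ols("y ~ bs(x, df=6)", data=df).fit()
print(fit.summary())
```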

### GLMM

The model has the form

$$g\big(E(y \mid u)\big) = X\beta + Zu, \qquad u \sim N(0, G),$$

where the fixed-effects part $X\beta$ is the same as in a GLM, and the random-effects part $Zu$ captures group-level variation through the covariance matrix $G$.
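
For instance (an illustrative special case, not from the original post), a random-intercept logistic GLMM for observation $j$ in group $i$ combines the logit link with a single Gaussian random effect:

$$\operatorname{logit}\big(\Pr(y_{ij} = 1 \mid u_i)\big) = \beta_0 + \beta_1 x_{ij} + u_i, \qquad u_i \sim N(0, \sigma_u^2).$$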

#### 1. Code implementation

I recommend that beginners use the `statsmodels` package, because the output of its `.summary()` function is very clear to read. Advanced users may implement the model themselves by referring to the mathematical expressions and the documentation of the following packages:

- `statsmodels`: `statsmodels.formula.api.mixedlm`
- `pymc3`
- `theano`
- `pystan`
- `tensorflow`
- `keras`

#### 2. Sample code using `statsmodels`

```python
import statsmodels.formula.api as smf
```
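
The import alone isn't very informative; here is a fuller sketch of fitting a random-intercept model with `mixedlm` on hypothetical grouped data (the column names and coefficients are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical grouped data: 20 groups with group-specific random intercepts.
rng = np.random.default_rng(0)
n_groups, n_per = 20, 30
group = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
u = rng.normal(scale=0.8, size=n_groups)  # random intercepts
y = 1.0 + 0.5 * x + u[group] + rng.normal(scale=0.5, size=x.size)
df = pd.DataFrame({"y": y, "x": x, "group": group})

# Mixed model: fixed effect for x, random intercept per group.
model = smf.mixedlm("y ~ x", data=df, groups=df["group"])
result = model.fit()
print(result.summary())
```

Note that `mixedlm` fits a *linear* mixed model with Gaussian errors, the mixed-effects counterpart of ordinary linear regression, which is the natural starting point before moving to non-Gaussian GLMMs.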
