Generalized Linear Modelling
The term general linear model usually refers to conventional linear regression models for a continuous response variable given categorical and/or continuous predictors. Examples of these models include ANOVA and ANCOVA (with fixed effects only) as well as multiple linear regression. These models are fit by least squares and weighted least squares. Generalized linear models (GLMs) are a larger class of models popularized by McCullagh and Nelder (1982). In this larger class, the response variable is assumed to follow an exponential family distribution whose mean is some (often nonlinear) function of a linear combination of the predictors.
GLMs provide a unified approach to a wide range of response-modeling problems. Each model requires a link function to be chosen, which allows further flexibility in the modeling. Furthermore, a common procedure can be used to fit any generalized linear model, and a common mechanism for hypothesis testing is available. You can check whether a chosen model is adequate by running diagnostics based on deviance residuals.
Examples of Generalized Linear Models
- Simple linear regression
It models how the expected value of a continuous response variable depends on a set of explanatory variables.
- Binary logistic regression
It models the relationship between a binary response variable and a set of explanatory variables.
- Log-linear model
It models the expected cell counts as a function of the levels of categorical variables.
Log-linear models are more general than logit models, and some logit models are equivalent to certain log-linear models. When all explanatory variables are discrete, the log-linear model is also equivalent to the Poisson regression model.
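The three examples above differ mainly in the link function that ties the mean of the response to the linear predictor. The following Python sketch lists the links usually paired with each example (identity, logit, and log). The names `LINKS` and `mean_from_predictor` are illustrative and not part of any library.

```python
import math

# Each entry holds (link g, inverse link g^-1).
# g maps the mean of the response to the linear predictor.
LINKS = {
    # Simple linear regression: the mean equals the linear predictor.
    "identity": (lambda mu: mu, lambda eta: eta),
    # Binary logistic regression: the log-odds equal the linear predictor.
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    # Log-linear / Poisson regression: the log of the mean equals the linear predictor.
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
}


def mean_from_predictor(link_name, eta):
    """Map a linear-predictor value eta back to the mean of the response."""
    _, inverse_link = LINKS[link_name]
    return inverse_link(eta)
```

For instance, a linear predictor of 0 corresponds to a mean of 0 under the identity link, a probability of 0.5 under the logit link, and an expected count of 1 under the log link.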
The components of any GLM
- Random component
This is the probability distribution of the response variable, for example the normal distribution for y in linear regression or the binomial distribution for y in binary logistic regression. It is also called the error model or noise model.
- Systematic component
This component specifies the explanatory variables in the model, and more precisely their linear combination, which forms the so-called linear predictor.
- Link function
The link function specifies the link between the random and systematic components. It describes how the expected value of the response relates to the linear predictor of explanatory variables.
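In symbols, the three components can be sketched as follows. The notation ($y_i$ for the response, $x_{ij}$ for the explanatory variables, $\beta_j$ for the coefficients, $\mu_i$ for the mean, $\eta_i$ for the linear predictor, and $g$ for the link) is the conventional one and is assumed here rather than taken from the text.

```latex
\begin{aligned}
\text{Random component:}\quad     & y_i \sim \text{exponential family distribution with mean } \mu_i = E(y_i) \\
\text{Systematic component:}\quad & \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} \\
\text{Link function:}\quad        & g(\mu_i) = \eta_i
\end{aligned}
```

With a normal response and the identity link $g(\mu) = \mu$ this reduces to linear regression. With a binomial response and the logit link $g(\mu) = \log\{\mu / (1 - \mu)\}$ it gives binary logistic regression.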
Assumptions of GLMs
- The observations (cases) should be independently distributed.
- The dependent variable does not need to be normally distributed.
- GLM does not require a linear relationship between the dependent and independent variables, but it does assume a linear relationship between the transformed response (transformed by the link function) and the explanatory variables. For example, in binary logistic regression, logit(π) = β0 + β1x.
- In GLM, the explanatory variables can be power terms or other non-linear transformations of the original independent variables.
- Homogeneity of variance does not need to be satisfied. In many cases it is not even possible given the model structure, and overdispersion may be present.
- Errors need to be independent but need not be normally distributed.
- GLM estimates parameters by maximum likelihood estimation (MLE) instead of ordinary least squares. As a result, it relies on large-sample approximations.
- Measures of goodness-of-fit rely on sufficiently large samples. A common heuristic is that no more than 20% of the expected cell counts should be less than 5.
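To make the maximum likelihood step in the list above concrete, here is a minimal Python sketch that fits a binary logistic regression with one predictor by Newton-Raphson. Statistical software typically uses iteratively reweighted least squares, which gives the same updates for the logit link. The function name `fit_logistic` and its arguments are hypothetical, and the sketch assumes the data are not perfectly separated, so that the estimates exist.

```python
import math


def fit_logistic(x, y, iterations=25, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Fitted probabilities under the current estimates.
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector: gradient of the log-likelihood.
        s0 = sum(yi - pi for yi, pi in zip(y, p))
        s1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Fisher information matrix (2 x 2), built from the weights p * (1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: solve information * step = score for the 2 x 2 case.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (-i01 * s0 + i00 * s1) / det
        b0, b1 = b0 + d0, b1 + d1
        # Stop once the update is negligible.
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

Calling `fit_logistic([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 1, 0, 1, 1, 0, 1])` on this made-up data returns the intercept and slope at which the score vector is numerically zero, which is the defining property of the maximum likelihood estimate.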
Advantages of GLMs over traditional (OLS) regression
- There is no need to transform the response to have a normal distribution.
- Modeling is flexible because the choice of the link is separate from the choice of the random component.
- There is no need for a constant variance if the link produces additive effects.
- Maximum likelihood estimation is used to fit the models, so the estimators have optimal large-sample properties.
- All the inference tools and model checks used for log-linear and logistic regression models apply to other GLMs too. Examples include deviance, Wald and likelihood ratio tests, confidence intervals, and checks for overdispersion.
- You can fit all of these models using one procedure in a software package. In SAS, we use PROC GENMOD.
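Two of the checks named above, deviance and overdispersion, can be sketched in Python for a Poisson model. The sketch assumes the fitted means `mu` have already been produced by some fitting procedure. The function names are hypothetical, and the dispersion estimate is the Pearson chi-square statistic divided by the residual degrees of freedom.

```python
import math


def poisson_deviance_residuals(y, mu):
    """Deviance residuals for observed counts y and fitted Poisson means mu."""
    residuals = []
    for yi, mi in zip(y, mu):
        # Unit deviance: 2 * [y * log(y / mu) - (y - mu)],
        # where y * log(y / mu) is taken as 0 when y == 0.
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        unit = 2.0 * (term - (yi - mi))
        # The residual carries the sign of (y - mu).
        sign = 1.0 if yi >= mi else -1.0
        residuals.append(sign * math.sqrt(max(unit, 0.0)))
    return residuals


def poisson_deviance(y, mu):
    """Residual deviance: the sum of squared deviance residuals."""
    return sum(r * r for r in poisson_deviance_residuals(y, mu))


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square over residual degrees of freedom.

    Values well above 1 suggest overdispersion.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)
```

Large deviance residuals point to individual observations the model fits poorly, while a dispersion estimate well above 1 suggests that the variance exceeds what the Poisson model assumes.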
Limitations of Generalized Linear Modelling
Apart from the numerous pros, GLMs also have some limitations:
- Responses must be independent.
- The systematic component can only contain a linear predictor, that is, a function that is linear in the parameters.
However, there are methods you can use to bypass these restrictions. For example:
- Use PROC NLMIXED in SAS
- Use an analysis designed for matched data
- Consider other models or other software packages