Simple and Multiple Linear Regression
Simple linear regression, also called bivariate linear regression, predicts the value of a dependent variable from the value of a single independent variable. For example, a school can use linear regression to understand whether exam performance can be predicted from revision time. In this case, the dependent variable would be exam performance, measured from 0 to 100 marks. The independent variable would be revision time, measured in hours.
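To make the revision-time example concrete, here is a minimal sketch of an ordinary least-squares fit in plain Python. The data values and the function name `fit_simple_regression` are hypothetical and chosen only for illustration; in practice STATA would do this calculation for you.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: revision time in hours and exam marks out of 100
revision_hours = [2, 4, 6, 8, 10]
exam_marks = [45, 55, 60, 72, 78]

intercept, slope = fit_simple_regression(revision_hours, exam_marks)
predicted = intercept + slope * 7  # predicted marks for 7 hours of revision

print(f"marks = {intercept:.2f} + {slope:.2f} * hours")  # marks = 37.10 + 4.15 * hours
print(f"predicted marks for 7 hours: {predicted:.2f}")   # 66.15
```

With this made-up data, each additional hour of revision is associated with about 4.15 more marks.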
Multiple linear regression
There are situations where we have two or more independent variables instead of one. In such situations, we need to use multiple regression. If we only want to establish whether a linear relationship exists between two variables, Pearson’s correlation can be used instead.
Multiple linear regression is an extension of simple linear regression. It is used to predict the value of a dependent variable (outcome variable) based on the values of two or more independent variables (predictor variables). Multiple regression also allows analysts to determine the overall fit of the model and the contribution of each independent variable to the total variance explained.
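The idea can be sketched in plain Python by solving the least-squares normal equations and computing R-squared as the measure of overall fit. This is only an illustration under stated assumptions: the second predictor (hours of sleep), the data, and all function names are hypothetical, and the marks are constructed to follow marks = 30 + 4 * hours + 2 * sleep exactly so that the recovered coefficients can be checked. Real data would include noise and give an R-squared below 1.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (predictors are not perfectly collinear).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [intercept, b1, b2, ...] for predictors given as a list of rows."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coefs):
    """Proportion of the variance in y explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical predictors: (revision hours, hours of sleep)
predictors = [(2, 6), (4, 7), (6, 5), (8, 8), (10, 6), (5, 9)]
marks = [30 + 4 * h + 2 * s for h, s in predictors]

coefs = fit_multiple_regression(predictors, marks)
r2 = r_squared(predictors, marks, coefs)
print([round(c, 4) for c in coefs], round(r2, 4))
```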
Linear regression assumptions
Your data must meet seven assumptions for linear regression to give a valid result. Violating any of them can lead to an invalid result. The first and second assumptions cannot be tested in STATA because they relate to your choice of variables. You should make sure that your study meets the assumptions below before moving on.
- The dependent variable should be measured at a continuous level. Examples of continuous variables include:
  - Height – measured in inches or feet
  - Temperature – measured in degrees Celsius
  - Salary – measured in US dollars
  - Reaction time – measured in milliseconds
  - Test performance – measured in marks from 0 to 100
  - Sales – measured in transactions per month
Our simple and multiple linear regression assignment help service caters to all assignments related to this subject. If you are unsure whether your dependent variable is continuous, feel free to contact us.
- Independent variables should be measured at a categorical or continuous level
If you have a categorical independent variable, you can use an independent t-test (for two groups) or a one-way ANOVA (for more than two groups). Examples of categorical variables include:
  - Gender – consists of two groups, male and female
  - Ethnicity – can be three groups, for example Hispanic, Caucasian, and African American
  - Physical activity level – can be sedentary, low, moderate, or high
  - Profession – doctor, therapist, dentist, nurse, etc.
The remaining five assumptions can be checked using STATA. Our online experts recommend that you test them in this order, because you will no longer be able to use linear regression if a violation of an assumption is not correctable.
It is fairly typical for real-world data to fail one or more of these assumptions, so you shouldn’t be surprised if your data doesn’t conform. If your data fails any of these assumptions, you can often overcome this by transforming your data or by using another statistical test instead. Also, remember that carrying out linear regression without considering these assumptions might lead to incorrect results.
- The dependent and independent variables must have a linear relationship
You can verify this assumption by creating a scatterplot in STATA. Here, you can plot the dependent variable against your independent variable. You can then check the linearity by visually inspecting the scatterplot.
You will either have to transform your data or run a non-linear regression analysis if the scatterplot does not display a linear relationship.
- Your data should not have significant outliers
Outliers are single data points that do not follow the usual pattern. For example, in an exam where marks are recorded from 0 to 100, a recorded score of 156 would be an outlier. Outliers can distort the fitted regression equation. STATA can carry out case-wise diagnostics to help you detect possible outliers.
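One common way to screen for outliers is to look at standardized residuals and flag cases beyond a cut-off such as ±3. The sketch below illustrates that idea in plain Python; it is not a reproduction of STATA's case-wise diagnostics. The data, the cut-off, and the function name are assumptions made for the example, and each residual is simply divided by the residual standard error without any leverage adjustment.

```python
import math


def standardized_residuals(x, y):
    """Fit y = intercept + slope * x and return each residual divided by
    the residual standard error (n - 2 degrees of freedom)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s for e in residuals]


# Hypothetical data: 20 students, marks roughly 40 + 2 * hours with small noise
hours = list(range(1, 21))
marks = [40 + 2 * h + (1 if h % 2 else -1) for h in hours]
marks[9] = 156  # a recording error: marks cannot exceed 100

z = standardized_residuals(hours, marks)
flagged = [i for i, zi in enumerate(z) if abs(zi) > 3]  # assumed cut-off of 3
print(flagged)  # [9]
```

Note that with very small samples a single case cannot reach a standardized residual of 3, so the cut-off should be treated as a rule of thumb rather than a strict test.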
- The observations should be independent
Independence of observations means that the residuals are not correlated with one another. You can check this using the Durbin-Watson statistic in STATA.
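The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 suggest no first-order autocorrelation. A minimal sketch, using made-up residual sequences and a hypothetical function name:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that drift steadily (positive autocorrelation): value close to 0
dw_trending = durbin_watson([1, 2, 3, 4])

# Residuals that flip sign every observation (negative autocorrelation): value above 2
dw_alternating = durbin_watson([1, -1, 1, -1])

print(dw_trending, dw_alternating)  # 0.1 3.0
```

The statistic depends on the order of the observations, so it is mainly meaningful when the data have a natural ordering such as time.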
- Your data should show homoscedasticity
Homoscedasticity means that the variance of the residuals around the line of best fit stays roughly the same as you move along the line. Real-world data is often messy, so some variation is expected. To check for homoscedasticity, you can plot the regression standardized residuals against the regression standardized predicted values.
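The sketch below prepares the standardized values that such a plot would show and adds a rough numeric companion: the mean squared residual in the upper half of the predicted values divided by that in the lower half. The ratio is an informal illustration and not a formal test, and the data and function names are hypothetical. A ratio near 1 is consistent with constant spread, while a ratio far from 1 suggests the residuals fan out or narrow.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def spread_ratio(predicted, residuals):
    """Mean squared residual for the upper half of predicted values
    divided by the mean squared residual for the lower half."""
    pairs = sorted(zip(predicted, residuals))  # order cases by predicted value
    half = len(pairs) // 2
    lower = [e ** 2 for _, e in pairs[:half]]
    upper = [e ** 2 for _, e in pairs[-half:]]
    return mean(upper) / mean(lower)


# Hypothetical predicted values and two made-up residual patterns
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
even_residuals = [1, -1, 1, -1, 1, -1, 1, -1]     # constant spread
fanning_residuals = [1, -1, 2, -2, 3, -3, 4, -4]  # spread grows with prediction

# Points for a standardized residual vs standardized predicted value plot
plot_points = list(zip(standardize(predicted), standardize(fanning_residuals)))

even_ratio = spread_ratio(predicted, even_residuals)
fanning_ratio = spread_ratio(predicted, fanning_residuals)
print(even_ratio, fanning_ratio)  # 1.0 5.0
```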
- The residuals (errors) of the regression line should be approximately normally distributed
There are two methods that you can use to check this assumption. You can either use a histogram with a superimposed normal curve or a Normal P-P Plot.
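A Normal P-P plot compares the observed cumulative proportion of each ordered residual with the cumulative probability expected under a normal distribution. Points close to the diagonal suggest approximate normality. The sketch below computes those points in plain Python. The plotting position (i - 0.5) / n is one common convention and is an assumption here, as are the function name and the sample residuals.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probabilities for a Normal P-P plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    standard_normal = NormalDist()
    points = []
    for i, e in enumerate(sorted(residuals), start=1):
        observed = (i - 0.5) / n                     # assumed plotting position
        expected = standard_normal.cdf((e - m) / s)  # probability under normality
        points.append((observed, expected))
    return points


# Hypothetical residuals built from normal quantiles, so they should lie
# close to the diagonal of the plot
n = 50
sample = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

points = pp_plot_points(sample)
max_gap = max(abs(obs - exp) for obs, exp in points)
print(round(max_gap, 3))
```

A large gap between the observed and expected values, especially in a consistent direction, would indicate skewed or heavy-tailed residuals.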
Statistics Assignment Helper can help you carry out both simple and multiple regression using STATA, interpret the results, and report the conclusions of the test.