Bivariate Regression
Submit your assessment as a Word or PDF file. Please include the accompanying Stata output (tables, graphs) in your Word or PDF document.
- Download the worldclass.dta dataset from Wattle
- You can find the codebook on the website too
- You are interested in exploring the determinants of the level of democracy in states, as measured by Freedom House. The dependent variable is fh03rev.
- Use the describe command and the label list (la li) command to see how the dependent variable (only) is coded and labeled. Include the command and output on your answer sheet.
- Pick an interval/ratio or ordinal-level independent variable which you think affects the level of democracy. Explain why the independent variable should have an effect on the dependent variable and tell me the expected direction of the relationship.
- Plot a scatterplot of the independent variable and the dependent variable with a regression line. Make sure that your graph is properly titled. Note that the dependent variable should be on the Y axis. Hint: use “lfit”.
- Run a bivariate regression and correctly interpret the coefficient, the p-value and the R-squared. Include the Stata output in your Word or PDF file. Make sure that it is nicely formatted and readable in the correct size and font.
- Now rerun the regression with five theoretically-informed control variables (please justify the inclusion of these variables, i.e. write a few sentences to explain why you are including these variables in the regression analysis.)
- Interpret the regression output (coefficients, p-values, and R-squared) and explain how and why the multiple regression differs from the bivariate regression. Is it a better model or not? Why? Include the Stata output in your Word or PDF file. Make sure that it is nicely formatted and readable in the correct size and font.
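The line that Stata's lfit adds to the scatterplot is the ordinary least squares prediction from regressing the Y-axis variable on the X-axis variable, i.e. the same line that regress estimates. As a rough illustration of what that line is, the sketch below computes the intercept, slope and R-squared by hand for five made-up data points; the data and the helper name ols_line are hypothetical and have nothing to do with worldclass.dta.

```python
# Minimal OLS sketch on hypothetical data (not the worldclass.dta values).

def ols_line(xs, ys):
    """Return (intercept, slope, r_squared) for the least-squares line y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # The OLS line always passes through the point of means.
    b0 = y_bar - b1 * x_bar
    # R-squared = 1 - residual sum of squares / total sum of squares.
    ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return b0, b1, 1 - ss_resid / ss_total


x = [0, 1, 2, 3, 4]  # hypothetical independent variable (X axis)
y = [1, 3, 2, 5, 4]  # hypothetical dependent variable (Y axis)
b0, b1, r2 = ols_line(x, y)
print(f"intercept={b0:.2f} slope={b1:.2f} R-squared={r2:.2f}")
# prints: intercept=1.40 slope=0.80 R-squared=0.64
```

Putting the dependent variable on the Y axis matters because the least-squares line minimises vertical distances; swapping the axes would generally give a different line.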
Solution
- Run a bivariate regression and correctly interpret the coefficient, the p-value and the R-squared. Include the Stata output in your Word or PDF file. Make sure that it is nicely formatted and readable in the correct size and font.
      Source |       SS           df       MS      Number of obs   =       112
-------------+----------------------------------   F(1, 110)       =     92.44
       Model |  152.350622         1  152.350622   Prob > F        =    0.0000
    Residual |  181.290003       110  1.64809094   R-squared       =    0.4566
-------------+----------------------------------   Adj R-squared   =    0.4517
       Total |  333.640625       111   3.0057714   Root MSE        =    1.2838

------------------------------------------------------------------------------
     fh03rev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dem_oth |    3.08836   .3212154     9.61   0.000     2.451787    3.724934
       _cons |   3.634005   .1893001    19.20   0.000     3.258857    4.009154
------------------------------------------------------------------------------
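The R-squared is 0.4566: dem_oth alone accounts for about 45.7% of the variation in fh03rev across the 112 countries, a moderately good fit for a single predictor.
The p-value for dem_oth is reported as 0.000 (that is, below 0.001, with t = 9.61), well under the conventional 0.05 threshold, so the association is statistically significant and we can reject the null hypothesis that the coefficient is zero.
The coefficient is 3.09, so the association is positive: a one-unit increase in dem_oth, the % of other countries in the region that are democratic, is associated with a 3.09-point increase in the level of democracy (fh03rev). Since fh03rev runs from 1 (least democratic) to 7 (most democratic), a shift of this size suggests that dem_oth is coded as a proportion from 0 to 1, in which case a 10-percentage-point increase corresponds to roughly 0.31 points.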
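The summary figures interpreted above can be reproduced from the ANOVA and coefficient numbers in the Stata output, which shows where each one comes from. The sketch below uses only values printed in that output; the predicted values at the end additionally assume that dem_oth is a 0 to 1 proportion, which is an inference from the coefficient size, not something the output states.

```python
# Reproduce the summary statistics of `regress fh03rev dem_oth` from the
# ANOVA and coefficient figures printed in the Stata output.
import math

n, k = 112, 1  # observations, predictors
ss_model, ss_resid, ss_total = 152.350622, 181.290003, 333.640625
b_dem, se_dem = 3.08836, 0.3212154
b_cons = 3.634005
ci_upper = 3.724934  # upper bound of the 95% CI for dem_oth

r2 = ss_model / ss_total                         # share of variance explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # penalised for the number of predictors
ms_resid = ss_resid / (n - k - 1)                # residual mean square
f_stat = (ss_model / k) / ms_resid               # F(1, 110)
root_mse = math.sqrt(ms_resid)
t_dem = b_dem / se_dem                           # t = coefficient / standard error
# Half-width of the 95% CI divided by the SE recovers the critical t value.
t_crit = (ci_upper - b_dem) / se_dem

print(f"R2={r2:.4f} adjR2={adj_r2:.4f} F={f_stat:.2f} RootMSE={root_mse:.4f}")
print(f"t={t_dem:.2f} t_crit={t_crit:.3f}")

# Predicted fh03rev at a few values of dem_oth (assumes a 0-1 proportion scale).
for share in (0.0, 0.5, 1.0):
    print(f"dem_oth={share:.1f} -> predicted fh03rev={b_cons + b_dem * share:.2f}")
```

The recomputed values match the Stata output (R-squared 0.4566, adjusted 0.4517, F 92.44, root MSE 1.2838, t 9.61), and the critical value of about 1.98 is the 97.5th percentile of a t distribution with 110 degrees of freedom. The predictions stay inside the 1 to 7 range of fh03rev, which is consistent with the proportion reading of dem_oth.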
- Now rerun the regression with five theoretically-informed control variables
- Interpret the regression output (coefficients, p-values, and R-squared) and explain how and why the multiple regression differs from the bivariate regression. Is it a better model or not? Why? Include the Stata output in your Word or PDF file. Make sure that it is nicely formatted and readable in the correct size and font.
      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(6, 62)        =     14.65
       Model |  123.618535         6  20.6030892   Prob > F        =    0.0000
    Residual |  87.1930589        62  1.40633966   R-squared       =    0.5864
-------------+----------------------------------   Adj R-squared   =    0.5464
       Total |  210.811594        68   3.1001705   Root MSE        =    1.1859
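
------------------------------------------------------------------------------
     fh03rev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dem_oth |    2.31157   .6227919     3.71   0.000     1.066627    3.556513
        open |  -.0031574   .0031792    -0.99   0.325    -.0095125    .0031978
    gdp_1000 |    .110625   .0430068     2.57   0.013     .0246557    .1965943
     pop2002 |  -1.50e-09   8.28e-10    -1.81   0.075    -3.15e-09    1.57e-10
     smoking |   .0010781   .0032418     0.33   0.741    -.0054021    .0075583
       urban |  -.0035752   .0105595    -0.34   0.736    -.0246832    .0175329
       _cons |   3.729304   .7429361     5.02   0.000     2.244196    5.214412
------------------------------------------------------------------------------
The R-squared is 0.5864: the six predictors together account for about 58.6% of the variation in fh03rev, up from 45.7% with dem_oth alone. The adjusted R-squared, which penalizes each added predictor, also rises (from 0.4517 to 0.5464), so the gain is not just a by-product of adding variables. One caution: the multiple regression uses only 69 countries, against 112 in the bivariate model, because countries missing data on any control are dropped, so the two fits are not estimated on the same sample.
Only two predictors are statistically significant at the 0.05 level: the % of other countries in the region that are democratic (dem_oth, p < 0.001) and GDP per capita (gdp_1000, p = 0.013). Holding the other five variables constant, a one-unit increase in dem_oth is associated with a 2.31-point increase in fh03rev, somewhat smaller than the bivariate estimate of 3.09, and a one-unit increase in gdp_1000 is associated with a 0.11-point increase. The remaining controls (open, pop2002, smoking, urban) are not significant at the 0.05 level; pop2002 comes closest (p = 0.075).
The multiple regression differs from the bivariate one because each coefficient is now estimated net of the other explanatory variables, so the dem_oth coefficient comes closer to its own effect. The smaller dem_oth coefficient suggests that part of the bivariate association reflected factors correlated with regional democracy, such as national income, although the change in sample may also contribute. On the basis of its higher R-squared and adjusted R-squared, the second model is the better one, subject to the sample caveat above.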
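The model comparison above rests on a few numbers from the two regression outputs. The sketch below recomputes R-squared and adjusted R-squared for both models from their sums of squares and degrees of freedom, and the relative drop in the dem_oth coefficient; the helper name fit_stats is illustrative only.

```python
# Compare the bivariate and multiple regressions using the figures printed
# in the two Stata outputs.

def fit_stats(n, k, ss_model, ss_total):
    """Return (R-squared, adjusted R-squared) for n observations and k predictors."""
    r2 = ss_model / ss_total
    # Adjusted R-squared penalises each extra predictor through the df ratio.
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj


bivariate = fit_stats(n=112, k=1, ss_model=152.350622, ss_total=333.640625)
multiple = fit_stats(n=69, k=6, ss_model=123.618535, ss_total=210.811594)

for name, (r2, adj) in (("bivariate", bivariate), ("multiple", multiple)):
    print(f"{name:>9}: R2={r2:.4f} adjR2={adj:.4f}")

# Relative shrinkage of the dem_oth coefficient once the controls are added.
b_biv, b_mult = 3.08836, 2.31157
print(f"dem_oth coefficient falls by {(b_biv - b_mult) / b_biv:.1%}")
```

Both measures favour the multiple regression, and the dem_oth coefficient falls by about a quarter once the controls enter. Because the two models use different samples (112 versus 69 countries), re-estimating the bivariate model on the same 69 countries would give a cleaner like-for-like comparison.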
------------------------------------------------------------------------------
name: <unnamed>
log: D:\sol.txt
log type: text
opened on: 22 Oct 2017, 20:40:08
. la li
demoth3_label:
1 <30 pct
2 30-60 pct
3 >60 pct
WRK_REST:
0 Work
1 Rest
2 Both
WOMYR_2:
0 1920 or before
1 After 1920
TYPEREL:
1 roman catholic
2 protestant
3 orthodox
4 jewish
5 muslim
6 hindu
7 eastern
8 other
9 missing
RELCAT:
1 most secular
2 moderate
3 most religious
REGION:
1 Sub-Saharan Africa
2 South Asia
3 East Asia
4 South East Asia
5 Pacific Is-Oceania
6 Middle East/N. Africa
7 Latin America
8 Caribbean/non-Iberic Amer
9 Eastern Europe
10 Industrialized
REGIME:
0 Democracy
1 Dictatorship
PR_SYS:
0 No
1 Yes
POLRTS:
1 Fewest rights
7 Most rights
PARTY:
0 None
1 One party
2 More than 1 party
OIL:
0 No
1 Yes
NATSIZE:
1 Small (under 1m)
2 Moderate (1-29m)
3 Large (30m+)
HI_GDP:
0 Low
1 High
GDPCAP2:
1 low
2 high
FH03REV:
1 Least democratic
7 Most democratic
ETH_HET3:
1 low
2 moderate
3 high
ECONDEV3:
1 Least
2 Middle
3 Most
DEMOC:
0 No
1 Yes
COMPULSE:
0 No
1 Yes
COLONY:
0 none
1 uk
2 france
3 portugal
4 spain
5 netherlands
6 soviet union
7 ottoman
14 belgium
20 other
CIVLIB:
1 Least free
7 Most free
legdom_label:
1 1 low
5 5 high
. twoway lfit fh03rev dem_oth ||scatter fh03rev dem_oth, title("correlation") xtitle("% of other countries in the region that are
> democratic") ytitle("level of democracy states")
. graph export "C:\Users\A\Downloads\Graph.png", as(png) replace
(note: file C:\Users\A\Downloads\Graph.png not found)
(file C:\Users\A\Downloads\Graph.png written in PNG format)
. regress fh03rev dem_oth
      Source |       SS           df       MS      Number of obs   =       112
-------------+----------------------------------   F(1, 110)       =     92.44
       Model |  152.350622         1  152.350622   Prob > F        =    0.0000
    Residual |  181.290003       110  1.64809094   R-squared       =    0.4566
-------------+----------------------------------   Adj R-squared   =    0.4517
       Total |  333.640625       111   3.0057714   Root MSE        =    1.2838

------------------------------------------------------------------------------
     fh03rev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dem_oth |    3.08836   .3212154     9.61   0.000     2.451787    3.724934
       _cons |   3.634005   .1893001    19.20   0.000     3.258857    4.009154
------------------------------------------------------------------------------
. regress fh03rev dem_oth open gdp_1000 pop2002 smoking urban
      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(6, 62)        =     14.65
       Model |  123.618535         6  20.6030892   Prob > F        =    0.0000
    Residual |  87.1930589        62  1.40633966   R-squared       =    0.5864
-------------+----------------------------------   Adj R-squared   =    0.5464
       Total |  210.811594        68   3.1001705   Root MSE        =    1.1859

------------------------------------------------------------------------------
     fh03rev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dem_oth |    2.31157   .6227919     3.71   0.000     1.066627    3.556513
        open |  -.0031574   .0031792    -0.99   0.325    -.0095125    .0031978
    gdp_1000 |    .110625   .0430068     2.57   0.013     .0246557    .1965943
     pop2002 |  -1.50e-09   8.28e-10    -1.81   0.075    -3.15e-09    1.57e-10
     smoking |   .0010781   .0032418     0.33   0.741    -.0054021    .0075583
       urban |  -.0035752   .0105595    -0.34   0.736    -.0246832    .0175329
       _cons |   3.729304   .7429361     5.02   0.000     2.244196    5.214412
------------------------------------------------------------------------------
. log close
name: <unnamed>
log: D:\sol.txt
log type: text
closed on: 22 Oct 2017, 23:16:05