# Bivariate Regression

Submit your assessment as a Word or PDF file. Please include the accompanying Stata output (tables, graphs) in your Word or PDF document.

1. You can find the codebook on the website too.
2. You are interested in exploring the determinants of the level of democracy of states, as measured by Freedom House. The dependent variable is: fh03rev
   1. Use the describe command and the label list (la li) command to see how the dependent variable (only) is coded and labeled. Include the command and output on your answer sheet.
3. Pick an interval/ratio or ordinal-level independent variable that you think affects the level of democracy. Explain why the independent variable should have an effect on the dependent variable and tell me the expected direction of the relationship.
   1. Plot a scatterplot of the independent variable and the dependent variable with a regression line. Make sure that your graph is properly titled. Note that the dependent variable should be on the Y axis. Hint: "lfit".
   2. Run a bivariate regression and correctly interpret the coefficient, the p-value and the R-squared. Include the Stata output in your Word or PDF file. Make sure that it is nicely formatted and readable in the correct size and font.
4. Now rerun the regression with five theoretically informed control variables. Please justify the inclusion of these variables, i.e. write a few sentences to explain why you are including them in the regression analysis.
   1. Interpret the regression output (coefficients, p-values, and R-squared) and explain how and why the multiple regression differs from the bivariate regression. Is it a better model or not? Why? Include the Stata output in your Word or PDF file. Make sure that it is nicely formatted and readable in the correct size and font.

Solution

1. Run a bivariate regression and correctly interpret the coefficient, the p-value and the R-squared. Include the Stata output in your Word or PDF file. Make sure that it is nicely formatted and readable in the correct size and font.

| Source | SS | df | MS |
|---|---|---|---|
| Model | 152.350622 | 1 | 152.350622 |
| Residual | 181.290003 | 110 | 1.64809094 |
| Total | 333.640625 | 111 | 3.0057714 |

| Statistic | Value |
|---|---|
| Number of obs | 112 |
| F(1, 110) | 92.44 |
| Prob > F | 0.0000 |
| R-squared | 0.4566 |
| Root MSE | 1.2838 |

| fh03rev | Coef. | Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| dem_oth | 3.08836 | .3212154 | 9.61 | 0.000 | 2.451787 | 3.724934 |
| _cons | 3.634005 | .1893001 | 19.20 | 0.000 | 3.258857 | 4.009154 |

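The R-squared is 0.4566, so dem_oth alone accounts for about 45.7% of the variance in fh03rev. That is a moderate fit: the points follow the regression line, but with considerable scatter (Root MSE = 1.28 on a 7-point scale).

The p-value for dem_oth is reported as 0.000, meaning it is below 0.0005 and well under the conventional 0.05 threshold, so the association is statistically significant.

The coefficient is 3.09 and positive: a one-unit increase in dem_oth (the share of other countries in the region that are democratic) is associated with a 3.09-point higher predicted Freedom House score. The size of the coefficient relative to the 1 to 7 scale suggests that dem_oth is coded as a proportion between 0 and 1 rather than as a 0 to 100 percentage, in which case one unit is the difference between a region with no other democracies and a region where all other countries are democratic; the codebook should be checked to confirm this. The constant, 3.63, is the predicted score when dem_oth is 0.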
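The header statistics in the bivariate output can be re-derived from the sums of squares, which is a useful check that the table has been read correctly. The following is a minimal sketch in Python, used here only as a calculator and not as part of the Stata session. The `predict` helper is hypothetical, and evaluating it at 0, 0.5 and 1 assumes that dem_oth is coded as a proportion.

```python
import math

# Values copied from the bivariate Stata output above.
ss_model, ss_resid, ss_total = 152.350622, 181.290003, 333.640625
df_model, df_resid = 1, 110
coef, std_err, cons = 3.08836, 0.3212154, 3.634005

r_squared = ss_model / ss_total            # share of variance explained
ms_resid = ss_resid / df_resid             # residual mean square
f_stat = (ss_model / df_model) / ms_resid  # overall F test
root_mse = math.sqrt(ms_resid)             # typical size of a residual
t_stat = coef / std_err                    # t test for the slope


def predict(dem_oth):
    """Predicted fh03rev for a given dem_oth (hypothetical helper)."""
    return cons + coef * dem_oth


print(f"R-squared = {r_squared:.4f}")
print(f"F = {f_stat:.2f}, t = {t_stat:.2f}, t squared = {t_stat ** 2:.2f}")
print(f"Root MSE = {root_mse:.4f}")
# Assumption: dem_oth runs from 0 to 1.
for x in (0.0, 0.5, 1.0):
    print(f"dem_oth = {x:.1f} -> predicted fh03rev = {predict(x):.2f}")
```

With a single predictor the F statistic equals the square of the slope's t statistic, so the two tests in the output carry the same information.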

1. Now rerun the regression with five theoretically informed control variables.
   1. Interpret the regression output (coefficients, p-values, and R-squared) and explain how and why the multiple regression differs from the bivariate regression. Is it a better model or not? Why? Include the Stata output in your Word or PDF file. Make sure that it is nicely formatted and readable in the correct size and font.

| Source | SS | df | MS |
|---|---|---|---|
| Model | 123.618535 | 6 | 20.6030892 |
| Residual | 87.1930589 | 62 | 1.40633966 |
| Total | 210.811594 | 68 | 3.1001705 |

| Statistic | Value |
|---|---|
| Number of obs | 69 |
| F(6, 62) | 14.65 |
| Prob > F | 0.0000 |
| R-squared | 0.5864 |
| Root MSE | 1.1859 |

| fh03rev | Coef. | Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| dem_oth | 2.31157 | .6227919 | 3.71 | 0.000 | 1.066627 | 3.556513 |
| open | -.0031574 | .0031792 | -0.99 | 0.325 | -.0095125 | .0031978 |
| gdp_1000 | .110625 | .0430068 | 2.57 | 0.013 | .0246557 | .1965943 |
| pop2002 | -1.50e-09 | 8.28e-10 | -1.81 | 0.075 | -3.15e-09 | 1.57e-10 |
| smoking | .0010781 | .0032418 | 0.33 | 0.741 | -.0054021 | .0075583 |
| urban | -.0035752 | .0105595 | -0.34 | 0.736 | -.0246832 | .0175329 |
| _cons | 3.729304 | .7429361 | 5.02 | 0.000 | 2.244196 | 5.214412 |

The R-squared is 0.5864, so the six predictors together account for about 58.6% of the variance in fh03rev, compared with 45.7% in the bivariate model. The two figures are not strictly comparable, for two reasons. First, the multiple regression is estimated on 69 countries rather than 112, because any country with a missing value on one of the controls is dropped. Second, R-squared cannot fall when predictors are added, so a higher value does not by itself show that the added variables matter.
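Which predictors are significant can be read directly off the coefficient table, since each t statistic is simply the coefficient divided by its standard error. The sketch below, again in Python as a calculator, recomputes the t statistics and compares them with the two-sided 5% critical value. That critical value is not printed by Stata; here it is backed out from the dem_oth confidence interval, which is an assumption of this sketch rather than something shown in the output.

```python
# (coefficient, standard error) pairs copied from the multiple regression table.
rows = {
    "dem_oth": (2.31157, 0.6227919),
    "open": (-0.0031574, 0.0031792),
    "gdp_1000": (0.110625, 0.0430068),
    "pop2002": (-1.50e-09, 8.28e-10),
    "smoking": (0.0010781, 0.0032418),
    "urban": (-0.0035752, 0.0105595),
}

# The 95% interval is coef +/- t_crit * se, so its half-width divided by the
# standard error recovers the critical t value for 62 residual degrees of freedom.
ci_low, ci_high = 1.066627, 3.556513
t_crit = (ci_high - ci_low) / 2 / rows["dem_oth"][1]

t_stats = {name: coef / se for name, (coef, se) in rows.items()}
significant = [name for name, t in t_stats.items() if abs(t) > t_crit]

print(f"critical t = {t_crit:.3f}")
for name, t in t_stats.items():
    print(f"{name:>9}: t = {t:6.2f}")
print("significant at the 5% level:", significant)
```

Only dem_oth and gdp_1000 clear the threshold, which matches the two p-values below 0.05 in the table.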

Two predictors are statistically significant at the 0.05 level: dem_oth, the share of other countries in the region that are democratic (p < 0.001), and gdp_1000, GDP per capita (p = 0.013). Holding the five controls constant, a one-unit increase in dem_oth is associated with a 2.31-point higher Freedom House score, down from 3.09 in the bivariate model. A one-unit increase in gdp_1000 (presumably 1,000 dollars of GDP per capita, as the variable name suggests) is associated with a 0.11-point higher score. Population (pop2002) is only marginally related to the score (p = 0.075), and open, smoking and urban are not significant. The second model fits better by R-squared (0.5864 against 0.4566), and its dem_oth coefficient is a partial association, adjusted for the other five explanatory variables, which is why it is smaller than the bivariate estimate.
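Because R-squared never decreases when predictors are added, the adjusted R-squared is a fairer basis for comparing the two models. Stata's `regress` header normally reports it, but it does not appear in the output pasted above, so the sketch below recomputes it from the sums of squares. The function name `adjusted_r2` is a hypothetical helper, and Python is used only as a calculator.

```python
def adjusted_r2(ss_resid, ss_total, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (SS_resid / df_resid) / (SS_total / df_total)."""
    df_resid = n_obs - n_predictors - 1
    df_total = n_obs - 1
    return 1 - (ss_resid / df_resid) / (ss_total / df_total)


# Sums of squares and sample sizes copied from the two Stata outputs.
bivariate = adjusted_r2(181.290003, 333.640625, n_obs=112, n_predictors=1)
multiple = adjusted_r2(87.1930589, 210.811594, n_obs=69, n_predictors=6)

print(f"bivariate adjusted R-squared = {bivariate:.3f}")
print(f"multiple  adjusted R-squared = {multiple:.3f}")
```

This gives roughly 0.452 for the bivariate model and 0.546 for the multiple regression, so the larger model still fits better after the penalty for extra predictors. The two models are estimated on different samples (112 and 69 countries), so even this comparison is approximate.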

————————————————————————————————————————————————

name:  <unnamed>

log:  D:\sol.txt

log type:  text

opened on:  22 Oct 2017, 20:40:08

. la li

demoth3_label:

1 <30 pct

2 30-60 pct

3 >60 pct

WRK_REST:

0 Work

1 Rest

2 Both

WOMYR_2:

0 1920 or before

1 After 1920

TYPEREL:

1 roman catholic

2 protestant

3 orthodox

4 jewish

5 muslim

6 hindu

7 eastern

8 other

9 missing

RELCAT:

1 most secular

2 moderate

3 most religious

REGION:

1 Sub-Saharan Africa

2 South Asia

3 East Asia

4 South East Asia

5 Pacific Is-Oceania

6 Middle East/N. Africa

7 Latin America

8 Caribbean/non-Iberic Amer

9 Eastern Europe

10 Industrialized

REGIME:

0 Democracy

1 Dictatorship

PR_SYS:

0 No

1 Yes

POLRTS:

1 Fewest rights

7 Most rights

PARTY:

0 None

1 One party

2 More than 1 party

OIL:

0 No

1 Yes

NATSIZE:

1 Small (under 1m)

2 Moderate (1-29m)

3 Large (30m+)

HI_GDP:

0 Low

1 High

GDPCAP2:

1 low

2 high

FH03REV:

1 Least democratic

7 Most democratic

ETH_HET3:

1 low

2 moderate

3 high

ECONDEV3:

1 Least

2 Middle

3 Most

DEMOC:

0 No

1 Yes

COMPULSE:

0 No

1 Yes

COLONY:

0 none

1 uk

2 france

3 portugal

4 spain

5 netherlands

6 soviet union

7 ottoman

14 belgium

20 other

CIVLIB:

1 Least free

7 Most free

legdom_label:

1 1 low

5 5 high

. twoway lfit fh03rev dem_oth || scatter fh03rev dem_oth, title("correlation") xtitle("% of other countries in the region that are democratic") ytitle("level of democracy states")

. regress fh03rev dem_oth

. regress fh03rev dem_oth open gdp_1000 pop2002 smoking urban

. log close

name:  <unnamed>

log:  D:\sol.txt

log type:  text

closed on:  22 Oct 2017, 23:16:05