Full Time-Binary

Instructions:

The purpose of this assignment is to practice the essential commands in Stata and to practice interpreting the output. Create a Do-File to track the commands you use. Copy and paste your output results directly into this Word document for each question. When pasting results into this Word document, be sure to include the command that produced the results AND the outcome of the command as it appears in the results window. When finished, upload this Word file to Canvas, along with the Do-File and the PDF that you create.

Dataset:

  1. Identify the number of observations and variables in the dataset.
  2. One variable in your dataset is the institutional Degree of Urbanization (urbanization). Create value labels for the variable so that the variable values have the following labels: 

     a. 11: City-large
     b. 12: City-midsize
     c. 13: City-small
     d. 21: Suburb-large
     e. 22: Suburb-midsize
     f. 23: Suburb-small
     g. 32: Town-distant
     h. 33: Town-remote
     i. 41: Rural-fringe
     j. 42: Rural-distant

  3. Use a command to generate the frequency distribution of the variable urbanization. Paste the output table into the word document.
  4. Use a command to find the mean, standard deviation, minimum value, and maximum value for the number of associate’s degrees conferred (associate_deg), full-time student retention rate (ft_retention), total fall enrollment (total_enroll), full-time enrollment (ft_enroll), and part-time enrollment (pt_enroll) in a table. Generate this information within the same table and paste each table into the word document.
  5. Create a histogram of the variables in Question 4 to review the distribution of the data and identify any outliers. What, if any, outliers do you observe? (Note: you do not need to paste histograms into the word document or save the histograms and submit them to Canvas; just include the commands you used in your do-files.)
  6. Create a new variable called total_enroll2 that represents the sum of part-time and full-time students. Compare the mean, standard deviation, minimum value, and maximum value of total_enroll2 to the variable total_enroll. How similar or different are they?
  7. Create a new variable that represents the percent of full-time students enrolled in fall 2014 and label the variable percent_full.
  8. Create a table that reports the mean, standard deviation, minimum value, and maximum value of percent_full. Paste the table into the word document.
  9. Create a histogram of percent_full.
  10. Create a boxplot of percent_full.
  11. Find the median for percent_full.
  12. Describe the data based on Questions 7-11. What do the mean, standard deviation, minimum value, maximum value, median, box plot, and histogram tell you about the percent of full-time students enrolled in California Community Colleges in fall 2014? (e.g., think Center, Variation, Distribution, Outliers, and Change Over Time)
  13. Create a new binary variable called percentfull_binary that divides the values for percent_full into two categories. The first category should represent all values less than the median and be coded as 1, and the second category should represent all values greater than or equal to the median and be coded as 2.
  14. Create a variable label for percentfull_binary called “Percent of Full Time-Binary”. Label the value of 1 as “Less than 50%” and 2 as “Greater than or Equal to 50%.”
  15. Create a frequency table for the variable percentfull_binary. Paste the table into the word document.
  16. Create a table with the mean, standard deviation, minimum value, and maximum value for the number of associate’s degrees (associate_deg) for colleges whose percent of full-time students is BELOW the median. Paste the table into the word document.
  17. Create a table with the mean, standard deviation, minimum value, and maximum value for the number of associate’s degrees (associate_deg) for colleges whose percent of full-time students is AT OR ABOVE the median. Paste the table into the word document.
  18. Interpret the results of the output for questions 16 and 17. What do these data tell you about the similarities and differences in the number of associate’s degrees between colleges below the median and colleges at or above the median?
  19. Print your entire output to a PDF by typing: translate @Results assignment1.pdf
  20. Upload the PDF, this Assignment in a Word document, and the Do-File to Canvas.

Solution

1.

The describe command indicates that there are 104 observations and 13 variables in the data set. 
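
The answer above reports the counts without showing the command. A minimal sketch of how they can be read off, using a small toy dataset as a stand-in for the assignment data (the variable unitid and all values here are hypothetical):

```stata
* Toy stand-in for the assignment data (hypothetical values)
clear
input unitid urbanization
1 11
2 12
3 21
end

* describe lists the variables and reports the number of
* observations and variables at the top of its output
describe

* describe also leaves both counts behind in r(N) and r(k)
display "Observations: " r(N) "   Variables: " r(k)
```

With the assignment dataset loaded instead of the toy data, the same two commands should report the 104 observations and 13 variables stated above.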

2.

The commands to do this follow:

label define URBAN 11 "City-large" 12 "City-midsize" 13 "City-small" 21 "Suburb-large" 22 "Suburb-midsize" 23 "Suburb-small" 32 "Town-distant" 33 "Town-remote" 41 "Rural-fringe" 42 "Rural-distant"

label values urbanization URBAN 

3.

The Stata command and its output follow:

. tab1 urbanization

-> tabulation of urbanization

  urbanization |      Freq.     Percent        Cum.
---------------+-----------------------------------
    City-large |         20       19.23       19.23
  City-midsize |         15       14.42       33.65
    City-small |          9        8.65       42.31
  Suburb-large |         33       31.73       74.04
Suburb-midsize |          6        5.77       79.81
  Suburb-small |          3        2.88       82.69
  Town-distant |          4        3.85       86.54
   Town-remote |          1        0.96       87.50
  Rural-fringe |         10        9.62       97.12
 Rural-distant |          3        2.88      100.00
---------------+-----------------------------------
         Total |        104      100.00

4.

The Stata command and its results follow:

tabstat associate_deg ft_retention total_enroll ft_enroll pt_enroll, statistics( mean sd min max ) columns(statistics)

    variable |      mean        sd       min       max
-------------+----------------------------------------
associate_~g |  763.1538  466.6581         0      1847
ft_retention |  67.84615  9.763981        18        92
total_enroll |  12178.04  7435.589       581     36012
   ft_enroll |  3933.221  2676.461       221     11086
   pt_enroll |  8244.817  5076.017       101     28273
------------------------------------------------------

5. 

The histograms show no outliers for the number of associate degrees granted (associate_deg) or for full-time enrollment (ft_enroll). There is an outlier in the 10-20% range for full-time student retention (ft_retention), an outlier with more than 36,000 students in total enrollment (total_enroll), and outliers in part-time enrollment (pt_enroll) between 22,000 and 24,000 students and above 28,000 students. The histogram commands follow:

histogram associate_deg, width(200) start(0) percent xtitle(Number of associate degrees) xlabel(0(200)2000)

histogram ft_retention, width(10) start(0) percent xtitle(Percent retention of ft students) xlabel(0(10)100)

histogram total_enroll, width(2000) start(0) percent xtitle(Total enrollment) xlabel(0(4000)36000)

histogram ft_enroll, width(1000) start(0) percent xtitle(Full time enrollment) xlabel(0(2000)12000)

histogram pt_enroll, width(2000) start(0) percent xtitle(Part-time enrollment) xlabel(0(4000)28000)
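
For a quick first look before tuning bin widths and axis labels by hand, the same kind of histogram can be drawn for several variables in a loop. This is only a sketch under assumptions: the data below are randomly generated stand-ins, and the graph names hist_* are illustrative, not part of the assignment.

```stata
* Hypothetical stand-in data; replace with the assignment dataset
clear
set obs 50
set seed 12345
generate ft_enroll = round(runiform()*10000)
generate pt_enroll = round(runiform()*20000)

* One histogram per variable, each kept in memory under its own name
foreach v of varlist ft_enroll pt_enroll {
    histogram `v', percent name(hist_`v', replace)
}
```

The loop uses Stata's default bins, so the hand-tuned width() and xlabel() options shown above remain the better choice for the final graphs.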

6. 

The Stata commands to create the new variable and perform the comparison follow:

generate total_enroll2 = ft_enroll + pt_enroll

tabstat total_enroll2 total_enroll, statistics( mean sd min max ) columns(statistics)

The statistics are identical for the two variables, which is consistent with total_enroll being the sum of full-time and part-time students.
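
Matching summary statistics do not by themselves guarantee that the two variables agree in every observation. A minimal sketch of an observation-level check, using three made-up rows in place of the assignment data:

```stata
* Hypothetical rows standing in for the assignment data
clear
input ft_enroll pt_enroll total_enroll
221 360 581
3900 8200 12100
11086 24926 36012
end

generate total_enroll2 = ft_enroll + pt_enroll

* assert stops with an error if any observation differs
assert total_enroll2 == total_enroll

* compare reports how many observations are equal, smaller, or larger
compare total_enroll2 total_enroll
```

If assert returns silently, the two variables are equal row by row, which is a stronger statement than equal means and standard deviations.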

7. 

The Stata commands to create the variable follow:

generate percent_full = 100*(ft_enroll/total_enroll)

label variable percent_full "Percent of full time enrollment"

8.

The Stata commands and their results follow:

. tabstat percent_full, statistics( mean sd min max ) columns(statistics)

    variable |      mean        sd       min       max
-------------+----------------------------------------
percent_full |  32.10321  9.589105  14.67338  85.46763
------------------------------------------------------

9. 

The Stata command and its results follow:

histogram percent_full, width(10) start(0) percent xtitle(Percent full time students) xlabel(0(20)100)

10. 

The Stata command and its results follow:

graph box percent_full

11.

The median is 32.24. The Stata command and its results follow:

summarize percent_full, detail

               Percent of full time enrollment
-------------------------------------------------------------
      Percentiles      Smallest
 1%     15.18289       14.67338
 5%     17.62016       15.18289
10%     21.49006       15.58329       Obs                 104
25%     25.29904       16.92829       Sum of Wgt.         104

50%     32.24407                      Mean           32.10321
                        Largest       Std. Dev.      9.589105
75%     36.97675       47.98303
90%     41.52542       48.52164       Variance       91.95094
95%     46.61126       48.61709       Skewness       1.561323
99%     48.61709       85.46763       Kurtosis        10.6257

12. 

The average percentage of full-time students is 32.10%, which is very close to the median of 32.24%, so the bulk of the distribution is roughly symmetric. From the histogram, all of the observations lie between 10% and 50% except for a single outlier at 85.47%, which is also flagged in the boxplot; that one observation accounts for the positive skewness (1.56) and high kurtosis (10.63) in the detailed summary. The standard deviation is 9.59 percentage points, so most colleges fall within about 10 percentage points of the mean.
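
The outlier can be checked against the 1.5 × IQR rule that graph box uses to decide which points are drawn as outside values. A minimal sketch using the quartiles from the summarize output in Question 11; the scalar names are illustrative:

```stata
* Quartiles copied from the summarize, detail output for percent_full
scalar q1  = 25.29904
scalar q3  = 36.97675
scalar iqr = q3 - q1

* Values beyond these fences are drawn as outside points by graph box
scalar lower_fence = q1 - 1.5*iqr
scalar upper_fence = q3 + 1.5*iqr

display "IQR = " iqr
display "Lower fence = " lower_fence "   Upper fence = " upper_fence
```

The upper fence works out to about 54.5, so the maximum of 85.47 lies outside it while the next largest value, 48.62, does not. The lower fence is about 7.8, below the minimum of 14.67, so there are no low outliers.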

13. 

The Stata commands to create the variable follow:

generate percentfull_binary = 1 if percent_full < 32.24407

replace percentfull_binary = 2 if percent_full >= 32.24407
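
Typing the median by hand relies on the rounded value shown in the output. A possible alternative, sketched here on toy data, reads the median from r(p50) after summarize, detail and guards against missing values, which Stata treats as larger than any number. The names med_full and pf_binary are illustrative, not part of the assignment.

```stata
* Hypothetical values, including one missing observation
clear
input percent_full
20
30
40
50
.
end

* Store the median at full precision instead of retyping it
quietly summarize percent_full, detail
scalar med_full = r(p50)

* Without the !missing() guard, the missing row would be coded 2
generate byte pf_binary = 1 if percent_full < med_full
replace pf_binary = 2 if percent_full >= med_full & !missing(percent_full)

tabulate pf_binary, missing
```

In the assignment data all 104 observations have a value for percent_full, so the hard-coded version above gives the same split; the guard matters only when missing values are present.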

14.

The variable label and value label commands follow:

label variable percentfull_binary "Percent of Full Time-Binary"

label define PFULLBINARY 1 "Less than 50%" 2 "Greater than or Equal to 50%"

label values percentfull_binary PFULLBINARY

15.

The Stata command and its results follow:

tab1 percentfull_binary

-> tabulation of percentfull_binary

  Percent of Full Time-Binary |      Freq.     Percent        Cum.
------------------------------+-----------------------------------
                Less than 50% |         52       50.00       50.00
 Greater than or Equal to 50% |         52       50.00      100.00
------------------------------+-----------------------------------
                        Total |        104      100.00

16.

The Stata command and its results follow:

. tabstat associate_deg if percentfull_binary==1, statistics( mean sd min max ) columns(statistics)

    variable |      mean        sd       min       max
-------------+----------------------------------------
associate_~g |  620.9423   408.002       101      1720
------------------------------------------------------

17.

The Stata command and its results follow:

tabstat associate_deg if percentfull_binary==2, statistics( mean sd min max ) columns(statistics)

    variable |      mean        sd       min       max
-------------+----------------------------------------
associate_~g |  905.3654  481.7703         0      1847
------------------------------------------------------
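
The two subgroup tables in Questions 16 and 17 could also be produced in a single call with the by() option of tabstat, which places both groups side by side for comparison. A minimal sketch on made-up rows (the values are hypothetical and will not reproduce the means above):

```stata
* Hypothetical rows standing in for the assignment data
clear
input associate_deg percentfull_binary
101 1
620 1
1720 1
0 2
905 2
1847 2
end

* One row of statistics per category of percentfull_binary, plus a total row
tabstat associate_deg, by(percentfull_binary) statistics(mean sd min max) columns(statistics)
```

The assignment asks for a separate table per question, so the if-qualified commands above remain the literal answer; the by() form is simply convenient when writing the interpretation in Question 18.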

18. 

Colleges at or above the median percentage of full-time students tend to grant more associate’s degrees on average (mean 905.4 versus 620.9) and show somewhat larger variation around that average (standard deviation 481.8 versus 408.0).

19. 

The Stata commands follow:

cd "C:\STATA"

translate @Results assignment1.pdf, translator(Results2pdf)

20. 

Upload the PDF, this Assignment in a Word document, and the Do-File to Canvas.  

a1.do

set more off

describe

label define URBAN 11 "City-large" 12 "City-midsize" 13 "City-small" 21 "Suburb-large" 22 "Suburb-midsize" 23 "Suburb-small" 32 "Town-distant" 33 "Town-remote" 41 "Rural-fringe" 42 "Rural-distant"

label values urbanization URBAN

tab1 urbanization

tabstat associate_deg ft_retention total_enroll ft_enroll pt_enroll, statistics( mean sd min max ) columns(statistics)

histogram associate_deg, width(200) start(0) percent xtitle(Number of associate degrees) xlabel(0(200)2000)

histogram ft_retention, width(10) start(0) percent xtitle(Percent retention of ft students) xlabel(0(10)100)

histogram total_enroll, width(2000) start(0) percent xtitle(Total enrollment) xlabel(0(4000)36000)

histogram ft_enroll, width(1000) start(0) percent xtitle(Full time enrollment) xlabel(0(2000)12000)

histogram pt_enroll, width(2000) start(0) percent xtitle(Part-time enrollment) xlabel(0(4000)28000)

generate total_enroll2 = ft_enroll + pt_enroll

tabstat total_enroll2 total_enroll, statistics( mean sd min max ) columns(statistics)

generate percent_full = 100*(ft_enroll/total_enroll)

label variable percent_full "Percent of full time enrollment"

tabstat percent_full, statistics( mean sd min max ) columns(statistics)

histogram percent_full, width(10) start(0) percent xtitle(Percent full time students) xlabel(0(20)100)

graph box percent_full

summarize percent_full, detail

generate percentfull_binary = 1 if percent_full < 32.24407

replace percentfull_binary = 2 if percent_full >= 32.24407

label variable percentfull_binary "Percent of Full Time-Binary"

label define PFULLBINARY 1 "Less than 50%" 2 "Greater than or Equal to 50%"

label values percentfull_binary PFULLBINARY

tab1 percentfull_binary

tabstat associate_deg if percentfull_binary==1, statistics( mean sd min max ) columns(statistics)

tabstat associate_deg if percentfull_binary==2, statistics( mean sd min max ) columns(statistics)

cd "C:\STATA"

translate @Results assignment1.pdf, translator(Results2pdf)

cd 

assignment1.do

set more off

/* 1. Identify the number of observations and variables in the dataset. */

describe

/* 2. One variable in your dataset is the institutional Degree of       */

/* Urbanization (urbanization). Create value labels for the variable so */

/* that the variable values have the following labels:                  */

/*              a. 11: City-large                                       */

/*              b. 12: City-midsize                                     */

/*              c. 13: City-small                                       */

/*              d. 21: Suburb-large                                     */

/*              e. 22: Suburb-midsize                                   */

/*              f. 23: Suburb-small                                     */

/*              g. 32: Town-distant                                     */

/*              h. 33: Town-remote                                      */

/*              i. 41: Rural-fringe                                     */

/*              j. 42: Rural-distant                                    */

label define URBAN 11 "City-large" 12 "City-midsize" 13 "City-small" 21 "Suburb-large" 22 "Suburb-midsize" 23 "Suburb-small" 32 "Town-distant" 33 "Town-remote" 41 "Rural-fringe" 42 "Rural-distant"

label values urbanization URBAN

/* 3. Use a command to generate the frequency distribution of the       */

/* variable urbanization. Paste the output table into the word document.*/

tab1 urbanization

/* 4. Use a command to find the mean, standard deviation, minimum value, */

/* and maximum value for the number of associate’s degrees conferred     */

/* (associate_deg), full-time student retention rate (ft_retention),     */

/* total fall enrollment (total_enroll), full-time enrollment            */

/* (ft_enroll), and part-time enrollment (pt_enroll) in a table.         */

/* Generate this information within the same table and paste each table  */

/* into the word document.                                               */

tabstat associate_deg ft_retention total_enroll ft_enroll pt_enroll, statistics( mean sd min max ) columns(statistics)

/* 5. Create a histogram of the variables in Question 4 to review the    */

/* distribution of the data and identify any outliers. What, if any,     */

/* outliers do you observe?  (Note, you do not need to paste histograms  */

/* into the word document or save the histograms and submit to Canvas –  */

/* just include the commands you used in your do-files).                 */

histogram associate_deg, width(200) start(0) percent xtitle(Number of associate degrees) xlabel(0(200)2000)

histogram ft_retention, width(10) start(0) percent xtitle(Percent retention of ft students) xlabel(0(10)100)

histogram total_enroll, width(2000) start(0) percent xtitle(Total enrollment) xlabel(0(4000)36000)

histogram ft_enroll, width(1000) start(0) percent xtitle(Full time enrollment) xlabel(0(2000)12000)

histogram pt_enroll, width(2000) start(0) percent xtitle(Part-time enrollment) xlabel(0(4000)28000)

/* 6. Create a new variable called total_enroll2 that represents the sum */

/*  of part-time and full-time students. Compare the mean,               */

/* standard deviation, minimum value, and maximum value of total_enroll2 */

/* to the variable total_enroll. How similar or different are they?      */

generate total_enroll2 = ft_enroll + pt_enroll

tabstat total_enroll2 total_enroll, statistics( mean sd min max ) columns(statistics)

/* 7. Create a new variable that represents the percent of full-time     */

/* students enrolled in fall 2014 and label the variable percent_full.   */

generate percent_full = 100*(ft_enroll/total_enroll)

label variable percent_full "Percent of full time enrollment"

/* 8. Create a table that reports the mean, standard deviation, minimum  */

/* value, and maximum value of percent_full in a table.  Paste the table */

/* into the word document.                                               */

tabstat percent_full, statistics( mean sd min max ) columns(statistics)

/* 9. Create a histogram of percent_full. (Save the histogram and attach */

/* it within this document or upload to Canvas when you submit your      */

/* assignment).                                                          */

histogram percent_full, width(10) start(0) percent xtitle(Percent full time students) xlabel(0(20)100)

/* 10. Create a boxplot of percent_full. (Save the box plot and attach   */

/* it within this document or upload to Canvas when you submit your      */

/* assignment).                                                          */

graph box percent_full

/* 11. Find the median for percent_full. (Hint: use the summary command  */

/* with the option det)                                                  */

summarize percent_full, detail

/* 13. Create a new binary variable called percentfull_binary that       */

/* divides the values for percent_full into two categories. The first    */

/* category should represent all values less than the median and be      */

/* coded as 1, and the second category should represent all values       */

/* greater or equal to the median and be coded as 2.                     */

generate percentfull_binary = 1 if percent_full < 32.24407

replace percentfull_binary = 2 if percent_full >= 32.24407

/* 14. Create a variable label for percentfull_binary called “Percent    */

/* of Full Time-Binary”. Label the value of 1 as “Less than 50%” and 2   */

/* as “Greater than or Equal to 50%.”                                    */

label variable percentfull_binary "Percent of Full Time-Binary"

label define PFULLBINARY 1 "Less than 50%" 2 "Greater than or Equal to 50%"

label values percentfull_binary PFULLBINARY

/* 15. Create a frequency table for the variable percentfull_binary.     */

/* Paste the table into the word document.                               */

tab1 percentfull_binary

/* 16. Create a table with the mean, standard deviation, minimum value,  */

/* and maximum value for the number of associate’s degrees               */

/* (associate_deg) for colleges whose percent of full-time students is   */

/* BELOW the median. Paste the table into the word document.             */

tabstat associate_deg if percentfull_binary==1, statistics( mean sd min max ) columns(statistics)

/* 17. Create a table with the mean, standard deviation, minimum value,  */

/* and maximum value for the number of associate’s degrees               */

/* (associate_deg) for colleges whose percent of full-time students is   */

/* AT OR ABOVE the median. Paste the table into the word document.       */

tabstat associate_deg if percentfull_binary==2, statistics( mean sd min max ) columns(statistics)

/* 19. Print your entire output to a PDF by typing:                      */

/* translate @Results assignment1.pdf                                    */

/* 20. Upload the PDF, this Assignment in a Word document, and the       */

/* Do-File to Canvas.                                                    */

cd "C:\STATA"

translate @Results assignment1.pdf, translator(Results2pdf)

cd