SAS Dataset Construction

For problems that involve SAS, turn in both the SAS program and the SAS output. Some problems also require you to answer questions or draw conclusions; write down (or type in) your answers based on the SAS output. For any hypothesis test, use 0.05 as the significance level.

  1. Persi Diaconis of Stanford University is a famous statistician and, before that, a magician. Among his many works, he conducted a study on coin flipping and found that “vigorously flipped coins tend to come up the same way they started”. The probability of a coin landing the same way up as it started was estimated to be roughly 50.8%. Two UC Berkeley undergraduate students, Janet and Priscilla, did 40,000 coin tosses in 2009 to verify those conclusions. They found that of 20,000 Heads-up tosses performed by Janet, 10,231 landed Heads, and of 20,000 Tails-up tosses performed by Priscilla, 10,014 landed Tails. In total, 20,245 out of 40,000 tosses landed the same side up as they started.

Now construct a SAS dataset and perform the following analyses.

  • Construct a one-way table listing the frequencies and percentages of tosses that landed the same side up as they started (denoted “S”) and those that landed the opposite side up (denoted “O”).
  • Test the common belief that it is equally likely for a toss to land with the same side up and with the opposite side up.
  • Test the hypothesis of Diaconis et al. that a toss lands with the same side up with a probability of 0.508.
  • Construct a two-way table listing the frequencies and percentages for the way that a toss landed (“S” or “O”) crossed by the person that conducted the toss (Janet or Priscilla).
  • Test whether the probability of getting “S” is different for the two experimenters.
  2. For this problem, we use the running time data again. There are three variables with 21 observations in the SAS dataset. Variable “time_A” records the time that runner A spent on each kilometer in a recent half-marathon race: the first observation is the time the runner spent on the first kilometer, the second observation is the time spent on the second kilometer, and so on. Similarly, “time_B” records the time that runner B spent on each kilometer in the same race. Variable “segment” labels the first half and the second half of the race: the first 10 observations have value 1, and the remaining 11 observations have value 2.
  • For runner A, use PROC TTEST to test if mean time per kilometer is lower than 5 minutes 40 seconds.
  • For runner A, conduct a t-test to decide whether the mean time per kilometer for the second half (segment=2) is higher than that for the first half (segment=1).
  • Conduct an appropriate test to see whether, on average, runner B spent a different amount of time from runner A on each kilometer.

Solution

Construct a one-way table listing the frequencies and percentages of tosses that landed the same side up as they started (denoted “S”) and those that landed the opposite side up (denoted “O”).

toss    Frequency    Percent    Cumulative Frequency    Cumulative Percent
O           19755      49.39                   19755                 49.39
S           20245      50.61                   40000                100.00
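The assignment asks for the SAS program as well as the output. A minimal sketch of a program that could produce the one-way table is given here. The dataset name coin and the variable name count are assumptions; toss and perform follow the labels in the output. The four cell counts come from the problem statement, with the “O” cells obtained as 20,000 - 10,231 = 9,769 and 20,000 - 10,014 = 9,986.

```sas
/* Hypothetical dataset "coin": one row per cell of the 2x2 layout.
   count holds the number of tosses in that cell (an assumed name). */
data coin;
    input perform $ toss $ count @@;
    datalines;
Janet S 10231  Janet O 9769  Prisc S 10014  Prisc O 9986
;
run;

/* WEIGHT tells PROC FREQ that each row stands for "count" tosses
   rather than a single observation. */
proc freq data=coin;
    weight count;
    tables toss;
run;
```

Entering cell counts with a WEIGHT statement avoids typing 40,000 individual rows.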

Test the common belief that it is equally likely for a toss to land with the same side up and with the opposite side up.

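The two-sided p-value (0.0143) is less than 0.05, so we reject the hypothesis that a toss is equally likely to land with the same side up and with the opposite side up.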

Binomial Proportion
toss = O
Proportion                 0.4939
ASE                        0.0025
95% Lower Conf Limit       0.4890
95% Upper Conf Limit       0.4988

Exact Conf Limits
95% Lower Conf Limit       0.4890
95% Upper Conf Limit       0.4988

Test of H0: Proportion = 0.5
ASE under H0               0.0025
Z                         -2.4500
One-sided Pr < Z           0.0071
Two-sided Pr > |Z|         0.0143

Sample Size = 40000
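A sketch of a program that could produce this test follows, under the same assumptions as before (hypothetical dataset name coin and weight variable count). The BINOMIAL option of the TABLES statement requests the test; with no LEVEL= option it is computed for the first level of toss, which is “O”, as in the output shown. For a null value of 0.5 the two-sided result is the same whichever level is used.

```sas
/* Hypothetical dataset "coin" built from the counts in the problem. */
data coin;
    input perform $ toss $ count @@;
    datalines;
Janet S 10231  Janet O 9769  Prisc S 10014  Prisc O 9986
;
run;

/* Binomial test of H0: proportion = 0.5 for the first level of toss. */
proc freq data=coin;
    weight count;
    tables toss / binomial(p=0.5);
run;
```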

 

Test the hypothesis of Diaconis et al. that a toss lands with the same side up with a probability of 0.508.

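Binomial Proportion
toss = O
Proportion                 0.4939
ASE                        0.0025
95% Lower Conf Limit       0.4890
95% Upper Conf Limit       0.4988

Exact Conf Limits
95% Lower Conf Limit       0.4890
95% Upper Conf Limit       0.4988

Test of H0: Proportion = 0.508
ASE under H0               0.0025
Z                         -5.6507
One-sided Pr < Z           <.0001
Two-sided Pr > |Z|         <.0001

Sample Size = 40000

Note that this output tests the proportion of “O” (0.4939) against 0.508, which is not the hypothesis in question; the large Z value and the p-value below 0.0001 reflect that mismatch. The hypothesis concerns “S”. The observed proportion of “S” is 20,245/40,000 = 0.5061, and testing it against 0.508 with the same ASE under H0 (0.0025) gives Z ≈ -0.75 and a two-sided p-value of about 0.45. This is above 0.05, so the data do not contradict the estimate of Diaconis et al.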
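Since the value 0.508 refers to “S”, one way to run this test is to name that level explicitly. By default PROC FREQ computes the binomial proportion for the first level of toss, which is “O”; the LEVEL= option selects “S” instead. The sketch below assumes the same hypothetical dataset coin with weight variable count.

```sas
/* Hypothetical dataset "coin" built from the counts in the problem. */
data coin;
    input perform $ toss $ count @@;
    datalines;
Janet S 10231  Janet O 9769  Prisc S 10014  Prisc O 9986
;
run;

/* Binomial test of H0: P(toss = S) = 0.508. */
proc freq data=coin;
    weight count;
    tables toss / binomial(level='S' p=0.508);
run;
```

An equivalent choice would be to keep the default level “O” and test against 1 - 0.508 = 0.492.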

 

Construct a two-way table listing the frequencies and percentages for the way that a toss landed (“S” or “O”) crossed by the person that conducted the toss (Janet or Priscilla).

Table of toss by perform
Each cell lists, from top to bottom: Frequency, Percent, Row Pct, Col Pct.

toss        Janet      Prisc      Total
O            9769       9986      19755
            24.42      24.97      49.39
            49.45      50.55
            48.85      49.93

S           10231      10014      20245
            25.58      25.04      50.61
            50.54      49.46
            51.16      50.07

Total       20000      20000      40000
            50.00      50.00     100.00

Test whether the probability of getting “S” is different for the two experimenters.

Statistics for Table of toss by perform

Statistic                       DF      Value      Prob
Chi-Square                       1     4.7096    0.0300
Likelihood Ratio Chi-Square      1     4.7097    0.0300
Continuity Adj. Chi-Square       1     4.6663    0.0308
Mantel-Haenszel Chi-Square       1     4.7095    0.0300
Phi Coefficient                       -0.0109
Contingency Coefficient                0.0109
Cramer’s V                            -0.0109

Fisher’s Exact Test
Cell (1,1) Frequency (F)         9769
Left-sided Pr <= F             0.0154
Right-sided Pr >= F            0.9854

Table Probability (P)          0.0008
Two-sided Pr <= P              0.0308

Sample Size = 40000

The chi-square p-value (0.0300) is less than 0.05, so the probability of getting “S” differs significantly between the two experimenters.
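A sketch of a program that could produce the two-way table and the statistics above, again assuming the hypothetical dataset name coin and weight variable count. The CHISQ option requests the chi-square tests; for a 2x2 table it also reports Fisher's exact test.

```sas
/* Hypothetical dataset "coin" built from the counts in the problem. */
data coin;
    input perform $ toss $ count @@;
    datalines;
Janet S 10231  Janet O 9769  Prisc S 10014  Prisc O 9986
;
run;

/* Rows are toss, columns are perform, matching "Table of toss by perform". */
proc freq data=coin;
    weight count;
    tables toss*perform / chisq;
run;
```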

 

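For runner A, use PROC TTEST to test if mean time per kilometer is lower than 5 minutes 40 seconds.

N      Mean     Std Dev    Std Err    Minimum    Maximum
21    355.1     21.9748     4.7953      317.0      392.0

Mean       95% CL Mean       Std Dev      95% CL Std Dev
355.1     346.8    Infty     21.9748     16.8120   31.7331

DF    t Value    Pr > t
20       3.15    0.0025

Times are in seconds, so 5 minutes 40 seconds is 340 seconds. The output shown is for the upper-sided alternative (mean greater than 340): t = 3.15 with Pr > t = 0.0025, so the mean time of 355.1 seconds is significantly higher than 340 seconds. For the lower-sided alternative that the question asks about, the p-value is 1 - 0.0025 = 0.9975, which is above 0.05, so there is no evidence that runner A's mean time per kilometer is lower than 5 minutes 40 seconds.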
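A sketch of the PROC TTEST call for the lower-sided test follows. The dataset name running is an assumption, since the running time data are not listed here, and the times are assumed to be recorded in seconds, so that 5 minutes 40 seconds becomes H0=340.

```sas
/* "running" is a hypothetical dataset name for the running time data.
   H0= sets the null mean; SIDES=L requests the alternative mean < 340. */
proc ttest data=running h0=340 sides=L;
    var time_A;
run;
```

SIDES=U would instead request the alternative mean > 340, which is the form that reports an infinite upper confidence limit and Pr > t.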

For runner A, conduct a t-test to decide whether the mean time per kilometer for the second half (segment=2) is higher than that for the first half (segment=1).

segment        N        Mean     Std Dev    Std Err    Minimum    Maximum
1             10       343.6     17.4305     5.5120      317.0      371.0
2             11       365.5     20.9779     6.3251      338.0      392.0
Diff (1-2)          -21.9455     19.3787     8.4672

segment       Method               Mean        95% CL Mean        Std Dev      95% CL Std Dev
1                                 343.6       331.1     356.1     17.4305     11.9893   31.8213
2                                 365.5       351.5     379.6     20.9779     14.6576   36.8148
Diff (1-2)    Pooled           -21.9455    -39.6674   -4.2235     19.3787     14.7373   28.3039
Diff (1-2)    Satterthwaite    -21.9455    -39.5140   -4.3770

Method           Variances        DF    t Value    Pr > |t|
Pooled           Equal            19      -2.59      0.0179
Satterthwaite    Unequal      18.866      -2.62      0.0171

Equality of Variances
Method      Num DF    Den DF    F Value    Pr > F
Folded F        10         9       1.45    0.5888

The folded F test (p = 0.5888) gives no evidence of unequal variances, so the pooled test applies. The mean difference (segment 1 minus segment 2) is -21.9455 seconds, with t = -2.59 and a two-sided p-value of 0.0179. Because the difference is in the hypothesized direction, the one-sided p-value is half of that, about 0.009. This is less than 0.05, so the mean time per kilometer is significantly higher in the second half than in the first half.

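A sketch of the two-sample call, assuming the same hypothetical dataset name running. With a CLASS variable, PROC TTEST forms the difference as the first level minus the second, here segment 1 minus segment 2, so "second half higher" corresponds to a negative difference and therefore to SIDES=L.

```sas
/* "running" is a hypothetical dataset name for the running time data.
   The difference is segment 1 minus segment 2, so SIDES=L tests
   whether the second-half mean is higher than the first-half mean. */
proc ttest data=running sides=L;
    class segment;
    var time_A;
run;
```

Leaving out SIDES=L gives the two-sided Pr > |t| values of the kind listed in the output above.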
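Conduct an appropriate test to see whether, on average, runner B spent a different amount of time from runner A on each kilometer.

Both runners are measured on the same 21 kilometers, so the appropriate test is a paired t-test on the per-kilometer differences.

N       Mean     Std Dev    Std Err     Minimum    Maximum
21    4.9048     41.2758     9.0071    -75.0000      107.0

Mean          95% CL Mean         Std Dev      95% CL Std Dev
4.9048    -13.8838   23.6933      41.2758     31.5784   59.6051

DF    t Value    Pr > |t|
20       0.54      0.5921

The two-sided p-value (0.5921) is above 0.05, so there is no significant difference between the mean times per kilometer of runner A and runner B.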
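A sketch of the paired test, assuming the hypothetical dataset name running. The PAIRED statement analyzes the difference between the two variables for each kilometer; time_A*time_B gives time_A minus time_B. Which order was used for the output here is not stated, and reversing it only changes the sign of the mean difference and of t, not the two-sided p-value.

```sas
/* "running" is a hypothetical dataset name for the running time data.
   PAIRED time_A*time_B tests whether the mean of time_A - time_B is 0. */
proc ttest data=running;
    paired time_A*time_B;
run;
```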