SAS Dataset Construction
For problems that involve SAS, turn in both the SAS program and the SAS output. Some problems also require you to answer questions or draw conclusions; write down (or type in) your answers based on the SAS output. For any hypothesis test problem, use 0.05 as the significance level.
- Persi Diaconis of Stanford University is a famous statistician and, before that, a magician. Among his many works, he conducted a study on coin flipping and found that “vigorously flipped coins tend to come up the same way they started”. The probability of a coin landing the same way up as it started was estimated to be roughly 50.8%. Two UC Berkeley undergraduate students, Janet and Priscilla, did 40,000 coin tosses in 2009 to verify those conclusions. They found that of 20,000 Heads-up tosses performed by Janet, 10,231 landed Heads, and of 20,000 Tails-up tosses performed by Priscilla, 10,014 landed Tails. In total, 20,245 out of 40,000 tosses landed the same side up as they started.
Now construct a SAS dataset and perform the following analyses.
- Construct a one-way table listing the frequencies and percentages of tosses that landed the same side up as they started (denoted “S”) and those that landed the opposite side up (denoted “O”).
- Test the common belief that it is equally likely for a toss to land with the same side up and with the opposite side up.
- Test the hypothesis of Diaconis et al. that a toss lands with the same side up with a probability of 0.508.
- Construct a two-way table listing the frequencies and percentages for the way that a toss landed (“S” or “O”) crossed by the person that conducted the toss (Janet or Priscilla).
- Test whether the probability of getting “S” is different for the two experimenters.
- For this problem, we use the running time data again. There are three variables with 21 observations in the SAS dataset. Variable “time_A” records the time that runner A spent on each kilometer in a recent half-marathon race: the first observation is the time the runner spent on the first kilometer, the second observation is the time spent on the second kilometer, and so on. Similarly, “time_B” records the time that runner B spent on each kilometer in the same race. Variable “segment” labels the first half and the second half of the race: the first 10 observations have value 1, and the remaining 11 observations have value 2.
- For runner A, use PROC TTEST to test whether the mean time per kilometer is lower than 5 minutes 40 seconds.
- For runner A, conduct a t-test to decide whether the mean time per kilometer for the second half (segment=2) is higher than that for the first half (segment=1).
- Conduct an appropriate test to see whether, on average, runner B spent a different amount of time than runner A on each kilometer.
Solution
Construct a one-way table listing the frequencies and percentages of tosses that landed the same side up as they started (denoted “S”) and those that landed the opposite side up (denoted “O”).
toss | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
O | 19755 | 49.39 | 19755 | 49.39 |
S | 20245 | 50.61 | 40000 | 100.00 |
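The assignment asks for the SAS program as well as the output. A minimal sketch of a program that could produce this table follows. The dataset name `coins` and the weight variable `count` are assumptions made for illustration; the variable names `toss` and `perform` and the value `Prisc` follow the output shown in this solution.

```sas
/* Cell counts from the problem statement.
   The names coins and count are assumed, not taken from the original program. */
data coins;
  input perform $ toss $ count @@;
  datalines;
Janet S 10231 Janet O 9769 Prisc S 10014 Prisc O 9986
;
run;

/* One-way frequency table of toss; WEIGHT treats each row as count tosses */
proc freq data=coins;
  weight count;
  tables toss;
run;
```

Entering the four cell counts and weighting by them avoids typing 40,000 individual rows.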
Test the common belief that it is equally likely for a toss to land with the same side up and with the opposite side up.
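The two-sided p-value (0.0143) is less than 0.05, so we reject the hypothesis that a toss is equally likely to land with the same side up and with the opposite side up.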
Binomial Proportion | |
toss = O | |
Proportion | 0.4939 |
ASE | 0.0025 |
95% Lower Conf Limit | 0.4890 |
95% Upper Conf Limit | 0.4988 |
Exact Conf Limits | |
95% Lower Conf Limit | 0.4890 |
95% Upper Conf Limit | 0.4988 |
Test of H0: Proportion = 0.5 | |
ASE under H0 | 0.0025 |
Z | -2.4500 |
One-sided Pr < Z | 0.0071 |
Two-sided Pr > |Z| | 0.0143 |
Sample Size = 40000 |
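One way to request this test is the BINOMIAL option of PROC FREQ, sketched below. As before, the dataset name `coins` and the variable `count` are assumptions. By default the test is applied to the first level of `toss`, which is O here, in line with the "toss = O" label in the output above.

```sas
/* Assumed dataset layout: one row per (perform, toss) cell with its count */
data coins;
  input perform $ toss $ count @@;
  datalines;
Janet S 10231 Janet O 9769 Prisc S 10014 Prisc O 9986
;
run;

/* Test H0: P(toss = O) = 0.5; with two outcomes this is
   equivalent to testing P(toss = S) = 0.5 */
proc freq data=coins;
  weight count;
  tables toss / binomial(p=0.5);
run;
```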
Test the hypothesis of Diaconis et al. that a toss lands with the same side up with a probability of 0.508.
Binomial Proportion | |
toss = O | |
Proportion | 0.4939 |
ASE | 0.0025 |
95% Lower Conf Limit | 0.4890 |
95% Upper Conf Limit | 0.4988 |
Exact Conf Limits | |
95% Lower Conf Limit | 0.4890 |
95% Upper Conf Limit | 0.4988 |
Test of H0: Proportion = 0.508 | |
ASE under H0 | 0.0025 |
Z | -5.6507 |
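One-sided Pr < Z | <.0001 |
Two-sided Pr > |Z| | <.0001 |
Sample Size = 40000 |
Note that this output tests the proportion of “O” (0.4939) against 0.508, which is not the stated hypothesis. The hypothesis concerns “S”: the observed proportion is 20245/40000 = 0.506125, so Z = (0.506125 - 0.508)/0.0025 ≈ -0.75, with a two-sided p-value of about 0.45. This is above 0.05, so the data do not contradict a same-side probability of 0.508.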
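Because the hypothesis is about the “S” outcome, the test should target that level explicitly. A minimal sketch under the same assumptions as elsewhere in this solution (dataset name `coins` and weight variable `count` are hypothetical) uses the LEVEL= suboption of BINOMIAL:

```sas
/* Assumed dataset layout: one row per (perform, toss) cell with its count */
data coins;
  input perform $ toss $ count @@;
  datalines;
Janet S 10231 Janet O 9769 Prisc S 10014 Prisc O 9986
;
run;

/* Test H0: P(toss = S) = 0.508; LEVEL= selects S instead of the
   default first level (O) */
proc freq data=coins;
  weight count;
  tables toss / binomial(level='S' p=0.508);
run;
```

An equivalent alternative is to keep the default level O and test it against 0.492.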
Construct a two-way table listing the frequencies and percentages for the way that a toss landed (“S” or “O”) crossed by the person that conducted the toss (Janet or Priscilla).
Table of toss by perform |
Cell contents: Frequency, Percent, Row Pct, Col Pct |
toss | Janet | Prisc | Total |
O | 9769 24.42 49.45 48.85 | 9986 24.97 50.55 49.93 | 19755 49.39 |
S | 10231 25.58 50.54 51.16 | 10014 25.04 49.46 50.07 | 20245 50.61 |
Total | 20000 50.00 | 20000 50.00 | 40000 100.00 |
Test whether the probability of getting “S” is different for the two experimenters.
Statistics for Table of toss by perform |
Statistic | DF | Value | Prob |
Chi-Square | 1 | 4.7096 | 0.0300 |
Likelihood Ratio Chi-Square | 1 | 4.7097 | 0.0300 |
Continuity Adj. Chi-Square | 1 | 4.6663 | 0.0308 |
Mantel-Haenszel Chi-Square | 1 | 4.7095 | 0.0300 |
Phi Coefficient | | -0.0109 | |
Contingency Coefficient | | 0.0109 | |
Cramer’s V | | -0.0109 | |
Fisher’s Exact Test | |
Cell (1,1) Frequency (F) | 9769 |
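Left-sided Pr <= F | 0.0154 |
Right-sided Pr >= F | 0.9854 |
Table Probability (P) | 0.0008 |
Two-sided Pr <= P | 0.0308 |
Sample Size = 40000 |
The chi-square p-value (0.0300) is less than 0.05, and Fisher’s exact test agrees (two-sided p = 0.0308), so the probability of getting “S” differs significantly between the two experimenters (51.16% for Janet versus 50.07% for Priscilla).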
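A sketch of a program that could produce both the two-way table and these test statistics is given below. The dataset name `coins` and the variable `count` are assumptions; `toss` and `perform` follow the table title in the output.

```sas
/* Assumed dataset layout: one row per (perform, toss) cell with its count */
data coins;
  input perform $ toss $ count @@;
  datalines;
Janet S 10231 Janet O 9769 Prisc S 10014 Prisc O 9986
;
run;

/* Two-way table of toss by perform; CHISQ adds the chi-square tests
   and, for a 2x2 table, Fisher's exact test */
proc freq data=coins;
  weight count;
  tables toss*perform / chisq;
run;
```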
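For runner A, use PROC TTEST to test whether the mean time per kilometer is lower than 5 minutes 40 seconds (340 seconds).
N | Mean | Std Dev | Std Err | Minimum | Maximum |
21 | 355.1 | 21.9748 | 4.7953 | 317.0 | 392.0 |
Mean | 95% CL Mean (Lower) | 95% CL Mean (Upper) | Std Dev | 95% CL Std Dev (Lower) | 95% CL Std Dev (Upper) |
355.1 | 346.8 | Infty | 21.9748 | 16.8120 | 31.7331 |
DF | t Value | Pr > t |
20 | 3.15 | 0.0025 |
The output shown is for the upper-tailed alternative (Pr > t = 0.0025). For the lower-tailed alternative asked here (mean below 340 seconds), the p-value is 1 - 0.0025 = 0.9975, which is above 0.05. There is therefore no evidence that runner A’s mean time per kilometer is lower than 5 minutes 40 seconds; the sample mean of 355.1 seconds is in fact significantly higher.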
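A minimal sketch of the lower-tailed one-sample test follows. It assumes the running data are already available as a dataset named `running` (a hypothetical name; the data values are not listed in this document) and that `time_A` is recorded in seconds, as the summary statistics suggest.

```sas
/* One-sample t-test of H0: mean = 340 seconds against the
   lower-tailed alternative mean < 340.
   The dataset name running is an assumption. */
proc ttest data=running h0=340 sides=L;
  var time_A;
run;
```

With SIDES=L the output reports Pr < t, and the confidence interval for the mean runs from -Infty to an upper limit.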
For runner A, conduct a t-test to decide whether the mean time per kilometer for the second half (segment=2) is higher than that for the first half (segment=1).
segment | N | Mean | Std Dev | Std Err | Minimum | Maximum |
1 | 10 | 343.6 | 17.4305 | 5.5120 | 317.0 | 371.0 |
2 | 11 | 365.5 | 20.9779 | 6.3251 | 338.0 | 392.0 |
Diff (1-2) | | -21.9455 | 19.3787 | 8.4672 | | |
segment | Method | Mean | 95% CL Mean (Lower) | 95% CL Mean (Upper) | Std Dev | 95% CL Std Dev (Lower) | 95% CL Std Dev (Upper) |
1 | | 343.6 | 331.1 | 356.1 | 17.4305 | 11.9893 | 31.8213 |
2 | | 365.5 | 351.5 | 379.6 | 20.9779 | 14.6576 | 36.8148 |
Diff (1-2) | Pooled | -21.9455 | -39.6674 | -4.2235 | 19.3787 | 14.7373 | 28.3039 |
Diff (1-2) | Satterthwaite | -21.9455 | -39.5140 | -4.3770 | | | |
Method | Variances | DF | t Value | Pr > |t| |
Pooled | Equal | 19 | -2.59 | 0.0179 |
Satterthwaite | Unequal | 18.866 | -2.62 | 0.0171 |
The folded F test below does not reject equal variances (p = 0.5888), so the pooled test applies. Its two-sided p-value is 0.0179; because the difference (1-2) is negative, the one-sided p-value for the alternative that segment 2 is higher is 0.0179/2 ≈ 0.009. This is less than 0.05, so runner A’s mean time per kilometer is significantly higher in the second half.
Equality of Variances | ||||
Method | Num DF | Den DF | F Value | Pr > F |
Folded F | 10 | 9 | 1.45 | 0.5888 |
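The one-sided p-value can also be requested directly rather than halving the two-sided value. The sketch below again assumes a dataset named `running` (hypothetical). PROC TTEST forms the difference as segment 1 minus segment 2, so "second half higher" corresponds to a negative difference and hence SIDES=L.

```sas
/* Two-sample t-test comparing time_A between the two segments.
   Alternative: mean(segment 1) - mean(segment 2) < 0,
   i.e. the second half is slower. Dataset name running is assumed. */
proc ttest data=running sides=L;
  class segment;
  var time_A;
run;
```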
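Conduct an appropriate test to see whether, on average, runner B spent a different amount of time than runner A on each kilometer.
Because both runners are timed on the same 21 kilometers, a paired t-test on the per-kilometer differences is appropriate.
N | Mean | Std Dev | Std Err | Minimum | Maximum |
21 | 4.9048 | 41.2758 | 9.0071 | -75.0000 | 107.0 |
Mean | 95% CL Mean (Lower) | 95% CL Mean (Upper) | Std Dev | 95% CL Std Dev (Lower) | 95% CL Std Dev (Upper) |
4.9048 | -13.8838 | 23.6933 | 41.2758 | 31.5784 | 59.6051 |
DF | t Value | Pr > |t| |
20 | 0.54 | 0.5921 |
The two-sided p-value (0.5921) is above 0.05, so there is no significant difference between the two runners’ mean times per kilometer.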
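A minimal sketch of the paired test with PROC TTEST is shown below, assuming the dataset is named `running` (hypothetical). The PAIRED statement here analyzes time_B minus time_A; the output above does not say which order was used, and reversing it only flips the sign of the mean difference and of t, leaving the two-sided p-value unchanged.

```sas
/* Paired t-test on the per-kilometer differences time_B - time_A.
   The dataset name running is an assumption. */
proc ttest data=running;
  paired time_B*time_A;
run;
```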