# SAS Dataset Construction

For problems that involve SAS, you need to turn in both the SAS program and the SAS output. If a problem also asks you to answer questions or draw conclusions, write down (or type in) your answers based on the SAS output. For any hypothesis test problem, use 0.05 as the significance level.

1. Persi Diaconis of Stanford University is a famous statistician and, before that, a magician. Among his many works, he conducted a study on coin flipping and found that “vigorously flipped coins tend to come up the same way they started”: the probability of a coin landing the same way up as it started was estimated to be roughly 50.8%. Two UC Berkeley undergraduate students, Janet and Priscilla, did 40,000 coin tosses in 2009 to verify those conclusions. They found that of 20,000 Heads-up tosses performed by Janet, 10,231 landed Heads, and of 20,000 Tails-up tosses performed by Priscilla, 10,014 landed Tails. In total, 20,245 of the 40,000 tosses landed the same side up as they started.

Now construct a SAS dataset and perform the following analyses.

• Construct a one-way table listing the frequencies and percentages of tosses that landed the same side up as they started (denoted “S”) and those that landed the opposite side up (denoted “O”).
• Test the common belief that it is equally likely for a toss to land with the same side up and with the opposite side up.
• Test the hypothesis of Diaconis et al. that a toss lands with the same side up with a probability of 0.508.
• Construct a two-way table listing the frequencies and percentages for the way that a toss landed (“S” or “O”) crossed by the person that conducted the toss (Janet or Priscilla).
• Test whether the probability of getting “S” is different for the two experimenters.
2. For this problem, we use the running time data again. The SAS dataset contains three variables with 21 observations. Variable “time_A” records the time that runner A spent on each kilometer in a recent half-marathon race: the first observation is the time the runner spent on the first kilometer, the second observation is the time spent on the second kilometer, and so on. Similarly, “time_B” records the time that runner B spent on each kilometer in the same race. Variable “segment” labels the first and second halves of the race: the first 10 observations have value 1, and the remaining 11 observations have value 2.
• For runner A, use PROC TTEST to test if mean time per kilometer is lower than 5 minutes 40 seconds.
• For runner A, conduct a t-test to decide whether the mean time per kilometer for the second half (segment=2) is higher than that for the first half (segment=1).
• Conduct an appropriate test to see whether, on average, runner B spent a different amount of time from runner A on each kilometer.

Solution

Construct a one-way table listing the frequencies and percentages of tosses that landed the same side up as they started (denoted “S”) and those that landed the opposite side up (denoted “O”).

| toss | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| O | 19755 | 49.39 | 19755 | 49.39 |
| S | 20245 | 50.61 | 40000 | 100.00 |
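As a cross-check on the one-way table, the sketch below rebuilds the “S” and “O” counts from the per-experimenter results in the problem statement and recomputes the percentages. It is plain Python used only for verification, not the SAS program the assignment asks for; the variable names are illustrative.

```python
# Results reported in the problem statement.
tosses_per_person = 20_000
same_side = {"Janet": 10_231, "Priscilla": 10_014}   # tosses that landed "S"

# Pool both experimenters into one toss variable with levels "O" and "S".
n_total = 2 * tosses_per_person
freq = {"S": sum(same_side.values())}
freq["O"] = n_total - freq["S"]

cumulative = 0
for level in ("O", "S"):          # same row order as the SAS output
    cumulative += freq[level]
    pct = 100 * freq[level] / n_total
    cum_pct = 100 * cumulative / n_total
    print(f"{level}  {freq[level]:6d}  {pct:6.2f}  {cumulative:6d}  {cum_pct:7.2f}")
```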

Test the common belief that it is equally likely for a toss to land with the same side up and with the opposite side up.

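Binomial Proportion for toss = O

| Statistic | Value |
|---|---|
| Proportion | 0.4939 |
| ASE | 0.0025 |
| 95% Lower Conf Limit | 0.4890 |
| 95% Upper Conf Limit | 0.4988 |
| Exact 95% Lower Conf Limit | 0.4890 |
| Exact 95% Upper Conf Limit | 0.4988 |

Test of H0: Proportion = 0.5

| Statistic | Value |
|---|---|
| ASE under H0 | 0.0025 |
| Z | -2.4500 |
| One-sided Pr < Z | 0.0071 |
| Two-sided Pr > \|Z\| | 0.0143 |

Sample Size = 40000

The two-sided p-value (0.0143) is less than 0.05, so we reject the hypothesis that a toss is equally likely to land the same side up and the opposite side up. Because P(S) = 1 - P(O), testing P(O) = 0.5 is the same as testing P(S) = 0.5.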
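The Z statistic and p-values in this output follow from the large-sample formula z = (p_hat - p0) / sqrt(p0 (1 - p0) / n), where p_hat is the observed proportion of “O” tosses. The sketch below is a Python cross-check, not part of the SAS solution; it reproduces the reported values with only the standard library, writing the normal tail probability with `math.erfc`. The function name is illustrative.

```python
import math

def binomial_z_test(successes: int, n: int, p0: float):
    """Asymptotic test of H0: p = p0, mirroring the PROC FREQ BINOMIAL output."""
    p_hat = successes / n
    ase_h0 = math.sqrt(p0 * (1 - p0) / n)                # "ASE under H0"
    z = (p_hat - p0) / ase_h0
    one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))   # tail beyond z, in z's direction
    two_sided = 2 * one_sided
    return p_hat, ase_h0, z, one_sided, two_sided

# SAS reports the proportion for the first level, toss = O.
p_hat, ase, z, p1, p2 = binomial_z_test(successes=19_755, n=40_000, p0=0.5)
print(f"p_hat={p_hat:.4f} ASE={ase:.4f} Z={z:.4f} one-sided={p1:.4f} two-sided={p2:.4f}")
```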


Test the hypothesis of Diaconis et al. that a toss lands with the same side up with a probability of 0.508.

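Test of H0: Proportion = 0.508 (toss = O; the binomial proportion and confidence limits are the same as in the previous test)

| Statistic | Value |
|---|---|
| ASE under H0 | 0.0025 |
| Z | -5.6507 |
| One-sided Pr < Z | <.0001 |
| Two-sided Pr > \|Z\| | <.0001 |

Sample Size = 40000

The p-value is less than 0.05, so the hypothesis tested here is rejected. However, PROC FREQ computes the binomial proportion for the first level of toss, “O”, so this output tests P(O) = 0.508, not the claim of Diaconis et al. that P(S) = 0.508 (equivalently, P(O) = 0.492). Testing the claim as stated, for example with BINOMIAL(LEVEL='S' P=0.508) in the TABLES statement, uses the observed proportion 20245/40000 = 0.506125, which gives Z = (0.506125 - 0.508)/0.0025 ≈ -0.75 and a two-sided p-value of about 0.45. That p-value is above 0.05, so the data are consistent with the estimate of 0.508.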
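The sketch below recomputes the statistic for this output and, for comparison, the same test applied to the proportion of “S” tosses, which is the proportion the value 0.508 refers to. It is a Python cross-check using the same large-sample formula as the SAS output, not SAS code; the function name is illustrative.

```python
import math

def binomial_z_test(successes: int, n: int, p0: float):
    """Asymptotic test of H0: p = p0; returns (z, two-sided p-value)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, math.erfc(abs(z) / math.sqrt(2))   # erfc(|z|/sqrt(2)) = 2 * P(Z > |z|)

n = 40_000
# Proportion of "O" (the first level) against 0.508, as in the output above.
z_o, p_o = binomial_z_test(19_755, n, 0.508)
# Proportion of "S" against 0.508.
z_s, p_s = binomial_z_test(20_245, n, 0.508)
print(f"P(O) = 0.508: z = {z_o:.4f}, two-sided p = {p_o:.2e}")
print(f"P(S) = 0.508: z = {z_s:.4f}, two-sided p = {p_s:.4f}")
```

Because 0.508 is not 0.5, the two hypotheses are not interchangeable: the statement about “O” that matches P(S) = 0.508 is P(O) = 0.492.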

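Construct a two-way table listing the frequencies and percentages for the way that a toss landed (“S” or “O”) crossed by the person that conducted the toss (Janet or Priscilla).

Table of toss by perform (each cell lists Frequency, Percent, Row Pct and Col Pct)

| toss | Statistic | Janet | Prisc | Total |
|---|---|---|---|---|
| O | Frequency | 9769 | 9986 | 19755 |
| | Percent | 24.42 | 24.97 | 49.39 |
| | Row Pct | 49.45 | 50.55 | |
| | Col Pct | 48.85 | 49.93 | |
| S | Frequency | 10231 | 10014 | 20245 |
| | Percent | 25.58 | 25.04 | 50.61 |
| | Row Pct | 50.54 | 49.46 | |
| | Col Pct | 51.16 | 50.07 | |
| Total | Frequency | 20000 | 20000 | 40000 |
| | Percent | 50.00 | 50.00 | 100.00 |

“Prisc” is “Priscilla” truncated to five characters, most likely because the length of the character variable perform was set by the first value assigned, “Janet”; a statement such as `length perform $ 9;` before the assignments would keep the full name.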
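Each cell of the two-way table carries the cell count as a percentage of all 40,000 tosses, of its row total, and of its column total. The Python sketch below (a cross-check, not SAS code) recomputes them from the counts in the problem statement; the column percentage for “S” is each experimenter's observed probability of landing the same side up, which is what the next test compares.

```python
# Cell counts from the problem statement: (toss outcome, experimenter) -> count.
counts = {
    ("O", "Janet"): 9_769, ("O", "Priscilla"): 9_986,
    ("S", "Janet"): 10_231, ("S", "Priscilla"): 10_014,
}
rows = ("O", "S")
cols = ("Janet", "Priscilla")
total = sum(counts.values())
row_tot = {r: sum(counts[r, c] for c in cols) for r in rows}
col_tot = {c: sum(counts[r, c] for r in rows) for c in cols}

for r in rows:
    for c in cols:
        k = counts[r, c]
        print(f"{r} {c:9s} n={k:5d}  pct={100 * k / total:6.2f}  "
              f"row_pct={100 * k / row_tot[r]:6.2f}  col_pct={100 * k / col_tot[c]:6.2f}")
```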

Test whether the probability of getting “S” is different for the two experimenters.


Statistics for Table of toss by perform

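| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 1 | 4.7096 | 0.0300 |
| Likelihood Ratio Chi-Square | 1 | 4.7097 | 0.0300 |
| Continuity Adj. Chi-Square | 1 | 4.6663 | 0.0308 |
| Mantel-Haenszel Chi-Square | 1 | 4.7095 | 0.0300 |
| Phi Coefficient | | -0.0109 | |
| Contingency Coefficient | | 0.0109 | |
| Cramer’s V | | -0.0109 | |

Fisher’s Exact Test

| Statistic | Value |
|---|---|
| Cell (1,1) Frequency (F) | 9769 |
| Left-sided Pr <= F | 0.0154 |
| Right-sided Pr >= F | 0.9854 |
| Table Probability (P) | 0.0008 |
| Two-sided Pr <= P | 0.0308 |

Sample Size = 40000

The chi-square p-value (0.0300) is less than 0.05, and Fisher’s exact test agrees (two-sided p = 0.0308), so the probability of getting “S” differs significantly between the two experimenters: 51.16% of Janet’s tosses landed the same side up, compared with 50.07% of Priscilla’s.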
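For a 2 × 2 table, the Pearson chi-square compares each observed count with the count expected under independence, (row total × column total) / n. The sketch below, a standard-library Python cross-check rather than SAS code, reproduces the Chi-Square and Continuity Adj. Chi-Square rows. With 1 degree of freedom the chi-square statistic is the square of a standard normal, so its p-value is P(|Z| > sqrt(chi-square)), which `math.erfc` gives directly; the helper name is illustrative.

```python
import math

counts = [[9_769, 9_986],     # row O: Janet, Priscilla
          [10_231, 10_014]]   # row S: Janet, Priscilla
n = sum(map(sum, counts))
row_tot = [sum(row) for row in counts]
col_tot = [sum(col) for col in zip(*counts)]

pearson = 0.0
yates = 0.0
for i in range(2):
    for j in range(2):
        expected = row_tot[i] * col_tot[j] / n
        diff = abs(counts[i][j] - expected)
        pearson += diff ** 2 / expected
        # Continuity-adjusted version; every |O - E| here exceeds 0.5.
        yates += (diff - 0.5) ** 2 / expected

def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

print(f"Chi-Square       {pearson:.4f}  p = {chi2_1df_pvalue(pearson):.4f}")
print(f"Continuity Adj.  {yates:.4f}  p = {chi2_1df_pvalue(yates):.4f}")
```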

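For runner A, use PROC TTEST to test if mean time per kilometer is lower than 5 minutes 40 seconds.

Times are recorded in seconds, so the hypothesized mean is 5 × 60 + 40 = 340 seconds.

| N | Mean | Std Dev | Std Err | Minimum | Maximum |
|---|---|---|---|---|---|
| 21 | 355.1 | 21.9748 | 4.7953 | 317.0 | 392.0 |

| Mean | 95% CL Mean (lower) | 95% CL Mean (upper) | Std Dev | 95% CL Std Dev (lower) | 95% CL Std Dev (upper) |
|---|---|---|---|---|---|
| 355.1 | 346.8 | Infty | 21.9748 | 16.8120 | 31.7331 |

| DF | t Value | Pr > t |
|---|---|---|
| 20 | 3.15 | 0.0025 |

This output is for the upper-tailed alternative H1: μ > 340, as the one-sided confidence limit running to Infty and the “Pr > t” column show. The question asks whether the mean is lower than 340 seconds, which calls for the lower-tailed alternative H1: μ < 340 (SIDES=L in PROC TTEST). Its p-value is 1 - 0.0025 = 0.9975, well above 0.05, so there is no evidence that runner A’s mean time per kilometer is below 5 minutes 40 seconds; the sample mean of 355.1 seconds is in fact above it.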
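PROC TTEST’s one-sample statistic is t = (mean - μ0) / (s / sqrt(n)) with n - 1 degrees of freedom, and the upper- and lower-tailed p-values add up to 1. The Python sketch below recomputes t from the summary statistics, with μ0 = 340 seconds. To stay within the standard library it evaluates the Student t CDF by Simpson’s-rule integration of the t density, a stand-in for a statistics library such as `scipy.stats.t`; the helper names are illustrative.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # integral of the density over [0, |x|]
    return 0.5 + area if x >= 0 else 0.5 - area

# Summary statistics for runner A from the output (seconds); mu0 = 5 min 40 s.
n, mean, sd, mu0 = 21, 355.1, 21.9748, 340.0
se = sd / math.sqrt(n)
t = (mean - mu0) / se
p_upper = 1 - t_cdf(t, n - 1)   # H1: mu > 340, the "Pr > t" column in the output
p_lower = t_cdf(t, n - 1)       # H1: mu < 340, the alternative the question asks about
print(f"t = {t:.2f} (df = {n - 1}), Pr > t = {p_upper:.4f}, Pr < t = {p_lower:.4f}")
```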

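For runner A, conduct a t-test to decide whether the mean time per kilometer for the second half (segment=2) is higher than that for the first half (segment=1).

| segment | N | Mean | Std Dev | Std Err | Minimum | Maximum |
|---|---|---|---|---|---|---|
| 1 | 10 | 343.6 | 17.4305 | 5.5120 | 317.0 | 371.0 |
| 2 | 11 | 365.5 | 20.9779 | 6.3251 | 338.0 | 392.0 |
| Diff (1-2) | | -21.9455 | 19.3787 | 8.4672 | | |

| segment | Method | Mean | 95% CL Mean (lower) | 95% CL Mean (upper) | Std Dev | 95% CL Std Dev (lower) | 95% CL Std Dev (upper) |
|---|---|---|---|---|---|---|---|
| 1 | | 343.6 | 331.1 | 356.1 | 17.4305 | 11.9893 | 31.8213 |
| 2 | | 365.5 | 351.5 | 379.6 | 20.9779 | 14.6576 | 36.8148 |
| Diff (1-2) | Pooled | -21.9455 | -39.6674 | -4.2235 | 19.3787 | 14.7373 | 28.3039 |
| Diff (1-2) | Satterthwaite | -21.9455 | -39.5140 | -4.3770 | | | |

| Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Pooled | Equal | 19 | -2.59 | 0.0179 |
| Satterthwaite | Unequal | 18.866 | -2.62 | 0.0171 |

Equality of Variances

| Method | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| Folded F | 10 | 9 | 1.45 | 0.5888 |

The folded F test (p = 0.5888) does not reject equal variances, so the pooled test is appropriate. SAS reports the two-sided p-value, 0.0179; because the observed difference (segment 1 - segment 2 = -21.9455 seconds) lies in the direction of the alternative H1: μ1 < μ2, the one-sided p-value is 0.0179/2 ≈ 0.009 < 0.05. Runner A’s mean time per kilometer is significantly higher in the second half than in the first half.
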
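The sketch below recomputes the pooled and Satterthwaite statistics and the folded F ratio from the segment summaries. The mean difference is taken from the Diff (1-2) row because the displayed segment means are rounded. Since the alternative is one-sided (μ1 - μ2 < 0), it also evaluates the lower-tail p-value. This is a Python cross-check, not SAS code; the Student t CDF is computed by Simpson’s-rule integration of the t density, an assumption made to stay within the standard library, and the helper names are illustrative.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n1, s1 = 10, 17.4305      # segment 1 (first half)
n2, s2 = 11, 20.9779      # segment 2 (second half)
diff = -21.9455           # mean(segment 1) - mean(segment 2), from the Diff (1-2) row

# Pooled (equal-variance) test.
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_pooled = diff / (sp * math.sqrt(1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Satterthwaite (unequal-variance) test.
v1, v2 = s1**2 / n1, s2**2 / n2
t_satt = diff / math.sqrt(v1 + v2)
df_satt = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Folded F: larger variance over smaller (segment 2 here, hence 10 and 9 DF).
f_folded = max(s1, s2) ** 2 / min(s1, s2) ** 2

# One-sided p-value for H1: mu1 - mu2 < 0 (second half slower).
p_one_sided = t_cdf(t_pooled, df_pooled)
print(f"pooled t = {t_pooled:.2f} (df {df_pooled}), one-sided p = {p_one_sided:.4f}")
print(f"Satterthwaite t = {t_satt:.2f} (df {df_satt:.3f}), folded F = {f_folded:.2f}")
```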
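
Conduct an appropriate test to see whether, on average, runner B spent a different amount of time from runner A on each kilometer.

Both runners’ times are recorded for the same 21 kilometers of the same race, so the observations are paired by kilometer and the appropriate test is a paired t-test on the per-kilometer differences.

| N | Mean | Std Dev | Std Err | Minimum | Maximum |
|---|---|---|---|---|---|
| 21 | 4.9048 | 41.2758 | 9.0071 | -75.0000 | 107.0 |

| Mean | 95% CL Mean (lower) | 95% CL Mean (upper) | Std Dev | 95% CL Std Dev (lower) | 95% CL Std Dev (upper) |
|---|---|---|---|---|---|
| 4.9048 | -13.8838 | 23.6933 | 41.2758 | 31.5784 | 59.6051 |

| DF | t Value | Pr > \|t\| |
|---|---|---|
| 20 | 0.54 | 0.5921 |

The two-sided p-value (0.5921) is above 0.05, and the 95% confidence interval for the mean difference (-13.8838 to 23.6933 seconds) contains 0, so the difference is not significant: there is no evidence that, on average, runner B spent a different amount of time per kilometer than runner A.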
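Because the paired analysis is a one-sample t-test on the 21 per-kilometer differences, its statistic is the mean difference divided by its standard error. The Python sketch below recomputes t and the two-sided p-value from the summary statistics. It is a cross-check, not SAS code; the Student t CDF is evaluated by Simpson’s-rule integration of the t density to stay within the standard library (`scipy.stats.t.cdf` would be the usual choice), and the helper names are illustrative.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Summary statistics of the 21 per-kilometer differences, from the output (seconds).
n, mean_diff, sd_diff = 21, 4.9048, 41.2758
se = sd_diff / math.sqrt(n)
t = mean_diff / se
p_two_sided = 2 * (1 - t_cdf(abs(t), n - 1))
print(f"Std Err = {se:.4f}, t = {t:.2f} (df = {n - 1}), Pr > |t| = {p_two_sided:.4f}")
```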