Homework Set 9 , given Fri 4/25, due Friday 5/2

Problem 9.1 below requires a simulation using the MATLAB program (for Problem 9.2 the program is also recommended). On Campus, Matlab is available for students in the Academic Computing Center, Engineering Library (ACCEL) in Carpenter Hall ( www.accel.cornell.edu). The Matlab program runs on many computers there; it contains the statistics package and is installed under Windows (not Unix).

The present homework will appear additionally as a text file on the website so that data and code can be copied and pasted.


Problem 9.1. Power of the signed rank test. The Matlab simulation program below is for exploring the power of the signed rank test for the "half Cauchy" data treated before (cf. p. xvii Appendix, notes).

Before you can use it, you have to input the sample size n and the shift m at the command line:
>>n=100
>>m=0
The shift can then be modified to positive values. Recall that the half Cauchy density has median 0 but is heavily skewed to the right, so the signed rank test should detect it as an alternative (i.e. as nonsymmetric) even for m=0. Here is the program "signrank1":
1: k=0
2: rejec=0
3: sigma=sqrt(pi/2)
4: while k<1000
5: b=binornd(1,0.5,1,n)
6: d=m+abs(trnd(1,1,n)).*b+abs(normrnd(0,sigma,1,n)).*(b-1)
7:[p,h]=signrank(d)
8:rejec=rejec+h
9: k=k+1
10: end

The output "rejec" at the end gives the number of 5% (two sided) rejections in 1000 runs. We can thus explore the power of the signed rank test for various alternatives (m) and various sample sizes (n).

a) Estimate the power (probability of rejection) of the signed rank test using the above simulation (one program run suffices), for sample size n=100 and values m=0, m=0.1 and m=0.2.

b) Repeat a) for sample size n=1000.

c) Write a sentence interpreting the behaviour of the power under varying shift m and sample sizen.



Problem 9.2. Are birthdays of people uniformly distributed over the year ? This problem is treated in the article: G. Berresford, "The uniformity assumption in the birthday problem", Math. Mag. 53 1980, no. 5, 286-288. The data used are on the website

http://www.dartmouth.edu/%7Echance/teaching_aids/data/birthday.txt

they give the the number of births in the U.S. for each day of the year 1978. The following summary for each month is based on these data. The numbers are in thousand, rounded to one decimal.

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
270.7 249.9 276.6 254.6 270.8 270.8 294.7 302.8 293.9 289.0 274.7 284.9

a) Carry out a chi square test of goodness of fit for uniform distribution of birthdays. Report the value of the chi square statistic, p-value and decision.

b) The births may be uniformly distributed over days of the year, but not over months due to the different lengths of the months. Recall that the lengths of the months in days are;
31 28 31 30 31 30 31 31 30 31 30 31. Carry out a test for uniformity over days, using the above data for months and adjusting the expected counts for the different lengths. Report the value of the chi square statistic, p-value and decision.

Hint: in Matlab a chi square godness of fit for uniformity, with these data, is performed by the series of commands

>> bins=1:12;
>>obsCounts=[270.7 249.9 276.6 254.6 270.8 270.8 294.7 302.8 293.9 289.0 274.7 284.9]*1000
>>n = sum(obsCounts);
>>expCounts=ones(1,12)*n/12;
>>[h,p,st] = chi2gof(bins,'frequency',obsCounts,'expected',expCounts)
Here "bins" defines indices of bins, 1,...,12. Next, "obsCounts" defines our vector of observed counts; notify that the values have to be multiplied by 1000 since we have to use the actual counts (not in units of 1000). Then n gives the total sum of births, and "expCounts" defines the vector of expected counts for uniformity, using the command "ones(1,12)" to obtain the vector [1 ...1 ] and multiply it with n/12. Finally, "[h,p,st] = " carries out the test, where output "h" is the decision at 5% significance, output "p" is the P-value and output "st" gives as first item chi2stat=, the value of the chi square statistic. Ignore the remaining output.

For b) modify "expCounts" appropriately (see "obsCounts" how to input vectors directly).