Stat 2000 Tips for Assignment 4

Published: Fri, 11/05/10

Hi,

You are receiving this e-mail because you indicated when you signed up for Grant's Updates that you are taking Stat 2000 this term. If, in fact, you are not taking Stat 2000, please reply to this e-mail to let me know, and I will fix that.

Throughout the term I will send you all sorts of tips to help you study and learn the course. If you have not already done so, I strongly recommend you purchase my Basic Stats 2 Study Book. You will find it a great resource to learn the course. I pride myself on explaining things in clear, everyday language. I also provide numerous examples of all the key concepts with step-by-step solutions. You can order my book at the UMSU Digital Copy Centre in University Centre on the UM campus. They make the book to order, so please allow one business day. The book is split into two volumes and each volume costs $45 + tax.

If you ever want to look back over a previous tip I have sent, do note that all my tips can be found in my archive. Click this link to go straight to my archive:

Grant's Updates Archive

Tips for Assignment 4: Inference for Proportions and Two-Way Tables

Study Lessons 7 and 8 up to the end of question 4 in my book, if you have it, to prepare for this assignment.

Questions 1 to 4 are Lesson 7 stuff, and I am confident you will have no problem doing them once you study Lesson 7. You may want to review how to interpret a confidence interval in Lesson 1 of my book (remember, though, that these are confidence intervals for proportions or the difference between proportions, not means), and review how to interpret a P-value in Lesson 2 of my book (again keeping in mind you are hypothesizing about proportions, not means).

The rest of the assignment applies to Lesson 8 in my book (you need only study to the end of question 4; the remainder of Lesson 8 is covered in your next assignment).

In question 5, the joint distribution is simply the set of joint proportions, each found by dividing the appropriate cell count by the Grand Total, and the marginal distribution is the set of marginal proportions, each found by dividing the appropriate row or column total by the Grand Total. I discuss this in more detail at the start of Lesson 8 in my new edition only.
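If it helps to see the arithmetic, here is a minimal Python sketch of the joint and marginal proportions. The counts are made up for illustration (they are not the assignment's data), and the variable names are my own.

```python
# Hypothetical 2x3 table of observed counts (NOT the assignment's data).
counts = [
    [20, 30, 50],  # row 1
    [10, 40, 50],  # row 2
]

# Grand Total: the sum of every cell in the table.
grand_total = sum(sum(row) for row in counts)

# Joint distribution: each cell count divided by the Grand Total.
joint = [[cell / grand_total for cell in row] for row in counts]

# Marginal distributions: each row (or column) total divided by the Grand Total.
row_marginal = [sum(row) / grand_total for row in counts]
col_marginal = [sum(col) / grand_total for col in zip(*counts)]

print("Joint proportions:", joint)
print("Row marginal proportions:", row_marginal)
print("Column marginal proportions:", col_marginal)
```

As a check, the joint proportions should add up to 1, and so should each set of marginal proportions.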

Question 6 is playing with the concepts I discuss in question 4 of Lesson 8.

In question 7, I would contend that two of the choices they give you for the hypotheses are correct, but they would disagree. When you are comparing two groups (men and women; old and young; two cities in this case), saying the rows and columns are independent is equivalent to saying the distribution for the one group is the same as the distribution for the other group. For example, in my question 2 in Lesson 8, where we are comparing the attitudes towards shopping for men and women, the null hypothesis is that your attitude towards shopping is independent of your sex. We could also say that the distribution of attitudes is the same for men and for women. Note that this is not the same as saying the categories are all equally likely, just that the distributions are the same. For example, if the population of men has 20% who always like shopping, 30% who sometimes like shopping, and 50% who never like shopping, then the null hypothesis would say that the women have the same distribution as the men (they too would have the 20% / 30% / 50% distribution). Admittedly, one of the choices they offer you in this question that I would consider correct is a little vague, and for that reason, I can see why the other one is preferred.
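To see why "independent" and "same distribution for each group" amount to the same thing, here is a small Python sketch. The counts and group sizes are invented for illustration only. It computes the expected counts under the null hypothesis with the usual formula (row total times column total, divided by the Grand Total) and shows that both groups then have identical expected percentages, even though the groups are different sizes.

```python
# Hypothetical observed counts (always / sometimes / never like shopping).
# These numbers are made up; they are not from the book or the assignment.
observed = {
    "men":   [20, 30, 50],
    "women": [60, 50, 90],
}

row_totals = {group: sum(cells) for group, cells in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Pooled distribution: what both groups share if the null hypothesis is true.
pooled = [c / grand_total for c in col_totals]

# Expected count = (row total * column total) / Grand Total.
expected = {
    group: [row_totals[group] * c / grand_total for c in col_totals]
    for group in observed
}

# Expected percentages within each group: identical for men and women.
expected_dist = {
    group: [e / row_totals[group] for e in cells]
    for group, cells in expected.items()
}

print("Pooled distribution:", pooled)
print("Expected distribution by group:", expected_dist)
```

Notice that the shared distribution does not have to be an even split across the categories. The null hypothesis only says the two groups match each other.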

Question 8. To enter this data in JMP, click New Data Table. You will need a total of three columns. Double-click Column 1, name it "Music", and change the Data Type to "Character" and the Modeling Type to "Nominal". Double-click the space to the right of the Music column to create a new column. Name that column "Movies" and change the Data Type to "Character" and the Modeling Type to "Nominal". Double-click the space to the right of the Movies column to create a new column. Name that column "Count" and keep the Data Type as "Numeric" but change the Modeling Type to "Nominal".

Each row in the JMP data table is used to enter the information for a particular cell of the two-way table. The first row will represent the 1,1 cell; the second row will represent the 1,2 cell; etc. For example, your 1,1 cell gives you the observed count for the young adults who prefer Contemporary music and Action movies. In the JMP data table, in row 1 type "Contemporary" in the Music column, "Action" in the Movies column, and type the given observed count in the "Count" column. Type the info for the 1,2 cell into the second row of your JMP table. That is the observed count for the young adults who prefer Contemporary music and Comedy movies, so you will type "Contemporary" in the Music column, "Comedy" in the Movies column and the observed count in the Count column. In the third row you will type Contemporary in the Music column, Drama in the Movies column, and the observed count for the 1,3 cell in the Count column. Continue in this fashion all the way to the sixteenth row where you will type "Rock" in the Music column, "Horror" in the Movies column, and the observed count for the 4,4 cell in the Count column.

You will notice that the first two columns of the JMP table are used to specify which row and column of the two-way table you are talking about, and the third column enters the observed count for that particular cell.
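The same one-row-per-cell layout can be sketched in Python. This is only an illustration of the layout: it uses a cut-down 2 by 2 table with made-up counts, not the assignment's full table.

```python
# Hypothetical cut-down two-way table (made-up counts, NOT the assignment's data).
music_labels = ["Contemporary", "Rock"]   # rows of the two-way table
movie_labels = ["Action", "Comedy"]       # columns of the two-way table
table = [
    [12, 8],   # Contemporary: Action, Comedy
    [5, 15],   # Rock: Action, Comedy
]

# One (Music, Movies, Count) row per cell, going across each row of the
# two-way table in turn: the 1,1 cell, then the 1,2 cell, and so on.
data_rows = [
    (music, movie, table[i][j])
    for i, music in enumerate(music_labels)
    for j, movie in enumerate(movie_labels)
]

for music, movie, count in data_rows:
    print(f"{music:<14}{movie:<8}{count}")
```

A table with r rows and c columns always produces r times c rows in this layout, which is why the 4 by 4 table in the assignment needs sixteen rows.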

Once you have entered all the observed counts, select Analyze, Fit Y By X. Select "Movies" and click "Y, Response", select "Music" and click "X, Factor", and select "Count" and click "Freq". Click "OK". Click the red triangle next to "Contingency Analysis of Music by Movies" at the top and deselect "Mosaic Plot" to remove that from the output. You now see a Contingency Table (or two-way table) and the "Tests" below it. Click the red triangle next to Contingency Table and make sure that the only items selected are "Count", "Expected" and "Cell Chi Square", so that those values are displayed in each cell of the table. Note that the Pearson ChiSquare value (in the last row of the "Tests" output) is the test statistic for the problem, and the Prob>ChiSq beside it is the P-value for that test.

When they ask which two cells contribute most to the test statistic, they are asking which two cells have the largest "Cell Chi Square" values.
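If you want to check JMP's numbers by hand, here is a rough Python sketch of the same calculation. The observed counts are hypothetical (not the assignment's data). It computes each expected count, each cell's chi-square value, the Pearson chi-square test statistic (the sum of the cell values), the degrees of freedom, and the two cells that contribute the most.

```python
# Hypothetical 2x3 table of observed counts (NOT the assignment's data).
observed = [
    [25, 15, 10],
    [15, 25, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total * column total) / Grand Total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Cell chi-square = (observed - expected)^2 / expected.
cell_chi_square = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Pearson chi-square test statistic: the sum of all the cell chi-squares.
pearson = sum(sum(row) for row in cell_chi_square)

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Rank the cells by their contribution, labelled 1,1 / 1,2 / ... as in the tips.
ranked = sorted(
    ((value, i + 1, j + 1)
     for i, row in enumerate(cell_chi_square)
     for j, value in enumerate(row)),
    reverse=True,
)
top_two = [(i, j) for _, i, j in ranked[:2]]

print("Pearson chi-square:", round(pearson, 4), "with df =", df)
print("Two largest contributors (row, column):", top_two)
```

The P-value is the area to the right of the test statistic under the chi-square curve with those degrees of freedom, which is what JMP reports as Prob>ChiSq.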