Stat 2000: Tips for Assignment 4

Published: Thu, 03/15/12


Please note that the tentative date for my final exam prep seminar for Stat 2000 is Sunday, Apr. 15, in room 100 St. Paul's College, from 9 am to 9 pm.  I am not ready to take registrations yet because the date has not been finalized.  Please reply and let me know whether that date works for you.  Please click this link for more information about the seminar if you are interested:
Grant's Stat 2000 Exam Prep Seminars 
 
If you ever want to look back over a previous tip I have sent, do note that all my tips can be found in my archive.  Click this link to go straight to my archive:
Grant's Updates Archive
 
Did you miss my Tips on How to Do Well in this Course? Click here
 
Did you miss my Tips for Assignment 3? Click here
 
If you are taking the course by Distance/Online (Sections D01, D02, etc.), click here for my tips for your Assignment 4.
 
If you are taking the course by classroom lecture (Sections A01, A02, etc.), click here for my tips for your Assignment 4.
 
Tips for Assignment 4 (Classroom Lecture Sections A01, A02, etc.)
 
You need to study Lesson 7: Inferences about Proportions (if you are using an older edition of my book, this may be Lesson 8).  You will also need to study the first half of Lesson 8: Chi-Square Tests (up to the end of question 4; you do not need to study the Goodness-of-Fit Test at this time).
 
Question 1 is very similar to my question 1(c) and (d) in Lesson 7.  Note that, as I discuss back in Lesson 6, question 7, a fraction is just another way of giving you a value for p (for example, "one in four" means p = 1/4 = 0.25).
 
Question 2 is standard stuff, like my questions 6 to 8 in Lesson 7.  Note that part (b) is about the Inverse-Square Relationship for sample size, which I introduced way back in Lesson 1, question 8 (to cut the margin of error in half, you need four times the sample size).
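If you want to see the Inverse-Square Relationship in action, here is a small Python sketch. It assumes the usual large-sample margin of error for a proportion, m = z* sqrt(p*(1 - p*)/n), and the conservative guess p* = 0.5; the function names and the numbers in the example are made up for illustration, not taken from the assignment.

```python
import math

def margin_of_error(z_star: float, p: float, n: int) -> float:
    """Large-sample margin of error for a one-proportion confidence interval."""
    return z_star * math.sqrt(p * (1 - p) / n)

def sample_size(z_star: float, m: float, p_guess: float = 0.5) -> int:
    """Smallest n whose margin of error is at most m (always round up)."""
    return math.ceil((z_star / m) ** 2 * p_guess * (1 - p_guess))

# Hypothetical 95% confidence (z* = 1.96) with the conservative guess p* = 0.5.
print(sample_size(1.96, 0.03))    # 1068
print(sample_size(1.96, 0.015))   # 4269: half the margin needs about 4 times the n
```

Halving the margin of error takes the required sample size from 1068 to 4269, almost exactly four times as many. Always round n up, never down.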
 
Question 3 is a good run-through of confidence intervals and hypothesis testing as I teach them in Lesson 7 (see my questions 2 and 3).  Part (f) requires an alpha/beta table.  Note that you will need to use the z* critical value to compute p-hat*, the critical value of p-hat at which you will reject Ho (the p-hat decision rule).  We derive p-hat* from the standardizing formula for p-hat bell curves.
Click this link to see how to find the critical value for p-hat:
P-hat Decision Rule
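Here is a minimal Python sketch of the p-hat decision rule and the beta calculation behind an alpha/beta table. It assumes the usual one-proportion setup in which the cutoff p-hat* uses the standard error under Ho, sqrt(p0(1 - p0)/n), while beta uses the standard error at the alternative value pa; the function names and the example numbers (p0 = 0.5, n = 100, z* = 1.645, pa = 0.6) are hypothetical.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_hat_star(p0: float, n: int, z_star: float, upper: bool = True) -> float:
    """Critical value of p-hat: reject Ho when p-hat is beyond this cutoff."""
    se0 = math.sqrt(p0 * (1 - p0) / n)            # standard error when Ho is true
    return p0 + z_star * se0 if upper else p0 - z_star * se0

def beta(p0: float, pa: float, n: int, z_star: float, upper: bool = True) -> float:
    """Probability of failing to reject Ho when the true proportion is pa."""
    cutoff = p_hat_star(p0, n, z_star, upper)
    se_a = math.sqrt(pa * (1 - pa) / n)           # standard error when p = pa
    z = (cutoff - pa) / se_a                      # cutoff standardized on the pa bell curve
    return phi(z) if upper else 1 - phi(z)

# Hypothetical upper-tailed test: Ho: p = 0.5, n = 100, alpha = 0.05 (z* = 1.645).
print(p_hat_star(0.5, 100, 1.645))      # about 0.582
print(beta(0.5, 0.6, 100, 1.645))       # about 0.359 when the true p is 0.6
```

Power is just 1 - beta. For a lower-tailed test, pass upper=False so the cutoff sits below p0.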
 
Question 4 is very similar to my questions about confidence intervals and hypothesis tests for the difference between two proportions, taught in the latter half of Lesson 7.  Part (e) introduces the concept I teach in my question 4 in Lesson 8 (Chi-Square Tests).  Note that you don't really have to do any work for part (e) if you apply the concept that relates two-proportion z tests to 2 by 2 Two-Way Chi-Square analysis (the chi-square statistic is the square of the z statistic, and the two tests give the same P-value for a two-sided alternative).
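To see why part (e) takes no extra work, here is a Python sketch comparing the pooled two-proportion z statistic with the Pearson chi-square statistic for the same data arranged as a 2 by 2 table. The counts (30 successes out of 100 versus 20 out of 100) and the function names are hypothetical; the point is only that z squared equals the chi-square statistic and that the two-sided z P-value equals the chi-square P-value with 1 df.

```python
import math

def pooled_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-proportion z statistic using the pooled proportion (testing Ho: p1 = p2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pearson chi-square for the 2 by 2 table [[x1, n1 - x1], [x2, n2 - x2]]."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = n1 + n2
    return sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i in range(2) for j in range(2))

z = pooled_z(30, 100, 20, 100)
x2 = chi_square_2x2(30, 100, 20, 100)
p_from_z = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal P-value
p_from_x2 = math.erfc(math.sqrt(x2 / 2))        # chi-square P-value with 1 df
print(z ** 2, x2, p_from_z, p_from_x2)
```

This equivalence only holds for the two-sided alternative, since the chi-square test has no one-sided version.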
 
Question 5 is standard Two-Way Table Chi-Square analysis as taught in questions 1 through 4 in Lesson 8 of my book.
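The mechanics of Two-Way Table Chi-Square analysis fit in a few lines of Python. This sketch assumes the standard formulas: expected count = (row total)(column total)/(grand total), X^2 = sum of (O - E)^2/E over all cells, and df = (r - 1)(c - 1). The function name and the 2 by 3 table in the example are hypothetical.

```python
def chi_square_independence(table):
    """Return (expected counts, X^2 statistic, df) for an r by c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)                                   # grand total
    # Expected count for each cell: (row total)(column total) / (grand total)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Pearson statistic: sum of (O - E)^2 / E over every cell
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, x2, df

# Hypothetical 2 by 3 table of observed counts.
observed = [[10, 20, 30],
            [20, 20, 20]]
expected, x2, df = chi_square_independence(observed)
print(expected)   # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(x2, df)
```

You would then compare X^2 with the chi-square table at that df (or read Prob>ChiSq in JMP) to get the P-value.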
 
Question 6 requires the use of JMP.
 
Here is how to do Contingency Tables (2-Way Tables) in JMP:
 
Click New Data Table. You will need a total of three columns. Double-click Column 1, name it "Smoking", and change the Data Type to "Character" and the Modeling Type to "Nominal". Double-click the space to the right of the Smoking column to create a new column. Name that column "Drinking" and change the Data Type to "Character" and the Modeling Type to "Nominal". Double-click the space to the right of the Drinking column to create a new column. Name that column "Count" and keep the Data Type as "Numeric" but change the Modeling Type to "Nominal".
 
Make sure that you have the correct Data Type and Modeling Type for each of these three columns as I outline above!
 
Each row in the JMP data table is used to enter the information for a particular cell of the two-way table. The first row will represent the 1,1 cell; the second row will represent the 1,2 cell; and so on. For example, your 1,1 cell gives you the observed count for the people who Don't Smoke and Never Drink.  In the JMP data table, in row 1 type "Don't Smoke" in the Smoking column, "Never Drink" in the Drinking column, and the given observed count, 44, in the Count column. Type the info for the 1,2 cell into the second row of your JMP table. That is the observed count for the people who Don't Smoke and Drink Occasionally, so you will type "Don't Smoke" in the Smoking column, "Drink Occasionally" in the Drinking column and the observed count, 72, in the Count column. In the third row you will type "Don't Smoke" in the Smoking column, "Drink Often" in the Drinking column, and 12, the observed count for the 1,3 cell, in the Count column. Continue in this fashion all the way to the 9th row, where you will type "Heavy Smoker" in the Smoking column, "Drink Often" in the Drinking column, and 7, the observed count for the 3,3 cell, in the Count column.
 
You will notice that the first two columns of the JMP table are used to specify which row and column of the two-way table you are talking about, and the third column enters the observed count for that particular cell.
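If it helps to see this layout as data, here is a small Python sketch that turns a two-way table into one (Smoking, Drinking, Count) record per cell, in the same row order as the JMP table described above. The function name to_jmp_rows is hypothetical, and only the "Don't Smoke" row (the counts 44, 72 and 12 quoted above) is filled in; the remaining rows would come from your question.

```python
def to_jmp_rows(row_labels, col_labels, counts):
    """One (row label, column label, count) record per cell, ordered 1,1; 1,2; ...; r,c."""
    records = []
    for row_label, count_row in zip(row_labels, counts):
        for col_label, count in zip(col_labels, count_row):
            records.append((row_label, col_label, count))
    return records

drinking = ["Never Drink", "Drink Occasionally", "Drink Often"]
# Only the "Don't Smoke" row is filled in; add the other smoking rows from your question.
first_rows = to_jmp_rows(["Don't Smoke"], drinking, [[44, 72, 12]])
for record in first_rows:
    print(record)
```

Each record is exactly one row of the JMP data table: the first two entries say which cell you mean, and the third is its observed count.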
 
Once you have entered all the observed counts, select Analyze, Fit Y By X. Select "Drinking" and click "Y, Response", select "Smoking" and click "X, Factor", and select "Count" and click "Freq". Click "OK". Click the red triangle next to "Contingency Analysis of Drinking By Smoking" at the top and deselect "Mosaic Plot" to remove it from the output. You now see a Contingency Table (or two-way table) and the "Tests" below it. (If your two-way table has the rows and columns the wrong way round compared to the question, that doesn't really matter, but you can fix it by switching which column you assign to X and which to Y.)
 
Click the red triangle next to Contingency Table and make sure that the only options selected are "Count", "Expected" and "Cell Chi Square", so that those values are displayed in each cell of the table. Note that the Pearson ChiSquare is the test statistic for the problem (in the last row of the "Tests" output) and the Prob>ChiSq is the P-value for that test.  After you print the output, highlight these values by hand to show what the question asks for.
 
When they ask in part (c) which two cells contribute most to the test statistic, they are asking which two cells have the largest Cell Chi Square values.
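Here is a Python sketch of what the Cell Chi Square values are and how to pick the two largest. Each cell's contribution is (O - E)^2/E, and the Pearson statistic is the sum of these contributions. The function names and the 2 by 3 table are hypothetical, not the table from the question; cells are labelled 1,1; 1,2; and so on, as above.

```python
def cell_contributions(table):
    """List ((row, col), (O - E)^2 / E) for every cell, with 1-based row and column labels."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            result.append(((i + 1, j + 1), (observed - expected) ** 2 / expected))
    return result

def top_contributors(table, k=2):
    """The k cells with the largest chi-square contributions, biggest first."""
    return sorted(cell_contributions(table), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical 2 by 3 table (not the table from the assignment).
observed = [[10, 20, 30],
            [30, 20, 20]]
top = top_contributors(observed)
print(top)   # cells (1, 1) and (2, 1) come out on top
```

Because the contributions add up to the Pearson statistic, the two largest are the cells that "contribute most".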
 
 
Tips for Assignment 4 (Distance/Online Sections D01, D02, etc.)

Study Lesson 4 in my study book to prepare for this assignment.
 
Be sure to use your Rule of Thumb (Lesson 4) for all of the questions in this assignment to determine whether you should use the pooled method or the generalized method.  Note, if you are using an older edition of my study book, you must use that insanely complicated degrees of freedom formula for any question that requires the generalized method.  Refer to #1 on the formula sheet included in your course outline to see that formula if you can't find it in my book (it is in most of the recent editions of my book, but it depends how old your book is).  Also, be sure to skim through the entire question to see whether they ever specify the order in which they want you to subtract your means, and, if so, be sure to do as they say right from the start.
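The "insanely complicated" degrees of freedom formula for the generalized (unequal variances) method is presumably the Welch-Satterthwaite approximation, the standard one for unpooled two-sample t procedures; check it against #1 on your formula sheet. Here is a Python sketch of it alongside the pooled df, n1 + n2 - 2. The function names and example numbers are hypothetical.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite degrees of freedom for the unpooled (generalized) method."""
    v1 = s1 ** 2 / n1          # squared standard error of x-bar 1
    v2 = s2 ** 2 / n2          # squared standard error of x-bar 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def pooled_df(n1: int, n2: int) -> int:
    """Degrees of freedom for the pooled method."""
    return n1 + n2 - 2

print(welch_df(2.0, 10, 1.0, 20))   # about 11.31, not a whole number
print(pooled_df(10, 20))            # 28
```

A t table only lists whole-number df, so the usual conservative practice is to round the Welch df down before looking up t*.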
 
For Question 1, note that you have been given the Standard Errors of x-bar (the SE values), so you will have to do some algebra to determine the standard deviations.  I give you the formula for SE of x-bar back in Lesson 1 of my book and also again in Lesson 4 when I first start talking about standard errors.
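The algebra here is just the SE formula rearranged: since SE = s / sqrt(n), you get s = SE x sqrt(n). A Python sketch with made-up numbers (the function name is hypothetical):

```python
import math

def sd_from_se(se: float, n: int) -> float:
    """Recover the sample standard deviation s from SE = s / sqrt(n)."""
    return se * math.sqrt(n)

# Hypothetical values: SE of x-bar = 2.5 from a sample of n = 16.
print(sd_from_se(2.5, 16))   # 10.0
```

If a later formula needs the variance s^2, square the result.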
 
There has clearly been a mistake in question 2.  Web Assign appears to want an exact P-value when the best you can do is put bounds on it using Table E.  You should notify your prof about this.
 
Here is a link to a neat little calculator that will compute the exact P-value for you.  Just scroll down the page to the Student t distribution and all you have to do is type in your t statistic and your df value, and click "Calc p" to get the exact two-tailed P-value.
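If you would rather compute the exact two-tailed P-value yourself, here is a standard-library-only Python sketch. This is not necessarily how that calculator works; it simply integrates the Student t density numerically with Simpson's rule to get the area between 0 and |t|, then takes P = 1 - 2 x (that area). The function names are hypothetical, and df may be a decimal, which is handy for the generalized method.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom (df may be a decimal)."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: float, steps: int = 10_000) -> float:
    """Two-tailed P-value = 1 - 2 * (area under the t curve from 0 to |t|).

    The area is approximated with Simpson's rule, which needs an even number of steps.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

# Hypothetical example: t = 2.1 with df = 15.
print(two_tailed_p(2.1, 15))
```

For very large |t| the subtraction from 1 loses precision, so treat extremely small results as essentially zero.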
 
Here is how to do the JMP part of Question 3:
Open a New Data Table and type the data in manually in this manner:  Name your first column "Price" or something like that, and type all the prices down that column.  That is, type the four-bedroom selling prices down the column and then continue by typing all the three-bedroom selling prices below them.  Double-click at the top, to the right of the "Price" column heading, to create a new column and name it something like "Type of Home".  Down that column, type something like "four-bedroom" in all the rows that have the prices for four-bedroom homes.  Then type something like "three-bedroom" down the column in the rows that have three-bedroom prices.  You may want to type the phrase once and then copy and paste it down the rest of the relevant rows to ensure there are no typos.  Once you have done that, double-click the "Type of Home" column heading, confirm that the Data Type is Character and the Modeling Type is Nominal, and click OK.
 
Select Analyze, then Fit Y By X.  Highlight the numeric column "Price" and click the Y, Response button.  Highlight the character column "Type of Home" and click the X, Factor button.  Click OK.
 
You should now see a graph with two vertical arrays of dots showing the prices of three- and four-bedroom homes separately.  (If you don't see that graph, for example, if you see a Mosaic Plot instead, that means you do not have the Data Type and Modeling Type correct for your columns.  Go back to your data table and make sure you have the correct Data Type and Modeling Type as I outline above.)  Click the red triangle above the graph, select "Display Options" and select Box Plots to see side-by-side boxplots.  That will give you a feel for the symmetry or skewness of the distributions to help you decide whether use of t is acceptable.  Even if use of t is not acceptable, you are going to use it anyway.  Click the red triangle again and select "Means and Std Dev" to get a summary of the means and standard deviations of the two samples.  Click the red triangle again and select "t-Test" to get the output for a hypothesis test and confidence interval assuming unequal variances.  Click the red triangle again and select "Means/Anova/Pooled t" to get the output that includes a hypothesis test and confidence interval assuming equal variances.  Click the red triangle again and select "Set α level" to change the confidence intervals in the output to your desired level of confidence.  For example, if you want 98% confidence intervals you would set alpha to .02, or, if you want 90% confidence intervals, you would set alpha to .10.  In other words, α = 1 - C.
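The two JMP t options differ only in the standard error they use. Here is a Python sketch of both t statistics computed from summary statistics, assuming the standard formulas: unpooled SE = sqrt(s1^2/n1 + s2^2/n2), and pooled SE = sqrt(sp^2 (1/n1 + 1/n2)) with sp^2 = ((n1 - 1)s1^2 + (n2 - 1)s2^2)/(n1 + n2 - 2). The function name and the numbers are hypothetical.

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Return (unpooled_t, pooled_t) for the difference x-bar1 - x-bar2."""
    se_unpooled = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)           # "t-Test" (unequal variances)
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))                  # "Pooled t" (equal variances)
    diff = x1 - x2
    return diff / se_unpooled, diff / se_pooled

# Hypothetical summary statistics for two samples of 10.
t_unpooled, t_pooled = two_sample_t(10.0, 2.0, 10, 8.0, 2.0, 10)
print(t_unpooled, t_pooled)

confidence = 0.98
alpha = 1 - confidence   # about 0.02, the value you would enter for the alpha level
```

With equal sample sizes the two t statistics always agree; only their degrees of freedom differ.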
 
By the way, I have no idea what they are getting at in part (e), so your guess is as good as mine.  I think they mean the samples are not simple random samples (SRS), since the data come strictly from one place instead of from all over the country.  There are all sorts of reasons you could give as to why you should not use t in this case; I have no idea how you could justify using t if the samples are not random.  You can take it from there.
 
When you are using JMP to do a two-sample hypothesis test or confidence interval, watch which way it is subtracting.  It may not do it the way you expected.  For example, you may have called "A" sample 1 and "B" sample 2, so you would expect to do A - B, but JMP may do B - A.  Look at the two sample means JMP computes for A and B, then check whether the "difference" in its t test is A - B or B - A.
 
If JMP is not subtracting the means in the order you wish it to, do this:
Once you have decided which group you are naming Sample 1 and which Sample 2 (which means you will be subtracting in that order, "Sample 1" - "Sample 2"), or once they have told you which order they want, make sure JMP does it the way you expect.  Click Column 2 to highlight the entire column.  Select "Cols, Validation, List Check..." (in the Columns toolbar at the top).  It will show the labels you have written for Column 2.  The label listed first is what JMP will consider Sample 1.  If you don't like the order it shows, highlight a label and click "Move Up" or "Move Down" to change the order of the labels.
 
If you do not do this, your signs will be all wrong.  For example, the signs in your lower and upper limits for your confidence interval for the difference between the means would be the opposite of what they should be.
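To see exactly what goes wrong when the subtraction order is reversed, here is a tiny Python sketch of a confidence interval and t statistic for a difference in means, computed both ways. The function name, means, standard error and t* value are all hypothetical.

```python
def ci_for_difference(mean_first, mean_second, se, t_star):
    """Confidence interval for (mean_first - mean_second): difference +/- t* x SE."""
    diff = mean_first - mean_second
    return (diff - t_star * se, diff + t_star * se)

# Hypothetical summary numbers for groups A and B.
a_minus_b = ci_for_difference(52.0, 47.0, 2.0, 2.1)   # about (0.8, 9.2)
b_minus_a = ci_for_difference(47.0, 52.0, 2.0, 2.1)   # about (-9.2, -0.8)
t_a_minus_b = (52.0 - 47.0) / 2.0                     # 2.5
t_b_minus_a = (47.0 - 52.0) / 2.0                     # -2.5
print(a_minus_b, b_minus_a, t_a_minus_b, t_b_minus_a)
```

The limits swap places and change sign, and so does the test statistic, which is exactly what happens if JMP subtracts in the opposite order from the one you intended.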
 
Tip:  When you want to do a one-sided test, if JMP has a positive test statistic, you must be doing an upper-tailed test; if JMP has a negative test statistic, you must be doing a lower-tailed test.  But, again, watch the way JMP has subtracted the two means to identify who is who.
 
Do the JMP part of Question 4 just as I showed you for Question 3 above.  Your first column should have all the SSHA scores, and your second column will be a character column where you type in women and men in the appropriate rows.  Always make the numeric column Y and the character column X when you select Fit Y By X.