Stat 2000: Tips for Assignment 2

Published: Sun, 10/04/15

Midterm Exam Prep Seminar Oct. 18
Try a Free Sample of Grant's Audio Lectures
Don't have my book or audio lectures? You can download a sample containing some of these lessons here:
Did you read my tips on how to study and learn Stat 2000?  If not, here is a link to those important suggestions:
Did you read my Calculator Tips?  If not, here is a link to those important suggestions:
Did you see my tips for Assignment 1? Click here.
Tips for Assignment 2
Please note that I made major changes to my book in September 2014.  If you are using a book older than September 2014, you are missing about 100 pages of new material and an entirely new lesson on Probability.

Study Lessons 4 and 5 in my study book (if you have it) to learn the concepts involved in Assignment 2.  Remember my advice in the tips above.  Don't start working on the assignment too soon.  Study and learn the lesson first, and use the assignment to test your knowledge.  Of course, always seek out assistance from my book, your course notes, etc. if you ever hit a question you don't understand, but try not to be learning things as you do an assignment.  Learn first, then put your learning to the test.

To type in the formulas you are using and to show your numbers substituted into them, click the button in the toolbar that looks like the Sigma summation symbol (you have to click the "..." other options button to see the sigma formula input button).  Then click the various buttons to make your fractions and enter the symbols.

Exception: Always do any JMP stuff open-book.  Have my tips in front of you, and let me guide you step-by-step through any JMP stuff.  JMP is just "busy" work.  The sooner you get it done and can move on to productive things like understanding the concepts and interpreting the JMP outputs, the better off you will be.
Question 1
Note that you will be using Table D for this and other questions on this assignment.  Be sure to download it from the Resources section in Content, or here is a link:

You should recognize that this is a matched pairs problem, very similar to my questions 1 and 2 in Lesson 4.

Part (a)
Note that the A and B scores in a matched pair are dependent, not independent.  However, each pair is independent of other pairs.


Part (b)
Standard confidence interval problem.

Parts (c)-(f)
Taking you through a matched pairs hypothesis test using the P-value method.
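
If you want to double-check your hand calculations, here is a minimal Python sketch of the matched pairs arithmetic.  The scores are made up purely for illustration (they are not the assignment data), and the critical value 2.776 is the usual t table value for 95% confidence with 4 degrees of freedom.  The P-value itself still comes from Table D or JMP.

```python
import math
import statistics

# Hypothetical A and B scores for 5 matched pairs (made up for illustration).
a = [72, 65, 80, 58, 90]
b = [68, 66, 74, 55, 85]

# Matched pairs: reduce each pair to one difference, d = A - B.
d = [x - y for x, y in zip(a, b)]
n = len(d)
df = n - 1                          # degrees of freedom for matched pairs

d_bar = statistics.mean(d)          # mean of the differences
s_d = statistics.stdev(d)           # sample standard deviation of the differences
se = s_d / math.sqrt(n)             # standard error of the mean difference

# Test statistic for H0: mean difference = 0.
t_stat = d_bar / se

# 95% confidence interval; t_star is read from a t table for df = 4.
t_star = 2.776
ci = (d_bar - t_star * se, d_bar + t_star * se)

print("differences:", d)
print("d_bar =", round(d_bar, 4), " s_d =", round(s_d, 4), " df =", df)
print("t =", round(t_stat, 4))
print("95% CI:", tuple(round(v, 4) for v in ci))
```

Note that the whole analysis runs on the single column of differences, which is why the dependence within each pair is not a problem.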
Question 2
This is obviously a two-sample problem like my questions 4 and 5 in Lesson 4.  Be sure you use the Rule of Thumb to decide whether you will use the pooled method or the unpooled method.

Part (a)
Use the appropriate method to determine the df based on your Rule of Thumb.
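
The decision can be sketched in Python as below.  This is only a sketch under two assumptions that are not spelled out in these tips: that the Rule of Thumb is the common one (pool when the larger standard deviation is less than twice the smaller one), and that the unpooled method uses the conservative df, the smaller of n1 - 1 and n2 - 1.  Check both against Lesson 4 and your course notes.  The function name choose_method is hypothetical.

```python
def choose_method(s1, n1, s2, n2):
    """Return (method, df) for a two-sample t procedure.

    ASSUMPTION: Rule of Thumb = pool if (larger SD / smaller SD) < 2.
    ASSUMPTION: unpooled df is the conservative min(n1 - 1, n2 - 1).
    """
    ratio = max(s1, s2) / min(s1, s2)
    if ratio < 2:
        # Pooled two-sample t: df = n1 + n2 - 2
        return "pooled", n1 + n2 - 2
    # Unpooled (conservative) two-sample t
    return "unpooled", min(n1, n2) - 1


# Made-up standard deviations with sample sizes 7 and 8.
print(choose_method(0.4, 7, 0.5, 8))   # ratio 1.25
print(choose_method(0.2, 7, 0.5, 8))   # ratio 2.5
```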

Part (b)
Standard confidence interval problem.

Parts (c)-(e)
Taking you through a two-sample hypothesis test using the P-value method.  Be sure to tell them the hypotheses even though they forgot to ask for them.

Part (f)
I discuss this concept in my Lesson 4, questions 4(c) and 5(c).

Part (g)
To do the JMP:

The key thing to understand is that you will type all the scores down the first column.  Double-click Column 1 and give it a name that describes the variable both sets of scores are measuring.  Here, that is GPA.  Type all the GPA scores for High School A down Column 1, and then continue down Column 1 with all the GPA scores for High School B.  That means you should have a total of 15 rows when you are done: 7 scores from A and 8 scores from B.  Be careful not to accidentally enter the means or standard deviations as data.

Now double-click the region at the top to the right of Column 1 to create a new column.  Call that column High School and type A repeatedly down column 2 in all 7 rows that have scores from High School A in Column 1.  Then type B in the rest of the rows. I suggest you type your first word, then copy and paste it in all the other relevant cells in Column 2, then type your second word and copy and paste it to ensure there are no typos.

Thus, I would have two columns of data.  The first column shows all the numerical data scores (all the GPA scores) and the second column labels the data in the first column telling me which group the scores belong to (A or B).

Right-click Column 1 and select "Column Info" and confirm that its Data Type is Numeric and its Modeling Type is Continuous, changing the settings if necessary.  Right-click Column 2 and select "Column Info" and confirm that its Data Type is Character and its Modeling Type is Nominal, changing the settings if necessary.

Now select "Analyze, Fit Y By X".  Select GPA and click Y, Response and select High School and click X, Factor.  Click OK. 

You will see a graph with dots plotted representing all the scores in two columns.  If you don't see two columns of dots for the two samples, you have not labelled your data correctly!   Close the screen and go back to your data table.  Follow my steps above to right-click each column and select Column Info, and make sure that the Data Type for GPA is Numeric and its Modeling Type is Continuous, changing the settings if necessary.  Repeat for High School, and confirm that its Data Type is Character and its Modeling Type is Nominal, changing the settings if necessary.

Once you have the graph showing the vertical array of dots for your two samples, you are ready to analyze the data.

Click the red triangle and select Means and Std Dev to get a summary of the means and standard deviations.   Look at the standard deviations in this output and use your Rule of Thumb to decide if you wish to use the pooled test or conservative (unpooled) test.  Of course, these values for the mean and standard deviations should match what you were given at the start (if rounded accordingly).

Click the red triangle and select Display Options and select Box Plots if they have requested side-by-side boxplots (not requested for this question). 

Click the red triangle and select Means/Anova/Pooled t to get JMP to do the pooled two-sample t test.  Click the red triangle again and select t-Test to get JMP to do the generalized two-sample t test (not pooling).  You will note that, when JMP does the pooled t-test, it says, under the "t Test" title bar, "Assuming equal variances."  When JMP does the generalized t-test, it says, under the "t Test" title bar, "Assuming unequal variances."


Look carefully at the JMP printouts to confirm whether it is computing "HA - HB" like you did earlier in the question.  I believe it will do so.  If, however, it is computing "HB - HA", that is fine, too, but it means everything is backwards.  The confidence interval limits will have the opposite signs and be in the opposite order.  The test statistic will have the opposite sign.  Finally, the P-value JMP reports for an upper-tailed test would be the P-value for your lower-tailed test, and vice versa.

JMP does not know whether you are doing an upper-tailed, lower-tailed or two-tailed test, so it gives you the P-value for all three.  You have to be clear yourself what the alternative hypothesis is, and therefore which of the three P-values is correct.
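
To see why the order of subtraction only flips signs, here is a small Python sketch that mirrors the stacked two-column layout and computes the two-sample t statistic both ways.  It is not JMP output: the GPA values are made up, and the helper two_sample_t is a hypothetical name.  It uses the standard pooled and unpooled standard error formulas.

```python
import math
import statistics

# Stacked layout like the JMP table: one (GPA, High School) row per student.
# Values are made up for illustration.
rows = [(3.1, "A"), (2.8, "A"), (3.5, "A"),
        (2.6, "B"), (3.0, "B"), (2.4, "B"), (2.9, "B")]


def two_sample_t(rows, first, second, pooled):
    """t statistic for (mean of first group) - (mean of second group)."""
    x = [v for v, g in rows if g == first]
    y = [v for v, g in rows if g == second]
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    if pooled:
        # Pooled variance, assuming equal population variances.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        # Unpooled standard error, assuming unequal variances.
        se = math.sqrt(v1 / n1 + v2 / n2)
    return (statistics.mean(x) - statistics.mean(y)) / se


t_ab = two_sample_t(rows, "A", "B", pooled=True)   # A minus B
t_ba = two_sample_t(rows, "B", "A", pooled=True)   # B minus A
print("t for A - B:", round(t_ab, 4))
print("t for B - A:", round(t_ba, 4))
```

The two statistics have the same size and opposite signs, which is exactly why an upper-tailed P-value for one order is the lower-tailed P-value for the other.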

Part (h)
They want you to get the critical value and state the decision rule here.
Question 3
Just like question 2 above.  Now, they are using the University GPAs instead of the High School GPAs.
Question 4
I teach you how to interpret confidence intervals in Lesson 1 and how to interpret P-values in Lesson 2.  Be careful though, those were interpretations for confidence intervals or P-values for the mean.  Now you are interpreting for the difference between two means or the mean difference in matched pairs, so be careful in your wording.
Question 5
This is an Anova question.  It is very similar to my questions 1 and 2 in Lesson 5.

Note that you will be using Table E for this and other questions on this assignment.  Be sure to download it from the Resources section in Content, or here is a link:

Part (a)
I tell you the assumptions we make in Anova in Lesson 5.

Part (b)
I tell you the hypotheses we test in Anova in Lesson 5.

To do ANOVA with JMP:
The key thing to understand is that you will type all the scores down the first column.  Double-click Column 1 and name it Price.  Then type all the Region 1 scores down the column, then all the Region 2 scores, then all the Region 3 scores.  You should have 19 rows when all the scores have been entered in.

Then create Column 2 and name the column Region.  Type Region 1 repeatedly down column 2 in all 5 rows that have Region 1 scores in Column 1.  Then type Region 2 for the appropriate cells in column 2 and type Region 3 for the rest of the rows.  Don't type just 1, 2, and 3!  That will mess up the output.  Better to type Region 1, Region 2, and Region 3.  I suggest you use Copy and Paste to ensure there are no typos here.

Thus, I would have two columns of data.  The first column shows all the numerical data scores (all the prices) and the second column labels the data in the first column, telling me which region the cases belong to.

Right-click Column 1 and select "Column Info" to confirm that its Data Type is Numeric and its Modeling Type is Continuous, changing the settings if necessary.  Right-click Column 2 and select "Column Info" to confirm that its Data Type is Character and its Modeling Type is Nominal, changing the settings if necessary.

Now select Analyze, Fit Y By X.  Select Price and click Y, Response and select Region and click X, Factor.  Click OK.  You will see a graph with dots plotted representing all the prices in three columns, one column of dots for each region.  If you don't see this graph at all, you did not label your columns properly.  Go back and make sure Column 1 is Numeric and Continuous and Column 2 is Character and Nominal.

Click the red triangle and select Display Options and select the Box Plot to get the side-by-side boxplots they request. Click Display Options again and deselect Show Points and Grand Mean to remove those things from the graph if you wish.

Click the red triangle and select Means and Std Dev to get a summary of the means and standard deviations.  Confirm that JMP computed the same means and standard deviations for each region that you were given at the start of the question. 


Click the red triangle and select Means/Anova/Pooled t to get JMP to do the Anova.  Confirm that JMP's values match your computations (allowing for minor differences due to rounding issues). 

Part (c)
Compare the medians (the lines shown inside the boxes of the boxplots).  Are they all pretty similar? Or does at least one appear to be considerably different from the others?  This is an estimate of what may be the case for the population means, too.

Part (d)
When they ask for the values of ni, all they mean is tell them what n1 equals, n2 equals, and n3 equals.  For example, n1=5, n2=8, and n3=6.  (I think, but everyone may have different data sets.)  Of course, tell them I and N, too.

Part (e)
This question should be done by hand (i.e. with your calculator, not with JMP).  You are given the three means and standard deviations, so do as I do in my Lesson 5, questions 1 and 2.  Follow my example to compute the overall mean, SSG, MSG, SSE, MSE, and the F statistic.

You will then be able to check your answers when you use JMP.  Note that your answers may be slightly different than JMP's since you have used rounded off values for the means and standard deviations.

Personally, I would just write it in text like this, "SSG = 5(16-30)^2 + 4(12-30)^2 + ..." (I am making up those numbers.)  Then MSG = SSG/DFG = 51/2 = etc.  You can also use the Math Input button in the toolbar if you want to get fancy (but it is so slow.....).
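
If you want to check your calculator work before you get to JMP, the same computation from summary statistics can be sketched in Python.  The means and standard deviations below are made up (only the sample sizes echo the example above), and the formulas are the standard one-way Anova ones, so treat this as a sketch rather than the official solution.

```python
# Hypothetical summary statistics for I = 3 regions (made-up values).
n = [5, 8, 6]                 # sample sizes n1, n2, n3
means = [16.0, 12.0, 20.0]    # sample means
sds = [3.0, 4.0, 2.5]         # sample standard deviations

N = sum(n)                    # total number of observations
I = len(n)                    # number of groups

# Overall mean is the weighted average of the group means.
grand_mean = sum(ni * m for ni, m in zip(n, means)) / N

# SSG: each squared deviation of a group mean from the overall mean,
# weighted by that group's sample size.
ssg = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, means))

# SSE: each sample variance weighted by its degrees of freedom.
sse = sum((ni - 1) * s ** 2 for ni, s in zip(n, sds))

dfg, dfe = I - 1, N - I       # degrees of freedom for groups and error
msg, mse = ssg / dfg, sse / dfe
f_stat = msg / mse

print("overall mean =", round(grand_mean, 4))
print("SSG =", round(ssg, 4), " DFG =", dfg, " MSG =", round(msg, 4))
print("SSE =", round(sse, 4), " DFE =", dfe, " MSE =", round(mse, 4))
print("F =", round(f_stat, 4))
```

Notice that the deviations in SSG are squared.  That is the step most often dropped when typing the formula out as text.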

Part (f)
Put bounds on the P-value using Table E, like I do in my questions 1 and 2.

Part (g)
Again, interpret the P-value, keeping in mind what the null hypothesis is here. 

Part (h)
Read the test statistic and P-value off your JMP output above, like I show you to do in Lesson 5, questions 5 to 7.  Your answers should agree with what you computed by hand, allowing for rounding issues (I would expect no more than one-decimal place accuracy in the values).

Part (i)
Get the critical value using Table E, like I do in my questions 1 and 2, and state the F decision rule.
Question 6
This is very similar to my questions 3 and 4 in Lesson 5.

Part (f)
They want the pooled sample standard deviation here, which is the square root of MSE.
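
As a quick check, the pooled standard deviation can be computed straight from the group sample sizes and standard deviations.  This is a minimal Python sketch using the standard formula; the function name pooled_sd and the numbers are hypothetical.

```python
import math


def pooled_sd(n, sds):
    """Pooled sample standard deviation = sqrt(MSE) = sqrt(SSE / (N - I))."""
    sse = sum((ni - 1) * s ** 2 for ni, s in zip(n, sds))
    dfe = sum(n) - len(n)     # error degrees of freedom, N - I
    return math.sqrt(sse / dfe)


# Made-up sample sizes and standard deviations for three groups.
print(round(pooled_sd([5, 8, 6], [3.0, 4.0, 2.5]), 4))
```

The result always lands between the smallest and largest group standard deviations, which is an easy sanity check on your answer.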

Part (g)
Make sure you read my section about Confidence Intervals in Anova in Lesson 5, question 5 before you answer this question.