Stat 2000: ICYMI Tips for Assignment 2

Published: Tue, 02/16/16

Midterm Exam Prep Seminar Sat. Feb 27
Don't have my book or audio lectures? You can download a free sample here:
Did you read my tips on how to study and learn Stat 2000?  If not, here is a link to those important suggestions:
Did you read my Calculator Tips?  If not, here is a link to those important suggestions:
Did you see my tips for Assignment 1? Click here.
Tips for Assignment 2
Please note that I made major changes to my book in September 2014.  If you are using a book older than September 2014, you are missing about 100 pages of new material and an entirely new lesson on Probability.

Study Lessons 4 and 5 in my study book (if you have it) to learn the concepts involved in Assignment 2.  Remember my advice in the tips above.  Don't start working on the assignment too soon.  Study and learn the lesson first, and use the assignment to test your knowledge.  Of course, always seek out assistance from my book, your course notes, etc. if you ever hit a question you don't understand, but try not to be learning things as you do an assignment.  Learn first, then put your learning to the test.

To type in the formulas you are using and show your numbers subbed into them, you can click the Equation Editor button in the toolbar that looks like the Sigma summation symbol (you have to click the "..." other options button to see the sigma formula input button).  Then click the various buttons to make your fractions and enter the symbols.  However, the Equation Editor is extremely slow and clunky.  Personally, I would never use it.  Just type ordinary text explaining what you are doing if you think you should show some work.

Exception: Always do any JMP stuff open-book.  Have my tips in front of you, and let me guide you step-by-step through any JMP stuff.  JMP is just "busy" work.  The sooner you get it done and can move on to productive things like understanding the concepts and interpreting the JMP outputs, the better off you will be.  Then again, since you never have to upload the JMP printouts, perhaps you might not even bother to do the JMP at all.  Most questions can be answered by hand even when they told you to use JMP.
Question 1
Note that you will be using Table D for this and other questions on this assignment.  Be sure to download it from the Resources section in Content, or here is a link:

You should recognize that this is a matched pairs problem.  It is very similar to my questions 3 and 4 in Lesson 4.

Part (a)
Note that the A and B scores in a matched pair are dependent, not independent.  However, each pair is independent of other pairs.  For a small sample, the differences must be normally distributed for us to reliably use t in a matched pairs test.

Part (b)
Standard confidence interval problem.

Parts (c)-(g)
Taking you through a matched pairs hypothesis test using all 5 steps.
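
If it helps to see the matched pairs arithmetic laid out, here is a minimal sketch in Python.  The scores are made up (they are NOT the assignment data), and the variable names are my own.  The point is that you work with the differences, pair by pair, and then treat those differences as a single sample.  The t* value is the Table D entry for df = 4 at 95% confidence.

```python
import math
import statistics

# Hypothetical paired scores (NOT the assignment data).
a_scores = [12, 15, 11, 14, 13]
b_scores = [10, 14, 12, 11, 10]

# Work with the differences, pair by pair (here A - B).
diffs = [a - b for a, b in zip(a_scores, b_scores)]

n = len(diffs)
d_bar = statistics.mean(diffs)   # mean of the differences
s_d = statistics.stdev(diffs)    # sample standard deviation of the differences
se = s_d / math.sqrt(n)          # standard error of the mean difference
df = n - 1

# Test statistic for H0: the mean difference is 0.
t_stat = d_bar / se

# 95% confidence interval; t* = 2.776 is the Table D value for df = 4.
t_star = 2.776
ci = (d_bar - t_star * se, d_bar + t_star * se)

print(f"d-bar = {d_bar:.3f}, s_d = {s_d:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.3f} with df = {df}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With your own data you would look up t* for your own df and confidence level.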
Question 2
This is obviously a two-sample problem like my questions 6 and 7 in Lesson 4.  Be sure you use the Rule of Thumb to decide whether you will use the pooled method or the unpooled method.  If the ratio of the larger standard deviation to the smaller one is less than 2, you pool; otherwise, you use the unpooled method.

Note that they told you to do M-W in part (b), so be sure to consistently do that throughout the whole problem.  In other words, 1 is men and 2 is women throughout the problem.

Part (a)
Use the appropriate method to determine the df based on your Rule of Thumb.  Note that you are given the Standard Error and degrees of freedom formulas for both two-sample methods on your formula sheet.
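
Here is a rough sketch of the Rule of Thumb decision and the resulting standard error and df.  The function name and the standard deviations in the example are hypothetical.  For the unpooled case I am assuming the Welch-Satterthwaite df formula; check your formula sheet and use whatever df formula it actually gives you.

```python
import math


def two_sample_setup(n1, s1, n2, s2):
    """Return (method, standard error, df) for a two-sample t procedure."""
    # Rule of Thumb: larger standard deviation divided by the smaller one.
    ratio = max(s1, s2) / min(s1, s2)
    if ratio < 2:
        # Pooled: combine the two sample variances, weighted by their df.
        df = n1 + n2 - 2
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        return "pooled", se, df
    # Unpooled: each sample keeps its own variance.
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    # ASSUMPTION: Welch-Satterthwaite df; your formula sheet may differ.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return "unpooled", se, df


# Hypothetical standard deviations for samples of 10 and 8.
method, se, df = two_sample_setup(10, 300, 8, 250)
print(method, round(se, 2), df)
```

Try it with one standard deviation more than double the other to see it switch to the unpooled method.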

Part (b)
Standard confidence interval problem.

Parts (c), (d), (f), and (g)
Taking you through a 2-sample hypothesis test using all 5 steps.  Be sure to tell them the hypotheses even though they forgot to ask for them.

Part (h)
I discuss this concept in my Lesson 4, questions 6(c) and 7(c).

Part (e)
Steps to do the JMP (But, really, why bother?).

If you want to avoid using JMP, but need to know the exact P-value to answer part (e), there is a nice, easy-to-use P-value calculator on the web.  Note that it gives you a two-tailed P-value, but it should be pretty obvious what the P-value would be if the test is only one-tailed.
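
In case the one-tailed conversion is not obvious, here is a small sketch of the logic.  The function name and the example numbers are made up.  If your test statistic points in the same direction as your alternative hypothesis, the one-tailed P-value is half the two-tailed one; if it points the other way, it is 1 minus that half.

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert a two-tailed P-value into a one-tailed P-value.

    alternative is "greater" (upper-tailed) or "less" (lower-tailed).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_tailed_p / 2
    # Does the sign of the test statistic agree with the alternative?
    agrees = (t_stat > 0) if alternative == "greater" else (t_stat < 0)
    return half if agrees else 1 - half


# Hypothetical example: two-tailed P-value of 0.08 with t = 2.1.
print(one_tailed_p(0.08, 2.1, "greater"))
print(one_tailed_p(0.08, 2.1, "less"))
```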

Steps for JMP:
The key thing to understand is that you will type all the scores down the first column.  Double-click Column 1 and give it a name that describes the variable both scores are measuring.  Here, that is Calories.  Type all the Calories scores for M down Column 1, and then continue to type all the Calories scores down Column 1 for W.  That means you should have a total of 18 rows when you are done, 10 scores from M and 8 scores from W. 

Now double-click the region at the top to the right of Column 1 to create a new column.  Call that column Gender and type M repeatedly down column 2 in all 10 rows that have scores from M in Column 1.  Then type W in the rest of the rows. I suggest you type your first word, then copy and paste it in all the other relevant cells in Column 2, then type your second word and copy and paste it to ensure there are no typos.

Thus, I would have two columns of data.  The first column shows all the numerical data scores (all the Calories scores) and the second column labels the data in the first column telling me which group the scores belong to (M or W).
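
The same stacked layout can be sketched in Python, if that helps you picture it.  The Calories scores below are invented (NOT the assignment data); only the shape matters: one column of scores, one column of labels, 18 rows in total.

```python
import statistics

# Hypothetical Calories scores (NOT the assignment data): 10 for M, 8 for W.
men = [2500, 2700, 2300, 2900, 2600, 2400, 2800, 2550, 2650, 2450]
women = [2000, 2200, 1900, 2100, 2300, 1800, 2050, 2150]

# Stack everything into (Calories, Gender) rows, all the M rows first.
rows = [(score, "M") for score in men] + [(score, "W") for score in women]
print(len(rows))  # 18 rows

# Summarize each group by filtering on the label column.
for label in ("M", "W"):
    scores = [score for score, gender in rows if gender == label]
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    print(label, len(scores), round(mean, 1), round(sd, 1))
```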

Right-click Column 1 and select "Column Info" and confirm that its Data Type is Numeric and its Modeling Type is Continuous, changing the settings if necessary.  Right-click Column 2 and select "Column Info" and confirm that its Data Type is Character and its Modeling Type is Nominal, changing the settings if necessary.

Now select "Analyze, Fit Y By X".  Select Calories and click Y, Response and select Gender and click X, Factor.  Click OK. 

You will see a graph with dots plotted representing all the scores in two columns.  If you don't see two columns of dots for the two samples, you have not labelled your data correctly!   Close the screen and go back to your data table.  Follow my steps above to right-click each column and select Column Info, and make sure that the Data Type for Calories is Numeric and its Modeling Type is Continuous, changing the settings if necessary.  Repeat for Gender and confirm that its Data Type is Character and its Modeling Type is Nominal, changing the settings if necessary.

Once you have the graph showing the vertical array of dots for your two samples, you are ready to analyze the data.

Click the red triangle and select Means and Std Dev to get a summary of the means and standard deviations.   Look at the standard deviations in this output and use your Rule of Thumb to decide if you wish to use the pooled test or conservative (unpooled) test.  Of course, these values for the mean and standard deviations should match what you were given at the start (if rounded accordingly).

Click the red triangle and select Display Options and select Box Plots if they have requested side-by-side boxplots (not requested for this question). 

Click the red triangle and select Means/Anova/Pooled t to get JMP to do the pooled two-sample t test.  Click the red triangle again and select t-Test to get JMP to do the generalized two-sample t test (not pooling).  You will note that, when JMP does the pooled t-test, it says, under the "t Test" title bar, "Assuming equal variances."  When JMP does the generalized t-test, it says, under the "t Test" title bar, "Assuming unequal variances."

Look carefully at the JMP printouts to confirm that it is doing "M - W" like you did earlier in the question.  I believe it will do so.  If, however, it is doing "W - M", note that that is fine, too.  But that would mean everything is backwards.  The confidence interval limits will have the opposite signs and be the other way around.  The test statistic will have the opposite sign.  Finally, the P-value for an upper-tailed test would now be the P-value for a lower-tailed test.

JMP does not know whether you are doing an upper-tailed, lower-tailed or two-tailed test, so it gives you the P-value for all three.  You have to be clear yourself what the alternative hypothesis is, and therefore which of the three P-values is correct.
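
To make the "everything is backwards" idea concrete, here is a sketch that converts results computed as M - W into the equivalent W - M results (or the other way around).  The function name and all the numbers are hypothetical, not JMP output.

```python
def reverse_order(ci, t_stat, p_upper, p_lower):
    """Convert results for 'M - W' into the equivalent results for 'W - M'."""
    low, high = ci
    return {
        "ci": (-high, -low),   # limits change sign and swap places
        "t": -t_stat,          # test statistic changes sign
        "p_upper": p_lower,    # the upper- and lower-tailed P-values trade places
        "p_lower": p_upper,
    }


# Hypothetical M - W results.
flipped = reverse_order(ci=(50.0, 450.0), t_stat=2.3, p_upper=0.02, p_lower=0.98)
print(flipped)
```

The two-tailed P-value is not in the sketch because it is the same either way.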
Question 3
Just like question 2 above.

Part (a)
The estimate for the common variance is the pooled sample variance.  Obviously, this is a pooled 2-sample problem.

Part (f)
This is a concept I don't discuss until Lesson 5 of my book.  See The Connection between t and F for Two-Sample Analysis towards the end of Lesson 5.  Quite simply, when appropriate (a pooled two-sample t test with a two-tailed alternative), F = t-squared, and the P-values are identical.  So there is no real calculation to be done here other than squaring your test statistic from earlier.  They do not want you to do the Anova method here.
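
If you want to convince yourself that F really is t-squared, here is a quick numerical check on two small made-up samples (NOT the assignment data).  It computes the pooled two-sample t statistic and the one-way Anova F statistic for the same two groups.

```python
import math
import statistics

# Hypothetical samples (NOT the assignment data).
group1 = [5, 7, 6, 9, 8]
group2 = [4, 6, 5, 3, 7, 5]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled two-sample t statistic.
df = n1 + n2 - 2
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df   # pooled sample variance
t_stat = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# One-way Anova F statistic on the same two groups.
grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
ssg = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
msg = ssg / (2 - 1)   # DFG = number of groups - 1
mse = sp2             # with two groups, MSE is the pooled variance
f_stat = msg / mse

print(f"t = {t_stat:.4f}, t squared = {t_stat**2:.4f}, F = {f_stat:.4f}")
```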

Part (g)
Again, you can do JMP if you insist, following the same steps as I outlined above in question 2, but why bother?  Just use the P-value calculator.
Question 4
I teach you how to interpret confidence intervals in Lesson 1 and how to interpret P-values in Lesson 2.  Be careful, though: those were interpretations for confidence intervals or P-values for the mean.  Now you are interpreting for the difference between two means or the mean difference in matched pairs, so be careful in your wording.
Question 5
This is an Anova question.  It is very similar to my questions 1 and 2 in Lesson 5.

Note that you will be using Table E for this and other questions on this assignment.  Be sure to download it from the Resources section in Content, or here is a link:

Part (a)
I tell you the assumptions we make in Anova in Lesson 5.

Part (b)
I tell you the hypotheses we test in Anova in Lesson 5.

Part (c)
This question should be done by hand (i.e. with your calculator, not with JMP).  You are given the three means and standard deviations, so do as I do in my Lesson 5, questions 1 and 2.  Follow my example to compute the overall mean, SSG, MSG, SSE, MSE, and the F statistic.

You will then be able to check your answers when you use JMP.  Note that your answers may be slightly different than JMP's since you have used rounded off values for the means and standard deviations.

Personally, I would just write it in text like this, "SSG = 5(16-30)^2 + 4(12-30)^2 + ..." (I am making up those numbers.)  Then MSG = SSG/DFG = 51/2 = etc.  You can also use the Math Input button in the toolbar if you want to get fancy (but it is so slow.....).
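
Here is a sketch of the by-hand Anova computation from summary statistics, in case you want to check your calculator work.  The function name is my own, and the means and standard deviations below are made up (NOT the values you were given); only the group sizes of 6, 8 and 7 match this question.

```python
def anova_from_summary(ns, means, sds):
    """One-way Anova quantities from group sizes, means and standard deviations."""
    k = len(ns)          # number of groups
    n_total = sum(ns)    # total sample size
    # The overall mean weights each group mean by its sample size.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    sse = sum((n - 1) * s**2 for n, s in zip(ns, sds))
    dfg, dfe = k - 1, n_total - k
    msg, mse = ssg / dfg, sse / dfe
    return {
        "grand_mean": grand_mean,
        "SSG": ssg, "DFG": dfg, "MSG": msg,
        "SSE": sse, "DFE": dfe, "MSE": mse,
        "F": msg / mse,
    }


# Hypothetical means and standard deviations for groups of 6, 8 and 7.
result = anova_from_summary([6, 8, 7], [10.0, 12.0, 15.0], [3.0, 4.0, 3.5])
for name, value in result.items():
    print(name, round(value, 4))
```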

Part (d)
Put bounds on the P-value using Table E, like I do in my questions 1 and 2.

Part (e)
Again, interpret the P-value, keeping in mind what the null hypothesis is here. I give you examples of interpreting a P-value in my Lesson 2, question 6, but remember that here the null hypothesis is that all the means are equal.

Part (g)
Read the test statistic and P-value off your JMP output above, like I show you to do in Lesson 5, questions 5 to 7.  Your answers should agree with what you computed by hand, allowing for rounding issues (I would expect no more than one-decimal place accuracy in the values).

Part (h)
Get the critical value using Table E, like I do in my questions 1 and 2 and state the F decision rule.
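
Putting bounds on the P-value and stating the decision rule both come down to comparing your F statistic to the critical values in one row of the table.  Here is a sketch of that comparison.  The function name and the F statistic of 3.25 are hypothetical.  The critical values are ones I typed in from a standard F table for DFG = 2 and DFE = 18; check them against your own Table E before relying on them.

```python
def p_value_bounds(f_stat, table_row):
    """Bracket the P-value using one row of an F table.

    table_row maps an upper-tail probability to its critical value.
    """
    lower, upper = 0.0, 1.0
    for p, crit in table_row.items():
        if f_stat >= crit:
            upper = min(upper, p)   # F is at or past this critical value
        else:
            lower = max(lower, p)   # F falls short of this critical value
    return lower, upper


# Critical values for DFG = 2, DFE = 18 (verify against your own Table E).
row_2_18 = {0.100: 2.62, 0.050: 3.55, 0.025: 4.56, 0.010: 6.01, 0.001: 10.39}

f_stat = 3.25   # hypothetical test statistic
low, high = p_value_bounds(f_stat, row_2_18)
print(f"{low} < P-value < {high}")

# F decision rule at the 5% level of significance: reject H0 if F > critical value.
critical_value = row_2_18[0.050]
print("Reject H0" if f_stat > critical_value else "Do not reject H0")
```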

To do ANOVA with JMP (but, again, why bother? Just use the P-value calculator.)

The key thing to understand is that you will type all the scores down the first column.  Double-click Column 1 and name it Return.  Then type all 6 F scores down the column, then all 8 E scores, then all 7 U scores.  You should have 21 rows when all the scores have been entered in.

Then create Column 2 and name the column Industry.  Type F repeatedly down column 2 in all 6 rows that have F scores in Column 1.  Then type E for the appropriate cells in column 2 and type U for the rest of the rows.  I suggest you use Copy and Paste to ensure there are no typos here.

Thus, I would have two columns of data.  The first column shows all the numerical data scores (all the returns) and the second column labels the data in the first column telling me which industry the returns belong to.

Right-click Column 1 and select "Column Info" to confirm that its Data Type is Numeric and its Modeling Type is Continuous, changing the settings if necessary.  Right-click Column 2 and select "Column Info" to confirm that its Data Type is Character and its Modeling Type is Nominal, changing the settings if necessary.

Now select Analyze, Fit Y By X.  Select Return and click Y, Response and select Industry and click X, Factor.  Click OK.  You will see a graph with dots plotted representing all the returns in three columns, one column of dots for each industry.  If you don't see this graph at all, you did not label your columns properly.  Go back and make sure Column 1 is Numeric and Continuous and Column 2 is Character and Nominal.

Click the red triangle and select Display Options and select the Box Plot to get the side-by-side boxplots they request. Click Display Options again and deselect Show Points and Grand Mean to remove those things from the graph if you wish.

Click the red triangle and select Means and Std Dev to get a summary of the means and standard deviations.  Confirm that JMP computed the same means and standard deviations for each industry that you were given at the start of the question.

Click the red triangle and select Means/Anova/Pooled t to get JMP to do the Anova.  Confirm that JMP's values match your computations (allowing for minor differences due to rounding issues). 
Question 6
This is very similar to my questions 3 and 4 in Lesson 5.

Part (a)
You should know what the hypotheses are for Anova.

Part (b)
Just show your work as best you can.  I wouldn't lose a lot of sleep over it.  There is a table insert feature in the toolbar if you want to summarize everything in a table as they suggest, but you could just use tabs and line things up on different lines.  I am sure nobody is going to care, as long as you clearly tell them the answers for DFM, SSM, MSM, DFE, SSE, MSE, and F.

Part (c)
Put bounds on the P-value using Table E, like I do in my questions 1 and 2.

Part (d)
Remember that conclusions always refer to the alternative hypothesis in all hypothesis tests.  Either you are convinced the alternative is correct, or you are not convinced the alternative is correct.

Part (e)
Again, Table E above will tell you the critical value.

Part (f)
They want the pooled sample standard deviation here, which is the square root of MSE.
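
If you are wondering why the square root of MSE is the pooled standard deviation, here is a tiny sketch.  The function name, group sizes and standard deviations are made up (NOT this question's values).  MSE is just the sample variances combined, each one weighted by its degrees of freedom.

```python
import math


def pooled_sd(ns, sds):
    """Pooled sample standard deviation from group sizes and standard deviations."""
    sse = sum((n - 1) * s**2 for n, s in zip(ns, sds))   # weighted sum of variances
    dfe = sum(ns) - len(ns)                              # total sample size minus number of groups
    mse = sse / dfe
    return math.sqrt(mse)


# Hypothetical group sizes and standard deviations.
sp = pooled_sd([6, 8, 7], [3.0, 4.0, 3.5])
print(round(sp, 3))
```

If you already have MSE from your Anova table, just take its square root.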

Part (g)
Make sure you read my section about Confidence Intervals in Anova in Lesson 5, question 5 before you answer this question!  Look at my questions 5(b) and 6.