Stat 2000: REVISED Tips for Assignment 5

Published: Fri, 04/01/16

I have added a little extra help for question 1 in the assignment in my tips below.
Please read this important message I sent about the binomial table.  This has major implications regarding your final exam:
Final Exam Prep Seminar April 12
Don't have my book or audio lectures? You can download a free sample here:
Did you read my tips on how to study and learn Stat 2000?  If not, here is a link to those important suggestions:
Did you read my Calculator Tips?  If not, here is a link to those important suggestions:
Did you see my tips for Assignment 1? Click here.
Did you see my tips for Assignment 2? Click here.
Did you see my tips for Assignment 3? Click here.
Did you see my tips for Assignment 4? Click here.
Tips for Assignment 5
Please note that I made major changes to my book in September 2014.  If you are using a book older than September 2014, you are missing about 100 pages of new material and an entirely new lesson on Probability.

Study the Chi-Square Goodness of Fit part of Lesson 9: Chi-Square Tests (in other words, the rest of Lesson 9).  You will also need to study Lesson 10: Review of Linear Regression and Lesson 11: Inferences for Linear Regression (up to the end of question 3; you do not need to study the Multiple Linear Regression section at this time).

Note that they have OMITTED Multiple Linear Regression and Lesson 12: Nonparametric Tests (The Sign Test) this term. None of this appears on the current assignment, and they have omitted these topics every term lately.  DO NOT STUDY THESE SECTIONS.

To type in the formulas you are using and show your numbers subbed into them, you can click the Equation Editor button in the toolbar that looks like the Sigma Summation symbol (you have to click the "..." other options button to see the sigma formula input button).  Then click the various buttons to make your fractions and enter the symbols.  However, the Equation Editor is extremely slow and clunky.  Personally, I would never use it.  Just type ordinary text explaining what you are doing if you think you should show some work.

Exception: Always do any JMP stuff open-book.  Have my tips in front of you, and let me guide you step-by-step through any JMP stuff.  JMP is just "busy" work.  The sooner you get it done and can move on to productive things like understanding the concepts and interpreting the JMP outputs, the better off you will be.  Then again, since you never have to upload the JMP printouts, perhaps you might not even bother to do the JMP at all.  Most questions can be answered by hand even when they told you to use JMP.
Question 1
You will be using Table F for the first two questions.  Here is a link where you can download the table if you have not already done so:

This is not unlike my Lesson 9, questions 5 and 6.  They have not specified how many decimal places to use, so personally, I would round everything to 4 decimal places. 

There are two ways you can figure out the distribution they are suggesting:
  1. You can let the probability that an accident happens in Fall = k.  Then Spring = 2k, Summer = 3k and Winter = 4k.  Now solve for k, knowing the four probabilities must add to 1.
  2. You can consider this distribution a ratio (like my question 6).  They are saying Fall:Spring:Summer:Winter is a 1:2:3:4 ratio.
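Both methods lead to the same null proportions.  Here is a minimal Python sketch, assuming only the 1:2:3:4 Fall:Spring:Summer:Winter ratio stated in the problem; exact fractions are used so the check that the two methods agree is exact.

```python
from fractions import Fraction

# Null hypothesis as a ratio: Fall:Spring:Summer:Winter = 1:2:3:4
weights = {"Fall": 1, "Spring": 2, "Summer": 3, "Winter": 4}

# Method 1: p(Fall) = k, p(Spring) = 2k, ... and the four must sum to 1,
# so k * (1 + 2 + 3 + 4) = 1, which gives k = 1/10.
k = Fraction(1, sum(weights.values()))
probs_method1 = {season: w * k for season, w in weights.items()}

# Method 2: treat it as a ratio; each share is its weight over the total weight.
total = sum(weights.values())
probs_method2 = {season: Fraction(w, total) for season, w in weights.items()}

assert probs_method1 == probs_method2
for season, p in probs_method1.items():
    print(season, float(p))
```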
They make their goodness-of-fit table horizontally, while I prefer to make mine vertically. 

Note that you can insert a table from the toolbar into your answer box if you want to summarize the results in a table like the one they provide.  Or, if you find that too annoying (and it is pretty annoying), you could just describe in words where each answer belongs, for example: "Expected Count for Spring is <blank>" and "The Chi-Square for Spring is <blank>", etc.
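If you want to check your vertical table before typing it in, the arithmetic can be sketched in a few lines of Python.  The observed counts below are hypothetical placeholders, not the assignment's data; the null proportions are the 0.1/0.2/0.3/0.4 split from the 1:2:3:4 ratio, and everything is rounded to 4 decimal places as suggested above.

```python
# Hypothetical observed counts (placeholders only; use the assignment's counts).
observed = {"Fall": 30, "Spring": 55, "Summer": 60, "Winter": 105}
null_probs = {"Fall": 0.1, "Spring": 0.2, "Summer": 0.3, "Winter": 0.4}

n = sum(observed.values())
chi_square = 0.0
print(f"{'Season':<8}{'Obs':>6}{'Expected':>10}{'(O-E)^2/E':>12}")
for season, obs in observed.items():
    expected = n * null_probs[season]            # expected count = n * p_i
    component = (obs - expected) ** 2 / expected  # this season's chi-square piece
    chi_square += component
    print(f"{season:<8}{obs:>6}{expected:>10.4f}{component:>12.4f}")

df = len(observed) - 1  # number of categories minus 1
print(f"chi-square = {chi_square:.4f} with df = {df}")
```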

If you get a test statistic that goes off the end of the table:
If your chi-square value is so small that it is off the left side of the table, then say the P-value is between 0.25 and 1.  If your chi-square is so large that it is off the right side of the table, say the P-value is between 0 and 0.0005.

Keep this tip in mind for any of your critical value tables.  Always imagine there is an extra column at each end.  The last column is 0 (for very small P-values), and the first column is either .50 (in the case of t, which is symmetric about 0, so a positive t statistic can never have more than half the area above it) or 1 (in the case of F or chi-square, since those statistics cannot be negative and their curves are not symmetric).
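The "imaginary extra column" trick can be written as a small table lookup.  This is a sketch assuming the column headings run from 0.25 down to 0.0005 as described above; the df = 3 row holds standard chi-square critical values rounded to two decimals, but check it against your own copy of Table F.

```python
# Upper-tail probabilities across the top of the chi-square table.
TAIL_PROBS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02, 0.01, 0.005, 0.0025, 0.001, 0.0005]
# Critical values in the df = 3 row (verify against your own table).
CHI2_DF3 = [4.11, 4.64, 5.32, 6.25, 7.81, 9.35, 9.84, 11.34, 12.84, 14.32, 16.27, 17.73]


def p_value_bounds(stat, crit_row, left_end, tail_probs=TAIL_PROBS):
    """Return (lower, upper) bounds on an upper-tail P-value from one table row.

    left_end is the imaginary extra column on the left: 1 for chi-square or F,
    0.5 for t (use |t|).  The imaginary column on the right is always 0.
    """
    probs = [left_end] + tail_probs + [0.0]
    crits = [0.0] + crit_row + [float("inf")]
    for i in range(len(crits) - 1):
        if crits[i] <= stat < crits[i + 1]:
            return probs[i + 1], probs[i]
    raise ValueError("the statistic must be non-negative")


print(p_value_bounds(4.75, CHI2_DF3, left_end=1.0))  # between 0.15 and 0.20
print(p_value_bounds(2.0, CHI2_DF3, left_end=1.0))   # off the left side: 0.25 to 1
print(p_value_bounds(20.0, CHI2_DF3, left_end=1.0))  # off the right side: 0 to 0.0005
```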
Question 2
Similar to my Lesson 9, question 8.  Note that you will want to use Table C: Binomial Distribution Probabilities to speed up the setup in this problem.  Please make sure you have read my important message about this issue and get a firm statement from your profs.
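The Table C lookups for Binomial(n = 6, p = 0.5) can be double-checked with the binomial formula.  A minimal sketch; the number of hands below is a hypothetical placeholder, so substitute the total from the assignment.  Expected count for each outcome = (number of hands) × P(X = k).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 6, 0.5
num_hands = 200  # hypothetical total number of hands; replace with the assignment's total

for k in range(n + 1):
    prob = binomial_pmf(k, n, p)
    # columns: number of black cards k, P(X = k), expected count
    print(k, round(prob, 4), round(num_hands * prob, 4))
```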

They also then talk about estimating the parameter later in the problem, parts (g) and (h), which is like my Lesson 9, question 9.
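When p is estimated from the data instead of being fixed at 0.5, the estimate is the total number of black cards divided by the total number of cards dealt.  The sketch below assumes the data come as a frequency table (how many hands had 0, 1, ..., 6 black cards); the counts are hypothetical placeholders.  It also applies the usual rule of losing one extra degree of freedom for each parameter estimated from the data; compare with Lesson 9, question 9.

```python
from math import comb

# Hypothetical frequency table: hands_with[k] = number of hands with k black cards.
hands_with = [3, 18, 47, 62, 45, 21, 4]
n = 6  # cards per hand

num_hands = sum(hands_with)
total_black = sum(k * count for k, count in enumerate(hands_with))
p_hat = total_black / (n * num_hands)  # estimated probability of a black card

# Expected counts under Binomial(n, p_hat), then the goodness-of-fit statistic.
expected = [num_hands * comb(n, k) * p_hat**k * (1 - p_hat) ** (n - k) for k in range(n + 1)]
chi_square = sum((o - e) ** 2 / e for o, e in zip(hands_with, expected))
df = len(hands_with) - 1 - 1  # one extra df lost because p was estimated

print(round(p_hat, 4), round(chi_square, 4), df)
```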

Part (f) is quite weird.  First of all, to know if there is any possibility of error here, we need to know the truth.  Technically, the distribution is NOT binomial with n=6 and p=0.5 because, when dealing cards, we are sampling WITHOUT replacement, thus the probability of a black card is changing each time and DEPENDS on how many black cards have already been dealt.  Therefore, the null hypothesis is wrong.  If we reject Ho, we have made a correct decision.  If we do not reject Ho, we have made an error (what type?).
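To see how far off the binomial model is, you can compare it with the exact without-replacement (hypergeometric) probabilities.  This sketch assumes a standard 52-card deck with 26 black cards and 6-card hands; the hypergeometric distribution is named here for illustration and is not needed for the assignment.

```python
from math import comb

DECK, BLACK, HAND = 52, 26, 6  # assumed: standard deck, 26 black cards, 6-card hand


def hypergeometric_pmf(k):
    """Exact P(k black cards) when dealing WITHOUT replacement."""
    return comb(BLACK, k) * comb(DECK - BLACK, HAND - k) / comb(DECK, HAND)


def binomial_pmf(k):
    """P(k black cards) under the null model Binomial(n=6, p=0.5)."""
    return comb(HAND, k) * 0.5**HAND


for k in range(HAND + 1):
    print(k, round(hypergeometric_pmf(k), 4), round(binomial_pmf(k), 4))
```

The two columns are close but not identical, which is exactly why the null hypothesis is technically false.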
Question 3
This is a run-through of Linear Regression.  Be sure to study Lessons 10 and 11 in my book before attempting this and the rest of the questions in this assignment.  You should especially work through question 1 in Lesson 10 and questions 1 and 3 in Lesson 11.
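When the question gives you summary statistics (means, standard deviations and r), the least-squares line follows directly from them.  A minimal sketch using the standard formulas b1 = r·sy/sx and b0 = ȳ − b1·x̄; the numbers in the usage line are hypothetical.

```python
def least_squares_line(x_bar, y_bar, s_x, s_y, r):
    """Intercept and slope of the least-squares line from summary statistics."""
    b1 = r * s_y / s_x       # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical summary statistics, for illustration only.
b0, b1 = least_squares_line(x_bar=10.0, y_bar=50.0, s_x=2.0, s_y=8.0, r=0.6)
print(f"y-hat = {b0:.4f} + {b1:.4f} x")
```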

Note that part (b) is asking for r-squared, the coefficient of determination as I discuss for the first time in Lesson 10, question 1(d).

Part (e) is getting at extrapolation.  Always be mindful as to whether any particular prediction is an extrapolation.
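A simple way to flag extrapolation is to check whether the x value you are predicting for lies outside the range of x values used to fit the line.  A minimal sketch with a hypothetical helper name and made-up x values:

```python
def is_extrapolation(x_new, x_data):
    """True if x_new lies outside the range of x values used to fit the line."""
    return x_new < min(x_data) or x_new > max(x_data)


x_data = [1, 5, 10]  # hypothetical x values from the fitted data
print(is_extrapolation(12, x_data), is_extrapolation(7, x_data))
```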

Part (f) is a two-part problem.  First, you must compute your prediction for Individual 5, then you can compute the residual.  See Lesson 10, question 1(j) for my first example of computing a residual.
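The two steps of part (f) look like this in code: predict first, then subtract.  A sketch with a hypothetical helper and made-up numbers; the residual is observed y minus predicted ŷ.

```python
def residual(x, y_observed, b0, b1):
    """Residual = observed y minus the prediction y-hat = b0 + b1 * x."""
    y_hat = b0 + b1 * x  # step 1: the prediction
    return y_observed - y_hat  # step 2: the residual


# Made-up line y-hat = 1 + 2x and an individual with x = 3, y = 10.
print(residual(3, 10, b0=1, b1=2))
```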

Note that they give you SSE, the sum of the squared residuals, so you are able to compute the variance of the residuals (MSE = SSE/DFE, where DFE = n - 2 in simple linear regression).  MSE estimates σ², so its square root is your estimate for σ, as requested in part (g).  That is what I call Se, the standard deviation of the residuals, the estimate for σε, the standard deviation of the population of residuals.
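The chain SSE → MSE → Se is short enough to write out.  A minimal sketch; the SSE and n in the example are hypothetical.

```python
from math import sqrt


def residual_sd(sse, n):
    """Se: the square root of MSE = SSE / DFE, with DFE = n - 2."""
    dfe = n - 2      # simple linear regression estimates two parameters
    mse = sse / dfe  # estimates sigma squared
    return sqrt(mse)  # estimates sigma


print(residual_sd(sse=72.0, n=20))  # hypothetical SSE and sample size
```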

Never forget: in a regression context, if they start talking about σ or s, they are referring to the standard deviation of the residuals for the population or sample, respectively.  To add to the confusion, they have also been known to use σ̂ (sigma-hat) to represent Se.

Parts (h), (i) and (j) use the confidence interval formulas I introduce in Lesson 11.  See questions 1 and 3 for examples.
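For checking your hand calculations, here is a sketch of the standard intervals found in most introductory texts: the slope interval, the interval for the mean response at x*, and the prediction interval for an individual at x*.  Which of these parts (h), (i) and (j) ask for is an assumption here, so match them against Lesson 11.  The t* value must come from your t table with df = n - 2, and Σ(x - x̄)² is recovered as (n - 1)·sx².

```python
from math import sqrt


def regression_intervals(x_bar, s_x, n, b0, b1, s, x_star, t_star):
    """Slope CI, mean-response CI and prediction interval at x_star.

    s is the residual standard deviation (Se); t_star is the table value
    for df = n - 2 at your confidence level.
    """
    sxx = (n - 1) * s_x**2  # sum of (x - x_bar)^2
    se_b1 = s / sqrt(sxx)
    y_hat = b0 + b1 * x_star
    se_mean = s * sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    se_pred = s * sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return {
        "slope": (b1 - t_star * se_b1, b1 + t_star * se_b1),
        "mean_response": (y_hat - t_star * se_mean, y_hat + t_star * se_mean),
        "prediction": (y_hat - t_star * se_pred, y_hat + t_star * se_pred),
    }


# Hypothetical inputs; t* = 2.262 is the 95% table value for df = 9.
res = regression_intervals(x_bar=10.0, s_x=2.0, n=11, b0=5.0, b1=1.5, s=2.0, x_star=12.0, t_star=2.262)
for name, (lo, hi) in res.items():
    print(name, round(lo, 4), round(hi, 4))
```

Note that the prediction interval is always wider than the mean-response interval, because it has the extra "1 +" under the square root.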

Part (k) is testing the hypothesis for slope.  Again, see Lesson 11 for examples.
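The test statistic for the slope is t = (b1 - β1,0)/SE(b1) with df = n - 2, where the null value β1,0 is usually 0.  A minimal sketch computing SE(b1) from Se, sx and n; the inputs in the example are hypothetical.

```python
from math import sqrt


def slope_t_statistic(b1, s, s_x, n, beta1_null=0.0):
    """t for H0: beta1 = beta1_null, with SE(b1) = s / (s_x * sqrt(n - 1))."""
    se_b1 = s / (s_x * sqrt(n - 1))  # same as s / sqrt(sum of (x - x_bar)^2)
    return (b1 - beta1_null) / se_b1, n - 2  # (test statistic, degrees of freedom)


t, df = slope_t_statistic(b1=2.0, s=3.0, s_x=1.5, n=10)  # hypothetical inputs
print(round(t, 4), df)
```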

Part (l) is clearly mistaken.  Since the test statistic is t, you cannot get a P-value to four decimal places from the t table; you can only put bounds on the P-value.  The next part is where you can get an exact P-value.
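Bounding a two-sided P-value from the t table means locating |t| in your df row, reading the one-sided bounds from the column headings (imagining 0.5 on the far left and 0 on the far right), and then doubling them.  This sketch uses the df = 10 row purely as an illustration; use the row for your own df = n - 2, and skip the doubling for a one-sided test.

```python
# One-sided upper-tail probabilities across the top of the t table, and the df = 10 row.
TAIL_PROBS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02, 0.01, 0.005, 0.0025, 0.001, 0.0005]
T_DF10 = [0.700, 0.879, 1.093, 1.372, 1.812, 2.228, 2.359, 2.764, 3.169, 3.581, 4.144, 4.587]


def two_sided_t_bounds(t, crit_row, tail_probs=TAIL_PROBS):
    """Bracket |t| in the row, then double the one-sided bounds."""
    probs = [0.5] + tail_probs + [0.0]  # imaginary end columns for t
    crits = [0.0] + crit_row + [float("inf")]
    t = abs(t)
    for i in range(len(crits) - 1):
        if crits[i] <= t < crits[i + 1]:
            return 2 * probs[i + 1], 2 * probs[i]


print(two_sided_t_bounds(2.5, T_DF10))   # |t| between 2.359 and 2.764
print(two_sided_t_bounds(-5.0, T_DF10))  # off the right side of the table
```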

Part (m) can be solved by simply feeding the t test statistic you computed and your degrees of freedom into the P-value calculator I gave you previously.  The result may be slightly inaccurate, since you computed the test statistic from the rounded-off values they gave you.
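If you would like a second opinion on the calculator's answer, an exact two-sided P-value can also be approximated with the Python standard library alone.  This sketch is a swapped-in technique, not the calculator mentioned above: it integrates the t density from 0 to |t| with Simpson's rule and uses P = 2 × (0.5 - that area).

```python
from math import exp, lgamma, log, pi


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p_value(t, df, steps=2000):
    """2 * P(T > |t|), using Simpson's rule for the area between 0 and |t|."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_density(a + i * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)


print(round(two_sided_p_value(2.228, 10), 4))  # close to 0.05
```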

To do Linear Regression in JMP:
Open a "New Data Table".  Enter all the data for x in Column 1 and all the data for y in Column 2.  Be sure to name the columns appropriately.  Here, Column 1, x, will be Fat and Column 2, y, will be Cholesterol.  Select "Analyze, Fit Y By X".  Highlight Fat and click "X, Factor".  Highlight Cholesterol and click "Y, Response".  Click OK.

You should now be looking at a scatterplot.  Click the red triangle, select Density Ellipse, and select 0.99 (the level doesn't matter; you don't actually want the ellipse, but this option gives you a summary of the means, standard deviations, and the correlation coefficient, r).  Click the red triangle on the Bivariate Normal Ellipse bar that appears below the scatterplot and deselect "Line of Fit" to make the ellipse disappear from your scatterplot.  You will also note that there is now a title bar called Correlation below the scatterplot.  Click the blue triangle to open it up and confirm that the means and standard deviations match those you were given.  If not, perhaps you mixed up which variable was x and which was y.

Click the red triangle and select "Fit Line" to get the least-squares regression line.  You now have all the outputs you need. 

Part (o): 
Be sure to read in Lesson 11 about the connection between the t test statistic for the slope and the t test statistic for the correlation, and also the connection between t for the slope and F for the slope.  Although they want you to do a lot of this question by hand (and you certainly should, since that will also happen on the exam), note that JMP does a lot of this for you, and you can use it to check your answers before you submit them.
Question 4
I show you how to fill in an Anova table like this in Lesson 11, question 3.  Otherwise, this is similar to the previous question.
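For simple linear regression the Anova table follows a fixed pattern: DFM = 1, DFE = n - 2, DFT = n - 1, each MS = SS/DF, and F = MSM/MSE.  A minimal sketch with a hypothetical helper and made-up sums of squares; if the problem gives SST instead of SSM, recover SSM = SST - SSE first.

```python
def regression_anova(ssm, sse, n):
    """Fill in a simple-linear-regression Anova table from SSM, SSE and n."""
    dfm, dfe = 1, n - 2
    msm, mse = ssm / dfm, sse / dfe
    return {
        "Model": {"DF": dfm, "SS": ssm, "MS": msm, "F": msm / mse},
        "Error": {"DF": dfe, "SS": sse, "MS": mse},
        "Total": {"DF": n - 1, "SS": ssm + sse},
    }


table = regression_anova(ssm=90.0, sse=36.0, n=20)  # made-up sums of squares
for source, row in table.items():
    print(source, row)
```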

Again, in part (b), they are asking for the standard deviation of the residuals, Se, which is the square root of MSE.
Question 5
This is similar to my Lesson 11, question 2.