This is a runthrough of Linear Regression. Be sure to study Lessons 10 and 11 in my book before attempting this and the rest of the questions in this assignment. You should especially work through question 1 in Lesson 10 and
questions 1 and 3 in Lesson 11.
Note that part (b) is asking for r-squared, the coefficient of determination as I discuss for the first time in Lesson 10, question 1(d).
Part (e) is getting at extrapolation. Always be mindful as to whether any particular prediction is an extrapolation.
Part (f) is a two-part problem. First, you must compute your prediction for Individual 5; then you can compute the residual. See Lesson 10, question 1(j) for my first example of computing a residual.
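The two steps might look like this in Python. Every number here is invented purely for illustration (they are not the intercept, slope, or data values from the assignment):

```python
# Hypothetical fitted line: y-hat = b0 + b1*x (all numbers invented for illustration)
b0 = 102.5   # intercept
b1 = 3.16    # slope

x5 = 30.0    # Individual 5's x value (hypothetical)
y5 = 210.0   # Individual 5's observed y value (hypothetical)

y5_hat = b0 + b1 * x5     # step 1: the prediction
residual = y5 - y5_hat    # step 2: residual = observed - predicted
print(y5_hat, residual)
```

The sign matters: a positive residual means the observed value sits above the line.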
Note that they give you SSE, the sum of the squared residuals, so you are able to compute the variance of the residuals (MSE = SSE/DFE). The square root of MSE is your estimate for σ, as requested in part (g). That is what I call Se, the standard deviation of the residuals, the estimate for σε, the standard deviation of the population of residuals.
Never forget: in a regression context, if they start talking about σ or s, they are referring to the standard deviation of the residuals for the population or sample, respectively. To add to the confusion, they have also been known to use σ̂ to represent Se.
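As a quick sketch of that arithmetic, with invented numbers (your SSE and n come from the problem, and DFE = n − 2 in simple linear regression):

```python
import math

# Hypothetical givens (invented for illustration)
SSE = 1250.0      # sum of squared residuals, as given in the problem
n = 10            # sample size
DFE = n - 2       # degrees of freedom for error in simple linear regression

MSE = SSE / DFE         # mean squared error: estimates sigma squared
Se = math.sqrt(MSE)     # standard deviation of the residuals: estimates sigma
print(MSE, Se)
```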
Parts (h), (i) and
(j) use the confidence interval formulas I introduce in Lesson 11. See questions 1 and 3 for examples.
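For the slope, the interval has the usual estimate ± t* × (standard error) shape. Here is a minimal sketch with made-up numbers; the t* value would come from your t table with DFE = n − 2 degrees of freedom:

```python
# Confidence interval for the slope: b1 +/- t* * SE(b1)
# All numbers are invented for illustration.
b1 = 3.16        # estimated slope (hypothetical)
se_b1 = 0.42     # standard error of the slope (hypothetical)
t_star = 2.306   # t critical value for a 95% CI with 8 df (from a t table)

lower = b1 - t_star * se_b1
upper = b1 + t_star * se_b1
print(lower, upper)
```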
Part (k) is testing the hypothesis for slope. Again, see Lesson 11 for examples.
Part (l) is clearly mistaken. You cannot get a P-value to four decimal places from the t table; you can only put bounds on the P-value, since the test statistic is t. The next part is where you can get an exact P-value.
Part (m) can be solved by simply feeding the t test statistic you computed and your degrees of freedom into the
P-value calculator I gave you previously. The result may be slightly inaccurate, since you are using the rounded-off givens to compute the test statistic in the first place.
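If you prefer, the same computation can be sketched in Python, assuming SciPy is available (its `stats.t.sf` gives the upper-tail probability). The t statistic and degrees of freedom below are invented for illustration:

```python
from scipy import stats

# Exact two-sided P-value from the t statistic and its degrees of freedom.
# t_stat and df are invented for illustration.
t_stat = 7.52
df = 8

p_value = 2 * stats.t.sf(abs(t_stat), df)   # double the tail for a two-sided test
print(p_value)
```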
To do Linear Regression in JMP:
Open a "New Data Table". Enter all the data for x in Column 1 and all the data for y in Column 2. Be sure to name the columns appropriately. Here, Column 1, x, will be Fat and Column 2, y, will be Cholesterol. Select "Analyze, Fit Y By X". Highlight Fat and
click "X, Factor". Highlight Cholesterol and click "Y, Response". Click OK.
You should now be looking at a scatterplot. Click the red triangle and select "Density Ellipse", choosing 0.99 (the level doesn't matter; you don't actually want the ellipse, but this gives you a summary of the means, standard deviations, and the correlation coefficient, r). Click the red triangle that appears below the scatterplot labeled "Bivariate Normal Ellipse" and deselect "Line of Fit" to make the ellipse disappear from your scatterplot. You will also note that a title bar called Correlation now appears below the scatterplot. Click the blue triangle to open it up and confirm that the means and standard deviations match those you were given. If not, perhaps you mixed up which variable was x and which was y?
Click the red triangle and select "Fit Line" to get the least-squares regression line. You now
have all the outputs you need.
Part (o):
Be sure to read in Lesson 11 about the connection between the t test statistic for the slope and the t test statistic for the correlation, and also the connection between t for the slope and F for the slope. Although they want you to do a lot of this question by hand (and you certainly should, since that will also happen on the exam), do note that JMP does a lot of this for you, and you can use it to check your answers before you submit them.
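Those two connections are easy to verify numerically. A sketch with an invented r and n (not the assignment's values):

```python
import math

# Two facts worth checking from Lesson 11, on invented numbers:
#   (1) the t statistic for the slope equals the t statistic for r:
#       t = r * sqrt(n - 2) / sqrt(1 - r**2)
#   (2) the F statistic for the slope is the square of that t.
r = 0.936   # hypothetical correlation coefficient
n = 10      # hypothetical sample size

t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
F = t_from_r ** 2
print(t_from_r, F)
```

If your hand-computed t for the slope and JMP's F for the model don't satisfy F = t², something has gone wrong.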