Stat 2000: Tips for Assignment 6
Published: Tue, 04/10/12
If you are taking the course by classroom lecture (Sections A01, A02, etc.), click here for my tips for your Assignment 6.
You will need to continue studying Lesson 10 in my book for this assignment, especially the Multiple Linear Regression section.
Note that the values you get for your coefficients and their test statistics in a multiple linear regression are likely to be different from the values you would get if you did a simple linear regression of y versus just one of the explanatory variables. That is because a simple linear regression looks at the effect that one explanatory variable alone has on y, while a multiple linear regression looks at the effect a particular explanatory variable has on y while holding all the other explanatory variables constant (in a sense, filtering out the effects of the other explanatory variables).
In a simple linear regression, you can always find r, the correlation coefficient, by taking the square root of r-squared as given by JMP, but remember r can be positive or negative (r always has the same sign as b, the slope). In multiple linear regression, r no longer has much meaning since the model is using several explanatory variables, but you could still compute it by taking the square root of r-squared as given by JMP. In multiple linear regression, r is always considered to be positive, since it cannot isolate the effect of any particular explanatory variable, and it is always possible that some of the explanatory variables have a negative association with y while others have a positive association.
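If you want to see both of these points with actual numbers, here is a minimal Python sketch. It is not JMP output: the data are made up so that y = 2*x1 - 3*x2 exactly, and the function names are my own, just for illustration.

```python
import math

def centered_sum(u, v):
    """Sum of (u_i - mean of u) * (v_i - mean of v)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def simple_slope(x, y):
    """Slope b from a simple linear regression of y on x alone."""
    return centered_sum(x, y) / centered_sum(x, x)

def multiple_slopes(x1, x2, y):
    """Slopes (b1, b2) from a least-squares regression of y on x1 and x2."""
    s11 = centered_sum(x1, x1)
    s22 = centered_sum(x2, x2)
    s12 = centered_sum(x1, x2)
    s1y = centered_sum(x1, y)
    s2y = centered_sum(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def signed_r(r_squared, slope):
    """Simple regression only: r is the square root of r-squared, with the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# Made-up data (an assumption for illustration): y = 2*x1 - 3*x2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [1, 1, 2, 3, 3]
y = [2 * a - 3 * b for a, b in zip(x1, x2)]

print("simple slope of y on x1:", simple_slope(x1, y))
print("multiple slopes (b1, b2):", multiple_slopes(x1, x2, y))
print("r when r-squared = 0.64 and slope is negative:", signed_r(0.64, -1.5))
```

The simple regression slope for x1 comes out far smaller than the multiple regression slope for x1, because the simple regression cannot separate the effect of x1 from the effect of x2, which moves along with it.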
You will use JMP for question 1.
Open a "New Data Table" and copy and paste in the given data set. If
you are using JMP 8, be sure to select "Edit" and "Paste with Column
Names". Double-click the Duration, Speed, Height, and Length column names and make sure
their Data Type is Numeric and their Modeling Type is Continuous.
Question 1(a): Remember, the model is written entirely with Greek letters (the population parameters). Do not include any numbers when writing the model.
Question 1(b):
Select "Analyze" then "Fit Model" and select Duration and click the "Y"
button to make it a Y. Select Speed, Height, and Length and click the "Add"
button to add them as explanatory variables in the model. Make sure the
"Personality" drop-down list is set at Standard Least Squares. If it
is not, and it is not even available as an option, your data has been
corrupted. Go back to the data spreadsheet, double-click on each of
Duration, Speed, Height, and Length and make sure their Data Type is Numeric and their
Modeling Type is Continuous and try this again. Click "Run Model" to
have it perform the multiple linear regression. Everything you need is
in the Parameter Estimates. (See my question 4 in Lesson 10 for an
example of how to read the various outputs.)
Question 1(c): Remember to say that you are holding Speed and Height constant while interpreting the effect Length has on Duration.
Question 1(d): Just sub the given values into the multiple regression equation JMP lists for you in the Parameter Estimates.
Question 1(e): As always, the residual is the Observed value of y minus the Predicted value of y. Sub the given values of the Explanatory Variables for Scream Machine in the multiple linear regression equation to get your predicted value for Duration and subtract that from the actual Duration as listed in the data set.
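The arithmetic for parts (d) and (e) can be sketched in Python as follows. Every number here is a placeholder I made up (they are NOT the estimates JMP will give you or the values in the data set), and the function names are hypothetical.

```python
def predict(intercept, slopes, values):
    """Predicted y: the intercept plus each slope times its explanatory value."""
    return intercept + sum(slopes[name] * values[name] for name in slopes)

def residual(observed, predicted):
    """Residual = Observed y minus Predicted y."""
    return observed - predicted

# Placeholder estimates, standing in for the Parameter Estimates table
b0 = 10.0
b = {"Speed": 0.5, "Height": 0.1, "Length": 0.02}

# Placeholder values of the explanatory variables and the observed Duration
coaster = {"Speed": 60, "Height": 100, "Length": 3000}
observed_duration = 105

predicted_duration = predict(b0, b, coaster)
print("predicted Duration:", predicted_duration)
print("residual:", residual(observed_duration, predicted_duration))
```

A negative residual means the observed value falls below what the model predicts.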
Question 1(f): That is the standard deviation of the residuals. You can read the Root Mean Square Error value off the Summary of Fit or square root the MSE value yourself from the ANOVA table.
Question 1(g): This wants you to compute the value of R-squared, the coefficient of determination. Recall: R-squared = SSM/SST.
Question 1(h): That is what the ANOVA F-test is doing.
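If you want to check how the numbers in parts (f), (g) and (h) fit together, here is a minimal Python sketch of the ANOVA arithmetic. The sums of squares, n and p below are invented for illustration, and `anova_summary` is my own hypothetical helper, not something JMP provides.

```python
import math

def anova_summary(ssm, sse, n, p):
    """Key ANOVA quantities for a regression with p explanatory variables and n observations."""
    sst = ssm + sse          # total sum of squares
    dfm = p                  # model degrees of freedom
    dfe = n - p - 1          # error degrees of freedom
    msm = ssm / dfm          # mean square for the model
    mse = sse / dfe          # mean square error
    return {
        "R-squared": ssm / sst,    # coefficient of determination, SSM/SST
        "RMSE": math.sqrt(mse),    # standard deviation of the residuals
        "F": msm / mse,            # ANOVA F statistic, with dfm and dfe degrees of freedom
    }

# Invented values, only to show the arithmetic
summary = anova_summary(ssm=80.0, sse=20.0, n=14, p=3)
for name, value in summary.items():
    print(name, "=", round(value, 4))
```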
Study Lesson 9 to review the principles of Linear Regression in my study book
then study Lesson 10 at least up to the end of question 3 to prepare
for this assignment. You do not need to study the section on
Multiple Linear Regression at this time. Note that HW6, 7 and 8 will
all deal with concepts from Lesson 10.
Question 1 is just an algebra problem: they have given you a value for x, a value for y, and the slope, and you can use those to compute the intercept. Note, they have written out the least-squares regression equation for you, and all you have to do is enter the values for the intercept and slope into the boxes. Hint: You could actually use the formula that computes the intercept from the slope, where you sub in the given values for x and y in the places where the formula calls for the mean values of x and y.
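The algebra is simply solving y = a + b*x for a. A minimal Python sketch, with made-up numbers (not the values in your question) and a hypothetical function name:

```python
def intercept_from_point(x, y, slope):
    """Solve y = a + slope * x for the intercept a, given one point on the line."""
    return y - slope * x

# Made-up values for illustration only
a = intercept_from_point(x=4.0, y=11.0, slope=2.5)
print("intercept:", a)
```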
Question 2 gives
you all the info you need to compute the confidence intervals for the
slope. I give you the appropriate formula in Lesson 10.
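As a rough check on your arithmetic, the interval has the form b plus or minus t* times the standard error of b, where t* comes from the t table with n - 2 degrees of freedom. Here is a minimal Python sketch; the slope, standard error and sample size are invented, and you look up t* yourself and pass it in.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Confidence interval for the slope: b - t* x SE_b to b + t* x SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Invented values: n = 20 observations, so df = n - 2 = 18,
# and the t table gives t* = 2.101 for 95% confidence with 18 df.
lower, upper = slope_confidence_interval(b=1.5, se_b=0.25, t_star=2.101)
print("95% CI for the slope:", round(lower, 4), "to", round(upper, 4))
```

If the interval does not contain 0, that agrees with rejecting the zero slope hypothesis at the matching level of significance.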
You will use JMP for question 3. Open a "New Data Table" and
create three columns. Name the first column "Sex", the second column
"Speed", and the third column "Stride rate". Remember, to create a new
column, simply double-click in the space at the top of the column, to
the right of a pre-existing column. Enter in your data, typing "female"
or "male" as appropriate in the "Sex" column. Obviously, enter in all
the female data first, then all the male data. Now, on the left of the
spreadsheet where it numbers all the rows, click and drag to select all
the rows that have "female" scores. Now select "Rows" and "Markers" and
choose whatever marker you want to represent the females. Now, click
and drag to select the "male" rows and select a marker to use for them.
Click in the top left corner of the spreadsheet (right above row 1) to
deselect the rows and we are now ready to analyze the data.
Question 3(a) and (b):
Select "Analyze" then "Fit Y by X". They never make it clear which is x
and which is y in this problem, but it appears they want x to be speed
and y to be stride rate, so select "Stride rate" and click "Y, Response"
and select "Speed" and click "X, Factor". Click OK. You will now see a
scatterplot with the two different markers plotted distinguishing the
female and male scores. Click the red triangle next to "Bivariate Fit
..." and select "Fit Line" to have JMP compute and graph the
least-squares regression line. Select and copy the printout and paste
into a file ready for upload.
Question 3(c):
Click the red triangle next to "Linear Fit" and select "Save
Residuals". JMP will now add a fourth column to your spreadsheet
called "Residuals Stride rate". Select and copy the entire data table
(or just the residuals column) and paste into your file ready for
upload. They do not make it clear whether they actually want you to
include the residuals in your upload, but why ask you to compute them
then?
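If you are curious what JMP is doing behind "Fit Line" and "Save Residuals", here is a minimal Python sketch of the same calculation. The five data points are made up (they are not the stride rate data), and the function names are my own.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y-hat = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(x, y, a, b):
    """Observed y minus predicted y for every row."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
res = residuals(x, y, a, b)
print("intercept:", round(a, 4), "slope:", round(b, 4))
print("residuals:", [round(r, 4) for r in res])
print("sum of residuals:", round(sum(res), 10))
```

The residuals from a least-squares line always add up to zero (apart from rounding), which is a handy check.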
Question 3(d):
Click the red triangle next to "Linear Fit" and select
"Plot Residuals". I have no idea what they are getting at in this
question. You would expect to see some obvious pattern like the males
tend to have positive residuals and the females have negative residuals,
or something that makes the females look different from the males, but
good luck seeing anything here.
Question 3(e):
JMP already did this test for you when you selected "Fit Line". The
ANOVA table and the "Parameter Estimates" for the "Stride rate" are
giving you all the info you need, but be sure to write out your
hypotheses and conclusion in the file you are uploading. You can
determine if there is a linear relationship by either testing the
hypothesis about zero correlation or a hypothesis about zero slope. JMP
gives us the latter in the ANOVA and Parameter Estimates, so I would do
the zero slope hypothesis. I show you how to read these outputs in my
question 3 of Lesson 10.
Question 4(a):
Copy and paste your data into a "New Data Table" being sure to select
"Edit" and "Paste with column names" if you are using JMP 8. Select
"Analyze" then "Distribution", highlight both columns and click "Y,
Columns" then click OK. The "Moments" give you the means and standard
deviations they request.
Question 4(b):
Select "Analyze" then "Fit Y by X". Assign x and y as they have
indicated in part (a). Click OK. Click the red triangle next to
"Bivariate Fit ..." and select "Fit Line" to have JMP compute and graph
the least-squares regression line. You will see the least-squares
regression equation directly below "Linear Fit". I assume they want the
t statistic for the correlation, which is also the t statistic for the
slope, which you can read off the "Parameter Estimates". (See my question
3 in Lesson 10 for how to read the printouts.) Note, JMP gives us the
coefficient of determination, r-squared, which we can easily change into
r by taking the square root. Remember, r always has the same sign as the slope.
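To convince yourself that the t statistic for the correlation really is the same number as the t statistic for the slope, here is a minimal Python sketch that computes both from scratch. The data are made up (not the assignment's), and `slope_and_correlation_t` is a hypothetical helper of mine.

```python
import math

def slope_and_correlation_t(x, y):
    """Return (r, t for the slope, t for the correlation) in a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sxy / sxx                      # least-squares slope
    r = sxy / math.sqrt(sxx * syy)     # correlation; same sign as b
    sse = syy - b * sxy                # sum of squared residuals
    s = math.sqrt(sse / (n - 2))       # root mean square error
    se_b = s / math.sqrt(sxx)          # standard error of the slope

    t_slope = b / se_b
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t_slope, t_corr

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, t_slope, t_corr = slope_and_correlation_t(x, y)
print("r =", round(r, 4))
print("t for slope =", round(t_slope, 4))
print("t for correlation =", round(t_corr, 4))
```

Both t statistics have n - 2 degrees of freedom, so either one gives the same P-value.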
Question 5:
Use the same approach used in question 4 to get all the info they
request. Make sure you think about which is x and which is y in this
problem (they pretty much spell it out in part (c)). Note,
you will use the "Parameter Estimates" to get the slope and its
standard error, but then finish computing the confidence interval
yourself.