Stat 2000: Tips for Assignment 6

Published: Tue, 04/10/12


Please note that my final exam prep seminar for Stat 2000 will be on Sunday, April 15, 2012, in room 100 St. Paul's College, from 9 am to 9 pm.  I am now ready to take registrations.  Please click this link for more information about the seminar and to sign up if you are interested:
Grant's Stat 2000 Exam Prep Seminars 
 
If you ever want to look back over a previous tip I have sent, do note that all my tips can be found in my archive.  Click this link to go straight to my archive: 
Grant's Updates Archive
 
Did you miss my Tips on How to Do Well in this Course? Click here
 
Did you miss my Tips for Assignment 5? Click here
 
If you are taking the course by Distance/Online (Sections D01, D02, etc.), click here for my tips for your Assignment 6.
 
If you are taking the course by classroom lecture (Sections A01, A02, etc.), click here for my tips for your Assignment 6.
 
Tips for Assignment 6 (Classroom Lecture Sections A01, A02, etc.)
 
You will need to continue studying Lesson 10 in my book for this assignment, especially the Multiple Linear Regression section.
 
Note that the values you get for your coefficients and their test statistics in a multiple linear regression are likely to be different from the values you would get if you did a simple linear regression of y versus just one of the explanatory variables.  That is because a simple linear regression looks at the effect that one explanatory variable alone has on y, while a multiple linear regression looks at the effect a particular explanatory variable has on y while holding all the other explanatory variables constant (in a sense, filtering out the effects of the other explanatory variables).  In a simple linear regression, you can always find r, the correlation coefficient, by taking the square root of r-squared as given by JMP, but remember r can be positive or negative (r always has the same sign as b, the slope).  In multiple linear regression, r no longer has much meaning since the model is using several explanatory variables, but you could still compute it by taking the square root of r-squared as given by JMP.  In multiple linear regression, r is always considered to be positive since it cannot isolate the effect of any particular explanatory variable, and it is always possible that some of the explanatory variables have a negative association with y while others have a positive association.
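
If it helps to see the sign rule for simple linear regression written out, here is a minimal Python sketch.  Python is not part of the assignment (you will do everything in JMP), and the function name and the numbers below are made up purely for illustration.

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """Return r for a SIMPLE linear regression, given r-squared and the slope b."""
    r = math.sqrt(r_squared)
    # r always has the same sign as the slope b.
    return -r if slope < 0 else r

# Hypothetical values, not taken from the assignment data.
r_example = correlation_from_r_squared(0.64, -2.5)
print(round(r_example, 4))
```

Because the slope in this made-up example is negative, the square root of r-squared is given a negative sign.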
 
You will use JMP for question 1.  Open a "New Data Table" and copy and paste in the given data set.  If you are using JMP 8, be sure to select "Edit" and "Paste with Column Names".  Double-click the Duration, Speed, Height, and Length column names and make sure their Data Type is Numeric and their Modeling Type is Continuous.
 
Question 1(a): Remember, the model is written entirely with Greek letters.  Do not include any numbers when writing the model.
 
Question 1(b): Select "Analyze" then "Fit Model" and select Duration and click the "Y" button to make it a Y.  Select Speed, Height, and Length and click the "Add" button to add them as explanatory variables in the model.  Make sure the "Personality" drop-down list is set at Standard Least Squares.  If it is not, and it is not even available as an option, your data has been corrupted.  Go back to the data spreadsheet, double-click on each of Duration, Speed, Height, and Length and make sure their Data Type is Numeric and their Modeling Type is Continuous and try this again.  Click "Run Model" to have it perform the multiple linear regression.  Everything you need is in the Parameter Estimates.  (See my question 4 in Lesson 10 for an example of how to read the various outputs.)
 
Question 1(c): Remember to say that you are holding Speed and Height constant, while interpreting the effect Length has on Duration.
 
Question 1(d): Just sub the given values into the multiple regression equation JMP lists for you in the Parameter Estimates.
 
Question 1(e): As always, the residual is the Observed value of y minus the Predicted value of y.  Sub the given values of the Explanatory Variables for Scream Machine in the multiple linear regression equation to get your predicted value for Duration and subtract that from the actual Duration as listed in the data set.
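
Here is a rough Python sketch of the arithmetic in parts (d) and (e): sub the values into the equation to get the predicted Duration, then subtract that from the observed Duration.  Every number below is hypothetical (these are not the JMP estimates or the Scream Machine data), and the function name is my own invention for illustration.

```python
def predict(intercept, coefficients, values):
    """Predicted y = b0 + b1*x1 + b2*x2 + ... for one observation."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)

# Hypothetical estimates; read the real ones off the Parameter Estimates in JMP.
b0 = 10.0
b = {"Speed": 0.5, "Height": 0.2, "Length": 0.01}

# Hypothetical values of the explanatory variables and the observed Duration.
coaster = {"Speed": 60.0, "Height": 100.0, "Length": 3000.0}
observed_duration = 95.0

predicted_duration = predict(b0, b, coaster)
residual = observed_duration - predicted_duration  # Observed minus Predicted
print(round(predicted_duration, 2), round(residual, 2))
```

A positive residual means the observed value sits above what the model predicts; a negative residual means it sits below.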
 
Question 1(f): That is the standard deviation of the residuals.  You can read the Root Mean Square Error value off the Summary of Fit or square root the MSE value yourself from the ANOVA table.
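
As a quick sketch of the "square root the MSE" route, assuming the usual setup where MSE = SSE divided by the error degrees of freedom (n - p - 1 for p explanatory variables), the calculation looks like this in Python.  The sum of squares and sample size are made up for illustration.

```python
import math

def root_mean_square_error(sse, n, p):
    """s = sqrt(MSE), where MSE = SSE / (n - p - 1) for p explanatory variables."""
    df_error = n - p - 1
    mse = sse / df_error
    return math.sqrt(mse)

# Hypothetical ANOVA values: SSE = 160 with n = 20 observations and p = 3 explanatory variables.
s = root_mean_square_error(160.0, 20, 3)
print(round(s, 4))
```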
 
Question 1(g): This wants you to compute the value of R-squared, the coefficient of determination.  Recall: R-squared = SSM/SST.
 
Question 1(h): That is what the ANOVA F-test is doing.
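
To show how the ANOVA table ties parts (g) and (h) together, here is a small Python sketch that computes R-squared = SSM/SST and the F statistic = MSM/MSE from the sums of squares.  I am assuming the standard degrees of freedom (p for the model, n - p - 1 for error), and the sums of squares are invented for illustration.

```python
def anova_summary(ssm, sse, n, p):
    """Return (R-squared, F) from the model and error sums of squares."""
    sst = ssm + sse              # total sum of squares
    r_squared = ssm / sst        # coefficient of determination
    msm = ssm / p                # mean square for the model
    mse = sse / (n - p - 1)      # mean square for error
    f_stat = msm / mse           # ANOVA F statistic
    return r_squared, f_stat

# Hypothetical values: SSM = 240, SSE = 160, n = 20 observations, p = 3 explanatory variables.
r_squared, f_stat = anova_summary(240.0, 160.0, 20, 3)
print(round(r_squared, 4), round(f_stat, 4))
```

JMP reports the P-value for this F statistic in the ANOVA table, so you do not need to look it up yourself.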
 
Question 1(i): The one with the smallest P-value for its t-ratio as listed in the Parameter Estimates is most important (since it has the strongest evidence of a linear effect on Duration).
 
http://grantstutoring.com/
 
Tips for Assignment 6 (Distance/Online Sections D01, D02, etc.)
 
Study Lesson 9 in my study book to review the principles of Linear Regression, then study Lesson 10 at least up to the end of question 3 to prepare for this assignment.  You do not need to study the section on Multiple Linear Regression at this time.  Note that HW6, 7 and 8 will all deal with concepts from Lesson 10.
 
Question 1 is just an algebra problem.  They have given you a value for x, a value for y, and the slope, and you can use those to compute the intercept.  Note, they have written out the least-squares regression equation for you, and all you have to do is enter the values for the intercept and slope into the boxes.  Hint: You could actually use the formula that computes the intercept from the slope, subbing in the given values for x and y in the places where the formula calls for the mean values of x and y.
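
Here is the algebra as a tiny Python sketch, assuming the given point lies on the line so that intercept = y - slope * x.  The numbers are made up, not the ones from your question.

```python
def intercept_from_point(x, y, slope):
    """Solve y = intercept + slope * x for the intercept."""
    return y - slope * x

# Hypothetical values: the line has slope 2 and passes through (4, 11).
a = intercept_from_point(4.0, 11.0, 2.0)
print(a)  # 3.0
```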
 
Question 2 gives you all the info you need to compute the confidence intervals for the slope.  I give you the appropriate formula in Lesson 10.
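
As a rough sketch of that calculation, assuming the usual form of the interval (slope plus or minus t* times the standard error of the slope, with t* read from the t table at n - 2 degrees of freedom), the Python below does the arithmetic.  The slope, standard error and sample size are hypothetical; only the t* value of 2.101 (95% confidence, 18 degrees of freedom) is a real table value.

```python
def slope_confidence_interval(slope, se_slope, t_star):
    """Return (lower, upper) for slope +/- t_star * se_slope."""
    margin = t_star * se_slope
    return slope - margin, slope + margin

# Hypothetical example: b = 2.0, SE_b = 0.5, n = 20, so df = 18 and t* = 2.101 for 95% confidence.
lower, upper = slope_confidence_interval(2.0, 0.5, 2.101)
print(round(lower, 4), round(upper, 4))
```

If the interval does not contain 0, that agrees with rejecting the hypothesis of zero slope at the matching significance level.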
 
You will use JMP for question 3.  Open a "New Data Table" and create three columns.  Name the first column "Sex", the second column "Speed", and the third column "Stride rate".  Remember, to create a new column, simply double-click in the space at the top of the column, to the right of a pre-existing column.  Enter in your data, typing "female" or "male" as appropriate in the "Sex" column.  Obviously, enter in all the female data first, then all the male data.  Now, on the left of the spreadsheet where it numbers all the rows, click and drag to select all the rows that have "female" scores.  Now select "Rows" and "Markers" and choose whatever marker you want to represent the females.  Now, click and drag to select the "male" rows and select a marker to use for them.  Click in the top left corner of the spreadsheet (right above row 1) to deselect the rows and we are now ready to analyze the data.
 
Question 3(a) and (b):  Select "Analyze" then "Fit Y by X".  They never make it clear which is x and which is y in this problem, but it appears they want x to be speed and y to be stride rate, so select "Stride rate" and click "Y, Response" and select "Speed" and click "X, Factor".  Click OK.  You will now see a scatterplot with the two different markers plotted distinguishing the female and male scores.  Click the red triangle next to "Bivariate Fit ..." and select "Fit Line" to have JMP compute and graph the least-squares regression line.  Select and copy the printout and paste into a file ready for upload.
 
Question 3(c): Click the red triangle next to "Linear Fit" and select "Save Residuals".  JMP will now add a fourth column to your spreadsheet called "Residuals Stride rate".  Select and copy the entire data table (or just the residuals column) and paste into your file ready for upload.  They do not make it clear whether they actually want you to include the residuals in your upload, but why ask you to compute them then?
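
If you are curious what "Fit Line" and "Save Residuals" are doing behind the scenes, here is a minimal Python sketch of the least-squares formulas.  The speed and stride rate numbers are made up (they are not the assignment data), and JMP will do all of this for you.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: x is speed, y is stride rate.
speeds = [1.0, 2.0, 3.0, 4.0]
rates = [3.1, 4.9, 7.2, 8.8]

intercept, slope = least_squares(speeds, rates)
# Each residual is Observed y minus Predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(speeds, rates)]
print(round(intercept, 4), round(slope, 4))
print([round(e, 4) for e in residuals])
```

The residuals from a least-squares line always add up to zero (apart from rounding), which is a handy check.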
 
Question 3(d):  Click the red triangle next to "Linear Fit" and select "Plot Residuals".  I have no idea what they are getting at in this question.  You would expect to see some obvious pattern like the males tend to have positive residuals and the females have negative residuals, or something that makes the females look different from the males, but good luck seeing anything here.
 
Question 3(e):  JMP already did this test for you when you selected "Fit Line".  The ANOVA table and the "Parameter Estimates" for the "Stride rate" are giving you all the info you need, but be sure to write out your hypotheses and conclusion in the file you are uploading.  You can determine if there is a linear relationship by either testing the hypothesis about zero correlation or a hypothesis about zero slope.  JMP gives us the latter in the ANOVA and Parameter Estimates, so I would do the zero slope hypothesis.  I show you how to read these outputs in my question 3 of Lesson 10.
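
For the zero slope test, the t-ratio JMP reports is just the slope divided by its standard error, with n - 2 degrees of freedom.  Here is a hedged Python sketch of that calculation, assuming the standard formulas s = sqrt(SSE/(n - 2)) and SE of the slope = s / sqrt(Sxx).  The data are invented for illustration.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: slope = 0 in a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals, then the standard deviation of the residuals.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    return slope / se_slope, n - 2

# Hypothetical data, not the assignment data.
t_stat, df = slope_t_statistic([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])
print(round(t_stat, 2), df)
```

You would then compare the t-ratio to the t distribution with those degrees of freedom, which is what the P-value in the Parameter Estimates already does for you.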
 
Question 4(a):  Copy and paste your data into a "New Data Table", being sure to select "Edit" and "Paste with Column Names" if you are using JMP 8.  Select "Analyze" then "Distribution", highlight both columns and click "Y, Columns" then click OK.  The "Moments" give you the means and standard deviations they request.
 
Question 4(b):  Select "Analyze" then "Fit Y by X".  Assign x and y as they have indicated in part (a).  Click OK.  Click the red triangle next to "Bivariate Fit ..." and select "Fit Line" to have JMP compute and graph the least-squares regression line.  You will see the least-squares regression equation directly below "Linear Fit".  I assume they want the t statistic for the correlation, which is also the t statistic for the slope, which you can read off the "Parameter Estimates".  (See my question 3 in Lesson 10 for how to read the printouts.)  Note, JMP gives us the coefficient of determination, r-squared, which we can easily change into r.  Remember, r always has the same sign as the slope.
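
If you want to check that the t statistic for the correlation matches the t-ratio for the slope, the standard formula is t = r * sqrt(n - 2) / sqrt(1 - r-squared).  Here is a small Python sketch with a made-up r and n, not the values from your data.

```python
import math

def t_from_correlation(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical values: r = 0.6 from n = 18 observations.
t_corr = t_from_correlation(0.6, 18)
print(round(t_corr, 4))
```

With your own r and n, this should agree (up to rounding) with the t-ratio JMP lists for the slope in the "Parameter Estimates".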
 
Question 5:  Use the same approach used in question 4 to get all the info they request.  Make sure you think about which is x and which is y in this problem (they pretty much spell it out in part (c)).  Note, you will use the "Parameter Estimates" to get the slope and its standard error, but then finish computing the confidence interval yourself.