Stat 2000 Tips for Assignment 6
Published: Mon, 11/29/10

Hi,
You are receiving this e-mail because you indicated, when you signed up for Grant's Updates, that you are taking Stat 2000 this term. If, in fact, you are not taking Stat 2000, please reply to this e-mail and let me know, and I will fix that.
If you missed the final exam seminar I did on Nov. 28, I am trying to set up an audio podcast where you can listen to the entire seminar at your convenience for a reasonable fee. Watch for a future e-mail alerting you to this once I have the podcast up and running.
Throughout the term I will send you all sorts of tips to help you study and learn the course. You have probably already done so, but if not, I strongly recommend you purchase my Basic Stats 2 Study Book. You will find it a great resource for learning the course. I pride myself on explaining things in clear, everyday language. I also provide numerous examples of all the key concepts with step-by-step solutions. You can order my book at the UMSU Digital Copy Centre in University Centre on the UM campus. They make the book to order, so please allow one business day. The book is split into two volumes, and each volume costs $45 + tax.
If you ever want to look back over a previous tip I have sent, do note that all my tips can be found in my archive. Click this link to go straight to my archive:
Tips for Assignment 6: Simple and Multiple Linear Regression
Study Lesson 10 in my book, if you have it, to prepare for this assignment.
Question 1 requires you to finish filling in the ANOVA table, as I do in question 3 of Lesson 10 in my book. Note: to get the P-value, you will use Table E and list the bounds, just like you did back in Lesson 5 of my book.
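If you want to check your arithmetic, here is a minimal Python sketch of how the missing entries of a regression ANOVA table fit together. It assumes you know SSM (the model or regression sum of squares; some texts call it SSR), SSE, the sample size n and the number of explanatory variables p. The function name and the numbers are hypothetical, not from the assignment. It does not compute the P-value, since you get that by listing the bounds from Table E.

```python
def regression_anova(ssm, sse, n, p=1):
    """Fill in a regression ANOVA table from SSM, SSE, n and p (hypothetical helper)."""
    df_model = p            # one df per explanatory variable
    df_error = n - p - 1    # n minus the number of estimated coefficients
    df_total = n - 1
    sst = ssm + sse         # SST = SSM + SSE
    msm = ssm / df_model    # mean square for the model
    mse = sse / df_error    # mean square error
    return {
        "df": (df_model, df_error, df_total),
        "SS": (ssm, sse, sst),
        "MS": (msm, mse),
        "F": msm / mse,     # F statistic = MSM / MSE
        "R2": ssm / sst,    # coefficient of determination
    }


# Hypothetical numbers, not the assignment's data.
table = regression_anova(ssm=120.0, sse=80.0, n=22, p=1)
print(table["df"], table["F"], table["R2"])  # (1, 20, 21) 30.0 0.6
```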
Question 2 gives you all the info you need to compute the test statistic for zero correlation. Part b is all about the reliability of your samples.
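For reference, the usual test statistic for zero correlation (H0: rho = 0) is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. A small Python sketch, using hypothetical values of r and n:

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, returned with its n - 2 degrees of freedom."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


t, df = correlation_t(r=0.5, n=27)  # hypothetical values
print(round(t, 3), df)  # 2.887 25
```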
You will use JMP for question 3. Open a "New Data Table" and create two columns. Name the first column "Diameter" and the second column "Height". Remember, to create a new column, simply double-click in the empty space at the top of the table, just to the right of an existing column.
Question 3(a) to (e): Select "Analyze" then "Fit Y by X". If you skim through all the parts of the question, it is quite clear which is x and which is y in this problem. Highlight the appropriate variable and click "Y, Response" and highlight the appropriate variable and click "X, Factor". Click OK. You will now see a scatterplot.
Click the red triangle next to "Bivariate Fit..." at the top, select "Density Ellipse", and then select ".99" (it doesn't matter which percentage you select, because we only want the summary statistics that come with it). You will then see a section called "Correlation" appear in the output. Click the blue triangle next to it to open that part of the output, and you will see the means and standard deviations for both x and y as well as the correlation coefficient.
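To see what JMP is computing in the Correlation section, here is a minimal Python sketch of the means, sample standard deviations and correlation coefficient. The data are made up and the function name is hypothetical; it simply applies the textbook formulas.

```python
import math

def summary_and_r(x, y):
    """Means, sample standard deviations and correlation r (hypothetical helper)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # sample standard deviations divide by n - 1
    sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    # r = sum of products of standardized values, divided by n - 1
    r = sum((xi - xbar) / sx * (yi - ybar) / sy for xi, yi in zip(x, y)) / (n - 1)
    return xbar, ybar, sx, sy, r


xbar, ybar, sx, sy, r = summary_and_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))  # 0.7746
```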
The nuisance is that an unwanted ellipse will also show up on your scatterplot. Directly below the scatterplot you will see a red triangle next to "Bivariate Normal Ellipse". Click that red triangle and deselect "Line of Fit" (i.e., remove the check mark next to it) to get rid of the ellipse on the graph.
Click the red triangle next to "Bivariate Fit..." again and select "Fit Line" to have JMP compute and graph the least-squares regression line. Note that by the "regression coefficient" they mean the slope of the line. The Linear Fit output does not show the correlation (you already have it from the Density Ellipse output), but it does give you the coefficient of determination, r-squared, so it is also a simple matter to compute r: take the square root and give it the same sign as the slope. To answer parts (d) and (e), click the red triangle next to "Linear Fit" and select "Confid Curve Indiv" and "Confid Curve Fit". These give you the prediction intervals for individual values and the confidence intervals for the mean response, respectively. I also discuss these in question 8 of Lesson 10 in my study book. Select and copy the printout and paste it into a file ready for upload.
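If you want to see where the Fit Line numbers come from, or finish an interval by hand, the sketch below computes the least-squares slope and intercept, r-squared, r (with the sign of the slope), and the confidence interval for the mean response and the prediction interval at a chosen x. These are hypothetical helpers, not a JMP feature, and the data are made up. The critical value t* must be read from your t table with n - 2 degrees of freedom (3.182 below is the 95% value for 3 df).

```python
import math

def fit_line(x, y):
    """Least-squares line y-hat = a + b*x plus the pieces needed for intervals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                         # slope (the "regression coefficient")
    a = ybar - b * xbar                   # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sse / sst                    # coefficient of determination
    r = math.copysign(math.sqrt(r2), b)   # r takes the sign of the slope
    s = math.sqrt(sse / (n - 2))          # regression standard error
    return {"a": a, "b": b, "r2": r2, "r": r, "s": s,
            "n": n, "xbar": xbar, "sxx": sxx}

def intervals(fit, x_star, t_star):
    """Return (CI for the mean response, prediction interval) at x_star."""
    y_hat = fit["a"] + fit["b"] * x_star
    lever = 1 / fit["n"] + (x_star - fit["xbar"]) ** 2 / fit["sxx"]
    se_mean = fit["s"] * math.sqrt(lever)      # SE for the mean response
    se_pred = fit["s"] * math.sqrt(1 + lever)  # SE for an individual value
    return ((y_hat - t_star * se_mean, y_hat + t_star * se_mean),
            (y_hat - t_star * se_pred, y_hat + t_star * se_pred))


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
fit = fit_line(x, y)
print(round(fit["a"], 3), round(fit["b"], 3), round(fit["r2"], 3))  # 2.2 0.6 0.6
ci, pi = intervals(fit, x_star=4, t_star=3.182)
print("CI for mean:", ci, "prediction interval:", pi)
```

The prediction interval is always wider than the confidence interval at the same x, because it adds the variability of a single new observation (the extra 1 under the square root).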
Question 3(f): Use the printout to answer these questions as I teach in Lesson 10.
Question 3(g) and (h): These are similar to my questions 3(f) and (g). You have to finish these by hand, using the printouts to get a head start.
Question 3(i): I show you how to get this in my question 3(a), and how to interpret it in my question 1(d) in Lesson 9.
Note that the values you get for your coefficients and their test statistics in a multiple linear regression are likely to be different from the values you would get if you did a simple linear regression of y on just one of the explanatory variables. That is because a simple linear regression looks at the effect that one explanatory variable alone has on y, while a multiple linear regression looks at the effect a particular explanatory variable has on y while holding all the other explanatory variables constant (in a sense, filtering out the effects of the other explanatory variables).
In a simple linear regression, you can always find r, the correlation coefficient, by taking the square root of r-squared as given by JMP, but remember that r can be positive or negative (r always has the same sign as b, the slope). In multiple linear regression, r no longer has much meaning, since the model uses several explanatory variables, but you can still compute it by taking the square root of r-squared as given by JMP. In multiple linear regression, r is always taken to be positive, because it cannot isolate the effect of any particular explanatory variable, and some of the explanatory variables may have a negative association with y while others have a positive association.
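To see concretely why a coefficient changes when another explanatory variable is added, here is a minimal Python sketch that fits y on x1 alone and then on x1 and x2 together, using the standard two-predictor least-squares formulas on centered sums of squares and cross-products. The data and function names are hypothetical; x1 and x2 are deliberately correlated, which is what makes the two slopes for x1 differ.

```python
import math

def cross(u, v):
    """Centered sum of cross-products: sum of (u - ubar) * (v - vbar)."""
    ubar, vbar = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))

def simple_slope(x, y):
    """Slope from a simple linear regression of y on x alone."""
    return cross(x, y) / cross(x, x)

def two_predictor_fit(x1, x2, y):
    """Slopes and R-squared for y on x1 and x2 together."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det        # effect of x1 holding x2 constant
    b2 = (s2y * s11 - s1y * s12) / det        # effect of x2 holding x1 constant
    r2 = (b1 * s1y + b2 * s2y) / cross(y, y)  # SSM / SST
    return b1, b2, r2


x1 = [1, 2, 3, 4, 5]      # hypothetical data
x2 = [2, 1, 4, 3, 6]
y = [3, 4, 8, 7, 12]
b1, b2, r2 = two_predictor_fit(x1, x2, y)
print(round(simple_slope(x1, y), 4), round(b1, 4))  # 2.1 1.0167
print("multiple R:", math.sqrt(r2))  # a square root, so never negative
```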
You will use JMP for question 4. Open a "New Data Table" and copy and paste in the given data set. If you are using JMP 8, be sure to select "Edit" and "Paste with Column Names".
Question 4(a): Select "Analyze" then "Multivariate Methods" then "Multivariate". Select the GPA, IQ and Concept columns, click the "Y, Columns" button to make them all Y columns, then click OK. That takes you to output showing a correlation matrix, where you can read off the desired correlations. Note that when they ask for the proportion of total variation, they are asking for the coefficient of determination (see my Lesson 9, question 1 part (d) for a discussion of the coefficient of determination).
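The correlation matrix is just every pairwise correlation laid out in a grid, and squaring an entry gives the proportion of variation explained by that one variable. Here is a minimal Python sketch; the column names follow the assignment, but the values are made up and the function names are hypothetical.

```python
import math

def corr(u, v):
    """Pearson correlation from centered sums of squares and cross-products."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def correlation_matrix(columns):
    """Every pairwise correlation, keyed by (row name, column name)."""
    return {(p, q): corr(columns[p], columns[q]) for p in columns for q in columns}


data = {  # hypothetical values, not the assignment's data set
    "GPA": [2.0, 3.0, 3.5, 2.5, 4.0],
    "IQ": [100, 110, 120, 105, 125],
    "Concept": [50, 60, 55, 45, 70],
}
m = correlation_matrix(data)
for p in data:
    print(p, [round(m[(p, q)], 3) for q in data])
r = m[("GPA", "IQ")]
print("proportion of variation in GPA explained by IQ:", round(r ** 2, 3))
```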
Question 4(b): Select "Analyze" then "Fit Model", select GPA and click the "Y" button to make it the response. Select both IQ and Concept and click the "Add" button to add them as explanatory variables in the model. Make sure the "Personality" drop-down list is set to Standard Least Squares. If it is not, and Standard Least Squares is not even available as an option, your data probably did not come in as numbers. Go back to the data table, double-click on each of GPA, IQ and Concept, make sure the Data Type is Numeric and the Modeling Type is Continuous, and try again. Click "Run Model" to have JMP perform the multiple linear regression. Everything you need is in the Parameter Estimates. (See my question 4 in Lesson 10 for an example of how to read the various outputs.)
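If you are curious what the Parameter Estimates table contains for a model with two explanatory variables, the sketch below computes each estimate, its standard error and its t ratio (estimate divided by standard error, with n - 3 error degrees of freedom). It is a hand-rolled illustration using the two-predictor least-squares formulas, not JMP's own code; the data and names are hypothetical, with x1 and x2 standing in for IQ and Concept. The t ratio for a slope is the kind of value used in a t test for that coefficient.

```python
import math

def parameter_estimates(x1, x2, y):
    """Estimates, standard errors and t ratios for y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    sse = sum((yi - (b0 + b1 * u + b2 * v)) ** 2 for u, v, yi in zip(x1, x2, y))
    df_error = n - 3                  # n minus 3 estimated coefficients
    mse = sse / df_error
    se1 = math.sqrt(mse * s22 / det)  # from the inverse of the centered X'X
    se2 = math.sqrt(mse * s11 / det)
    return {"Intercept": b0,
            "x1": (b1, se1, b1 / se1),
            "x2": (b2, se2, b2 / se2),
            "df_error": df_error}


est = parameter_estimates([1, 2, 3, 4, 5], [2, 1, 4, 3, 6], [3, 4, 8, 7, 12])
for name in ("x1", "x2"):
    b, se, t = est[name]
    print(name, round(b, 4), round(se, 4), round(t, 3))
```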
Question 4(c): They want the coefficient of determination you already gave in part (a). The additional percentage is just the difference between that coefficient of determination and the new coefficient of determination of your multiple linear regression model (given in the Summary of Fit).
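The arithmetic for the additional percentage is a single subtraction; here is a tiny Python sketch with hypothetical values (a correlation of 0.6 for the one-variable model and an r-squared of 0.5 from the Summary of Fit). The function name is made up for illustration.

```python
def additional_variation(r_one_variable, r2_multiple):
    """Extra proportion of variation explained by adding the second variable."""
    r2_one = r_one_variable ** 2      # coefficient of determination, one variable
    return r2_multiple - r2_one


extra = additional_variation(r_one_variable=0.6, r2_multiple=0.5)  # hypothetical
print(f"{extra:.0%}")  # 14%
```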
Question 4(d): I think they want you to do the t test for the Concept coefficient here, which is given in the Parameter Estimates.
Question 5: Just read off the appropriate values from the given tables. Note that (h) is just a very tricky way of asking you for the confidence interval for the appropriate coefficient (slope). Recall the formula to compute the coefficient of determination from an ANOVA table (see my Lesson 10, question 3 part (a)).
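For part (h) and the coefficient-of-determination formula, the arithmetic is short. Here is a minimal Python sketch with hypothetical numbers: the coefficient, its standard error and the sums of squares are made up, and you should read your own t* from the t table using the error degrees of freedom.

```python
def slope_ci(b, se_b, t_star):
    """Confidence interval b +/- t* SE_b for a regression coefficient."""
    margin = t_star * se_b
    return b - margin, b + margin

def r2_from_anova(ss_model, ss_total):
    """Coefficient of determination: model sum of squares over total."""
    return ss_model / ss_total


# Hypothetical values; t* = 2.086 is the 95% critical value for 20 df.
low, high = slope_ci(b=2.5, se_b=0.5, t_star=2.086)
print(round(low, 3), round(high, 3))  # 1.457 3.543
print(r2_from_anova(ss_model=120.0, ss_total=200.0))  # 0.6
```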
Question 6: Copy and paste the data into JMP just as in question 4, then perform a multiple linear regression using "Fit Model" as shown in question 4 above. Parts b, c and d are asking you for the relevant outputs in the JMP tables. Parts a, e and f you are doing by hand.