Stat 1000: Tips for Assignment 2
Published: Sun, 01/29/12
Please note that my first midterm exam prep seminar for Stat 1000 will be on Saturday, Feb. 4, in room 100 St. Paul's College, from 9 am to 9 pm. I am now ready to take registrations. Please click this link for more information about the seminar and to sign up if you are interested:
Join Grant's Tutoring on Facebook or follow Grant on Twitter.
Simply go to www.grantstutoring.com and click the Facebook and/or Twitter icons.
If you ever want to look back over a previous tip I have sent, do note that all my tips can be found in my archive. Click this link to go straight to my archive:
Did you miss my Tips on How to Do Well in this Course? Click here
Did you miss my Tips for Assignment 1? Click here
If you are taking the course by Distance/Online (Sections D01, D02, etc.), click here for my tips for your Assignment 2.
If you are taking the course by classroom lecture (Sections A01, A02, etc.), click here for my tips for your Assignment 2.
You should study Lesson 2: Regression and Correlation and Lesson 3: Designing Samples and Experiments in the current edition of my study book to prepare for this assignment. Lesson 2 teaches the concepts for questions 1 and 2. Lesson 3 teaches the concepts for questions 3, 4 and 5. If you are using an older edition of my book, note that these are Lessons 3 and 4 in older editions.
Question 1 is supposed to be done by hand, but why not get JMP to do it for you first (see my steps in question 2 below for how to do linear regression in JMP)? Then you can just copy out by hand the scatterplot JMP makes for you. I think you will find my Lesson 2, question 1 very helpful in understanding how to do this question.
I am certain that, even though you are supposed to do it by hand, you are still allowed to use the Stat mode on your calculator to compute the mean and standard deviation of both variables to assist you in the computation of r, a and b. You should clarify this with your prof, but surely they are not going to make you work out the means and standard deviations by hand also. Follow the steps in the Appendix of my book showing you how to enter x,y data pairs into your calculator in Linear Regression mode. I show you how the calculator gives you r, a and b, but your calculator also gives you x-bar, y-bar, Sx, and Sy. Just press the appropriate buttons. For example, on Sharps, you press "RCL 4" to get x-bar, "RCL 5" to get Sx, "RCL 7" to get y-bar, and "RCL 8" to get Sy. Record every single decimal place the calculator gives you to ensure your computations are accurate.
1(a). Read my tips when I teach Lesson 2, question 1(b) in my book to make sure you make the scatterplot correctly. You may also want to have JMP help you here.
1(b). To compute the correlation coefficient by hand, follow my example in Lesson 2, question 1, part (c). Note, you are not given the means and standard deviations for x and y already, so I am sure you are allowed to use the Linear Regression Stat Mode on your calculator to tell you the means and standard deviations of both x and y. Put your calculator in Linear Regression Stat Mode (see Appendix D of my book). After you enter all the (x,y) data points, you can ask it for the mean and standard deviation of the x values and the mean and standard deviation of the y values. For example, Sharps use "RCL 4" to get x-bar and "RCL 7" to get y-bar. "RCL 5" gives you Sx and "RCL 8" gives you Sy.
Record every single decimal place your calculator gives you for each calculation, or else your answers won't be accurate enough. Of course, your calculator actually tells you the value of r, so you can use that as a check.
When they ask, "What does this value tell us?" I assume they want you to interpret the value of r.
1(c). Use the formulas I show you in question 1(e) of my book, Lesson 2, to compute a and b (also given on page 1 of my book on the formula sheet). Of course, you can compare the answers you get with the values your calculator gives you in the Linear Regression Stat mode.
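If you like to check your hand work for parts (b) and (c) with a bit of code, the arithmetic can be sketched in a few lines of Python. This is only a sketch under assumptions: the data pairs are made up for illustration (they are not the assignment data), and it assumes the usual textbook formulas, where r is the sum of the products of the z-scores divided by n - 1, b = r(Sy/Sx) is the slope, and a = y-bar - b(x-bar) is the intercept. Check these against the formula sheet before relying on them.

```python
import math

# Made-up (x, y) pairs for illustration only; NOT the assignment data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Means of x and y
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (divide by n - 1, as the calculator's Sx and Sy do)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation: sum of the products of the z-scores, divided by n - 1
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

# Assumed convention: y-hat = a + b*x, with b the slope and a the intercept
b = r * s_y / s_x
a = y_bar - b * x_bar

print("r =", r)
print("a =", a, " b =", b)
```

Nothing is rounded until the final print, which mirrors the advice to keep every decimal place your calculator gives you.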
Here is how you can use JMP to do linear regression. First
copy and paste the data into a New Data Table the usual way (see my
previous homework tips if you are not sure how to paste the data). If
you have to type the data in manually, simply double-click the space to
the right of "Column 1" to create "Column 2". Enter the X data down
column 1 and the Y data down column 2. Be sure to double-click each
column to give it an appropriate name and to ensure the Data Type is
Numeric and the Modeling Type is Continuous.
Select Analyze, then Fit Y By X. Highlight
the column you have determined should be X, and click the X, Factor
button. Highlight the column you have determined should be Y and click
the Y, Response button. Click OK.
You should now see a scatterplot. Click the red triangle next to "Bivariate Fit" and select "Density Ellipse, .99". A stupid ellipse shows up on your scatterplot that you don't want, but you will also see an output called "Correlation" show up below the scatterplot. Click the blue triangle next to that to open it up. It shows you the mean and standard deviation of x and y, and also shows you r, the correlation. Click the red triangle under the scatterplot where it says "Bivariate Normal Ellipse" and deselect "Line of Fit" to remove that stupid ellipse from your scatterplot.
Click the red triangle
above the scatterplot and select Fit Line and JMP will draw in the
least-squares regression line. Note, it shows you the regression
equation directly below the scatterplot. JMP also shows you the value
of r-squared (the coefficient of determination), rather than r, the
correlation coefficient. Remember, the coefficient of determination is
the percentage of y's variation explained by the regression equation.
You can always take the square root of this number to get the size of r,
the correlation coefficient, but use your scatterplot to help you decide
if r is negative or positive because the square root can't tell you that
(r always has the same sign as the slope of the regression line).
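The step of going from r-squared back to r can be sketched in Python as follows. The function name and the numbers are made up for illustration; the only fact it relies on is that r has the same sign as the slope of the least-squares line.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Return r given r-squared and the slope of the fitted line.

    The square root only gives the size of r; the sign is taken
    from the slope (negative slope means negative r).
    """
    magnitude = math.sqrt(r_squared)
    return -magnitude if slope < 0 else magnitude

# Hypothetical JMP output: r-squared of 0.64 with a downward-sloping line
print(r_from_r_squared(0.64, slope=-1.5))
```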
If you want to get rid of anything, click the red triangle
and deselect anything you don't want to see. Note, if you click the
blue triangle next to something, that will make part of the output
disappear as well, if you wish. Just click the blue triangle again to
make it reappear.
Use JMP as I show above to answer question 2. Be sure to read my question 1(a) for tips on how to identify the explanatory and response variables. Note that JMP does not answer part (e); you have to compute the residual yourself (see my question 1 for examples of all these things). Also, take a look at my question 3 for key concepts about the correlation coefficient and question 8 for a discussion of influential observations.
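Since JMP will not hand you the residual, here is a minimal Python sketch of that calculation: the residual is the observed y minus the predicted y, where the predicted y comes from plugging x into the regression equation. The intercept, slope and data point are made-up values (not the assignment's), and the sketch assumes the equation is written as y-hat = a + b*x.

```python
def residual(a, b, x_obs, y_obs):
    """Observed y minus the y predicted by the line y-hat = a + b*x."""
    y_hat = a + b * x_obs
    return y_obs - y_hat

# Hypothetical intercept, slope and data point, for illustration only.
a, b = 2.2, 0.6
res = residual(a, b, x_obs=4.0, y_obs=4.0)
print(round(res, 4))
```

A negative residual means the observed point lies below the regression line; a positive one means it lies above.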
Question 3 is similar to my question 7, in Lesson 3.
Continue to study Lesson 1 in my study book (if you have it) to learn the concepts involved in HW 02.
Ignore any references to JMP 6SE or Crunchit!.
You are using JMP 8 in this course. The assignment is just an old
assignment that they forgot to update. Use JMP 8 anytime they tell you
to use computer stuff.
Question 3 should be done manually. When they say to enter the answers correct to 0.1, they mean round your answers off to one decimal place.
Question 4 should be done manually. Be sure to read the Appendix at the back of my book to learn how to use
Stat Mode in your calculator to compute a mean and standard deviation
quickly. By "nearest decimal place", they mean round your answers off
to one decimal place.
Question 5 (the IQ and GPA question):
Click the link to the data file, then select and copy
the entire data set (you can press "Ctrl A" on your keyboard to select
all, then press "Ctrl C" to copy it all). Having opened a "New Data
Table" in JMP, select "Edit" then "Paste with Column Names" to paste the
data in. Double-click the "iq" column name at top and confirm that JMP
has the "Data Type" as "Numeric" and the "Modeling Type" as
"Continuous", changing those settings in the drop-down list if
necessary. Click OK. Do the same for the "gpa" column. Important:
Double-click the "gender" column and make sure that JMP has the "Data
Type" as "Character" (it probably doesn't) and the "Modeling Type" as
"Nominal" (it probably doesn't), changing those settings in the
drop-down list if necessary. Click OK. Finally, take a look at
the last row of data that has been pasted into JMP. If it just shows a
bunch of dots instead of numbers, click that row to highlight it then
right-click and select "delete rows" to delete that row. Of course, do
not delete any row that has numbers (data) in it!
To find the mean, standard deviation and median in part (a):
Select "Analyze" then "Distribution".
Highlight "iq" in the pop-up menu and click the "Y, Columns" button.
Click OK. You are then taken to a screen that shows a histogram among
other things. You will find the mean and standard deviation in the
"Moments" section and the median in the "Quantiles" section.
To make the boxplots and histogram in part (b): In the toolbar
at the top of your data spreadsheet, select "Analyze" then
"Distribution". Select the "gpa" column and click the "Y, Columns"
button. Click OK. Your histogram appears sideways but they didn't ask
you to switch it horizontally, so don't bother. If they want to see it
the typical way (and they will request that if they want it), click the
red triangle next to your variable above the histogram and select
Histogram Options from the drop-down menu. Select Horizontal Layout.
Click the red triangle next to "gpa" and select "quantile boxplot" (if
it isn't checked already) and "outlier boxplot" as well to get the
desired boxplots. Click the blue triangles next to "Quantiles" and
"Moments" to hide that stuff, then "select all" (press "Ctrl A" on your
keyboard) and then "copy" (press "Ctrl C"). Paste it into your document.
Be sure to type in your answers to the question they ask in part (b)
underneath the graphs you pasted into your document. Remember how
skewness and/or outliers affect a mean and median: the mean gets pulled
toward a long tail or an outlier, while the median is much less affected.
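To see the effect of an outlier on the mean and median for yourself, here is a small Python sketch using the standard library's statistics module. The numbers are invented for illustration and have nothing to do with the IQ/GPA data file.

```python
import statistics

# Made-up values for illustration only; NOT the assignment data.
values = [10, 12, 13, 14, 16]
with_outlier = values + [60]  # add one extreme high value

for label, data in [("original", values), ("with outlier", with_outlier)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    print(f"{label}: mean={mean:.2f}, median={median:.2f}, sd={sd:.2f}")

# How far each measure of centre moved when the outlier was added
mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)
print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift:.2f}")
```

The mean moves much further than the median, which is the pattern to look for when you compare the two in your histogram and boxplots.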
To make the side-by-side boxplots in part (c): Back in your data spreadsheet, select "Analyze" then "Fit Y By X".
Highlight "gpa" and click "Y, Response". Highlight "gender" and click
"X, Factor". Click OK. Now click the red triangle and select "Display
Options", then select "Box Plots" to get your side-by-side boxplots.
Select all and copy and paste into the same document you already have in
part (b). Make sure you type your answer to their question below these
boxplots in your document. You can now save the file and upload it
into Web Assign.
Question 6 should be done manually. Read my section in Lesson 1 on "The Effect of Changing Units on Centre
and Spread" to properly prepare for this question.
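The idea behind changing units can be sketched in Python. For a change of units of the form new = a + b(old), the mean is transformed the same way as the data, while the standard deviation is only multiplied by the size of b; adding a constant shifts the centre but does not change the spread. The Celsius temperatures are made up for illustration, and the a and b here are the constants of the unit change, not the regression intercept and slope.

```python
import statistics

# Made-up temperatures in Celsius, for illustration only.
celsius = [10.0, 20.0, 30.0]

# Change of units: F = 32 + 1.8 * C
fahrenheit = [32 + 1.8 * c for c in celsius]

mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

print("Celsius:    mean =", mean_c, " sd =", sd_c)
print("Fahrenheit: mean =", mean_f, " sd =", sd_f)

# The mean follows the full conversion; the sd is only multiplied by 1.8
print("32 + 1.8 * mean_c =", 32 + 1.8 * mean_c)
print("1.8 * sd_c        =", 1.8 * sd_c)
```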