Stat 1000: Tips for Assignment 2
Published: Fri, 10/07/11
Hi,
You are receiving this email because you indicated when you signed up for Grant's Updates that you are taking Stat 1000 this term. If in fact, you do not want to receive tips for Stat 1000, please reply to this email and let me know.
Please note that my first midterm exam prep seminar for Stat 1000 will be on Saturday, Oct. 8, from 9 am to 9 pm (Thanksgiving Weekend). Sorry about that, but the first exam is the next week, so there is nothing else I can do. If you would like complete info, and/or would like to register for the seminar, please click this link:
Join Grant's Tutoring on Facebook or follow Grant on Twitter.
Simply go to www.grantstutoring.com and click the Facebook and/or Twitter icons.
If you ever want to look back over a previous tip I have sent, do note that all my tips can be found in my archive. Click this link to go straight to my archive:
Did you miss my Tips on How to Do Well in this Course? Click here
Did you miss my Tips for Assignment 1? Click here
If you are taking the course by Distance/Online (Sections D01, D02, etc.), click here for my tips for your Assignment 2.
If you are taking the course by classroom lecture (Sections A01, A02, etc.), click here for my tips for your Assignment 2.
You should study Lesson 2: Regression and Correlation and Lesson 3: Designing Samples and Experiments in the current edition of my study book to prepare for this assignment. Lesson 2 teaches the concepts for questions 1 and 2. Lesson 3 teaches the concepts for questions 3, 4 and 5. If you are using an older edition of my book, note that these are Lessons 3 and 4 in older editions.
Question 1 is supposed to be done by hand, but why not get JMP to do it for you? Then you can just copy out by hand the scatterplot JMP makes for you. I think you will find my Lesson 2, question 1 very helpful in understanding how to do this question.
I am certain that, even though you are supposed to do it by hand, you are still allowed to use the Stat mode on your calculator to compute the mean and standard deviation of both variables to assist you in the computation of r, a and b. You should clarify this with your prof, but surely they are not going to make you work out the means and standard deviations by hand also. Follow the steps in the Appendix of my book showing you how to enter x,y data pairs into your calculator in Linear Regression mode. I show you how the calculator gives you r, a and b, but your calculator also gives you x-bar, y-bar, Sx, and Sy. Just click the appropriate buttons. For example, on Sharps, you click "RCL 4" to get x-bar, "RCL 5" to get Sx, "RCL 7" to get y-bar, and "RCL 8" to get Sy. Record every single decimal place the calculator gives you to ensure your computations are accurate.
1(a). Read my tips when I teach Lesson 2, question 1(a) in my book to make sure you correctly establish who is x and who is y.
1(b). Read my tips when I teach Lesson 2, question 1(b) in my book to make sure you make the scatterplot correctly. You may also want to have JMP help you here.
1(c). To compute the correlation coefficient by hand, follow my
example in Lesson 2, question 1, part (c). Note, you are not given
the means and standard deviations for x and y already, so I am sure you are allowed to use the Linear Regression Stat Mode on your
calculator to tell you the means and standard deviations of both x and
y. Put your calculator in Linear Regression Stat Mode (see Appendix D
of my book). After you enter all the (x,y) data points, you can ask it for the mean
and standard deviation of the x values and the mean and standard
deviation of the y values. For example, Sharps use "RCL 4" to get x-bar
and "RCL 7" to get y-bar. "RCL 5" gives you Sx and "RCL 8" gives you
Sy.
Record every single decimal place your calculator gives you for each
calculation, or else your answers won't be accurate enough. Of course, your calculator actually tells you the value of r, so you
can use that as a check.
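If you want one more way to check your hand computation of r, here is a minimal sketch in Python. It assumes the usual formula where r is the sum of the products of the standardized x and y values, divided by n - 1. The data pairs are made up for illustration and are not the assignment data.

```python
import math

# Hypothetical (x, y) pairs for illustration only, NOT the assignment data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Means of x and y (what the calculator reports as x-bar and y-bar).
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (divide by n - 1), i.e. Sx and Sy.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# r = (1 / (n - 1)) * sum of (standardized x) * (standardized y).
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

print("x-bar:", x_bar, "Sx:", s_x)
print("y-bar:", y_bar, "Sy:", s_y)
print("r:", r)
```

Notice that nothing is rounded until the very end, which is the same reason you should keep every decimal place your calculator gives you.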
1(d). Use the formulas I show you in question 1(e) of my book, Lesson 2, to compute a and b (also given on page 1 of my book on the formula sheet). Of course, you can compare the answers you get with the values your calculator gives you in the Linear Regression Stat mode.
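As a rough check on part (d), here is a small Python sketch. It assumes the standard least-squares formulas for the line y-hat = a + bx, namely b = r(Sy/Sx) for the slope and a = y-bar - b(x-bar) for the intercept. Confirm these match the formula sheet before relying on them. The summary values are hypothetical, not from the assignment.

```python
import math

# Hypothetical summary statistics for illustration only.
x_bar = 3.0               # mean of x
y_bar = 4.0               # mean of y
s_x = math.sqrt(2.5)      # sample standard deviation of x
s_y = math.sqrt(1.5)      # sample standard deviation of y
r = 6 / math.sqrt(60)     # correlation coefficient

# Slope: b = r * (Sy / Sx)
b = r * (s_y / s_x)

# Intercept: a = y-bar - b * x-bar
a = y_bar - b * x_bar

print("slope b:", b)
print("intercept a:", a)
```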
1(e). I show you how to compute a residual in 1(j) of my book, Lesson 2. I think all they mean by "what does the sign tell you?" is: was what really happened higher or lower than you predicted?
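To see the idea of a residual and its sign in action, here is a short Python sketch. It assumes the usual definition, residual = observed y minus predicted y. The function name and all the numbers are hypothetical.

```python
def residual(a, b, x, y_observed):
    """Return observed y minus the y predicted by the line y-hat = a + b*x."""
    y_predicted = a + b * x
    return y_observed - y_predicted

# Hypothetical intercept, slope, and data point for illustration only.
a, b = 2.2, 0.6
res = residual(a, b, x=4.0, y_observed=4.0)

print("residual:", res)
if res > 0:
    print("The observed y was higher than predicted.")
elif res < 0:
    print("The observed y was lower than predicted.")
else:
    print("The observed y was exactly as predicted.")
```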
Here is how you can use JMP to do Linear Regression:
First
copy and paste the data into a New Data Table the usual way (see my
previous homework tips if you are not sure how to paste the data). If
you have to type the data in manually, simply double-click the space to
the right of "Column 1" to create "Column 2". Enter the X data down
column 1 and the Y data down column 2. Be sure to double-click each
column to give it an appropriate name and to ensure the Data Type is
Numeric and the Modeling Type is Continuous.
Select Analyze, then Fit Y By X. Highlight
the column you have determined should be X, and click the X, Factor
button. Highlight the column you have determined should be Y and click
the Y, Response button. Click OK.
You should now see a scatterplot. Click the red triangle next to "Bivariate Fit" and select "Density Ellipse, .99". A stupid ellipse shows up on your scatterplot that you don't want, but you will also see an output called "Correlation" show up below the scatterplot. Click the blue triangle next to that to open it up and it shows you the mean and standard deviation of x and y and also shows you r, the correlation. Click the red triangle under the scatterplot where it says "Bivariate Normal Ellipse" and deselect "Line of Fit" to remove that stupid ellipse from your scatterplot.
Click the red triangle
above the scatterplot and select Fit Line and JMP will draw in the
least-squares regression line. Note, it shows you the regression
equation directly below the scatterplot. JMP also shows you the value
of r-squared (the coefficient of determination), rather than r, the
correlation coefficient. Remember, the coefficient of determination is
the percentage of y's variation explained by the regression equation.
You can always square root this number to get r, the correlation
coefficient, but use your scatterplot to help you decide if r is
negative or positive because r-squared can't tell you that.
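Here is a minimal Python sketch of getting r back from r-squared. It relies on the fact that r always has the same sign as the slope of the least-squares line, so an upward-sloping scatterplot means positive r and a downward-sloping one means negative r. The function name and the numbers are hypothetical.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from r-squared, taking the sign from the regression slope."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r-squared must be between 0 and 1")
    # copysign gives the square root the same sign as the slope.
    return math.copysign(math.sqrt(r_squared), slope)

# Hypothetical values for illustration only.
r_up = r_from_r_squared(0.64, slope=1.5)     # upward trend
r_down = r_from_r_squared(0.64, slope=-1.5)  # downward trend

print("r for upward trend:", r_up)
print("r for downward trend:", r_down)
```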
If you want to get rid of anything, click the red triangle
and deselect anything you don't want to see. Note, if you click the
blue triangle next to something, that will make part of the output
disappear as well, if you wish. Just click the blue triangle again to
make it reappear.
Use JMP as I show above to answer question 2. Note that JMP does not answer parts (d) and (e); you have to make the interpretations and predictions yourself.
In question 3, note that I teach you what a lurking variable is in Lesson 2. That can also be called an outside factor, which I discuss in Lesson 3. This entire question depends on you using your common sense to offer possible explanations why the higher death rate with C is not necessarily a sign that C is more dangerous than the others. There are no wrong answers here, as long as you offer plausible explanations.
Question 4 is similar to my question 7, in Lesson 3.
Question 5 involves the concepts I discuss in the first half of Lesson 3, up to the end of question 5.
So far I have not seen a Web Assign assignment for students taking the course in-class. Only the distance students appear to be using Web Assign. If you are taking the course in-class, and are using Web Assign, please email a copy of your assignment to me, and I will happily make up some tips.
Continue to study Lesson 1 in my study book (if you have it) to learn the concepts involved in HW 02.
Ignore any references to JMP 6SE or Crunchit!.
You are using JMP 8 in this course. The assignment is just an old
assignment that they forgot to update. Use JMP 8 anytime they tell you
to use computer stuff.
Question 3 should be done manually. Note that when they ask you to enter the answers correct to 0.1, they mean round your answers off to one decimal place.
Question 4 should be done manually. Be sure to read the Appendix at the back of my book to learn how to use
Stat Mode in your calculator to compute a mean and standard deviation
quickly. By "nearest decimal place", they mean round your answers off
to one decimal place.
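If you want to double-check what your calculator's Stat Mode gives you, here is a small Python sketch using the standard library's statistics module. It computes the mean and the sample standard deviation (the n - 1 version), then rounds only at the end. The data values are made up and are not from the assignment.

```python
import statistics

# Hypothetical data for illustration only, NOT the assignment data.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
# statistics.stdev is the sample standard deviation (divides by n - 1).
sd = statistics.stdev(data)

# Keep full precision while computing; round only the final answers.
mean_rounded = round(mean, 1)
sd_rounded = round(sd, 1)

print("mean:", mean, "-> rounded:", mean_rounded)
print("standard deviation:", sd, "-> rounded:", sd_rounded)
```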
Question 5 (the IQ and GPA question):
Click the link to the data file, then select and copy
the entire data set (you can click "Ctrl A" on your keyboard to select
all, then click "Ctrl C" to copy it all). Having opened a "New Data
Table" in JMP, select "Edit" then "Paste with Column Names" to paste the
data in. Double-click the "iq" column name at top and confirm that JMP
has the "Data Type" as "Numeric" and the "Modeling Type" as
"Continuous", changing those settings in the drop-down list if
necessary. Click OK. Do the same for the "gpa" column. Important:
Double-click the "gender" column and make sure that JMP has the "Data
Type" as "Character" (it probably doesn't) and the "Modeling Type" as
"Nominal" (it probably doesn't), changing those settings in the
drop-down list if necessary. Click OK. Finally, take a look at
the last row of data that has been pasted into JMP. If it just shows a
bunch of dots instead of numbers, click that row to highlight it then
right-click and select "delete rows" to delete that row. Of course, do
not delete any row that has numbers (data) in it!
To find the mean, standard deviation and median in part (a):
Select "Analyze" then "Distribution".
Highlight "iq" in the pop-up menu and click the "Y, Columns" button.
Click OK. You are then taken to a screen that shows a histogram among
other things. You will find the mean and standard deviation in the
"Moments" section and the median in the "Quantiles" section.
To make the boxplots and histogram in part (b): In the toolbar
at the top of your data spreadsheet, select "Analyze" then
"Distribution". Select the "gpa" column and click the "Y, Columns"
button. Click OK. Your histogram appears sideways, but they didn't ask
you to display it horizontally, so don't bother. If they want to see it
the typical way (and they will request that if they want it), click the
red triangle next to your variable above the histogram and select
Histogram Options from the drop-down menu. Select Horizontal Layout.
Click the red triangle next to "gpa" and select "quantile boxplot" (if
it isn't checked already) and "outlier boxplot" as well to get the
desired boxplots. Click the blue triangles next to "Quantiles" and
"Moments" to hide that stuff, then "select all" (click "Ctrl A" on your
keyboard) and then "copy" (click Ctrl C). Paste it into your document.
Be sure to type in your answers to the question they ask in part b
underneath the graphs you pasted into your document. Remember how
skewness and/or outliers affect a mean and median.
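As a quick reminder of how an outlier affects the mean and the median, here is a small Python sketch. The two data sets are made up for illustration. They are identical except that the largest value has been replaced by an extreme outlier.

```python
import statistics

# Hypothetical data sets for illustration only.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # same data, but the largest value is an outlier

mean_sym = statistics.mean(symmetric)
median_sym = statistics.median(symmetric)

mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

# The mean is pulled toward the outlier; the median does not move.
print("symmetric:    mean =", mean_sym, " median =", median_sym)
print("with outlier: mean =", mean_out, " median =", median_out)
```

The mean gets dragged in the direction of the outlier (or the long tail of a skewed distribution), while the median resists it.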
To make the side-by-side boxplots in part (c): Back in your data spreadsheet, select "Analyze" then "Fit Y By X".
Highlight "gpa" and click "Y, Response". Highlight "gender" and click
"X, Factor". Click OK. Now click the red triangle and select "Display
Options", then select "Box Plots" to get your side-by-side boxplots.
Select all and copy and paste into the same document you already have in
part (b). Make sure you type your answer to their question below these
boxplots in your document. You can now save the file and upload it
into Web Assign.
Question 6 should be done manually. Read my section in Lesson 1 on "The Effect of Changing Units on Centre
and Spread" to properly prepare for this question.
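Here is a minimal Python sketch of the general idea behind changing units, assuming a linear change of the form new = a + b(old). The mean follows the same formula as the data, while the standard deviation is only multiplied by the absolute value of b, since adding a constant does not change spread. The Celsius-to-Fahrenheit example and the data are my own illustration, not taken from the assignment.

```python
import statistics

# Hypothetical temperatures in Celsius, for illustration only.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]

# Linear change of units: new = a + b * old (here Fahrenheit = 32 + 1.8 * Celsius).
a, b = 32.0, 1.8
fahrenheit = [a + b * c for c in celsius]

mean_c = statistics.mean(celsius)
sd_c = statistics.stdev(celsius)
mean_f = statistics.mean(fahrenheit)
sd_f = statistics.stdev(fahrenheit)

# Centre: the new mean is a + b * (old mean).
print("mean:", mean_f, "vs a + b * mean_c =", a + b * mean_c)
# Spread: the new standard deviation is |b| * (old standard deviation).
print("sd:  ", sd_f, "vs |b| * sd_c =", abs(b) * sd_c)
```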