Stat 1000: Assignment 2 Tips (Classroom Lecture Sections)
Published: Wed, 02/06/13
My tips for Assignment 2 are coming below, but first a couple of announcements.
Please note that my first two-day review seminar for
Stat 1000 will be on Saturday, Feb. 2 and Sunday, Feb. 3, in room 100 St. Paul's College,
from 9 am to 6 pm each day. This seminar will cover the lessons in Volume 1 of my book.
For more info about the seminar, and to register if you have not done so already, click this link:
I am also taking registrations for all my midterm exam
prep seminars (Calculus, Linear Algebra and Statistics). Please click this link for more info and to register, if
you are interested:
Make sure you do: Tips on How to Do Well in Stat 1000
Did you read my Tips on what kind of calculator you should get?
Did you miss my Tips for Assignment 1?
Tips for Assignment 2 (Classroom Lecture Sections A01, A02, A03, etc.)
Don't have my book? You can download a free sample containing Lesson 1 at my website here:
You will have to study both Lesson 2: Regression and Correlation and Lesson 3: Designing Samples and Experiments in my Basic Stats 1 book to prepare for this assignment. Questions 1, 2 and 3 cover the concepts I teach in Lesson 2. The remaining questions are dealt with in Lesson 3 of my book.
Question 1:
To compute the correlation coefficient by hand, DO NOT follow
my
example in Lesson 2, question 1, part (c). They have given you slightly
different column headings so they want you to compute r by hand a
slightly different way. They are using this formula for the correlation
coefficient (click the link):
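The link itself has not survived on this page. Steps 4 and 5 below describe its numerator (the total of the last column) and its denominator ((n − 1) times sx times sy), so the formula is presumably the following. Treat it as a reconstruction, not a copy of the original link:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x\, s_y}
```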
First, make sure you decide which variable is x and which
variable is y in this problem. Is the explanatory variable, x,
"Marijuana", or is it "Other Drugs"?
Put your calculator into Linear Regression Stat Mode and
enter the data. Note that I show you how to do that in Appendix A of my
book.
Here is a link to a
digital copy of that appendix:
Once you have entered the data, you can confirm that you got the same answers for x̅, y̅, sx, and sy
that they have provided for you. That is one way to make sure you have
correctly identified which is x and which is y. BUT, you will probably
discover that your calculator gives you more accurate answers, since the values in the question have been rounded off to two decimal places. We must assume
that you are expected to do all your calculations with the two-decimal
place values you have been given.
For example, after you have entered your (x,y) data points, Sharps use "RCL 4" to get x̅
and "RCL 7" to get y̅ . "RCL 5" gives you sx and "RCL 8" gives you sy.
A lot of Casio calculators (and some Texas Instruments) use
the "σ" symbol ("sigma," the Greek lowercase "s") to denote "standard
deviation". For example, in many Casios, after you
have entered the data, you first select "S.VAR." You will find it
written above one of your buttons, perhaps above the "2" or nearby on
the keyboard. It is accessed by pressing "SHIFT" then "S.VAR"
(Statistical Variables). Once you select S.VAR, you are shown a menu
where you see the symbol "x̅" for the sample mean (select "1" and press "=" to get the sample mean). You are also told you can press "2" to get "xσn" or press "3" to get "xσn-1". That is Casio's way of
designating the population standard deviation and the sample standard
deviation, respectively. You will always want the sample standard
deviation, sx, so select "xσn-1". Similarly, if you
select S.VAR and then press your right arrow button, you will be
scrolled through other options. For example, you can select "y̅", the mean of the y values, or "yσn-1" to get sy, the standard deviation of the y values.
Here is how I suggest you do this problem:
Step 1: Enter all your (x,y) data points into your calculator once you have put it into Linear Regression Stat Mode.
Step 2: Ask your calculator for x̅, y̅, sx, and sy and confirm your answers match the givens when rounded off to two decimal places. If so, ask your calculator for r,
the correlation coefficient, and note its value, rounded off to two decimal
places (and make sure you round, don't trim: e.g. 0.617 rounds
to 0.62). Once you have correctly found r, keep your data in the calculator ready to proceed to Step 3.
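If you want to see the difference between rounding and trimming spelled out, here is a small Python sketch using the standard decimal module (my own choice for illustration; it has nothing to do with your calculator). ROUND_HALF_UP is ordinary rounding and ROUND_DOWN is trimming. The numbers are passed in as text so that no floating-point error sneaks in.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN

def round_off(value: str, places: int) -> Decimal:
    """Ordinary rounding: a 5 or more in the next place rounds up (away from zero)."""
    step = Decimal(1).scaleb(-places)  # e.g. 0.01 for two places
    return Decimal(value).quantize(step, rounding=ROUND_HALF_UP)

def trim(value: str, places: int) -> Decimal:
    """Trimming (truncating): simply chop off the extra digits."""
    step = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(step, rounding=ROUND_DOWN)

print(round_off("0.617", 2))  # rounding: what you want
print(trim("0.617", 2))       # trimming: what you do NOT want
```

The first line prints 0.62 and the second prints 0.61, which is exactly the mistake to avoid.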
Step 3: ON
PAPER, proceed to calculate and record all the entries you will
eventually type into the boxes. The first column is telling you to subtract x̅ from each of the six x values; the second column is telling you to subtract y̅
from each of the six y values; the last column is telling you to
multiply the entries in the first two columns together. WRITE DOWN ON
PAPER EVERY SINGLE DECIMAL PLACE YOUR CALCULATOR GIVES YOU. You should
find that the answers for your products that you are putting in the
third column will have three or four decimal places (depending on the
given values for the means). In the boxes they provide, you will enter
these values rounded off to two decimal places as instructed, but I
believe that you need to use the more accurate answers you have computed
and written down on paper to compute the final answer for r.
Step 4:
Compute the total of that last column (that is the numerator in the
alternative formula for the correlation I have given you above). Be
sure to compute the total using all the decimal places you have written
on paper, not the two-decimal place values you will round off to when
you enter them in the boxes provided.
Step 5: Compute the denominator in the alternative formula for r I have shown you above by multiplying n-1, sx, and sy together (using the two-decimal place values for sx and sy they have given you). Write down the complete answer you have found, keeping all the decimal places.
Step 6: Now compute r by dividing the total
you computed in Step 4 by the answer you computed in Step 5.
Hopefully, the answer you get for r, when you round off to four decimal
places as they request, will be very close to the actual value you got
for r by using the Stat mode in your calculator. Once you have confirmed
that you
were able to compute the correct value of r by hand, enter all the
numbers you computed into the appropriate boxes. I RECOMMEND YOU ENTER
THE VALUE OF r THAT YOU COMPUTED BY HAND, rounded to
four decimal places, in the event that your computed value is not
precisely the same as the calculator value. My hunch is that the
assignment has been programmed to mark the value of r you compute using the rounded off numbers, whereas the
value your calculator's Stat Mode gives you will actually be too accurate, and possibly marked wrong if it isn't close to the rounded off answer you compute by hand.
I hope this works. If they mark your value for r wrong, try
entering the value you computed using the Stat Mode instead (assuming it
is slightly different from the value computed by hand).
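To see why the hand answer and the Stat Mode answer can disagree slightly, here is a Python sketch of Steps 3 to 6. The six data points are made up (they are NOT the assignment's data), and Python's statistics.stdev stands in for your calculator's sample standard deviation (xσn-1). The same function is run twice: once with the two-decimal givens, as you must do by hand, and once with full-precision values, which is effectively what Stat Mode uses.

```python
from statistics import mean, stdev

def r_by_hand(xs, ys, x_bar, y_bar, s_x, s_y):
    """Steps 3 to 6: r = sum of (x - x_bar)(y - y_bar), divided by (n - 1) * s_x * s_y."""
    # Step 3: the three columns, keeping every decimal place (no rounding here)
    x_devs = [x - x_bar for x in xs]
    y_devs = [y - y_bar for y in ys]
    products = [dx * dy for dx, dy in zip(x_devs, y_devs)]
    numerator = sum(products)                  # Step 4: total of the last column
    denominator = (len(xs) - 1) * s_x * s_y    # Step 5
    return numerator / denominator             # Step 6

# Hypothetical data, NOT the assignment's: six (x, y) pairs
xs = [2, 4, 5, 7, 8, 10]
ys = [3, 4, 6, 6, 9, 10]

# The "givens", rounded to two decimal places like the assignment's
x_bar, y_bar = round(mean(xs), 2), round(mean(ys), 2)
s_x, s_y = round(stdev(xs), 2), round(stdev(ys), 2)

r_hand = r_by_hand(xs, ys, x_bar, y_bar, s_x, s_y)
# Full precision, which is what Stat Mode effectively uses
r_stat = r_by_hand(xs, ys, mean(xs), mean(ys), stdev(xs), stdev(ys))

print(f"by hand: {r_hand:.4f}   Stat Mode: {r_stat:.4f}")
```

With these made-up numbers the two answers agree to two decimal places but differ in the fourth, which is the kind of small gap described above.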
Question 2
First, make sure you decide which variable is x and which
variable is y in this problem. Is the explanatory variable, x,
"Horsepower", or is it "Mileage"?
Again, I recommend you use the Linear Regression Stat Mode on
your calculator to enter the data and check that you get the same
answers for a, the intercept, and b, the slope, as you get by the
formulas you use. I suspect that your Stat Mode answers will differ
slightly from the values you compute because your computations are using
rounded off values for the means and standard deviations.
Use the formulas to compute the slope and intercept that I
introduce in question 1(e) in my Lesson 2 and also use again in question
5 of that lesson.
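For reference, here is a Python sketch of the usual least-squares formulas, slope b = r·sy/sx and intercept a = y̅ − b·x̅, followed by a prediction ŷ = a + b·x. I am assuming these match the formulas in my Lesson 2, and all the summary values below are hypothetical, NOT the horsepower/mileage numbers from your assignment.

```python
def least_squares_line(r, x_bar, y_bar, s_x, s_y):
    """Slope b = r * s_y / s_x; intercept a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

def predict(a, b, x):
    """Predicted y (y-hat) on the line y-hat = a + b x."""
    return a + b * x

# Hypothetical summary values, NOT the assignment's
r, x_bar, y_bar, s_x, s_y = -0.82, 150.00, 28.00, 40.00, 5.00

a, b = least_squares_line(r, x_bar, y_bar, s_x, s_y)
a4, b4 = round(a, 4), round(b, 4)  # the rounded values you would enter in part (a)
print("a =", a4, " b =", b4)
print("r squared =", round(r ** 2, 4))     # coefficient of determination, as a proportion
print("prediction at x = 200:", predict(a4, b4, 200))
```

Note that r squared is reported as a proportion (0.6724 for these made-up numbers), not as 67.24%, which is what part (b) asks for.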
If the answers you compute for a and b, rounded to four
decimal places, do not exactly match the more accurate answers your Stat Mode
gives you, I recommend you enter the computed values in the boxes in part (a)
and use those rounded off values to answer the remaining questions. If
they mark you wrong, try the more accurate values your Stat Mode computed
instead. If they still mark you wrong, you're in big trouble. You
might want to ask your prof ahead of time if the answer key is using the
precise values or the rounded off values for the means and standard
deviations you have been given for both questions 1 and 2.
Note, in part (b), they ask for a
proportion, not a percentage, so leave your value for the coefficient of
determination as a decimal. Do not change it into a percent.
As I already suggested, I think you should first make the predictions they request in parts (c) and (d)
using the computed values for a and b, rounded to four decimal places,
that you computed by hand. Only try your more accurate Stat Mode values the second time, if necessary.
I show you how to compute a residual (part (g)) in my question 1(j).
Make sure you have taken a look at my question 3 in Lesson 2
to learn some key facts about the correlation that may help you with part (h).
Question 3 uses JMP.
Click the "New Data
Table" icon on the toolbar at top left in the JMP home screen.
Double-click the region to the right of "Column 1" to create "Column
2." Rename Column 1 "Temperature" and Column 2 "Viscosity" by either
double-clicking the columns and typing in the new name or by
right-clicking the columns and selecting "Column Info," typing in the
name and clicking OK. Type in the data. You can move from one cell to
the next in the data table by pressing "Enter", "Tab" or the arrow
buttons on your keyboard.
Select "Analyze", then "Fit Y By X". Highlight "Temperature", and click the "X, Factor"
button. Highlight "Viscosity" and click
the "Y, Response" button. Click OK.
You should now see a scatterplot. (If you don't, your data
is
not properly formatted; go back and check the columns are Numeric and
Continuous by right-clicking each column name and selecting "Column
Info". The Data Type should be Numeric, and the Modeling Type should be
Continuous.)
Click the red triangle
above the scatterplot and select "Fit Line" and JMP will draw in the
least-squares regression line. Note, it shows you the regression
equation directly under "Linear Fit" below the scatterplot. JMP
shows you the value
of r-squared (the coefficient of determination) in the "Summary of
Fit", rather than r, the
correlation coefficient. You can take the square root of this number to get r, the correlation
coefficient, but use your scatterplot to decide whether r is
negative or positive, because the square root your calculator gives you is always positive.
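Besides the scatterplot, the slope in JMP's "Linear Fit" equation always has the same sign as r (since b = r·sy/sx), so it settles the sign too. A small Python sketch of that rule, using hypothetical JMP numbers rather than your actual output:

```python
import math

def r_from_r_squared(r_squared, slope):
    """r = +/- sqrt(r^2); r takes the sign of the regression slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# Hypothetical JMP output, NOT the assignment's
print(r_from_r_squared(0.81, -2.5))  # downhill line, so r is negative
print(r_from_r_squared(0.81, 2.5))   # uphill line, so r is positive
```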
They don't ask you to hide the "Analysis of Variance" and "Parameter Estimates" parts of the output,
but you can do so if you wish. Simply click the gray triangle next to those title bars, and you will see those parts
of the output disappear.
You will have to compute the residual they request in part (d)
yourself using the approach I illustrate in my question 1(j). When
they ask, "What does the sign of the residual tell us?" they mean, was
the actual viscosity higher or lower than you predicted it would be.
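Here is a Python sketch of a residual and what its sign means: residual = observed y − predicted y. The regression line and the data point below are hypothetical, NOT the temperature/viscosity values from your assignment.

```python
def residual(observed_y, a, b, x):
    """Residual = observed y - predicted y, where predicted y = a + b * x."""
    return observed_y - (a + b * x)

def describe(res):
    """Interpret the sign of a residual."""
    if res > 0:
        return "actual value was higher than predicted (point lies above the line)"
    if res < 0:
        return "actual value was lower than predicted (point lies below the line)"
    return "actual value equals the prediction (point lies on the line)"

# Hypothetical line and data point, NOT the assignment's
a, b = 100.0, -1.5  # predicted viscosity = 100.0 - 1.5 * temperature
res = residual(observed_y=62.0, a=a, b=b, x=30.0)
print(res, describe(res))
```

Here the prediction is 100 − 1.5 × 30 = 55, so the residual is 62 − 55 = 7, a positive number: the actual value was higher than predicted.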
Method 1: Answering Parts (b) through (d) directly into the HTML editor
After
you have made your scatterplot and added the least-squares regression
line and hidden the Analysis of Variance and Parameter Estimates, you can save this as a PDF file ready to upload.
Click
the thin blue line near the top of the JMP window, or press "Alt" on your keyboard, to see the toolbar
again. Select "File" then "Save As." In the "Save Report As" pop-up
window, select which folder you want to save the file in (I suggest you
select Desktop), type in a "File Name" and, in the "Save as type" menu,
be sure to select "PDF file" from the drop-down list. Click Save, and
just click OK if it shows you another pop-up window. You should now see
the PDF file it has created. If you are satisfied with what you see,
you are now ready to upload the file to Stats Portal. See below.
Method 2: Answering Parts (b) through (d) in a Word document or similar
After
you have made your scatterplot and added the least-squares regression
line and hidden the Analysis of Variance and Parameter Estimates, you
want to copy and paste this output into your Word document (or whatever
word processor you use) where you will also add the answers to the other
parts of the question.
Click the thin blue line near the top of the JMP scatterplot screen, or press
"Alt" on your keyboard, to reveal a toolbar with a series of icons. If
you point your mouse at the icons, you should see, looking at the icons
left to right, the first icon is for a "New Data Table," the second
icon is for "New Script," the third is to "Open" a file, etc. Click the
icon that looks
like a fat white cross or plus sign "+". This is your "Selection"
tool. Your mouse cursor should now have changed from an arrow to that
white cross. Click the title bar that says "Bivariate Fit of ..." at the top
of the output and that should select the entire output (scatterplot, Summary of Fit, etc.). Right-click and select Copy, then paste the output into your Word document.
In your Word document, below the outputs you have pasted in, type in your answers for parts (b), (c) and (d).
You are now ready to save and upload the file that answers parts (b), (c) and (d). In your Word document (or whatever program you are using), select "File" then "Save As" and select "PDF File". Type in whatever name you want the file to have in the "File name" section. Select which folder you want to save the
file in (I suggest you select "Desktop" so that the file will just
appear right on your desktop home screen). Click
"Save" or "Publish". You should now have your file ready to upload into the
assignment.
To upload your file into the text box they provide:
Once you have saved your PDF file using whichever of Methods 1 or 2 I
show you above, you are ready to upload the file. Click "HTML editor"
below the text box to make a toolbar appear in the
text box. Click the toolbar option called "Link" and select
"Website/Uploaded File." In the pop-up window that appears, click the
button called "Find/Upload File" (it is at the bottom of the pop-up
window, you may have to enlarge the box or scroll down to see it).
Click the "Browse" button and find the PDF file you just saved.
Either double-click that file or select it and click "Open" and you
should see the path to that file appear in the Browse box. Click
"Upload File" and its name should appear in the "Uploaded Files" pop-up
window. Select the file in the list of "Uploaded Files" to highlight
it
and click OK and you should see a link to the file appear in the text
box. Of course, if you are using Method 1, above, make sure you have
also typed your answers to parts (b), (c) and (d) in this box as well.
Questions 4 to 7
These questions are a good run-through of the various things we do
in an experiment. Be sure you have studied the latter half of Lesson 3
in my Basic Stats 1 book, starting from question 6. My
question 7 is an especially good illustration. For 4(e) and 5(e),
I think you should tell them how many treatments your experiment has,
and then tell them exactly what each treatment is. (For example, in my
question 7(b), I would say Treatment 1 is Food A, served early;
Treatment 2 is Food B, served early; etc.)
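When there are two or more factors, each treatment is one combination of factor levels, so the number of treatments is the product of the numbers of levels. A Python sketch of listing them; the factor levels below are hypothetical, modelled loosely on my question 7(b) example (the "served late" level in particular is my assumption), so use the factors from your own question:

```python
from itertools import product

# Hypothetical factors and levels, loosely based on the Food A/B example
timings = ["served early", "served late"]
foods = ["Food A", "Food B"]

# One treatment per combination of factor levels (2 x 2 = 4 here)
treatments = [f"{food}, {timing}" for timing, food in product(timings, foods)]

print(f"{len(treatments)} treatments")
for number, treatment in enumerate(treatments, start=1):
    print(f"Treatment {number}: {treatment}")
```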
Note that 4(g) is getting at the benefits of an experiment.
We learned in Lesson 2 that correlation does not imply
causation. But, the whole point of an experiment is to see if you can
find a causal link between two variables. If you have followed the
three principles of experimental design, you might be able to prove that
poison causes harm, or that smoking increases the risk of cancer. Experiments, if properly designed, can prove that x causes y.
Question 8
This is a good run-through of the various types of samples and the
possible biases that can exist. Be sure to have studied the first half
of my Lesson 3 up to the end of question 5 before attempting this
question. Be clear when listing the bias you see. For example, don't
just say "response bias". Say that there is response bias because too
many people will lie to the researcher. (I am not saying that
is the correct answer for any of your questions; I am just saying be
clear what you mean instead of just using a generic term.) Don't
speculate about biases that might be there (such as saying, the
researcher might be doctoring their data, for example; there is always
that possibility, but we are not going to be paranoid and mention that
as a possible bias every time, unless we have clearly been given reason
to believe that has occurred). Only discuss biases that are clearly
present in the information you have been given.
Question 9
A nice and easy example of randomization, as I demonstrate early in Lesson 3. Here is a link where you can download Table B, if you have not already done so:
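Here is a Python sketch of the usual Table B procedure: label the subjects 01, 02, ..., read the digits two at a time, and skip any label that is out of range (including 00) or has already been chosen. The digit string below is made up, NOT a line from Table B, and the sketch assumes at most 99 subjects so that two-digit labels suffice. The subjects picked first would form the first treatment group.

```python
def select_with_random_digits(digits, population_size, sample_size):
    """Pick labels 01..population_size by reading two-digit groups (Table B style).

    Skips labels that are out of range (including 00) or already chosen.
    Assumes population_size <= 99.
    """
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore spaces between groups
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen

# Made-up digits, NOT a line from Table B
line = "07421 50793 20001 10358"
print(select_with_random_digits(line, population_size=20, sample_size=5))
```

Reading the made-up line in pairs gives 07, 42, 15, 07, 93, 20, 00, 11, 03: 42 and 93 are out of range, the second 07 is a repeat, and 00 is not a label, so the first five subjects chosen are 07, 15, 20, 11 and 03.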