Stat 1000: Assignment 2 Tips (Classroom Lecture Sections)
Published: Wed, 10/10/12
My tips for Assignment 2 are coming below, but first a couple of announcements.
Please note that my second midterm exam prep seminar for
Stat 1000 will be on Sunday, Nov. 4, in room 100 St. Paul's College,
from 9 am to 9 pm. For complete info about the seminar, and to register if you have not done so already, click this link:
I am also offering seminars in Calculus, Linear Algebra, and Stat
2000 in the coming weeks. You can get info about those seminars here:
Make sure you have read my Tips on How to Do Well in this Course. Click here
Did you miss my Tips on what kind of calculator you should get? Click here
If you are taking the course by Distance/Online (Sections D01, D02, etc.), click here for my tips for your Assignment 6.
Tips for Assignment 2 (Classroom Lecture Sections A01, A02, A03, etc.)
You will have to study both Lessons 2 and 3 in my Basic Stats 1 book to prepare for this assignment. Questions 1, 2 and 3 cover the concepts I teach in Lesson 2. The remaining questions are dealt with in Lesson 3 of my book.
Question 1
To compute the correlation coefficient by hand, DO NOT follow my
example in Lesson 2, question 1, part (c). They have given you slightly different column headings so they want you to compute r by hand a slightly different way. They are using this formula for the correlation coefficient (click the link):
First, make sure you decide which variable is x and which variable is y in this problem. Is the explanatory variable, x, "Marijuana", or is it "Other Drugs"?
Put your calculator into Linear Regression Stat Mode and enter the data. Note that I show you how to do that in Appendix A of my book.
Here is a link to a digital copy of that appendix:
Once you have entered the data, you can confirm that you got the same answers for x̅, y̅, sx, and sy that they have provided for you. That is one way to make sure you have correctly identified which is x and which is y. BUT, you will probably discover that your calculator has given you more accurate answers, which the question has rounded off to two decimal places. We must assume that you are expected to do all your calculations with the two-decimal-place values you have been given.
For example, after you have entered your (x,y) data points, Sharps use "RCL 4" to get x̅
and "RCL 7" to get y̅ . "RCL 5" gives you sx and "RCL 8" gives you sy.
A lot of Casio calculators (and some Texas Instruments) use
the "σ" symbol ("sigma," the Greek lowercase "s") to denote "standard
deviation". For example, in many Casios, after you
have entered the data, you first select "S.VAR." You will find it
written above one of your buttons, perhaps above the "2" or nearby on
the keyboard. It is accessed by pressing "SHIFT" then "S.VAR"
(Statistical Variables). Once you select S.VAR, you are shown a menu
where you see the symbol "x̅" for the sample mean (select "1" and press "=" to get the sample mean). You are also told you can press "2" to get "xσn" or press "3" to get "xσn-1". That is Casio's way of
designating the population standard deviation and the sample standard
deviation, respectively. You will always want the sample standard
deviation, sx, so select "xσn-1". Similarly, if you
select S.VAR and then press your right arrow button, you will be
scrolled through other options. For example, you can select "y̅", the mean of the y values, or "yσn-1" to get sy, the standard deviation of the y values.
Here is how I suggest you do this problem:
Step 1: Enter all your (x,y) data points into your calculator once you have put it into Linear Regression Stat Mode.
Step 2: Ask your calculator for x̅, y̅, sx, and sy and confirm your answers match the givens when rounded off to two decimal places. If so, ask your calculator for r, the
correlation coefficient, and note its value, rounded off to two decimal places (and make sure you round, don't trim: e.g. 0.617 rounds
to 0.62). Once you have correctly found r, keep your data in the calculator ready to proceed to Step 3.
Step 3: ON PAPER, proceed to calculate and record all the entries you will eventually type into the boxes. The first column is telling you to subtract x̅ from each of the six x values; the second column is telling you to subtract y̅ from each of the six y values; the last column is telling you to multiply the entries in the first two columns together. WRITE DOWN ON PAPER EVERY SINGLE DECIMAL PLACE YOUR CALCULATOR GIVES YOU. You should find that the answers for your products that you are putting in the third column will have three or four decimal places (depending on the given values for the means). In the boxes they provide, you will enter these values rounded off to two decimal places as instructed, but I believe that you need to use the more accurate answers you have computed and written down on paper to compute the final answer for r.
Step 4: Compute the total of that last column (that is the numerator in the alternative formula for the correlation I have given you above). Be sure to compute the total using all the decimal places you have written on paper, not the two-decimal place values you will round off to when you enter them in the boxes provided.
Step 5: Compute the denominator in the alternative formula for r I have shown you above by multiplying n-1, sx, and sy together (using the two decimal place values for sx, and sy they have given you). Write down the complete answer you have found, keeping all the decimal places.
Step 6: Now compute r by dividing the total you computed in Step 4 by the answer you computed in Step 5. Hopefully, the answer you get for r, when you round off to four decimal places as they request, will be very close to the actual value you got for r by using the Stat mode in your calculator. Once you have confirmed that you
were able to compute the correct value of r by hand, enter all the
numbers you computed into the appropriate boxes. I RECOMMEND YOU ENTER THE EXACT VALUE OF r THAT YOUR STAT MODE HAS COMPUTED FOR YOU rounded to four decimal places, in the event that your computed value is not precisely the same as the calculator value. My hunch is that the assignment has been programmed to mark the true value of r, whereas the value you compute using the two-decimal place values will be inaccurate due to the rounding that was done to the means and standard deviations.
I hope this works. If they mark your value for r wrong, try entering the value you computed using the formula instead (assuming it is slightly different to the true value computed by your Stat Mode).
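If you want to double-check Steps 3 to 6 on a computer, here is a minimal Python sketch of the by-hand calculation. I am assuming the formula is the one described in Steps 4 and 5: the total of the (x - x̅)(y - y̅) products divided by (n-1) times sx times sy. The data values are made up for illustration; they are NOT the numbers from your assignment, and `r_by_hand` is just a name I chose.

```python
from statistics import mean, stdev

# Hypothetical (x, y) data for illustration only, NOT the assignment's values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

def r_by_hand(x, y, x_bar, y_bar, s_x, s_y):
    """Correlation from the column-by-column method (Steps 3 to 6)."""
    # Third column: products of the deviations from the two means.
    products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
    numerator = sum(products)               # Step 4: total of the last column
    denominator = (len(x) - 1) * s_x * s_y  # Step 5: (n-1) times sx times sy
    return numerator / denominator          # Step 6

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n-1 divisor)

# r using every decimal place, like your calculator's Stat Mode.
r_exact = r_by_hand(x, y, x_bar, y_bar, s_x, s_y)

# r using means and standard deviations rounded to two decimal places,
# like the values the question gives you.
r_rounded = r_by_hand(x, y, round(x_bar, 2), round(y_bar, 2),
                      round(s_x, 2), round(s_y, 2))

print(round(r_exact, 4), round(r_rounded, 4))
```

With these made-up numbers the two answers differ in the third decimal place, which is the kind of small discrepancy I am warning you about above.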
Question 2
First, make sure you decide which variable is x and which
variable is y in this problem. Is the explanatory variable, x,
"Horsepower", or is it "Mileage"?
Again, I recommend you use the Linear Regression Stat Mode on your calculator to enter the data and check that you get the same answers for a, the intercept, and b, the slope, as you get by the formulas you use. I suspect that your Stat Mode answers will differ slightly from the values you compute because your computations are using rounded off values for the means and standard deviations.
Use the formulas for the slope and intercept that I introduce in question 1(e) in my Lesson 2 and use again in question 5 of that lesson.
If the answers you compute for a and b, rounded to four decimal places, do not precisely match the answers your Stat Mode gives you, I recommend you enter the exact Stat Mode values in the boxes in part (a) and use those rounded off values to answer the remaining questions. If they mark you wrong, try the less accurate values you computed instead. If they still mark you wrong, you're in big trouble. You might want to ask your prof ahead of time if the answer key is using the precise values or the rounded off values for the means and standard deviations you have been given for both questions 1 and 2.
Note, in part (b), they ask for a proportion, not a percentage, so leave your value for the coefficient of determination as a decimal. Do not change it into a percent.
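Here is a rough Python sketch of the slope, intercept and coefficient of determination calculations. I am assuming the formulas are the standard ones built from the means and standard deviations, b = r times sy divided by sx and a = y̅ minus b times x̅; check them against the ones in your notes. The summary statistics are invented for illustration and `slope_intercept` is a hypothetical helper name.

```python
def slope_intercept(r, x_bar, y_bar, s_x, s_y):
    """Least-squares slope b and intercept a from summary statistics.

    Assumes the standard formulas b = r * sy / sx and a = y_bar - b * x_bar.
    """
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Hypothetical summary statistics, NOT the assignment's values.
r, x_bar, y_bar, s_x, s_y = -0.85, 150.0, 25.0, 40.0, 5.0

a, b = slope_intercept(r, x_bar, y_bar, s_x, s_y)
r_squared = r ** 2  # coefficient of determination, left as a proportion

print(round(a, 4), round(b, 4), round(r_squared, 4))
```

Notice that r squared stays as a decimal between 0 and 1; you would only multiply by 100 if they asked for a percentage.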
As I already suggested, I think you should first make the predictions they request in parts (c) and (d) using the exact values for a and b, rounded to four decimal places, that your Stat Mode determined. Only try your slightly less accurate computed values the second time, if necessary.
I show you how to compute a residual (part (g)) in my question 1(j).
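As a quick illustration of a prediction and a residual, here is a small Python sketch. It assumes the regression line is written as predicted y = a + bx and that the residual is the observed y minus the predicted y. The values of a, b and the data point are made up, and `predict` and `residual` are names I chose for this example.

```python
def predict(a, b, x):
    """Predicted y from the regression line y-hat = a + b * x."""
    return a + b * x

def residual(a, b, x, y_observed):
    """Residual = observed y minus predicted y."""
    return y_observed - predict(a, b, x)

# Hypothetical intercept, slope and data point, NOT the assignment's values.
a, b = 40.9375, -0.10625
x_value, y_observed = 200.0, 21.0

y_hat = predict(a, b, x_value)
res = residual(a, b, x_value, y_observed)

print(round(y_hat, 4), round(res, 4))
```

A positive residual means the observed value was higher than the line predicted; a negative residual means it was lower.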
Make sure you have taken a look at my question 3 in Lesson 2 to learn some key facts about the correlation that may help you with part (h).
Question 3 uses JMP.
Click the "New Data Table" icon on the toolbar at top left in the JMP home screen. Double-click the region to the right of "Column 1" to create "Column 2." Rename Column 1 "Temperature" and Column 2 "Viscosity" by either double-clicking the columns and typing in the new name or by right-clicking the columns and selecting "Column Info," typing in the name and clicking OK. Type in the data. You can move from one cell to the next in the data table by pressing "Enter", "Tab" or the arrow buttons on your keyboard.
Select "Analyze", then "Fit Y By X". Highlight "Temperature", and click the "X, Factor"
button. Highlight "Viscosity" and click
the "Y, Response" button. Click OK.
You should now see a scatterplot. (If you don't, your data is
not properly formatted; go back and check the columns are Numeric and
Continuous by right-clicking each column name and selecting "Column Info". The Data Type should be Numeric, and the Modeling Type should be Continuous.)
Click the red triangle above the scatterplot and select "Fit Line" and JMP will draw in the least-squares regression line. Note, it shows you the regression equation directly under "Linear Fit" below the scatterplot. In the "Summary of Fit", JMP shows you the value of r-squared (the coefficient of determination) rather than r, the correlation coefficient. You can take the square root of this number to get r, but use your scatterplot to help you decide if r is negative or positive because your calculator can't tell you that.
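To show the idea of recovering r from r-squared, here is a tiny Python sketch. It relies on the fact that r always has the same sign as the slope of the least-squares line, so the direction of the trend in your scatterplot settles the sign. The numbers are invented and `r_from_r_squared` is a hypothetical helper name.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from r-squared, taking the sign from the regression slope.

    The square root alone is never negative, so the sign has to come from
    the direction of the trend (downhill slope means negative r).
    """
    r = math.sqrt(r_squared)
    return -r if slope < 0 else r

# Hypothetical values, NOT the assignment's output.
print(round(r_from_r_squared(0.81, -2.5), 4))  # downhill trend
print(round(r_from_r_squared(0.81, 2.5), 4))   # uphill trend
```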
They don't ask you to hide the "Analysis of Variance" and "Parameter Estimates" parts of the output,
but you can do so if you wish. Simply click the gray triangle next to those title bars, and you will see those parts
of the output disappear.
You will have to compute the residual they request in part (d) yourself using the approach I illustrate in my question 1(j). When they ask, "What does the sign of the residual tell us?" they mean, was the actual viscosity higher or lower than you predicted it would be.
Method 1: Answering Parts (b) through (d) directly on the JMP output
After you have made your scatterplot and added the least-squares regression line and hidden the Analysis of Variance and Parameter Estimates, add your answers to parts (b), (c) and (d).
To add a text box to the JMP scatterplot output, do this:
Click the thin blue line near the top of the JMP scatterplot screen, or press "Alt" on your keyboard, to reveal a toolbar with a series of icons. If you point your mouse at the icons, you should see, looking at the icons left to right, the first icon is for a "New Data Table," the second icon is for "New Script," the third is to "Open" a file, etc. Click the icon that says "Annotate (T)." It looks like a little white rectangle with a "T" in the top left corner. You should discover your mouse pointer turns into that icon after you click it.
Move your mouse to an empty region in the scatterplot output (perhaps in that space to the right of the scatterplot; or the space underneath the Parameter Estimates) and, while holding the left button on your mouse, drag to make a nice big text box to have room to type in your answers.
Type your answers for parts (b), (c) and (d) in the text box. Click the region outside of the text box to save the box. If you discover that the box has shrunk on you and hidden some of your text, point the mouse at the frame of the box (not at the text inside the box). You should see the mouse pointer change into an arrowed cross with arrows pointing in all four directions when you point the mouse at the textbox frame. When you see that four-arrow cross, click to select the box. You should see the box surrounded by little blue markers at each corner and at the midpoints of the frame. Point at any of these blue markers and CLICK AND HOLD the left button of your mouse and drag to change that dimension of the box. You can then resize the box by playing with any of these blue markers. You can also move the box to another location if you wish, by doing the same thing, but by clicking and holding somewhere else on the frame other than the blue markers and then dragging to wherever you like.
Once you have your text box sized and placed where you like, click the thin blue line or press "Alt" on your keyboard to see the toolbar again. Select "File" then "Save As." In the "Save Report As" pop-up window, select which folder you want to save the file in (I suggest you select Desktop), type in a "File Name" and, in the "Save as type" menu, be sure to select "PDF file" from the drop-down list. Click Save, and just click OK if it shows you another pop-up window. You should now see the pdf file it has created. If you are satisfied with what you see, you are now ready to upload the file to Stats Portal. See below.
Method 2: Answering Parts (b) through (d) directly into the HTML editor
After you have made your scatterplot and added the least-squares regression line and hidden the Analysis of Variance and Parameter Estimates, you can save this as a PDF file ready to upload.
Click the thin blue line or press "Alt" on your keyboard to see the toolbar
again. Select "File" then "Save As." In the "Save Report As" pop-up
window, select which folder you want to save the file in (I suggest you
select Desktop), type in a "File Name" and, in the "Save as type" menu,
be sure to select "PDF file" from the drop-down list. Click Save, and
just click OK if it shows you another pop-up window. You should now see
the pdf file it has created. If you are satisfied with what you see,
you are now ready to upload the file to Stats Portal. See below.
Method 3: Answering Parts (b) through (d) in a Word document or similar
After you have made your scatterplot and added the least-squares regression line and hidden the Analysis of Variance and Parameter Estimates, you want to copy and paste this output into your Word document (or whatever word processor you use) where you will also add the answers to the other parts of the question.
Click the thin blue line near the top of the JMP scatterplot screen, or press
"Alt" on your keyboard, to reveal a toolbar with a series of icons. If
you point your mouse at the icons, you should see, looking at the icons
left to right, the first icon is for a "New Data Table," the second
icon is for "New Script," the third is to "Open" a file, etc. Click the
icon that looks
like a fat white cross or plus sign "+". This is your "Selection"
tool. Your mouse cursor should now have changed from an arrow to that
white cross. Click the title bar that says "Bivariate Fit of ..." at the top
of the output and that should select the entire output (scatterplot, Summary of Fit, etc.). Right-click and select Copy.
In your Word document, paste in the output you copied (right-click and select Paste). Below the output you have pasted in, type in your answers for parts (b), (c) and (d).
You are now ready to save and upload the file that answers parts (b), (c) and (d). In your Word document (or whatever program you are using), select "File" then "Save As" and select "PDF File". Type in whatever name you want the file to have in the "File name" section. Select which folder you want to save the
file in (I suggest you select "Desktop" so that the file will just
appear right on your desktop home screen). Click
"Save" or "Publish". You should now have your file ready to upload into the
assignment.
To upload your file into the text box they provide:
Once you have saved your PDF file using whichever of Methods 1, 2 or 3 I show you above, you are ready to upload the file. Click "HTML editor" below the text box to make a toolbar appear in the
text box. Click the toolbar option called "Link" and select
"Website/Uploaded File." In the pop-up window that appears, click the
button called "Find/Upload File" (it is at the bottom of the pop-up
window, you may have to enlarge the box or scroll down to see it).
Click the "Browse" button and find the PDF file you just saved.
Either double-click that file or select it and click "Open" and you
should see the path to that file appear in the Browse box. Click
"Upload File" and its name should appear in the "Uploaded Files" pop-up
window. Select the file in the list of "Uploaded Files" to highlight it
and click OK and you should see a link to the file appear in the text box. Of course, if you are using Method 2, above, make sure you have also typed your answers to parts (b), (c) and (d) in this box as well.
Questions 4 to 7
These questions are a good run-through of the various things we do in an experiment. Be sure you have studied the latter half of Lesson 3 in my Basic Stats 1 book, from question 6 onward. My question 7 is an especially good illustration. For part 5, I think you should tell them how many treatments your experiment has, but then tell them what exactly each treatment is. (For example, in my question 7(b), I would say Treatment 1 is Food A, served early; Treatment 2 is Food B, served early; etc.)
Note that part 7 in question 4 is getting at the benefits of an experiment. We learned in Lesson 2 that correlation does not necessarily imply causation. But, the whole point of an experiment is to see if you can find a causal link between two variables. If you have followed the three principles of experimental design, you might be able to prove that poison causes harm, or that smoking increases the risk of cancer.
Question 8
This is a good run-through of the various types of samples and the possible biases that can exist. Be sure to have studied the first half of my Lesson 3 up to the end of question 5 before attempting this question. Be clear when listing the bias you see. For example, don't just say "response bias". Say, there is response bias because there will be too many people lying to the researcher. (I am not saying that is the correct answer for any of your questions; I am just saying be clear what you mean instead of just using a generic term.) Don't speculate about biases that might be there (such as saying the researcher might be doctoring their data; there is always that possibility, but we are not going to be paranoid and mention that as a possible bias every time, unless we have clearly been given reason to believe that has occurred). Only discuss biases that are clearly present based on the information you have been given.
Question 9
A nice and easy example of randomization, as I demonstrate early in Lesson 3. Here is a link where you can download Table B, if you have not already done so:
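If it helps to see the mechanics of using a table of random digits, here is a minimal Python sketch. I am assuming the usual approach: give every subject a label with the same number of digits (01, 02, ... here), read the table in groups of that many digits, and skip any group that is out of range or has already been chosen. The digit string is made up for illustration; it is NOT a line from Table B, and `pick_from_digits` is a hypothetical helper name.

```python
def pick_from_digits(digits, n_subjects, n_to_pick):
    """Choose subjects by reading fixed-width groups from a run of digits.

    Subjects are assumed to be labelled 1..n_subjects, padded with leading
    zeros (01, 02, ...). Out-of-range groups and repeats are skipped.
    """
    width = len(str(n_subjects))                 # digits per label
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop spaces
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= n_subjects and label not in chosen:
            chosen.append(label)
            if len(chosen) == n_to_pick:
                break
    return chosen

# Made-up digits for illustration only, NOT an actual line of Table B.
made_up_line = "07 42 15 07 88 03 21 15 10"

print(pick_from_digits(made_up_line, n_subjects=20, n_to_pick=4))
```

In this made-up line, 42, 88 and 21 are skipped because no subject has those labels, and the second 07 and second 15 are skipped because those subjects were already chosen.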