Stat 1000: Assignment 2 Tips (Classroom Lecture Sections)

Published: Wed, 02/06/13


 
My tips for Assignment 2 are coming below, but first a couple of announcements.
 
Please note that my first two-day review seminar for Stat 1000 will be on Saturday, Feb. 2 and Sunday, Feb. 3, in room 100 St. Paul's College, from 9 am to 6 pm each day.  This seminar will cover the lessons in Volume 1 of my book.  
 
For more info about the seminar, and to register if you have not done so already, click this link:
Stat 1000 Exam Prep Seminar 
 
I am also taking registrations for all my midterm exam prep seminars (Calculus, Linear Algebra and Statistics).  Please click this link for more info and to register, if you are interested:
Grant's Exam Prep Seminars 
 
Did you read my Tips on How to Do Well in this Course? 
Make sure you do:  Tips on How to Do Well in Stat 1000 
 
Did you read my Tips on what kind of calculator you should get?
Tips on what calculator to buy for Statistics
 
Did you miss my Tips for Assignment 1?
Tips for Stat 1000 Classroom Assignment 1
 
If you are taking the course by Distance/Online (Sections D01, D02, etc.), I sent tips for Assignment 2 long ago.  Check my archive:
Grant's Homework Help Archive 
 
Tips for Assignment 2 (Classroom Lecture Sections A01, A02, A03, etc.)
  
Don't have my book?  You can download a free sample containing Lesson 1 at my website here:
Grant's Tutoring Study Guides (Including Free Samples)
 
You will have to study both Lesson 2: Regression and Correlation and Lesson 3: Designing Samples and Experiments in my Basic Stats 1 book to prepare for this assignment.  Questions 1, 2 and 3 cover the concepts I teach in Lesson 2.  The remaining questions are dealt with in Lesson 3 of my book.
 
Question 1:
To compute the correlation coefficient by hand, DO NOT follow my example in Lesson 2, question 1, part (c).  They have given you slightly different column headings so they want you to compute r by hand a slightly different way.  They are using this formula for the correlation coefficient (click the link):
Alternative Formula for the Correlation
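
In case the link does not open for you, here is the formula as I describe it in Steps 4 and 5 below: the total of the products column, divided by n-1 times sx times sy.  Treat this as my sketch of the formula, not a copy of what the assignment itself displays.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x\, s_y}
```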
 
First, make sure you decide which variable is x and which variable is y in this problem.  Is the explanatory variable, x, "Marijuana" or is it "Other Drugs"?
 
Put your calculator into Linear Regression Stat Mode and enter the data.  Note that I show you how to do that in Appendix A of my book.
Here is a link to a digital copy of that appendix:
 
Once you have entered the data, you can confirm that you got the same answers for x̅ , y̅ , sx, and sy that they have provided for you.  That is one way to make sure you have correctly identified which is x and which is y.  BUT, you will probably discover that your calculator has given you more accurate answers since the question has rounded off to two decimal places.  We must assume that you are expected to do all your calculations with the two-decimal place values you have been given.
 
For example, after you have entered your (x,y) data points, Sharps use "RCL 4" to get x̅ and "RCL 7" to get y̅.  "RCL 5" gives you sx and "RCL 8" gives you sy.
 
A lot of Casio calculators (and some Texas Instruments) use the "σ" symbol ("sigma," the Greek lowercase "s") to denote "standard deviation".  For example, in many Casios, after you have entered the data, you first select "S.VAR."  You will find it written above one of your buttons, perhaps above the "2" or nearby on the keyboard.  It is accessed by pressing "SHIFT" then "S.VAR" (Statistical Variables).  Once you select S.VAR, you are shown a menu where you see the symbol " x̅ " for the sample mean (select "1" and press "=" to get the sample mean).  You are also told you can press "2" to get " xσn " or press "3" to get " xσn-1 ".  That is Casio's way of designating the population standard deviation and the sample standard deviation, respectively.  You will always want the sample standard deviation, Sx, so select " xσn-1 ".  Similarly, if you select S.VAR and then press your right arrow button, you will be scrolled through other options.  For example, you can select " y̅ ", the mean of the y values, or " yσn-1 " to get Sy, the standard deviation of the y values.
 
Here is how I suggest you do this problem:
Step 1:  Enter all your (x,y) data points into your calculator once you have put it into Linear Regression Stat Mode.
Step 2:  Ask your calculator for x̅ , y̅ , sx, and sy and confirm your answers match the givens when rounded off to two decimal places.  If so, ask your calculator for r, the correlation coefficient, and note its value, rounded off to two decimal places (and make sure you round, don't trim: e.g. 0.617 rounds to 0.62).  Once you have correctly found r, keep your data in the calculator ready to proceed to Step 3.
Step 3:  ON PAPER, proceed to calculate and record all the entries you will eventually type into the boxes.  The first column is telling you to subtract x̅ from each of the six x values; the second column is telling you to subtract y̅ from each of the six y values; the last column is telling you to multiply the entries in the first two columns together.  WRITE DOWN ON PAPER EVERY SINGLE DECIMAL PLACE YOUR CALCULATOR GIVES YOU.  You should find that the answers for your products that you are putting in the third column will have three or four decimal places (depending on the given values for the means).  In the boxes they provide, you will enter these values rounded off to two decimal places as instructed, but I believe that you need to use the more accurate answers you have computed and written down on paper to compute the final answer for r.
Step 4: Compute the total of that last column (that is the numerator in the alternative formula for the correlation I have given you above).  Be sure to compute the total using all the decimal places you have written on paper, not the two-decimal place values you will round off to when you enter them in the boxes provided.
Step 5: Compute the denominator in the alternative formula for r I have shown you above by multiplying n-1, sx, and sy together (using the two decimal place values for sx, and sy they have given you).  Write down the complete answer you have found, keeping all the decimal places.
Step 6:  Now compute r by dividing the total you computed in Step 4 by the answer you computed in Step 5.  Hopefully, the answer you get for r, when you round off to four decimal places as they request, will be very close to the actual value you got for r by using the Stat mode in your calculator. Once you have confirmed that you were able to compute the correct value of r by hand, enter all the numbers you computed into the appropriate boxes.  I RECOMMEND YOU ENTER THE EXACT VALUE OF r THAT YOUR STAT MODE HAS COMPUTED FOR YOU rounded to four decimal places, in the event that your computed value is not precisely the same as the calculator value.  My hunch is that the assignment has been programmed to mark the value of r you compute using the rounded off numbers, whereas the value you compute using the Stat Mode in your calculator will actually be too accurate, and possibly marked wrong if it isn't close to the rounded off answer you compute by hand.
 
I hope this works.  If they mark your value for r wrong, try entering the value you computed using the Stat Mode instead (assuming it is slightly different to the value computed by hand).
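
If you want to double-check your hand calculation on a computer, here is a minimal Python sketch of Steps 3 to 6.  The six data points are made up for illustration (they are NOT the assignment data), and the function names are my own.  It shows how using the two-decimal-place values for the means and standard deviations can give a slightly different r than the full-precision answer your Stat Mode gives.

```python
from statistics import mean, stdev

# Made-up (x, y) data for illustration only; not the assignment values.
XS = [1, 2, 3, 4, 5, 6]
YS = [2, 4, 5, 4, 5, 7]

def r_by_hand(xs, ys):
    """r from the products column, using means and standard
    deviations rounded to two decimal places (as the question gives them)."""
    n = len(xs)
    x_bar, y_bar = round(mean(xs), 2), round(mean(ys), 2)
    sx, sy = round(stdev(xs), 2), round(stdev(ys), 2)
    # Step 3: third column = (x - x_bar) * (y - y_bar), all decimals kept
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    numerator = sum(products)          # Step 4: total of the last column
    denominator = (n - 1) * sx * sy    # Step 5: (n-1) times sx times sy
    return numerator / denominator     # Step 6

def r_exact(xs, ys):
    """Same formula with nothing rounded (what Stat Mode reports)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / ((n - 1) * stdev(xs) * stdev(ys))

if __name__ == "__main__":
    print("by hand (rounded inputs):", round(r_by_hand(XS, YS), 4))
    print("full precision:          ", round(r_exact(XS, YS), 4))
```

With data like this the two answers agree to about two decimal places but not to four, which is exactly why I suggest trying one value first and the other if it is marked wrong.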
 
Question 2
First, make sure you decide which variable is x and which variable is y in this problem.  Is the explanatory variable, x, "Horsepower" or is it "Mileage"?
 
Again, I recommend you use the Linear Regression Stat Mode on your calculator to enter the data and check that you get the same answers for a, the intercept, and b, the slope, as you get by the formulas you use.  I suspect that your Stat Mode answers will differ slightly from the values you compute because your computations are using rounded off values for the means and standard deviations.
 
Use the formulas to compute the slope and intercept that I introduce in question 1(e) in my Lesson 2 and also use again in question 5 of that lesson.
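
For reference, here is a short Python sketch of the standard least-squares formulas, b = r(sy/sx) and a = y̅ - b x̅.  I am assuming these are the formulas the question wants you to use; the numbers in the example are invented, not taken from the assignment.

```python
def slope_intercept(r, x_bar, y_bar, sx, sy):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    b = r * sy / sx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

if __name__ == "__main__":
    # Hypothetical summary values, for illustration only.
    a, b = slope_intercept(r=-0.80, x_bar=150.00, y_bar=25.00, sx=40.00, sy=5.00)
    print("a =", round(a, 4), " b =", round(b, 4))
```

Notice that b always has the same sign as r, since sx and sy are never negative.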
 
If the answers you compute for a and b, rounded to four decimal places, do not precisely match the perfect answers your Stat Mode gives you, I recommend you enter the computed values in the boxes in part (a) and use those rounded off values to answer the remaining questions.  If they mark you wrong, try the more accurate values your Stat Mode computed instead.  If they still mark you wrong, you're in big trouble.  You might want to ask your prof ahead of time if the answer key is using the precise values or the rounded off values for the means and standard deviations you have been given for both questions 1 and 2.
 
Note, in part (b), they ask for a proportion, not a percentage, so leave your value for the coefficient of determination as a decimal.  Do not change it into a percent.
 
As I already suggested, I think you should first make the predictions they request in parts (c) and (d) using the computed values for a and b, rounded to four decimal places, that you computed by hand.  Only try your more accurate Stat Mode values the second time, if necessary.
 
I show you how to compute a residual (part (g)) in my question 1(j).
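
As a quick reminder of the arithmetic, here is a small Python sketch: the prediction is y-hat = a + bx, and the residual is the observed y minus the predicted y.  The values of a, b, x and y below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def predict(a, b, x):
    """Predicted y from the regression line y-hat = a + b*x."""
    return a + b * x

def residual(a, b, x, y_observed):
    """Residual = observed y minus predicted y."""
    return y_observed - predict(a, b, x)

if __name__ == "__main__":
    # Hypothetical line and data point, for illustration only.
    a, b = 40.0, -0.1
    print("prediction:", round(predict(a, b, 200), 4))
    print("residual:  ", round(residual(a, b, 200, 22.0), 4))
```

A positive residual means the actual y value was higher than the line predicted; a negative residual means it was lower.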
 
Make sure you have taken a look at my question 3 in Lesson 2 to learn some key facts about the correlation that may help you with part (h).
 
Question 3 uses JMP.
Click the "New Data Table" icon on the toolbar at top left in the JMP home screen.  Double-click the region to the right of "Column 1" to create "Column 2."  Rename Column 1 "Temperature" and Column 2 "Viscosity" by either double-clicking the columns and typing in the new name or by right-clicking the columns and selecting "Column Info," typing in the name and clicking OK.  Type in the data.  You can move from one cell to the next in the data table by pressing "Enter", "Tab" or the arrow buttons on your keyboard.
 
Select "Analyze", then "Fit Y By X".  Highlight "Temperature", and click the "X, Factor" button.  Highlight "Viscosity" and click the "Y, Response" button.  Click OK.
 
You should now see a scatterplot.  (If you don't, your data is not properly formatted; go back and check the columns are Numeric and Continuous by right-clicking each column name and selecting "Column Info".  The Data Type should be Numeric, and the Modeling Type should be Continuous.)
 
Click the red triangle above the scatterplot and select "Fit Line" and JMP will draw in the least-squares regression line.  Note, it shows you the regression equation directly under "Linear Fit" below the scatterplot.  In the "Summary of Fit", JMP shows you the value of r-squared (the coefficient of determination) rather than r, the correlation coefficient.  You can then take the square root of this number to get r, but use your scatterplot to help you decide if r is negative or positive because your calculator can't tell you that.
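
Here is a tiny Python sketch of that last step.  Instead of eyeballing the scatterplot, it takes the sign from the slope of the regression line, which always has the same sign as r.  The function name and the example numbers are my own, not JMP output.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Square root of r-squared, given the same sign as the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

if __name__ == "__main__":
    # Hypothetical values: r-squared of 0.81 with a downward-sloping line.
    print(round(r_from_r_squared(0.81, -2.5), 4))
```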
 
They don't ask you to hide the "Analysis of Variance" and "Parameter Estimates" parts of the output, but you can do so if you wish.  Simply click the gray triangle next to those title bars, and you will see those parts of the output disappear.
 
You will have to compute the residual they request in part (d) yourself using the approach I illustrate in my question 1(j).  When they ask, "What does the sign of the residual tell us?" they mean, was the actual viscosity higher or lower than you predicted it would be.
 
You could save this output as a PDF and upload it into the HTML editor and then simply type the rest of your answers into the HTML box, or, you can copy and paste this output into a Word document (or whatever word processor you use), and then also answer their other questions in this Word document, too.  Then save the Word document as a PDF and upload that file.  I will show you both methods below.
 

Method 1: Answering Parts (b) through (d) directly into the HTML editor
After you have made your scatterplot and added the least-squares regression line and hidden the Analysis of Variance and Parameter Estimates, you can save this as a PDF file ready to upload.
 
Click the thin blue line or click "Alt" on your keyboard to see the toolbar again.  Select "File" then "Save As."  In the "Save Report As" pop-up window, select which folder you want to save the file in (I suggest you select Desktop), type in a "File Name" and, in the "Save as type" menu, be sure to select "PDF file" from the drop-down list.  Click Save, and just click OK if it shows you another pop-up window.  You should now see the pdf file it has created.  If you are satisfied with what you see, you are now ready to upload the file to Stats Portal.  See below.
 
Method 2: Answering Parts (b) through (d) in a Word document or similar
After you have made your scatterplot and added the least-squares regression line and hidden the Analysis of Variance and Parameter Estimates, you want to copy and paste this output into your Word document (or whatever word processor you use) where you will also add the answers to the other parts of the question.
 
You will need to copy and paste this output into a document to get ready to add your answers for parts (b), (c) and (d) as well.  Here is how to do that:
 
Click the thin blue line near the top of the JMP scatterplot screen, or press "Alt" on your keyboard, to reveal a toolbar with a series of icons.  If you point your mouse at the icons, you should see, looking at the icons left to right, the first icon is for a "New Data Table," the second icon is for "New Script," the third is to "Open" a file, etc.  Click the icon that looks like a fat white cross or plus sign "+".  This is your "Selection" tool.  Your mouse cursor should now have changed from an arrow to that white cross.  Click the title bar that says "Bivariate Fit of ..." at the top of the output and that should select the entire output (scatterplot, Summary of Fit, etc.).  Right-click and select Copy.
 
Now, open whatever program you use for word processing (such as Word).  In a new document, right-click and select Paste to paste your output into the document. 
 
In your Word document, below the outputs you have pasted in, type in your answers for parts (b), (c) and (d).
 
You are now ready to save and upload the file that answers parts (b), (c) and (d).  In your Word document (or whatever program you are using), select "File" then "Save As" and select "PDF File".  Type in whatever name you want the file to have in the "File name" section. Select which folder you want to save the file in (I suggest you select "Desktop" so that the file will just appear right on your desktop home screen).  Click "Save" or "Publish".  You should now have your file ready to upload into the assignment.  
 
To upload your file into the text box they provide:
Once you have saved your PDF file using whichever of Methods 1 or 2 I show you above, you are ready to upload the file.  Click "HTML editor" below the text box to make a toolbar appear in the text box.  Click the toolbar option called "Link" and select "Website/Uploaded File."  In the pop-up window that appears, click the button called "Find/Upload File" (it is at the bottom of the pop-up window; you may have to enlarge the box or scroll down to see it).  Click the "Browse" button and find the PDF file you just saved.  Either double-click that file or select it and click "Open" and you should see the path to that file appear in the Browse box.  Click "Upload File" and its name should appear in the "Uploaded Files" pop-up window.  Select the file in the list of "Uploaded Files" to highlight it and click OK and you should see a link to the file appear in the text box. Of course, if you are using Method 1, above, make sure you have also typed your answers to parts (b), (c) and (d) in this box as well.
 
Questions 4 to 7
These questions are a good run-through of the various things we do in an experiment. Be sure you have studied the latter half of Lesson 3 in my Basic Stats 1 book, from question 6 onward.  My question 7 is an especially good illustration.  For 4(e) and 5(e), I think you should tell them how many treatments your experiment has, but then tell them what exactly each treatment is.  (For example, in my question 7(b), I would say Treatment 1 is Food A, served early; Treatment 2 is Food B, served early; etc.)
 
Note that 4(g) is getting at the benefits of an experiment.  We learned in Lesson 2 that correlation does not imply causation.  But, the whole point of an experiment is to see if you can find a causal link between two variables.  If you have followed the three principles of experimental design, you might be able to prove that poison causes harm, or that smoking increases the risk of cancer.  Experiments, if properly designed, can prove that x causes y.
 
Question 8
This is a good run-through of the various types of samples and the possible biases that can exist.  Be sure to have studied the first half of my Lesson 3 up to the end of question 5 before attempting this question.  Be clear when listing the bias you see.  For example, don't just say "response bias".  Say, there is response bias because there will be too many people lying to the researcher.  (I am not saying that is the correct answer for any of your questions; I am just saying be clear what you mean instead of just using a generic term.)  Don't speculate about biases that might be there (such as saying, the researcher might be doctoring their data, for example; there is always that possibility, but we are not going to be paranoid and mention that as a possible bias every time, unless we have clearly been given reason to believe that has occurred).  Only discuss biases that are clearly present in the information you have been given. 
 
Question 9
A nice and easy example of randomization, as I demonstrate early in Lesson 3.  Here is a link where you can download Table B, if you have not already done so:
Table B
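
If you want to see the mechanics of using a row of random digits, here is a minimal Python sketch.  It assumes the individuals are labelled 01, 02, and so on, reads the digits in groups as wide as the largest label, ignores any group that is out of range or already chosen, and stops once enough labels are picked.  The digit string below is made up by me (it is NOT a line from Table B), and the population and sample sizes are hypothetical.

```python
def choose_from_digits(digits, population_size, sample_size):
    """Pick labels 1..population_size by reading random digits in
    fixed-width groups, skipping out-of-range groups and repeats."""
    digits = digits.replace(" ", "")      # ignore the spacing used in the table
    width = len(str(population_size))     # e.g. 2 digits for labels 01 to 30
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen

if __name__ == "__main__":
    # Made-up digits for illustration only; use the line of Table B
    # that your question tells you to start at.
    made_up_row = "07194 86322 05113 09912"
    print(choose_from_digits(made_up_row, population_size=30, sample_size=4))
```

Reading the made-up row two digits at a time gives 07, 19, 48, 63, 22, 05, and so on; 48 and 63 are skipped because no individual has those labels.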