Stat 1000: Tips for Assignment 2

Published: Thu, 01/28/16

Midterm Exam Prep Seminar Sat. Jan. 30!
Don't have my book or audio?  You can download a free sample of my book and audio lectures containing Lesson 1:
Did you read my tips on how to study and learn Stat 1000?  If not, here is a link to those important suggestions:
Did you read my Calculator Tips?  If not, here is a link to those important suggestions:
Did you see my tips for Assignment 1? Click here.
Tips for Assignment 2
Study Lessons 2 and 3 in my study book (if you have it) to learn the concepts involved in Assignment 2.  Don't start working on the assignment too soon.  Study and learn the lessons first, and use the assignment to test your knowledge.  If you want to break things up, study Lesson 2, then do questions 1-3 in the assignment.  Then, study Lesson 3 before doing questions 4-8.

Of course, always seek out assistance from my book, your course notes, etc. if you ever hit a question you don't understand, but try not to be learning things as you do an assignment.  Learn first, then put your learning to the test.

To type in formulas you are using and to show your numbers subbed into the formulas, you can click the Equation Editor button in the toolbar that looks like the Sigma Summation symbol (you have to click the "..." other options button to see the sigma formula input button).  Then click the various buttons to make your fractions and enter the symbols.  However, the Equation Editor is extremely slow and clunky.  Personally, I would never use it.  Just type ordinary text explaining what you are doing if you think you should show some work.

Exception: Always do any JMP stuff open-book.  Have my tips in front of you, and let me guide you step-by-step through any JMP stuff.  JMP is just "busy" work.  The sooner you get it done and can move on to productive things like understanding the concepts and interpreting the JMP outputs, the better off you will be.  Then again, since you never have to upload the JMP printouts, perhaps you might not even bother to do the JMP at all.  Most questions can be answered by hand even when they told you to use JMP.
Question 1
Part (a)
Although they don't clearly say so, I believe they want you to compute the correlation coefficient by hand. DO NOT follow my example in Lesson 2, question 1(c).  There is an easier version of the formula they have given you on the formula sheet.  The formula below for the correlation coefficient is a little better because it requires less rounding during your calculations:

    r = Σ(x - x̅)(y - y̅) / [(n - 1) × Sx × Sy]

Put your calculator into Linear Regression Stat Mode and enter the data.  Note that I show you how to do that in Appendix A of my book.  You can also click the Calculator Tips above to see these steps.  Make sure you are following the steps for Linear Regression (the second column) and not the Basic Data Problem steps (the first column).

Once you have entered the data, you can confirm that you have the same answers for x̅ , y̅ , Sx, and Sy that they have provided for you.  Your answers should match theirs when rounded to one or two decimal places.

For example, after you have entered your (x,y) data points, Sharps use "RCL 4" to get x̅ and "RCL 7" to get y̅ .  "RCL 5" gives you Sx and "RCL 8" gives you Sy.

A lot of Casio calculators (and some Texas Instruments) use the "σ" symbol ("sigma," the Greek lowercase "s") to denote "standard deviation".  For example, in many Casios, after you have entered the data, you first select "S.VAR."  You will find it written above one of your buttons, perhaps above the "2" or nearby on the keyboard.  It is accessed by pressing "SHIFT" then "S.VAR" (Statistical Variables).  Once you select S.VAR, you are shown a menu where you see the symbol " x̅ " for the sample mean (select "1" and press "=" to get the sample mean).  You are also told you can press "2" to get " xσn " or press "3" to get " xσn-1 ".  That is Casio's way of designating the population standard deviation and the sample standard deviation, respectively.  You will always want the sample standard deviation, Sx, so select " xσn-1 " (number 3 in the menu).  Similarly, if you select S.VAR and then press your right arrow button, you will be scrolled through other options.  For example, you can select " y̅  ", the mean of the y values, or " yσn-1 " to get Sy, the standard deviation of the y values.
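If it helps to see the difference between the two standard deviations away from the calculator, here is a minimal Python sketch. It uses the five y values from the Lesson 2, question 1 example that is typed out under Question 3 below (not the assignment's data), and the variable names are just for illustration.

```python
import statistics

# y values from the Lesson 2, question 1 example: (10,19), (20,13), (30,12), (40,8), (50,7)
y = [19, 13, 12, 8, 7]

population_sd = statistics.pstdev(y)  # divides by n      (the calculator's "σn")
sample_sd = statistics.stdev(y)       # divides by n - 1  (the calculator's "σn-1")

print("population SD:", round(population_sd, 4))
print("sample SD (Sy):", round(sample_sd, 4))
```

The sample version is always the larger of the two, which is a quick way to check you picked the right menu item.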

Here is how I suggest you do this problem:
  1. Enter all your (x,y) data points into your calculator once you have put it into Linear Regression Stat Mode.
  2. Ask your calculator for x̅ , y̅ , Sx, and Sy and confirm your answers match the givens when rounded off to two decimal places.  If so, ask your calculator for r, the correlation coefficient, and note its value, rounded off to four decimal places (and make sure you round, don't trim: e.g. 0.61736 rounds to 0.6174).  Once you have correctly found r, keep your data in the calculator ready to proceed to Step 3.
  3. ON PAPER, proceed to calculate and record all the entries you will eventually type into the boxes.  USE THE VALUES FOR THE MEANS AND STANDARD DEVIATIONS THAT YOU WERE GIVEN IN THE QUESTION, NOT YOUR MORE ACCURATE VALUES THE CALCULATOR PROVIDES. 
  4. ON PAPER, Make a table with 5 columns. 
    • The first column is labeled x.  Enter all 6 given x values down that column (the Marijuana values). 
    • The second column is labeled y.  Enter all 6 given values down that column (the Other Drugs values).
    • The third column is labeled x - x̅, telling you to subtract x̅ from each of the six x values.  Which is to say, all you are doing here is calculating the x deviations.  Take the first given x score and subtract the given value for x̅ .  This should be a two-decimal place value already since you were given the mean rounded to two decimal places.  Repeat this for the second, third, etc. x-values to fill in the third column in your chart.
    • The fourth column is labeled y - y̅, telling you to subtract y̅ from each of the six y values.  Which is to say, you are computing the y deviations.  Do exactly as you did above to get the x deviations, except using each y score and the value of y̅ , the given two-decimal place rounded off value of the mean of the y-values.
    • The fifth column is labeled (x - x̅)(y - y̅), telling you to multiply the entries in the third and fourth columns together.  Multiply your first x deviation in column 3 by the first y deviation in column 4 and enter the answer in column 5.  Multiply your second x deviation in column 3 by the second y deviation in column 4 and enter the answer in column 5.  WRITE DOWN ON PAPER EVERY SINGLE DECIMAL PLACE YOUR CALCULATOR GIVES YOU.  You should find that the answers for your products that you are putting in the fifth column will have three or four decimal places (depending on the given values for the means). 
  5. Compute the TOTAL of that last column (that is the numerator in the alternative formula for the correlation I have given you above).  Be sure to compute the total using the four-decimal place values you entered in column 5.
  6. Compute the denominator in the alternative formula for r I have shown you above by multiplying n-1, Sx, and Sy together (using the two decimal place values for Sx, and Sy they have given you).  Write down the complete answer you have found, keeping all the decimal places. Note that n-1 is 5 in this problem since there are n=6 pairs of data.
  7. Now compute r by dividing the total you computed in Step 5 by the answer you computed in Step 6. 
Hopefully, the answer you get for r, when you round off to four decimal places as they request, will be very close to the actual value you got for r by using the Stat mode in your calculator. I would expect the answer you have computed by hand should match the answer your Stat mode gives you for r accurate to about 2 decimal places.  If your two methods for computing r are basically the same to 2 decimal places (maybe the last digit is off by 1 or 2), then you can safely assume you have not made a mistake in your calculations.
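The table method in the steps above can be sketched in Python as a way to check your arithmetic. This is only an illustration: the data are the five points from the Lesson 2, question 1 example (typed out under Question 3 below), not the assignment's six points, and the "given" means and standard deviations are those same data's values rounded to two decimals, the way the question would supply them.

```python
# Example data from Lesson 2, question 1 (NOT the assignment's data).
x = [10, 20, 30, 40, 50]
y = [19, 13, 12, 8, 7]
n = len(x)

# "Given" summary values, already rounded to two decimal places.
x_bar, y_bar = 30.00, 11.80
s_x, s_y = 15.81, 4.76

# Columns 3, 4 and 5 of the table.
x_dev = [xi - x_bar for xi in x]
y_dev = [yi - y_bar for yi in y]
products = [dx * dy for dx, dy in zip(x_dev, y_dev)]

numerator = sum(products)           # Step 5: total of column 5
denominator = (n - 1) * s_x * s_y   # Step 6: (n - 1) * Sx * Sy
r = numerator / denominator         # Step 7

# Print the five-column table, then the three results.
for row in zip(x, y, x_dev, y_dev, products):
    print(*(round(value, 4) for value in row))
print("total =", round(numerator, 4))
print("(n-1)SxSy =", round(denominator, 4))
print("r =", round(r, 4))
```

For these example numbers the hand method gives about -0.9634, while the unrounded data give about -0.9624, so the two agree to two decimal places, which is the kind of match described above.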

Once you have confirmed that you were able to compute the correct value of r by hand, enter all the numbers you computed into the answer box on the assignment. Click the "..." (Show All Components) button in the toolbar on the far right of the answer box and select the Table option that appears on the toolbar.  If you slide your cursor over the table that appears in the drop-down menu, you should see that it highlights cells in the table. 
  • Highlight a table that has 5 columns and 7 rows.  The table will appear in your answer box (you may have to click and drag the little buttons on the side of the table to resize it to see it better, but it will actually expand on its own as you enter values into the table).
  • Label the 5 columns, x, y, x - x̅, y - y̅, and (x - x̅)(y - y̅), respectively.  (You can use the Math Input button to enter these values, or you can copy and paste my labels into your table.)
  • Enter the numbers you were given and computed on paper into the table.
  • Now, below the table in your answer box, compute the Total of the fifth column.  I would just write "Total of fifth column" or "Total of last column" and give them the total.
  • You can now show the computation of (n-1)SxSy in the answer box below the total you computed.
  • Finally, divide those two numbers to get r.  Say something like "r = # / # = answer" where you replace # with the numbers you calculated for the Total and the (n-1)SxSy parts.

Part (b)
Correlation does not imply causation.
Question 2
Note that Fat Consumption is x and Cholesterol Level is y.  How do we know that?  We always use x to predict y.

DO NOT USE THE STAT MODE in your calculator to state the answers for the intercept and slope!  It will be too accurate.

Use the rounded off values they have given you for the correlation coefficient, means and standard deviations and the formulas to compute the slope and intercept that I introduce in Lesson 2, question 1(e) and also use again in question 5 of that lesson.  These are also the formulas numbered 2 and 3 on the Formula Sheet you will be provided on your exams.

Again, I recommend you use the Linear Regression Stat Mode on your calculator to enter the data and check that you get the same answers for the means, standard deviations, and correlation coefficient as they have given you.  Then you can confirm that you have used the formulas correctly by comparing your Stat mode's answers for a, the intercept, and b, the slope, with the values the formulas give you.  I suspect that your Stat Mode answers will differ slightly from the values you compute because your computations are using rounded off values for the means and standard deviations. 

Do not use your calculator's perfect values.  Use the rounded off numbers you were given for the means and standard deviations to compute the slope and intercept.

Part (a)
If the answers you compute for a and b, rounded to four decimal places, do not precisely match the perfect answers your Stat Mode gives you, I recommend you enter the values computed from the formulas because that is what they expect.  Show your work when computing the slope and intercept.  If you want to be fancy, you can use the Math Input button to state the formulas you are using and show the calculations.  Or just type it straight into the answer box.  For example, if r=0.25, Sy=2.6 and Sx=9.8 (they aren't), you could say: "slope = 0.25*2.6/9.8 = 0.0663."

Make sure you round your answer for the slope to 4 decimal places before you proceed to use it to compute the intercept.  Then, of course, round the intercept to 4 decimal places, too.  Be sure to use these rounded off values for any other computations the question requires.

Make sure you state your final answer in the form y^ = a + bx (replacing a and b with the values you computed, of course).
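Here is a minimal sketch of that order of operations, assuming the usual formulas slope b = r × Sy / Sx and intercept a = y̅ - b × x̅. The values r=0.25, Sy=2.6 and Sx=9.8 are the made-up ones from the example above; the two means are hypothetical numbers invented purely for this sketch.

```python
r, s_x, s_y = 0.25, 9.8, 2.6     # illustrative values from the example above
x_bar, y_bar = 40.0, 12.5        # hypothetical means, invented for this sketch

b = round(r * s_y / s_x, 4)      # slope, rounded to 4 decimals BEFORE it is reused
a = round(y_bar - b * x_bar, 4)  # intercept is computed from the rounded slope

print(f"slope = {r}*{s_y}/{s_x} = {b}")
print(f"intercept = {y_bar} - {b}*{x_bar} = {a}")
print(f"y^ = {a} + {b}x")
```

One caution: Python's round() works on binary floating-point numbers, so a value that sits exactly on a 5 in the next decimal place can occasionally round differently than it would by hand. Treat this as a check, not a replacement for your own rounding.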

Part (b)
They ask for a proportion, not a percentage, so leave your value for the coefficient of determination as a decimal (see my Lesson 2, question 1(d)).  Do not change it into a percent.  For example, say 0.2235, don't say 22.35%.

Part (c)
Use the rounded off answers you found in part (a) to make the prediction requested.

Part (d)
I show you how to compute a residual in my Lesson 2, question 1(j).  This is a two-step process.  You must first make the appropriate prediction, then compute the residual.
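The two steps can be sketched like this, assuming the usual definition residual = observed y minus predicted y. The numbers are illustrative only: the line 20.5 - 0.29x is what the least-squares formulas give for the five Lesson 2, question 1 points typed out under Question 3 below, and (20, 13) is one of those points. The function names are just for this sketch.

```python
def predict(a, b, x):
    """Step 1: predicted y from the regression line y^ = a + bx."""
    return a + b * x


def residual(observed_y, a, b, x):
    """Step 2: residual = observed y minus predicted y."""
    return observed_y - predict(a, b, x)


a, b = 20.5, -0.29     # least-squares line for the example data (not the assignment's)
x_obs, y_obs = 20, 13  # one observed data point from that example

print("predicted y:", round(predict(a, b, x_obs), 4))
print("residual:", round(residual(y_obs, a, b, x_obs), 4))
```

A negative residual, as in this example, means the observed point sits below the regression line.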

Part (e)
Make sure you have taken a look at my Lesson 2, question 4 to learn some key facts about the correlation that may help you here.
Question 3

Since they will never see the JMP output, personally, I wouldn't waste my time doing it.  Instead, use the Linear Regression Stat Mode on your calculator.  It will tell you all the answers you need for this question.


If you don't have a Linear Regression mode on your calculator, here is a very simple Linear Regression calculator available for free on the web:

Linear Regression Calculator


Note that, to enter the data in that calculator above, you type x,y in the first line, then press enter to move to a second line and type your second x,y then enter, to put the third x,y on the third line, etc.


For example, if I were entering my data from Lesson 2, question 1 into the calculator above, I would type:


10,19
20,13
30,12
40,8
50,7


Then submit the data.  Note that below the scatterplot and fitted line, there is a link to click to get the value of r, the correlation coefficient, too.
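If you would like something to compare the web calculator or your Stat mode against, here is a small Python sketch that reads the same "x,y per line" layout and computes the slope, intercept and r from the standard least-squares formulas. The variable names are my own choices for illustration, and the data are the Lesson 2 example points, not the assignment's.

```python
import math

# One "x,y" pair per line, exactly as you would type it into the calculator.
raw = """10,19
20,13
30,12
40,8
50,7"""

pairs = [tuple(float(v) for v in line.split(","))
         for line in raw.splitlines() if line.strip()]
x = [p[0] for p in pairs]
y = [p[1] for p in pairs]
n = len(pairs)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # sum of squared x deviations
syy = sum((yi - y_bar) ** 2 for yi in y)                        # sum of squared y deviations
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of products

b = sxy / sxx                   # slope
a = y_bar - b * x_bar           # intercept
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

print(f"slope b = {b:.4f}, intercept a = {a:.4f}, r = {r:.4f}")
```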


If you want to use JMP:

Click the "New Data Table" icon on the toolbar at top left in the JMP home screen.  Double-click the region to the right of "Column 1" to create "Column 2."  Rename Column 1 "Latitude" and Column 2 "Temperature" by either double-clicking the columns and typing in the new name or by right-clicking the columns and selecting "Column Info," typing in the name and clicking OK.  Type in the data.  You can move from one cell to the next in the data table by pressing "Enter", "Tab" or the arrow buttons on your keyboard.

Select "Analyze", then "Fit Y By X".  Highlight "Latitude", and click the "X, Factor" button.  Highlight "Temperature" and click the "Y, Response" button.  Click OK.

You should now see a scatterplot.  (If you don't, your data is not properly formatted; go back and check the columns are Numeric and Continuous by right-clicking each column name and selecting "Column Info".  The Data Type should be Numeric, and the Modeling Type should be Continuous.)

Click the red triangle above the scatterplot and select "Fit Line" and JMP will draw in the least-squares regression line.  Note, it shows you the regression equation directly under "Linear Fit" below the scatterplot.  JMP also shows you the value of r-squared (the coefficient of determination) in the "Summary of Fit", rather than r, the correlation coefficient.  You can then take the square root of this number to get the size of r, but use your scatterplot to decide whether r is negative or positive, because the square root on your calculator always comes out positive.
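Since r and the slope always have the same sign, the sign can also be taken from the slope in the regression equation. A minimal sketch of that idea follows; the function name is hypothetical, and the numbers are illustrative values for the Lesson 2 example data, not anything JMP will show you for this question.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Square root of r-squared, carrying the sign of the slope."""
    magnitude = math.sqrt(r_squared)
    return math.copysign(magnitude, slope)


# Illustrative values: r-squared of 0.9262 with a downhill (negative) slope.
print(round(r_from_r_squared(0.9262, -0.29), 4))
```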


Part (a)
I show you how to interpret a slope throughout Lesson 2, and give you a specific example in question 1(f) of my book.  JMP shows you the slope as part of the least-squares regression equation below the title Linear Fit.  Or, you can use the stat mode in your calculator to tell you the slope, b.

Part (b)
As I mention above, and illustrate in my question 7, you can determine the correlation from the JMP printout, but don't forget to attach the correct sign to r.  Or, you can use the stat mode in your calculator to tell you the value of r.

Remember, if there is a positive association, r is positive; if there is a negative association, r is negative.  Also, r and b, the slope, always have the same sign.

Parts (c) and (d)
You will use the least-squares regression equation JMP or your calculator has computed for you to compute the predictions they request.

Part (e)
Extrapolation!  Look at my Lesson 2, questions 1(g) and (h).
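As a rough sketch of the idea, a prediction is an extrapolation when the x value you plug in lies outside the range of x values in the data. The helper name and the x values below are illustrative only (they are the Lesson 2 example values, not this question's data).

```python
def is_extrapolation(x_new, observed_x):
    """True when x_new lies outside the range of the observed x values."""
    return x_new < min(observed_x) or x_new > max(observed_x)


observed_x = [10, 20, 30, 40, 50]  # example x values, not the assignment's

print(is_extrapolation(35, observed_x))  # False: 35 is between 10 and 50
print(is_extrapolation(80, observed_x))  # True: 80 is beyond the largest x
```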

Part (f)
I show you how to compute a residual in my Lesson 2, question 1(j).  This is a two-step process.  You must first make the appropriate prediction, then compute the residual.
Question 4
Make sure you have studied Lesson 3 in my book before you answer this and the remaining questions in this assignment.  You should especially look at questions 6 and 7 as illustrations of the Three Principles of Experimental Design and examples of identifying the various factors, factor levels, treatments, experimental units, and response variable for an experiment, as well as identifying what type of experiment it may be (randomized comparative experiment, block design, matched pairs design).

When they ask for the treatments (part (d)), tell them not only how many treatments there are in the experiment, but what the exact treatments are.  For example, in my Lesson 3, question 7(b), I wouldn't just say that there are 6 treatments.  I would say the 6 treatments are: Dog Food A served early; Dog Food B served early; etc. up to Dog Food C served late.
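When there are two factors, each treatment is one combination of a level from each factor, so you can list them systematically. Here is a short sketch using the dog food example; I am assuming the two factors are the food (A, B, C) and the serving time (early, late), which is what the six treatments named above suggest.

```python
from itertools import product

foods = ["Dog Food A", "Dog Food B", "Dog Food C"]  # levels of factor 1
times = ["early", "late"]                           # levels of factor 2

# Every combination of one food with one serving time is a treatment.
treatments = [f"{food} served {time}" for time, food in product(times, foods)]

print(len(treatments), "treatments:")
for treatment in treatments:
    print(" ", treatment)
```

The count is just the number of levels multiplied together (3 × 2 = 6 here), but the list itself is what you should write in your answer.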

Here are some extra things to clarify the three principles of experimental design which you may be asked to discuss in questions in this assignment.  Do note that different students get different scenarios and questions, so I cannot be very specific:

Note that randomization is used in experiments to randomly determine which unit gets which treatment (when there are many units and each unit will be given exactly one treatment), or to randomly determine the order the treatments will be administered (when one unit is going to receive two or more treatments).
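Both uses of randomization can be sketched in a few lines of Python. This is only an illustration: the unit names, treatment names, group sizes and seeds are all made up, and your scenario will have its own.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Case 1: many units, each unit gets exactly one treatment.
units = [f"Unit {i}" for i in range(1, 13)]        # 12 hypothetical units
treatments = ["Treatment 1", "Treatment 2", "Treatment 3"]
groups = assign_treatments(units, treatments, seed=1)
for treatment, members in groups.items():
    print(treatment, "->", members)

# Case 2: one unit receives every treatment, so randomize the order instead.
order = random.Random(2).sample(treatments, k=len(treatments))
print("order for one unit:", order)
```

With 12 units and 3 treatments, each treatment ends up applied to 4 units.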

When discussing the principle of control, there is no need to speculate.  Discuss the actual things they have obviously done to control outside factors or certainly should have done.

By repetition, they mean what I call replication; quite simply: how many times is each treatment being applied?

Note also that we learned in Lesson 2 that correlation does not imply causation.  Just because a pattern is observed between x and y does not mean we have proven that x causes y.  But, the whole point of designing an experiment is to identify possible cause and effect.  If an experiment has been designed properly, we have every right to believe we have proven that blank causes blank, provided we have seen a significant difference in the response variable, when applying one treatment as compared to another. 

Experiments can prove causation!
Question 5
Similar to the previous question.
Question 6
Look at my examples of matched pairs experiments before you design yours here. 
Question 7
This is a good runthrough of the various types of samples.  Be sure to have studied the first half of my Lesson 3 up to the end of question 5 before attempting this question. 

Is the sample a voluntary response sample, convenience sample, simple random sample, stratified random sample, multistage sample?

I also want to point out that, if the researcher selects the entire population to study, then they are doing a census, the biggest sample possible.  I am not saying that you are being given an example of a census, but they have had that occur in other questions in past assignments, and I was remiss in my book in not having discussed a census.

Please note that different students get different questions, so some of you may get different examples and types of samples than others.
Question 8
A nice and easy example of randomization, as I demonstrate early in Lesson 3.
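The usual way to use a string of random digits is to give every unit a numeric label with the same number of digits, read the string off in groups of that size, and skip any group that is out of range or already chosen. Here is a minimal sketch of that procedure. The digit string, the population size of 40 and the sample size of 5 are illustrative assumptions, not the ones in your question, and the function name is my own.

```python
def pick_from_digits(digits, population_size, sample_size):
    """Select labels 1..population_size by reading fixed-width digit groups."""
    clean = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces
    width = len(str(population_size))                     # 2 digits for labels 01-40
    chosen = []
    for i in range(0, len(clean) - width + 1, width):
        label = int(clean[i:i + width])
        # Skip labels that are out of range or already selected.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


digits = "19223 95034 05756 28713"  # illustrative digits only
sample = pick_from_digits(digits, population_size=40, sample_size=5)
print(sample)  # [19, 22, 39, 34, 5]
```

Reading in pairs, the groups are 19, 22, 39, 50, 34, 05, and so on; 50 is skipped because no unit has that label.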

Here is a link where you can download Table B, if you have not already done so (although, you don't need it for this question, as they have provided you with a string of random digits):