Stat 1000: Tips for Assignment 2

Published: Sun, 09/28/14


My Midterm Exam Prep Seminar for Stat 1000 is Saturday, October 4. It is NOT on campus. It is held at Canadian Mennonite University, on the SOUTH side of Grant Ave., at 600 Shaftesbury Blvd., in the Lecture Hall. It costs $20 if you bring my book to the seminar, or $40 without a book.

Midterm Exam Prep Seminar for Stat 1000
$20 with Grant's book, $40 without
Saturday, October 4
9:00 am to 5:00 pm
Lecture Hall, Canadian Mennonite University
600 Shaftesbury Blvd.
Corner of Grant Ave. and Shaftesbury
South Side of Grant (same side as Shaftesbury High School)
Map of CMU campus (Lecture Hall is 21 on map)

Click here to register for the seminar (you pay at the door).

Did you read my tips on how to study and learn Stat 1000? If not, here is a link to those important suggestions:

Stat 1000: 4 Tips on How to Study & Learn This Course

Did you read my Calculator Tips? If not, here is a link to those important suggestions:

Stat 1000: Calculator Tips

Did you see my tips for Assignment 1? Click here.

Tips for Assignment 2

Study Lessons 2 and 3 in my study book (if you have it) to learn the concepts involved in Assignment 2. Don't start working on the assignment too soon. Study and learn the lesson first, and use the assignment to test your knowledge. Of course, always seek out assistance from my book, your course notes, etc. if you ever hit a question you don't understand, but try not to be learning things as you do an assignment. Learn first, then put your learning to the test.

Exception: Always do any JMP stuff open-book. Have my tips in front of you, and let me guide you step-by-step through any JMP stuff. JMP is just "busy" work. The sooner you get it done and can move on to productive things like understanding the concepts and interpreting the JMP outputs, the better off you will be.

Don't have my book? You can download a free sample of my book and audio lectures containing Lesson 1:

Free sample of Grant's Tutoring book and audio

A Warning about StatsPortal

Make sure that you are using Firefox as your browser. Do not use Internet Explorer: it has some glitches in the HTML editor boxes.

Note that every time you exit a question in StatsPortal, the data may very well change the next time you return to it. Do not press your browser's "Back" button while in a question; that, too, will change the data. When you are ready to actually do a question, open the link, keep it open, and do not close it until you have submitted your answers. Press "Save Answers" once you have done any calculations and entered any information, so that the data does not change and force you to start over again.

After you submit your answer to a question, if any parts have been marked wrong, be sure to write down the correct answers before you exit the screen (or grab a screen shot). To make a second attempt at the question, do not click the link to the question again; that will change the data and you will have to start all over again. Also, DO NOT click "try again" or make a "second attempt." That will also reset the data.

Instead, exit back to the home screen where they show the links for all the different questions on the assignment. Where it shows the tries for a question on the right side of your screen, you should see the "1" grayed out, showing that you have had 1 attempt. Click the number "2" to get your second attempt with the same data. That way you can enter the answers you already know are correct and focus on correcting your mistakes.

You should also have already downloaded the JMP statistical software, which was provided with either of the StatsPortal course options mentioned in your course outline.

Make sure you have gone through Assignment 0 completely to learn how to use the interface. I also suggest you print out a copy of question 8 in Assignment 0 (Long Answer Questions - Part 3) so that you have the steps for saving and uploading files into the HTML editor in front of you.

Question 1: Correlation - Ice Cream Sales

To compute the correlation coefficient by hand, DO NOT follow my example in Lesson 2, question 1(c). They have given you slightly different column headings, so they want you to compute r by hand in a slightly different way. They are using this formula for the correlation coefficient (click the link):

Alternative Formula for the Correlation
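In symbols, assuming the linked formula is the usual deviation-products version (it matches the numerator and denominator described in the steps below: the total of the products column, divided by n-1 times Sx times Sy), the alternative formula reads:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, S_x S_y}
```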

Put your calculator into Linear Regression Stat Mode and enter the data. Note that I show you how to do that in Appendix A of my book. You can also click the Calculator Tips above to see these steps. Make sure you are following the steps for Linear Regression (the second column) and not the Basic Data Problem steps (the first column).

Once you have entered the data, you can confirm that you have the same answers for x̅, y̅, Sx, and Sy that they have provided for you. Your answers should match theirs when rounded to two decimal places.

For example, on Sharp calculators, after you have entered your (x,y) data points, use "RCL 4" to get x̅ and "RCL 7" to get y̅. "RCL 5" gives you Sx and "RCL 8" gives you Sy.

Many Casio calculators (and some Texas Instruments) use the "σ" symbol ("sigma," the Greek lowercase "s") to denote standard deviation. For example, on many Casios, after you have entered the data, you first select "S.VAR" (Statistical Variables). You will find it written above one of your buttons, perhaps above the "2" or nearby on the keypad, and you access it by pressing "SHIFT" then "S.VAR". Once you select S.VAR, you are shown a menu where you see the symbol "x̅" for the sample mean (select "1" and press "=" to get the sample mean). You are also told you can press "2" to get "xσn" or press "3" to get "xσn-1". That is Casio's way of designating the population standard deviation and the sample standard deviation, respectively. You will always want the sample standard deviation, Sx, so select "xσn-1" (number 3 in the menu). Similarly, if you select S.VAR and then press your right arrow button, you will scroll through other options. For example, you can select "y̅", the mean of the y values, or "yσn-1" to get Sy, the standard deviation of the y values.

Here is how I suggest you do this problem:

1. Enter all your (x,y) data points into your calculator once you have put it into Linear Regression Stat Mode.
2. Ask your calculator for x̅, y̅, Sx, and Sy and confirm your answers match the givens when rounded off to two decimal places. If so, ask your calculator for r, the correlation coefficient, and note its value, rounded off to four decimal places (make sure you round, don't trim: e.g. 0.61736 rounds to 0.6174). Once you have correctly found r, keep your data in the calculator, ready to proceed to Step 3.
3. ON PAPER, calculate and record all the entries you will eventually type into the boxes. The first column is telling you to subtract x̅ from each of the six x values; the second column is telling you to subtract y̅ from each of the six y values; the last column is telling you to multiply the entries in the first two columns together. USE THE VALUES FOR THE MEANS AND STANDARD DEVIATIONS THAT YOU WERE GIVEN IN THE QUESTION. WRITE DOWN ON PAPER EVERY SINGLE DECIMAL PLACE YOUR CALCULATOR GIVES YOU. You should find that the products you are putting in the third column have three or four decimal places (depending on the given values for the means). In the boxes they provide, you will enter these values rounded off to two decimal places, as instructed.
4. Compute the total of that last column (that is the numerator in the alternative formula for the correlation I have given you above). Be sure to compute the total using the two-decimal-place values you will enter in the boxes provided.
5. Compute the denominator in the alternative formula for r by multiplying n-1, Sx, and Sy together (using the two-decimal-place values for Sx and Sy they have given you). Write down the complete answer you have found, keeping all the decimal places.
6. Now compute r by dividing the total you computed in Step 4 by the answer you computed in Step 5. Hopefully, when you round it off to four decimal places as they request, the value you get will be very close to the value of r you got using the Stat mode in your calculator. I would expect your hand-computed answer to match the Stat mode's value of r to about 2 decimal places. If your two methods for computing r are basically the same to 2 decimal places (maybe the last digit is off by 1 or 2), then you can safely assume you have not made a mistake in your calculations.
7. Once you have confirmed that you were able to compute the correct value of r by hand, enter all the numbers you computed into the appropriate boxes. My hunch is that the assignment has been programmed to mark the value of r you compute using the rounded-off numbers, whereas the value you compute using the Stat Mode in your calculator will actually be too accurate, and may be marked wrong if it isn't close enough to the rounded-off answer you compute by hand.
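To see why the two methods can differ slightly, here is a minimal Python sketch of the hand method in the steps above, next to a full-precision calculation standing in for the calculator's Stat mode. The data and the "given" two-decimal values are hypothetical, not the assignment's numbers, and the helper names (`round_half_up`, `r_stat_mode`, `r_by_hand`) are my own. Rounding uses `ROUND_HALF_UP` so it behaves like rounding by hand, rather than Python's built-in `round`, which sends exact halves to the even digit.

```python
from decimal import Decimal, ROUND_HALF_UP
import statistics


def round_half_up(value, places):
    """Round the way you would by hand (round, don't trim): 0.61736 -> 0.6174."""
    return Decimal(value).quantize(Decimal(10) ** -places, rounding=ROUND_HALF_UP)


def r_stat_mode(xs, ys):
    """Full-precision r, similar to what the Linear Regression Stat mode reports."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs (divide by n - 1)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((len(xs) - 1) * s_x * s_y)


def r_by_hand(xs, ys, x_bar, y_bar, s_x, s_y):
    """Alternative formula using the GIVEN two-decimal values (passed as strings)."""
    x_bar, y_bar, s_x, s_y = (Decimal(v) for v in (x_bar, y_bar, s_x, s_y))
    # Step 3: each product of deviations, rounded to two decimals as typed into the boxes
    products = [round_half_up((Decimal(str(x)) - x_bar) * (Decimal(str(y)) - y_bar), 2)
                for x, y in zip(xs, ys)]
    numerator = sum(products)                # Step 4: total of the last column
    denominator = (len(xs) - 1) * s_x * s_y  # Step 5: (n - 1) * Sx * Sy
    return round_half_up(numerator / denominator, 4)  # final step: r to four decimals


# Hypothetical data, NOT the assignment's data
xs = [2, 4, 5, 7, 8, 10]
ys = [3, 6, 6, 9, 11, 12]
print(round_half_up(str(r_stat_mode(xs, ys)), 4))         # 0.9857 (Stat mode)
print(r_by_hand(xs, ys, "6.00", "7.83", "2.90", "3.43"))  # 0.9852 (by hand)
```

With these made-up numbers the two answers agree to two decimal places and differ only in the fourth, which is the kind of small gap described above.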

I hope this works. If they mark your value for r wrong, try entering the value you computed using the Stat Mode instead (assuming it is slightly different from the value computed by hand).

Question 2: Regression - Cholesterol vs Fat

First, make sure you decide which variable is x and which variable is y in this problem. Is the explanatory variable, x, "Fat Consumption", or is it "Cholesterol Level"?

Use the formulas to compute the slope and intercept that I introduce in Lesson 2, question 1(e) and also use again in question 5 of that lesson. These formulas are also provided on the course Formula Sheet.

Again, I recommend you use the Linear Regression Stat Mode on your calculator to enter the data and check that you get the same answers for the means, standard deviations, and correlation coefficient as they have given you. Then you can confirm that you have used the formulas correctly by checking that the values of a, the intercept, and b, the slope, that you get from the formulas match your Stat Mode's answers. I suspect that your Stat Mode answers will differ slightly from the values you compute, because your computations use rounded-off values for the means and standard deviations.

Do not use your calculator's perfect values. Use the rounded off numbers you were given for the means and standard deviations to compute the slope and intercept.

If the values you compute for a and b, rounded to four decimal places, do not precisely match the perfect answers your Stat Mode gives you, I recommend you enter the computed values in the boxes in part (a) and use those rounded-off values to answer the remaining questions. Make sure you round your answer for the slope to 4 decimal places before you use it to compute the intercept. Then, of course, round the intercept to 4 decimal places, too. Be sure to use these rounded-off values for any other computations the question requires.
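Here is a minimal Python sketch of that order of operations, assuming the formula sheet uses the standard formulas b = r·Sy/Sx for the slope and a = y̅ − b·x̅ for the intercept. The given values below are hypothetical, not the assignment's, and the function names are my own. The key point is that the intercept is computed from the slope after it has been rounded to four decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP


def round4(value):
    """Round (half up, as you would by hand) to four decimal places."""
    return Decimal(value).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def slope_and_intercept(r, x_bar, y_bar, s_x, s_y):
    """Slope b = r * Sy / Sx and intercept a = y_bar - b * x_bar,
    computed from the GIVEN rounded values (pass them as strings)."""
    r, x_bar, y_bar, s_x, s_y = (Decimal(v) for v in (r, x_bar, y_bar, s_x, s_y))
    b = round4(r * s_y / s_x)      # round the slope first...
    a = round4(y_bar - b * x_bar)  # ...then compute the intercept from the ROUNDED slope
    return a, b


# Hypothetical given values, NOT the assignment's numbers
a, b = slope_and_intercept(r="0.9857", x_bar="6.00", y_bar="7.83", s_x="2.90", s_y="3.43")
print(f"y-hat = {a} + {b}x")  # y-hat = 0.8352 + 1.1658x
```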

Note, in part (b), they ask for a proportion, not a percentage, so leave your value for the coefficient of determination as a decimal (see my Lesson 2, question 1(d)). Do not change it into a percent.

Use the rounded off answers you submitted in part (a) to make the prediction requested in part (c).

I show you how to compute a residual (part (d)) in my Lesson 2, question 1(j). This is a two-step process. You must first make the appropriate prediction, then compute the residual.
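The two steps can be sketched in Python as follows, assuming the usual definition residual = observed y − predicted y. The intercept, slope, and data point are hypothetical, and `predict` and `residual` are illustrative names, not anything from StatsPortal.

```python
from decimal import Decimal


def predict(a, b, x):
    """Step 1: predicted response y-hat = a + b * x, using the rounded a and b."""
    return Decimal(a) + Decimal(b) * Decimal(str(x))


def residual(a, b, x, y_observed):
    """Step 2: residual = observed y - predicted y.
    Positive means the actual value was above the regression line."""
    return Decimal(str(y_observed)) - predict(a, b, x)


# Hypothetical line y-hat = 0.8352 + 1.1658x and a hypothetical observed point (8, 11)
print(predict("0.8352", "1.1658", 8))       # 10.1616
print(residual("0.8352", "1.1658", 8, 11))  # 0.8384
```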

Make sure you have taken a look at my Lesson 2, question 4 to learn some key facts about the correlation that may help you with part (e).

Question 3: Regression - Temperature vs Latitude

Click the "New Data Table" icon on the toolbar at top left in the JMP home screen. Double-click the region to the right of "Column 1" to create "Column 2." Rename Column 1 "Latitude" and Column 2 "Temperature" by either double-clicking the columns and typing in the new name or by right-clicking the columns and selecting "Column Info," typing in the name and clicking OK. Type in the data. You can move from one cell to the next in the data table by pressing "Enter", "Tab" or the arrow buttons on your keyboard.

Select "Analyze", then "Fit Y By X". Highlight "Latitude", and click the "X, Factor" button. Highlight "Temperature" and click the "Y, Response" button. Click OK.

You should now see a scatterplot. (If you don't, your data is not properly formatted; go back and check that the columns are Numeric and Continuous by right-clicking each column name and selecting "Column Info". The Data Type should be Numeric, and the Modeling Type should be Continuous.)

Click the red triangle above the scatterplot and select "Fit Line", and JMP will draw in the least-squares regression line. It shows you the regression equation directly under "Linear Fit" below the scatterplot. Note that in the "Summary of Fit", JMP shows you the value of r-squared (the coefficient of determination) rather than r, the correlation coefficient. You can take the square root of this number to get r, but use your scatterplot (or the sign of the slope in the regression equation) to decide whether r is negative or positive, because your calculator can't tell you that.
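As a small illustration, here is a Python sketch of recovering r from r-squared. It relies on the fact that r always has the same sign as the slope of the least-squares line. The RSquare value and slope in the example are hypothetical, and `r_from_r_squared` is an illustrative name, not a JMP feature.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover r from JMP's RSquare. The square root alone is never negative,
    so take the sign from the slope of the fitted line (same sign as r)."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r-squared must be between 0 and 1")
    r = math.sqrt(r_squared)
    return -r if slope < 0 else r


# Hypothetical JMP output: RSquare of 0.64 and a negative slope
print(r_from_r_squared(0.64, slope=-0.9))
```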

They don't ask you to hide the "Analysis of Variance" and "Parameter Estimates" parts of the output, but you can do so if you wish. Simply click the gray triangle next to those title bars, and you will see those parts of the output disappear.

If you are using Windows PC:

Press "Alt" on your keyboard or click the thin blue line that is near the top of the window to get the toolbar icons to appear. Select "File" then "Save As" to get a pop-up window. Type in whatever name you want the file to have in the "File name" section. Click the "Browse Folders" arrow and select which folder you want to save the file in (I suggest you select "Desktop" so that the file will just appear right on your desktop home screen). Finally, click the drop down arrow in the "Save as type" section and select "JPEG File". Click "Save". You should now have your file ready to upload into the assignment.
To upload your file into the text box they provide: Click "HTML editor" below the text box (if you have not already done so) to make a toolbar appear in the text box. Click the toolbar option called "Link" and select "Image." In the pop-up window that appears, click the button called "Find/Upload File" (it is at the bottom of the pop-up window, you may have to enlarge the box or scroll down to see it). Click the "Browse" button and find the scatterplot file you just saved. Either double-click that file or select it and click "Open" and you should see the path to that file appear in the Browse box. Click "Upload File" and its name should appear in the "Uploaded Files" pop-up window. Select the file in the list of "Uploaded Files" to highlight it and click OK and you should see the file appear in the text box.

If you are using Apple/Mac:

You will need to take a screen shot of your output in order to upload it. To take a screen shot, press Command+Shift+4 and drag the cross-hairs over the image to capture it. By default, the screen shot is saved as a .png file on your desktop.
To upload your file into the text box they provide: Click "HTML editor" below the text box (if you have not already done so) to make a toolbar appear in the text box. Click the toolbar option called "Link" and select "Image." In the pop-up window that appears, click the button called "Find/Upload File" (it is at the bottom of the pop-up window, you may have to enlarge the box or scroll down to see it). Click the "Browse" button and find the scatterplot file you just saved. Either double-click that file or select it and click "Open" and you should see the path to that file appear in the Browse box. Click "Upload File" and its name should appear in the "Uploaded Files" pop-up window. Select the file in the list of "Uploaded Files" to highlight it and click OK and you should see the file appear in the text box.

Type your answers for the rest of the question into the box (make sure you have clicked HTML Editor). You will have to compute the predictions and residual they request yourself using the least-squares regression equation JMP has computed for you. When they ask in part (g), "What does the sign of the residual tell us?" they merely mean, was the actual temperature higher or lower than you predicted it would be?

Question 4: Design of Experiments 1

Make sure you have studied Lesson 3 in my book before you answer this and the remaining questions in this assignment. You should especially look at questions 6 and 7 as illustrations of the Three Principles of Experimental Design and as examples of identifying the various factors, factor levels, treatments, experimental units, and response variable for an experiment, as well as identifying what type of experiment it is (randomized comparative experiment, block design, or matched pairs design).

Be sure to click the HTML Editor link before you type your answers into the box provided. When they ask for the treatments (part (e)), tell them not only how many treatments there are in the experiment, but what the exact treatments are (since it is unclear what they want here). For example, in my Lesson 3, question 7(b), I wouldn't just say that there are 6 treatments. I would say the 6 treatments are: Dog Food A served early; Dog Food B served early; etc. up to Dog Food C served late.

Note that randomization is used in experiments to randomly determine which unit gets which treatment (when there are many units and each unit will be given exactly one treatment), or to randomly determine the order treatments are administered (when one unit is going to receive two or more treatments).
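The two uses of randomization can be sketched in Python. In the course you would do this with Table B, so treat this only as an illustration of the idea; Python's `random` module stands in for the table, and the dogs, foods, and function names are hypothetical.

```python
import random


def assign_one_treatment_each(units, treatments, seed=None):
    """Many units, each gets exactly one treatment: shuffle the units,
    then deal the treatments out in turn so group sizes are as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


def randomize_order(treatments, seed=None):
    """One unit receives every treatment: randomize the order they are given in."""
    rng = random.Random(seed)
    order = list(treatments)
    rng.shuffle(order)
    return order


# Hypothetical example: 6 dogs, 3 dog foods
dogs = [f"Dog {i}" for i in range(1, 7)]
foods = ["Food A", "Food B", "Food C"]
print(assign_one_treatment_each(dogs, foods, seed=1))
print(randomize_order(foods, seed=1))
```

Dealing treatments out in turn after the shuffle keeps the groups balanced (two dogs per food here), while the shuffle decides which dogs land in which group.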

Note also that we learned in Lesson 2 that correlation does not imply causation. Just because a pattern is observed between x and y does not mean we have proven that x causes y. But, the whole point of designing an experiment is to identify possible cause and effect. If an experiment has been designed properly, we have every right to believe we have proven that blank causes blank, provided we have seen a significant difference in the response variable, when applying one treatment as compared to another. Experiments can prove causation!

Question 5: Design of Experiments 2

Similar to the previous question. When discussing the principle of control, there is no need to speculate. Discuss the actual things they have obviously done to control outside factors or certainly should have done. By repetition, they mean what I call replication: how many times is each treatment being applied?

Question 6: Matched Pairs Experiments

Look at my examples of matched pairs experiments before you design yours here. Be sure to click HTML Editor before you type in your answer.

Question 7: SRS Random Digits Table

This is a nice, easy example of randomization, like the ones I demonstrate early in Lesson 3. Here is a link where you can download Table B, if you have not already done so:

Table B: Random Digits Table
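The usual Table B procedure (label the units, read the digits in groups of the label's width, skip labels that are out of range or already chosen) can be sketched in Python like this. It assumes units are labelled 1 to N using as many digits as N has (e.g. 01 to 30); if your labelling differs, the width changes accordingly. The digit string below is made up and is NOT a line of Table B, and `srs_from_digits` is an illustrative name.

```python
def srs_from_digits(digits, population_size, sample_size):
    """Pick an SRS by reading random digits in fixed-width groups.

    Assumes labels 1..population_size written with as many digits as
    population_size has (e.g. 01-30 for 30 units). Out-of-range labels and
    repeats are skipped. Returns fewer than sample_size labels if the digits
    run out (in that case, keep reading the next line of the table)."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore the spaces between groups
    width = len(str(population_size))
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


# Hypothetical digits (NOT from Table B): 30 units, sample of 4
print(srs_from_digits("45871 02367 31918 55520 04732", population_size=30, sample_size=4))
# [10, 23, 20, 4]
```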

Question 8: Sampling Schemes

This is a good run-through of the various types of samples and the possible biases that can exist. Be sure to have studied the first half of my Lesson 3, up to the end of question 5, before attempting this question. Is the sample a voluntary response sample, convenience sample, simple random sample, stratified random sample, or multistage sample?

Note that you are told that, of course, there will be nonresponse in any sample, so don't bother listing that as a source of bias unless you believe the nonresponse will be especially egregious due to the design.

Note that response bias occurs if the respondent lies, or if the question is phrased in such a way as to force the respondent to not be truthful (what I call question bias in my book). For example, if a researcher asks, "Stephen Harper, great Prime Minister or greatest Prime Minister?" that would be considered response bias because the question forces you to say he is either great or the greatest, even if you really think he is mediocre or awful.

Be clear when listing the bias you see. For example, don't just say "response bias." Say that there is response bias because too many teenage boys will exaggerate how many girlfriends they have had if a researcher asks, "Have you ever kissed a girl?" (I am not saying that is the correct answer for any of your questions; I am just saying be clear about what you mean instead of just using a generic term.)

Don't speculate about biases that might be there (such as saying the researcher might be doctoring their data; there is always that possibility, but we are not going to be paranoid and mention it as a possible bias every time, unless we have clearly been given reason to believe it has occurred). Only discuss biases that are clearly indicated by the information you have been given. There is certainly bias if a bad sampling method is being used, but say what you think is the likely direction of that bias. For example, most voluntary response samples tend to have a much higher proportion of complainers (negative opinions) than is likely to be seen in the population as a whole.

Note that there may not be any obvious bias in some of the questions. If you believe there is no obvious source of bias, say so. After all, if there is a good sampling design being used, we would hope there is no obvious bias.