Stat 2000: Tips for Assignment 3

Published: Sat, 02/25/12

 
Please note that my midterm exam prep seminar for Stat 2000 will be on Sunday, Feb. 26, in room 100, St. Paul's College, from 9 am to 9 pm.  I am now ready to take registrations.  Please click this link for more information about the seminar and to sign up if you are interested:
Grant's Stat 2000 Exam Prep Seminars 
 
Join Grant's Tutoring on Facebook or follow Grant on Twitter.
Simply go to www.grantstutoring.com and click the Facebook and/or Twitter icons.
 
If you ever want to look back over a previous tip I have sent, do note that all my tips can be found in my archive.  Click this link to go straight to my archive:
 
Grant's Updates Archive
 
Did you miss my Tips on How to Do Well in this Course? Click here
 
Did you miss my Tips for Assignment 2? Click here
 
If you are taking the course by Distance/Online (Sections D01, D02, etc.), click here for my tips for your Assignment 3.
 
If you are taking the course by classroom lecture (Sections A01, A02, etc.), click here for my tips for your Assignment 3.
 
Tips for Assignment 3 (Classroom Lecture Sections A01, A02, etc.)
 
You need to study Lesson 6: Discrete Probability Distributions (this also includes the Binomial and Poisson Distributions; if you are using a considerably older edition of my book, you may have those two distributions taught in a separate Lesson 7).  I am also surprised to see a lot of probability questions on this assignment that cover Stat 1000 material, so I have attached a handout from my Basic Stats 1 study book with a more thorough discussion of making two-way tables and Venn diagrams.  Those of you who have my Basic Stats 1 book should study Lesson 5: Introduction to Probability.
 
Excerpts from Grant's Probability Lesson in Basic Stats 1 
 
Question 1 is very similar to my question 18 in the handout I have given you.  I certainly recommend you make a three-circle Venn diagram to solve this problem.
 
Question 1 (g) and (h) are examples of conditional probability.  I also discuss conditional probability in the handout, but I don't really show any examples of it.  This is denoted P(B|A), pronounced "probability of B given A".  The formula for this is
 
P(B|A) = P(A and B) divided by P(A)
 
What this really means is this:
 
Step 1: Look through your sample space or Venn diagram and find all the parts that belong to A and add those probabilities up.  That is the denominator of your answer, P(A).
 
Step 2: Now go through the bits you just added up in Step 1 and collect the bits that also belong to B.  Add that subset of bits up.  That is the numerator of your answer, P(A and B), because those are the bits that belong to both A and B.
 
Step 3: Divide your answer in Step 2 by your answer in Step 1 to get the conditional probability.
 
Essentially, when a conditional probability says "given A", it is telling you that we know for sure that event A has occurred, so we are now only interested in outcomes that belong to A.  That becomes the "whole", and P(B|A) is the fraction of that "whole" that also belongs to B.
 
For example, in my question 18 in the handout, I could add a part (d) that asks, "What is the probability someone is a basketball fan if they are a hockey fan?"  Whenever a probability question asks for the probability of B if event A has occurred, you are doing conditional probability.
 
We want P(B|H).  I first look through my Venn diagram and find all the bits that belong to H, since we know for sure the person is a hockey fan. There are four bits in the H circle so I add those bits up: 33 + 31 + 8 + 5 = 77%.  Now, I gather all the bits in that H circle that represent people who are also basketball fans.  There are two bits: 8 + 5 = 13%.  Thus, the probability a person is a basketball fan if they are a hockey fan is 13%/77% or .13/.77 = .1688.
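The three steps above can be sketched in Python for this hockey/basketball example.  This is a minimal sketch, assuming each piece of the Venn diagram is stored as the set of circles it lies in, mapped to its percentage.  The label "F" for the third circle, and exactly which region holds 33 versus 31 (or 8 versus 5), are hypothetical; all that comes from the example is that those four bits are in H and that 8 and 5 are also in B, which is all P(B|H) needs.

```python
# Pieces of the three-circle Venn diagram, keyed by the circles they lie in.
# "F" is a hypothetical label for the third circle, and the placement of
# 33/31 and 8/5 into specific regions is assumed. Only pieces inside the H
# circle are listed, so this dictionary is only suitable for conditioning on H.
regions = {
    frozenset({"H"}): 33,
    frozenset({"H", "F"}): 31,
    frozenset({"H", "B"}): 8,
    frozenset({"H", "B", "F"}): 5,
}


def conditional(regions, event, given):
    """P(event | given) computed from Venn diagram pieces."""
    # Step 1: add up every piece inside the "given" circle -> denominator
    denominator = sum(p for circles, p in regions.items() if given in circles)
    # Step 2: of those pieces, add up the ones also inside "event" -> numerator
    numerator = sum(p for circles, p in regions.items()
                    if given in circles and event in circles)
    # Step 3: divide the numerator by the denominator
    return numerator / denominator


print(round(conditional(regions, "B", "H"), 4))  # 0.1688
```

Working with percentages (13 and 77) or decimals (.13 and .77) gives the same ratio, so either form is fine.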
 
Question 2 is obviously binomial (but he tells you not to use Table C, so you must use the formula to compute the probabilities).  You will still be able to verify some of the answers with the table, though.
 
I assume your prof has written out the 4 conditions of a binomial distribution in class.  Here is the way I would number and describe them:
1. Each trial has only two possible results: Success or Failure.
2. Each trial is independent.
3. There is a fixed number of trials, n.
4. The probability of success on each trial is constant, p.
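Since Question 2 wants the formula rather than Table C, here is a minimal sketch of the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) in Python.  The values n = 10 and p = 0.3 are made up for illustration; plug in the ones from your question.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), using the binomial formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


n, p = 10, 0.3  # hypothetical values for illustration
print(round(binomial_pmf(2, n, p), 4))                        # P(X = 2)  -> 0.2335
print(round(sum(binomial_pmf(k, n, p) for k in range(3)), 4))  # P(X <= 2) -> 0.3828
```

A cumulative probability like P(X ≤ 2) is just the individual probabilities P(X = 0) + P(X = 1) + P(X = 2) added up.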
 
Question 3 is another two-way table problem similar to my question 13 in the handout I have given you.  I also discuss these concepts earlier in the handout.
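A two-way table for two events A and B can be filled in from just P(A), P(B) and P(A and B), because every row and column has to add up to its total.  Here is a minimal sketch with made-up numbers; the events and probabilities are hypothetical, not taken from Question 3.

```python
# Hypothetical probabilities for two events A and B
p_a, p_b, p_a_and_b = 0.6, 0.3, 0.2

# Fill in the four interior cells so that rows and columns add up correctly
table = {
    ("A", "B"): p_a_and_b,
    ("A", "not B"): p_a - p_a_and_b,          # row A must total P(A)
    ("not A", "B"): p_b - p_a_and_b,          # column B must total P(B)
    ("not A", "not B"): 1 - p_a - p_b + p_a_and_b,  # everything else
}

for (row, col), prob in table.items():
    print(f"P({row} and {col}) = {prob:.2f}")

# A conditional probability read straight off the table: P(B | A)
print(f"P(B | A) = {table[('A', 'B')] / p_a:.4f}")
```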
 
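Question 4 is very challenging.  You can use the properties of mean and variance I teach in Lesson 4 to work out the mean and variance of X1 + X2.  Since X1 and X2 are both normally distributed, X1 + X2 is also normal.  Part (b) wants X1 < X2, which can also be written X1 - X2 < 0.  Again, you can find the mean and variance of X1 - X2.  Be careful: the means subtract, but if X1 and X2 are independent, the variances still add, so the variance of X1 - X2 is the variance of X1 plus the variance of X2.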
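Here is a minimal sketch of the Question 4 approach using Python's standard-library NormalDist.  The means and standard deviations are hypothetical, and X1 and X2 are assumed to be independent so that their variances add (even for the difference).

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical parameters; X1 and X2 are assumed independent
mu1, sd1 = 10, 2
mu2, sd2 = 12, 1.5

# Sum: means add, variances add
mu_sum = mu1 + mu2
sd_sum = sqrt(sd1**2 + sd2**2)

# Difference: means subtract, but variances still ADD
mu_diff = mu1 - mu2
sd_diff = sqrt(sd1**2 + sd2**2)

# P(X1 < X2) = P(X1 - X2 < 0), with X1 - X2 normally distributed
p = NormalDist(mu_diff, sd_diff).cdf(0)

print(mu_sum, sd_sum)        # 22 2.5
print(mu_diff, round(p, 4))  # -2 0.7881
```

By hand, this is the same as standardizing: z = (0 - (-2)) / 2.5 = 0.8, and the table area to the left of 0.8 is about .7881.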
 
Question 5 is similar to a question I teach in Lesson 6 in the "Hypothesis Testing Revisited" section.  Now you can use Table C.
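If the question is the kind where the P-value is a binomial probability (the chance of a count at least as extreme as the one observed when the null hypothesis is true), the calculation looks like the sketch below.  The setup here (H0: p = 0.5 versus Ha: p > 0.5, with n = 20 trials and 15 successes) is hypothetical; with Table C you would look up and add the same individual probabilities.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Hypothetical setup: H0: p = 0.5 vs Ha: p > 0.5, n = 20, 15 successes observed
n, p0, observed = 20, 0.5, 15

# Upper-tailed P-value: P(X >= observed) assuming H0 is true
p_value = sum(binomial_pmf(k, n, p0) for k in range(observed, n + 1))
print(round(p_value, 4))  # 0.0207
```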
 
Questions 6 and 7 are obviously Poisson Distribution questions similar to my examples in Lesson 6. 
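As a reminder of the Poisson formula P(X = k) = e^(-μ) μ^k / k!, here is a minimal Python sketch.  The mean μ = 3 is hypothetical; use the mean given in your question.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**k / factorial(k)


mu = 3  # hypothetical mean number of occurrences
at_most_2 = sum(poisson_pmf(k, mu) for k in range(3))

print(round(poisson_pmf(2, mu), 4))  # P(X = 2)  -> 0.224
print(round(at_most_2, 4))           # P(X <= 2) -> 0.4232
print(round(1 - at_most_2, 4))       # P(X >= 3) -> 0.5768, by the complement rule
```

For an "at least" question, the complement rule saves you from adding up an endless list of terms.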
 
Tips for Assignment 3 (Distance/Online Sections D01, D02, etc.)
 
Again, this assignment focuses on Lessons 1 and 2 in my study book.  The difference is that now you are using t instead of z because σ, the population standard deviation, is not given.  You will also need to study the Matched Pairs section of Lesson 4 of my book.  Study up to the end of question 4 in that lesson (the rest of the lesson will be covered in Assignment 4).  If you are using an older edition of my book, you should find Matched Pairs taught as the last two questions in Lesson 2 of my book.
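The only change from the z procedures is that the sample standard deviation s replaces σ and the critical value comes from the t table with n - 1 degrees of freedom.  Here is a minimal Python sketch; the data, the null value 11, and the 95% level are hypothetical, and t* = 2.776 is the t-table value for df = 4 (the standard library has no t distribution, so it is typed in by hand).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample (not from the assignment)
data = [12, 15, 11, 14, 13]
n = len(data)
xbar = mean(data)
s = stdev(data)  # sample standard deviation replaces the unknown sigma

# One-sample t statistic for H0: mu = 11 (hypothetical null value), df = n - 1
mu0 = 11
t = (xbar - mu0) / (s / sqrt(n))

# 95% confidence interval: t* for df = 4 is 2.776 from the t table
t_star = 2.776
margin = t_star * s / sqrt(n)

print(f"t = {t:.3f} with df = {n - 1}")
print(f"95% CI: ({xbar - margin:.3f}, {xbar + margin:.3f})")
```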
 
Question 1 is standard stuff for a student who has studied my lessons.  Be sure to use the Stat mode in your calculator to work out the mean and standard deviation.  See Appendix A in my book if you don't know how.
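Calculators in Stat mode usually show two standard deviations: one that divides by n - 1 (often labelled sx) and one that divides by n (often labelled σx).  For these t procedures you want the one that divides by n - 1.  Here is a minimal Python sketch of the difference, with hypothetical data.

```python
from statistics import mean, stdev, pstdev

data = [12, 15, 11, 14, 13]  # hypothetical data

print(mean(data))              # 13
print(round(stdev(data), 4))   # 1.5811  divides by n - 1: use this one
print(round(pstdev(data), 4))  # 1.4142  divides by n
```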
 
Question 2 (bone formation):
Personally, I would not bother stacking this data like they suggest (good luck even copying and pasting it successfully).  I would simply type the data in manually.
 
Open a "New Data Table" in JMP.  Double-click "Column 1" and name it something like "OC".  Make sure the "data type" is numeric and the "modeling type" is continuous, and click OK.  Now type the given data into the column on the spreadsheet and make sure you don't make a mistake.
 
If you insist on copying and stacking the data, here's how:  Copy and paste the given data into a New Data Table in JMP.  In the toolbar at the top, select "Tables", then select "Stack".  Highlight all of the columns in the "Select Columns" box and click "Stack Columns" and click OK.  You will now see all of the data stacked into one column (there will be another column showing all the column names which you can ignore).  Name the column something like "OC" and make sure its Data Type is Numeric and its Modeling Type is Continuous.  Click OK.
 
To get JMP to make confidence intervals for the mean:
Select "Analyze, Distribution" from the toolbar at top.  Highlight the column you are interested in ("OC" in this case) and click the "Y, Columns" button.  Click OK.  You are now taken to a window showing a histogram and stuff.  To get a confidence interval, click the red triangle next to your column variable directly above the histogram to get a drop-down list and select "Confidence Interval".  In the pop-up window that appears, select "Other" (even if the level of confidence you desire is in the list) and type in the level of confidence you want (in decimal form, so 95% is 0.95).  Make sure "Two-sided" is selected.  You are not given a value for sigma in this question, so make sure the "Use known Sigma" checkbox is not selected.  Click OK.  A Confidence Intervals table will appear in your output screen at the bottom.
 
Of course, JMP will already have made a histogram for you while you were getting the confidence interval, so I would use that graph.  If you want the stemplot instead, click the red triangle and select "Stem and Leaf Plot".  When they ask you to comment on the suitability, remember my discussion in Lesson 1 just before question 1 about the key sample size values of 15 and 40.
 
To get JMP to test hypotheses for the mean:
To test a hypothesis, click that same red triangle you used to make a confidence interval and select "Test Mean".  Type in the value of the mean stated in the null hypothesis and type in the known value of sigma, if you have one (otherwise leave that value blank).  Click OK.  A Test Mean = Value table appears in your output where, among other things, JMP gives you the test statistic and three probability values.  Those three probabilities are the P-values for the three possible alternative hypotheses.  JMP will use a z statistic if you are given a sigma value to enter or a t statistic if sigma is unknown.
 
Prob > |t| is the two-tailed P-value.
Prob > t is the upper-tailed P-value.
Prob < t is the lower-tailed P-value.
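The three P-values are tied together: the upper and lower ones add to 1, and the two-tailed one is twice the smaller of the two.  Here is a minimal sketch for the z case (sigma known) using Python's standard-library NormalDist; the statistic 1.5 is hypothetical.  For a t statistic the same relationships hold, but you would need a t distribution (for example scipy.stats.t) in place of NormalDist.

```python
from statistics import NormalDist


def p_values(z):
    """The three P-values for a z statistic (sigma known)."""
    lower = NormalDist().cdf(z)        # Prob < z   (Ha: mu < mu0)
    upper = 1 - lower                  # Prob > z   (Ha: mu > mu0)
    two_sided = 2 * min(lower, upper)  # Prob > |z| (Ha: mu != mu0)
    return two_sided, upper, lower


two, up, low = p_values(1.5)  # hypothetical test statistic
print(round(two, 4), round(up, 4), round(low, 4))  # 0.1336 0.0668 0.9332
```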
 
To get rid of any outputs you don't want to copy and paste, click the red triangle and deselect the unwanted things.
 
To copy and paste the parts of a JMP printout you do want, select the icon on the JMP toolbar that looks like a fat white plus sign "+" (the Selection tool).  You can then click various parts of the printout to select the sections you want.  Copy and paste into Word or something like that.
 
Questions 2 (e) to (g):
To make a column with the logarithms:  Double-click the empty space to the right of the last column of data to make JMP create a new column for you.  Name it something like "log(OC)".  Double-click that new column heading to get the pop-up menu.  Click the "Column Properties" button and select Formula.  Now click the Edit Formula button.  In the formula pop-up screen, select "Transcendental" in the Functions(grouped) menu and then select "Log" in the sub-menu.  You will see Log appear in the section below with a set of brackets around a red box.  Highlight the OC column in the "Table Columns" section of this screen to make OC appear in that red box.  Now click OK a few times to get back to your data table, and you should see your Log(OC) column filled in with numbers.  Each of those numbers is the natural log of the corresponding original OC score.  In other words, it is identical to pressing the "ln" button on your calculator for each OC score (not the "log" button right next to it, which gives the base-10 log).  For example, if your OC score was 49.9, then log(OC) would be ln(49.9) = 3.91002...
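Python's math.log with one argument is also the natural log, so it matches both JMP's Log and the "ln" button.  Here is a minimal sketch; only the score 49.9 comes from the example above, and the other two OC scores are hypothetical.

```python
from math import log, log10

oc = [49.9, 35.0, 61.2]        # hypothetical OC scores (49.9 is from the example)
log_oc = [log(x) for x in oc]  # natural log, like the "ln" button

print(round(log_oc[0], 5))     # 3.91002
print(round(log10(49.9), 5))   # base-10 log: the wrong button for this question
```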
 
You can now make the confidence interval for "Log(OC)" in the same way you made the confidence interval for "OC" except using the "Log(OC)" column, of course.
 
Once you have found the confidence interval limits for your Log(OC) scores, you can convert those back to OC limits by simply using the e^x button on your calculator.  You get e^x by pressing "2nd F" "ln" or "SHIFT" "ln".  For example, if your lower limit for Log(OC) was 5, then you would press "2nd F" "ln" 5 to compute e^5 = 148.413... to get the corresponding lower limit for your OC score.  Do not think your answers in (g) have to match your answers in (a).
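The back-transformation is just e^x applied to each limit.  Here is a minimal Python sketch; the Log(OC) limits 5.0 and 5.4 are hypothetical.

```python
from math import exp

# Hypothetical confidence limits for the mean of Log(OC)
lower_log, upper_log = 5.0, 5.4

# Undo the natural log with e^x to get limits on the original OC scale
lower_oc, upper_oc = exp(lower_log), exp(upper_log)
print(round(lower_oc, 3), round(upper_oc, 3))  # 148.413 221.406
```

Notice that the back-transformed interval is no longer symmetric around its midpoint, which is one reason the limits in (g) need not look like the ones in (a).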
 
Why are they doing this?  This all boils down to the reliability of our confidence intervals or hypothesis tests for means.  Remember, our methods are only reliable if the sample mean is normally distributed.  If n < 15, we can only trust our methods if our population is normal.  If n ≥ 15, we can generally trust our methods even if the population is not normal.  If the population is strongly skewed or has outliers, we should use n ≥ 40.  That is why they are having you make graphs: to get an idea of the possible shape of the population and therefore the reliability of your methods.  Statisticians sometimes transform the data (by taking logarithms, for example) to create a new population that is more normally distributed than the original one, so that their confidence intervals or hypothesis tests are more reliable.
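The rule of thumb above can be written as a small decision function.  This is a minimal sketch with hypothetical function and parameter names; the cut-offs 15 and 40 come from the text, and you still have to judge normality, skewness and outliers yourself from your graphs.

```python
def t_procedures_trustworthy(n, population_normal=False, skewed_or_outliers=False):
    """Rough guideline for trusting t procedures for a mean.
    The function and parameter names are hypothetical."""
    if population_normal:
        return True      # a normal population makes any n acceptable
    if skewed_or_outliers:
        return n >= 40   # strongly skewed or outliers: want n >= 40
    return n >= 15       # otherwise n >= 15 is generally enough


print(t_procedures_trustworthy(12))                           # False
print(t_procedures_trustworthy(12, population_normal=True))   # True
print(t_procedures_trustworthy(20, skewed_or_outliers=True))  # False
```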
 
Which data do you think will make t more reliable in your problem?  The OC data or the log(OC) data?  Which confidence limits do you think are more reliable?
 
Question 3:
Make sure you are examining this data correctly!  Again, look at the Matched Pairs section in Lesson 4 of my book.  Use your calculator in Stat mode to work out the mean and standard deviation (see Appendix A in my book for how to do this on your calculator).
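The key idea of matched pairs is to work with the differences within each pair and then run a one-sample t procedure on those differences.  Here is a minimal Python sketch with hypothetical before/after measurements on the same five subjects.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after measurements on the same five subjects
before = [10.2, 11.5, 9.8, 12.0, 10.9]
after = [11.0, 12.1, 10.1, 12.9, 11.3]

# Matched pairs: analyze the differences, not the two columns separately
diffs = [a - b for a, b in zip(after, before)]
d_bar, s_d, n = mean(diffs), stdev(diffs), len(diffs)

# One-sample t statistic on the differences for H0: mu_d = 0, df = n - 1
t = d_bar / (s_d / sqrt(n))
print(f"mean difference = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {n - 1}")
```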