Stat 2000: Tips for Assignment 4

Published: Fri, 03/18/16

Please read this important message I sent about the binomial table.  This has major implications regarding your final exam:
Final Exam Prep Seminar April 12
Don't have my book or audio lectures? You can download a free sample here:
Did you read my tips on how to study and learn Stat 2000?  If not, here is a link to those important suggestions:
Did you read my Calculator Tips?  If not, here is a link to those important suggestions:
Did you see my tips for Assignment 1? Click here.
Did you see my tips for Assignment 2? Click here.
Did you see my tips for Assignment 3? Click here.
Tips for Assignment 4
Please note that I made major changes to my book in September 2014.  If you are using a book older than September 2014, you are missing about 100 pages of new material and an entirely new lesson on Probability.

Study Lesson 8: Inferences about Proportions (if you are using an older edition of my book, this may be Lesson 7).  You also will need to study the first half of Lesson 9: Chi-Square Tests (up to the end of question 4, you do not need to study the Goodness-of-Fit Test at this time).

To type in the formulas you are using and show your numbers subbed into them, you can click the Equation Editor button in the toolbar, which looks like the Sigma summation symbol (you have to click the "..." other options button to see the sigma formula input button).  Then click the various buttons to make your fractions and enter the symbols.  However, the Equation Editor is extremely slow and clunky.  Personally, I would never use it.  Just type ordinary text explaining what you are doing if you think you should show some work.

Exception: Always do any JMP stuff open-book.  Have my tips in front of you, and let me guide you step-by-step through any JMP stuff.  JMP is just "busy" work.  The sooner you get it done and can move on to productive things like understanding the concepts and interpreting the JMP outputs, the better off you will be.  Then again, since you never have to upload the JMP printouts, perhaps you might not even bother to do the JMP at all.  Most questions can be answered by hand even when they told you to use JMP.
Question 1
This is very similar to my question 1(c) and (d) in Lesson 8. Be careful to note which is the true proportion, p, and which is the sample proportion p^.

Be careful that you don't lose accuracy by rounding off too much.  I suggest you keep at least 5 or 6 decimal places while computing things like the standard deviation of p^ to ensure that you get accurate z-scores.  Better yet, store exact answers in memory in your calculator.
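If you like to check your calculator work with a bit of code, here is a minimal Python sketch of the p^ z-score and of what early rounding does to it.  The numbers (p = 0.30, n = 200, p^ = 0.35) and the function name are made up for illustration; they are not from the assignment.

```python
from math import sqrt

def phat_z_score(p_hat, p, n):
    """z-score of a sample proportion p_hat when the true proportion is p."""
    sd = sqrt(p * (1 - p) / n)  # standard deviation of p-hat
    return (p_hat - p) / sd

# Hypothetical numbers, not from the assignment
p, n, p_hat = 0.30, 200, 0.35

z_exact = phat_z_score(p_hat, p, n)

# Same calculation, but with the standard deviation rounded to 2 decimals first
sd_rounded = round(sqrt(p * (1 - p) / n), 2)
z_rounded = (p_hat - p) / sd_rounded

print(f"exact z = {z_exact:.4f}, z after early rounding = {z_rounded:.4f}")
```

The two z-scores differ in the first decimal place, which is enough to change the probability you read off the table.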
Question 2
This is standard sample size stuff, like my questions 6 and 7 in Lesson 8.  Remember, if you are given a value of p, that is your p*.  If you are not given a value of p, then p* = 50% = 0.5.

You now have TWO sample size formulas!  One is for MEANS (Lesson 1, questions 6 to 8) and the other is for PROPORTIONS (Lesson 8, questions 6 and 7).  One needs a sigma, one does not.

Cheesy trick to help tell them apart:
  • If they give you sigma, the population standard deviation, they must want you to use the sample size formula for means because that formula needs a sigma.  Read the question; it will clearly tell you that you are trying to estimate the mean.
  • If they don't give you sigma, then they can't be trying to estimate the mean (because you have to be given a sigma for those questions).  They must be estimating a proportion.  Use the sample size formula for proportions.  It doesn't need a sigma.
Don't forget your Paint-Can Principle when stating the answer for n.  Always round your answer for n UP to the next whole number.  If you need to select 23.1 items, that means 23 is not enough, so you must select 24 items.  Don't follow normal rounding rules.
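The two sample size formulas and the round-up rule can be sketched in Python as follows.  This is only an illustration: the function names are hypothetical, z* is passed in as the value you read off your table, and the example margins and sigma are made-up numbers.

```python
from math import ceil

def n_for_proportion(margin, z_star, p_star=0.5):
    """Sample size for estimating a proportion; p_star defaults to 0.5."""
    return ceil((z_star / margin) ** 2 * p_star * (1 - p_star))

def n_for_mean(margin, z_star, sigma):
    """Sample size for estimating a mean; this one needs sigma."""
    return ceil((z_star * sigma / margin) ** 2)

# Hypothetical examples with z* = 1.96 (95% confidence)
print(n_for_proportion(margin=0.03, z_star=1.96))      # no p given, so p* = 0.5
print(n_for_mean(margin=2, z_star=1.96, sigma=10))
```

Note that `ceil` always rounds up, which is exactly the Paint-Can Principle: 1067.11 becomes 1068, not 1067.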

Note that part (b) is talking about the Inverse-Square Relationship for sample size which I introduced way back in Lesson 1, question 8.  This concept applies to all sample size problems.

Here is another way to think about the Inverse-Square Relationship.  Essentially, if you want your margin of error to get smaller, then you want your sample size to get larger by the square of the factor.  If you want your margin of error to get larger, then you want your sample size to get smaller by the square of the factor. 
  • This means, if you want to multiply the margin of error, you divide the sample size.
  • If you want to divide the margin of error, you multiply the sample size.
For example, if I want to divide my margin of error by a factor of 7, then I multiply my sample size by a factor of 49 (7-squared).  If I want to multiply my margin of error by a factor of 5, then I divide my sample size by a factor of 25 (5-squared).
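The same relationship as a tiny Python sketch, assuming a hypothetical starting sample size of 100 (the function name is made up for illustration):

```python
def new_sample_size(n, moe_factor):
    """Sample size needed after multiplying the margin of error by moe_factor."""
    return n / moe_factor ** 2

# Divide the margin of error by 7 (multiply it by 1/7): n is multiplied by 49
print(new_sample_size(100, 1 / 7))

# Multiply the margin of error by 5: n is divided by 25
print(new_sample_size(100, 5))
```

If the result is not a whole number, you would still round it up when stating a final answer for n.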

Part (c) brings you back to the sample size formula but now you have a p*.  Note that you are making a confidence interval with the same level of confidence and same margin of error as you did in part (a).  The only difference is the value of p*.  This illustrates the conservative estimate principle. 

If you are unsure what p* is, you set it to be 50% because that gives you the largest possible sample size.  If you are not sure what the truth is, take a big sample to be safe.  If you have better information about what p* is likely to be, you can take a smaller sample size and maintain the same margin of error.  This has practical value since it may be difficult to take a larger sample.
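To see the conservative estimate principle in numbers, here is a short Python sketch that holds the confidence level and margin of error fixed and changes only p*.  The values z* = 1.96 and margin = 0.03 are assumptions chosen for illustration, not the ones in your assignment.

```python
from math import ceil

Z_STAR = 1.96   # assumed 95% confidence
MARGIN = 0.03   # assumed margin of error

def sample_size(p_star):
    """Required n for a proportion, rounded up, for a given planning value p*."""
    return ceil((Z_STAR / MARGIN) ** 2 * p_star * (1 - p_star))

for p_star in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(p_star, sample_size(p_star))
```

The required n climbs as p* moves toward 0.5 and peaks there, because p*(1 - p*) is largest at 0.5.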
Question 3
This is a good run through of confidence intervals and hypothesis testing for a proportion, as I teach in Lesson 8 (see my questions 2 and 3). 

Be careful that you don't lose accuracy by rounding off too much.  I suggest you keep at least 5 or 6 decimal places while computing things like the standard deviation of p^ to ensure that you get accurate margins of error and accurate test statistics.  Better yet, store exact answers in memory in your calculator.

Note that you will need to use the z* critical value you found in part (f) to compute p^*, the critical value for p^ where you will reject Ho (the p^ decision rule) needed to answer part (g).  We derive p^* from the standardizing formula for p^ bell curves.
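Rearranging the standardizing formula z = (p^ - p) / sqrt(p(1 - p)/n) for p^ gives the critical value p^*.  A minimal Python sketch is below.  The function name and the numbers are hypothetical, and you must pick the tail that matches your own alternative hypothesis.

```python
from math import sqrt

def phat_critical(p0, n, z_star, tail="upper"):
    """Critical value of p-hat for a one-tailed test of Ho: p = p0."""
    sd = sqrt(p0 * (1 - p0) / n)  # standard deviation of p-hat assuming Ho is true
    if tail == "upper":
        return p0 + z_star * sd   # reject Ho if p-hat is above this value
    if tail == "lower":
        return p0 - z_star * sd   # reject Ho if p-hat is below this value
    raise ValueError("tail must be 'upper' or 'lower'")

# Hypothetical example: Ho: p = 0.5, n = 100, z* = 1.645 (alpha = 5%, upper-tailed)
print(phat_critical(0.5, 100, 1.645))
```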


Part (h) requires an alpha/beta table. You should have the p^ decision rule found in (g) in the alpha column, and the reverse of that rule (when you do not reject Ho) in the beta column.  You should have what Ho told you is p in the alpha column, and the alternative value for p, given in (h), in the beta column.  On the beta side, draw a p^ bell curve centred at the alternative value of p, and shade the values of p^ where you will not reject Ho.  That is beta, the probability of type II error.  But you want the power, so that is 1 - beta.  Use your p^ bell curve formula (as you did in question 1 above), using the p^ noted in the decision rule in (g) and the alternative p given in (h) as your p (that is the centre of your curve).
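The beta side of the table can be sketched in Python like this, assuming an upper-tailed test (reject Ho when p^ is above p^*).  The function name and the example values are made up for illustration; flip the shading if your test is lower-tailed.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed(p_hat_crit, p_alt, n):
    """Power of an upper-tailed test when the true proportion is p_alt."""
    sd_alt = sqrt(p_alt * (1 - p_alt) / n)   # bell curve is centred at p_alt now
    z = (p_hat_crit - p_alt) / sd_alt        # standardize the decision-rule cutoff
    beta = NormalDist().cdf(z)               # P(do not reject Ho) = area below cutoff
    return 1 - beta                          # power = 1 - beta

# Hypothetical example: cutoff p^* = 0.58225, alternative p = 0.6, n = 100
print(power_upper_tailed(0.58225, 0.6, 100))
```

Notice that the standard deviation uses the alternative p, not the p from Ho, because the beta curve is centred at the alternative value.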
Question 4
This is very similar to my questions about confidence intervals and hypothesis tests for the difference between two proportions taught in the latter half of Lesson 8.

Be careful that you don't lose accuracy by rounding off too much.  I suggest you keep at least 5 or 6 decimal places while computing things like the standard error of p1^-p2^ to ensure that you get accurate z-scores.  Better yet, store exact answers in memory in your calculator.

Part (b) is just asking for the pooled sample proportion, p^, as given on your formula sheet.  This is what will be used to compute the Standard Error later.  p^ is simply x1 + x2 all divided by n1 + n2.
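As a check on your hand calculation, here is a minimal Python sketch of the pooled proportion, the pooled standard error and the resulting z test statistic.  The counts (40 out of 100 and 30 out of 100) and the function name are hypothetical.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled proportion, pooled standard error and z statistic for Ho: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)                          # (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))  # pooled standard error
    z = (p1_hat - p2_hat) / se
    return p_pooled, se, z

# Hypothetical counts, not from the assignment
p_pooled, se, z = two_proportion_z(40, 100, 30, 100)
print(p_pooled, se, z)
```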

Part (g) introduces the concept I teach in my question 4 in Lesson 9 (Chi-Square Tests).  Note that you don't really have to do any work for part (g) if you apply the concept that relates two-proportion z tests to 2 by 2 Two-Way Chi-Square analysis.  In other words, you already know the test statistic and P-value.
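The relationship can be verified numerically: for a 2 by 2 table, the chi-square statistic equals the square of the two-proportion z statistic, and the chi-square P-value matches the two-tailed z test P-value.  The sketch below uses hypothetical counts (40 of 100 versus 30 of 100) and made-up function names.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for Ho: p1 = p2 using the pooled standard error."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2 by 2 table of successes and failures."""
    observed = [[x1, x2], [n1 - x1, n2 - x2]]
    row_totals = [sum(row) for row in observed]
    col_totals = [n1, n2]
    grand = n1 + n2
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            total += (observed[i][j] - expected) ** 2 / expected
    return total

# Hypothetical counts, not from the assignment
z = two_proportion_z(40, 100, 30, 100)
chi_sq = chi_square_2x2(40, 100, 30, 100)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(z ** 2, chi_sq, p_two_tailed)
```

The first two printed values agree, which is why no new work is needed once the z test is done.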
Question 5
Part (a)
Use the same standard two sentences as always to interpret your confidence interval, as I show you way back in Lesson 1, question 1(b).  But keep in mind that you are interpreting a confidence interval for a proportion, p, not the true mean, mu, as I am doing in my example.

Part (b)
Follow my examples back in Lesson 2, question 6 to see how to interpret a P-value.  As always, first stress that you are assuming Ho is correct (what is Ho in this problem?).  Keep in mind that you are testing a hypothesis for two proportions, p1=p2 in your example, not a mean.
Question 6
You will be using Table F for the first time in these last two questions.  Here is a link where you can download the table if you have not already done so:

This is standard Two-Way Table Chi-Square analysis as taught in questions 1 through 4 in Lesson 9 of my book.

You have to use JMP for this question.  You can do most of this question by hand using the methods I teach in Lesson 9 (Chi-Square Tests), questions 1 to 3, but it is a lot of work since the two-way table is so large.  Then again, it is also quite painful to type all of this into JMP, so maybe you would prefer to do this by hand anyway.

I have found an extremely simple two-way table calculator online that makes it much more straightforward to input the numbers and compute the expected counts, the chi-square statistic, and the P-value.  It only gives the P-value to 3 decimal places though, so JMP will probably be more accurate.  Also, it does not give the chi-square value for each individual cell, but, if necessary, you can compute these yourself.

Free online two-way table calculator.

Here is how to do Contingency Tables (2-Way Tables) in JMP:

Click New Data Table. You will need a total of three columns. Double-click Column 1 and name it Jockey and change the Data Type to "Character" and the Modeling Type to "Nominal".

Double-click the space to the right of the Jockey column to create a new column. Name that column Placing and change the Data Type to "Character" and the Modeling Type to "Nominal".

Double-click the space to the right of the Placing column to create a new column. Name that column Count and keep the Data Type as "Numeric" but change the Modeling Type to "Nominal". 

Make sure that you have the correct Data Type and Modeling Type for each of these three columns as I outline above!

Each row in the JMP data table is used to enter the information for a particular cell of the two-way table. The first row will represent the 1,1 cell; the second row will represent the 2,1 cell; etc. For example, your 1,1 cell gives you the observed count for Placing 1st if you are Jockey A. 

In the JMP data table, type Jockey A down the first column six times (or copy Jockey A in row 1 and paste it in the next five rows so that Jockey A is in the first 6 rows).  Then, put Jockey B in the next 6 rows of column 1, and Jockey C in the next 6 rows of column 1.  Thus, you will have 18 rows in total featuring Jockey A, B and C six times each.

Now, in column 2 (the Placing column), type 1st, then 2nd, then 3rd all the way to 6th in the first six rows, representing the possible places Jockey A may have finished in.  Then repeat the entries 1st through 6th for the Jockey B and Jockey C rows.

Finally, in column 3 (the Count column) enter the count you were given for each cell of your table.  The first six counts you enter will be the first column of your given two-way table, representing the counts for Jockey A finishing 1st, Jockey A finishing 2nd, all the way to Jockey A finishing 6th.  The next six counts will be the counts you have been given in the second column of your two-way table, representing the counts for Jockey B finishing 1st through 6th.  Finish by entering the six counts given for Jockey C.

You will notice that the first two columns of the JMP table are used to specify which column and row of the two-way table you are talking about, and the third column enters the observed count for that particular cell.
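If it helps to picture the layout, this Python sketch flattens a two-way table into the same 18 rows you would type into JMP, one column of the table at a time.  The counts are placeholders only (not the assignment data), and the function name is made up.

```python
jockeys = ["Jockey A", "Jockey B", "Jockey C"]
placings = ["1st", "2nd", "3rd", "4th", "5th", "6th"]

# Placeholder counts only: one inner list per Placing, one entry per Jockey
table = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
    [13, 14, 15],
    [16, 17, 18],
]

def to_long_rows(table, row_labels, col_labels):
    """Flatten a two-way table, column by column, into (column, row, count) rows."""
    rows = []
    for j, col_label in enumerate(col_labels):
        for i, row_label in enumerate(row_labels):
            rows.append((col_label, row_label, table[i][j]))
    return rows

for jockey, placing, count in to_long_rows(table, placings, jockeys):
    print(jockey, placing, count)
```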

Once you have entered in all the observed counts, select Analyze, Fit Y By X. Select Jockey and click Y, Response, select Placing and click X, Factor, and select Count and click Freq. Click OK.

Click the red triangle next to Contingency Analysis of Jockey by Placing at the top and deselect Mosaic Plot to remove that from the output. You now see a Contingency Table (or two-way table) and the "Tests" below it. (If your two-way table has the rows and columns the wrong way round compared to what the question has, that doesn't really matter, but you can fix that by changing which column you called X and which you called Y.)

Click the red triangle next to Contingency Table and make sure that all that is selected is "Count", "Expected" and "Cell Chi Square" to display those values in each cell of the table. Deselect everything else. 

Note the Pearson ChiSquare is the test statistic for the problem (in the last row of the "Tests" output) and the Prob>ChiSq is the P-value for that test.

Part (a)
As they instruct, be sure to
  1. State your hypotheses (they pretty much gave you the null hypothesis at the start of part (a)).
  2. Read the test statistic off JMP (or compute it manually, if you don't want to use JMP).
  3. Read the P-value off JMP (or use the P-value Calculator to type in your manually computed chi-square test statistic and degrees of freedom to get the exact P-value).
  4. State your conclusion, keeping in mind that you are given alpha = 10%.
Part (b)
I show you how to compute the expected count for a cell and the chi-square for a cell in Lesson 9, questions 1, 2 and 3.
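For checking by-hand answers, here is a minimal Python sketch of the two standard formulas: expected count = (row total × column total) / grand total, and cell chi-square = (observed - expected)² / expected.  The small 2 by 2 table is a hypothetical example, and the function name is made up.

```python
def expected_and_cell_chisq(observed):
    """Expected counts, per-cell chi-square values and their total for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)

    # Expected count for each cell: row total * column total / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]

    # Chi-square contribution of each cell: (observed - expected)^2 / expected
    cells = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    total = sum(sum(row) for row in cells)
    return expected, cells, total

# Hypothetical table, not the assignment data
expected, cells, total = expected_and_cell_chisq([[10, 20], [30, 40]])
print(expected)
print(cells)
print(total)
```

The same function works on a table of any size, so the per-cell values it returns can also be compared to find the cells with the largest chi-square contributions.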

Part (c)
Follow my examples back in Lesson 2, question 6 to see how to interpret a P-value.  As always, first stress that you are assuming Ho is correct (what is Ho in this problem?).  Keep in mind that you are testing a hypothesis for homogeneity in your problem, not a mean. 

I always think the easiest way to interpret a P-value is to, first, stress that you are assuming the null hypothesis is correct, then merely describe in words the shaded region on your density curve that you used to visualize the P-value.

Part (d)
They are asking which four cells have the largest chi-square values.

Part (e)
Use the Chi-Square table to get the critical value for the degrees of freedom involved knowing alpha = 10%.  You can now state the Chi-Square decision rule.  Obviously, you had better make the same decision as you did back in part (a).
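The decision rule itself is simple enough to sketch in Python.  The degrees of freedom are (rows - 1) × (columns - 1); the critical value still comes from your table.  The value 15.99 below is the usual chi-square table entry for 10 degrees of freedom with 10% in the upper tail, but confirm it against your own copy of Table F.  The test statistic of 20.0 is a made-up number.

```python
def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a two-way table chi-square test."""
    return (n_rows - 1) * (n_cols - 1)

def reject_h0(chi_sq_stat, critical_value):
    """Decision rule: reject Ho if the test statistic exceeds the critical value."""
    return chi_sq_stat > critical_value

df = chi_square_df(6, 3)   # 6 placings by 3 jockeys
critical_value = 15.99     # table lookup for df = 10, alpha = 10% (verify on Table F)

# Hypothetical test statistic, not the assignment answer
print(df, reject_h0(20.0, critical_value))
```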

Part (f)
As always, use the null and alternative hypotheses here to help formulate what a Type I and Type II error would be.
Question 7
This is standard Two-Way Table Chi-Square analysis as taught in questions 1 through 4 in Lesson 9 of my book.  I suggest you make a table in your answer box (that is an option in the toolbar) to summarize the expected counts and chi-square values.  If you find that too annoying and clunky (it is), you could also give the answers to part (a) like this:
  • Expected Count for 1,1 (Compact, Minor Damage) cell is <blank> and the chi-square value is <blank>.
  • Expected Count for 1,2 (Compact, Extensive Damage) cell is <blank> and the chi-square value is <blank>.
  • etc., etc.
  • Expected Count for 3,3 (Large, Write-Off) cell is <blank> and the chi-square value is <blank>.
You can use JMP or that handy-dandy Free online two-way table calculator to check most of your answers.