Stat 2000: Assignment 4 Tips (Distance/Online Sections)

Published: Wed, 01/30/13

My tips for Assignment 4 are coming below, but first a couple of announcements.

Please note that my first review seminar for Stat 2000 will be on Feb. 24 (one week before the midterm exam). Unfortunately, that means the seminar falls on the weekend at the end of the week-long Midterm Break, but it is out of my hands. This seminar will cover the lessons in Volume 1 of my book.

For more info about the seminar, and to register if you have not already done so, please click this link:

Stat 2000 Exam Prep Seminar

I am also taking registrations for all my midterm exam prep seminars (Calculus, Linear Algebra, and Statistics). Please click this link for more info and to register, if you are interested:

Grant's Exam Prep Seminars

Did you read my Tips on How to Do Well in this Course?

Make sure you do: Tips on How to Do Well in Stat 2000

Did you read my Tips on what kind of calculator you should get?

Tips on what calculator to buy for Statistics

Did you miss my Tips for Assignment 3?

Tips for Stat 2000 Distance Assignment 3

If you are taking the course by Classroom Lecture (Sections A01, A02, etc.), I will send tips for Assignment 4 once it is posted.

Tips for Assignment 4 (Distance/Online Sections D01, D02, D03, etc.)

Don't have my book? You can download a free sample containing Lesson 3 at my website here:

Grant's Tutoring Study Guides (Including Free Samples)

Study Lesson 4: Inferences for Two Means in my study book to prepare for this assignment.

Be sure to use your Rule of Thumb (Lesson 4) for all of the questions in this assignment to determine if you are using the pooled method or the generalized method. Note, if you are using an older edition of my study book, you must use that insanely complicated degrees of freedom formula for any question that requires the generalized method. Refer to #1 on the formula sheet included in your course outline to see that formula if you can't find it in my book (it is in most of the recent editions of my book, but it depends how old your book is). Also, be sure to skim through the entire question to see if they ever specify the order in which they want you to subtract your means, and, if so, be sure to do as they say right from the start.
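If it helps to see the arithmetic, here is a minimal Python sketch of the two degrees of freedom calculations. It assumes the complicated formula is the usual Welch-Satterthwaite approximation (compare it against #1 on your formula sheet before trusting it). It also assumes a Rule of Thumb cutoff of 2 on the ratio of the larger standard deviation to the smaller one, which is a common convention but is not stated in these tips, so check Lesson 4 for the exact rule. The function names are made up for illustration.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the generalized (unequal variances) method.

    ASSUMPTION: this is the Welch-Satterthwaite formula; verify it matches
    formula #1 on your course formula sheet.
    """
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def pooled_df(n1, n2):
    """Degrees of freedom for the pooled (equal variances) method."""
    return n1 + n2 - 2


def use_pooled(s1, s2, cutoff=2.0):
    """ASSUMPTION: pool when larger SD / smaller SD is below the cutoff.

    The cutoff of 2 is a placeholder; use the rule given in Lesson 4.
    """
    return max(s1, s2) / min(s1, s2) < cutoff


# Made-up summary statistics, not taken from the assignment.
print(use_pooled(3.0, 4.0))
print(pooled_df(10, 12))
print(welch_df(3.0, 10, 4.0, 12))
```

The generalized degrees of freedom will usually not be a whole number. Whether you round it down or use it as is depends on the convention your course follows.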

When you are using JMP to do a two-sample hypothesis test or confidence interval, watch which way it is subtracting. It may not do it the way you expected. For example, you may have called "A" sample 1 and "B" sample 2, so you would expect to do A - B, but JMP may do B - A. Look at the two sample means JMP computes for A and B, then check if the "difference" in its t test has computed A - B or B - A.

If JMP is not subtracting the means in the order you wish it to, do this:

Once you have decided who you are naming Sample 1 and Sample 2 (and that means you will be subtracting in that order, "Sample 1" - "Sample 2") or they have told you which order they want, make sure JMP does it the way you expect. Click Column 2 to highlight the entire column. Select "Cols, Validation, List Check..." (in the Columns toolbar at top). It will show the labels you have written for Column 2. The label written first is what JMP will consider Sample 1. If you don't like the order it shows, highlight the label and click "Move Up" or "Move Down" to change the order of the labels.

If you do not do this, your signs will be all wrong. For example, the signs in your lower and upper limits for your confidence interval for the difference between the means would be the opposite of what they should be.
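To see why the order matters, here is a small Python sketch that builds a confidence interval for a difference in means both ways. All of the numbers (the means, the standard error, and the critical value) are made up for illustration, and the function name is hypothetical.

```python
def diff_ci(mean_1, mean_2, se_diff, t_star):
    """Confidence interval for (mean_1 - mean_2), given the SE and critical value."""
    diff = mean_1 - mean_2
    margin = t_star * se_diff
    return (diff - margin, diff + margin)


# Made-up values: mean of A, mean of B, SE of the difference, critical value t*.
mean_a, mean_b, se_diff, t_star = 10.0, 12.0, 0.5, 2.0

print(diff_ci(mean_a, mean_b, se_diff, t_star))  # A - B
print(diff_ci(mean_b, mean_a, se_diff, t_star))  # B - A
```

The two intervals are mirror images of each other: the lower limit of one is the negative of the upper limit of the other. The same flip happens to the sign of the test statistic.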

Tip: When you want to do a one-sided test, if JMP has a positive test statistic, you must be doing an upper-tailed test; if JMP has a negative test statistic, you must be doing a lower-tailed test. But, again, watch the way JMP has subtracted the two means to identify who is who.

For Question 1, note that you have been given the Standard Errors of x-bar (the SE values), so you will have to do some algebra to determine the standard deviations. I give you the formula for SE of x-bar back in Lesson 1 of my book and also again in Lesson 4 when I first start talking about standard errors. Click this link to see that formula:

Standard Error of the Sample Mean
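The algebra is just a matter of undoing SE = s / √n, so s = SE × √n. A minimal Python sketch, using made-up numbers instead of the values from Question 1 (the function name is hypothetical):

```python
import math


def sd_from_se(se, n):
    """Recover the sample standard deviation s from SE = s / sqrt(n)."""
    return se * math.sqrt(n)


# Made-up example: SE of x-bar is 1.5 with a sample of n = 16.
print(sd_from_se(1.5, 16))  # 1.5 * 4 = 6.0
```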

Question 2 is a good run through of two-sample methods. Make sure you use your Rule of Thumb.

Here is how to do the JMP part of Question 3:

Open a New Data Table and type the data in manually in this manner: Name your first column "Price" or something like that, and type all the prices down that column. Which is to say, type in the four-bedroom selling prices down the column and then continue to type all the three-bedroom selling prices below that. Double click at the top to the right of the "Price" column heading to create a new column and name it something like "Type of Home". Down that column type something like "four-bedroom" repeatedly down that column in all the rows that have the prices for four bedroom homes. Then type something like "three-bedroom" repeatedly down the column in the rows that have three bedroom prices. You may want to type the phrase once and then copy and paste it down the rest of the relevant rows to ensure there are no typos. Once you have done that, double-click the "Type of Home" column heading and confirm that the Data Type is Character and the Modeling Type is Nominal and click OK.
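The stacked layout described above (one numeric column and one label column) can be pictured with this small Python sketch. The prices are placeholders made up for illustration, not the assignment data, and the variable names are hypothetical.

```python
# Placeholder prices, NOT the assignment data.
four_bedroom_prices = [329000, 345000, 310000]
three_bedroom_prices = [259000, 275000]

# One row per home: (Price, Type of Home), with the four-bedroom rows first.
rows = [(price, "four-bedroom") for price in four_bedroom_prices]
rows += [(price, "three-bedroom") for price in three_bedroom_prices]

print("Price\tType of Home")
for price, home_type in rows:
    print(f"{price}\t{home_type}")
```

Every price gets its own row, and the second column only says which group that row belongs to. That is the shape Fit Y By X expects.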

Select Analyze, then Fit Y By X. Highlight the numeric column "Price" and click the Y, Response button. Highlight the character column "Type of Home" and click the X, Factor button. Click OK.

You should now see a graph with two vertical arrays of dots showing the prices of three and four bedroom homes separately. (If you don't see that graph, for example, if you see a Mosaic Plot instead, that means you do not have the Data Type and Modeling Type correct for your columns. Go back to your data table and make sure you have the correct Data Type and Modeling Type as I outline above.)

Click the red triangle above the graph and select "Display Options" and select Box Plots to see side-by-side boxplots. That will enable you to get a feel for the symmetry or skewness of the distributions to help you decide if use of t is acceptable. Even if use of t is not acceptable, you are going to use it anyway.

Click the red triangle again and select "Means and Std Dev" to get a summary of the means and standard deviations of the two samples. Click the red triangle again and select "t-Test" to get the output for a hypothesis test and confidence interval assuming unequal variances. Click the red triangle again and select "Means/Anova/Pooled t" to get the output that includes a hypothesis test and confidence interval assuming equal variances.

Click the red triangle again and select "Set α level" to have the outputs change the confidence intervals to your desired level of confidence. For example, if you want 98% confidence intervals you would set alpha to be .02, or, if you want 90% confidence intervals, you would set alpha to be .10. In other words, α = 1 - C.
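The α = 1 - C conversion is simple enough to check with a couple of lines of Python. This is only a sketch: the function name is made up, and the rounding is there just to hide floating-point noise.

```python
def alpha_for_confidence(confidence_level):
    """Convert a confidence level C (as a decimal) to alpha = 1 - C."""
    return round(1 - confidence_level, 10)


print(alpha_for_confidence(0.98))  # 0.02
print(alpha_for_confidence(0.90))  # 0.1
```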

By the way, I have no idea what they are getting at in 3(e), so your guess is as good as mine. I think they mean they are not simple random samples (SRS) since the data is strictly from one place instead of all over the country. There are all sorts of reasons you could give as to why you should not use t in this case, but I have no idea how you could justify using t if the samples are not random. You can only say that we must assume that these samples are representative of housing prices in the area. Note that there is also reason to believe these samples are not independent (a necessary condition of two-sample tests) because people are likely to set their asking price based on what other homes are selling for, and because four-bedroom homes are likely to have their asking price influenced by the asking price of three-bedroom homes. I think a better question to ask would have been, "Give reasons why this data may not be reliable or may violate the conditions of two-sample testing."

You will need to copy and paste this output into a document to get ready to add your answer for part (e) as well. Here is how to do that:

Click the thin blue line near the top of the window that has the histogram, etc. to reveal the toolbar. Select the icon that looks like a fat white cross or plus sign "+". This is your "Selection" tool. Your mouse cursor should now have changed from an arrow to that white cross. Click the title bar that says "Bivariate Fit of ..." at the top of the output and that should select the entire output (scatterplot, Summary of Fit, etc.). Right-click and select Copy.

Now, open whatever program you use for word processing (such as Word). In a new document, right-click and select Paste to paste your output into the document.

In your Word document, below the outputs you have pasted in, type in your answer for part (e).

You are now ready to save and upload the file that answers the question. In your Word document (or whatever program you are using), select "File" then "Save As" and select "PDF File". Type in whatever name you want the file to have in the "File name" section. Select which folder you want to save the file in (I suggest you select "Desktop" so that the file will just appear right on your desktop home screen). Click "Save" or "Publish". You should now have your file ready to upload into the assignment.

Do the JMP in Question 4 just like I showed you what to do in Question 3 above. Your first column should have all the SSHA scores and your second column will be a character column where you type in women and men in the appropriate rows. Always make the numeric column Y and the character column X when you select Fit Y By X.

Make sure you note the way that JMP has subtracted the two samples and confirm it is the way you wanted or fix it as I discussed above.