STAT 2000 Tips for Assignment 2 of 6

Published: Fri, 10/01/10

Grant's Updates for Stat 2000

Hi,

You are receiving this e-mail because you indicated when you signed up for Grant's Updates that you are taking Stat 2000 this term. If you are not in fact taking Stat 2000, please reply to this e-mail to let me know, and I will fix that.

Throughout the term I will send you all sorts of tips to help you study and learn the course. If you have not already done so, I strongly recommend you purchase my Basic Stats 2 Study Book. You will find it a great resource for learning the course. I pride myself on explaining things in clear, everyday language, and I provide numerous examples of all the key concepts with step-by-step solutions. You can order my book at the UMSU Digital Copy Centre in University Centre on the UM campus. They make the book to order, so please allow one business day. The book is split into two volumes, and each volume costs $45 + tax.

If you ever want to look back over a previous tip I have sent, do note that all my tips can be found in my archive. Click this link to go straight to my archive:

Grant's Updates Archive

Never forget, I am just a phone call or an e-mail away if you ever have any questions,

Grant

Tips for Assignment 2 of 6

Study Lessons 1, 2, and 4 in my study book (if you have it) to learn the concepts involved in this assignment.

Note that this entire assignment uses t, not z, since they never give you the population standard deviation, σ.

For question 2: Personally, I would find it much easier to simply open a "New Data Table" in JMP, type the 31 scores we are given into column 1 manually, then double-click "Column 1" at the top and name the column "OC" or something.

If you really want to paste and stack the data, here's how.

To stack the given data in question 2: Copy and paste the given data into a New Data Table in JMP. In the toolbar at the top, select "Tables", then select "Stack". Highlight all of the columns in the "Select Columns" box and click "Stack Columns" and click OK. You will now see all of the data stacked into one column (there will be another column showing all the column names which you can ignore). Name the column something like "OC" and make sure its Data Type is Numeric and its Modeling Type is Continuous.

If you don't remember how to use JMP to get confidence intervals or to test hypotheses for the mean, re-read my Tips for Assignment 1 of 6 for Stat 2000 in my homework tips archive above.

To make a column with the logarithms: Double-click the empty space next to your last column of data to make JMP create a new column for you. Name it something like "log(OC)". Double-click that new column heading to get the pop-up menu. Click the "Column Properties" button and select Formula. Now click the Edit Formula button. In the formula pop-up screen, select "Transcendental" in the Functions (grouped) menu and then select "Log" in the sub-menu. You will see Log appear in the section below with a set of brackets around a red box. Highlight the OC column in the "Table Columns" section of this screen to make OC appear in that red box. Now click OK a few times to get back to your data table and you should see your log(OC) column filled in with numbers. Each of those numbers is the natural log of the original OC score, which is to say it is identical to computing the ln of each OC score by pressing the "ln" button on your calculator (which is right next to the "log" button). For example, if your OC score was 49.9, then log(OC) would be ln(49.9) = 3.91002...
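If you would like to double-check the JMP column outside of JMP, here is a minimal Python sketch of the same natural-log transformation. The list `oc_scores` is made up for illustration (only 49.9 comes from the example above), and the variable names are my own assumptions, not anything from the assignment.

```python
import math

# Hypothetical OC scores; only 49.9 comes from the worked example above.
oc_scores = [49.9, 12.3, 150.0]

# math.log with a single argument is the natural log (ln).
log_oc = [math.log(x) for x in oc_scores]

print(round(log_oc[0], 5))  # 3.91002
```

If your first value does not match what JMP shows, you most likely used the base-10 "log" instead of "ln".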

Once you have found the confidence interval limits for your log(OC) scores, you can convert those back to OC limits by simply using the e^x button on your calculator. You get e^x by pressing "2nd F" "ln" or "SHIFT" "ln". For example, if your lower limit for log(OC) was 5, then you would press "2nd F" "ln" 5 to compute e^5 = 148.413... to get the corresponding lower limit for your OC score.
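The same back-conversion can be sketched in Python. The function name `back_transform` and the upper limit of 5.5 are hypothetical, chosen only for illustration; the lower limit of 5 is the one from the example above.

```python
import math

def back_transform(log_lower, log_upper):
    """Convert confidence limits on the log scale back to the original scale."""
    return math.exp(log_lower), math.exp(log_upper)

# 5 is the lower limit from the example above; 5.5 is a made-up upper limit.
lower, upper = back_transform(5, 5.5)

print(round(lower, 3))  # 148.413
```

Because e^x is an increasing function, the lower limit on the log scale always becomes the lower limit on the original scale.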

Why are they doing this? It all boils down to the reliability of our confidence intervals and hypothesis tests for means. Remember, our methods are only reliable if the sample mean is normally distributed. If n < 15, we can only trust our methods if the population is normal. If n ≥ 15, we can generally trust our methods even if the population is not normal. If the population is strongly skewed, we should have n ≥ 40. That is why they are having you make graphs: to get an idea of the possible shape of the population and therefore the reliability of your methods. Statisticians sometimes transform the data (by taking logarithms, for example) to produce a new population that is closer to normal than the original, which gives more reliable confidence intervals and hypothesis tests.
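As a quick memory aid, the sample-size guidelines above can be restated as a small Python function. This is only a sketch of the rule of thumb as I have described it here; the function name `t_reliability` and the wording of the messages are my own.

```python
def t_reliability(n):
    """Restate the sample-size guidelines for trusting t methods."""
    if n < 15:
        return "trust t only if the population is normal"
    if n < 40:
        return "generally fine, unless the population is strongly skewed"
    return "generally fine, even if the population is strongly skewed"

# Question 2 gives 31 scores, so n = 31 lands in the middle case.
print(t_reliability(31))
```

With n = 31 the answer depends on how skewed the graph looks, which is exactly why the graphs matter.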

Which data do you think will make t more reliable in your problem? The OC data or the log(OC) data?

Make sure you have reviewed questions 19 and 20 in my Lesson 2 before you attempt question 3 on your assignment.

Be sure to use your Rule of Thumb (Lesson 4) for questions 4 to 7 to determine if you are using the pooled method or the generalized method. Note, if you are using an older edition of my study book, you must use that insanely complicated degrees of freedom formula for any question that requires the generalized method. Refer to #1 on the formula sheet included in your course outline to see that formula if you can't find it in my book (it is in most of the recent editions of my book, but it depends how old your book is).
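If you want to check your arithmetic on the degrees of freedom, here is a Python sketch. I am assuming the formula on your sheet is the standard Welch-Satterthwaite approximation for two samples with unequal variances; compare it against formula #1 on your own formula sheet before relying on it. The function name `welch_df` and the numbers are hypothetical.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom (assumed to match the course formula)."""
    a = s1 ** 2 / n1  # squared standard error of the first sample mean
    b = s2 ** 2 / n2  # squared standard error of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical summary statistics, for illustration only.
df = welch_df(s1=2.0, n1=10, s2=5.0, n2=12)
print(df)
```

The result is usually not a whole number, and it always falls between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2.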

For Question 4, note that you have been given the Standard Errors of x-bar (the SE values), so you will have to do some algebra to determine the standard deviations. I give you the formula for SE of x-bar back in Lesson 1 of my book and also again in Lesson 4 when I first start talking about standard errors.
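The algebra here is just rearranging SE = s / √n into s = SE × √n. A minimal Python sketch, using made-up numbers rather than the values from your assignment (the function name `sd_from_se` is my own):

```python
import math

def sd_from_se(se, n):
    """Recover the sample standard deviation from the standard error of x-bar."""
    return se * math.sqrt(n)

# Hypothetical values: SE = 1.5 from a sample of n = 25.
s = sd_from_se(1.5, 25)
print(s)  # 7.5
```

Do this once for each sample, using that sample's own SE and n.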

Here is how to do the JMP part of Question 6:

Open a New Data Table and type the data in manually in this manner: Name your first column "Price" or something like that, and type all the prices down that column. Which is to say, type in the four-bedroom selling prices down the column and then continue to type all the three-bedroom selling prices below that. Double click at the top to the right of the "Price" column heading to create a new column and name it something like "Type of Home". Down that column type something like "four-bedroom" repeatedly down that column in all the rows that have the prices for four bedroom homes. Then type something like "three-bedroom" repeatedly down the column in the rows that have three bedroom prices. You may want to type the phrase once and then copy and paste it down the rest of the relevant rows to ensure there are no typos. Once you have done that, double-click the "Type of Home" column heading and confirm that the Data Type is Character and the Modeling Type is Nominal and click OK.

Select Analyze, then Fit Y By X. Highlight the numeric column "Price" and click the Y, Response button. Highlight the character column "Type of Home" and click the X, Factor button. Click OK.
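The layout JMP wants here (one numeric column plus one label column) can be sketched in Python to show why it works: every price carries its own group label, so the software can split the rows by label. The prices below are made up for illustration and are not the assignment data.

```python
from statistics import mean, stdev

# Hypothetical selling prices (in $1000s), all stacked in one "Price" column,
# with a matching "Type of Home" label for every row.
prices = [310, 295, 342, 250, 265, 240]
home_type = ["four-bedroom"] * 3 + ["three-bedroom"] * 3

# Split the single price column into groups using the label column.
groups = {}
for price, label in zip(prices, home_type):
    groups.setdefault(label, []).append(price)

for label, values in groups.items():
    print(label, len(values), round(mean(values), 2), round(stdev(values), 2))
```

A typo in a label would create a third group, which is why copying and pasting the labels is the safer route.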

You should now see a graph with two vertical arrays of dots showing the prices of three and four bedroom homes separately. Click the red triangle above the graph and select "Display Options" and select Box Plots to see side-by-side boxplots. That will enable you to get a feel for the symmetry or skewness of the distributions to help you decide if use of t is acceptable. Even if use of t is not acceptable, you are going to use it anyway. Click the red triangle again and select "Means and Std Dev" to get a summary of the means and standard deviations of the two samples. Click the red triangle again and select "t-Test" to get the output for a hypothesis test and confidence interval assuming unequal variances. Click the red triangle again and select "Means/Anova/Pooled t" to get the output that includes a hypothesis test and confidence interval assuming equal variances. Click the red triangle again and select "Set α level" to have the outputs change the confidence intervals to your desired level of confidence. For example, if you want 98% confidence intervals you would set alpha to be .02, or, if you want 90% confidence intervals, you would set alpha to be .10. In other words, α = 1 - C.
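To see what separates the two JMP outputs, here is a Python sketch that computes the test statistic both ways from summary statistics and converts a confidence level to alpha. All numbers and function names are hypothetical; this uses the usual textbook formulas for the pooled and unequal-variance standard errors, not anything specific to JMP.

```python
import math

def alpha_from_confidence(c):
    """alpha = 1 - C, e.g. a 98% confidence level gives alpha = 0.02."""
    return 1 - c

def t_unequal(m1, s1, n1, m2, s2, n2):
    """t statistic without assuming equal variances."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / se

def t_pooled(m1, s1, n1, m2, s2, n2):
    """t statistic assuming equal variances, with df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

# Hypothetical summary statistics for two samples.
print(round(alpha_from_confidence(0.98), 2))
print(t_unequal(300, 40, 12, 260, 25, 15))
print(t_pooled(300, 40, 12, 260, 25, 15))
```

When the two sample sizes and standard deviations are equal, both methods give the same t statistic; they differ more as the standard deviations move apart.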

By the way, I have no idea what they are getting at in part (e), so your guess is as good as mine. I think they mean they are not simple random samples (SRS) since the data is strictly from one place instead of all over the country. You can take it from there.

Do the JMP part of Question 7 just as I showed you for Question 6 above. Your first column should have all the SSHA scores and your second column will be a character column where you type in male and female in the appropriate rows. Always make the numeric column Y and the character column X when you select Fit Y By X.

Note, you must save your document as a PDF file to upload it into Web Assign (no other format will be accepted). If you don't know how to do this for the software you are using, try the help files or Google "save as pdf file" for some helpful steps or programs that enable you to save documents that way for free. MS Word 2007 is capable of saving as PDF. If you are using a different program and do not have a "save as pdf" option, Google "pdf995 download" for a free program that can be used to save documents in PDF format. Note also that if you have had to download a program to create PDF files, most of these programs create a "pdf printer" in your printers list. To save the file as a PDF, you behave as though you were printing the file, but when you choose print, select your pdf printer and it will create a PDF file instead.