Stat 2000: Assignment 2 Tips (Classroom Lecture Sections)
Published: Mon, 02/18/13
My tips for Assignment 2 are coming below, but first a couple of announcements.
Please note that my first review seminar for
Stat 2000 will be on Feb. 24 (one week before the midterm exam). Unfortunately, that means the seminar is the weekend at the end of the week-long Midterm Break, but it is out of my hands. This seminar will cover the lessons in Volume 1 of my book.
For more info about the seminar, and to register if you have not already done so, please click this link:
I am also taking registrations for all my midterm exam
prep seminars (Calculus, Linear Algebra, and Statistics). Please click this link for more info and to register, if
you are interested:
Make sure you read: Tips on How to Do Well in Stat 2000
Did you read my Tips on what kind of calculator you should get?
Did you miss my Tips for Assignment 1?
Tips for Assignment 2 (Classroom Lecture Sections A01, A02, A03, etc.)
Don't have my book? You can download a free sample containing Lesson 3 at my website here:
You need to study Lesson 4: Comparing Two Means and Lesson 5: Oneway Analysis of Variance in my book (if you have it) to prepare for this assignment. Do
not attempt to do this assignment until you have studied ALL of these
lessons. Note that, if you are using an older edition of my book, you
may find that I teach Matched Pairs at the end of Lesson 2. Newer
editions teach Matched Pairs in Lesson 4.
You must ask yourself whether question 1 is
a two-sample design or a matched pairs design and proceed accordingly.
You must ask yourself that question again before answering questions 2 and 3.
The conditions that must be satisfied depend on whether you think it is
appropriate to use the Matched Pairs method, the Pooled Two-Sample
method, or the Generalized (or Conservative) Two-Sample method.
Note: if JMP does not seem to give you
the graphs or options I describe in my steps,
double-click each column of data and confirm that it has the correct
Data Type and Modeling Type. If I do not specify, the Data Type should be Numeric and the Modeling Type should be Continuous.
To do the JMP work in questions 2 and 3, follow these steps. Note,
I do not want to tell you whether you should be using Matched Pairs or
Two-Sample Methods. You must figure that out for yourself, so I am
explaining how to do either on JMP. It is your responsibility to choose
the correct approach for each question. It is a vital skill to be able to distinguish Matched Pairs data from Two-Sample Data.
If you want to analyze Matched Pairs Data:
Let us assume you have matched pairs. Each pair has
an A score, and a B score. Open a "New Data Table", double-click Column 1
and name it appropriately (I will call it "A"). Double-click the
region to the right of Column 1 to create Column 2 and name it
appropriately (I will call it "B"). Enter all the A scores and B scores
in the columns. Select "Analyze" then "Matched Pairs".
Be sure to read the
entire problem to determine if they have specified the order they want
you to subtract (A - B or B - A). If not, you can, of course, do what
you like. If you want JMP to do A - B, in the Matched Pairs pop-up
menu, select B first, then click "Y, Paired Response", then select A and
click "Y, Paired Response". Thus, in the Y, Paired Response window,
you would see B listed above A. JMP always does Second - First, so
whichever is listed second in that window will be the front of the
subtraction. Click OK.
The output then gives you all you need. At the top,
you should see the title bar "Matched Pairs" then below it another title
bar called "Differences: ..." JMP is showing you in that Differences
bar if it is doing A - B or B - A. If it is not subtracting in the
order you wish, close the screen and redo the Matched Pairs command,
reversing the order you place the columns in the "Y, Paired Response"
section.
The
"t-Ratio" is your test statistic, and the three probabilities are the
three P-values for the two-tailed (Prob > |t|), upper-tailed (Prob > t), and lower-tailed
test (Prob < t).
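To see what is behind the t-Ratio, here is a minimal Python sketch of the matched pairs calculation. The scores and the function name `paired_t_ratio` are made up for illustration; this is not JMP's code. It mimics JMP's "Second - First" habit and shows that reversing the order of subtraction only flips the sign of the t-Ratio, which is why the upper-tailed and lower-tailed P-values trade places.

```python
import math
import statistics

def paired_t_ratio(first, second):
    """Return (mean difference, t-ratio, df) for the differences second - first."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)            # sample standard deviation (divides by n - 1)
    t = mean_d / (sd_d / math.sqrt(n))        # t = d-bar / (s_d / sqrt(n))
    return mean_d, t, n - 1

# Hypothetical scores for five matched pairs
a = [12, 15, 11, 14, 13]
b = [10, 14, 12, 11, 10]

mean_ab, t_ab, df = paired_t_ratio(b, a)      # B listed first, so this is A - B
mean_ba, t_ba, _ = paired_t_ratio(a, b)       # A listed first, so this is B - A
print("A - B:", mean_ab, round(t_ab, 3), "df =", df)
print("B - A:", mean_ba, round(t_ba, 3), "df =", df)
```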
If you want to analyze Two-Sample Data:
This is done very differently. Let us assume we have two
independent samples comparing the income of men and women: a set of
income scores for Men and a set of income scores for Women. The key
thing to understand is that you will type all the scores down the first
column. Double-click Column 1 and give it a name that describes the
variable both scores are measuring. In my example, I would name the
column "Income". I would then type all the men's incomes down that
column and continue to type all the women's incomes below that. I would
now double-click the region at the top to the right of Column 1 to
create a new column. That column would be named whatever variable
distinguishes the two samples. In my example, I would name the column
"Sex". I would then type a word down that column that distinguishes the
two samples. I would type Men repeatedly down column 2 in all the rows
that have men's incomes in Column 1. Then I would type Women in the
rest of the rows that have women's incomes. I suggest you type your
first word, then copy and paste it in all the other relevant cells in
Column 2, then type your second word and copy and paste it to ensure
there are no typos. Thus, I would have two columns of data. The first
column shows all the numerical data scores (all the incomes) and the
second column labels the data in the first column telling me which group
the scores belong to (men or women).
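If it helps to picture the layout, here is a small Python sketch of the same stacking idea. The incomes are made-up numbers and the variable names are only for illustration: every score goes in one list, and a second list of labels says which sample each score came from.

```python
# Hypothetical incomes (in thousands of dollars) for the two samples
men = [52, 61, 48]
women = [55, 47, 63, 58]

# Column 1 ("Income") holds every score, men first, then women
income = men + women

# Column 2 ("Sex") labels each row with the sample it belongs to
sex = ["Men"] * len(men) + ["Women"] * len(women)

# Each row of the data table is one (Income, Sex) pair
rows = list(zip(income, sex))
for score, label in rows:
    print(f"{score}\t{label}")
```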
Double-click Column 1 and confirm that its Data Type is Numeric and
its Modeling Type is Continuous, changing the settings if necessary.
Double-click Column 2 and confirm that its Data Type is Character and
its Modeling Type is Nominal, changing the settings if necessary.
Once you have decided who you are naming Sample 1 and Sample 2 (and
that means you will be subtracting in that order, "Sample 1" - "Sample
2") or they have told you which order they want, make sure JMP does it
the way you expect. Click Column 2 to highlight the entire column.
In the toolbar at
the top of the data table, select "Cols" (short for columns), then
select "Validation, List Check..." . It will show the labels you have
written for Column 2. The label
written first is what JMP will consider Sample 2. If you don't like
the order it shows, highlight the label and click "Move Up" or "Move
Down" to change the order of the labels. For example, if you want to do
A - B, it should show B above A in the List Check pop-up table. JMP always does everything ass-backwards!
Now select "Analyze, Fit Y By X". Select Column 1 and click "Y,
Response" and select Column 2 and click "X, Factor". Click OK. You
will see a graph with dots plotted representing all the scores in two
columns. Sample 1 should be the second column of dots, Sample 2 the
first column. If the order is reversed, go back to your data
spreadsheet and do the Column Validation List Check that I outlined in
the paragraph above and reverse the order the variable in column 2 is
listed.
If you don't see two columns of dots for the two samples, you have not labelled your data correctly! Close the screen and go back to your data table. Double-click Column 1 and make sure that its Data Type is Numeric and
its Modeling Type is Continuous, changing the settings if necessary.
Double-click Column 2 and confirm that its Data Type is Character and
its Modeling Type is Nominal, changing the settings if necessary.
Once you have the graph showing the vertical array of dots for your two samples, you are ready to analyze the data.
Click the red triangle and select "Means and Std Dev" to get a
summary of the means and standard deviations. Click the red triangle and select "Display Options" and select "Box Plots" if they have requested side-by-side boxplots. Click the red triangle
and select "Means/Anova/Pooled t" to get JMP to do the pooled two-sample
t test. Click the red triangle again and select "t-Test" to get JMP to
do the generalized two-sample t test (not pooling). You will note that,
when JMP does the pooled t-test, it says, under the "t Test" title bar,
"Assuming equal variances." When JMP does the generalized t-test, it
says, under the "t Test" title bar, "Assuming unequal variances."
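Here is a minimal Python sketch of the difference between those two outputs, using made-up incomes and a hypothetical helper `two_sample_t`. One assumption to flag: I am taking JMP's "unequal variances" test to use the Welch-Satterthwaite approximate degrees of freedom, which is why its DF is usually not a whole number. The conservative by-hand method uses the smaller of n1 - 1 and n2 - 1 instead, so your hand-calculated df may not match JMP even when the test statistic does.

```python
import math
import statistics

def two_sample_t(sample1, sample2):
    """Return {'pooled': (t, df), 'unpooled': (t, df)} for mean1 - mean2."""
    n1, n2 = len(sample1), len(sample2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)

    # Pooled: one common variance estimate, df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Unpooled: separate variances, Welch-Satterthwaite approximate df
    se2 = v1 / n1 + v2 / n2
    t_unpooled = diff / math.sqrt(se2)
    df_unpooled = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    return {"pooled": (t_pooled, n1 + n2 - 2), "unpooled": (t_unpooled, df_unpooled)}

# Hypothetical incomes (in thousands of dollars)
men = [52, 61, 48, 57, 66]
women = [55, 47, 63, 58]

result = two_sample_t(men, women)
print("Pooled:  ", result["pooled"])
print("Unpooled:", result["unpooled"])
print("Conservative df:", min(len(men), len(women)) - 1)
```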
JMP gives 95%
confidence intervals for the difference in the two means by default. If
you want a different level of confidence, click the red
triangle again and select "Set α level" to have the outputs change the
confidence intervals to your desired level of confidence. For example,
if you want 98% confidence intervals you would set alpha to be .02, or,
if you want 90% confidence intervals, you would set alpha to be .10.
In
other words, α = 1 - C. Finally, if they want to see box plots, click
the red triangle and select "Display Options" and select Box Plots, or
whatever else they request. You can also deselect anything in the
Display Options they don't want to see. Never do this unless they
specifically request it though.
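The α = 1 - C conversion is easy to get backwards, so here it is as a tiny Python sketch. The function name `alpha_for` is made up, and it takes the confidence level as a percentage.

```python
def alpha_for(confidence_percent):
    """Convert a confidence level in percent (e.g. 98) to alpha (e.g. 0.02)."""
    return (100 - confidence_percent) / 100

# The value you would type into "Set α level" for some common confidence levels
for c in (90, 95, 98, 99):
    print(f"{c}% confidence -> alpha = {alpha_for(c)}")
```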
Question 1
This question is standard stuff if you have
studied the lesson. In part (e), where they ask you for the bounds in
the P-value, I don't know if it matters which bound you put in which
box. I suggest you put the lower bound first (in decimal form). For
example, if you find the P-value is between .20 and .10, put .10 in the
first box and .20 in the second box. Definitely do not put 10% and 20%.
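If you want to check your table work, here is a rough Python sketch of how P-value bounds come out of a t table. I am using the df = 10 row of a standard t table and a made-up test statistic; the function name `p_value_bounds` is hypothetical, and your question will have its own df and t. It assumes the test statistic falls in the direction of the alternative for a one-tailed test, and it returns the lower bound first, in decimal form.

```python
# (upper-tail probability, critical value t*) for df = 10 from a standard t table
T_TABLE_DF10 = [
    (0.10, 1.372),
    (0.05, 1.812),
    (0.025, 2.228),
    (0.01, 2.764),
    (0.005, 3.169),
]

def p_value_bounds(t_stat, table_row, two_tailed=False):
    """Return (lower, upper) bounds on the P-value; None means 'off the table'."""
    t_abs = abs(t_stat)
    lower, upper = None, None
    for tail_prob, critical in table_row:     # tail probability shrinks as t* grows
        if t_abs >= critical:
            upper = tail_prob                 # P-value is no bigger than this
        else:
            lower = tail_prob                 # first t* beyond our statistic
            break
    if two_tailed:                            # double both bounds for a two-tailed test
        lower = None if lower is None else 2 * lower
        upper = None if upper is None else 2 * upper
    return lower, upper

# Hypothetical test statistic t = 2.0 with df = 10
print(p_value_bounds(2.0, T_TABLE_DF10))                   # one-tailed bounds
print(p_value_bounds(2.0, T_TABLE_DF10, two_tailed=True))  # two-tailed bounds
```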
Question 2
I do recommend you answer this question by hand,
but, since you have to use JMP anyway to get the exact P-value, make
sure that your calculated answers match JMP's values for the test
statistic, confidence interval, etc. before you submit your answers.
Question 3
Again, I recommend you answer this question by hand,
but, since you have to use JMP anyway to get the exact P-value, make
sure that your calculated answers match JMP's values for the test
statistic, confidence interval, etc. before you submit your answers. Note that there is no real work to do in question 3(f). I teach the relationship between t and F in Lesson 5 of my book.
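For anyone who likes to see the t and F relationship in numbers, here is a minimal Python sketch. I am assuming the relationship in play is the standard one: with exactly two groups, the square of the pooled two-sample t statistic equals the Anova F statistic. The data and the function names `pooled_t` and `anova_f` are made up for illustration.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Pooled two-sample t statistic for mean1 - mean2."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = ((n1 - 1) * statistics.variance(sample1)
           + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def anova_f(groups):
    """One-way Anova F statistic, F = MSG / MSE."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    dfg, dfe = len(groups) - 1, len(scores) - len(groups)
    return (ssg / dfg) / (sse / dfe)

# Hypothetical samples of unequal size
x = [2, 4, 6]
y = [5, 7, 9, 11]

t = pooled_t(x, y)
f = anova_f([x, y])
print("t squared:", t ** 2)
print("F:        ", f)
```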
Question 4
I teach you how to interpret confidence
intervals in Lesson 1 and how to interpret P-values in Lesson 2. Be
careful though, those were interpretations for confidence intervals or
P-values for the mean. Now you are interpreting for the difference between two means, so be careful in your wording.
You should study Lesson 5 in my book before attempting questions 5 and 6.
Question 5
This is an Anova question. It is very similar to my question 1 in Lesson 5. In part (d), when they ask for the values of ni, all they mean is tell them what n1 equals, n2 equals, and n3 equals.
I don't know how practical it is to type all of the
answers into the text box they provide. I recommend that you answer all
of the questions in a Word document (or similar word processor), then
paste in the JMP output into this document, and save it as a PDF file
ready to upload into the HTML editor. Or you could save the JMP output
as a separate PDF file rather than paste it into your Word document, and
then upload the Word file (saved as a PDF) and the JMP file (saved as a
PDF) as two separate links in the HTML editor.
In part (e), you will have to show your
calculations for the overall mean, MSG, MSE, etc. like I do in my
question 1. You can use the Equation Editor in Word, if you want to
make it look pretty, or just type all the numbers in and say things like
this, "SSG = 5(16-30)^2 + 4(12-30)^2 + ..." (I am making up those numbers.)
Then MSG = SSG/DFG = 51/2 = etc.
To summarize the results in your ANOVA table, you can
use the Insert Table feature of Word, or just use the Tab key to
separate the columns in the table if you want. Or, copy and paste the
ANOVA table JMP makes for you (of course, JMP better have come up with
the same numbers you did by hand).
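If you want a way to double-check your hand calculations besides JMP, here is a minimal Python sketch of the one-way Anova arithmetic. The three groups are made-up numbers, not the assignment data, and `anova_table` is just a name I picked. Note that SSG and SSE are both sums of squared deviations.

```python
import statistics

def anova_table(groups):
    """Return the pieces of a one-way Anova table for a dict of {label: scores}."""
    scores = [x for g in groups.values() for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n

    # SSG: each group's size times its squared distance from the overall mean
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values())
    # SSE: (n_i - 1) times each group's sample variance
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups.values())

    dfg, dfe = k - 1, n - k
    msg, mse = ssg / dfg, sse / dfe
    return {"DFG": dfg, "DFE": dfe, "SSG": ssg, "SSE": sse,
            "MSG": msg, "MSE": mse, "F": msg / mse}

# Hypothetical data with unequal sample sizes
groups = {"Group 1": [2, 4], "Group 2": [5, 6, 7], "Group 3": [8, 10]}
table = anova_table(groups)
for name, value in table.items():
    print(f"{name} = {value}")
```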
To use JMP to do Anova:
Follow the same steps you do for two-sample data above.
The key thing to understand is that you will type all the scores
down the first column. Double-click Column 1 and give it a name that
describes the variable all the scores are measuring. For example, in question 5,
I would call the column "Return". I would then type all the numbers
for the rates of return for Financial, continue down the column typing in all
the Energy and Utilities rates of return as well. I would now double-click
the region at the top to the right of
Column 1 to create a new column. That column would be named whatever
variable distinguishes the three samples. For example, in question 5, I would name the
column "Industry". I would then type a word down that column that
distinguishes the three samples. I would type "Financial" repeatedly down column 2
in all the rows that have Financial rates of return in Column 1. Then I would type
"Energy" for the appropriate cells in column 2 and type "Utilities" for the rest of the rows. I suggest you
type your first word, then copy and paste it in all the other relevant
cells in Column 2, then type your second word and copy and paste it, and your third word to
ensure there are no typos. Thus, I would have two columns of data. The
first column shows all the numerical data scores (all the rates of return) and
the second column labels the data in the first column telling me which
group the scores belong to (Financial, Energy, or Utilities).
Double-click
Column 1 and confirm that its Data Type is Numeric and its Modeling
Type is Continuous, changing the settings if necessary. Double-click
Column 2 and confirm that its Data Type is Character and its Modeling
Type is Nominal, changing the settings if necessary.
Now select "Analyze, Fit Y By X". Select Column 1
and click "Y, Response" and select Column 2 and click "X, Factor".
Click OK. You will see a graph with dots plotted representing all the
scores in three columns. Financial should be the first column of dots; Energy, the second column; Utilities, the third column. If the order
is silly, go back to your
data spreadsheet and do the Column Validation List Check that I
outlined
in the paragraph above for two-sample data. If you don't see this
graph at all, you did not label your columns properly. Go back and make
sure Column 1 is Numeric and Continuous and Column 2 is Character and
Nominal.
Click the red
triangle and select "Means and Std Dev" to get a summary of the means
and standard deviations. Click the red triangle and select
"Means/Anova/Pooled t" to get JMP to do the Anova. Click the red
triangle and select "Display Options" and select the "Box Plot" to get the side-by-side boxplots they
request.
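As a rough picture of what the "Means and Std Dev" summary is doing with your two columns, here is a short Python sketch. The rates of return are made up and the variable names are only for illustration. It splits the scores by their label in the second column, then reports each group's sample size, mean, and standard deviation.

```python
import statistics
from collections import defaultdict

# Hypothetical rows of the data table: (Return, Industry)
rows = [
    (10.2, "Financial"), (8.5, "Financial"), (11.1, "Financial"),
    (6.4, "Energy"), (7.9, "Energy"),
    (4.8, "Utilities"), (5.5, "Utilities"), (6.1, "Utilities"),
]

# Collect the scores for each label in Column 2
by_group = defaultdict(list)
for value, label in rows:
    by_group[label].append(value)

# n_i, mean, and sample standard deviation for each group
summary = {label: (len(v), statistics.mean(v), statistics.stdev(v))
           for label, v in by_group.items()}
for label, (n, mean, sd) in summary.items():
    print(f"{label}: n = {n}, mean = {mean:.3f}, std dev = {sd:.3f}")
```

A typo in one label (say, "Enregy") would show up here as a fourth group, which is the same reason the copy-and-paste advice above matters in JMP.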
You will need to copy and paste this JMP output into
your Word document where you are answering all of the other questions.
Here is how to do
that:
Click the thin blue bar near the top of the JMP scatterplot screen, or press
"Alt" on your keyboard, to reveal a toolbar with a series of icons. If
you point your mouse at the icons, you should see, looking at the icons
left to right, the first icon is for a "New Data Table," the second
icon is for "New Script," the third is to "Open" a file, etc. Click the
icon that looks
like a fat white cross or plus sign "+". This is your "Selection"
tool. Your mouse cursor should now have changed from an arrow to that
white cross. Click the title bar that says "Oneway Analysis of ..." at the top
of the output and that should select the entire output (graphs, Summary of Fit, etc.). Right-click and select Copy.
Now, return to your Word document (or whatever program you use for word processing). Right-click and select Paste to paste your
output into the document wherever you want it to appear (next to part (h) perhaps, or perhaps next to part (c)).
After you have answered all of the questions in your Word document, save it as a PDF file and upload it into the HTML editor.
To upload your file into the text box they provide: Click "HTML
editor" below the text box to make a toolbar appear in the text box.
Click the toolbar option called "Link" and select "Website/Uploaded
File." In the pop-up window that appears, click the button called
"Find/Upload File" (it is at the bottom of the pop-up window, you may
have to enlarge the box or scroll down to see it). Click the "Browse"
button and find the PDF file you just saved. Either double-click
that file or select it and click "Open" and you should see the path to
that file appear in the Browse box. Click "Upload File" and its name
should appear in the "Uploaded Files" pop-up window. Select the file in
the list of "Uploaded Files" to highlight it and click OK and you
should see a link to the file appear in the text box.
Question 6
This question is very similar to my
questions 3 and 4 in Lesson 5. Also make sure you read my section about
Confidence Intervals in Anova in question 5 of that lesson before you answer 6(g). Again, I recommend you put the lower of the two bounds for your P-value in the first box for 6(c) and put the upper bound in the second box.