ICYMI Stat 1000: Tips for Assignment 1

Published: Tue, 09/29/15

Try a Free Sample of Grant's Audio Lectures
(on sale for only $30 until Sep. 30!)
Don't have my book or audio?  You can download a free sample of my book and audio lectures containing Lesson 1:
Did you read my tips on how to study and learn Stat 1000?  If not, here is a link to those important suggestions:
Did you read my Calculator Tips?  If not, here is a link to those important suggestions:
Tips for Assignment 1
Study Lesson 1 in my study book (see my free sample above if you don't have it) to learn the concepts involved in Assignment 1.  Remember my advice in the tips above.  Don't start working on the assignment too soon.  Study and learn the lesson first, and use the assignment to test your knowledge.  Of course, always seek out assistance from my book, your course notes, etc. if you ever hit a question you don't understand, but try not to be learning things as you do an assignment.  Learn first, then put your learning to the test.

To type in the formulas you are using and to show your numbers subbed into them, click the button in the toolbar that looks like the Sigma summation symbol (you have to click the "..." other options button to see the sigma formula input button).  Then click the various buttons to make your fractions and enter the symbols.

Exception: Always do any JMP stuff open-book.  Have my tips in front of you, and let me guide you step-by-step through any JMP stuff.  JMP is just "busy" work.  The sooner you get it done and can move on to productive things like understanding the concepts and interpreting the JMP outputs, the better off you will be.
Question 1
This is a standard question about classifying variables, similar to my question 1 in Lesson 1.
Question 2
This deals with some aspects of quantitative distributions.

Remember, if you find the total of the second column (the frequency or count column) in a frequency table, that will tell you n, the sample size.

Part (a)
They want a decimal, not a percent.  For example, if you figured out that 20 out of 30 are in the given interval, then 20 divided by 30 is 0.6667, not 66.6667%.  The proportion is 0.6667.  Make sure you round off correctly!  They want four decimal places, so if the fifth decimal place is 5 or more, round up.

A relative frequency or proportion is the relevant count divided by n, the total sample size.  Throughout this course, always leave your answer in decimal form; do not change it into a percent unless they specifically request a percent.
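
If you want to double-check your rounding, here is a minimal Python sketch of the same arithmetic.  It is only an illustration (you would do this on your calculator); the function name `proportion` is made up for this example, and it rounds half up to match the "5 or more, round up" rule.

```python
from decimal import Decimal, ROUND_HALF_UP

def proportion(count, n, places=4):
    """Relative frequency count/n, rounded half up to `places` decimal places."""
    step = Decimal(10) ** -places          # 0.0001 when places is 4
    return (Decimal(count) / Decimal(n)).quantize(step, rounding=ROUND_HALF_UP)

# 20 scores out of 30 fall in the interval (the example from the tip above)
print(proportion(20, 30))   # 0.6667
```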

Part (b)

Remember that a frequency table is a precursor to a histogram.  Visualize the histogram (don't actually make a histogram, just picture it in your mind) to help answer the question about shape. 

Part (c)

You cannot actually compute the median, mean or quartiles because you do not have the actual data.  You don't need to.  As I discussed in Lesson 1, the shape of the distribution is enough to know if the mean is larger, smaller or the same as the median. 

Part (d)

You do know the sample size, n (the total count in the Frequency column), so you can use the steps I teach in Lesson 1 to find the location of any quartile.  Then just make a running total of the counts in the intervals.  How much data is in the first interval? (The count or frequency as given in the second column.)  Now add the count in the second interval.  For example, if there are 3 scores in the first interval and 7 scores in the second interval, that means there are 3+7=10 scores in total in the first two intervals.  Those must be the 10 lowest scores in the data set.  Continue adding the frequencies in each interval until you reach or exceed the count you are looking for that marks the location of the first, second or third quartile as desired.
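
The running-total idea can be sketched in a few lines of Python.  The counts below are hypothetical (only the 3 and 7 come from the example above), and the function name `interval_containing` is made up for this illustration.  It assumes you have already worked out the position you are looking for, and that the position is a whole number; for a position like 10.5 you would check both the 10th and the 11th scores.

```python
from itertools import accumulate

def interval_containing(position, counts):
    """Return the 0-based index of the interval holding the score at `position`
    (position 1 is the lowest score), using a running total of the counts."""
    for index, running_total in enumerate(accumulate(counts)):
        if running_total >= position:
            return index
    raise ValueError("position is larger than the sample size")

counts = [3, 7, 12, 6, 2]         # hypothetical Frequency column
n = sum(counts)                   # total of the Frequency column is n
print(n)                          # 30
print(list(accumulate(counts)))   # [3, 10, 22, 28, 30]

# The 8th lowest score is past the first 3, but within the first 10,
# so it sits in the second interval (index 1).
print(interval_containing(8, counts))   # 1
```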
Question 3
Part (a)
They do not want you to attach the JMP output.  But don't forget to discuss and compare the shape, centre and spread, using whichever measures of shape, centre and spread are easily identified from the boxplots.

To make the side-by-side boxplots:

Open a "New Data Table" in JMP.

You will make two columns, but not the way you might think. DO NOT put CFL in one column and NFL in another!

Double-click Column 1 (or right-click and select Column Info) and name it Points Scored.  Type all 20 scores from the CFL first, then enter the 17 scores from the NFL data, giving you a total of 37 rows in the first column. 

Double-click the region to the right of Column 1 at the top to create Column 2 and name that column League.  Type CFL in the first 20 rows of that column (better yet, type it once, then copy and paste it into the next 19 rows; that way you ensure it is typed exactly the same in all 20 rows as is necessary).  Then type (or copy and paste) NFL in the remaining rows of column 2.

If you have done this correctly, you should now have two columns.  The first column shows the Points Scored of all 37 games you were given.  The second column shows the League each of those 37 games came from (CFL or NFL).
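
If the stacked layout is hard to picture, here is a small Python sketch of it.  The scores are hypothetical and the lists are much shorter than the real data; the point is only that every row pairs one score with its league.

```python
cfl_scores = [45, 38, 52]   # hypothetical scores, not the assignment data
nfl_scores = [41, 27]       # hypothetical scores, not the assignment data

# One row per game: (Points Scored, League), CFL rows first, then NFL rows
rows = [(s, "CFL") for s in cfl_scores] + [(s, "NFL") for s in nfl_scores]

print("Points Scored  League")
for points, league in rows:
    print(f"{points:>13}  {league}")
```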

Make sure the column properties are correct!  Right-click at the top of each column and select Column Info to check what it says for Data Type and Modeling Type.

For Points Scored, the Data Type should be Numeric and the Modeling Type should be Continuous.  If it is not, click the drop-down lists to change them.

For League, the Data Type should be Character and the Modeling Type should be Nominal.  If it is not, click the drop-down lists to change them.

To make the side-by-side boxplots:
Select "Analyze" then "Fit Y By X".  Highlight Points Scored and click "Y, Response".  Highlight League and click "X, Factor".  Click OK.  This should open a pop-up window with a bunch of dots arranged vertically in two columns on a graph for CFL and NFL.

If that does not happen, you do not have the correct Data Type and Modeling Type for your data! Follow my instructions above to fix your column properties.

Now click the red triangle next to Oneway Analysis ... and select Quantiles.  Your side-by-side box plots should appear on the graph as well as a Quantiles output below that shows you the five-number summary among other things.  Click the red triangle again and select "Display Options" (down near the bottom of the menu), then deselect Grand Mean to get rid of the horizontal line in the graph showing the mean of all the scores (although, you are not asked to remove the Grand Mean line, so you don't have to if you don't want to). 

Part (b)
This is just a matter of identifying the values of the scores depicted by dots above and below the whiskers of the boxplots for CFL and NFL.  Make sure you tell them the exact values that are outliers.
Question 4
Part (a)
Make sure you compute the five-number summary by hand, as I demonstrate in Lesson 1, question 4.

Part (b)
They want you to make the standard boxplot, then comment on the shape you see.  However, don't even think about whether there are outliers or not!  They want you to assume there were no outliers at all (they don't even ask you to think about outliers until the next part).  In other words, do the whiskers make the distribution appear skewed?

Part (c)
Now, they want you to use the 1.5 IQR Rule to identify the cut-offs for outliers. 

Part (d)
Now, count your outliers according to the limits found in part (c).
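
As a check on parts (c) and (d), here is a minimal Python sketch of the 1.5 IQR Rule.  The scores are hypothetical, and Q1 and Q3 are assumed to have been found by hand already; the function name `outlier_fences` is made up for this illustration.

```python
def outlier_fences(q1, q3):
    """Cut-offs for outliers: 1.5 IQRs below Q1 and 1.5 IQRs above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

scores = [2, 15, 17, 18, 20, 21, 23, 24, 26, 45]   # hypothetical data
q1, q3 = 17, 24                                    # assumed found by hand

lower, upper = outlier_fences(q1, q3)
print(lower, upper)   # 6.5 34.5

# Any score outside the fences counts as an outlier
outliers = [x for x in scores if x < lower or x > upper]
print(outliers)       # [2, 45]
```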

Part (e)
Now, they want you to make the outlier boxplot which will almost certainly cause you to change your opinion about skewness.  They are trying to show you how it is important to identify outliers first before you comment on the shape of a distribution.
Question 5
Parts (a) and (b)
This question should be done by hand (i.e. with your calculator, not with JMP).  Use the Stat Mode on your calculator to compute the Mean and Standard Deviation.  Don't you dare waste your time using the formulas to compute the mean and standard deviation.  That is what your Stat Mode on your calculator is for!

Check the Appendix at the back of my book to learn how to use the Stat Mode on your calculator.  Here is a link to a digital copy of that appendix:

Make sure you round the answers off to 4 decimal places before proceeding to answer the other parts of the question.  Always use four decimal places throughout this course unless specifically instructed to do otherwise.

Part (c)
Consider this:  Let's say you are taking a course, and your average mark so far is 65.  What will happen to your average if you score higher on the next test?  What if you score lower on the next test?  What would you have to get on the next test to keep your average 65?
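
You can try the same thought experiment with a few lines of Python.  The marks below are hypothetical, chosen only so that the average is 65.

```python
from statistics import mean

marks = [60, 70, 65, 55, 75]    # hypothetical marks with an average of 65
print(mean(marks))              # 65

print(mean(marks + [80]))       # 67.5  (a higher score pulls the average up)
print(mean(marks + [50]))       # 62.5  (a lower score pulls the average down)
print(mean(marks + [65]))       # 65    (a score equal to the mean changes nothing)
```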

Part (d)
There is no need to compute the standard deviation of these 8 scores!  Having decided what that new score must be in part (c), how much does that score deviate from the mean? If that is a larger deviation than the standard deviation you computed earlier, you have increased the overall standard deviation; if it is the same amount of deviation as earlier, you have not changed the standard deviation at all; if it has a smaller deviation, you have decreased your overall standard deviation. 

The closer a value is to the mean, the smaller its deviation from the mean.  Small deviations cause low standard deviations; large deviations cause high standard deviations.
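
Here is a short Python sketch of that idea, again with hypothetical marks.  It compares the sample standard deviation before and after adding one new value, first a value right at the mean and then a value far from it.

```python
from statistics import mean, stdev

marks = [60, 70, 65, 55, 75]     # hypothetical marks, mean 65
print(round(stdev(marks), 4))    # 7.9057

# New value equal to the mean: zero deviation, so the spread shrinks
print(round(stdev(marks + [mean(marks)]), 4))   # 7.0711

# New value far from the mean: large deviation, so the spread grows
print(round(stdev(marks + [95]), 4))            # 14.1421
```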
Question 6
First, you need to know the scores attached to each letter grade.  An A+ is 4.5, A is 4, B+ is 3.5, B is 3, etc.

To compute your grade point average:
First, make a new column where you multiply each grade score by the number of credit hours.  For example, if you got a B+ in a 3 credit-hour course, you would multiply 3.5 by 3 to get 10.5 in this new column.  Find the total of this new column and find the total number of credit hours.  Divide the total of the new column by the total number of credit hours to get the GPA.  Put another way, if you got a B+ in a 3 credit-hour course, it is as though you scored 3.5 three separate times.  You could put your calculator in Stat Mode, and enter 3.5 three separate times.  If you got an A in a 6 credit-hour course, you got 4.0 six times.  Enter 4.0 six separate times.  After you have entered all the data, your calculator will tell you the mean (your GPA).

An easy way to think of grade points is to consider the number of credit hours as the frequency of that grade.  Getting an A in a 3 credit-hour course is like getting an A 3 separate times.  Getting a C in a 6 credit-hour course is like getting a C 6 separate times.  It is like finding the average of three A's and six C's.  The credit hours add weight to each score.
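
Both ways of thinking about the GPA give the same number, as this minimal Python sketch shows.  The two courses are the hypothetical examples from above (a B+ in a 3 credit-hour course and an A in a 6 credit-hour course), not the grades in the assignment.

```python
from statistics import mean

# (grade score, credit hours) for each course; hypothetical transcript
courses = [(3.5, 3), (4.0, 6)]

# Way 1: total of (grade score x credit hours) divided by total credit hours
weighted_total = sum(score * hours for score, hours in courses)   # 10.5 + 24.0
total_hours = sum(hours for _, hours in courses)                  # 9
gpa = weighted_total / total_hours
print(round(gpa, 4))   # 3.8333

# Way 2: enter each grade score once per credit hour, then take the mean
entries = [score for score, hours in courses for _ in range(hours)]
print(round(mean(entries), 4))   # 3.8333
```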