Stat 1000: Tips for Assignment 1

Published: Thu, 01/14/16

Try a Free Sample of Grant's Audio Lectures
(on sale for only $30 until Jan. 31!)
Don't have my book or audio?  You can download a free sample of my book and audio lectures containing Lesson 1:
Did you read my tips on how to study and learn Stat 1000?  If not, here is a link to those important suggestions:
Did you read my Calculator Tips?  If not, here is a link to those important suggestions:
Tips for Assignment 1
Study Lesson 1 in my study book (see my free sample above if you don't have it) to learn the concepts involved in Assignment 1.  Remember my advice in the tips above.  Don't start working on the assignment too soon.  Study and learn the lesson first, and use the assignment to test your knowledge.  Of course, always seek out assistance from my book, your course notes, etc. if you ever hit a question you don't understand, but try not to be learning things as you do an assignment.  Learn first, then put your learning to the test.

To type in the formulas you are using and to show your numbers subbed into the formulas, you can click the Equation Editor button in the toolbar that looks like the Sigma Summation symbol (you have to click the "..." other options button to see the sigma formula input button).  Then click the various buttons to make your fractions and enter the symbols.  However, the Equation Editor is extremely slow and clunky.  Personally, I would never use it.  Just type ordinary text explaining what you are doing if you think you should show some work.

Exception: Always do any JMP stuff open-book.  Have my tips in front of you, and let me guide you step-by-step through any JMP stuff.  JMP is just "busy" work.  The sooner you get it done and can move on to productive things like understanding the concepts and interpreting the JMP outputs, the better off you will be.  Then again, since you never have to upload the JMP printouts, perhaps you might not even bother to do the JMP at all.  Most questions can be answered by hand even when they told you to use JMP.
Questions 1 to 6
This is a standard question about classifying variables, similar to my question 1 in Lesson 1.
Question 7
Remember, if you find the total of the second column (the frequency or count column) in a frequency table, that will tell you n, the sample size.

This deals with some aspects of quantitative distributions. 

Part (a)
They want a decimal, not a percent.  For example, if you figured out that 20 out of 30 are in the given interval, then 20 divided by 30 is 0.6667, not 66.6667%.  The proportion is 0.6667.  Make sure you round off correctly!  They want four decimal places, so if the fifth decimal place is 5 or more, round up.

A relative frequency or proportion is the relevant count divided by n, the total sample size.  Throughout this course, always leave your answer in decimal form; do not change it into a percent unless they specifically request a percent.
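
If you would like to double-check your rounding, here is a minimal Python sketch of the 20-out-of-30 example above.  The function name proportion is just something I made up for illustration, and it assumes the "5 or more, round up" rule described in part (a).

```python
from decimal import Decimal, ROUND_HALF_UP

def proportion(count, n, places=4):
    """Relative frequency = count / n, rounded half-up to `places` decimals."""
    exact = Decimal(count) / Decimal(n)
    step = Decimal(10) ** -places          # 0.0001 when places is 4
    return exact.quantize(step, rounding=ROUND_HALF_UP)

# The example from part (a): 20 scores out of 30 fall in the interval.
print(proportion(20, 30))  # 0.6667
```

Note that the answer stays in decimal form; nothing here converts it to a percent.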

Part (b)
Remember that a frequency table is a precursor to a histogram.  Visualize the histogram (don't actually make a histogram, just picture it in your mind) to help answer the question about shape. 

Part (c)
You cannot actually compute the median, mean or quartiles because you do not have the actual data.  You don't need to.  As I discussed in Lesson 1, the shape of the distribution is enough to know if the mean is larger, smaller or the same as the median. 

Part (d)
You do know the sample size, n (the total count in the Frequency column), so you can use the steps I teach in Lesson 1 to find the location of any quartile.  Then just make a running total of the counts in the intervals.  How much data is in the first interval? (The count or frequency as given in the second column.)  Now add the count in the second interval.  For example, if there are 3 scores in the first interval and 7 scores in the second interval, that means there are 3+7=10 scores in total in the first two intervals.  Those must be the 10 lowest scores in the data set.  Continue adding the frequencies in each interval until you reach or exceed the count you are looking for that marks the location of the first, second or third quartile as desired.
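
The running-total idea can be sketched in a few lines of Python.  The counts below are hypothetical (they are not the assignment's data), and the position you are looking for is whatever location you found using the Lesson 1 steps; this sketch only does the adding-up part.

```python
from itertools import accumulate

def interval_containing(counts, position):
    """Return the 1-based number of the interval holding the score at `position`."""
    for interval, running_total in enumerate(accumulate(counts), start=1):
        if running_total >= position:      # reached or exceeded the location
            return interval
    raise ValueError("position is larger than the sample size")

counts = [3, 7, 12, 8]                     # hypothetical Frequency column
n = sum(counts)                            # the sample size, here 30
print(list(accumulate(counts)))            # [3, 10, 22, 30]
print(interval_containing(counts, 8))      # 2
```

In this made-up table the 8th score is in the second interval, since the first interval only holds the 3 lowest scores and the first two intervals together hold the 10 lowest.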
Question 8
Part (a)
They do not want you to attach the JMP output.  But don't forget to discuss and compare the shape, centre and spread, using whichever measures of shape, centre and spread are easily identified from the boxplots.

If you do not have JMP, or don't want to go through the effort of doing it in JMP, you can just as easily do this question by hand.  Just do the five-number summaries for each data set and make your own side-by-side boxplots.

To make the side-by-side boxplots in JMP:

Open a "New Data Table" in JMP.

You will make two columns, but not the way you might think. DO NOT put Consumer in one column and Technology in another!

Double-click Column 1 (or right-click and select Column Info) and name it Return.  Type all 13 scores from the Consumer data first, then enter the 17 scores from the Technology data, giving you a total of 30 rows in the first column.

Double-click the region to the right of Column 1 at the top to create Column 2 and name that column Sector.  Type Consumer in the first 13 rows of that column (better yet, type it once, then copy and paste it into the next 12 rows; that way you ensure it is typed exactly the same in all 13 rows as is necessary).  Then type (or copy and paste) Technology in the remaining rows of column 2.

If you have done this correctly, you should now have two columns.  The first column shows the Return of all 30 stocks you were given.  The second column shows the Sector each of those 30 stocks came from (Consumer or Technology).

Make sure the column properties are correct!  Right-click at the top of each column and select Column Info to check what it says for Data Type and Modeling Type.

For Return, the Data Type should be Numeric and the Modeling Type should be Continuous.  If it is not, click the drop-down lists to change them.

For Sector, the Data Type should be Character and the Modeling Type should be Nominal.  If it is not, click the drop-down lists to change them.

To make the side-by-side boxplots:
Select "Analyze" then "Fit Y By X".  Highlight Return and click "Y, Response".  Highlight Sector and click "X, Factor".  Click OK.  This should open a pop-up window with a bunch of dots arranged vertically in two columns on a graph for Consumer and Technology.

If that does not happen, you do not have the correct Data Type and Modeling Type for your data!  Follow my instructions above to fix your column properties.

Now click the red triangle next to Oneway Analysis ... and select Quantiles.  Your side-by-side box plots should appear on the graph as well as a Quantiles output below that shows you the five-number summary among other things.  Click the red triangle again and select "Display Options" (down near the bottom of the menu), then deselect Grand Mean to get rid of the horizontal line in the graph showing the mean of all the scores (although, you are not asked to remove the Grand Mean line, so you don't have to if you don't want to). 

Parts (b) and (c)
Think about why an investor would buy a stock.  What if they were a risk-taker and wanted to make lots of money (even at the risk of losing a lot)?  What if they were risk-averse and wanted to have a reasonable return but were especially concerned about not losing money?
Question 9
Part (a)
Do not use JMP for the stemplot in part (a).  You can just type the stemplot directly into the text box they provide.  Note that you are told to trim the leaves.  That means that you cut away the last digit (don't round off, just cut it off as though it was never there in the first place).  For example, 539 would be trimmed to 53, not rounded off to 54.

I suggest you make the split stemplot on paper first, then transfer it to the box.  Use the vertical line on your computer keyboard to separate the stem from the leaves ("SHIFT \" will give you " | ").  Don't worry if your columns don't end up perfectly lined up, just do the best you can.  Be sure to label the first line in your stemplot "Stem | Leaf", then enter all the stems and leaves row-by-row underneath.  It will be pretty difficult to make the data line up nicely when you type the stemplot into the textbox, but I don't think you should waste much time trying to make it look pretty.
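
If you want to check your paper stemplot, here is a rough Python sketch of trimming and splitting.  The data values are hypothetical, the function names are my own, and it assumes positive whole numbers and the usual split (leaves 0 to 4 on the first row of each stem, leaves 5 to 9 on the second row).

```python
from collections import defaultdict

def trim(value):
    """Cut away the last digit without rounding: 539 becomes 53."""
    return value // 10

def split_stemplot(values):
    """Return the text lines of a split stemplot of the trimmed values."""
    rows = defaultdict(list)
    for trimmed in sorted(trim(v) for v in values):
        stem, leaf = divmod(trimmed, 10)
        half = 0 if leaf <= 4 else 1       # first or second row of this stem
        rows[(stem, half)].append(leaf)
    (stem, half), last = min(rows), max(rows)
    lines = ["Stem | Leaf"]
    while (stem, half) <= last:            # keep empty rows so no stem is skipped
        leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
        lines.append(f"{stem:>4} | {leaves}")
        stem, half = (stem, 1) if half == 0 else (stem + 1, 0)
    return lines

data = [539, 512, 548, 561, 577, 603]      # hypothetical scores
print("\n".join(split_stemplot(data)))
# Stem | Leaf
#    5 | 134
#    5 | 67
#    6 | 0
```

Notice that 539 shows up as stem 5, leaf 3: the 9 was cut off, not rounded.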

Part (b)
Although you should have a pretty good idea what the shape is after you have made the split stemplot, wait until you have made your histogram below and use it to confirm your opinion.  Both graphs should tell you the same thing about the shape of the distribution (although one may do a better job of revealing it than the other).  Then again, maybe you won't even bother to make the histogram since you aren't uploading it anyway.

Part (c)
Even though you are not uploading the histogram, don't miss the question you have also been asked to discuss:

Would it be more appropriate to summarize this data distribution with the five-number summary or with the mean and standard deviation?  Explain.

Note that you already know the answer to this question from the split stemplot.  As I discuss in Lesson 1, this is really asking you whether the distribution is symmetric or skewed (or has outliers).

To make the histogram in part (c):
First, enter the data into JMP manually:

Use the ORIGINAL DATA you were given, not the trimmed data.  There is never any need to trim data for a histogram.

Click the "New Data Table" icon on the toolbar at top left in the JMP home screen (or select "File" in the toolbar, then New, then Data Table).  You are automatically taken to an empty spreadsheet with one column. Double-click "Column 1" and change its name to Energy, or right-click "Column 1" and select "Column Info" and type in the name Energy and click OK. 

Now enter the data you have been given into the column.  Note you can use your arrow buttons or TAB button to move from one cell to the next as you enter your data.


Once you have entered all the data down your column, you are ready to make your histogram.  In the toolbar at the top, select Analyze then select Distribution.  In the "Select Columns" part of the pop-up window, select Energy to highlight it, and click the Y, Columns button.  You should see the column name appear in the section to the right of the "Y, Columns" button.  Click OK.

It now opens yet another pop-up window called "Distributions" where your histogram should appear.  Your histogram is sideways.

If you would like to orient the histogram the traditional way (but they don't insist you do this), click the red triangle next to Energy above the histogram and select Display Options from the drop-down menu.  Select Horizontal Layout to turn the histogram the way we want.
 

If you want to hide all the other parts of the output (but they said you don't have to), click that same red triangle again and deselect "Outlier Box Plot" and anything else that has a check mark next to it.  Click the red triangle again, select "Display Options" and deselect "Quantiles" and "Summary Statistics" to make those parts disappear.  Alternatively, you can make the Quantiles and Summary Statistics disappear if you simply click the gray triangles (to the left of the red triangles) next to their title bars.  Click the gray triangles again to make them reappear.

Part (d)
You may prefer to make the timeplot by hand (just be reasonably neat when setting up your scales).  Personally, I think you could draw a rough sketch of this timeplot much faster than doing it in JMP.  You don't have to upload the graph anyway, so why bother.  Don't forget to answer their question about the overall trend observed.

To make the Time Series in part (d):

First, enter the data into JMP manually: Click the "New Data Table" icon on the toolbar at top left in the JMP home screen (or select "File" in the toolbar, then New, then Data Table).  You are automatically taken to an empty spreadsheet with one column. Double-click "Column 1" and change its name to Monthly Energy, or right-click "Column 1" and select "Column Info" and type in the name Monthly Energy and click OK. 

Enter all of the energy scores down the Monthly Energy column (as given in the second row of the data table they provide).  Make sure that you are entering the data in the order you have been given.

DO NOT EVEN TRY TO ENTER THE DATES (like Jan 2011) in your data table!  That entire row of information is irrelevant to what we will do, and is an absolute pain to deal with.

If you have done things correctly, you will have a data table with just one column, Monthly Energy, and that column has all the given energy scores in order from Jan 2011 to Dec 2012 even though there has been absolutely no mention of those dates in your data table.

You are now ready to make the time series.  Select Analyze in the toolbar, then select Modeling in the drop-down list and finally select Time Series.  Select the variable you are tracking, Monthly Energy, and click "Y, Time Series".  Click OK.  Just ignore that other pop-up menu asking about time lags or autocorrelations or whatever; click OK and move on.  None of that has anything to do with the time series.

You should now be looking at your Time Series with "Row" on the horizontal axis and Monthly Energy on the vertical axis.  Click the red triangle next to "Time Series Monthly Energy" and deselect "Autocorrelation" and "Partial Autocorrelation" to remove those parts of the output.  Click the red triangle again, select "Graph" then deselect "Mean Line".  That removes the horizontal line in your time series showing the mean energy score.

Question 10
Part (a)
Make sure you compute the five-number summary by hand, as I demonstrate in Lesson 1, question 4.
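
Once you have done it by hand, you could check your work with a short Python sketch like the one below.  Be careful: it assumes the quartiles are the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when n is odd.  If the steps in Lesson 1 differ from that assumption, trust the lesson, not this sketch.  The data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                              # scores below the median's location
    upper = data[half + 1:] if n % 2 else data[half:]  # skip the median itself when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))    # (1, 3, 7, 11, 13)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))   # (1, 2.5, 4.5, 6.5, 8)
```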

Part (b)
They want you to make the standard boxplot, then comment on the shape you see.  However, don't even think about whether there are outliers or not!  They want you to assume there were no outliers at all (they don't even ask you to think about outliers until the next part).  In other words, do the whiskers make the distribution appear skewed?

Part (c)
Now, they want you to use the 1.5 IQR Rule to identify the cut-offs for outliers. 

Part (d)
Now, count your outliers according to the limits found in part (c).
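
Parts (c) and (d) boil down to two small steps, sketched here in Python.  The data and the quartiles (Q1 = 12, Q3 = 19) are hypothetical, and the function names are just for illustration; plug in the quartiles you found in part (a).

```python
def outlier_fences(q1, q3):
    """1.5 IQR Rule: cut-offs are Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the scores that fall below the lower cut-off or above the upper one."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

data = [2, 11, 12, 14, 15, 16, 18, 19, 21, 40]   # hypothetical scores
print(outlier_fences(12, 19))                     # (1.5, 29.5)
print(find_outliers(data, 12, 19))                # [40]
```

Here the IQR is 19 - 12 = 7, so the cut-offs sit 10.5 below Q1 and 10.5 above Q3.  The score 2 looks low but is still inside the lower cut-off of 1.5, so only 40 counts as an outlier.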

Part (e)
Now, they want you to make the outlier boxplot which will almost certainly cause you to change your opinion about skewness.  They are trying to show you how it is important to identify outliers first before you comment on the shape of a distribution.
Question 11
Parts (a) and (b)
This question should be done by hand (i.e. with your calculator, not with JMP).  Use the Stat Mode on your calculator to compute the Mean and Standard Deviation.  Don't you dare waste your time using the formulas to compute the mean and standard deviation.  That is what your Stat Mode on your calculator is for!

Check the Appendix at the back of my book to learn how to use the Stat Mode on your calculator.  Here is a link to a digital copy of that appendix:

Make sure you round the answers off to 4 decimal places before proceeding to answer the other parts of the question.  Always use four decimal places throughout this course unless specifically instructed to do otherwise.
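
If you want to confirm what your calculator's Stat Mode tells you, Python's built-in statistics module does the same job.  The call lengths below are hypothetical, and this sketch assumes you want the sample standard deviation (the one that divides by n - 1).

```python
from statistics import mean, stdev

calls = [3, 5, 5, 6, 7, 8, 8]          # hypothetical call lengths

call_mean = mean(calls)                # sum of the scores divided by n
call_sd = stdev(calls)                 # sample standard deviation (divides by n - 1)

# Report both to four decimal places, as required throughout the course.
print(f"{call_mean:.4f} {call_sd:.4f}")   # 6.0000 1.8257
```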

Part (c)
Consider this:  Let's say you are taking a course, and your average mark so far is 65.  What will happen to your average if you score higher on the next test?  What if you score lower on the next test?  What would you have to get on the next test to keep your average 65?

Therefore, knowing what the mean was in part (a), what must the length of the 8th call be?

Part (d)
There is no need to compute the standard deviation of these 8 scores!  Having decided what that new score must be in part (c), how much does that score deviate from the mean? If that is a larger deviation than the standard deviation you computed earlier, you have increased the overall standard deviation; if it is the same amount of deviation as earlier, you have not changed the standard deviation at all; if it has a smaller deviation, you have decreased your overall standard deviation. 

The closer a value is to the mean, the smaller its deviation from the mean.  Small deviations cause low standard deviations; large deviations cause high standard deviations.
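
To test your reasoning for parts (c) and (d), you could try adding a few candidate values to a data set and watch what happens to the mean and standard deviation.  This is only an exploratory sketch: the data and the candidate values are hypothetical, and it again assumes the sample standard deviation.

```python
from statistics import mean, stdev

def with_extra(values, new_value):
    """Return (mean, sample standard deviation) after adding one more score."""
    extended = list(values) + [new_value]
    return mean(extended), stdev(extended)

calls = [3, 5, 5, 6, 7, 8, 8]          # hypothetical data with a mean of 6
print(f"original: mean={mean(calls):.4f}, sd={stdev(calls):.4f}")

for candidate in (2, 6, 12):           # one low, one in the middle, one high
    m, s = with_extra(calls, candidate)
    print(f"add {candidate}: mean={m:.4f}, sd={s:.4f}")
```

Compare each new line with the original line and ask yourself why the mean and standard deviation moved (or didn't move) the way they did.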
Question 12
First, you need to know the scores attached to each letter grade.  An A+ is 4.5, A is 4, B+ is 3.5, B is 3, etc.

To compute your grade point average:
First, make a new column where you multiply each grade score by the number of credit hours.  For example, if you got a B+ in a 3 credit-hour course, you would multiply 3.5 by 3 to get 10.5 in this new column.  Find the total of this new column and find the total number of credit hours.  Divide the total of the new column by the total number of credit hours to get the GPA.  Put another way, if you got a B+ in a 3 credit-hour course, it is as though you scored 3.5 three separate times.  You could put your calculator in Stat Mode, and enter 3.5 three separate times.  If you got an A in a 6 credit-hour course, you got 4.0 six times.  Enter 4.0 six separate times.  After you have entered all the data, your calculator will tell you the mean (your GPA).

An easy way to think of grade points is to consider the number of credit hours as the frequency of that grade.  Getting an A in a 3 credit-hour course is like getting an A 3 separate times.  Getting a C in a 6 credit-hour course is like getting a C 6 separate times.  It is like finding the average of three A's and six C's.  The credit hours add weight to each score.
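
Here is a minimal Python sketch showing that both ways of thinking give the same GPA.  It uses the B+ in a 3 credit-hour course and the A in a 6 credit-hour course from the example above; the function name gpa and the course list are made up for illustration, and the table only includes the grade scores listed earlier.

```python
from statistics import mean

# Only the grade scores listed above; add the rest of the scale as needed.
GRADE_POINTS = {"A+": 4.5, "A": 4.0, "B+": 3.5, "B": 3.0}

def gpa(courses):
    """Weighted mean: total of (grade score x credit hours) / total credit hours."""
    total_points = sum(GRADE_POINTS[grade] * hours for grade, hours in courses)
    total_hours = sum(hours for _, hours in courses)
    return total_points / total_hours

courses = [("B+", 3), ("A", 6)]        # (letter grade, credit hours)
print(round(gpa(courses), 4))          # 3.8333

# Same thing, treating credit hours as frequencies: 3.5 three times, 4.0 six times.
print(round(mean([3.5] * 3 + [4.0] * 6), 4))   # 3.8333
```

The new column here is 10.5 and 24, which totals 34.5, and 34.5 divided by 9 credit hours is 3.8333.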