Stat 1000: Tips for Assignment 1 (do these questions whether you have to hand them in or not!)

Published: Wed, 09/20/17

Try a Free Sample of Grant's Book and Audio Lectures
Don't have my book or audio?  You can download a free sample of my book and audio lectures containing all of Lesson 1:
Did you read my tips on how to study and learn this course?  If not, here is a link to those important suggestions:
Tips for Distance Assignment 1
Here is a link to the actual assignment, for those of you who don't have it:
Study Lesson 1: Displaying and Summarizing Data in my book, if you have it, to prepare for this assignment.

Of course, always seek out assistance from my book, your course notes, etc. if you ever hit a question you don't understand, but try not to be learning things as you do an assignment.  Learn first, then put your learning to the test.
Questions 1 to 3
This is a standard question about classifying variables, similar to my Lesson 1, #1.
Question 4
Remember: You always put your variable on the horizontal axis of a histogram, and you put the Frequency (Count) or Relative Frequency on the vertical axis.
Question 5
Be careful.  You are asked for the percentage of fares.  Count how many fares fit the condition and divide by n, the total sample size.  What is n here? How do we find it?
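Here is a minimal sketch of that counting step. It assumes the fares are summarized as bar counts in a histogram; the intervals, counts, and the condition (fares of at least $20) are all invented for illustration and are not the assignment's data.

```python
# Hypothetical histogram bars: (lower, upper) fare interval -> count.
# These numbers are invented for illustration only.
fare_counts = {
    (0, 10): 6,
    (10, 20): 9,
    (20, 30): 4,
    (30, 40): 1,
}

n = sum(fare_counts.values())  # total sample size = sum of all the bar heights

# Made-up condition: fares of at least $20.
count = sum(c for (lower, upper), c in fare_counts.items() if lower >= 20)

percentage = 100 * count / n
print(f"{count} of {n} fares = {percentage:.0f}%")
```

The point is only that n comes from adding up every count, not just the counts that meet the condition.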
Questions 6 to 8
This should be pretty straightforward.
Question 9
Look at my Lesson 1, #19 and #20 for some logic puzzle examples about mean.
Question 10
This is an example of weighted mean. Unfortunately, I do not show examples of this in my book.  However, their Unit 1 Practice Questions 18-24 illustrate the weighted mean formula in action. This particular question has thrown in a twist because it is making you work backwards.
  1. Multiply the midterm score by its weight (.25), the assignment score by its weight (.15), and the project score by its weight (.10). You now know the total mark the student has earned so far.
  2. So, what weighted mark must the student get on the final to bring their total up to 75 (a B)?
  3. Remember, the final is worth 50%, so the weighted mark needed is like a mark out of 50 on the exam.
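The three steps above can be sketched as follows. The weights and the target of 75 come from the tip; the three scores are made up, so substitute the ones given in the question. The variable names are my own.

```python
# Weights and target are from the tip above; the three scores are hypothetical.
weights = {"midterm": 0.25, "assignment": 0.15, "project": 0.10, "final": 0.50}
scores = {"midterm": 70, "assignment": 80, "project": 90}  # made-up percentages

target = 75  # overall mark needed for a B

# Step 1: total weighted mark earned so far.
earned = sum(weights[name] * scores[name] for name in scores)

# Step 2: weighted mark still needed from the final (a mark out of 50).
needed_weighted = target - earned

# Step 3: convert that mark out of 50 into a percentage on the final.
needed_final_percent = needed_weighted / weights["final"]

print(f"earned so far: {earned:.1f}")
print(f"needed from the final: {needed_weighted:.1f} out of 50")
print(f"final exam score needed: {needed_final_percent:.1f}%")
```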
Question 11
Visualize what the histogram would look like.  Remember: Medians are resistant to skewness and outliers.  Means are pulled away from the median in the direction of skew or outliers.
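If you want to see this with numbers, here is a small illustration using a made-up data set with one large value in the right tail (not the assignment's data).

```python
import statistics

# Hypothetical right-skewed data: most scores are small, one is very large.
data = [2, 3, 3, 4, 4, 5, 5, 6, 30]

mean = statistics.mean(data)
median = statistics.median(data)

print(f"mean = {mean:.2f}, median = {median}")

# The single large score drags the mean to the right of the median.
print("mean is greater than median:", mean > median)
```

Replace the 30 with a 7 and the mean drops to sit much closer to the median, while the median does not move at all.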
Question 12
I show you how to find medians, quartiles, interquartile range, etc. for the first time in Lesson 1, #4.
Question 13
Remember, if you find the total of the second column (the frequency or count column) in a frequency table, that will tell you n, the sample size.

You do know the sample size, n (the total count in the Frequency column), so you can use the steps I teach in Lesson 1 to find the location of any quartile.  Then just make a running total of the counts in the intervals.  How much data is in the first interval? (The count or frequency as given in the second column.)  Now add the count in the second interval.  For example, if there are 3 scores in the first interval and 7 scores in the second interval, there are 3+7=10 scores in total in the first two intervals.  Those must be the 10 lowest scores in the data set.  Continue adding the frequencies in each interval until you reach or exceed the count you are looking for that marks the location of the first, second or third quartile as desired.

Never forget that the first column is your variable.  That is what you are analyzing.  The median is a duration, and the first and third quartiles are durations.  You can't possibly know the exact duration, because you haven't been shown the actual data set.  But you can tell which interval any particular score must have been in.  For example, we know the 5 lowest scores have a 0-25 duration because that interval has a frequency of 5.  Then the next 13 lowest scores have a 25-50 duration.  Etc.  We can also say the highest score has a 250-275 duration (because that interval has a frequency of 1; there is only one score in that interval).  The second highest score has a duration of 225-250.  The next 3 highest scores have a 200-225 duration.  Etc.
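The running-total idea can be sketched like this. The frequency table is invented (it is not the one in the assignment), and the helper name interval_containing is my own. The median location is taken as (n + 1)/2, which lands on a whole number here because n is odd; use whatever location steps you learned for the quartiles.

```python
from itertools import accumulate

# Hypothetical frequency table (interval label, count); not the assignment's data.
table = [("0-10", 3), ("10-20", 7), ("20-30", 12), ("30-40", 6), ("40-50", 3)]

def interval_containing(position, table):
    """Return the interval holding the score at the given position
    (1 = lowest score), found by keeping a running total of the counts."""
    running_totals = accumulate(count for _, count in table)
    for (label, _), total in zip(table, running_totals):
        if total >= position:  # running total has reached or passed the position
            return label
    raise ValueError("position exceeds the sample size")

n = sum(count for _, count in table)   # sample size = total of the counts
median_position = (n + 1) // 2         # whole number here because n is odd

print("n =", n)
print("median is score number", median_position)
print("median lies in interval", interval_containing(median_position, table))
```

Notice the answer is an interval of the variable, not a count.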
Questions 14 to 16
This should be pretty straightforward.  Note when they say "The distribution of American cars is most variable," they are saying it varies the most.  It has the greatest spread.  The greater the spread, the greater the variety of scores, the greater the variability.  Is that a true statement here?
Question 17
Make sure you compute the five-number summary by hand, as I demonstrate in Lesson 1, #4, to prepare for this question.  Now, they want you to use the 1.5 IQR Rule to identify the outliers.
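As a rough sketch of the 1.5 IQR Rule on a made-up data set: the quartiles below were found by hand as the medians of the lower and upper halves, which is one common convention and an assumption on my part, so use the method from your lesson if it differs. The function name iqr_fences is hypothetical.

```python
def iqr_fences(q1, q3):
    """Return the lower and upper limits from the 1.5 IQR Rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical sorted data; not the assignment's data.
data = [12, 15, 16, 18, 19, 20, 21, 23, 24, 45]

# Quartiles found by hand (assumed convention: medians of each half).
q1, q3 = 16, 23

lower_fence, upper_fence = iqr_fences(q1, q3)

# Any score beyond either limit is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("limits:", lower_fence, upper_fence)
print("outliers:", outliers)
```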
Question 18
Be clear what this question is asking. It is not asking for the limits you would compute from your 1.5 IQR Rule. It is asking where the whiskers will be drawn. The whiskers in a modified or outlier boxplot are drawn to the lowest and highest data scores that are NOT outliers.
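To make the distinction concrete, here is a small sketch with made-up data and hand-computed quartiles (assumed, not the assignment's values). It computes the 1.5 IQR limits and then, separately, the whisker ends.

```python
# Hypothetical sorted data and hand-computed quartiles; not the assignment's data.
data = [12, 15, 16, 18, 19, 20, 21, 23, 24, 45]
q1, q3 = 16, 23

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # limit from the 1.5 IQR Rule
upper_fence = q3 + 1.5 * iqr   # limit from the 1.5 IQR Rule

# Keep only the scores that are NOT outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]

# Whiskers end at actual data scores, not at the limits themselves.
lower_whisker, upper_whisker = min(inside), max(inside)

print("limits:  ", lower_fence, upper_fence)
print("whiskers:", lower_whisker, upper_whisker)
```

In this example the limits and the whisker ends are different numbers, which is exactly the trap the question is setting.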
Question 19
I show you how to compute the standard deviation by hand in my Lesson 1, #6.  But you don't have to do that here.  You are welcome to just use the Stat Mode on your calculator to compute s, as I show in Appendix A of my book.  Never compute standard deviation by the formula!  It takes too long, and is too messy.  Understand the formula, but nobody cares how you compute it.  It is multiple choice on the exam!
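If it helps you understand the formula (not to replace your calculator), here is a short sketch that walks through the sample standard deviation on a made-up data set and checks it against Python's built-in statistics.stdev.

```python
import math
import statistics

# Hypothetical data; not the assignment's data.
data = [2, 4, 4, 5, 7, 8]

n = len(data)
mean = sum(data) / n

# Deviation of each score from the mean, then squared.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides the total by n - 1; s is its square root.
variance = sum(squared_deviations) / (n - 1)
s = math.sqrt(variance)

print(f"mean = {mean}, variance = {variance:.2f}, s = {s:.4f}")
print(f"statistics.stdev gives {statistics.stdev(data):.4f}")
```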
Question 20
This is a toughy.  It is really testing your understanding of standard deviation.  There is no calculation required here.  Remember, standard deviation is a measure of spread and indicates whether data tends to be close to the centre or far away from the centre.

Note that both of those Data Sets are symmetric.  That guarantees that the mean (and median) must be smack dab in the centre of each histogram.  So both data sets have the exact same mean.  How much does each bar deviate from the mean?  If a bar is close to the centre, the data in that bar will have small deviations from the mean.  If a bar is far away from the centre (in either direction), the data in that bar will have large deviations from the mean.  The larger the deviations, the larger the squared deviations, too.  Which data set will have the larger total of squared deviations?  That data set would have the larger variance, and therefore also have the larger standard deviation.

Essentially, if a data set has a large standard deviation, that would suggest most of the data tends to be far away from the centre.  If a data set has a small standard deviation, that would suggest most of the data would tend to be close to the centre.  Which data set has more data close to the centre?  Which data set has more data far away from the centre?
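Although the question needs no calculation, you can convince yourself of the logic with two invented symmetric histograms (these are not the assignment's Data Sets). Both have the same mean, but one piles its data near the centre and the other piles it at the ends.

```python
import statistics

# Hypothetical histograms as {value: count}; both are symmetric about 3.
centre_heavy = {1: 1, 2: 2, 3: 4, 4: 2, 5: 1}
ends_heavy = {1: 3, 2: 1, 3: 2, 4: 1, 5: 3}

def expand(histogram):
    """Turn {value: count} into the full list of scores."""
    return [value for value, count in histogram.items() for _ in range(count)]

def total_squared_deviations(scores):
    """Add up the squared deviations of every score from the mean."""
    mean = statistics.mean(scores)
    return sum((x - mean) ** 2 for x in scores)

a = expand(centre_heavy)
b = expand(ends_heavy)

for label, scores in (("centre-heavy", a), ("ends-heavy", b)):
    print(label,
          "mean =", statistics.mean(scores),
          "total squared deviations =", total_squared_deviations(scores),
          "s =", round(statistics.stdev(scores), 3))
```

Same mean, same sample size, but the set with more data far from the centre ends up with the larger total of squared deviations and so the larger standard deviation.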
Question 21
Another logic puzzle.  Note that the two scores they are changing are the Minimum and the Maximum scores.  How does that affect the order of all the scores?  How does that affect the spread of all the scores?  How does that affect the total of all the scores?
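To experiment with those three questions, here is a sketch using made-up scores and a made-up change (minimum lowered by 10, maximum raised by 5). The actual question may change the scores differently, so treat this only as a way to test your reasoning.

```python
import statistics

# Hypothetical scores; not the assignment's data.
scores = [50, 60, 65, 70, 75, 80, 90]

# Made-up change: minimum 50 -> 40, maximum 90 -> 95.
changed = sorted([40] + scores[1:-1] + [95])

for label, data in (("original", scores), ("changed", changed)):
    print(label,
          "median =", statistics.median(data),
          "total =", sum(data),
          "range =", max(data) - min(data))
```

Compare the two printed lines and ask which measures depend on the order of the scores, which on the total, and which on the spread.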