• Base of statistics
• Central limit theorem
• Testing of hypothesis
• Time series analysis
• Decision theory
• Probability and distribution
• Normal distribution
• Design of experiment
• Correlation and regression analysis
• Statistical quality control
• Index number
• Vital statistics

# Variable and its type

Example: The paragraphs below contain blanks which need to be filled in with an appropriate word. Write the correct word next to each number in the list which follows the paragraphs.

An important characteristic of the mean is that it is the balance point of a distribution, that is, it is the point around which all the deviations sum to zero.  An advantage of the median is that it is insensitive to extreme scores.  The mode is simply the most frequently occurring score.

Sample size is the number of subjects or cases.  Sometimes you are given a percentage rather than the number of cases.  If you have a percentage, you find the number of cases by multiplying the percentage times the population size.

The summarized results, or descriptive measures, for a population are called parameters, while these same types of measures for a sample are called statistics.

Research can be either applied or fundamental (basic).  In the first type treatments are given and the subjects’ response is noted.  In the second type, we simply observe to determine the status of what exists.

Based on a theory or a hunch, a scientist must develop a research question and then plan the data collection, analysis, and interpretation needed to answer that question.

Example: What Is a Variable?

A variable is something that can be changed, such as a characteristic or value. Variables are generally used in psychology experiments to determine if changes to one thing result in changes to another.

Example. Name, define, and give an example of each of the four levels of measurement.

Ans: There are four levels of measurement.

• Nominal
• Ordinal
• Interval
• Ratio

In nominal measurement the numerical values just “name” the attribute uniquely. No ordering of the cases is implied. For example, jersey numbers in basketball are measures at the nominal level. A player with number 30 is not more of anything than a player with number 15, and is certainly not twice whatever number 15 is.

In ordinal measurement the attributes can be rank-ordered. Here, distances between attributes do not have any meaning. For example, on a survey you might code Educational Attainment as 0=less than H.S.; 1=some H.S.; 2=H.S. degree; 3=some college; 4=college degree; 5=post college. In this measure, higher numbers mean more education. But the distance from 0 to 1 is not the same as the distance from 3 to 4. The interval between values is not interpretable in an ordinal measure.

In interval measurement the distance between attributes does have meaning. For example, when we measure temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between values is interpretable. Because of this, it makes sense to compute an average of an interval variable, where it doesn’t make sense to do so for ordinal scales. But note that in interval measurement ratios don’t make any sense: 80 degrees is not twice as hot as 40 degrees (although the attribute value is twice as large).

In ratio measurement there is always an absolute zero that is meaningful. This means that you can construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied social research most “count” variables are ratio, for example, the number of clients in the past six months, because you can have zero clients.

Example: Describe, in your own words, the difference between a positively skewed, and a negatively skewed curve.

• When the variable is skewed to the left (i.e., negatively skewed), the mean shifts to the left the most, the median shifts to the left the second most, and the mode is the least affected by the presence of skew in the data.

• Therefore, when the data are negatively skewed, this happens: Mean < median < mode.

• When the variable is skewed to the right (i.e., positively skewed), the mean is shifted to the right the most, the median is shifted to the right the second most, and the mode is the least affected.

• Therefore, when the data are positively skewed, this happens: Mean > median > mode.

Example. Briefly describe how you would go about deciding which of the measures of central tendency is the appropriate choice for your data.

Ans:

The mean is ordinarily the preferred measure of central tendency. The mean is the arithmetic average of a distribution. The mean presented along with the variance and the standard deviation is the “best” measure of central tendency for continuous data.

There are some situations in which the mean is not the “best” measure of central tendency. In certain situations, the median is the preferred measure. These situations are as follows:

• when you know that a distribution is skewed
• when you believe that a distribution might be skewed
• when you have a small number of subjects

The purpose for reporting the median in these situations is to combat the effect of outliers. Outliers affect the distribution because they are extreme scores. For example, in a distribution of people’s income, a person who has an income of over a million dollars would dramatically increase the mean income whereas in reality, most of the people in the distribution do not make that kind of money. In this case, the median is the preferred measure of central tendency.

The mode is rarely chosen as the preferred measure of central tendency. The mode is not usually used because the largest frequency of scores might not be at the center. The only situation in which the mode may be preferred over the other two measures of central tendency is when describing discrete categorical data. The mode is preferred in this situation because the greatest frequency of responses is important for describing categorical data.

| Type of Variable | Best measure of central tendency |
|---|---|
| Nominal | Mode |
| Ordinal | Median |
| Interval/Ratio (not skewed) | Mean |
| Interval/Ratio (skewed) | Median |

Example.  Write a paragraph that explains independent and dependent variables.

Independent variable:

The independent variable is the variable that is manipulated by the researcher.  The independent variable is something that is hypothesized to influence the dependent variable.  The researcher determines for the participant what level or condition of the independent variable that the participant in the experiment receives.  For example, each participant in the experiment may be randomly assigned to either an experimental condition or the control condition.

Dependent variable:

The dependent variable is the variable that is simply measured by the researcher.  It is the variable that reflects the influence of the independent variable.  For example, the dependent variable would be the variable that is influenced by being randomly assigned to either an experimental condition or a control condition.

Example of independent variable and dependent variable:

Consider that we wish to know whether listening to music would increase productivity in the workplace.  We randomly assign each participant in this experiment to either an experimental condition or a control condition.  In the experimental condition, participants listen to music while they work.  In the control condition, the participants do not listen to music while they work.  In this example, listening to music vs. not listening to music is the independent variable.  The dependent variable in this example is productivity.

Example:  The following is a collection of raw data.  Find the mean, median and mode of the data. Use the data below for number of pets of 24 individuals:

1          2          2          1          1          0          2          3          2          1          3          0

1          1          2          0          1          0          1          6          3          4          3          1

a. Find the mean of the data.

b. Find the median of the data.

c. Find the mode of the data.

Make a frequency distribution for the data above.

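Statistics (VAR00001):

| Statistic | Value |
|---|---|
| N (Valid) | 24 |
| N (Missing) | 0 |
| Mean | 1.7083 |
| Median | 1.0000 |
| Mode | 1.00 |

Frequency distribution:

| VAR00001 | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| .00 | 4 | 16.7 | 16.7 | 16.7 |
| 1.00 | 9 | 37.5 | 37.5 | 54.2 |
| 2.00 | 5 | 20.8 | 20.8 | 75.0 |
| 3.00 | 4 | 16.7 | 16.7 | 91.7 |
| 4.00 | 1 | 4.2 | 4.2 | 95.8 |
| 6.00 | 1 | 4.2 | 4.2 | 100.0 |
| Total | 24 | 100.0 | 100.0 | |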
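The statistics and frequency distribution for the pet data can be cross-checked with a short script. This is a minimal sketch using Python's standard library; the variable names (`pets`, `freq_table`) are illustrative and not part of the original exercise.

```python
from collections import Counter
from statistics import mean, median, mode

# Number of pets reported by the 24 individuals (copied from the raw data)
pets = [1, 2, 2, 1, 1, 0, 2, 3, 2, 1, 3, 0,
        1, 1, 2, 0, 1, 0, 1, 6, 3, 4, 3, 1]

pets_mean = mean(pets)
pets_median = median(pets)
pets_mode = mode(pets)

# Frequency distribution: (value, count, percent, cumulative percent)
counts = Counter(pets)
freq_table = []
cumulative = 0.0
for value in sorted(counts):
    percent = 100 * counts[value] / len(pets)
    cumulative += percent
    freq_table.append((value, counts[value], round(percent, 1), round(cumulative, 1)))

print(f"mean={pets_mean:.4f} median={pets_median} mode={pets_mode}")
for row in freq_table:
    print(row)
```

Each printed row corresponds to one row of the frequency table: value, frequency, percent, and cumulative percent.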

Example Use the table of hypothetical data below to answer the questions in this section.

Teacher records were reviewed for students who planned to study health sciences at a well known college of health sciences.

| Anticipated program of study | Per cent of students who have failed to be accepted into the program | Per cent of students who have repeated classes before acceptance | Per cent of students who have repeated classes within the program | Per cent of students who dropped out of the programs |
|---|---|---|---|---|
| Nursing (N = 225) | 25 (56.25 ≈ 56) | 8 (18) | 10 | 20 (45) |
| Radiography (N = 90) | 9 | 5 (4.5 ≈ 4) | 15 | 15 (13.5 ≈ 13) |
| Sonography (N = 75) | 5 | 2 (1.5 ≈ 1) | 5 | 12 (9) |
| Nuclear medicine (N = 55) | 10 | 10 (5.5 ≈ 5) | 6 | 10 (5.5 ≈ 5) |
| Occupational Therapy Assistant (N = 40) | 5 (2) | 10 (4) | 5 (2) | 15 (6) |
| Total (N = 485) | | | | (78) |

Numbers in parentheses are the student counts worked out from the percentages.

1. Why would the information be important to the school?

This information is important to the school because, on the basis of these data, it can make decisions about course selection, anticipate course dropouts, and predict results.

2. What would you like to know about how students were chosen for this study?

We would like to know the students’ interest in their programs and to have information about the remaining students.

3. How many OTA students had to repeat classes before being accepted? 4

4. How many students who planned on going into nursing were not accepted? 56

5. What was the total number of students involved in the study? 485

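6.  What percentage of all students did not get accepted into their chosen program?  (Be careful with this one.) About 15.6%. The program percentages must be weighted by program size: (56.25 + 8.1 + 3.75 + 5.5 + 2) / 485 = 75.6 / 485 ≈ 15.6%. Simply adding the five percentages (which gives 54%) is not correct.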

7.  What appears to be the biggest area of concern for sonography program? Support your answer.

The dropout rate appears to be the biggest area of concern for the sonography program: at 12% (about 9 of 75 students), it is the highest of the four percentages reported for that program.

8.  What percentage of students who applied for OTA was accepted? 95% (5% failed to be accepted).

9.  How many students in the college had to retake programs before they were accepted into their chosen program? 32

10.  How many students dropped out of their programs?  What percentage of students is this?   78 and 16.08== 16%
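The "be careful" warning in question 6 is about weighting: an overall percentage has to be computed from student counts, not by adding the program percentages. The sketch below illustrates this under the assumption that the counts are left unrounded; the dictionary layout and the name `overall_percent` are illustrative only.

```python
# Enrollment (N) and percentage not accepted for each program, from the table
programs = {
    "Nursing": {"n": 225, "not_accepted": 25},
    "Radiography": {"n": 90, "not_accepted": 9},
    "Sonography": {"n": 75, "not_accepted": 5},
    "Nuclear medicine": {"n": 55, "not_accepted": 10},
    "OTA": {"n": 40, "not_accepted": 5},
}

def overall_percent(column):
    """Weighted overall percentage: total cases divided by total students."""
    total_students = sum(p["n"] for p in programs.values())
    total_cases = sum(p["n"] * p[column] / 100 for p in programs.values())
    return 100 * total_cases / total_students

total_n = sum(p["n"] for p in programs.values())

# Naive (incorrect) approach: add the program percentages together
naive_sum = sum(p["not_accepted"] for p in programs.values())

# Correct approach: weight each percentage by its program size
pct_not_accepted = overall_percent("not_accepted")

print(f"total students: {total_n}")
print(f"naive sum of percentages: {naive_sum}%")
print(f"weighted overall percentage: {pct_not_accepted:.1f}%")
```

Because nursing is by far the largest program, its 25% rate dominates the weighted figure, but the result is still far below the naive sum.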

Example:

The Florida congressional representatives are deliberating a pay raise for themselves and attempting to decide which measure of central tendency would best serve their purposes for an article in the Pensacola News Journal newspaper. Discuss each of the three measures of central tendency and provide a substantial rationale for a specific central tendency measure for the legislature to use for their cases.

The three measures of central tendency are the mean, median and the mode. A mean is adding up the numbers in a data set and then dividing by the number of numbers in that data set. A median is the number that is in the middle or the mean of two numbers in an even numbered data set, when all of the numbers are placed in numerical order in a data set. The mode is the number that occurs most often in a data set. When reporting a mean, median or mode one must also report the range (Bagwell, 2011). The range is the difference between the largest and smallest number.

The congressional representatives should report the mode and the range, as this would be a true representation of what most of them make. The median and the mean would be skewed upward since the representatives would have a base pay range.

Example:

A researcher has obtained information concerning socioeconomic (SES) background of 50 subjects in a survey categorized as low, medium, or high SES. What type of graph is the most appropriate for the researcher to use to display SES information in a descriptive report? Discuss why you selected the type of graph and its benefits over other types of graphs.

A graph or graphic is a visual representation that should be used to increase comprehension of written or verbal content. There are seven main types of graphs: a flow chart, organizational chart, cosmography, pie chart, pictograph, bar graph and line graph (typesorgraphs.com). The most appropriate type of graph the researchers could use would be a pie chart. In the pie chart each subject would be equal to 2% of the chart and each SES background could be represented by a different color and then labeled. A pie chart would work nicely in this situation because we are representing three parts of a whole, that whole being 100% of the subjects. A line graph shows changes over time and would not be appropriate. A bar graph would be an accurate model but would not present the material as a whole but three different entities. The other types of graphs and charts previously mentioned would not be used for reporting data sets.

Example

Create an instrument with at least 10 items containing the four levels of data. The instrument should have at least 3 demographic items and 7 attitudinal items. Identify the level of data used for each item and discuss each of the four types thoroughly.

The four levels of data are nominal, ordinal, interval and ratio (Bagwell, 2011). Nominal data is qualitative and cannot be manipulated mathematically. With nominal data, a number may replace a person’s answer. Ordinal data places things in order without even intervals. Interval data is quantitative with set intervals. Ratio data is similar to interval data but has a true zero (Bagwell, 2011).

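Economics Course

| sub # | age | sex | purp | exp | T_1 | T_2 | mid | T_3 | fin | grd |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | | | | | |
| 2 | | | | | | | | | | |
| 3 | | | | | | | | | | |
| 4 | | | | | | | | | | |
| 5 | | | | | | | | | | |
| 6 | | | | | | | | | | |
| 7 | | | | | | | | | | |
| 8 | | | | | | | | | | |
| 9 | | | | | | | | | | |
| 10 | | | | | | | | | | |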

Subject # – random number, 1-10, assigned to subject

Demographic Data

age – The age of the subject (Ratio)

sex – the gender of the subject; 1 – female, 2 – male (Nominal)

purp – Purpose for taking class; 1 – degree requirement, 2 personal growth (Nominal)

Attitudinal Data

Exp – You had a positive experience? 1 – disagree, 2 – slightly disagree, 3 – neither, 4 – agree, 5 – strongly agree (Bagwell, 2011) (Ordinal)

T_1 – test number 1 (Interval/Ratio)

T_2 – test number 2 (Interval/Ratio)

mid – score of midterm (Interval/Ratio)

T_3 – test number 3 (Interval/Ratio)

fin – Final Exam (Interval/Ratio)

Example

Individuals with “exceptional” IQ’s make up only a small proportion of the normal curve (the left and right tails of the normal curve). Discuss the word “exceptional” used in this context and its relationship to probability and specific properties of the Normal Distribution.

If one were to type “exceptional” into dictionary.com, they would read the definition as “forming an exception or rare instance; unusual; extraordinary”, which is truly the case, statistically speaking, for the tails of the distribution. When we talk about the left and right tails of the normal distribution we are talking about the “outliers”, the values that do not fall within the “normal” middle of the curve. In a normal distribution, about 95% of values fall within two standard deviations of the mean, and roughly 2.28% lie in each tail beyond that point; values in those tails are rare and are therefore considered exceptional. An example of this would be a student with an exceptional IQ.
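The proportions of a normal distribution that lie in the middle and in the tails can be checked numerically. The following is a minimal sketch using only the standard library; the helper name `normal_cdf` and the choice of two standard deviations as the cutoff for "exceptional" are assumptions made for illustration.

```python
import math

def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Proportion of values within two standard deviations of the mean
within_2sd = normal_cdf(2) - normal_cdf(-2)

# Proportion of values more than two standard deviations above the mean
upper_tail = 1 - normal_cdf(2)

print(f"within 2 SD: {within_2sd:.4f}")
print(f"each tail beyond 2 SD: {upper_tail:.4f}")
```

By symmetry the lower tail has the same area as the upper tail, so the two tails together hold roughly 4.6% of the distribution.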

Example:

A CEO of City Bank has just reported to her stockholders that the average (mean) revenue for the bank for 2005 is 3.5 million dollars with a standard deviation of .1 million dollars. She is dismayed that a competitor bank only two blocks away (State Bank) has also reported average revenue for 2005 of 3.5 million dollars with a standard deviation of 1.1 million dollars. The City Bank CEO concludes that there is no difference in the two bank’s revenues for 2005. Is she correct? Why? Discuss thoroughly with appropriate rationale.

After much thought I can conclude that there is a difference between the two banks’ revenues, although not in their means. First I tried to test this by computing a t-ratio, but a mean minus the same mean is zero, so that comparison shows no difference in average revenue. Thinking about it some more, I realized that the standard deviation tells me how far, on average, the values lie from the mean. Although the reported mean revenue for each bank is the same, the standard deviations are very different: City Bank’s standard deviation is 0.1 million, whereas State Bank’s is 1.1 million, eleven times larger. The 0.1 million standard deviation tells me that City Bank’s revenue figures were clustered closely around the 3.5 million mean. The 1.1 million standard deviation tells me that State Bank’s figures were dispersed much further from the mean. In other words, if I could draw a bell curve for each bank, the one with a standard deviation of 0.1 would be much narrower and steeper than the one with a standard deviation of 1.1. So the CEO is correct only about the averages; the two banks differ greatly in the variability of their revenues.

Example

Pediatric dentists say a child’s first dental exam should occur between ages 6 months and 1 year. The ages at first dental exam for a sample of children are shown in the distribution.

| X | f |
|---|---|
| 1 | 9 |
| 2 | 11 |
| 3 | 23 |
| 4 | 16 |
| 5 | 21 |

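• Find the mean age of first dental exam for these children:

Mean = Σfx / Σf = (1×9 + 2×11 + 3×23 + 4×16 + 5×21) / 80 = 269 / 80 = 3.3625 ≈ 3.36

• Find the median:

Here the number of observations is n = Σf = 80 (even), so the median is the average of the 40th and 41st observations in the ordered data. The cumulative frequencies are 9, 20, 43, 59, and 80, so both the 40th and the 41st observations fall at X = 3.

Median = 3

• Find the standard deviation:

Σf(X − mean)² = Σfx² − (Σfx)²/n = 1041 − 269²/80 = 136.4875

Standard deviation = √(136.4875 / 80) ≈ 1.31 (dividing by n − 1 = 79 instead gives the same value to two decimals)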
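The mean, median, and standard deviation of a frequency table can be recomputed directly from the X and f columns. The sketch below works from the dental-exam table; the variable names are illustrative, and both the population (divide by n) and sample (divide by n − 1) forms of the standard deviation are shown because the exercise does not say which is intended.

```python
import math

# Age at first dental exam (X) and number of children (f), from the table
ages = [1, 2, 3, 4, 5]
freqs = [9, 11, 23, 16, 21]

n = sum(freqs)
mean_age = sum(x * f for x, f in zip(ages, freqs)) / n

# Expand the table into individual ordered observations to locate the median
observations = [x for x, f in zip(ages, freqs) for _ in range(f)]
mid = n // 2
if n % 2 == 0:
    median_age = (observations[mid - 1] + observations[mid]) / 2
else:
    median_age = observations[mid]

# Sum of squared deviations from the mean, weighted by frequency
ss = sum(f * (x - mean_age) ** 2 for x, f in zip(ages, freqs))
sd_population = math.sqrt(ss / n)
sd_sample = math.sqrt(ss / (n - 1))

print(f"n={n} mean={mean_age:.4f} median={median_age}")
print(f"SD (population)={sd_population:.4f} SD (sample)={sd_sample:.4f}")
```

Note that the median is found from the 80 individual observations, not from the five rows of the table.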

Example

Compare the two groups in terms of measures of central tendency, variability, skewness, and kurtosis.  What do these measures tell you about the scores in the two groups?

| Statistics | Cooperation skills training | Non Cooperation skills training |
|---|---|---|
| N (Valid) | 15 | 15 |
| N (Missing) | 0 | 0 |
| Mean | 106.80 | 93.60 |
| Std. Error of Mean | 2.109 | 1.869 |
| Median | 105.00 | 96.00 |
| Mode | 108 | 98 |
| Std. Deviation | 8.170 | 7.239 |
| Variance | 66.743 | 52.400 |
| Skewness | .706 | -.572 |
| Std. Error of Skewness | .580 | .580 |
| Kurtosis | -.015 | -1.290 |
| Std. Error of Kurtosis | 1.121 | 1.121 |
| Range | 28 | 20 |
| Minimum | 96 | 82 |
| Maximum | 124 | 102 |
| Sum | 1602 | 1404 |
| Percentile 25 | 101.00 | 85.00 |
| Percentile 50 | 105.00 | 96.00 |
| Percentile 75 | 112.00 | 100.00 |

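For the first group (cooperation skills training), the mean (106.80) exceeds the median (105.00) and the skewness is positive (.706), so the distribution is not symmetric; it is positively skewed. Its kurtosis (-.015) is close to zero, so its peakedness is close to that of a normal curve.

For the second group (non cooperation skills training), the mode (98) is the largest and the mean (93.60) the smallest of the three measures, and the skewness is negative (-.572), so this distribution is also not symmetric; it is negatively skewed. Its kurtosis (-1.290) indicates a flatter distribution than the normal curve. The trained group also scores higher on average and is slightly more variable (standard deviation 8.170 versus 7.239).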

Example:

Calculate the standard deviation for the following sample data using all methods: 2, 4, 8, 6, 10, and 12.
Solution:
Method-I: Actual Mean Method

Method-II: Taking assumed mean as 6

Method-III: Using assumed mean as Zero

Method-IV: By taking 2 as the common divisor
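The four methods named above are algebraically equivalent, which can be illustrated in code. This is a sketch under stated assumptions: the function names are hypothetical, and the sample form (dividing by n − 1) is assumed because the exercise describes the values as sample data; the original worked solutions are not reproduced here.

```python
import math

data = [2, 4, 8, 6, 10, 12]

def sd_actual_mean(xs):
    """Method I: deviations taken from the actual mean."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def sd_assumed_mean(xs, a):
    """Methods II and III: deviations d = x - a from an assumed mean a."""
    d = [x - a for x in xs]
    ss = sum(v * v for v in d) - sum(d) ** 2 / len(xs)
    return math.sqrt(ss / (len(xs) - 1))

def sd_step_deviation(xs, a, c):
    """Method IV: step deviations u = (x - a) / c with common divisor c."""
    u = [(x - a) / c for x in xs]
    ss = sum(v * v for v in u) - sum(u) ** 2 / len(xs)
    return c * math.sqrt(ss / (len(xs) - 1))

results = {
    "I (actual mean)": sd_actual_mean(data),
    "II (assumed mean 6)": sd_assumed_mean(data, 6),
    "III (assumed mean 0)": sd_assumed_mean(data, 0),
    "IV (common divisor 2)": sd_step_deviation(data, 0, 2),
}

for method, value in results.items():
    print(f"Method {method}: {value:.4f}")
```

All four methods give the same standard deviation; the assumed mean and the common divisor only simplify the hand arithmetic.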

Example:
Calculate standard deviation from the following distribution of marks by using all the methods.

Solution:
Method-I: Actual Mean Method

Method-II: Taking assumed mean as 2

Method-III: Using assumed mean as Zero

Method-IV: By taking  2 as the common divisor

Example:   A corporation recruiting business graduates was particularly interested in hiring numerate graduates (graduates with quantitative skills). To check on the numeracy of applicants, a test of fifty questions was developed. In a pilot study, this test was administered to a sample of ten recent business graduates, resulting in the following numbers of correct answers:

 42 29 21 37 40 33 38 26 39 47

(a)   Find the sample mean number of correct answers.

(b)   Find the median for this sample.

(a)   Mean = 35.2

(b)   Median = 37.5.

(a)   Find the sample variance and standard deviation.

(b)   Find the mean absolute deviation.

(c)    Find the inter-quartile range.

(a)   Variance = 62.6; standard deviation = 7.9.

(b)   Mean absolute deviation = 63.6 / 10 = 6.36.

(c)   Inter-quartile range = 9.8.
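The answers for the numeracy test sample can be reproduced with Python's `statistics` module. In this sketch the `inclusive` quartile method is an assumption, chosen because it reproduces the stated inter-quartile range (9.75, rounded to 9.8); other quartile conventions give different values.

```python
from statistics import mean, median, variance, stdev, quantiles

# Numbers of correct answers for the ten graduates in the pilot study
scores = [42, 29, 21, 37, 40, 33, 38, 26, 39, 47]

score_mean = mean(scores)
score_median = median(scores)
score_var = variance(scores)   # sample variance (n - 1 denominator)
score_sd = stdev(scores)       # sample standard deviation

# Mean absolute deviation: average distance from the mean
mad = sum(abs(x - score_mean) for x in scores) / len(scores)

# Quartiles with linear interpolation over the sorted sample
q1, _, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean={score_mean} median={score_median}")
print(f"variance={score_var:.1f} sd={score_sd:.1f} MAD={mad:.2f} IQR={iqr}")
```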

Example:. Consider the following four populations:

(a)   1, 2, 3, 4, 5, 6, 7, 8

(b)   1, 1, 1, 1, 8, 8, 8, 8

(c)    1, 1, 4, 4, 5, 5, 8, 8

(d)  -6, -3, 0, 3, 6, 9, 12, 15

All of these populations have the same mean. Without doing the calculations, arrange the populations according to the magnitudes of their variances, from smallest to largest. Then check your intuition by calculating the four population variances.

Solution:

| Population | Variance |
|---|---|
| a | 5.25 |
| c | 6.25 |
| b | 12.25 |
| d | 47.25 |
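The ordering of the four population variances can be confirmed with `statistics.pvariance`, which divides by the population size. This is a small sketch; the dictionary keys simply reuse the labels (a) to (d) from the exercise.

```python
from statistics import pvariance

populations = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [1, 1, 1, 1, 8, 8, 8, 8],
    "c": [1, 1, 4, 4, 5, 5, 8, 8],
    "d": [-6, -3, 0, 3, 6, 9, 12, 15],
}

# Population variance (denominator N) for each population
variances = {name: pvariance(values) for name, values in populations.items()}

# Labels sorted from smallest to largest variance
ranking = sorted(variances, key=variances.get)

for name in ranking:
    print(f"{name}: {variances[name]}")
```

Population (d) is population (a) stretched by a factor of 3, so its variance is 9 times as large.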

Example. The accompanying table shows test scores of the forty students in a class. Construct an appropriate histogram to summarize these data.

 54 56 56 59 60 62 62 66 67 68 68 70 70 73 73 73 75 77 78 79 79 81 81 82 83 83 85 86 86 88 89 89 90 90 91 93 93 94 95 98

Here is the histogram, using bins of width 10:
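The bin counts behind a histogram with bins of width 10 can be tallied as follows. This is a text-only sketch that prints one row of asterisks per bin instead of drawing a chart; the bin boundaries (50-59, 60-69, and so on) are an assumption about how the bins were aligned.

```python
from collections import Counter

# Test scores of the forty students
scores = [54, 56, 56, 59, 60, 62, 62, 66, 67, 68,
          68, 70, 70, 73, 73, 73, 75, 77, 78, 79,
          79, 81, 81, 82, 83, 83, 85, 86, 86, 88,
          89, 89, 90, 90, 91, 93, 93, 94, 95, 98]

bin_width = 10

# Map each score to the lower edge of its bin, then count scores per bin
bins = Counter((s // bin_width) * bin_width for s in scores)

for lower in sorted(bins):
    upper = lower + bin_width - 1
    print(f"{lower}-{upper}: {'*' * bins[lower]} ({bins[lower]})")
```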

Example. The accompanying table shows percentage changes in the Consumer Price Index in the United States over a period of ten years. Draw a time plot of these data and verbally interpret the resulting picture.

| YEAR | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 |
|---|---|---|---|---|---|---|---|---|---|---|
| % CHG. C.P.I. | 3.8 | 3.9 | 3.8 | 1.1 | 4.4 | 4.4 | 4.6 | 6.1 | 3.1 | 2.9 |

Here is the time plot; it suggests that the consumer price index rose at a fairly steady rate during this period, usually between 2% and 5%.

Example. The table below gives information about the 25 largest U.S. common stock mutual funds. The first column is the decrease in the dollar value of one share on November 13, 1989. The second column is the percentage return during 1989 before November 13. Draw a scatter plot illustrating this information and discuss its features.

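| Fund | Loss 11/13 | Gain before 11/13 |
|---|---|---|
| 1 | 4.7 | 38.0 |
| 2 | 4.4 | 24.0 |
| 3 | 3.3 | 13.3 |
| 4 | 3.0 | 19.9 |
| 5 | 4.1 | 36.8 |
| 6 | 4.7 | 24.5 |
| 7 | 5.0 | 29.6 |
| 8 | 3.6 | 28.0 |
| 9 | 4.9 | 24.6 |
| 10 | 6.0 | 31.2 |
| 11 | 4.0 | 21.5 |
| 12 | 3.3 | 19.4 |
| 13 | 4.7 | 30.8 |
| 14 | 5.2 | 32.3 |
| 15 | 5.8 | 50.9 |
| 16 | 4.7 | 30.8 |
| 17 | 3.8 | 25.6 |
| 18 | 4.4 | 32.9 |
| 19 | 4.2 | 24.7 |
| 20 | 4.9 | 30.7 |
| 21 | 3.0 | 20.3 |
| 22 | 6.4 | 39.5 |
| 23 | 5.4 | 30.3 |
| 24 | 3.3 | 18.7 |
| 25 | 3.8 | 20.3 |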

Here is the scatter plot. Evidently the funds that enjoyed large gains during the first part of the year tended to suffer large losses on November 13.

Example. Collect data on any business or economic phenomenon of interest to you. Provide a graphical summary that gives a clear and accurate picture of these data. Now produce a misleading graph.

In this case, I use data from my B6014 teaching evaluations at Columbia Business School. Here are two charts, both of which suggest that my evaluations have improved as I gained experience. In one of them the vertical axis is misrepresented to exaggerate the rate of improvement.

Example. Explain what can be learned about a population from each of the following measures.

(a) The mean can be viewed as a point of balance in a data set. Like a fulcrum, it balances the values on one side of the mean with the values on the other side, taking into account the distance of each value from the mean.

(b) The median is similar to the mean, except that it does not take into account the distance of each value from the mean. The median is less influenced by outliers than the mean.

(c) The standard deviation is a measure of dispersion, which, like the mean, is heavily influenced by outliers. This is because the standard deviation formula involves squaring the difference between each value and the mean.

(d) The inter-quartile range is also a measure of dispersion; its relation to the standard deviation is somewhat analogous to the relation of the median to the mean. The inter-quartile range is less susceptible to influence by outliers than the standard deviation.

Example. If the standard deviation of a population is zero, what can you say about the members of that population?

We can infer that all values in the population are equal; there is no variation in the population.

Example. Shown below are percentage returns of the ten largest U.S. general stock mutual funds over a one-year period, ending September 17, 1993.

 27.9 11.6 17.6 26.6 15.6 12.4 22.4 18.5 22.9 25

| Part | Measure | Value |
|---|---|---|
| a | Mean | 20.05 |
| b | Median | 20.45 |
| c | Variance | 30.08 |
| d | St. Dev. | 5.48 |
| e | Range | 16.3 |
| f | Inter-quartile Range | 8.375 |

Example. The following data are the book values (in dollars, i.e., net worth divided by number of outstanding shares) for a random sample of 50 stocks from the New York Stock Exchange:

 7 9 8 6 12 6 9 15 9 16 8 5 14 8 7 6 10 8 11 4 10 6 16 5 10 12 7 10 15 7 10 8 8 10 18 8 10 11 7 10 7 8 15 23 13 9 8 9 9 13

(a)               On the basis of these data, are the book values on the New York Stock Exchange likely to be high or low? Explain.

One way to get a quick idea of a distribution is to look at a histogram:

We see that the distribution is skewed right, meaning that most observations are clustered at the lower end of the distribution. In this sense, we can say that book values are likely to be low.

(b)               Are you more likely to find a stock with a book value below \$10 or above \$20? Explain.

Notice that 28 of the 50 stocks in our sample have values below \$10, whereas only 1 out of 50 has a value above \$20. Based on this sample, we conclude that it is much more likely to find a stock with a book value below \$10 than to find one above \$20.
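The counts used in part (b) can be verified by tallying the sample directly. This is a minimal sketch; the variable names are illustrative.

```python
# Book values (in dollars) for the sample of 50 stocks
book_values = [7, 9, 8, 6, 12, 6, 9, 15, 9, 16,
               8, 5, 14, 8, 7, 6, 10, 8, 11, 4,
               10, 6, 16, 5, 10, 12, 7, 10, 15, 7,
               10, 8, 8, 10, 18, 8, 10, 11, 7, 10,
               7, 8, 15, 23, 13, 9, 8, 9, 9, 13]

below_10 = sum(1 for v in book_values if v < 10)
above_20 = sum(1 for v in book_values if v > 20)

# Sample proportions serve as rough estimates of the two probabilities
print(f"below $10: {below_10} of {len(book_values)} ({below_10 / len(book_values):.0%})")
print(f"above $20: {above_20} of {len(book_values)} ({above_20 / len(book_values):.0%})")
```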

Example. The following data represent the annual family premium rates (in thousands of dollars) charged by 36 randomly selected HMOs throughout the United States:

 3.8 4.1 4.7 5.2 2.8 5.6 4.9 6.7 9.2 4.9 4.9 4.9 5.2 5.9 5.2 4.8 4.8 9.1 4.6 8 4.9 4.2 4.1 5.3 5.5 8 7.2 7.2 4.1 4.5 8 4.4 4.2 4.6 4.2 4.8

(a)               Does there appear to be a concentration of premium rates in the center of the distribution?

Once again, a histogram is a useful tool:

There is a concentration, but not really in the center — the \$4,000 to \$5,000 range seems to be the most common.

(b)               Your friend Kathy Rae said that her family has been considering whether or not to join an HMO. Based on your findings in parts (a) and (b), what would you tell her?

She could pay anywhere between \$2,800 and \$9,200, but she will most likely pay \$4,000 to \$5,000.

Example. The following data represent the number of cases of salad dressing purchased per week by a local supermarket chain over a period of 30 weeks:

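| Week | Cases Purchased | Week | Cases Purchased | Week | Cases Purchased |
|---|---|---|---|---|---|
| 1 | 81 | 11 | 86 | 21 | 91 |
| 2 | 61 | 12 | 133 | 22 | 99 |
| 3 | 77 | 13 | 91 | 23 | 89 |
| 4 | 71 | 14 | 111 | 24 | 96 |
| 5 | 69 | 15 | 86 | 25 | 108 |
| 6 | 81 | 16 | 84 | 26 | 86 |
| 7 | 66 | 17 | 131 | 27 | 84 |
| 8 | 111 | 18 | 71 | 28 | 76 |
| 9 | 56 | 19 | 118 | 29 | 83 |
| 10 | 81 | 20 | 88 | 30 | 76 |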

(a)               Construct the frequency distribution and the percentage distribution.

Not surprisingly, the two charts look very similar.

(b) On the basis of the results of (a), does there appear to be any concentration of the number of cases of salad dressing ordered by the supermarket chain around specific values?

Yes, it looks like order quantities are concentrated between 81 and 100 cases.

(c)                If you had to make a prediction of the number of cases of salad dressing that would be ordered next week, how many cases would you predict? Why?

Just looking at the charts, you would probably guess that the next order would be about 90 cases.
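A frequency distribution and percentage distribution for the salad dressing data can be tabulated as sketched below. The class intervals (41-60, 61-80, and so on) are an assumption, chosen so that one class matches the 81 to 100 range mentioned in the answer; the original charts may have used different classes.

```python
# Cases of salad dressing purchased in weeks 1 through 30
cases = [81, 61, 77, 71, 69, 81, 66, 111, 56, 81,
         86, 133, 91, 111, 86, 84, 131, 71, 118, 88,
         91, 99, 89, 96, 108, 86, 84, 76, 83, 76]

# Assumed class intervals, with both bounds inclusive
classes = [(41, 60), (61, 80), (81, 100), (101, 120), (121, 140)]

# Each row: (class label, frequency, percentage of all weeks)
distribution = []
for low, high in classes:
    count = sum(1 for c in cases if low <= c <= high)
    percent = round(100 * count / len(cases), 1)
    distribution.append((f"{low}-{high}", count, percent))

for label, count, percent in distribution:
    print(f"{label:>8}  {count:>2}  {percent:>5}%")
```

With these classes, half of the 30 weekly orders fall in the 81-100 class.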

Example. The following data represent the amount of soft drink filled in a sample of 50 consecutive 2-liter bottles. The results, listed horizontally in the order of being filled, were:

 2.109 2.086 2.066 2.075 2.065 2.057 2.052 2.044 2.036 2.038 2.031 2.029 2.025 2.029 2.023 2.02 2.015 2.014 2.013 2.014 2.012 2.012 2.012 2.01 2.005 2.003 1.999 1.996 1.997 1.992 1.994 1.986 1.984 1.981 1.973 1.975 1.971 1.969 1.966 1.967 1.963 1.957 1.951 1.951 1.947 1.941 1.941 1.938 1.908 1.894

(a)               Construct the frequency distribution and the percentage distribution.

(The percentage distribution looks the same, except for the Y axis.)

(b)               On the basis of the results of (a), does there appear to be any concentration of the amount of soft drink filled in the bottles around specific values?

Yes, the amount of soft drink seems to be concentrated around 2.00 liters.

(c) If you had to make a prediction of the amount of soft drink filled in the next bottle, what would you predict? Why?

From looking at the histogram, you would probably expect to see somewhere between 1.95 and 2.05 liters. However, you would get a completely different impression from a time-series chart:

The time-series chart seems to indicate that most of the variability in the process is the result of the mean drifting downward over time. We’d expect the next value to be between 1.85 and 1.90.

Example. The following data represent the number of daily calls received at a toll-free telephone number of a large European airline over a period of 30 consecutive nonholiday workdays (Monday to Friday):

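| Day | No. of Calls | Day | No. of Calls | Day | No. of Calls | Day | No. of Calls |
|---|---|---|---|---|---|---|---|
| 1 | 3,060 | 9 | 3,235 | 17 | 2,685 | 25 | 3,252 |
| 2 | 3,370 | 10 | 3,174 | 18 | 3,618 | 26 | 3,161 |
| 3 | 3,087 | 11 | 3,603 | 19 | 3,369 | 27 | 3,186 |
| 4 | 3,135 | 12 | 3,256 | 20 | 3,353 | 28 | 3,347 |
| 5 | 3,805 | 13 | 3,075 | 21 | 3,277 | 29 | 3,275 |
| 6 | 3,234 | 14 | 3,187 | 22 | 3,066 | 30 | 3,129 |
| 7 | 3,105 | 15 | 3,060 | 23 | 3,341 | | |
| 8 | 3,168 | 16 | 3,004 | 24 | 3,181 | | |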

(a)               Form the frequency distribution and percentage distribution.

(b)               Form the cumulative percentage distribution.
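The frequency, percentage, and cumulative percentage distributions for the call data can be built in one pass. This is a sketch under assumptions: the class width of 200 calls and the starting point of 2,600 are illustrative choices, not taken from the original solution.

```python
# Daily calls for days 1 through 30
calls = [3060, 3370, 3087, 3135, 3805, 3234, 3105, 3168,
         3235, 3174, 3603, 3256, 3075, 3187, 3060, 3004,
         2685, 3618, 3369, 3353, 3277, 3066, 3341, 3181,
         3252, 3161, 3186, 3347, 3275, 3129]

width = 200                            # assumed class width
lower_edges = range(2600, 4000, width)  # assumed classes: 2600 up to 4000

# Each row: (lower, upper, frequency, percent, cumulative percent)
rows = []
running = 0
for low in lower_edges:
    count = sum(1 for c in calls if low <= c < low + width)
    running += count
    rows.append((low, low + width, count,
                 round(100 * count / len(calls), 1),
                 round(100 * running / len(calls), 1)))

for low, high, count, pct, cum_pct in rows:
    print(f"{low}-{high}: n={count:>2}  {pct:>5}%  cumulative {cum_pct:>5}%")
```

The cumulative column shows that 90% of the days had fewer than 3,400 calls under these assumed classes.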

Example:

(a) Compute the mean, median, and mode.

(b) Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.

Using Excel functions, you can get the following output:

| Measure | Value |
|---|---|
| mean | 6.00 |
| median | 7.00 |
| mode | #N/A |
| range | 7.00 |
| interquartile range | 4.00 |
| variance | 8.50 |
| standard deviation | 2.92 |
| coefficient of variation | 48.6% |

Notes:

• There is no mode — no value appears more often than any other value.
• The interquartile range was done using the Excel QUARTILE function; you will get a different answer depending on which quartile method you use.
• For the variance and standard deviation, we have a small sample, and therefore use (n – 1) in the denominator. In Excel, this means using VAR (not VARP) and STDEV (not STDEVP).

(c) Describe the shape.

There isn’t much to say (there are only five data points!), but we can conclude from the fact that the median is greater than the mean that the distribution is skewed to the left.