Free Statistics Help Book
An Interactive Multimedia introductory-level statistics book.
The book features interactive demos, simulations and case studies.

# Bias and Variability Simulation

Questions to be answered before the simulation are not yet implemented in this test version.

Begin by answering the questions, even if you have to guess. The first time you answer the questions you will not be told whether you are correct or not.

General Instructions

This simulation lets you explore various aspects of sampling distributions. When it begins, a histogram of a normal distribution is displayed at the top of the screen.

The distribution portrayed at the top of the screen is the population from which samples are taken. The mean of the distribution is indicated by a small blue line and the median is indicated by a small purple line. Since the mean and median are the same, the two lines overlap. The red line extends from the mean one standard deviation in each direction. Note the correspondence between the colors used on the histogram and the statistics displayed to the left of the histogram.

The second histogram displays the sample data. This histogram is initially blank. The third and fourth histograms show the distribution of statistics computed from the sample data. The number of samples (replications) that the third and fourth histograms are based on is indicated by the label “Reps=.”

Basic operations

The simulation is set to initially sample five numbers from the population, compute the mean of the five numbers, and plot the mean. Click the “Animated sample” button and you will see the five numbers appear in the histogram. The mean of the five numbers will be computed and the mean will be plotted in the third histogram. Do this several times to see the distribution of means begin to be formed. Once you see how this works, you can speed things up by taking 5, 1,000, or 10,000 samples at a time.
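The same procedure can be sketched outside the applet. The following is a minimal illustration, not the applet's own code: it assumes a continuous normal population with mean 16 and standard deviation 5 (the square root of the population variance of 25 quoted later in the instructions), whereas the applet samples from a histogram. The function name `simulate_means` and the fixed seed are hypothetical choices made for this example.

```python
import random
import statistics

def simulate_means(n_reps, sample_size=5, mu=16.0, sigma=5.0, seed=0):
    """Draw n_reps samples of sample_size and return the list of sample means."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    means = []
    for _ in range(n_reps):
        # One replication: sample_size numbers from the assumed normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

# n_reps is the number of samples ("Reps="); sample_size is the size of each one.
means = simulate_means(10_000)
print("Reps =", len(means))
print("mean of the sample means:", round(statistics.fmean(means), 2))
print("sd of the sample means:  ", round(statistics.pstdev(means), 2))
```

With many replications the mean of the sample means should land close to 16, and their standard deviation close to 5 divided by the square root of 5, which is about 2.24.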

Choosing a statistic

The following statistics can be computed from the samples by choosing from the pop-up menu:

1. Mean: Mean
2. sd: Standard deviation of the sample (N is used in the denominator)
3. Variance: Variance of the sample (N is used in the denominator)
4. variance (U): Unbiased estimate of variance (N-1 is used in denominator)
5. MAD: Mean absolute value of the deviation from the mean
6. Range: Range
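The definitions in the list above can be written out directly. This is a sketch using only the descriptions given in the list; the function names are hypothetical and are not taken from the applet.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance_n(xs):
    """Variance of the sample, with N in the denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sd_n(xs):
    """Standard deviation of the sample, with N in the denominator."""
    return math.sqrt(variance_n(xs))

def variance_unbiased(xs):
    """Unbiased estimate of variance, with N-1 in the denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def mad(xs):
    """Mean absolute value of the deviation from the mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def value_range(xs):
    return max(xs) - min(xs)

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(sample))               # 5.0
print(sd_n(sample))               # 2.0
print(variance_n(sample))         # 4.0
print(variance_unbiased(sample))  # 32/7, about 4.57
print(mad(sample))                # 1.5
print(value_range(sample))        # 7
```

The two variance functions differ only in the denominator, so for any sample the N-1 version is larger by a factor of N/(N-1).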

Selecting a sample size

The size of each sample can be set to 2, 5, 10, 16, 20 or 25 from the pop-up menu. Be sure not to confuse sample size with number of samples.

Comparison to a normal distribution

By clicking the “Fit normal” button you can see a normal distribution superimposed over the simulated sampling distribution.

Changing the population distribution

You can change the population by clicking on the top histogram with the mouse and dragging.

Step By Step Instructions

1. First refresh your memory about the meaning of the sampling distribution of the mean by clicking the “animate” button. You will see five scores sampled from the population at the top float down to the sample data graph. The mean of these five scores is shown in blue and drops down to the graph below, which is where the distribution of means is shown. Do this a few more times and then click the 10,000 button several times to see the distribution of means for a large number of samples. Various statistics are shown to the left of the distribution. Compare the mean of the distribution of means to the population mean of 16. It should be very close, but may not match exactly because the simulated distribution of means is just an approximation of the sampling distribution.

2. Test to see if the mean of the distribution of means for N=10 is equal to the population mean (keep in mind that, since this is an approximation, it may not be exactly equal). If it is equal, then the mean is an unbiased estimate.

3. Change the parent population to the “skewed distribution” and see if the mean of the distribution of means is equal to the population mean of 8.08. If so, the mean is unbiased for this skewed distribution.

4. Choose the median as the statistic. Try some simulations with the normal distribution. If the sample median is an unbiased estimate of the population median, then the mean of the distribution of the median will equal the median of the population.

5. Try some simulations with the skewed distribution and check whether the median is a biased estimate.

6. Choose the variance as the statistic. Estimate the sampling distribution of the variance and note its shape.

7. Set the sample size to 5 and use the normal population. Estimate the sampling distribution of the variance with the simulation. Note whether the mean of this sampling distribution is equal to the population variance of 25.

8. Choose the mean for one graph and the median for the other. Make the sample size the same for both graphs and estimate the sampling distributions. Compare the spreads of the distributions. The standard deviations of the distributions are the standard errors of the mean and median, respectively. Which statistic has the smaller standard error? Try this with different sample sizes. For sample size 25, note the ratio of the standard error of the median to the standard error of the mean.

9. Compare the standard errors for the mean and the median with the skewed distribution. Try several sample sizes.

10. Try different shapes of distributions and note the relative sizes of the standard errors of the mean and median.
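The variance steps (sample size 5, normal population) can also be checked numerically. The sketch below is an illustration under assumptions rather than the applet's method: it draws from a continuous normal population with mean 16 and variance 25, and the function name `average_variance_estimates` and the seed are hypothetical. For a population with variance σ², the expected value of the N-denominator variance is (N-1)/N times σ², which is 20 here, while the N-1 version has expected value 25.

```python
import random

def average_variance_estimates(n_reps=20_000, sample_size=5,
                               mu=16.0, sigma=5.0, seed=1):
    """Return the average of the N-denominator and (N-1)-denominator
    variance estimates over n_reps simulated samples."""
    rng = random.Random(seed)  # seeded for repeatability
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(n_reps):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        m = sum(sample) / sample_size
        ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
        total_n += ss / sample_size              # "Variance" (N in denominator)
        total_n_minus_1 += ss / (sample_size - 1)  # "variance (U)"
    return total_n / n_reps, total_n_minus_1 / n_reps

biased, unbiased = average_variance_estimates()
print("mean of N-denominator variances:    ", round(biased, 2))
print("mean of (N-1)-denominator variances:", round(unbiased, 2))
```

The first average should settle near 20 and the second near 25, so only the N-1 version centers on the population variance.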

Summary

The mean is an unbiased estimate of the population mean regardless of the shape of the population distribution. The median is an unbiased estimate for normal distributions but not for skewed distributions. When the formula with N in the denominator is used to estimate variance, the mean of the sampling distribution is less than the population variance, so this formula gives a biased estimate. The mean has less sampling variability than the median for most distributions, including those with relatively large skew. However, there are some extreme distributions for which the median has less sampling variability.
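The comparison of sampling variability can be sketched for the normal case with sample size 25. As before, this is an illustration under assumptions: a continuous normal population with mean 16 and standard deviation 5 stands in for the applet's histogram, and the function name `standard_errors` and the seed are hypothetical. For large samples from a normal population, the ratio of the standard error of the median to that of the mean approaches the square root of π/2, about 1.25.

```python
import random
import statistics

def standard_errors(n_reps=20_000, sample_size=25,
                    mu=16.0, sigma=5.0, seed=2):
    """Estimate the standard errors of the mean and the median by simulation."""
    rng = random.Random(seed)  # seeded for repeatability
    means, medians = [], []
    for _ in range(n_reps):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # The standard deviation of each simulated sampling distribution
    # is the estimated standard error of that statistic.
    return statistics.pstdev(means), statistics.pstdev(medians)

se_mean, se_median = standard_errors()
print("standard error of the mean:  ", round(se_mean, 3))
print("standard error of the median:", round(se_median, 3))
print("ratio (median / mean):       ", round(se_median / se_mean, 3))
```

Replacing `rng.gauss` with a draw from a skewed or heavy-tailed population is one way to explore how the ratio changes with the shape of the distribution.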