Module 10: Inference for Means
Estimating a Population Mean (1 of 3)
Estimating a Population Mean (1 of 3)
Learning outcomes
- Construct a confidence interval to estimate a population mean when conditions are met. Interpret the confidence interval in context.
- Interpret the meaning of a confidence level associated with a confidence interval.
In “Estimating a Population Mean,” we focus on how to use a sample mean to estimate a population mean. This is the type of thinking we did in Modules 7 and 8 when we used a sample proportion to estimate a population proportion. Let’s take a moment to review what we learned in the modules Linking Probability to Statistical Inference and Inference for One Proportion, and then we’ll see how it relates to the current module.
- In Linking Probability to Statistical Inference, we noted that random samples vary, so we expect to see variability in sample proportions. In the section “Distribution of Sample Means” in that module, we made the same observations about sample means. In both cases, a normal model is a good fit for the sampling distribution when appropriate conditions are met.
- We also noted in that module that a sample proportion is an estimate for the population proportion. We do not expect the sample proportion to equal the population proportion, so there is some error. The error is due to random chance. Likewise, a sample mean is an estimate for the population mean, but there will be some error due to random chance.
Comment
Recall that, in Inference for One Proportion, we adjusted the standard error by replacing p with the sample proportion. Doing so made sense because the goal of the confidence interval is to estimate p. So the margin of error in the confidence interval formula changed. Here is the adjusted formula.
[latex]\hat{p}\pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex]
This adjustment changed the normality conditions. We use this adjusted confidence interval to estimate p when the successes and failures in the actual sample are at least 10.
We will eventually have to adjust the standard error for the sampling distribution of sample means, too. It makes sense because in many situations we will not know the population standard deviation, σ. This adjustment is more complicated than the adjustment to standard error for sample proportions, so before we do it, let’s practice finding the confidence interval for µ assuming we know σ.
Assuming we know σ is realistic when a lot of previous research has been done. For example, when we are estimating height, weight, or scores on a standardized test, previous research gives us reliable values for σ.
Example
Estimating Mean SAT Math Score
The SAT is the most widely used college admission exam. (Most community colleges do not require students to take this exam.) The mean SAT math score varies by state and by year, so the value of µ depends on the state and the year. But let’s assume that the shape and spread of the distribution of individual SAT math scores in each state is the same each year. More specifically, assume that individual SAT math scores consistently have a normal distribution with a standard deviation of 100.
An educational researcher wants to estimate the mean SAT math score (μ) for his state this year. The researcher chooses a random sample of 650 exams in his state. The average score is 475 (so [latex]\bar{x}[/latex]= 475). Estimate the mean SAT math score in this state for this year.
We answer this question by computing and interpreting a confidence interval.
Checking conditions:
From our work in “Distribution of Sample Means,” we know that a normal model is a good fit for the distribution of sample means from random samples if one of two conditions is met:
- The population of individual values is normal (in which case the sample size is not important).
- If we do not know if the population of individual values is normal, then we must have a large sample size (more than 30).
Because we assume that the distribution of individual SAT math scores is normal in this example, a normal model is also a good fit for the distribution of sample means. Even if the population distribution had not been normal, the sample size is large enough that the normal distribution would still apply to the sample means. So we can use the confidence interval formula given above.
Finding the margin of error:
Keep in mind that the sample mean, [latex]\bar{x}[/latex], is only a single-value estimate for the population mean, μ. Because it comes from a random sample, we expect there to be some error in the estimate. But how much error should we expect?
We know that the sample distribution of means is approximately normal because conditions are met. Recall that in a normal model, 95% of the values fall within 2 standard deviations of the mean, so we use 2 standard errors for our margin of error. This was part of the empirical rule from the module Probability and Probability Distribution.
standard error is [latex]\sigma/\sqrt{n}= 100/\sqrt{650}\approx 3.9[/latex]
margin of error is 2(3.9) [latex]\approx[/latex] 7.8
Finding the confidence interval:
We are 95% confident that [latex]\bar{x}[/latex] falls within 7.8 points of μ. This also means that we are 95% confident that μ falls within 7.8 points of [latex]\bar{x}[/latex]. So we construct a 95% confidence interval from this sample mean by adding and subtracting 7.8 points. The 95% confidence interval is shown.
[latex]\bar{x}\pm[/latex] margin of error
[latex]\bar{x}\pm 2[/latex] (standard error)
475±7.8
(467.2,484.8)
Conclusion:
We are 95% confident that the mean SAT math score in this state this year is between 467.2 and 484.8. Recall from our previous work that being 95% confident means this method, in the long run, captures the true population mean (μ) about 95% of the time.
Summary
If we want to estimate µ, a population mean, we want to calculate a confidence interval. The 95% confidence interval is:
[latex]\bar{x}\pm 2\frac{\sigma}{\sqrt{n}}[/latex]
We can use this formula only if a normal model is a good fit for the sampling distribution of sample means. If the sample size is large (n > 30), we can use a normal model. If the sample size is not greater than 30, then we can use a normal model only if the variable is normally distributed in the population. As always, we must have a random sample. If the sample is not random, we cannot use it to estimate µ.
We say we are 95% confident that this interval contains µ, which means that in the long run, 95% of these confidence intervals contain µ.
Try It
Constructing a Confidence Interval for Pregnancy Length
Is smoking during pregnancy associated with premature births? To investigate this question, researchers selected a random sample of 114 pregnant women who were smokers. The average pregnancy length for this sample of smokers was 260 days. From a large body of research, it is known that length of human pregnancy has a standard deviation of 16 days. The researchers assume that smoking does not affect the variability in pregnancy length.
Comment
In our work with confidence intervals for estimating a population mean, µ, we require the population standard deviation, σ, to be known. In practice, σ usually is unknown. However, in some situations, especially when a lot of research has been done on the quantitative variable whose mean we are estimating (such as IQ, height, weight, scores on standardized tests), it is reasonable to assume that σ is known. On the next page, we learn how to proceed when σ is unknown.
- Concepts in Statistics. Provided by: Open Learning Initiative. Located at: http://oli.cmu.edu. License: CC BY: Attribution