[Featured image: an oil painting of the Great Falls of Snake River, with dry desert cliffs and shrubs surrounding a raging river, the tops of the waterfalls exposed under heavy cloud cover.]

Evidence-Based Approaches to Public Health: Biostatistics – Inferential Statistics: Confidence Intervals

In this tutorial, we will explore the concept of confidence intervals, a fundamental tool in inferential statistics. Confidence intervals are used to estimate the range within which a population parameter (e.g., a mean or proportion) is likely to lie, based on sample data. Understanding confidence intervals is essential for making informed decisions in public health research and is also a key topic for the Certified in Public Health (CPH) exam.

By the end of this tutorial, you will understand what a confidence interval is, how to calculate it, and how to interpret it. Practice questions are provided to help reinforce your understanding.

Introduction to Confidence Intervals
What is a Confidence Interval?
- Definition of Confidence Intervals
- Confidence Level (e.g., 95%, 99%)
- Interpretation of Confidence Intervals
How to Calculate Confidence Intervals
- Confidence Interval for a Population Mean
- Confidence Interval for a Population Proportion
Practice Questions
Conclusion

1. Introduction to Confidence Intervals

In inferential statistics, researchers often use data from a sample to make inferences about a population. A confidence interval (CI) is a range of values, derived from sample data, that is likely to contain the true population parameter (such as a mean or proportion). Unlike a single point estimate (e.g., a sample mean), a confidence interval provides a range of plausible values for the population parameter, along with an associated level of confidence.

For example, instead of estimating that the average blood pressure in a population is exactly 120 mmHg, a confidence interval might state that the true average is between 118 and 122 mmHg, with 95% confidence.

2. What is a Confidence Interval?

2.1 Definition of Confidence Intervals

A confidence interval is a range of values around a sample estimate (such as a sample mean) that is likely to contain the true population parameter. It is typically expressed with a confidence level, such as 95%, which indicates how confident we are that the interval contains the true parameter. A 95% confidence interval means that if we were to take 100 different samples and calculate a confidence interval for each, we would expect about 95 of them to contain the true population parameter.
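
The repeated-sampling idea above can be checked with a small simulation. This is a minimal sketch, not part of the original tutorial: the function name `coverage_rate`, the simulated population (normal, mean 120, standard deviation 10), and the sample sizes are all illustrative assumptions.

```python
import random
import statistics

def coverage_rate(true_mean=120.0, true_sd=10.0, n=100,
                  n_samples=2000, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_samples):
        # Draw one sample of size n from the assumed population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Count the interval as a hit if it captures the true mean.
        if x_bar - z * se <= true_mean <= x_bar + z * se:
            hits += 1
    return hits / n_samples

print(f"Share of intervals covering the true mean: {coverage_rate():.3f}")
```

The printed share should land close to 0.95, which is what the 95% confidence level describes: a property of the procedure over many samples rather than of any single interval.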

2.2 Confidence Level (e.g., 95%, 99%)

The confidence level reflects the degree of certainty that the confidence interval contains the true population parameter. Common confidence levels are 90%, 95%, and 99%, with 95% being the most frequently used in public health research. A higher confidence level means a wider confidence interval, providing greater assurance that the interval contains the true value, but at the cost of precision.

95% Confidence Interval: If the sampling were repeated many times, about 95% of the intervals calculated this way would contain the true population parameter.
99% Confidence Interval: This provides more confidence but results in a wider interval.
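
To see why a higher confidence level widens the interval, it may help to look at the Z-scores behind each level. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an illustrative assumption, not something from the tutorial.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical Z value for a confidence level such as 0.95.

    Hypothetical helper for illustration only.
    """
    alpha = 1 - confidence_level
    # Split alpha evenly between the two tails of the standard normal.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_critical(level):.3f}")
# 90% confidence: Z = 1.645
# 95% confidence: Z = 1.960
# 99% confidence: Z = 2.576
```

Because the margin of error is the Z-score multiplied by the standard error, the larger Z-score at 99% directly produces a wider interval than at 95%.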

2.3 Interpretation of Confidence Intervals

Interpreting a confidence interval involves understanding what it tells you about the population parameter:

Example: A 95% confidence interval for the mean systolic blood pressure in a population is 118 to 122 mmHg. This means that we are 95% confident that the true mean blood pressure in the population lies between 118 and 122 mmHg.
It is important to note that the confidence interval from any single sample is not guaranteed to contain the true parameter. The confidence level describes the method: across repeated samples, about 95% of the intervals calculated this way will capture the true parameter.

3. How to Calculate Confidence Intervals

The calculation of a confidence interval depends on the type of data and whether the population standard deviation is known or not. In most public health applications, we calculate confidence intervals for the population mean or proportion based on sample data.

3.1 Confidence Interval for a Population Mean

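When the sample is large, the confidence interval for a population mean can be calculated with the sample standard deviation (s) standing in for the population standard deviation (for small samples, a t-value with n - 1 degrees of freedom replaces Z):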

[math] \text{CI} = \overline{x} \pm Z \left( \frac{s}{\sqrt{n}} \right) [/math]

Where:

CI: The confidence interval.
x̄: The sample mean.
Z: The Z-score corresponding to the desired confidence level (e.g., Z = 1.96 for 95% confidence).
s: The sample standard deviation.
n: The sample size.

Steps to Calculate the Confidence Interval for a Mean:

Step 1: Calculate the sample mean (x̄).
Step 2: Determine the standard error of the mean (SE = s / √n).
Step 3: Multiply the Z-score (based on the desired confidence level) by the standard error.
Step 4: Add and subtract this value from the sample mean to get the confidence interval.
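
The four steps above can be sketched in a few lines of Python. This is an illustrative sketch under the large-sample (Z-based) formula from this section; the function name `mean_confidence_interval` and the example inputs are assumptions made for demonstration.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, s, n, confidence_level=0.95):
    """Z-based confidence interval for a population mean.

    Hypothetical helper: x_bar is the sample mean (Step 1),
    s the sample standard deviation, n the sample size.
    """
    # Z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = s / sqrt(n)        # Step 2: standard error of the mean
    margin = z * se         # Step 3: margin of error
    return x_bar - margin, x_bar + margin  # Step 4: lower and upper limits

# Illustrative inputs (not from a real study): mean 120 mmHg, s = 10.2, n = 100.
lower, upper = mean_confidence_interval(120, 10.2, 100)
print(f"95% CI: {lower:.1f} to {upper:.1f} mmHg")
```

With these made-up inputs the interval comes out at roughly 118 to 122 mmHg, in line with the blood pressure example used earlier.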

3.2 Confidence Interval for a Population Proportion

The formula for calculating a confidence interval for a population proportion is:

[math] \text{CI} = p \pm Z \left( \sqrt{\frac{p(1 - p)}{n}} \right) [/math]

Where:

p: The sample proportion (e.g., the percentage of individuals with a particular characteristic).
Z: The Z-score corresponding to the desired confidence level.
n: The sample size.

Steps to Calculate the Confidence Interval for a Proportion:

Step 1: Calculate the sample proportion (p).
Step 2: Calculate the standard error of the proportion (SE = √[p(1 – p) / n]).
Step 3: Multiply the Z-score by the standard error.
Step 4: Add and subtract this value from the sample proportion to get the confidence interval.
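
The same steps for a proportion can be sketched as follows. The function name `proportion_confidence_interval` and the example inputs are illustrative assumptions; the calculation follows the formula given in this section.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p, n, confidence_level=0.95):
    """Z-based confidence interval for a population proportion.

    Hypothetical helper: p is the sample proportion as a decimal
    (e.g., 0.25 for 25%), n is the sample size.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = sqrt(p * (1 - p) / n)  # Step 2: standard error of the proportion
    margin = z * se             # Step 3: margin of error
    return p - margin, p + margin  # Step 4: lower and upper limits

# Illustrative inputs (not from a real survey): 25% of 400 respondents.
lower, upper = proportion_confidence_interval(0.25, 400)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

Note that the proportion is entered as a decimal, not a percentage; passing 25 instead of 0.25 raises an error in this sketch.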

4. Practice Questions

Test your understanding of confidence intervals with these practice questions. Try answering them before checking the solutions.

Question 1:

A study reports that the mean cholesterol level in a sample of 100 adults is 200 mg/dL, with a standard deviation of 15 mg/dL. Calculate the 95% confidence interval for the mean cholesterol level in the population.

Answer 1:

Step 1: Sample mean (x̄) = 200 mg/dL

Step 2: Standard error (SE) = s / √n = 15 / √100 = 1.5

Step 3: Z-score for 95% confidence = 1.96

Step 4: Margin of error = 1.96 * 1.5 = 2.94

Step 5: Confidence interval = 200 ± 2.94 = [197.06, 202.94]

Question 2:

A survey finds that 40% of respondents support a new public health policy, with a sample size of 500. Calculate the 95% confidence interval for the population proportion.

Answer 2:

Step 1: Sample proportion (p) = 0.40

Step 2: Standard error (SE) = √[p(1 – p) / n] = √[0.40(0.60) / 500] = 0.0219

Step 3: Z-score for 95% confidence = 1.96

Step 4: Margin of error = 1.96 * 0.0219 = 0.0429

Step 5: Confidence interval = 0.40 ± 0.0429 = [0.3571, 0.4429]

5. Conclusion

Confidence intervals are an essential tool in inferential statistics that provide a range of plausible values for a population parameter, based on sample data. They offer more information than a single point estimate and allow researchers to make informed conclusions about the population.

Remember:

A confidence interval gives a range of values within which we are confident that the true population parameter lies.
The confidence level indicates the degree of certainty (e.g., 95% confidence) that the interval contains the true parameter.
The width of the confidence interval depends on the sample size, variability, and the chosen confidence level.
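
The effect of sample size on interval width can be illustrated with a short sketch. The helper name `margin_of_error` is an assumption for illustration; the standard deviation of 15 mg/dL is borrowed from Practice Question 1, and the sample sizes are made up.

```python
from math import sqrt

def margin_of_error(s, n, z=1.96):
    """Margin of error for a mean: Z times the standard error (s / sqrt(n)).

    Hypothetical helper for illustration only.
    """
    return z * s / sqrt(n)

# Same variability (s = 15) and confidence level, increasing sample size.
for n in (25, 100, 400):
    print(f"n = {n:3d}: margin of error = {margin_of_error(15, n):.2f} mg/dL")
# n =  25: margin of error = 5.88 mg/dL
# n = 100: margin of error = 2.94 mg/dL
# n = 400: margin of error = 1.47 mg/dL
```

Each fourfold increase in sample size halves the margin of error, because the standard error shrinks with the square root of n.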

Final Tip for the CPH Exam:

Ensure you understand how to calculate and interpret confidence intervals for both means and proportions. Keep in mind that the standard error and the margin of error are distinct quantities (the margin of error is the Z-score multiplied by the standard error), as there might be a question or two on that point as well.

Humanities Moment

The featured image of this article is Great Falls of Snake River, Idaho Territory (1876) by Thomas Moran (American, 1837-1926). Moran was a prominent American landscape painter of the Hudson River School and the Rocky Mountain School, renowned for his vivid depictions of the American West, notably Yellowstone National Park, which his artwork helped establish. Influenced significantly by British artist J.M.W. Turner, Moran excelled in multiple mediums including oils, watercolors, and chromolithography. His celebrated works such as The Grand Canyon of the Yellowstone (1872) solidified his reputation, earning him national acclaim and securing his legacy as one of America’s premier landscape artists.
