
Evidence-Based Approaches to Public Health: Biostatistics – Probability: Probability Distributions (Normal, Binomial, Poisson)
In this tutorial, we will explore three key probability distributions that are commonly used in public health research: the normal distribution, the binomial distribution, and the Poisson distribution. Each distribution describes the behavior of data in different situations, and understanding them is essential for making accurate statistical inferences. This knowledge is critical for public health professionals and a key part of the Certified in Public Health (CPH) exam.
By the end of this tutorial, you will understand what each probability distribution represents, how to calculate probabilities using these distributions, and when to use each distribution. Practice questions are provided to reinforce your understanding.
Table of Contents:
- Introduction to Probability Distributions
- The Normal Distribution
  - Definition of the Normal Distribution
  - Properties of the Normal Distribution
  - Using Z-Scores with the Normal Distribution
- The Binomial Distribution
  - Definition of the Binomial Distribution
  - Properties of the Binomial Distribution
  - Formula for Binomial Probability
- The Poisson Distribution
  - Definition of the Poisson Distribution
  - Properties of the Poisson Distribution
  - Formula for Poisson Probability
- Practice Questions
- Conclusion
1. Introduction to Probability Distributions
A probability distribution describes how the values of a random variable are distributed. It shows the likelihood of different outcomes or events occurring. In public health research, different types of probability distributions are used to model different types of data, such as the number of new disease cases in a given time period or the distribution of health measures like blood pressure.
Three of the most commonly used distributions in public health are the normal, binomial, and Poisson distributions.
2. The Normal Distribution
The normal distribution (also called the Gaussian distribution) is the most widely used probability distribution in statistics. It describes data that are symmetrically distributed around the mean, with most values clustering around the center and fewer values occurring as you move away from the mean. The normal distribution is often used to model continuous data such as height, weight, or cholesterol levels.
2.1 Definition of the Normal Distribution
A normal distribution is a continuous probability distribution that is symmetric around its mean. The probability density function (PDF) of a normal distribution is bell-shaped and defined by two parameters: the mean (μ) and the standard deviation (σ).
The formula for the normal distribution is:
[math] f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} [/math]
Where:
- x: The variable of interest.
- μ: The mean of the distribution.
- σ: The standard deviation of the distribution.
- e: Euler’s number (approximately 2.718).
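As a minimal sketch of how the density formula above can be evaluated, the following Python function computes f(x) directly from μ and σ. The function name `normal_pdf` and the cholesterol figures (mean 200 mg/dL, standard deviation 40 mg/dL) are hypothetical values chosen for illustration and do not come from the tutorial.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma, evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Hypothetical example: cholesterol with mean 200 mg/dL and SD 40 mg/dL.
peak = normal_pdf(200, 200, 40)          # density at the mean (the top of the bell)
one_sd_above = normal_pdf(240, 200, 40)  # density one standard deviation above the mean
print(peak, one_sd_above)
```

Note that the density is a height of the curve, not a probability. Probabilities for a continuous variable come from the area under the curve over an interval.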
2.2 Properties of the Normal Distribution
- The mean, median, and mode of a normal distribution are all equal.
- The distribution is symmetric around the mean.
- Approximately 68% of the data lie within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations (this is known as the empirical rule).
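The empirical rule can be checked numerically. The sketch below uses the standard normal cumulative distribution function, written in terms of Python's built-in `math.erf`, to compute the proportion of values within k standard deviations of the mean. The helper names are illustrative assumptions.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, expressed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```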
2.3 Using Z-Scores with the Normal Distribution
Z-scores are used to standardize values in a normal distribution, allowing comparisons between different datasets or variables. A Z-score represents the number of standard deviations a value is from the mean.
The formula for calculating a Z-score is:
[math] Z = \frac{x - \mu}{\sigma} [/math]
Where x is the value of interest, μ is the mean, and σ is the standard deviation.
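A short sketch of the Z-score calculation follows. It assumes a hypothetical population whose systolic blood pressure has a mean of 120 mmHg and a standard deviation of 15 mmHg. These numbers are made up for illustration. The Z-score is then converted to a cumulative probability with the standard normal CDF.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical values: mean 120 mmHg, SD 15 mmHg, observed reading 150 mmHg.
z = z_score(150, 120, 15)         # 2.0, i.e. two standard deviations above the mean
p_below = standard_normal_cdf(z)  # proportion of the population at or below 150 mmHg
print(f"z = {z:.1f}, P(X <= 150) = {p_below:.4f}")
# z = 2.0, P(X <= 150) = 0.9772
```

Under these assumed values, about 2.3% of the population would have a reading above 150 mmHg.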
3. The Binomial Distribution
The binomial distribution is used to model the number of successes in a fixed number of trials, where each trial has only two possible outcomes (e.g., success or failure). It is often used in public health research to model outcomes such as the number of individuals who develop a disease after exposure or the number of individuals who test positive in a diagnostic test.
3.1 Definition of the Binomial Distribution
A binomial distribution describes the probability of obtaining a given number of successes (k) in a fixed number of independent trials (n), where each trial has the same probability of success (p).
The formula for binomial probability is:
[math] P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} [/math]
Where:
- P(X = k): The probability of getting exactly k successes.
- n: The total number of trials.
- k: The number of successes.
- p: The probability of success on a single trial.
- (1 - p): The probability of failure on a single trial.
- n choose k: The binomial coefficient, n! / (k!(n - k)!), which counts the number of ways to choose k successes from n trials.
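The binomial formula translates almost line for line into code. The sketch below uses `math.comb` (available in Python 3.8 and later) for the binomial coefficient. The function name `binomial_pmf` and the example values (n = 5 trials, p = 0.2) are assumptions chosen for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 5 exposed individuals, each with a 0.2 chance of developing the disease.
p_exactly_one = binomial_pmf(1, 5, 0.2)
print(f"P(X = 1) = {p_exactly_one:.4f}")
# P(X = 1) = 0.4096

# The probabilities over all possible values of k (0 through n) sum to 1.
total = sum(binomial_pmf(k, 5, 0.2) for k in range(6))
```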
3.2 Properties of the Binomial Distribution
- The number of trials (n) is fixed.
- Each trial is independent of the others.
- Each trial has two possible outcomes: success or failure.
- The probability of success (p) is the same for every trial.
4. The Poisson Distribution
The Poisson distribution is used to model the number of events that occur in a fixed interval of time or space, where the events occur randomly and independently. It is commonly used in public health to model the number of disease cases or the number of emergency room visits within a given time period.
4.1 Definition of the Poisson Distribution
A Poisson distribution describes the probability of a given number of events occurring in a fixed interval, given the average number of events (λ) expected in that interval.
The formula for Poisson probability is:
[math] P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} [/math]
Where:
- P(X = k): The probability of observing exactly k events.
- λ: The average number of events in the interval.
- k: The number of events.
- e: Euler’s number (approximately 2.718).
- k!: The factorial of k.
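As a rough sketch, the Poisson formula can be evaluated with the standard library alone. The function name `poisson_pmf` and the example rate (λ = 2 events per interval) are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when lam events are expected in the interval."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical example: an average of 2 new cases per week.
p_zero = poisson_pmf(0, 2)               # probability of a week with no cases, equal to e^(-2)
p_at_most_two = sum(poisson_pmf(k, 2) for k in range(3))  # P(X <= 2)
print(f"P(X = 0) = {p_zero:.4f}, P(X <= 2) = {p_at_most_two:.4f}")
```

Summing the formula over several values of k, as in the last step, gives cumulative probabilities such as "at most 2 events".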
4.2 Properties of the Poisson Distribution
- The average number of events (λ) is constant over time or space.
- Events occur independently of each other.
- The variance of the Poisson distribution is equal to the mean (λ).
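The last property, that the variance equals the mean, can be checked numerically. The sketch below computes the mean and variance directly from the Poisson probabilities, truncating the infinite sum at a cutoff (`max_k`, an assumed parameter) beyond which the remaining probability is negligible for small λ.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected in the interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def poisson_mean_and_variance(lam, max_k=100):
    """Approximate the mean and variance by summing over k = 0..max_k."""
    probs = [poisson_pmf(k, lam) for k in range(max_k + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance

mean, variance = poisson_mean_and_variance(4.0)  # lam = 4.0 is an arbitrary example rate
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Both values come out equal to λ up to rounding error. In practice, observed count data whose variance is much larger than its mean may not be well described by a Poisson model.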
5. Practice Questions
Test your understanding of probability distributions with these practice questions. Try answering them before checking the solutions.
Question 1:
In a population, the average number of asthma-related emergency room visits per day is 3. What is the probability of exactly 5 visits occurring on a given day? Assume a Poisson distribution.
Answer 1:
Here, λ = 3, and we want to find the probability of k = 5 visits. Using the Poisson formula:
[math] P(X = 5) = \frac{3^5 e^{-3}}{5!} \approx \frac{243 \times 0.0498}{120} \approx 0.1008 [/math]
Question 2:
A clinical trial involves 10 patients, and the probability that a patient will respond to the treatment is 0.7. What is the probability that exactly 7 patients will respond? Assume a binomial distribution.
Answer 2:
Using the binomial formula:
[math] P(X = 7) = \binom{10}{7} (0.7)^7 (0.3)^3 = \frac{10!}{7!(10 - 7)!} \times (0.7)^7 \times (0.3)^3 [/math]
Calculating this gives a probability of approximately 0.2668.
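For readers who want to follow the arithmetic, the calculation can be broken into steps (intermediate values are rounded):
[math] \binom{10}{7} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120 [/math]
[math] (0.7)^7 \approx 0.08235, \quad (0.3)^3 = 0.027 [/math]
[math] P(X = 7) \approx 120 \times 0.08235 \times 0.027 \approx 0.2668 [/math]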
Question 3:
What proportion of data lies between one standard deviation above and below the mean in a normal distribution?
Answer 3:
Approximately 68% of the data in a normal distribution lie within one standard deviation of the mean.
6. Conclusion
Probability distributions are essential tools in public health research, helping us model different types of data and make inferences about populations. The normal, binomial, and Poisson distributions are widely used to represent different data types and scenarios.
Remember:
- The normal distribution is used for continuous data that are symmetrically distributed around the mean.
- The binomial distribution is used for modeling the number of successes in a fixed number of independent trials.
- The Poisson distribution is used for modeling the number of events occurring in a fixed interval of time or space.
Final Tip for the CPH Exam:
Make sure you understand the key properties of each probability distribution, the differences among the normal, binomial, and Poisson distributions, and when each applies to real-world public health scenarios. Practice using the formulas to calculate probabilities, as this skill will be crucial for answering questions on probability distributions on the Certified in Public Health (CPH) exam.
Humanities Moment
The featured image for this CPH Focus is The Wave (1917) by Christopher R. W. Nevinson (English, 1889 – 1946). Nevinson was a British painter and printmaker renowned for his powerful depictions of World War I, blending Futurism and Cubism to portray the mechanized brutality of modern warfare. He also occasionally used these styles in pieces showing natural beauty, as above. Initially a radical modernist, he later shifted to realism as the horrors of war outpaced abstraction. Though once hailed as one of Britain’s most important war artists, his postwar career was marred by personal conflict, exaggeration, and declining influence.