
CPH Focus: Evidence-Based Approaches to Public Health: Regression Analysis: Logistic Regression
In this tutorial, we will explore the topic of logistic regression, a key statistical method used in public health research. Unlike linear regression, which is used for continuous outcomes, logistic regression is employed when the outcome variable is binary (e.g., disease vs. no disease). Understanding logistic regression is essential for the Certified in Public Health (CPH) exam and for analyzing health data where outcomes are categorical.
By the end of this tutorial, you will understand what logistic regression is, how it is used in public health, and how to interpret its results. We will also include practice questions to help reinforce your understanding.
Table of Contents:
- Introduction to Logistic Regression
- When to Use Logistic Regression
- Key Concepts in Logistic Regression
  - Odds and Odds Ratios
  - Logit Function
  - Interpreting Coefficients
- Assumptions of Logistic Regression
- Practice Questions
- Conclusion
1. Introduction to Logistic Regression
Logistic regression is a statistical method used to model the relationship between a set of independent variables (predictors) and a binary dependent variable (outcome). The outcome variable takes one of two values, such as having a disease (yes/no) or being a smoker (yes/no).
In public health, logistic regression is commonly used to examine the association between risk factors (e.g., age, smoking status) and health outcomes (e.g., presence or absence of a disease).
The general form of the logistic regression equation is:
[math] \text{logit}(p) = \ln \left( \frac{p}{1 - p} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k [/math]
Where:
- logit(p): The log-odds of the probability of the event occurring (e.g., having the disease).
- p: The probability of the event occurring (e.g., probability of disease).
- β0: The intercept term.
- β1, β2, …, βk: The coefficients for each predictor variable X1, X2, …, Xk.
Logistic regression transforms the probability of the outcome into a continuous log-odds scale, which allows the model to estimate the effect of each predictor on the likelihood of the outcome occurring.
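To move from the log-odds scale back to a probability, the equation above can be solved for p. The rearrangement below is standard algebra (exponentiate both sides, then isolate p) and uses the same symbols as the model equation:
[math] p = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)}} [/math]
Whatever value the linear combination of predictors takes, the resulting p always lies strictly between 0 and 1.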
2. When to Use Logistic Regression
Logistic regression is used when the dependent variable is binary or dichotomous. Some common scenarios where logistic regression is applied in public health research include:
- Predicting whether a patient has a disease (yes/no) based on risk factors like age, gender, or smoking status.
- Assessing the likelihood of a public health intervention’s success (effective/ineffective) based on demographic and behavioral factors.
- Modeling the probability of an outcome, such as mortality (alive/dead) or disease progression (progressed/not progressed).
If the outcome is continuous (e.g., blood pressure levels), linear regression is used instead of logistic regression.
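To make the binary-outcome setting concrete, here is a minimal sketch of fitting a one-predictor logistic model in plain Python. Everything in it is an assumption made for illustration: the data are made up, the function names `sigmoid` and `fit_logistic` are hypothetical, and simple gradient ascent stands in for the more refined estimation routines that established statistical software uses.

```python
import math

def sigmoid(z):
    """Convert a log-odds value z into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, learning_rate=0.05, n_iter=5000):
    """Fit logit(p) = b0 + b1 * x by gradient ascent on the mean log-likelihood.

    A teaching sketch only; not how production software estimates the model.
    """
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        grad0 = 0.0
        grad1 = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted
            grad0 += residual
            grad1 += residual * xi
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1

# Made-up data: BMI values and diabetes status (1 = yes, 0 = no).
bmi = [20, 22, 24, 25, 27, 28, 30, 32, 35, 38]
diabetes = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

# Centering BMI at 27 keeps the gradient steps well behaved.
bmi_centered = [b - 27.0 for b in bmi]
b0, b1 = fit_logistic(bmi_centered, diabetes)

print("intercept (log-odds at BMI 27):", round(b0, 3))
print("slope (log-odds per BMI unit):", round(b1, 3))
print("predicted probability at BMI 35:", round(sigmoid(b0 + b1 * (35 - 27.0)), 3))
```

The fitted slope is on the log-odds scale, and the predicted values are probabilities, which is what distinguishes this model from a linear regression on the same data.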
3. Key Concepts in Logistic Regression
3.1 Odds and Odds Ratios
An important concept in logistic regression is the odds. The odds represent the ratio of the probability that an event occurs to the probability that it does not occur:
[math] \text{Odds} = \frac{p}{1 - p} [/math]
Where:
- p: The probability of the event occurring (e.g., having a disease).
- 1 - p: The probability of the event not occurring (e.g., not having the disease).
The odds ratio (OR) is a measure used to describe the association between a predictor variable and the outcome. It tells us by what factor the odds of the outcome are multiplied for a one-unit increase in the predictor variable, holding the other predictors in the model constant.
For example, an odds ratio of 2 means that the odds of the outcome double with each one-unit increase in the predictor.
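The definitions above can be checked with a few lines of Python. This is a small sketch using made-up probabilities; the function names `odds` and `odds_ratio` are hypothetical helpers, not part of any library.

```python
def odds(p):
    """Odds of an event that occurs with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def odds_ratio(p_group_a, p_group_b):
    """Ratio of the odds in group A to the odds in group B."""
    return odds(p_group_a) / odds(p_group_b)

# Made-up probabilities of disease in two groups, for illustration only.
p_smokers = 0.20
p_nonsmokers = 0.10

print("odds among smokers:", round(odds(p_smokers), 3))
print("odds among non-smokers:", round(odds(p_nonsmokers), 3))
print("odds ratio:", round(odds_ratio(p_smokers, p_nonsmokers), 3))
```

Note that the odds ratio compares odds, not probabilities, so it generally differs from the simple ratio of the two probabilities.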
3.2 Logit Function
The logit function is the natural logarithm of the odds. Logistic regression models the logit of the probability as a linear function of the predictors:
[math] \text{logit}(p) = \ln \left( \frac{p}{1 - p} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k [/math]
A probability is confined to the range 0 to 1, whereas the log-odds can take any real value. Modeling the logit therefore lets the predictors enter the model linearly, even though their relationship with the probability itself is nonlinear (S-shaped).
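A short Python sketch can show the logit and its inverse at work. The coefficient values below are hypothetical and chosen only for illustration, as are the function names.

```python
import math

def logit(p):
    """Natural log of the odds for a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a log-odds value z back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, not estimated from any real data.
beta_0 = -4.0     # intercept
beta_age = 0.05   # change in log-odds per additional year of age

def predicted_probability(age):
    """Predicted probability of the outcome for a given age."""
    return inverse_logit(beta_0 + beta_age * age)

for age in (30, 50, 70):
    print(age, round(predicted_probability(age), 3))
```

Equal steps in age add equal amounts to the log-odds, but the corresponding changes in probability are unequal, which is the nonlinearity the logit is designed to absorb.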
3.3 Interpreting Coefficients
Each coefficient (β) in a logistic regression model represents the change in the log-odds of the outcome for a one-unit increase in the corresponding predictor, holding the other predictors constant. Because log-odds are hard to interpret directly, results are usually reported as odds ratios. The odds ratio is calculated by exponentiating the coefficient:
[math] \text{Odds Ratio} = e^{\beta} [/math]
An odds ratio of:
- Greater than 1: Indicates that the predictor increases the odds of the outcome.
- Equal to 1: Indicates no effect on the odds of the outcome.
- Less than 1: Indicates that the predictor decreases the odds of the outcome.
For example, if the odds ratio for smoking status (smoker vs. non-smoker) is 3, this means that smokers have three times the odds of developing the disease compared to non-smokers.
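The sketch below turns a coefficient into an odds ratio and then shows what that odds ratio does to a probability. The coefficient and the 10% baseline risk are made-up values, and the helper names are hypothetical.

```python
import math

def odds_ratio_from_coefficient(beta, units=1.0):
    """Odds ratio for a change of `units` in the predictor: exp(beta * units)."""
    return math.exp(beta * units)

def probability_after_odds_ratio(p_baseline, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = p_baseline / (1.0 - p_baseline)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Hypothetical coefficient for smoking status, chosen so the odds ratio is 3.
beta_smoker = math.log(3.0)
or_smoker = odds_ratio_from_coefficient(beta_smoker)
print("odds ratio:", round(or_smoker, 3))

# Assumed baseline: 10% of non-smokers develop the disease.
p_nonsmoker = 0.10
p_smoker = probability_after_odds_ratio(p_nonsmoker, or_smoker)
print("probability among smokers:", round(p_smoker, 3))
```

With these assumed numbers, tripling the odds raises the probability from 0.10 to 0.25 rather than to 0.30, which is why an odds ratio should be described as a change in odds and not as a change in probability.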
4. Assumptions of Logistic Regression
Like any statistical model, logistic regression relies on certain assumptions. These include:
- Linearity in the logit: The relationship between each continuous independent variable and the log-odds of the dependent variable should be linear.
- Independent observations: The observations should be independent of each other.
- No multicollinearity: The independent variables should not be highly correlated with each other.
- Large sample size: Logistic regression requires a sufficiently large sample size to provide reliable estimates, especially when there are multiple predictors.
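As one illustration of checking these assumptions, the sketch below screens pairs of predictors for high correlation, a rough first look at multicollinearity. The data are made up, the 0.8 cutoff is an assumed rule of thumb rather than a fixed standard, and pairwise correlation can miss collinearity that involves three or more predictors together.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up predictor values for five hypothetical participants.
predictors = {
    "weight_kg": [60, 72, 85, 90, 110],
    "bmi": [21.0, 24.5, 28.0, 30.5, 36.0],
    "age": [34, 61, 45, 29, 52],
}

THRESHOLD = 0.8  # assumed cutoff for "highly correlated"

names = list(predictors)
flagged = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson_r(predictors[names[i]], predictors[names[j]])
        print(names[i], "vs", names[j], "r =", round(r, 2))
        if abs(r) > THRESHOLD:
            flagged.append((names[i], names[j]))

print("pairs to review:", flagged)
```

In this made-up example, weight and BMI are flagged because they carry nearly the same information, so including both would make their individual coefficients hard to interpret.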
5. Practice Questions
Test your understanding of logistic regression with these practice questions. Try answering them before checking the solutions.
Question 1:
A study is conducted to determine the likelihood of developing diabetes (yes/no) based on body mass index (BMI). What type of regression analysis should be used?
Answer 1:
The outcome variable (developing diabetes) is binary (yes/no), so logistic regression should be used to analyze the relationship between BMI and the likelihood of developing diabetes.
Question 2:
In a logistic regression model, the odds ratio for smoking status (smoker vs. non-smoker) is 2.5. What does this odds ratio mean in terms of the likelihood of developing lung cancer?
Answer 2:
An odds ratio of 2.5 means that smokers have 2.5 times the odds of developing lung cancer compared to non-smokers.
Question 3:
What assumption is required for the relationship between the independent variables and the dependent variable in logistic regression?
Answer 3:
Logistic regression assumes that the relationship between the independent variables and the log-odds of the dependent variable is linear (known as linearity in the logit).
6. Conclusion
Logistic regression is a powerful tool for analyzing binary outcomes in public health research. By estimating the odds of an outcome occurring, logistic regression helps researchers understand the relationship between risk factors and health outcomes. This method is widely used in studies examining disease risk, intervention effectiveness, and health behaviors.
Always keep these key points in mind:
- Logistic regression is used when the outcome variable is binary (e.g., disease/no disease).
- The model estimates the log-odds of the outcome, which can be transformed into odds ratios for easier interpretation.
- Odds ratios indicate how the odds of the outcome change with a one-unit increase in a predictor variable.
Final Tip for the CPH Exam:
Ensure you understand how to interpret odds ratios from logistic regression and when to use this method. Practice applying logistic regression to real-world public health problems, as this knowledge is essential for the Certified in Public Health (CPH) exam and for analyzing data in health research.
Humanities Moment
The featured image for this CPH Focus is Seascape (1913) by Constant Permeke (Belgian, 1886–1952). Permeke was a pivotal figure in Flemish Expressionism, acclaimed for his evocative paintings and powerful sculptures depicting the harsh lives of fishermen, farmers, and workers. Drawing from his experiences in war and personal hardship, Permeke’s robust forms and earthy palettes express deep empathy for his subjects, while his later years reveal a gentle refinement in color and line. Despite adversity, his work profoundly shaped Belgian modern art in both painting and sculpture.