PyFriday Tutorial: How to Calculate T-Tests in Python

A pencil drawing of a riverbank with trees by Francis Seymour Haden

T-tests are a fundamental statistical tool used in various fields, including public health, to compare the means of two groups. Essentially, a T-test helps determine whether the observed differences in data sets (such as rates of a disease, responses to a treatment, or other health indicators) are statistically significant or likely due to chance. This test is vital in research and data analysis, as it provides a scientific basis for drawing conclusions about populations based on sample data. Whether it’s comparing the effectiveness of different treatments, understanding health outcomes across different demographics, or analyzing behavioral health interventions, T-tests offer a reliable method for making informed, evidence-based decisions in public health and beyond. Here, we’ll take a look at how to run a T-Test in Python.

Background: Understanding T-Tests

Before we look at how to run T-Tests in Python, it’s worth understanding what we’re actually doing. As mentioned above, a T-Test is at its core, a comparison between two groups to see if they are different from each other in a way that cannot be explained by simple chance.

Types of T-Tests

There are 4 main groups of T-Tests that one may need to choose from when testing a hypothesis. These are:

Sample Dependent:
- Independent Samples T-Test (Two-Sample T-Test):
  - Purpose: Compares the means of two independent or unrelated groups.
  - Example: Comparing the blood pressure of patients who received two different treatments.
- Paired Samples T-Test (Dependent Samples T-Test):
  - Purpose: Compares the means of the same group or matched pairs at two different times or under two different conditions.
  - Example: Measuring students’ test scores before and after a training program to assess its effectiveness.
- One-Sample T-Test:
  - Purpose: Compares the mean of a single group against a known mean (or a theoretical value like zero).
  - Example: Testing if the average height of a group of individuals differs from a specific value, like the national average.
Tail Dependent:
- One-Tailed T-Test:
  - Purpose: Tests for the possibility of the relationship in one specific direction.
  - Example: Testing if a new drug increases recovery rates compared to the existing standard (only looking for an increase, not a decrease).
- Two-Tailed T-Test:
  - Purpose: Tests for the possibility of the relationship in either direction, without specifying which direction is expected.
  - Example: Testing if a new drug changes recovery rates (either increase or decrease) compared to the existing standard.
Assumed Variance:
- Equal Variance T-Test (Pooled Variance T-Test):
  - Purpose: Used when the variances of the two groups are assumed to be equal.
  - Example: Comparing the test scores of students from two classes where the variation in scores is similar in both classes.
- Unequal Variance T-Test (Welch’s T-Test):
  - Purpose: Used when the variances of the two groups are not assumed to be equal.
  - Example: Comparing the test scores of students from two schools where one school shows much greater variation in scores than the other.
Assumption of Non-Normal Distribution:
- Robust T-Tests:
  - Purpose: Used when data do not meet the assumptions of normality. These are variations of the T-test that are less sensitive to non-normal distributions.
  - Example: Comparing means from data with outliers or skewed distribution.

This is a very broad overview of each category, and we’ll cover more in our Epi Explained series entry on T-Tests. For now, we’ll focus on perhaps the broadest and most generally applicable T-Test for our tutorial, the 2 tailed T-Test. Let’s get started.

Mathematical Underpinnings

In terms of the underlying math for T-Tests, it’s not quite as easy as our previous topics of Odds Ratio and Relative Risk, and can even seem a bit intimidating, but let’s break it down a little bit.

[math] t = \frac{\bar{X}_1 – \bar{X}_2}{\sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} [/math]
The above formula is for what is called the T-Statistic. Its components are interpreted as:

[math] \bar{X}_1[/math] : The mean of Group 1
[math] \bar{X}_2[/math] : The mean of Group 2
[math] s^2[/math] : The Pooled Variance, which requires another formula we’ll cover below.
[math] \frac{1}{n_1} [/math]: 1/ the sample size of Group 1.
[math] \frac{1}{n_2} [/math]: 1/ the sample size of Group 2.

Now for the Pooled Variance. In short, Pooled variance is a weighted average of the variances of the two above samples. As a reminder, variance is the squared difference between each value of a group and the mean of said group, finally divided by the number of the group – 1. the formula for Pooled Variance is:

[math] s_p^2 = \frac{(n_1 – 1) \times s_1^2 + (n_2 – 1) \times s_2^2}{n_1 + n_2 – 2} [/math]

Where:

[math]s_p^2 [/math] is the pooled variance.
[math]n_1[/math] is the sample size for Group 1.
[math]n_2[/math]is the sample size for Group 2.
[math]s_1^2[/math] is the variance for Group 1.
[math]s_2^2[/math] is the variance for Group 2.

It might be worth noting that the denominator, [math]{n_1 + n_2 – 2}[/math] is the calculation for the Degrees of Freedom (DF), which will be a critical calculation in many other statistical operations.

After calculating through these equations, the final step is to reference a T-Distribution table with your Degrees of Freedom and the level of significance (p value) you wish to use, and see if the value is larger than the referenced table value. If the equation value is more than the reference value, then your results could be considered significant. Otherwise, any differences are likely to be due to chance.

With the underlying math out of the way, let’s look at how we can do a T-Test in Python, using real world data.

Calculating A T-Test in Python

As usual, if you’d like to follow along with full code write-up and data, please go to Cody’s Github and download the T-Tests folder.

Step 1: Library set up

For our tutorial, we’re only going to to install pandas, scipy and numpy. We’re also going to shorten up a few names using theascommand:

Step 2: Collect and Format Data

Now, we can take a look at our data. For this tutorial, we’ll be borrowing some data from the Northern Irish Statistics and Research Agency (NISRA), particularly their data around the number of suicides in North and South Belfast Assembly Areas from 1997 to 2018. We can have as a null hypothesis that there is no difference in suicide rates between North and South Belfast. I’ve pulled a CSV which has two rows (one for North Belfast, another for South), and the columns are
Assembly Area and then every other column is a year. This closely resembles data you’re likely to experience in the field, as it’s not formatted how we’d like. Let’s quickly read in the file before we get too ahead of ourselves.

Now that our data is cleaner, we can format it a bit. While there are perhaps more efficient ways to go about it, I think this makes for a good time to look at long and wide dataframe transformations. As our data is now, it’d be very hard to fit into a T-Test calculation, as we cannot easily iterate across columns to collect the values (counts per year) we need for our groups (location). It could be said to be very wide. Let’s transform it to a longer format.

Now there’s quite a bit to unpack from one line of code. Here, we’re taking df and applying a method to it, melt from the pandas library. Within it, we see id_vars=['Assembly Area'] which is telling Python that we want the Assembly Areas to essentially be a key identifier for our future rows. Then, we aim to make a new column called Year which pivots those many columns for each year to rows. Lastly, we have value_name='Count' which takes what used to be the values within the tables and sets it as a new column that is appropriately transformed so now instead of having a wide dataframe of 2 rows by 23 columns, we instead have a long one of 44 rows by 3 columns.

This does make the data a bit cleaner, but ideally we’ll have to make this a bit wider by have a separate column for each of our location groups, and a reference column to our Year. This will look remarkably similar to above.

Now that our data is properly formatted, let’s get to testing.

Step 3:T-Test, The Easy Way

The easiest way to run a T-Test in Python, now that we have our data formatted, is just calling the T-Test function against our groups, like so:

For this, we’re reaching into the stats library and using its standard T-Test for Independence function. From there, we’re taking all values for both of our Assembly Areas, dropping any empty values from the calculation, and assuming that the variance is not equal between these two populations. Then, we can print out our values.
When we look at t_test_result in our Console, we can see that there’s a very significant difference between the two locations.

And there we have it.

Step 4: T-Test, The Hard Way

Say for whatever reason, you don’t quite trust the above function to do all the hard work for you and want to handle this all manually. Then, we can use the code below which is fairly self-explanatory (given the above mathematical discussion). There isn’t too much to delve into as it’s mostly simple calculations that are absent any complex functions.

Conclusion

Here, we took a look at what T-Tests are, the math underneath the term, and even how to use them in Python to answer a question about suicide rates in Northern Ireland.

Humanities Moment

This tutorials featured image is Sunset in Ireland (1863) by Francis Seymour Haden (English, 1818-1910). Haden, a surgeon by profession, also helped revitalized the practice of etching in the English art scene. He was also an avid collector of Rembrandt, and a president of the Society of Royal Printmakers in London.

Epidemiology, Broadly Speaking