Statistics - One Sample t-Test
This article explains the one-sample t-test and then runs one in Python with the SciPy library.
One Sample t-Test #
The one-sample t-test is a hypothesis test that uses a single sample to test a claim about a population mean. It is commonly applied to check whether the population mean is equal to a specific value 𝜇₀.
The one-sample t-test consists of the following steps:
1. Hypothesis Setting #
H₀ : 𝜇 = 𝜇₀ → Null Hypothesis | The population mean 𝜇 is equal to the hypothesized value 𝜇₀. |
---|---|
H₁ : 𝜇 ≠ 𝜇₀ → Alternative Hypothesis | The population mean 𝜇 is not equal to the hypothesized value 𝜇₀. |
2. Sampling #
Extract a sample from the population and calculate the mean of that sample.
3. Calculation of the Test Statistic #
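Calculate the t-statistic, which is the difference between the sample mean and the mean assumed under the null hypothesis (𝜇₀), divided by the standard error of the sample mean: t = (x̄ − 𝜇₀) / (s / √n), where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size.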
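As a minimal sketch of this step, the t-statistic can be computed by hand with only the Python standard library. The function name `one_sample_t` and the five-value sample are hypothetical and chosen purely for illustration; they are not part of the article's data analysis.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the one-sample t-statistic for `sample` against the hypothesized mean `mu0`."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    x_bar = mean(sample)        # sample mean
    s = stdev(sample)           # sample standard deviation (n - 1 in the denominator)
    std_error = s / sqrt(n)     # standard error of the sample mean
    return (x_bar - mu0) / std_error


# Hypothetical sample, for illustration only.
heights = [70, 65, 63, 72, 81]
t_stat = one_sample_t(heights, mu0=75)
print(round(t_stat, 4))
```

The statistic is negative when the sample mean lies below 𝜇₀ and positive when it lies above. Note that the sketch does not handle a sample with zero variance, where the standard error is 0.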
4. Decision / Conclusion #
If the calculated t-statistic falls within the rejection region, reject the null hypothesis and accept the alternative hypothesis.
Otherwise, do not reject the null hypothesis.
- In the case of a two-tailed test, the rejection regions are the symmetric ends of the t-distribution.
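If the null hypothesis is rejected, we conclude that the population mean differs from 𝜇₀.
Conversely, if it is not rejected, the observed difference is not statistically significant. This does not prove that the null hypothesis is true; it only means the sample gives insufficient evidence against it.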
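The decision rule for the two-tailed case can be sketched as a small function. The name `two_tailed_decision` is hypothetical, and the critical value 2.042 is assumed to be read from a standard t-table (two-tailed, significance level 0.05, 30 degrees of freedom); the t-statistics passed in are example values only.

```python
def two_tailed_decision(t_stat, t_crit):
    """Decide a two-tailed t-test given the statistic and a positive critical value."""
    if t_crit <= 0:
        raise ValueError("t_crit must be positive")
    # The rejection region is both tails of the t-distribution: |t| > t_crit.
    if abs(t_stat) > t_crit:
        return "reject H0"
    return "fail to reject H0"


# Assumed t-table value: two-tailed, alpha = 0.05, 30 degrees of freedom.
T_CRIT_DF30 = 2.042

print(two_tailed_decision(0.87, T_CRIT_DF30))   # example statistic inside the acceptance region
print(two_tailed_decision(-2.50, T_CRIT_DF30))  # example statistic in the lower tail
```

Comparing |t| with a critical value and comparing the p-value with the significance level lead to the same decision; the p-value form is usually more convenient when a library reports it directly.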
Using Python Library Scipy #
Next, we will run a one-sample t-test using the Python SciPy library.
The data we are dealing with here contains the girth (circumference), height, and volume of 31 trees.
We want to use a one-sample t-test to check whether the mean tree height in this sample is consistent with a hypothesized population mean of 75. The significance level is set at 0.05, and the hypotheses are as follows:
Hypothesis Testing
Null Hypothesis : The population mean height is 75.
Alternative Hypothesis : The population mean height is not 75.
Let’s first load the data.
>>> import pandas as pd
>>> df = pd.read_csv("./data/trees.csv")
>>> df.head()
 | Girth | Height | Volume |
---|---|---|---|
0 | 8.3 | 70 | 10.3 |
1 | 8.6 | 65 | 10.3 |
2 | 8.8 | 63 | 10.2 |
3 | 10.5 | 72 | 16.4 |
4 | 10.7 | 81 | 18.8 |
Also, let’s calculate the mean of ‘Height’.
>>> result = df['Height'].mean()
>>> round(result, 2) # (Rounded to the second decimal place)
76.0
Next, we will load the Scipy library for a one-sample t-test.
>>> from scipy import stats
In SciPy, the one-sample t-test is provided by the function ttest_1samp.
Reference : https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html
Then, let’s calculate the test statistic for hypothesis testing.
>>> t_score, p_value = stats.ttest_1samp(df['Height'], popmean=75)
>>> print(round(t_score, 2), round(p_value, 2))
0.87 0.39
The popmean argument is the population mean assumed under the null hypothesis (here, 75).
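To connect this output to the formula behind it, the quantities involved can be written out as below. The sample mean 76.0 and the sample size 31 come from the steps above; the sample standard deviation s is not shown in this article, so it is left symbolic. The expression for p assumes the two-sided test, which is the default for ttest_1samp.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{76.0 - 75}{s / \sqrt{31}}, \qquad
\mathrm{df} = n - 1 = 30, \qquad
p = 2 \, P\left(T_{30} \ge |t|\right)
```

Here T₃₀ denotes a random variable following the t-distribution with 30 degrees of freedom.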
Now let's print the p-value rounded to four decimal places and check whether to reject the null hypothesis at the 0.05 significance level.
>>> print(round(p_value, 4))
0.3892
>>> if p_value >= 0.05:
...     print("Fail to reject H0")  # p-value is not below the significance level
... else:
...     print("Reject H0")
...
Fail to reject H0
Since the p-value (0.3892) is greater than the significance level of 0.05, we do not reject the null hypothesis that the population mean height is 75.