Explore the intricacies of hypothesis testing, a cornerstone of statistical analysis. Dive into methods, interpretations, and applications for making data-driven decisions.
In this blog post we will learn:
1. What is Hypothesis Testing?
2. Steps in Hypothesis Testing
2.1. Set up Hypotheses: Null and Alternative
2.2. Choose a Significance Level (α)
2.3. Calculate a Test Statistic and P-Value
2.4. Make a Decision
3. Example: Testing a new drug
4. Example in Python
5. Conclusion
1. What is Hypothesis Testing?
In simple terms, hypothesis testing is a method used to make decisions or inferences about population parameters based on sample data. Imagine being handed a die and asked if it’s biased. By rolling it a few times and analyzing the outcomes, you’d be engaging in the essence of hypothesis testing.
Think of hypothesis testing as the scientific method of the statistics world. Suppose you hear claims like “This new drug works wonders!” or “Our new website design boosts sales.” How do you know if these statements hold water? Enter hypothesis testing.
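To make the die example concrete, here is a minimal sketch. It assumes a chi-square goodness-of-fit test is an acceptable way to check fairness (the post does not name a specific test), and the rolls are simulated rather than real data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# Simulate 600 rolls of a six-sided die (hypothetical data, fair by construction).
rolls = rng.integers(low=1, high=7, size=600)

# Count how often each face (1..6) came up; index 0 is unused, so drop it.
observed_counts = np.bincount(rolls, minlength=7)[1:]

# H0: every face is equally likely. With no expected counts given,
# chisquare() assumes equal expected frequencies for all categories.
chi2_statistic, p_value = stats.chisquare(observed_counts)

print(f"observed counts: {observed_counts}")
print(f"chi-square statistic: {chi2_statistic:.3f}, p-value: {p_value:.3f}")
```

A large p-value here would mean the counts look compatible with a fair die; a very small one would suggest bias.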
2. Steps in Hypothesis Testing
 Set up Hypotheses: Begin with a null hypothesis (H0) and an alternative hypothesis (H1).
 Choose a Significance Level (α): Typically 0.05, this is the probability of rejecting the null hypothesis when it’s actually true. Think of it as the chance of accusing an innocent person.
 Calculate a Test Statistic and P-Value: Gather evidence (data), calculate a test statistic, and derive the p-value from it.
 Make a Decision:
 p-value: This is the probability of observing data at least as extreme as what was actually observed, assuming the null hypothesis is true. A small p-value (typically ≤ 0.05) suggests the data are inconsistent with the null hypothesis.
 Decision Rule: If the p-value is less than or equal to α, you reject the null hypothesis in favor of the alternative; otherwise, you fail to reject it.
2.1. Set up Hypotheses: Null and Alternative
Before diving into testing, we must formulate hypotheses. The null hypothesis (H0) represents the default assumption, while the alternative hypothesis (H1) challenges it.
For instance, in drug testing,
H0 : “The new drug is no better than the existing one,”
H1 : “The new drug is superior.”
2.2. Choose a Significance Level (α)
You collect and analyze data to test H0 against H1. Based on your analysis, you either reject the null hypothesis in favor of the alternative or fail to reject it. Failing to reject is not the same as accepting the null hypothesis: it only means the data did not provide enough evidence against it.
The significance level, often denoted by $α$, represents the probability of rejecting the null hypothesis when it is actually true.
In other words, it’s the risk you’re willing to take of making a Type I error (false positive).
Type I Error (False Positive):
 Symbolized by the Greek letter alpha (α).
 Occurs when you incorrectly reject a true null hypothesis. In other words, you conclude that there is an effect or difference when, in reality, there isn’t.
 The probability of making a Type I error equals the significance level of the test. Commonly, tests are conducted at the 0.05 significance level, which means there’s a 5% chance of rejecting the null hypothesis when it is actually true.
 Commonly used significance levels are 0.01, 0.05, and 0.10, but the choice depends on the context of the study and the level of risk one is willing to accept.
Example: If a drug is not effective (truth), but a clinical trial incorrectly concludes that it is effective (based on the sample data), then a Type I error has occurred.
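A quick simulation can show why α is the Type I error rate. The sketch below assumes normally distributed outcomes and a two-sample t-test; the group size, mean, and number of simulated experiments are hypothetical choices, not values from the post.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
alpha = 0.05
n_experiments = 2000  # hypothetical number of simulated trials
n_per_group = 30      # hypothetical group size

# Both groups are drawn from the same distribution, so H0 is true by construction.
group_a = rng.normal(loc=3.0, scale=1.0, size=(n_experiments, n_per_group))
group_b = rng.normal(loc=3.0, scale=1.0, size=(n_experiments, n_per_group))

# One t-test per simulated experiment (one row = one experiment).
_, p_values = stats.ttest_ind(group_a, group_b, axis=1)

# Every rejection here is a false positive, i.e. a Type I error.
type_i_rate = float(np.mean(p_values <= alpha))
print(f"Estimated Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The estimated rate should land close to 0.05, since about 5% of experiments reject a true null hypothesis purely by chance.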
Type II Error (False Negative):
 Symbolized by the Greek letter beta (β).
 Occurs when you fail to reject a false null hypothesis. This means you conclude there is no evidence of an effect or difference when, in reality, there is one.
 The probability of making a Type II error is denoted by β. The power of a test (1 – β) represents the probability of correctly rejecting a false null hypothesis.
Example: If a drug is effective (truth), but a clinical trial incorrectly concludes that it is not effective (based on the sample data), then a Type II error has occurred.
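β and power can be estimated the same way, by simulating experiments in which the drug really does work. This is a rough sketch under assumed values: the true means (2 and 3 hours), the standard deviation (1.5 hours), and the group size are all hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=123)
alpha = 0.05
n_experiments = 2000  # hypothetical number of simulated trials
n_per_group = 20      # hypothetical group size

# H0 is false by construction: the drug group truly recovers faster on average.
drug = rng.normal(loc=2.0, scale=1.5, size=(n_experiments, n_per_group))
placebo = rng.normal(loc=3.0, scale=1.5, size=(n_experiments, n_per_group))

_, p_values = stats.ttest_ind(drug, placebo, axis=1)

# Type II error: the test fails to reject H0 although the effect is real.
beta = float(np.mean(p_values > alpha))
power = 1.0 - beta
print(f"Estimated beta: {beta:.3f}, estimated power: {power:.3f}")
```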
Balancing the Errors:
In practice, there’s a tradeoff between Type I and Type II errors. Reducing the risk of one typically increases the risk of the other. For example, if you want to decrease the probability of a Type I error (by setting a lower significance level), you might increase the probability of a Type II error unless you compensate by collecting more data or making other adjustments.
It’s essential to understand the consequences of both types of errors in any given context. In some situations, a Type I error might be more severe, while in others, a Type II error might be of greater concern. This understanding guides researchers in designing their experiments and choosing appropriate significance levels.
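The tradeoff can also be seen numerically. In this sketch, which reuses the same kind of simulated trial with a real effect, the helper `simulated_p_values` and all numeric settings are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)

def simulated_p_values(n_per_group, n_experiments=2000):
    """Simulate trials where the drug truly works (H0 is false); return the p-values."""
    # Hypothetical truth: drug averages 2 h, placebo 3 h, standard deviation 1.5 h.
    drug = rng.normal(2.0, 1.5, size=(n_experiments, n_per_group))
    placebo = rng.normal(3.0, 1.5, size=(n_experiments, n_per_group))
    return stats.ttest_ind(drug, placebo, axis=1).pvalue

type_ii_rates = {}
for n_per_group in (20, 80):
    p_values = simulated_p_values(n_per_group)
    for alpha in (0.10, 0.05, 0.01):
        # Share of experiments that miss the real effect at this alpha.
        rate = float(np.mean(p_values > alpha))
        type_ii_rates[(n_per_group, alpha)] = rate
        print(f"n = {n_per_group:>2} per group, alpha = {alpha:.2f}: Type II rate = {rate:.3f}")
```

Within each sample size, tightening α raises the Type II rate. Moving from 20 to 80 participants per group is the "collect more data" compensation the text describes.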
2.3. Calculate a Test Statistic and P-Value
Test statistic:
A test statistic is a single number that summarizes how far our sample data is from what we’d expect under the null hypothesis (the default assumption we’re testing against). Generally, the larger the test statistic in absolute value, the more evidence we have against the null hypothesis. It helps us decide whether the differences we observe in our data could plausibly be due to random chance or point to an actual effect.
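As one concrete instance, the two-sample t-statistic divides the difference in group means by its standard error. The sketch below computes it by hand and compares it with SciPy; the data values are made up for illustration, and the pooled-variance (equal variances) form is an assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical headache relief times in hours (illustrative only).
drug = np.array([1.8, 2.1, 2.4, 1.9, 2.0, 2.2])
placebo = np.array([3.1, 2.7, 3.4, 2.9, 3.0, 3.2])

n1, n2 = len(drug), len(placebo)
mean_diff = drug.mean() - placebo.mean()

# Pooled variance: weighted average of the two sample variances (ddof=1).
pooled_var = ((n1 - 1) * drug.var(ddof=1) + (n2 - 1) * placebo.var(ddof=1)) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = mean_diff / standard_error
t_scipy, _ = stats.ttest_ind(drug, placebo)  # equal_var=True by default

print(f"manual t: {t_manual:.4f}, scipy t: {t_scipy:.4f}")
```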
P-value:
The p-value tells us how likely we would be to get our observed results (or something more extreme) if the null hypothesis were true.
It’s a value between 0 and 1.
– A smaller p-value (typically below 0.05) means that the observation is rare under the null hypothesis, so we might reject the null hypothesis.
– A larger p-value suggests that what we observed could easily happen by random chance, so we might not reject the null hypothesis.
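For a t-test, the p-value is the tail area of the t-distribution beyond the observed statistic. This sketch uses t = 6.0 with 8 degrees of freedom, the values that arise in the Python example in section 4 (two groups of 5), and assumes a two-sided test.

```python
from scipy import stats

t_statistic = 6.0
degrees_of_freedom = 8  # n1 + n2 - 2 for two groups of 5

# sf(x) = 1 - cdf(x): probability of a value larger than x if H0 is true.
one_sided_p = stats.t.sf(abs(t_statistic), df=degrees_of_freedom)

# Two-sided test: count extreme values in both tails.
two_sided_p = 2 * one_sided_p

print(f"one-sided p-value: {one_sided_p:.6f}")
print(f"two-sided p-value: {two_sided_p:.6f}")
```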
2.4. Make a Decision
Relationship between $α$ and P-Value
When conducting a hypothesis test:
 We first choose a significance level ($α$), which sets a threshold for making decisions.
 We then calculate the p-value from our sample data and the test statistic.
 Finally, we compare the p-value to our chosen $α$:
 If $\text{p-value} \le α$: We reject the null hypothesis in favor of the alternative hypothesis. The result is said to be statistically significant.
 If $\text{p-value} > α$: We fail to reject the null hypothesis. There isn’t enough statistical evidence to support the alternative hypothesis.
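The decision rule is simple enough to write as a small function. The name `decide` and its return strings are hypothetical, chosen only for this sketch.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.01))              # well below the default alpha of 0.05
print(decide(0.50))              # far above alpha
print(decide(0.03, alpha=0.01))  # the same p-value can flip under a stricter alpha
```

Note that α has to be fixed before looking at the data; picking it afterwards to get the desired outcome defeats the purpose of the test.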
3. Example: Testing a new drug
Imagine you are investigating whether a new drug relieves headaches faster than a placebo.
Setting Up the Experiment:
You gather 100 people who suffer from headaches. Half of them (50 people) are given the new drug (let’s call this the ‘Drug Group’), and the other half are given a sugar pill, which doesn’t contain any medication (the ‘Placebo Group’).
 Set up Hypotheses:
Before starting, you make a prediction:
 Null Hypothesis (H0): The new drug has no effect. Any difference in healing time between the two groups is just due to random chance.
 Alternative Hypothesis (H1): The new drug does have an effect. The difference in healing time between the two groups reflects a real effect of the drug, not just chance.
 Choose a Significance Level (α): Typically 0.05, this is the probability of rejecting the null hypothesis when it’s actually true.

Calculate Test Statistic and P-Value:
After the experiment, you analyze the data. The “test statistic” is a number that expresses the difference between the two groups in standardized units, that is, relative to the random variation you would expect in that difference. For instance, let’s say:
 The average healing time in the Drug Group is 2 hours.
 The average healing time in the Placebo Group is 3 hours.
The test statistic helps you understand how significant this 1-hour difference is. If the groups are large and the spread of healing times in each group is small, then this difference might be significant. But if there’s a huge variation in healing times, the 1-hour difference might not be so special.
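To see how much the spread matters, the sketch below runs a t-test on the same 1-hour difference (means of 2 and 3 hours, 50 people per group) under two different standard deviations. The standard deviations of 0.5 and 5.0 hours are hypothetical; the post does not give any.

```python
from scipy import stats

results = {}
for spread in (0.5, 5.0):  # hypothetical standard deviations, in hours
    # t-test computed from summary statistics instead of raw data.
    results[spread] = stats.ttest_ind_from_stats(
        mean1=2.0, std1=spread, nobs1=50,  # Drug Group
        mean2=3.0, std2=spread, nobs2=50,  # Placebo Group
    )
    print(
        f"std = {spread}: t = {results[spread].statistic:.2f}, "
        f"p-value = {results[spread].pvalue:.4f}"
    )
```

With the small spread the difference is ten standard errors wide; with the large spread it is only one, which is easily explained by chance.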
P-value
Imagine the p-value as answering this question: “If the new drug had NO real effect, what’s the probability that I’d see a difference as extreme (or more extreme) as the one I found, just by random chance?”
For instance:
 A p-value of 0.01 means there’s a 1% chance that the observed difference (or a more extreme difference) would occur if the drug had no effect. That’s pretty rare, so we might consider the drug effective.
 A p-value of 0.5 means there’s a 50% chance you’d see this difference just by chance. That’s pretty high, so we might not be convinced the drug is doing much.
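That question can be answered almost literally with a permutation test: if the drug had no effect, the group labels would be interchangeable, so we shuffle them many times and count how often chance alone produces a difference at least as large. This is a sketch of that idea, not the method the post uses later, and the data are invented.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical headache relief times in hours (illustrative only).
drug = np.array([1.8, 2.1, 2.4, 1.9, 2.0, 2.2])
placebo = np.array([3.1, 2.7, 3.4, 2.9, 3.0, 3.2])
observed_diff = abs(drug.mean() - placebo.mean())

pooled = np.concatenate([drug, placebo])
n_permutations = 10_000
count_extreme = 0
for _ in range(n_permutations):
    # Under H0 the labels are arbitrary, so reassign them at random.
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[:len(drug)].mean() - shuffled[len(drug):].mean())
    # Small tolerance guards against floating-point rounding in the means.
    if diff >= observed_diff - 1e-12:
        count_extreme += 1

# Add-one correction keeps the estimate strictly above zero.
p_value = (count_extreme + 1) / (n_permutations + 1)
print(f"observed difference: {observed_diff:.3f} h, permutation p-value: {p_value:.4f}")
```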
 Making a Decision
Commonly, researchers use a threshold $α$ (like 0.05, which is 5%) to decide:
 If the p-value is less than or equal to $α$ (0.05): the results are “statistically significant,” and they reject the null hypothesis, concluding the new drug has an effect.
 If the p-value is greater than $α$ (0.05): the results are not statistically significant, and they don’t reject the null hypothesis, remaining unsure if the drug has a genuine effect.
4. Example in Python
For simplicity, let’s say we’re using a t-test (common for comparing means). Let’s dive into Python:
import numpy as np
from scipy import stats
# 1. Setting Up the Experiment
# Sample data: recovery times (in days) with & without the new drug
without_drug = np.array([10, 11, 9, 12, 13])
with_drug = np.array([7, 6, 5, 6, 7])
# 2. Choose a Significance Level (α): commonly 0.05
alpha = 0.05
# 3. Calculate Test Statistic and P-Value
t_statistic, p_value = stats.ttest_ind(without_drug, with_drug)
print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")
Output:
t-statistic: 5.999999999999999
p-value: 0.00032339322188514914
Making a Decision: The p-value (about 0.0003) is below 0.05, so we reject the null hypothesis: “The results are statistically significant! The drug seems to have an effect!” Had the p-value been above 0.05, we’d say, “Looks like the drug isn’t as miraculous as we thought.”
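The decision step can be written out in code as well. The sketch below repeats the same data and adds a one-sided variant, since a claim like “the new drug is superior” is directional. The `alternative="greater"` argument assumes SciPy 1.6 or newer, and the message strings are illustrative.

```python
import numpy as np
from scipy import stats

without_drug = np.array([10, 11, 9, 12, 13])
with_drug = np.array([7, 6, 5, 6, 7])
alpha = 0.05

# Two-sided test: H1 says the means differ in either direction.
t_statistic, p_two_sided = stats.ttest_ind(without_drug, with_drug)

# One-sided test: H1 says recovery WITHOUT the drug takes longer.
_, p_one_sided = stats.ttest_ind(without_drug, with_drug, alternative="greater")

if p_two_sided <= alpha:
    decision = "reject H0"
    print("The results are statistically significant. The drug seems to have an effect.")
else:
    decision = "fail to reject H0"
    print("Not enough evidence that the drug has an effect.")

print(f"two-sided p-value: {p_two_sided:.6f}, one-sided p-value: {p_one_sided:.6f}")
```

Because the observed difference points in the hypothesized direction, the one-sided p-value is half the two-sided one. The direction should be chosen before seeing the data.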
5. Conclusion
Hypothesis testing is an indispensable tool in data science, allowing us to make data-driven decisions with confidence. By understanding its principles, conducting tests properly, and considering real-world applications, you can harness the power of hypothesis testing to unlock valuable insights from your data.