How to implement common statistical significance tests and find the p value?

Statistical Significance Tests In R

How to implement and interpret the commonly used statistical significance tests in R? Understand the purpose, when to use and how to interpret the test results and the p value.

Statistical significance tests and p value. Photo by Austin Neill.

Contents

1. Correlation Test and Introduction to p value
2. One Sample t-Test
3. Wilcoxon Signed Rank Test
4. Two Sample t-Test and Wilcoxon Rank Sum Test
5. Shapiro Test
6. Kolmogorov And Smirnov Test
7. Fisher’s F-Test
8. Chi-Squared Test
9. More Commonly Used Tests

You may have come across arbitrary statistical claims in newspapers and magazines. Have you ever wondered if they are really true? Even if you are provided with the data behind the claims, how do you conclude if such claims are real (or not)?

Well, statistical significance tests can help you with that. Beyond newspaper claims, they are widely used in industrial, technological and scientific applications as well.

1. Correlation Test and Introduction to p value

Why is it used?

To test the linear relationship between two continuous variables.

The cor.test() function computes the correlation between two continuous variables and tests whether it differs significantly from zero. The null hypothesis is that the true correlation between x and y is zero.

cor.test(x, y) # where x and y are numeric vectors.
cor.test(cars$speed, cars$dist)
#> Pearson's product-moment correlation
#> 
#> data:  cars$speed and cars$dist
#> t = 9.464, df = 48, p-value = 1.49e-12
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>   0.6816422 0.8862036
#> sample estimates:
#>       cor 
#> 0.8068949

So what is p value and why is it needed?

The p-value is, roughly, the probability that an uncorrelated system would produce a dataset with a correlation at least as extreme as the one computed.

That’s probably hard to read. Let me try to make it a bit easier.

Let’s take two datasets, one with 10 observations and the other with 1000.

A correlation of 0.8 on the dataset with 1000 observations does not carry the same weight as a correlation of 0.8 on 10 observations, because as you add more data, the correlation may come down (or go up). It could simply be that the 10 observations chosen happened to be strongly correlated compared to the rest of the population.

So the computed metric (in this case, correlation) needs a measure of ‘trust’, to determine if it is statistically significant. This applies to not just correlation but to any metric like the mean, the difference of means, etc.

In simpler terms, the lower the p-value, the less likely it is that a correlation this strong would arise by chance alone if there were no real relationship. As a result, the p-value has to be very low in order for us to trust the calculated metric. The lower the p-value (typically below 0.05 or 0.01), the stronger the evidence against the null hypothesis.

Also remember, the p-value is not an indicator of the strength of the relationship, just the statistical significance. The strength is measured by the correlation itself.
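To make the effect of sample size concrete, here is a minimal sketch that computes the two-sided p-value of a Pearson correlation from r and n, using the standard t statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The helper name cor_p_value is made up for illustration and is not part of base R.

```r
# Hypothetical helper: two-sided p-value for a Pearson correlation r
# observed on n pairs, under the null hypothesis of zero correlation.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

# The same correlation is weak evidence with 10 points, strong with 1000
p_small <- cor_p_value(0.5, 10)
p_large <- cor_p_value(0.5, 1000)
print(c(n_10 = p_small, n_1000 = p_large))

# Cross-check against cor.test() on the built-in cars data
r_cars   <- cor(cars$speed, cars$dist)
p_manual <- cor_p_value(r_cars, nrow(cars))
p_cortest <- cor.test(cars$speed, cars$dist)$p.value
print(c(manual = p_manual, cor.test = p_cortest))
```

The two values on the last line should agree, which suggests that the p-value reported by cor.test() depends only on the correlation and the number of observations.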

How to interpret?

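If the p-Value is less than 0.05, we reject the null hypothesis that the true correlation is zero (i.e. that there is no linear relationship between the two variables). So in this case, we reject the null hypothesis and conclude that dist and speed are linearly correlated. Note that the test establishes an association, not which variable depends on which.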

2. One Sample t-Test

Example claim: Average height of Ethiopian male rabbit is 1.5ft

Why is it used?

One sample t-Test is a commonly used significance test that checks whether the mean of a sample from a normal distribution could reasonably be a specific value.

It is a parametric test, which means there is an underlying assumption that the sample you are testing is from a probability distribution, like the normal distribution.

If no such assumption is made, you may use the Wilcoxon signed rank test, a non-parametric test discussed in the next section.

set.seed(100)
x <- rnorm(50, mean = 10, sd = 0.5)  # draw from a normal distribution
t.test(x, mu = 10)  # test whether the mean of x could be 10
#> One Sample t-test
#> 
#> data:  x
#> t = 0.70372, df = 49, p-value = 0.4849
#> alternative hypothesis: true mean is not equal to 10
#> 95 percent confidence interval:
#>   9.924374 10.157135
#> sample estimates:
#> mean of x 
#>  10.04075

How to interpret?

In the above case, the p-Value is not less than the significance level of 0.05, therefore the null hypothesis that the mean is 10 cannot be rejected.

Also, note that the 95% confidence interval includes the value 10. So the data are consistent with the mean of ‘x’ being 10, especially since ‘x’ is assumed to be normally distributed. If a normal distribution cannot be assumed, use the Wilcoxon signed rank test shown in the next section.

Note: Use conf.level argument to adjust the confidence level.
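As a rough illustration of what t.test() reports, the sketch below recomputes the t statistic and p-value by hand and then widens the interval with conf.level. The names mu0, t_manual, p_manual, res95 and res99 are chosen here for illustration only.

```r
set.seed(100)
x <- rnorm(50, mean = 10, sd = 0.5)
mu0 <- 10  # hypothesized mean

# t statistic by hand: distance of the sample mean from mu0, in standard errors
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
p_manual <- 2 * pt(-abs(t_manual), df = length(x) - 1)

res95 <- t.test(x, mu = mu0)                     # default conf.level = 0.95
res99 <- t.test(x, mu = mu0, conf.level = 0.99)  # wider interval

print(c(manual = t_manual, t.test = unname(res95$statistic)))
print(res95$conf.int)
print(res99$conf.int)
```

Raising conf.level changes only the width of the interval; the t statistic and the p-value stay the same.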

3. Wilcoxon Signed Rank Test

Example claim: Mean unit sales per day at ‘SparesMart’ – a local spare parts retailer is 24 units per day.

Why / When is it used?

Wilcoxon signed rank test is used to test if the location (median) of a sample can reasonably be a specific value when a normal distribution is not assumed.

It can be an alternative to the t-Test, especially when the data sample is not assumed to follow a normal distribution. It is a non-parametric method used to test if the location of the sample differs from a hypothesized value.

numeric_vector <- c(20, 29, 24, 19, 20, 22, 28, 23, 19, 19)
wilcox.test(numeric_vector, mu=20, conf.int = TRUE)
#>  Wilcoxon signed rank test with continuity correction
#>
#> data:  numeric_vector
#> V = 30, p-value = 0.1056
#> alternative hypothesis: true location is not equal to 20
#> 90 percent confidence interval:
#>  19.00006 25.99999
#> sample estimates:
#> (pseudo)median 
#>       23.00002

How to interpret?

If p-Value < 0.05, reject the null hypothesis and accept the alternative mentioned in your R code’s output. Type example(wilcox.test) in R console for more illustration.
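The V statistic in the output above can be reproduced by hand, which may help in seeing what the test measures. This is a minimal sketch using the same data and mu = 20; the variable names d, r and V are illustrative.

```r
numeric_vector <- c(20, 29, 24, 19, 20, 22, 28, 23, 19, 19)
mu0 <- 20

d <- numeric_vector - mu0  # differences from the hypothesized location
d <- d[d != 0]             # observations equal to mu0 are dropped
r <- rank(abs(d))          # rank the absolute differences (ties get average ranks)
V <- sum(r[d > 0])         # sum of the ranks belonging to positive differences
print(V)                   # 30, matching the output above

# exact = FALSE requests the normal approximation explicitly,
# since exact p-values are not available with ties or zeros
res <- wilcox.test(numeric_vector, mu = mu0, exact = FALSE)
print(unname(res$statistic))
```

A V close to zero or close to its maximum (here 36, the sum of ranks 1 to 8) would indicate that most observations fall on one side of the hypothesized value.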

4. Two Sample t-Test and Wilcoxon Rank Sum Test

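Both the t-Test and the Wilcoxon rank sum test can be used to compare the locations of 2 samples: the t-Test compares means, while the Wilcoxon rank sum test checks for a shift in location. The difference is that the t-Test assumes the samples being tested are drawn from a normal distribution, while the Wilcoxon rank sum test does not.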

How to implement in R?

Pass the two numeric vector samples into t.test() when the samples are assumed to be normally distributed, and into wilcox.test() when they are not.

x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
wilcox.test(x, y, alternative = "g")  # g for greater
#=> Wilcoxon rank sum test
#=> 
#=> data:  x and y
#=> W = 35, p-value = 0.1272
#=> alternative hypothesis: true location shift is greater than 0

With a p-Value of 0.1272, we cannot reject the null hypothesis that x and y have the same location.

t.test(1:10, y = c(7:20)) # P = .00001855
#=> Welch Two Sample t-test
#=> 
#=> data:  1:10 and c(7:20)
#=> t = -5.4349, df = 21.982, p-value = 1.855e-05
#=> alternative hypothesis: true difference in means is not equal to 0
#=> 95 percent confidence interval:
#=>   -11.052802  -4.947198
#=> sample estimates:
#=> mean of x mean of y 
#=>       5.5      13.5

With p-Value < 0.05, we can safely reject the null hypothesis that there is no difference in mean.

What if we want to do a 1-to-1 comparison of means for values of x and y?

Use paired = TRUE for 1-to-1 comparison of observations. Both vectors must then have the same length, unlike the x and y defined above.

t.test(x, y, paired = TRUE) # when observations are paired, use 'paired' argument.
wilcox.test(x, y, paired = TRUE) # both x and y are assumed to have similar shapes
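Here is a small sketch of a paired comparison on equal-length vectors. The before and after measurements are invented for illustration. It also shows that a paired t-test is the same as a one sample t-test on the differences.

```r
# Hypothetical measurements on the same 8 subjects, before and after a treatment
before <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.2, 12.0)
after  <- c(12.9, 11.8, 13.1, 13.5, 12.4, 12.6, 13.9, 12.7)

paired_res <- t.test(after, before, paired = TRUE)
diff_res   <- t.test(after - before, mu = 0)  # same test, written on the differences

print(c(paired = paired_res$p.value, on_differences = diff_res$p.value))

# Non-parametric counterpart; exact = FALSE because the differences contain ties
wilcox_res <- wilcox.test(after, before, paired = TRUE, exact = FALSE)
print(wilcox_res$p.value)
```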

When can I conclude that the means are different?

Conventionally, if the p-Value is less than the significance level (typically 0.05), reject the null hypothesis that both means are equal.

5. Shapiro Test

Why is it used?

To test if a sample follows a normal distribution.

shapiro.test(numericVector) # Does numericVector follow a normal disbn?

Let’s see how to do the test on a sample from a normal distribution.

# Example: Test a normal distribution
set.seed(100)
normaly_disb <- rnorm(100, mean=5, sd=1) # generate a normal distribution
shapiro.test(normaly_disb)  # the shapiro test.
#> Shapiro-Wilk normality test
#>
#> data:  normaly_disb
#> W = 0.98836, p-value = 0.535

How to interpret?

The null hypothesis here is that the sample being tested is normally distributed. Since the p-Value is not less than the significance level of 0.05, we don’t reject the null hypothesis. Therefore, there is no evidence that the tested sample departs from a normal distribution (though we already know that it is normal!).

# Example: Test a uniform distribution
set.seed(100)
not_normaly_disb <- runif(100)  # uniform distribution.
shapiro.test(not_normaly_disb)
#> Shapiro-Wilk normality test
#>
#> data:  not_normaly_disb
#> W = 0.96509, p-value = 0.009436

How to interpret?

If p-Value is less than the significance level of 0.05, the null-hypothesis that it is normally distributed can be rejected, which is the case here.
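The decision rule can be wrapped in a small helper, sketched below. The function name looks_normal and its alpha argument are hypothetical, not part of R. Keep in mind that a TRUE result only means normality was not rejected, not that it was proven.

```r
# Hypothetical helper: TRUE when the Shapiro-Wilk test does not reject
# normality at significance level alpha
looks_normal <- function(v, alpha = 0.05) {
  shapiro.test(v)$p.value >= alpha
}

set.seed(100)
normal_sample <- rnorm(100, mean = 5, sd = 1)  # same sample as the first example

set.seed(100)
uniform_sample <- runif(100)                   # same sample as the second example

print(looks_normal(normal_sample))
print(looks_normal(uniform_sample))
```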

6. Kolmogorov And Smirnov Test

Kolmogorov-Smirnov test is used to check whether 2 samples follow the same distribution.

ks.test(x, y) # x and y are two numeric vectors

When x and y are from different distributions

# From different distributions
x <- rnorm(50)
y <- runif(50)
ks.test(x, y)  # perform ks test
#> Two-sample Kolmogorov-Smirnov test
#> 
#> data:  x and y
#> D = 0.58, p-value = 4.048e-08
#> alternative hypothesis: two-sided

When both x and y are from normal distribution.

# Both from normal distribution
x <- rnorm(50)
y <- rnorm(50)
ks.test(x, y)  # perform ks test
#> Two-sample Kolmogorov-Smirnov test
#> 
#> data:  x and y
#> D = 0.18, p-value = 0.3959
#> alternative hypothesis: two-sided

How to tell if they are from the same distribution?

If p-Value < 0.05 (significance level), we reject the null hypothesis that they are drawn from the same distribution. In other words, p < 0.05 implies that x and y come from different distributions.
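The D statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below recomputes it by hand for a seeded example; the seed and the names pooled and D_manual are arbitrary choices made here so the run is reproducible.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- rnorm(50)
y <- runif(50)

# Evaluate both empirical CDFs at every observed point and take the largest gap
pooled   <- sort(c(x, y))
D_manual <- max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))

ks_res <- ks.test(x, y)
print(c(manual = D_manual, ks.test = unname(ks_res$statistic)))
print(ks_res$p.value)
```

Because the examples above do not set a seed, rerunning them will give slightly different D and p-values each time.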

7. Fisher’s F-Test

Fisher’s F test can be used to check if two samples have the same variance.

var.test(x, y)  # Do x and y have the same variance?

Alternatively fligner.test() and bartlett.test() can be used for the same purpose.
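A short sketch of the three calls on simulated data follows. The samples, their standard deviations and the seed are made up for illustration. The F statistic reported by var.test() is simply the ratio of the two sample variances.

```r
set.seed(100)  # arbitrary seed, only for reproducibility
x <- rnorm(40, mean = 0, sd = 1)
y <- rnorm(40, mean = 0, sd = 3)  # deliberately more spread out

f_res <- var.test(x, y)
print(c(manual = var(x) / var(y), var.test = unname(f_res$statistic)))
print(f_res$p.value)

# The alternatives accept a list of samples
b_res <- bartlett.test(list(x, y))
fl_res <- fligner.test(list(x, y))
print(c(bartlett = b_res$p.value, fligner = fl_res$p.value))
```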

8. Chi-Squared Test

Chi-squared test in R can be used to test if two categorical variables are dependent, by means of a contingency table.

Example use case: You may want to figure out if big budget films become box-office hits. We have 2 categorical variables (Budget of film, Success Status), each with 2 levels (Big/Low budget and Hit/Flop), which form a 2 x 2 table.

chisq.test(table(categorical_X, categorical_Y), correct = FALSE)  # Yates continuity correction not applied
#or
summary(table(categorical_X, categorical_Y)) # performs a chi-squared test.
# Sample results
#> Pearson's Chi-squared test
#> data:  M
#> X-squared = 30.0701, df = 2, p-value = 2.954e-07

How to tell if x, y are independent?

There are two ways to tell if they are independent:

By looking at the p-Value: If the p-Value is less than 0.05, we reject the null hypothesis that x and y are independent. So for the example output above (p-Value=2.954e-07), we reject the null hypothesis and conclude that x and y are not independent.

From the Chi-squared value: For 2 x 2 contingency tables, which have 1 degree of freedom (d.o.f), if the calculated Chi-Squared statistic is greater than 3.841 (the critical value at the 0.05 level), we reject the null hypothesis that the variables are independent. To find the critical value for larger contingency tables, use qchisq(0.95, (r-1)*(c-1)), where r and c are the number of rows and columns in the table.
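Both decision routes can be tried on a small table. The counts below are invented for the film example and do not come from real data; the names films, res, dof and critical are illustrative.

```r
# Hypothetical counts: rows are budget levels, columns are outcomes
films <- matrix(c(30, 10,
                  15, 45),
                nrow = 2, byrow = TRUE,
                dimnames = list(Budget  = c("Big", "Low"),
                                Outcome = c("Hit", "Flop")))

res <- chisq.test(films, correct = FALSE)  # no Yates continuity correction

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof      <- (nrow(films) - 1) * (ncol(films) - 1)
critical <- qchisq(0.95, dof)              # about 3.841 for a 2 x 2 table

print(res$statistic)
print(critical)
print(res$p.value)
```

The two routes are equivalent: the statistic exceeds the critical value exactly when the p-value falls below 0.05.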

9. More Commonly Used Tests

# Fisher's exact test to test independence of rows and columns in contingency table
fisher.test(contingencyMatrix, alternative = "greater")  
friedman.test()  # Friedman's rank sum non-parametric test 
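For a feel of how these two calls are used, here is a minimal sketch with invented data. The films table and the ratings matrix are hypothetical. friedman.test() on a matrix treats rows as blocks (here, judges) and columns as treatments (here, products).

```r
# Hypothetical 2 x 2 table of counts
films <- matrix(c(30, 10,
                  15, 45),
                nrow = 2, byrow = TRUE,
                dimnames = list(Budget  = c("Big", "Low"),
                                Outcome = c("Hit", "Flop")))

# One-sided test: is the odds ratio greater than 1?
fi_res <- fisher.test(films, alternative = "greater")
print(fi_res$p.value)

# Hypothetical ratings: 6 judges (rows) each score 3 products (columns)
ratings <- matrix(c(5, 7,  9,
                    4, 6,  8,
                    6, 7, 10,
                    5, 8,  9,
                    3, 6,  7,
                    4, 5,  9),
                  ncol = 3, byrow = TRUE)

fr_res <- friedman.test(ratings)
print(fr_res$statistic)
print(fr_res$p.value)
```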

There are more useful tests available in various other packages.

The package lawstat has a good collection. The outliers package has a number of tests for the presence of outliers.