Measures of Central Tendency – A Clear Guide with Examples

When diving into the world of statistics, you’ll frequently come across the term “measures of central tendency”. But what exactly does it mean, and why is it so important? Let’s break it down, step by step, with practical examples to drive the point home.

In this blog post we will learn:

  1. What Are Measures of Central Tendency?
  2. Why Are Measures of Central Tendency Important?
  3. Mean (Average)
    3.1. Arithmetic Mean for Ungrouped Data
    3.2. Arithmetic Mean for Grouped Data
    3.3. Geometric Mean
    3.4. Harmonic Mean
  4. Median
    4.1. Median for Ungrouped Data
    4.2. Median for Grouped Data
  5. Mode
    5.1. Mode for Ungrouped Data
    5.2. Mode for Grouped Data
  6. Conclusion

1. What Are Measures of Central Tendency?

Measures of central tendency give us a single value that attempts to describe a set of data by identifying its central position. Think of them as a way to summarize large amounts of data with one representative number.

There are three primary measures of central tendency:

1. Mean

2. Median

3. Mode

2. Why Are Measures of Central Tendency Important?

  1. Simplicity: They offer a quick snapshot of your data set. For instance, if someone asks about the typical score on a test, telling them the mean gives a good idea.

  2. Comparisons: You can easily compare two or more data sets. If the mean temperature in July is 78°F and in August is 85°F, you quickly realize August was warmer on average.

  3. Foundations for Other Analyses: Many statistical tests and models require the use of the mean or other central measures as part of their computations.


3. Mean (Average)

3.1. Arithmetic Mean for Ungrouped Data

The mean, often referred to as the average, is the sum of all the data points divided by the number of data points.

How to Calculate:
$ \text{Mean} = \frac{\text{Sum of all data points}}{\text{Number of data points}} $

Formula:
$ \text{Arithmetic Mean (AM)} = \frac{\sum_{i=1}^{n} x_i}{n} $

Where:
– $ x_i $ represents each number in the collection
– $ n $ is the total number of values

Example: Consider the test scores of five students: 85, 89, 91, 78, and 90.

$ \text{Mean} = \frac{85+89+91+78+90}{5} = \frac{433}{5} = 86.6 $

The mean score of the students is 86.6.

def arithmetic_mean(numbers):
    return sum(numbers) / len(numbers)

scores = [85, 89, 91, 78, 90]
print(f"The mean score is: {arithmetic_mean(scores)}")
The mean score is: 86.6

Here's how you can compute the mean using the numpy library in Python:

import numpy as np

scores = [85, 89, 91, 78, 90]
mean = np.mean(scores)
print(f"The mean score is: {mean}")

The mean score is: 86.6

3.2. Arithmetic Mean for Grouped Data

What is Grouped Data?

Grouped data is data organized into classes or intervals. This presentation simplifies large datasets, making them easier to analyze. The computation of mean, median, and mode for grouped data differs from ungrouped data.

Grouped Data Example:

Classes    Frequencies
10-20      5
20-30      10
30-40      20
40-50      15
For grouped data, the mean is given by:

$ \text{Mean} = \frac{\sum (f \times x)}{N} $

Where:
– $ f $ is the frequency of each group.
– $ x $ is the midpoint of each group.
– $ N $ is the total number of observations, which is the sum of all frequencies.

def grouped_mean(classes, frequencies):
    midpoints = [(low + high) / 2 for (low, high) in classes]
    total_data_points = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, midpoints)) / total_data_points

# Example data
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [5, 10, 20, 15]

print(f"Grouped Mean: {grouped_mean(classes, frequencies)}")
Grouped Mean: 34.0
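
Worked by hand for the example table, the class midpoints are 15, 25, 35, and 45, and the total number of observations is 5 + 10 + 20 + 15 = 50:

$ \text{Mean} = \frac{5 \times 15 + 10 \times 25 + 20 \times 35 + 15 \times 45}{50} = \frac{75 + 250 + 700 + 675}{50} = \frac{1700}{50} = 34 $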

3.3. Geometric Mean

The geometric mean provides an average value that indicates the central tendency of a set of numbers by using the product of their values. It’s particularly helpful when values combine by multiplication rather than addition, as growth rates do.

Formula:
$ \text{Geometric Mean (GM)} = \sqrt[n]{\prod_{i=1}^{n} x_i} $

Example:
Consider the numbers 2 and 8:
$ GM = \sqrt[2]{2 \times 8} = 4 $

import math

def geometric_mean(numbers):
    product = 1
    for num in numbers:
        product *= num
    return math.pow(product, 1/len(numbers))

scores = [85, 89, 91, 78, 90]
geometricMean = geometric_mean(scores)
print(f"The geometric mean is: {geometricMean}")
The geometric mean is: 86.4644309042832

Alternatively, you can use the scipy library, which provides a function for the geometric mean:

from scipy.stats import gmean

scores = [85, 89, 91, 78, 90]
# Use a distinct variable name so the geometric_mean() function defined above is not overwritten
gm_value = gmean(scores)
print(f"The geometric mean is: {gm_value}")

The geometric mean is: 86.4644309042832
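
To illustrate the growth-rate use case mentioned above, here is a small sketch. The yearly rates and the helper name `average_growth_rate` are hypothetical, and `math.prod` assumes Python 3.8 or newer. The idea is to convert each rate to a growth factor, take the geometric mean of the factors, and convert back to a rate.

```python
import math

def average_growth_rate(rates):
    """Return the constant per-period rate equivalent to the given rates."""
    factors = [1 + r for r in rates]  # e.g. 10% growth -> factor 1.10
    gm = math.prod(factors) ** (1 / len(factors))
    return gm - 1

# Hypothetical yearly growth rates: +10%, +20%, -10%
rates = [0.10, 0.20, -0.10]
avg = average_growth_rate(rates)
print(f"Average growth rate per year: {avg:.4f}")

# Compounding the average rate reproduces the overall growth
overall = math.prod(1 + r for r in rates)
print(math.isclose((1 + avg) ** len(rates), overall))  # True
```

The arithmetic mean of the same rates would overstate the result, because it ignores compounding.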

3.4. Harmonic Mean

The harmonic mean is an important concept when we’re looking at rates or proportions. Instead of giving equal weight to all values in a dataset, the harmonic mean gives more weight to smaller values.

Formula:
$ \text{Harmonic Mean (HM)} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} $

Where:
– $ n $ is the total number of values
– $ x_i $ represents each number in the collection

Example:
For the numbers 2, 3, and 4:
$ HM = \frac{3}{\frac{1}{2} + \frac{1}{3} + \frac{1}{4}} = \frac{3}{\frac{13}{12}} = \frac{36}{13} \approx 2.77 $

def harmonic_mean(numbers):
    return len(numbers) / sum(1/num for num in numbers)

values = [85, 89, 91, 78, 90]
print(harmonic_mean(values))
86.32403550081507

Alternatively, you can use the scipy library, which provides a function for the harmonic mean:

from scipy.stats import hmean

scores = [85, 89, 91, 78, 90]
# Use a distinct variable name so the harmonic_mean() function defined above is not overwritten
hm_value = hmean(scores)
print(f"The harmonic mean is: {hm_value}")
The harmonic mean is: 86.32403550081507
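
As an illustration of why the harmonic mean suits rates, consider a hypothetical trip in which the same distance is driven at two different speeds. The speeds and distance below are invented for the example. The sketch uses `statistics.harmonic_mean` from the standard library and checks it against total distance divided by total time.

```python
from statistics import harmonic_mean as stats_harmonic_mean

# Hypothetical trip: the same distance driven at 40 km/h, then at 60 km/h
speeds = [40, 60]
distance = 120  # km per leg, chosen so the arithmetic is easy to check

total_time = sum(distance / s for s in speeds)      # 3 h + 2 h
true_average = len(speeds) * distance / total_time  # 240 km / 5 h

print(f"Harmonic mean of speeds: {stats_harmonic_mean(speeds)}")
print(f"Total distance / total time: {true_average}")
print(f"Arithmetic mean of speeds: {sum(speeds) / len(speeds)}")
```

The harmonic mean matches the true average speed of 48 km/h, while the arithmetic mean of 50 km/h is too high because more time is spent at the slower speed.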

Properties of Mean:

  1. It uses all values in the dataset.
  2. The mean is affected by extreme values (outliers). An outlier can skew the mean.
  3. If you add or subtract a constant from every value, the mean will increase or decrease by the same constant.
  4. If you multiply or divide every value by a constant, the mean will also be multiplied or divided by that constant.
  5. The sum of the deviations of each value from the mean is always zero.
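
The properties above can be checked numerically. The following is a minimal sketch that reuses the test scores from earlier; the outlier score of 10 is a made-up value added only to show the effect.

```python
def mean(values):
    return sum(values) / len(values)

scores = [85, 89, 91, 78, 90]
m = mean(scores)

# Property 5: deviations from the mean sum to zero (up to rounding error)
deviation_sum = sum(x - m for x in scores)

# Property 3: adding a constant shifts the mean by that constant
shifted_mean = mean([x + 5 for x in scores])

# Property 4: multiplying by a constant scales the mean by that constant
scaled_mean = mean([x * 2 for x in scores])

# Property 2: a single extreme value pulls the mean toward it
with_outlier = mean(scores + [10])  # hypothetical outlier score of 10

print(abs(deviation_sum) < 1e-9)
print(m, shifted_mean, scaled_mean, with_outlier)
```

Because floating-point arithmetic is inexact, the deviation sum is compared against a small tolerance rather than exactly zero.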

4. Median

4.1. Median for Ungrouped Data

The median is the middle value of an ordered data set. If the data set has an odd number of observations, the median is the middle number. If it has an even number of observations, the median is the average of the two middle numbers.

How to Find the Median:

  • Arrange the data points in ascending order.
  • If there’s an odd number of data points, the middle one is the median.
  • If there’s an even number of data points, average the two middle ones.

Example: Using the same student scores: 85, 89, 91, 78, and 90.

Ordered set: 78, 85, 89, 90, 91

The median is 89 since it’s the third (or middle) score in the ordered set.

Properties of Median:

  1. It divides the data set into two equal parts. 50% of the data lies below the median and 50% above the median.
  2. The median is not affected by extreme values (outliers). Even if you change the value of an outlier, the median will remain the same as long as the order of the data does not change.
  3. For a dataset with an odd number of observations, the median is the center value.
  4. For a dataset with an even number of observations, the median is the average of the two center values.

def calculate_median(scores):
    # Sort the scores
    sorted_scores = sorted(scores)

    # Find the number of scores
    n = len(sorted_scores)

    # Check if even or odd
    if n % 2 == 1:  # Odd number of scores
        return sorted_scores[n // 2]
    else:  # Even number of scores
        left_of_middle = sorted_scores[(n - 1) // 2]
        right_of_middle = sorted_scores[n // 2]
        return (left_of_middle + right_of_middle) / 2

scores = [85, 89, 91, 78, 90]
median = calculate_median(scores)
print(f"The median score is: {median}")

The median score is: 89

Here’s a method utilizing the numpy library, which offers a built-in function to compute the median:

import numpy as np

scores = [85, 89, 91, 78, 90]
median = np.median(scores)
print(f"The median score is: {median}")
The median score is: 89.0
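
Two of the points above, the even-count rule and the robustness to outliers, can be seen in a short sketch. It uses `statistics.median` from the standard library; the sixth score and the extreme value of 910 are hypothetical values added for illustration.

```python
from statistics import median

# Even number of observations: average the two middle values
even_scores = [85, 89, 91, 78, 90, 95]  # hypothetical sixth score added
print(median(even_scores))  # sorted: 78, 85, 89, 90, 91, 95 -> 89.5

# Robustness: replace the top score with an extreme value
scores = [85, 89, 91, 78, 90]
scores_with_outlier = [85, 89, 910, 78, 90]

print(median(scores), median(scores_with_outlier))  # 89 89
print(sum(scores) / len(scores))                    # mean before the change
print(sum(scores_with_outlier) / len(scores_with_outlier))  # mean after
```

The median stays at 89 in both cases, while the mean jumps sharply once the extreme value is present.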

4.2. Median for Grouped Data

To compute the median for grouped data:
1. Identify the median class, which is the first class where the cumulative frequency reaches or exceeds $ N/2 $.
2. Use the formula:
$ \text{Median} = L + \left( \frac{\frac{N}{2} - CF}{f} \right) \times w $
Where:
– $ L $ is the lower class boundary of the median class.
– $ CF $ is the cumulative frequency of the class before the median class.
– $ f $ is the frequency of the median class.
– $ w $ is the width of the median class.

def grouped_median(classes, frequencies):
    total_data_points = sum(frequencies)
    median_point = total_data_points / 2

    cum_freq = 0
    for i, freq in enumerate(frequencies):
        cum_freq += freq
        if cum_freq >= median_point:
            lower_bound, upper_bound = classes[i]
            # Cumulative frequency of all classes before the median class
            # (this is 0 when the median class is the first class)
            cum_freq_before_median_class = cum_freq - freq
            class_width = upper_bound - lower_bound
            return lower_bound + ((median_point - cum_freq_before_median_class) / freq) * class_width

# Example data
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [5, 10, 20, 15]

print(f"Grouped Median: {grouped_median(classes, frequencies)}")
Grouped Median: 35.0
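
Worked by hand for the same table: $ N = 50 $, so $ N/2 = 25 $. The cumulative frequencies are 5, 15, 35, and 50, so the median class is 30-40, with $ L = 30 $, $ CF = 15 $, $ f = 20 $, and $ w = 10 $:

$ \text{Median} = 30 + \left( \frac{25 - 15}{20} \right) \times 10 = 30 + 5 = 35 $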

5. Mode

5.1. Mode for Ungrouped Data

The mode refers to the number that appears most frequently in a data set.

How to Identify the Mode: Simply identify which data point or points occur most frequently.

Example: Consider the shoe sizes of seven individuals: 7, 7, 8, 8, 8, 9, 10.

The shoe size 8 appears three times, more than any other size, making 8 the mode.

Properties of Mode:

  1. A dataset may have no mode if no value is repeated.
  2. A dataset can have one mode (unimodal), two modes (bimodal), or multiple modes (multimodal).
  3. The mode is not necessarily unique, meaning a dataset can have more than one mode.
  4. The mode is the only measure of central tendency that can be used with nominal data (data that can be categorized but not ordered or quantified).
  5. The mode is not affected by extreme values (outliers) unless the frequency of the outliers surpasses that of the current mode.

Here's how you can compute the mode using Python:

from collections import Counter

def calculate_mode(numbers):
    # Count the occurrences of each number
    count = Counter(numbers)

    # Find the maximum frequency
    max_freq = max(count.values())

    # Extract numbers that have the maximum frequency
    modes = [num for num, freq in count.items() if freq == max_freq]

    return modes

numbers = [7, 7, 8, 8, 8, 9, 10]
mode = calculate_mode(numbers)
print(f"The mode is: {mode}")

The mode is: [8]

Here's how you can compute the mode using the statistics library in Python:

import statistics

numbers = [7, 7, 8, 8, 8, 9, 10]
mode = statistics.mode(numbers)
print(f"The mode is: {mode}")
The mode is: 8
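
The properties listed earlier mention multimodal and nominal data. Here is a brief sketch of both cases using the standard library, assuming Python 3.8 or newer (where `statistics.multimode` exists and `statistics.mode` returns the first mode encountered when there is a tie). The shoe sizes and colors are made-up sample data.

```python
from statistics import mode, multimode

# Bimodal data: sizes 7 and 8 both appear three times (hypothetical data)
sizes = [7, 7, 7, 8, 8, 8, 9]
print(multimode(sizes))  # every value tied for the highest count: [7, 8]
print(mode(sizes))       # a single value only: the first mode encountered

# Nominal data: categories have no order, but the mode is still defined
colors = ["red", "blue", "red", "green", "red", "blue"]
print(mode(colors))      # red

# No repeated value: multimode returns every value
print(multimode([1, 2, 3]))  # [1, 2, 3]
```

When no value repeats, `multimode` returns all values, as does the `calculate_mode` function above. It is up to the caller to interpret that result as "no mode".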

5.2. Mode for Grouped Data

The modal class is the class with the highest frequency. The mode is then estimated using:
$ \text{Mode} = L + \left( \frac{d_1}{d_1 + d_2} \right) \times w $
Where:
– $ L $ is the lower class boundary of the modal class.
– $ d_1 $ is the difference between the frequency of the modal class and the previous class.
– $ d_2 $ is the difference between the frequency of the modal class and the next class.
– $ w $ is the width of the modal class.

Example:

Given the data:

Classes: 10-20, 20-30, 30-40, 40-50

Frequencies: 5, 10, 20, 15

def grouped_mode(classes, frequencies):
    mode_class_index = frequencies.index(max(frequencies))
    lower_bound, _ = classes[mode_class_index]
    class_width = classes[mode_class_index][1] - classes[mode_class_index][0]

    if mode_class_index == 0:
        d1 = frequencies[0] - 0
    else:
        d1 = frequencies[mode_class_index] - frequencies[mode_class_index-1]

    if mode_class_index == len(frequencies) - 1:
        d2 = frequencies[-1] - 0
    else:
        d2 = frequencies[mode_class_index] - frequencies[mode_class_index+1]

    return lower_bound + (d1 / (d1 + d2)) * class_width

# Example data
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [5, 10, 20, 15]

print(f"Grouped Mode: {grouped_mode(classes, frequencies)}")
Grouped Mode: 36.666666666666664
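
Worked by hand for the same table: the modal class is 30-40 (frequency 20), so $ L = 30 $ and $ w = 10 $. The difference from the previous class is $ 20 - 10 = 10 $ and the difference from the next class is $ 20 - 15 = 5 $:

$ \text{Mode} = 30 + \left( \frac{10}{10 + 5} \right) \times 10 = 30 + \frac{100}{15} \approx 36.67 $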

6. Conclusion

Measures of central tendency, including the mean, median, and mode, provide a way to summarize and understand complex data sets. They give us insights, allow for comparisons, and form the basis for many advanced statistical procedures. As you delve deeper into statistics, you’ll appreciate the foundational knowledge of these central measures and their significance in data interpretation.
