
An Introduction to AdaBoost

AdaBoost is one of the earliest boosting algorithms. It forms the basis of later boosting methods such as gradient boosting and XGBoost.

This tutorial walks you through the math behind the algorithm and a practical example using the scikit-learn AdaBoost API.

Contents:

  1. What is boosting?
  2. What is AdaBoost?
    • Algorithm behind AdaBoost
    • What are weighted errors?
    • How is an AdaBoost model actually trained?
    • What is a stage value?
  3. AdaBoost – In Action
  4. Conclusion

1) What is boosting?

Boosting is a general ensemble method that creates a strong classifier from a number of weak ones. We first build a weak model and then build a second model that focuses on the errors of the first. This process is repeated until the combined classifier makes accurate predictions and the overall error is minimised.

Boosting differs from bagging in that it trains the weak learners sequentially rather than in parallel. The process can be described as:
– Train a model h1 on the whole training set
– Train a model h2 with exaggerated (re-weighted) data on the regions where h1 performs poorly
– Train a model h3 with exaggerated data on the regions where h1 and h2 disagree, and so on
A rough sketch of this loop is given below.
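The following pseudocode-style Python sketch illustrates the sequential idea. WeakLearner and reweight are hypothetical placeholders used only for illustration, not a specific library's API.

# Illustrative sketch of sequential boosting (placeholders, not a real library API)
def boost(X, y, n_rounds):
    models = []
    weights = [1.0 / len(X)] * len(X)     # start with equal emphasis on every sample
    for _ in range(n_rounds):
        # `WeakLearner` and `reweight` are hypothetical placeholders
        model = WeakLearner().fit(X, y, sample_weight=weights)
        models.append(model)
        # Exaggerate the regions the current model gets wrong:
        weights = reweight(weights, model.predict(X), y)
    return models                         # the ensemble of weak learners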

2) What is AdaBoost?

AdaBoost is one of the first boosting algorithms to have been introduced. It is mainly used for classification, and the base learner (the machine learning algorithm that is boosted) is usually a decision tree with only one level, also called a decision stump.

It makes use of weighted errors to build a strong classifier from a series of weak classifiers.

Algorithm behind AdaBoost

It works in the following steps:

  1. Initially, AdaBoost assigns an equal weight to every sample in the training set.
  2. It then iteratively trains a weak learner, re-weighting the training data at each step based on how accurately the previous learner predicted each sample.
  3. Misclassified observations are given higher weights, so that the next iteration concentrates on getting them right.
  4. Each trained classifier is also assigned a weight according to its accuracy: the more accurate the classifier, the higher its weight.
  5. This process repeats until the training data is fit without error or the specified maximum number of estimators is reached.

What are weighted errors?

The basic idea behind AdaBoost is to adjust the weights of the classifiers and of the training samples at each iteration so that hard-to-classify observations end up being predicted correctly.
The weight of each sample in the first iteration is:

                                            weight(xi) = 1/n
                                            where,   
                                               xi = i'th sample 
                                               n = number of samples
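As a small sketch in NumPy (which scikit-learn already depends on), initialising the weights for n samples looks like this; n = 5 is just an example value:

import numpy as np

n = 5                              # number of training samples (example value)
weights = np.full(n, 1.0 / n)      # weight(xi) = 1/n for every sample
print(weights)                     # [0.2 0.2 0.2 0.2 0.2]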

How is an AdaBoost model actually trained?

A weak classifier (decision stump) is prepared on the training data using the weighted samples. The original AdaBoost algorithm handles only binary (two-class) classification, so each decision stump makes one decision on one input variable and outputs a +1.0 or -1.0 value for the first or second class.

The misclassification rate is calculated for the trained model.

                                    Error = sum(w(i) * terror(i)) / sum(w)

which is the weighted misclassification rate: w(i) is the weight of training sample i and terror(i) is its prediction error, equal to 1 if the sample was misclassified and 0 if it was classified correctly.

For example, suppose we have 3 samples with weights 0.01, 0.5 and 0.2. The predicted values are -1, -1 and -1, while the actual output values are -1, 1 and -1, so the terrors are 0, 1 and 0. The misclassification rate is then calculated as:

                        Error = (0.01* 0 + 0.5* 1 + 0.2* 0) / (0.01 + 0.5 + 0.2) = 0.704
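The same calculation can be reproduced in a few lines of NumPy, using the numbers from the example above:

import numpy as np

w      = np.array([0.01, 0.5, 0.2])          # sample weights
y_pred = np.array([-1, -1, -1])              # stump predictions
y_true = np.array([-1,  1, -1])              # actual labels

terror = (y_pred != y_true).astype(float)    # -> [0., 1., 0.]
error  = np.sum(w * terror) / np.sum(w)
print(round(error, 3))                       # 0.704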

What is a stage value?

A stage value is calculated for the trained model which provides a weighting for any predictions that the model makes. The stage value for a trained model is calculated as follows:

                                        stage = ln((1-error) / error)

where stage is the stage value used to weight predictions from the model, ln() is the natural logarithm and error is the misclassification error for the model. The effect of the stage weight is that more accurate models have more weight or contribution to the final prediction.
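For instance, a hypothetical weak learner with a weighted error of 0.3 would get the following stage value (a learner with an error above 0.5 would get a negative stage, so its votes would count against its own predictions):

import numpy as np

error = 0.3                              # hypothetical weighted misclassification rate
stage = np.log((1 - error) / error)      # stage = ln((1 - error) / error)
print(round(stage, 3))                   # 0.847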

The training weights are updated giving more weight to incorrectly predicted outcomes, and less weight to correctly predicted outcomes.

For example, the weight of one training instance (w) is updated using:

                                           w = w * exp(stage * terror)

Where w is the weight for a specific training instance, exp() raises the numerical constant e to the given power (stage * terror), stage is the stage value of the weak classifier and terror is the error the weak classifier made when predicting the output variable for the training instance, evaluated as:

                                        terror = 0 if(y == p), otherwise 1

Where y is the output variable for the sample xi and p is the prediction from the weak learner.

This has the effect of leaving the weight unchanged if the training instance was classified correctly and making the weight larger if the weak learner misclassified it. After the weights are updated, the next iteration begins, and this continues until the maximum number of iterations has been reached.
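Putting these pieces together, here is a minimal from-scratch sketch of the training loop, under the assumption that the labels are coded as -1 and +1 and that scikit-learn's DecisionTreeClassifier(max_depth=1) is used as the stump. It mirrors the formulas above rather than scikit-learn's exact internals (it also renormalises the weights each round, a common practical detail):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, n_rounds=50):
    # Minimal AdaBoost sketch for labels y in {-1, +1}.
    n = len(y)
    w = np.full(n, 1.0 / n)                       # weight(xi) = 1/n
    stumps, stages = [], []

    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)

        terror = (pred != y).astype(float)        # 1 if misclassified, else 0
        error = np.sum(w * terror) / np.sum(w)    # weighted misclassification rate
        if error == 0:                            # perfect fit: keep the stump and stop early
            stumps.append(stump)
            stages.append(1.0)
            break

        stage = np.log((1 - error) / error)       # stage = ln((1 - error) / error)
        w = w * np.exp(stage * terror)            # boost weights of misclassified samples
        w = w / np.sum(w)                         # renormalise so the weights sum to 1

        stumps.append(stump)
        stages.append(stage)

    return stumps, stages

def adaboost_predict(stumps, stages, X):
    # Weighted vote of all stumps; the sign gives the final class (+1 or -1).
    scores = sum(s * clf.predict(X) for s, clf in zip(stages, stumps))
    return np.sign(scores)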

The picture below gives a visual explanation of how AdaBoost works. The + and – signs are to be separated by the dotted line. For each misclassified observation, note that the weight of that observation increases for the next weak learner.

The final strong learner combines all these weak learners to classify the signs.

3) AdaBoost – In Action

You can implement AdaBoost on any classification dataset. Here we use the built-in ‘iris’ dataset from sklearn, which has 150 entries and 3 classes of flowers (50 entries per class); note that scikit-learn's AdaBoostClassifier handles multi-class targets as well as binary ones. Each entry corresponds to one iris plant. You first have to import the relevant modules from the sklearn library and then use the AdaBoostClassifier() estimator.

# Load libraries
from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets

# Import train_test_split function
from sklearn.model_selection import train_test_split

#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

You now have to split the data into training and test sets in order to make predictions and measure the accuracy of the AdaBoost model. This is done using train_test_split().

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) # 70% training and 30% test

The next step is to implement AdaBoost on the dataset we have loaded. There are three main parameters of AdaBoostClassifier(). They are:
* n_estimators – The maximum number of weak learners that AdaBoost is allowed to build.
* learning_rate – Learning rate shrinks the contribution of each classifier by learning_rate.
* base_estimator – This is the algorithm that is boosted to build the complete model. By default it is DecisionTreeClassifier(max_depth=1). The depth signifies the number of levels in the decision tree; as the depth is 1 in this case, we are working with stumps. (In scikit-learn 1.2 and later this parameter is named estimator; see the sketch just below.)
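As a quick illustration of that last parameter, the default stump can also be passed explicitly. This is just a sketch; pick the parameter name that matches your scikit-learn version.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Equivalent to the defaults: an explicit one-level decision tree (stump).
# Note: scikit-learn 1.2+ names this parameter `estimator`;
# older versions use `base_estimator` instead.
stump_model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=1)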

# Create AdaBoost classifier object
AdaBoostClf = AdaBoostClassifier(n_estimators=50,
                         learning_rate=1)
# Train AdaBoost classifier
model = AdaBoostClf.fit(X_train, y_train)

#Predict the response for test dataset
y_pred = model.predict(X_test)

The final test of our model is finding the accuracy of its predictions. We compare y_pred and y_test.

# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
Accuracy: 0.9333333333333333

The accuracy achieved here is about 93 percent (the exact figure depends on the random train/test split), which is a good score for a basic boosting algorithm like AdaBoost. This demonstrates how effective boosting is at improving the performance of even simple algorithms such as decision trees.
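Because the score depends on the particular random split, a quick cross-validation run gives a more stable estimate. This is just a sketch, and the resulting numbers will vary:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation on the full iris dataset
cv_scores = cross_val_score(AdaBoostClassifier(n_estimators=50, learning_rate=1),
                            X, y, cv=5)
print("Mean CV accuracy:", cv_scores.mean())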

