# Machine Learning

## KL Divergence – What is it and mathematical details explained

At its core, KL (Kullback-Leibler) Divergence is a statistical measure that quantifies the dissimilarity between two probability distributions. Think of it like a mathematical ruler that tells us the “distance” or difference between two probability distributions. Remember, in data science, we’re often working with probabilities – the chances of events happening. So, if we have …
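As a rough illustration of the measure described above, here is a minimal sketch for two discrete distributions. It assumes the standard definition KL(P || Q) = Σ p·log(p/q) with the natural logarithm. The function name `kl_divergence` and the example probabilities are hypothetical, chosen only for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions given as lists of probabilities."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p = 0 contribute nothing, by convention
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total

# Hypothetical example: a fair coin versus a heavily biased coin
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))
print(kl_divergence(q, p))
```

The two printed values differ, which is why “distance” is best read loosely here: KL divergence is not symmetric in its arguments.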

## Probe Method – How to select features for ML models

The Probe method is a highly intuitive approach to feature selection. If a feature in the dataset contains only random numbers, it is not going to be a useful feature. Any feature that has lower feature importance than a random feature is suspicious. In this one, we will see: What is the Probe Method for …

## Cook’s Distance for Detecting Influential Observations

Cook’s distance measures the influence that each observation exerts on the trained model. It is computed by fitting a regression model, so it is affected only by the X variables included in that model. What is Cook’s Distance? Cook’s distance measures the influence exerted by each data point (row / …

## How to detect outliers with z-score

Z score, also called the standard score, is used to scale the features in a dataset for machine learning model training. It can also be used to detect outliers. In this one, we will first see how to compute Z-scores and then use them to detect outliers. How is Z-score used in machine learning? Now, …
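A minimal sketch of the idea, using only the Python standard library: compute z = (x − mean) / std for each value and flag values whose absolute Z-score exceeds a cutoff. The cutoff of 3 is a common convention assumed here, not something the excerpt states, and `zscore_outliers` and the sample data are hypothetical.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing can stand out
    return [x for x in values if abs((x - mean) / std) > threshold]

# Hypothetical data: 19 ordinary readings plus one extreme value
data = [9, 10, 11, 10] * 4 + [9, 10, 11] + [100]
print(zscore_outliers(data))
```

With very small samples no point can reach a Z-score of 3, so the cutoff may need to be lowered or a different method used.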

## How to detect outliers using Z score?

Z score is one of the most important concepts in statistics. It is also called the standard score. Typically it is used to scale the features for machine learning, but it can also be used to detect outliers. Also Read: How to detect outliers with IQR and Box Plots How is Z-score used in machine learning? Now, …

## How to detect outliers using IQR and Boxplots?

Let’s understand what outliers are, how to identify them using IQR and Boxplots, and how to treat them if appropriate. 1. What are outliers? In statistics, outliers are those specific data points that differ significantly from other data points in the dataset. There can be various reasons behind the outliers. It can be because of …
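The IQR approach can be sketched as follows. This assumes the usual box-plot rule (flag points below Q1 − 1.5·IQR or above Q3 + 1.5·IQR), which the excerpt does not spell out. The function name `iqr_outliers` and the sample data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the box-plot rule."""
    # method="inclusive" treats the data as the full population of interest
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers

# Hypothetical data with one value far from the rest
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)
```

Different quartile conventions give slightly different fences, so results near the boundary can vary between tools.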

## MICE imputation – How to predict missing values using machine learning in Python

MICE Imputation, short for ‘Multiple Imputation by Chained Equations’, is an advanced missing data imputation technique that uses multiple iterations of Machine Learning model training to predict the missing values using known values from other features in the data as predictors. What is MICE Imputation? You can impute missing values by predicting them using other …

## Spline Interpolation – How to find the polynomial curve to interpolate missing values

Spline interpolation is a special type of interpolation where a piecewise lower order polynomial called a spline is fitted to the data points. That is, instead of fitting one higher order polynomial (as in polynomial interpolation), multiple lower order polynomials are fitted on smaller segments. This can be implemented in Python. You can do non-linear spline interpolation …

## Interpolation in Python – How to interpolate missing data, formula and approaches

Interpolation can be used to impute missing data. Let’s see the formula and how to implement it in Python. But you need to be careful with this technique and understand whether it is a valid choice for your data. Often, interpolation is applicable when the data is in a sequence or …
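As one possible illustration, the sketch below fills gaps in an evenly spaced sequence by linear interpolation, using y = y0 + (x − x0)·(y1 − y0)/(x1 − x0) between the nearest known neighbours. Linear interpolation and even spacing are assumptions made here; the excerpt does not say which formula the article uses. The function name `interpolate_missing` is hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries that lie between two known values by linear interpolation.

    Assumes the values are evenly spaced, so the list index acts as x.
    Leading and trailing gaps are left as None because they have only one neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        slope = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + slope * (i - left)
    return filled

# Hypothetical sequence with gaps
series = [1.0, None, None, 4.0, None, 6.0]
print(interpolate_missing(series))
```

This only makes sense when neighbouring observations are related, as in ordered or time-indexed data.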

## Missing Data Imputation Approaches | How to handle missing values in Python

Machine Learning works on the idea of garbage in – garbage out. If you feed useless junk data to the machine learning algorithm, the results will also be, well, ‘junk’. The quality and consistency of results depend on the data provided. Missing values in data degrade the quality. Why clean the data before training …

## Exploratory Data Analysis (EDA) – How to do EDA for Machine Learning Problems using Python

Exploratory Data Analysis, simply referred to as EDA, is the step where you understand the data in detail. You understand each variable individually by calculating frequency counts, visualizing the distributions, etc. You also examine the relationships between the various combinations of the predictor and response variables by creating scatterplots, correlations, etc. EDA is typically part of every …

## ML Modeling – Problem statement and Data description

ML modeling is the step where machine learning is used to find patterns in data and use that learned knowledge to predict an outcome. The type of ML modeling we are going to solve in this problem is called ‘Churn Modeling’. Let’s first understand the Churn modeling problem statement and then go over the data …

Adaboost is one of the earliest implementations of the boosting algorithm. It forms the base of other boosting algorithms, like gradient boosting and XGBoost. This tutorial will take you through the math behind implementing this algorithm and also a practical example of using the scikit-learn Adaboost API. Contents: What is boosting? What is Adaboost? Algorithm …

## How to formulate machine learning problem

Let’s understand how to define and formulate the machine learning problem (for predictive modeling) from a business problem. This structured approach should help you apply the process to most other types of predictive modeling problems at work. Introduction Often in ML teams, you will hear from the business/company departments about the problems and issues they …

## Build your first ML project

Let’s build your first machine learning project with Python from scratch. “But I am a complete beginner, I am not ready yet!..” – Your mind voice. If you have been looking to get started in ML, but can’t really figure out how and where to start, then this one is for you. Just read on.. …

## Setup Python environment for ML

Python is the most popular programming language used for AI and machine learning. Let’s see how to set up a Python environment for ML using Anaconda. How to install Python? Simply visit Python.org, go to the downloads section, download the latest version shown there and install it like you do for any other software. To do machine learning …

## Train Test Split – How to split data into train and test for validating machine learning models?

The train-test split technique is a way of evaluating the performance of machine learning models. Whenever you build machine learning models, you will be training the model on a specific dataset (X and y). Once trained, you want to ensure the trained model is capable of performing well on the unseen test data as well. …
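To make the idea concrete, here is a minimal standard-library sketch of a random split. In practice scikit-learn’s `train_test_split` is the usual tool. The helper below, `simple_train_test_split`, and its default `test_size` and `seed` values are hypothetical and shown only to illustrate the mechanics.

```python
import random

def simple_train_test_split(X, y, test_size=0.25, seed=42):
    """Shuffle row indices once, then carve off a test portion.

    X and y are shuffled with the same indices, so each row stays
    paired with its label.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical toy data: the label is twice the single feature
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]
X_train, X_test, y_train, y_test = simple_train_test_split(X, y, test_size=0.3)
print(len(X_train), len(X_test))
```

The model is then fitted on the training portion only and scored on the held-out test portion.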

## Task Checklist for Almost Any Machine Learning Project

A cheat sheet of tasks and things to take care of for every end-to-end ML project. In this, I write down a checklist of items and tasks to check whenever you start a new Data Science / ML project. Once you start off with the project there will be so many things going …

## Data Science Roadmap – How to become a Data Scientist? (6 month self study plan)

Today, I discuss the Data Science Roadmap, the missing guide to self study machine learning. I’ll discuss what exactly you need to know and do in order to self study Data science / ML / AI / Stats. I will provide you with some of the best resources for each topic, why you need to …

## Why learn the math behind Machine Learning and AI?

Why learn the math behind machine learning algorithms when you can readily implement them using Python libraries like scikit-learn, h2o, statsmodels etc? This is a fair question, especially coming from beginners, when it is easy to implement ML with a few lines of code and get the results fast. Now, you must understand that learning …
