
# Python

## How to convert Python code to Cython (and speed up 100x)?

Using Cython, you can speed up existing Python code by 100x or more. This is possible because Cython translates Python code to C once you make some basic code changes. Even without any code change, a speedup of 2x is commonly observed, as in this post’s example. Because, everything …

## How to convert Python to Cython inside Jupyter Notebooks?

Let’s see how to cythonize Python code inside Jupyter notebooks step by step. In this post we will see how to: define and time a Python function to benchmark; run Python using Cython in a Jupyter Notebook; and cythonize the function. But let’s first answer a basic question: What is the difference between CPython …

## KL Divergence – What is it and mathematical details explained

At its core, KL (Kullback-Leibler) Divergence is a statistical measure that quantifies the dissimilarity between two probability distributions. Think of it like a mathematical ruler that tells us the “distance” or difference between two probability distributions. Remember, in data science, we’re often working with probabilities – the chances of events happening. So, if we have …
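As a minimal sketch of the idea, the discrete form of KL divergence, D_KL(P || Q) = Σ p_i · log(p_i / q_i), can be computed in a few lines. The function name `kl_divergence` and the two example distributions are illustrative assumptions, not taken from the post.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats.

    Assumes p and q are probability lists of equal length and that
    q[i] > 0 wherever p[i] > 0. Terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical distributions over the same two events
p = [0.5, 0.5]
q = [0.9, 0.1]

kl_pq = kl_divergence(p, q)  # divergence of Q from P
kl_qp = kl_divergence(q, p)  # divergence of P from Q
kl_pp = kl_divergence(p, p)  # identical distributions give 0
print(kl_pq, kl_qp, kl_pp)
```

Note that `kl_pq` and `kl_qp` differ, which is why “distance” is best read loosely here: KL divergence is not symmetric.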

## Probe Method – How to select features for ML models

The Probe method is a highly intuitive approach to feature selection. If a feature in the dataset contains only random numbers, it is not going to be a useful feature. Any feature that has lower feature importance than a random feature is suspicious. In this one, we will see: What is the Probe Method for …

## Cook’s Distance for Detecting Influential Observations

Cook’s distance measures the influence exerted by each observation on the trained model. It is computed by building a regression model and is therefore affected only by the X variables included in the model. What is Cook’s Distance? Cook’s distance measures the influence exerted by each data point (row / …

## How to detect outliers with z-score

Z-score, also called the standard score, is used to scale the features in a dataset for machine learning model training. It can also be used to detect outliers. In this one, we will first see how to compute Z-scores and then use them to detect outliers. How is Z-score used in machine learning? Now, …
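A rough sketch of the approach, using only the standard library: compute z = (x − mean) / std for every value and flag those whose absolute z-score exceeds a cutoff. The sample data and the cutoff of 3 are assumptions chosen for illustration (3 is a common convention, not a rule from the post).

```python
import statistics

# Hypothetical sample with one obviously extreme value
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation

# Z-score: how many standard deviations each value lies from the mean
z_scores = [(v - mean) / std for v in values]

threshold = 3.0  # assumed cutoff
outliers = [v for v, z in zip(values, z_scores) if abs(z) > threshold]
print(outliers)
```

In small samples a single extreme point inflates the standard deviation, so the cutoff may need to be lowered to catch anything.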

## How to detect outliers using Z score?

Z-score is one of the most important concepts in statistics. It is also called the standard score. Typically it is used to scale the features for machine learning, but it can also be used to detect outliers. Also Read: How to detect outliers with IQR and Box Plots How is Z-score used in machine learning? Now, …

## How to detect outliers using IQR and Boxplots?

Let’s understand what outliers are, how to identify them using IQR and boxplots, and how to treat them if appropriate. 1. What are outliers? In statistics, outliers are those specific data points that differ significantly from other data points in the dataset. There can be various reasons behind the outliers. It can be because of …
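One common way to apply the IQR idea is the 1.5 × IQR fence, the same rule boxplot whiskers usually follow. The sketch below assumes NumPy is installed; the data and the 1.5 multiplier are illustrative assumptions rather than details from the post.

```python
import numpy as np

# Hypothetical sample with one extreme value
data = np.array([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95])

# First and third quartiles (NumPy's default linear interpolation)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Fences: points beyond 1.5 * IQR from the quartiles are flagged
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(q1, q3, iqr, outliers)
```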

## Select columns in PySpark dataframe – A Comprehensive Guide to Selecting Columns in different ways in PySpark dataframe

PySpark is the Python API for Apache Spark, a big data processing framework, and it allows you to process large volumes of data using the Python programming language. PySpark’s DataFrame API is a powerful tool for data manipulation and analysis. One of the most common tasks when working with DataFrames is selecting specific columns. In this blog post, we will …

## Introduction to PySpark – Unleashing the Power of Big Data using PySpark

Introduction As we continue to generate massive volumes of data every day, the importance of scalable data processing and analysis tools cannot be overstated. One such powerful tool is Apache Spark, an open-source, distributed computing system that has become synonymous with big data processing. In this blog post, we will introduce you to PySpark, the …

## Install opencv python – A Comprehensive Guide to Installing “OpenCV-Python”

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. OpenCV-Python is a Python wrapper for the original OpenCV C++ library. Let’s see how to install OpenCV in Python. Introduction OpenCV enables users to perform image and video processing tasks with ease. In this blog post, we will provide …

## install pip mac – How to install pip in MacOS?: A Comprehensive Guide

Pip is a widely used package manager for Python, allowing you to install and manage Python packages easily. In this blog post, we’ll explore various methods to install Pip on MacOS. I’ll provide clear, reproducible code examples for each method, making it easy for you to get started with Pip on your MacOS system. Using …

## Scrapy vs. Beautiful Soup: Which is better for web scraping?

Web scraping is the technique of extracting data from a specific website or web page. This has wide applications in: research and publication; competitor and market studies; and creating data for machine learning models. The extracted data can be stored in any format, be it CSV, TXT, JSON, an API, etc., so that it can …

## add Python to PATH – How to add Python to the PATH environment variable in Windows?

1. What is the purpose of adding Python to the PATH environment variable? Adding Python to the PATH environment variable in Windows allows you to run Python commands from any directory within the command prompt. Here are the steps to add Python to the PATH variable: 2. What is the PATH environment variable in Windows? …

Adaboost is one of the earliest implementations of the boosting algorithm. It forms the base of other boosting algorithms, like gradient boosting and XGBoost. This tutorial will take you through the math behind implementing this algorithm and also a practical example of using the scikit-learn Adaboost API. Contents: What is boosting? What is Adaboost? Algorithm …

## Numpy.random.randint() in python

The numpy.random.randint function returns random integers between a low and a high value. The low value is included while the high value is excluded. The output values are drawn from the discrete uniform distribution over that range. random.randint(low, high=None, size=None, dtype=int) Purpose: The numpy random randint function is used for creating a …
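A short sketch of the inclusive-low, exclusive-high behaviour, assuming NumPy is installed. The die-roll framing, the seed value, and the variable names are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only to make reruns repeatable

# Ten simulated die rolls: low=1 is included, high=7 is excluded,
# so every value falls in 1..6
rolls = np.random.randint(1, 7, size=10)

# With a single argument, the range is 0 (inclusive) to that value (exclusive)
bits = np.random.randint(2, size=(2, 3))  # 2x3 array of 0s and 1s

print(rolls)
print(bits)
```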

## How to use numpy.random.uniform() in python.

The np.random.uniform() function is used to create an array of random samples from a uniform probability distribution between given low and high values. random.uniform(low=0.0, high=1.0, size=None) Purpose: The numpy random uniform function is used for creating a numpy array with random float values from the low to high interval. Parameters: Low: float or array-like of floats, optional: Lowest …

## Numpy.sort() in python

The np.sort() function is used to sort an array along a specified axis. numpy.sort(a, axis=-1, kind=None, order=None) Purpose: This function is used for sorting the array. Parameters: a: array_like Array to be sorted. axis: None or int, optional Axis along which to sort. If None, the array is flattened before sorting. kind: …
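To make the effect of the `axis` parameter concrete, here is a small sketch, assuming NumPy is installed. The example array is an illustrative assumption.

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [0, 7, 5]])

# Default axis=-1 sorts along the last axis, i.e. within each row
by_row = np.sort(arr)             # [[1, 2, 3], [0, 5, 7]]

# axis=0 sorts down each column
by_col = np.sort(arr, axis=0)     # [[0, 1, 2], [3, 7, 5]]

# axis=None flattens the array before sorting
flat = np.sort(arr, axis=None)    # [0, 1, 2, 3, 5, 7]

print(by_row, by_col, flat, sep="\n")
```

`np.sort` returns a sorted copy, so `arr` itself is left unchanged.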

## numpy.median() – How to compute median in Python

The numpy.median function is used to calculate the median of an array along a specific axis or multiple axes. The median is defined as the middle value separating the higher half from the lower half of a data sample. In other words, the median is the value in the middle when you sort the values. In this post, …
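The following sketch, assuming NumPy is installed, shows the median of a whole array and along each axis. The example array is an illustrative assumption.

```python
import numpy as np

arr = np.array([[1, 3, 5],
                [2, 4, 12]])

# No axis: median of all six values; with an even count it is the
# average of the two middle values (3 and 4)
overall = np.median(arr)          # 3.5

# axis=0: median down each column
per_col = np.median(arr, axis=0)  # [1.5, 3.5, 8.5]

# axis=1: median across each row
per_row = np.median(arr, axis=1)  # [3.0, 4.0]

print(overall, per_col, per_row)
```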

## Setup Python environment for ML

Python is the most popular programming language used for AI and machine learning. Let’s see how to set up a Python environment for ML using Anaconda. How to install Python? Simply visit Python.org, go to the downloads section, download the latest version shown there, and install it like any other software. To do machine learning …
