A confidence interval is a measure that quantifies the uncertainty in an estimated statistic (like the mean) when the true population parameter is unknown. Training…
A p-value is a probability score used in statistical tests to establish the statistical significance of an observed effect. Though p-values are…
Python datatable is the newest package for data manipulation and analysis in Python. It carries the spirit of R’s data.table with similar syntax. It…
Vector Autoregression (VAR) is a forecasting algorithm that can be used when two or more time series influence each other. That is, the relationship…
Mahalanobis distance is an effective multivariate distance metric that measures the distance between a point and a distribution. It is an extremely useful metric…
Principal Components Analysis (PCA) is an algorithm to transform the columns of a dataset into a new set of features called Principal Components. By…
A Matplotlib histogram is used to visualize the frequency distribution of a numeric array by splitting it into small equal-sized bins. In this article, we explore…
A time series is a sequence of observations recorded at regular time intervals. This guide walks you through the process of analyzing the characteristics of…
In this post, we discuss techniques to visualize the output and results from a topic model (LDA) based on the gensim package. Topic modeling visualization…