NLP Archives - Machine Learning Plus

What is Tokenization in Natural Language Processing (NLP)?

Tokenization is the process of breaking down a piece of text into small units called tokens. A token may be a word, part of a word, or just a character such as punctuation. It is one of the most foundational NLP tasks, and a difficult one, because every language has its own grammatical constructs, which are often …
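
Here is a minimal sketch of rule-based word tokenization in Python, using only the standard `re` module. The regular expression is an assumption made for illustration, not the rule set of any particular library. A run of word characters (optionally with one internal apostrophe) becomes one token, and every other non-space character, such as punctuation, becomes a token of its own.

```python
import re
from typing import List

# Illustrative pattern (an assumption, not any library's rules):
#   \w+(?:'\w+)?  a word, optionally with one internal apostrophe ("isn't")
#   [^\w\s]       any single character that is neither a word char nor whitespace
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Tokenization isn't trivial, is it?"))
# ['Tokenization', "isn't", 'trivial', ',', 'is', 'it', '?']
print(tokenize("See e.g. the docs."))
# ['See', 'e', '.', 'g', '.', 'the', 'docs', '.']
```

The second call shows why the task is hard. A naive rule splits the abbreviation "e.g." into four tokens. Production tokenizers handle this kind of language-specific exception with extra rules.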

Text Summarization Approaches for NLP – Practical Guide with Generative Examples

Complete Guide to Natural Language Processing (NLP) – with Practical Examples

3 Comments / NLP / By Shrivarsheni

Natural language processing (NLP) is the technique by which computers understand human language. NLP allows you to perform a wide range of tasks such as classification, summarization, text generation, translation and more. NLP has advanced so much in recent times that AI can write its own movie scripts, create poetry, summarize text and answer questions …

Building chatbot with Rasa and spaCy

SpaCy Text Classification – How to Train Text Classification Model in spaCy (Solved Example)?

2 Comments / NLP / By Shrivarsheni

Text classification is the process of categorizing texts into different groups. spaCy makes custom text classification structured and convenient through the textcat component. Text classification is often used to sort movie reviews, hotel reviews or news articles, to identify the primary topic of a text, and to classify customer support emails by complaint type. For many real-life cases, …
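
spaCy's textcat component trains a neural model. The sketch below does not use spaCy. It is a dependency-free illustration of the same idea, learning categories from labelled texts, built on a different and classical technique: multinomial Naive Bayes with add-one smoothing. The function names and the toy reviews are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Tuple[Counter, Dict[str, Counter], set]


def train_naive_bayes(examples: List[Tuple[str, str]]) -> Model:
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: set = set()
    for text, label in examples:
        tokens = text.lower().split()  # simple whitespace tokenization
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model: Model, text: str) -> str:
    """Return the label with the highest log posterior for `text`."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing keeps unseen (word, label) pairs non-zero.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy movie reviews.
train = [
    ("the movie was wonderful and moving", "positive"),
    ("a truly great film", "positive"),
    ("terrible plot and awful acting", "negative"),
    ("the film was boring", "negative"),
]
model = train_naive_bayes(train)
print(predict(model, "a wonderful film"))  # positive
print(predict(model, "boring and awful"))  # negative
```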

spaCy Tutorial – Complete Writeup

spaCy is an advanced, modern library for Natural Language Processing developed by Matthew Honnibal and Ines Montani. This tutorial is a complete guide to using spaCy for various tasks. Overview: 1. Introduction; The Doc object; 2. Tokenization with spaCy; 3. Text-Preprocessing with spaCy; 4. Lemmatization; 5. Strings to Hashes; 6. Lexical attributes …

101 NLP Exercises (using modern libraries)

2 Comments / NLP / By Shrivarsheni

I hope you found this useful. For more such posts, stay tuned to our page!
Desired Output:
#> [('incredible', 0.90),
#>  ('awesome', 0.82),
#>  ('unbelievable', 0.82),
#>  ('fantastic', 0.77),
#>  ('phenomenal', 0.76),
#>  ('astounding', 0.73),
#>  ('wonderful', 0.72),
#>  ('unbelieveable', 0.71),
#>  ('remarkable', 0.70),
#>  ('marvelous', 0.70)]
Difficulty Level: L2
22. How to …

Training Custom NER models in SpaCy to auto-detect named entities [Complete Guide]

15 Comments / NLP / By Shrivarsheni

Named-entity recognition (NER) is the process of automatically identifying the entities discussed in a text and classifying them into predefined categories, such as ‘person’, ‘organization’ and ‘location’. The spaCy library allows you to train NER models both by updating an existing spaCy model to suit the specific context of your …
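
Training examples for spaCy NER are usually written as character offsets: a text paired with {"entities": [(start, end, label), ...]}, where end is exclusive. Counting offsets by hand is error-prone. The sketch below therefore uses a hypothetical helper, `make_annotation`, which locates each phrase with `str.find`. The sentence is made up. The labels ORG and GPE follow the entity scheme of spaCy's English models.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]


def make_annotation(
    text: str, phrases: List[Tuple[str, str]]
) -> Tuple[str, Dict[str, List[Span]]]:
    """Build (text, {"entities": [(start, end, label), ...]}) from phrases.

    Hypothetical helper: each phrase is located by its FIRST occurrence in
    `text`, and end offsets are exclusive (text[start:end] == phrase).
    """
    entities: List[Span] = []
    for phrase, label in phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        entities.append((start, start + len(phrase), label))
    return text, {"entities": entities}


example = make_annotation(
    "Apple is opening an office in London",
    [("Apple", "ORG"), ("London", "GPE")],
)
print(example)
# ('Apple is opening an office in London', {'entities': [(0, 5, 'ORG'), (30, 36, 'GPE')]})
```

In spaCy v3, a pair like this can be turned into a training example with `spacy.training.Example.from_dict(nlp.make_doc(text), annotations)`.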

Topic modeling visualization – How to present the results of LDA models?

29 Comments / NLP / By Selva Prabhakaran

In this post, we discuss techniques to visualize the output and results of topic models (LDA) built with the gensim package. Contents: Introduction; Import NewsGroups Dataset; Tokenize Sentences and Clean; Build the Bigram, Trigram Models and Lemmatize; Build the Topic Model; Presenting the …

Cosine Similarity – Understanding the math and how it works (with python codes)

16 Comments / NLP / By Selva Prabhakaran

Cosine similarity is a metric used to measure how similar two documents are, irrespective of their size. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space. Cosine similarity is advantageous because even if two similar documents are far apart by Euclidean distance (due to the …
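
The definition above can be checked numerically. Here is a minimal plain-Python sketch of cos θ = (A · B) / (‖A‖ ‖B‖), with a Euclidean distance helper for comparison. The term-count vectors are hypothetical. A document and a ten-times-longer document with the same word proportions have a cosine similarity of 1.0, even though their Euclidean distance is large.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (||a|| * ||b||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical term counts over the same 4-word vocabulary.
short_doc = [1, 2, 0, 1]
long_doc = [10, 20, 0, 10]  # same proportions, ten times longer
other_doc = [0, 0, 3, 1]    # mostly different words

print(round(cosine_similarity(short_doc, long_doc), 4))   # 1.0
print(round(euclidean_distance(short_doc, long_doc), 2))  # 22.05
print(round(cosine_similarity(short_doc, other_doc), 4))  # 0.1291
```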

Lemmatization Approaches with Examples in Python

10 Comments / NLP / By Selva Prabhakaran

Lemmatization is the process of converting a word to its base form. The difference between stemming and lemmatization is that lemmatization considers the context and converts the word to its meaningful base form, whereas stemming just removes the last few characters, often leading to incorrect meanings and spelling errors. Comparing Lemmatization Approaches in Python …
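
The toy sketch below illustrates the difference. It uses a crude suffix-chopping stemmer (not the Porter algorithm) and a hypothetical hand-written lemma table that stands in for a real lemmatizer's dictionary. Unlike the real thing, the table ignores part of speech and context. Both helpers are illustrative assumptions.

```python
from typing import Dict

# Hypothetical lookup table standing in for a real lemmatizer's dictionary.
LEMMA_TABLE: Dict[str, str] = {
    "studies": "study",
    "caring": "care",
    "was": "be",
}

# Naive suffixes for a crude stemmer, checked in this order.
SUFFIXES = ("ing", "es", "ed", "s")


def crude_stem(word: str) -> str:
    """Chop the first matching suffix without checking the result is a word."""
    for suffix in SUFFIXES:
        # Keep at least 3 characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word: str) -> str:
    """Return the dictionary base form, or the word itself if unknown."""
    return LEMMA_TABLE.get(word, word)


for w in ["studies", "caring", "was"]:
    print(f"{w:8} stem={crude_stem(w):6} lemma={lookup_lemma(w)}")
```

Chopping produces the misspelling "studi" from "studies" and changes the meaning of "caring" to "car". The lookup returns "study" and "care", and it also maps the irregular form "was" to "be", which no suffix rule can do.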

LDA in Python – How to grid search best topic models?

80 Comments / NLP / By Selva Prabhakaran

Python’s scikit-learn provides a convenient interface for topic modeling using algorithms like Latent Dirichlet Allocation (LDA), LSI and Non-Negative Matrix Factorization. In this tutorial, you will learn how to build the best possible LDA topic model and explore how to present its outputs as meaningful results. Contents: 1. Introduction; 2. Load the packages; 3. Import …
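
Here is a minimal sketch of grid-searching scikit-learn's `LatentDirichletAllocation` with `GridSearchCV`. The tiny corpus and the grid values are hypothetical, and the tutorial's own preprocessing and parameter grid may differ. When no `scoring` argument is given, `GridSearchCV` ranks candidates by the estimator's own `score` method. For LDA, that is an approximate log-likelihood, where higher is better.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Tiny hypothetical corpus; a real run would use far more documents.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "fans cheered the winning goal",
    "the bank raised interest rates",
    "markets fell as rates rose",
    "investors watched the central bank",
]

# LDA works on raw term counts, so use CountVectorizer rather than tf-idf.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

param_grid = {
    "n_components": [2, 3],        # number of topics
    "learning_decay": [0.7, 0.9],  # only affects the online learning method
}

lda = LatentDirichletAllocation(learning_method="online", random_state=0)

# No `scoring` given: candidates are compared with lda.score (log-likelihood).
search = GridSearchCV(lda, param_grid, cv=2)
search.fit(X)

best_lda = search.best_estimator_
print("Best params:", search.best_params_)
print("Perplexity of best model:", best_lda.perplexity(X))
```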

Topic Modeling with Gensim (Python)

229 Comments / NLP / By Selva Prabhakaran

Topic modeling is a technique to extract the hidden topics from large volumes of text. Latent Dirichlet Allocation (LDA) is a popular algorithm for topic modeling with excellent implementations in Python’s Gensim package. The challenge, however, is how to extract good-quality topics that are clear, segregated and meaningful. This depends heavily on the …
