NLP Archives - Shubhanshu Gupta

Re-Imagining Topic Modeling in NLP: A Break from Conventional Approach

I recently spoke about Contextual Topic Modeling in NLP at Google’s La Kopi event for developers. The feedback I received made the talk a special one. Many people reached out to say that they found the topic, content and technique intriguing, and that it helped them approach Topic Modeling in NLP from a different angle. So I decided to post the talk here, and I have added the transcript below.

Collocations in NLP using NLTK Library

Collocations are phrases or expressions of two or more words that are highly likely to co-occur. For example – ‘social media’, ‘school holiday’, ‘machine learning’, ‘Universal Studios Singapore’, etc.
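
As a rough illustration of "highly likely to co-occur", here is a minimal sketch that ranks adjacent word pairs by pointwise mutual information (PMI). It uses plain Python rather than NLTK so that it runs without extra installs; the function name bigram_pmi, the min_count threshold and the sample sentence are assumptions made for this example, not the code from the post.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(n_tokens - 1, 1)
    scores = {}
    for (w1, w2), count in bigram_counts.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bigrams
        p_w1 = unigram_counts[w1] / n_tokens
        p_w2 = unigram_counts[w2] / n_tokens
        # PMI compares the observed pair probability with what
        # independence of the two words would predict
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy corpus, invented for this sketch
text = ("social media shapes machine learning research and "
        "machine learning shapes social media platforms")
tokens = text.lower().split()
ranked = bigram_pmi(tokens)
for pair, score in ranked:
    print(" ".join(pair), round(score, 2))
```

On a real corpus the frequency filter matters a lot, because PMI tends to overrate pairs that appear only once or twice.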

Multi Class Classification in Text using R: Predicting Ted Talk Ratings


This blog is a continuation of my NLP blog series. In the previous blogs, I discussed data pre-processing steps in R and recognizing the emotions present in Ted Talks. In this blog, I am going to predict the ratings that viewers gave to Ted Talks. This requires Multi Class Classification and quite a bit of data cleaning and preprocessing. We will discuss each step in detail below. So, let’s dive in.


Emotions in Ted Talks: Text Analytics in R


This post is a continuation of my NLP blog series. You might want to check out my previous blog, in which I discussed data pre-processing in R. In this blog, I will determine the emotions in Ted Talks. At the end, I will plot a heatmap of emotions against talks to aid visualization.

So, without further ado, let’s dive in!


Data Preprocessing in R


I have recently got my hands dirty with Natural Language Processing (NLP). I know, I am a little late to the party, but at least I am at the party!

To start with a general overview, I implemented quite a few NLP tasks, including Text Classification, Document Similarity, Part-of-Speech (POS) Tagging, Emotion Recognition, etc. These tasks were made possible by text pre-processing (noise removal, stemming) and text-to-feature conversion (TF-IDF, N-Grams, Topic Modeling, etc). I implemented these in both R and Python, so I will try to jot down my experiences in both environments. I will write this as a blog series, wherein each blog discusses only one particular thing implemented in one particular environment.
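
To make the pre-processing and text-to-features steps a little more concrete, here is a minimal Python sketch of noise removal followed by TF-IDF weighting. It is written with the standard library only, and the stopword list, the helper names clean and tf_idf, and the three sample sentences are all assumptions made for illustration; stemming is left out to keep the sketch short, and this is not the code used in the series.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}

def clean(text):
    """Lowercase, replace non-letter characters with spaces, drop stopwords."""
    words = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [w for w in words if w not in STOPWORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [clean(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical sample documents, invented for this sketch
docs = [
    "The talk is about machine learning!",
    "Machine learning in R and Python.",
    "A talk about emotions in text.",
]
weights = tf_idf(docs)
for doc_weights in weights:
    print({term: round(w, 3) for term, w in doc_weights.items()})
```

Terms that appear in only one document, such as "python" here, get a higher weight than terms shared across documents, such as "machine", which is the property that makes TF-IDF useful as a feature for classification and similarity tasks.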
