Text Mining Archives - Shubhanshu Gupta

Collocations in NLP using NLTK Library

Collocations are phrases or expressions containing multiple words, that are highly likely to co-occur. For example – ‘social media’, ‘school holiday’, ‘machine learning’, ‘Universal Studios Singapore’, etc.

Emotions in Ted Talks: Text Analytics in R

Image result for emotions nlp

This post is in continuation with my NLP blog series. You might want to checkout my previous blog in which I discussed data pre-processing in R. In this blog, I will determine the emotions in Ted Talks. At the end, I will compute a HeatMap of emotions and talks to aid in our visualization.

So, without further ado, let’s dive in!

Continue reading “Emotions in Ted Talks: Text Analytics in R”

Data Preprocessing in R

data preprocessing

I have recently got my hands dirty with Natural Language Processing (NLP). I know, it’s a little late to the party but I am at least in the party!

To start with a general overview, I implemented quite a few tasks related to NLP including Text Classification, Document Similarity, Part-of-Speech (POS) Tagging, Emotion Recognition, etc. These tasks were made possible by implementing text pre-processing (noise removal, stemming) and text to features (TF-IDF, N-Grams, Topic Modeling, etc). I implemented these in both R and Python. So, I will try to jot down my experiences in both of these environments. Therefore, I will write this as a blog series, wherein each blog will discuss only one particular thing implemented in one particular environment.

Continue reading “Data Preprocessing in R”