Collocations are phrases or expressions containing multiple words, that are highly likely to co-occur. For example – ‘social media’, ‘school holiday’, ‘machine learning’, ‘Universal Studios Singapore’, etc.Continue reading “Collocations in NLP using NLTK Library”
This blog is in continuation to my NLP blog series. In the previous blogs, I discussed data pre-processing steps in R and recognizing emotions present in ted talks. In this blog, I am going to predict the ratings of the ted talks given by viewers. This would require Multi Class Classification and quite a bit of data cleaning and preprocessing. We will discuss each step in detail below. So, let’s dive in.
This post is in continuation with my NLP blog series. You might want to checkout my previous blog in which I discussed data pre-processing in R. In this blog, I will determine the emotions in Ted Talks. At the end, I will compute a HeatMap of emotions and talks to aid in our visualization.
So, without further ado, let’s dive in!
I have recently got my hands dirty with Natural Language Processing (NLP). I know, it’s a little late to the party but I am at least in the party!
To start with a general overview, I implemented quite a few tasks related to NLP including Text Classification, Document Similarity, Part-of-Speech (POS) Tagging, Emotion Recognition, etc. These tasks were made possible by implementing text pre-processing (noise removal, stemming) and text to features (TF-IDF, N-Grams, Topic Modeling, etc). I implemented these in both R and Python. So, I will try to jot down my experiences in both of these environments. Therefore, I will write this as a blog series, wherein each blog will discuss only one particular thing implemented in one particular environment.