Collocations in NLP using NLTK Library

Collocations are multi-word phrases or expressions whose words are highly likely to co-occur. Examples include ‘social media’, ‘school holiday’, ‘machine learning’, and ‘Universal Studios Singapore’.
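
To make "highly likely to co-occur" concrete, here is a minimal, library-free sketch that ranks adjacent word pairs by pointwise mutual information (PMI), one common way to score collocations. NLTK ships its own collocation finders (for example `BigramCollocationFinder`), so treat this only as an illustration of the idea: the function name `bigram_pmi`, the `min_count` cut-off, and the toy sentence are all assumptions made up for this example.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not part of NLTK.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs get inflated PMI scores, so skip them.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares how often the pair occurs with how often
        # it would occur if the two words were independent.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy text, made up for this example.
text = (
    "machine learning is fun and machine learning is useful "
    "but social media is noisy and social media is fun"
)
ranked = bigram_pmi(text.split())
for pair, score in ranked:
    print(" ".join(pair), round(score, 2))
```

On this toy text, pairs such as "machine learning" and "social media" score higher than pairs built around the very frequent word "is", because their words rarely appear apart from each other.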

Continue reading “Collocations in NLP using NLTK Library”

Saturday Kids: Code in the Community Experience

Kids aged 8 to 10 are incredibly smart and ride high on the curve of curiosity and learning, which makes teaching them all the more challenging. Did I just write challenging? Did I not mention that I feel a strange pull toward anything challenging? Jokes apart, in June I came across an opportunity to teach Python/Scratch to kids in Singapore: Code in the Community, a 10-week program run by Saturday Kids in collaboration with Google. This post is an account of my experience and learnings throughout these 10 weeks with Saturday Kids.

Continue reading “Saturday Kids: Code in the Community Experience”

Time Series Analysis using Pandas

A time series is a series of data points ordered in time. Pretty intuitive, isn’t it? Time series analysis helps businesses analyze past data and predict trends and seasonality, among numerous other use cases. Some examples of time series analysis in our day-to-day lives include:

  • Measuring weather
  • Measuring number of taxi rides
  • Stock prediction

In this blog, we will be dealing with stock market data and will be using Python 3, Pandas and Matplotlib.
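
As a small taste of what that looks like, here is a minimal sketch assuming pandas is installed. The prices are made-up values rather than real market data, and the 3-day window is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical closing prices for five consecutive days (made-up values).
dates = pd.date_range("2024-01-01", periods=5, freq="D")
close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0], index=dates, name="close")

# A 3-day moving average smooths out day-to-day noise.
# The first two entries are NaN because the window is not yet full.
moving_avg = close.rolling(window=3).mean()

# Day-over-day return as a fraction of the previous close.
daily_return = close.pct_change()

print(pd.DataFrame({"close": close, "ma_3": moving_avg, "return": daily_return}))
```

Indexing the series by date is what makes it a time series in pandas: windowed operations such as `rolling` then follow the time order of the index.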

Continue reading “Time Series Analysis using Pandas”

Handling Imbalanced Dataset with SMOTE in Python

Close your eyes and imagine that you live in a utopian world of perfect data. What do you see? What do you wish to see? Wait! Are you imagining a flawlessly balanced dataset? A collection of data whose labels form a magnificent 1:1 ratio: 50% of this, 50% of that; not a bit to the left, nor a bit to the right. Just perfectly balanced, as all things should be. Now open your eyes and come back to the real world. This blog is all about how to handle imbalanced datasets.

Continue reading “Handling Imbalanced Dataset with SMOTE in Python”

A Survey of API Management Platforms

In my previous blog, I discussed how I ended up interning at Dentsu. I also mentioned that I worked on scouting and building a POC for a cloud-agnostic, open-source API management tool/platform that could help in setting up API design, gateway, store, and analytics. In this blog, I will jot down my work in much more detail.

We will be exploring four API Management platforms, namely:

Continue reading “A Survey of API Management Platforms”

What am I doing right now? Internship.. Studies.. or Both?!

My experience of hunting for and landing an internship in Singapore.

Bhagavad Gita

सुखदु:खे समे कृत्वा लाभालाभौ जयाजयौ |

ततो युद्धाय युज्यस्व नैवं पापमवाप्स्यसि ||

Chapter 2 Verse 38, Bhagavad Gita

Shree Krishna says: Fight for the sake of duty, treating alike happiness and distress, loss and gain, victory and defeat. Fulfilling your duty and responsibility in this way, you will never incur sin.

Arjuna was apprehensive that by killing his enemies, he would incur sin. Shree Krishna addresses this apprehension and advises him to do his duty (dharma) without attachment to the fruits of his actions. Such an attitude will release him from any sinful reactions.

Continue reading “What am I doing right now? Internship.. Studies.. or Both?!”

Multi Class Classification in Text using R: Predicting Ted Talk Ratings

This blog is a continuation of my NLP blog series. In the previous blogs, I discussed data pre-processing steps in R and recognizing the emotions present in TED Talks. In this blog, I am going to predict the ratings that viewers give to TED Talks. This requires multi-class classification and quite a bit of data cleaning and preprocessing. We will discuss each step in detail below. So, let’s dive in.

Continue reading “Multi Class Classification in Text using R: Predicting Ted Talk Ratings”

Emotions in Ted Talks: Text Analytics in R

This post is a continuation of my NLP blog series. You might want to check out my previous blog, in which I discussed data pre-processing in R. In this blog, I will determine the emotions in TED Talks. At the end, I will plot a heatmap of emotions against talks to help visualize the results.

So, without further ado, let’s dive in!

Continue reading “Emotions in Ted Talks: Text Analytics in R”

Data Preprocessing in R

I have recently got my hands dirty with Natural Language Processing (NLP). I know, I am a little late to the party, but at least I am at the party!

To start with a general overview, I implemented quite a few NLP tasks, including Text Classification, Document Similarity, Part-of-Speech (POS) Tagging, Emotion Recognition, etc. These tasks were made possible by implementing text pre-processing (noise removal, stemming) and text-to-features techniques (TF-IDF, N-Grams, Topic Modeling, etc). I implemented these in both R and Python, so I will try to jot down my experiences in both environments. I will write this as a blog series, wherein each blog discusses only one particular thing implemented in one particular environment.

Continue reading “Data Preprocessing in R”

DataKind Singapore

I have recently shifted gears in my life. A shift to academics after spending quite some time in the industry has been both exciting and challenging. Even with all the diversity in the cohort of my own program (in terms of work experience and country of origin), along with a campus bustling with activities all day, I wanted to explore the communities ingrained deep within the culture of Singapore. A natural choice for me was to look for meetups.

Continue reading “DataKind Singapore”