Blog

Emotions in TED Talks: Text Analytics in R


This post continues my NLP blog series. You might want to check out my previous post, in which I discussed data pre-processing in R. In this post, I will determine the emotions expressed in TED Talks, and at the end I will compute a heat map of emotions across talks to aid our visualization.

So, without further ado, let’s dive in!
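Below is a minimal, self-contained sketch of how such an analysis might look in R. It is not the post’s actual code: the ted_talks data frame and its transcript column are hypothetical stand-ins, and it assumes the syuzhet package for NRC emotion scoring and base R’s heatmap() for the visualization.

```r
# Sketch only: score each (hypothetical) talk transcript against the NRC
# emotion lexicon, then plot talks vs. emotions as a heat map.
library(syuzhet)

ted_talks <- data.frame(
  title      = c("Talk A", "Talk B"),
  transcript = c("We are thrilled and hopeful about the future of education.",
                 "The loss was devastating and left everyone afraid and angry."),
  stringsAsFactors = FALSE
)

# get_nrc_sentiment() returns one row per document with counts for the
# eight NRC emotions plus positive/negative sentiment
emotions <- get_nrc_sentiment(ted_talks$transcript)

# Keep only the eight emotion columns and plot talks against emotions
emotion_matrix <- as.matrix(emotions[, 1:8])
rownames(emotion_matrix) <- ted_talks$title
heatmap(emotion_matrix, Rowv = NA, Colv = NA, scale = "none",
        main = "Emotions per talk")
```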

Continue reading “Emotions in TED Talks: Text Analytics in R”

Data Preprocessing in R


I have recently gotten my hands dirty with Natural Language Processing (NLP). I know, I’m a little late to the party, but at least I’m at the party!

To give a general overview, I implemented quite a few NLP tasks, including Text Classification, Document Similarity, Part-of-Speech (POS) Tagging, and Emotion Recognition. These tasks were made possible by text pre-processing (noise removal, stemming) and converting text to features (TF-IDF, N-Grams, Topic Modeling, etc.). I implemented these in both R and Python, so I will try to jot down my experiences in both environments. I will therefore write this as a blog series, wherein each post discusses one particular technique implemented in one particular environment.
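As a rough illustration of what that pre-processing looks like in R (a sketch under my own assumptions, not the exact code from this series), the tm and SnowballC packages cover noise removal, stemming, and a TF-IDF document-term matrix:

```r
# Sketch only: basic noise removal, stemming and TF-IDF features with 'tm'
library(tm)
library(SnowballC)

docs   <- c("Text mining is fun!", "Pre-processing text removes noise 123.")
corpus <- VCorpus(VectorSource(docs))

# Noise removal: lower-case, strip punctuation/numbers, drop stop words
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Stemming, then a TF-IDF weighted document-term matrix
corpus <- tm_map(corpus, stemDocument)
dtm <- DocumentTermMatrix(corpus,
                          control = list(weighting = weightTfIdf))
inspect(dtm)
```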

Continue reading “Data Preprocessing in R”

DataKind Singapore

I have recently shifted gears in my life. The shift to academics after spending quite some time in industry has been equally exciting and challenging. Even with all the diversity in the cohort of my own program (in terms of work experience and country of origin), and a campus bustling with activities all day, I wanted to explore the communities ingrained deep within the culture of Singapore. A natural choice for me was to look for meetups.

Continue reading “DataKind Singapore”

My learning experience with Google’s Machine Learning Crash Course


I came to know about Google’s Machine Learning Crash Course (MLCC) from Sundar Pichai’s tweet. I then enquired about it with some close acquaintances working at Google, and after their good words about it and my own research on the course content, I was soon convinced to pursue it. This post is an account of my learnings from MLCC, structured so that it reads more like a review. I will also include what I really liked about the course and what I think could be improved, should the creators plan to update the course content. So, let’s get started!

Continue reading “My learning experience with Google’s Machine Learning Crash Course”

Engineering challenges of streaming a million concurrent JSON data streams from product to CRM

At Truebil, I was fortunate to be given the opportunity to solve a unique engineering problem. We had already outsourced CRM development to a third party, but I had to integrate the data flow from our product to the CRM and back. I call it a unique engineering problem because of the challenges it posed: the challenge lay not in the CRM integration alone, but in the fact that Truebil has umpteen in-house products, which together spawn close to a million data streams per hour. Besides building a system of that magnitude, I knew it would be equally challenging to work with three different verticals and their stakeholders. This post is an account of my experiences dealing with two things. First, how Truebil catered to the transfer of a million JSON data streams to and from the CRM without hampering its operations and customer support. Second, the art of working with different verticals. Not that I have mastered that art, but I will share my own experiences and learnings in this post.


Continue reading “Engineering challenges of streaming a million concurrent JSON data streams from product to CRM”