In my last post reflecting on the year that went by, I shared that I had taken on a new role as Data Product Manager at Citi. It has now been almost 4 months in this role, so I thought of writing another post reflecting on my learnings and experience from the first 90 days.
I have broken this post down into 7 initial impressions, in the form of learnings. A point to note is that these are based on completely personal experience and subjective opinions, derived from being in a Product Management role in a Banking & Financial Services company. Hence, your own impressions and learnings in a similar role could vary! So now let’s get into the meat of it.
7 key highlights and learnings from my first 90 days in a product management role:
Well, you know how much someone procrastinates when that person writes a reflection on the past year in April of the following year! But to be really honest, I was on a roller-coaster ride during the end of 2021 and the beginning of 2022. Let’s start!
I recently spoke about Contextual Topic Modeling in NLP at Google’s La Kopi event for developers. The feedback I received made the talk a special one. So many folks reached out and mentioned that they found the topic, the content and the technique quite intriguing, and that it helped them approach Topic Modeling in NLP from a different angle. So, I decided to post the talk here, and I have also added the transcript below.
Over the last few years, I have presented a lot of projects involving Data Science models, Natural Language Processing to be more precise, to various stakeholders and leadership teams. While it’s super important to convey the technical details, the results and all the hard work you have put into building the Data Science models, visualizations, etc., what’s more important is how you convey those things! In this blog, we will talk about the art of storytelling!
A while ago, I wrote about opening up my calendar for mentorship. Soon, quite a few people talked to me about career switches, interviews for university admissions, life and career opportunities in Singapore, etc. I eventually asked all of them to score my mentoring. While the conversations are definitely very subjective, I decided that the scoring could be objective. I have decided to open source the evaluation criteria that I use for gauging how effective I am as a mentor. Besides, I will also publish the scores that I receive and be very transparent about where I am lagging and where I am doing well.
Disclaimer: Even though I realize that feedback is very important, it’s a voluntary exercise for the mentees and I do not nudge anyone repeatedly to fill in my evaluation. Thus, I will keep updating the chart as and when I get more responses.
You will find the evaluation criteria structured in the following way:
This post is a digression from the other data science blogs that I have written in the past and, more so, from the work that I do in my day-to-day job. I don’t mean digression in a negative sense: I enjoyed it and learnt so much that I implemented many of the strategies on my own website. In this post, I will discuss how I did a deep dive on a one-line problem statement from my client: “The bounce rate has gone up since last few months from what it was before, Why?” That may seem trivial to investigate and analyze, but the lack of detail and granularity made the problem statement very broad and open-ended. A lack of clarity also makes it easy to hit a roadblock very early in the process, especially when you don’t know where to start. फ़िक्र न करें (Fear not)! You will see a structured way to approach this kind of problem statement.
Data cleaning is an essential step in preparing your data for analysis. While cleaning the data, every now and then, there’s a need to create a new column in the Pandas dataframe. It’s usually computed by a function which manipulates an existing column. A strategic way to achieve that is by using the apply() function. I want to address a couple of bottlenecks here:
Pandas: The Pandas library runs on a single thread and does not parallelize the task. Thus, if you are doing lots of computation or data manipulation on your Pandas dataframe, it can be pretty slow and can quickly become a bottleneck.
apply(): The Pandas apply() function is slow! It does not take advantage of vectorization and acts as just another loop. On a dataframe, it also hands each row or column to your function as a Series object and assembles a new Series or dataframe from the results, which carries significant overhead.
So now, you may ask, what to do and what to use? I am going to share 4 techniques that are alternatives to the apply() function and that improve the performance of operations on a Pandas dataframe.
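To make the bottleneck concrete, here is a minimal sketch of the general idea. The four techniques themselves are not listed in this excerpt, so this is not necessarily one of them; the dataframe, the `price` column, the `label` rule and the 1.07 multiplier are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"price": [5.0, 12.5, 20.0, 7.5]})

def label(price):
    # Hypothetical rule: anything at or above 10 counts as "high".
    return "high" if price >= 10 else "low"

# Baseline: apply() calls the Python function once per value.
df["label_apply"] = df["price"].apply(label)

# Alternative: np.where evaluates the condition on the whole column at once.
df["label_where"] = np.where(df["price"] >= 10, "high", "low")

# Plain arithmetic on a column is vectorized too, so no apply() is needed.
df["price_with_tax"] = df["price"] * 1.07
```

Both label columns hold the same values; the difference only shows up in run time, and it tends to grow with the number of rows.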
Collocations are phrases or expressions containing multiple words that are highly likely to co-occur. For example: ‘social media’, ‘school holiday’, ‘machine learning’, ‘Universal Studios Singapore’, etc.
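One simple way to surface such phrases is to count adjacent word pairs and score them with pointwise mutual information (PMI). This is only a sketch of that idea, not necessarily the method used in the full post; the toy sentence and the minimum count of 2 are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy corpus, made up for illustration.
tokens = ("machine learning is fun and social media loves machine learning "
          "but social media is loud").split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def pmi(pair):
    """Pointwise mutual information of a bigram, from raw corpus counts."""
    w1, w2 = pair
    p_pair = bigrams[pair] / sum(bigrams.values())
    p_w1 = unigrams[w1] / len(tokens)
    p_w2 = unigrams[w2] / len(tokens)
    return math.log2(p_pair / (p_w1 * p_w2))

# Keep bigrams seen at least twice, then rank them by PMI.
candidates = [pair for pair, count in bigrams.items() if count >= 2]
ranked = sorted(candidates, key=pmi, reverse=True)
```

PMI on its own favours rare pairs, which is why the sketch filters by count before ranking. On this toy corpus only ‘machine learning’ and ‘social media’ survive the filter.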
Kids of 8-10 years of age are incredibly smart and riding high on the curve of curiosity and learning. Thus, it’s equally challenging to teach such kids. Did I just write challenging? Did I not mention that I feel a strange pull towards anything challenging? Jokes apart, in June I came across an opportunity to teach Python/Scratch to kids in Singapore. The brief was a 10-week Code in the Community program run by Saturday Kids in collaboration with Google. This post is an account of my experience and learnings throughout these 10 weeks with Saturday Kids.
A time series is a series of data points ordered in time. Pretty intuitive, isn’t it? Time series analysis helps businesses analyze past data, predict trends and seasonality, and serves numerous other use cases. Some examples of time series analysis in our day-to-day lives include:
Measuring the number of taxi rides
In this blog, we will be dealing with stock market data and will be using Python 3, Pandas and Matplotlib.
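As a small taste of what working with a time series in Pandas looks like, here is a sketch under stated assumptions: the closing prices, the dates and the 3-day window are all made up, and this is not the dataset or the analysis from the full post.

```python
import pandas as pd

# Made-up closing prices for illustration; not real market data.
dates = pd.date_range("2021-01-04", periods=6, freq="D")
close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0],
                  index=dates, name="close")

# A 3-day rolling mean smooths out day-to-day noise.
# The first two values are NaN because the window is not yet full.
rolling_mean = close.rolling(window=3).mean()

# Day-over-day percentage change (the first value has no previous day, so it is NaN).
daily_return = close.pct_change()

# Label-based slicing on a DatetimeIndex includes both endpoints.
first_three_days = close["2021-01-04":"2021-01-06"]
```

With Matplotlib installed, calling `close.plot()` followed by `rolling_mean.plot()` should draw the raw and smoothed series on the same axes.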