I have recently shifted gears in my life. Moving to academics after spending quite some time in the industry has been both exciting and challenging. Even with all the diversity in the cohort of my own program (in terms of work experience and country of origin) and a campus bustling with activities all day, I wanted to explore the communities ingrained deep within the culture of Singapore. A natural choice for me was to look for meetups.
After subscribing to quite a few meetups that interested me, I finally got a notification for a DataKind Singapore meetup. This is one of the few meetups I had wanted to attend for quite some time. Because its chapters are present in only 5 places (Bangalore, Washington, UK, San Francisco, Singapore), I could not attend any of its meetups in the past. But not this time!
DataKind Singapore Meetup
DataKind brings together data scientists to collaborate on and solve critical humanitarian issues in education, poverty, health, human rights, and the environment through data-driven approaches. The meetup took place at Halogen Foundation Singapore and saw participation from about 60 data enthusiasts from various organizations and universities. There are Project Accelerators, where NGOs present their problems and bottlenecks. Then there are DataJam, DataDive and DataCorps events, which focus on solving specific problems from the accelerators over an evening, a weekend, or even an entire month.
Session and Project Overview
The meetup that I attended featured Halogen Foundation, Waterpoint Data Exchange (WPDx), and the Raffles’ Banded Langur Working Group. Halogen offers various leadership and mentoring programmes to equip youth with the mindsets and skills to excel in the future. They wanted us to come up with a strategy for analyzing the surveys conducted across their beneficiaries. The survey analysis would help them establish benchmarks for their programmes.
WPDx is a community that coordinates organizations in the Water Access, Sanitation and Hygiene (WASH) sector. WPDx is currently the largest public repository of water point data. They use data science to improve water services through evidence-based decision making. At the meetup, they had several tasks at hand, including data cleaning, exploratory data analysis, and modeling the risk of failure for data points.
The Raffles’ Banded Langur Working Group is working to protect the extremely endangered Raffles’ Banded Langur. They leverage machine learning and OpenCV to identify individual monkey faces. Since it is almost impossible for conservationists to tag the monkeys, they use professional cameras to capture photos of them (for tagging purposes). Our task was to identify each monkey and track it as it ages from infant to juvenile to adult.
My Contribution
I was grouped with the WPDx team. Our work was to first clean the dataset and then perform EDA (Exploratory Data Analysis), followed by data modeling. The dataset was huge, with about 500,000 rows and 30 features, and had ample anomalies and inconsistencies. I analyzed and explored several features in the dataset and found inconsistencies such as spelling mistakes, stray escape characters, and incorrect date-time formats. After due validation and testing, I wrote the cleaning functions in a Python script and raised a PR (Pull Request) to merge it into the organization’s code base.
The comprehensive project documentation, data dictionaries, Trello boards, and Python scripts (including tests) helped in the process. The volunteers subsequently moved on to the data exploration part as well.
Even though this seemed to be the longest meetup I have attended (more than 2.5 hours), we were hoping for some more time at the end to push our work to production. The coordinators were very organized and had great documentation to make the process of data volunteering as seamless as possible. Looking forward to the next meetup!
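To give a feel for this kind of cleaning, here is a minimal sketch using only the Python standard library. It is not the script I submitted: the column names (`water_source`, `report_date`), the spelling correction table and the list of date formats are all hypothetical, made up purely for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical correction table; the real misspellings are not listed in this post.
SPELLING_FIXES = {"hand pmp": "hand pump", "borhole": "borehole"}

# Hypothetical date layouts to try, in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y %H:%M:%S")


def strip_escape_characters(value: str) -> str:
    """Remove escape sequences and collapse runs of whitespace."""
    # Literal two-character sequences such as backslash + "n" left in the text.
    value = re.sub(r"\\[ntr]", " ", value)
    # Real tabs, newlines and repeated spaces.
    value = re.sub(r"\s+", " ", value)
    return value.strip()


def fix_spelling(value: str) -> str:
    """Lowercase the value and apply a known correction if one exists."""
    key = value.lower()
    return SPELLING_FIXES.get(key, key)


def normalise_date(value: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_row(row: dict) -> dict:
    """Clean one record; the column names here are assumptions."""
    cleaned = dict(row)
    cleaned["water_source"] = fix_spelling(
        strip_escape_characters(row["water_source"])
    )
    cleaned["report_date"] = normalise_date(row["report_date"])
    return cleaned
```

Keeping each fix in its own small function makes it easy to test in isolation. Returning `None` for an unparseable date, rather than guessing, lets those rows be flagged for manual review later.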
I write about Software Engineering and Data Science. My professional interests include data science, business analytics, product management, tech, and design thinking.