Data cleaning is an essential step to prepare your data for the analysis. While cleaning the data, every now and then, there’s a need to create a new column in the Pandas dataframe. It’s usually conditioned on a function which manipulates an existing column. A strategic way to achieve that is by using Apply function. I want to address a couple of bottlenecks here:
- Pandas: The Pandas library runs on a single thread and it doesn’t parallelize the task. Thus, if you are doing lots of computation or data manipulation on your Pandas dataframe, it can be pretty slow and can quickly become a bottleneck.
- Apply(): The Pandas apply() function is slow! It does not take the advantage of vectorization and it acts as just another loop. It returns a new Series or dataframe object, which carries significant overhead.
So now, you may ask, what to do and what to use? I am going to share 4 techniques that are alternative to Apply function and are going to improve the performance of operation in Pandas dataframe.
Continue reading “Speed Up Pandas Dataframe Apply Function to Create a New Column”