Chin Hwee Ong - @ongchinhwee

Speed Up Your Data Processing: Parallel and Asynchronous Programming in Data Science

Abstract

Are you constantly waiting for your data processing code to finish executing? Through real-life stories, we will explore how to leverage parallel and asynchronous programming in Python to speed up your data processing pipelines, so that you can focus on getting value out of your data.

Description

In any data-intensive application, one of the biggest bottlenecks (in terms of time) is the constant wait for the data processing code to finish executing. Slow code, as well as connectivity issues, affects every step of a typical data science workflow, whether in event-driven I/O operations or in computation-driven workloads.
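To make the I/O side of this bottleneck concrete, here is a minimal sketch, not taken from the talk, that compares sequential and thread-pool execution of blocking calls. The function `fetch_record` is hypothetical: it stands in for a network or database request by sleeping for a fixed latency. Because a sleeping (or I/O-waiting) thread releases CPython's GIL, the waits overlap and total time drops from roughly `n * latency` to roughly `latency`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_record(record_id: int, latency: float = 0.1) -> dict:
    """Hypothetical stand-in for a network or database call.

    time.sleep() blocks this thread but releases the GIL, so other
    threads can run while it waits, much as they would during real I/O.
    """
    time.sleep(latency)
    return {"id": record_id, "value": record_id * 10}


def fetch_sequential(ids):
    # Each call waits for the previous one to finish: total time ~ n * latency.
    return [fetch_record(i) for i in ids]


def fetch_threaded(ids, max_workers=8):
    # Waits overlap across worker threads: total time ~ latency when n <= max_workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() returns results in the same order as the inputs.
        return list(pool.map(fetch_record, ids))


def timed(fn, *args):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(5))
    seq, t_seq = timed(fetch_sequential, ids)
    par, t_par = timed(fetch_threaded, ids)
    assert seq == par
    print(f"sequential: {t_seq:.2f}s, threaded: {t_par:.2f}s")
```

This only helps while threads are waiting. For computation-driven workloads written in pure Python, threads in standard CPython builds do not run bytecode in parallel because of the GIL, so `concurrent.futures.ProcessPoolExecutor` is the usual drop-in replacement there.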

In this talk, I will explore what we can do to speed up our code, while sharing my own experience of working in a young data science team. I will talk about:

  1. Sequential vs parallel processing
  2. Synchronous vs asynchronous execution
  3. Event-driven I/O operations vs computation-driven workloads in a data science workflow
  4. When parallelism and asynchronous programming are a good idea
  5. How to implement asynchronous programming to speed up your data processing pipelines
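As a rough illustration of synchronous vs asynchronous execution (items 2 and 5 above), here is a minimal `asyncio` sketch. It is an assumption-laden example rather than the pipeline described in the talk: `fetch_record` is a hypothetical coroutine that simulates I/O latency with `asyncio.sleep`. Awaiting calls one by one still runs them in sequence, while `asyncio.gather` lets the event loop interleave their waits on a single thread.

```python
import asyncio
import time


async def fetch_record(record_id: int, latency: float = 0.1) -> dict:
    """Hypothetical async stand-in for a network or database call.

    `await asyncio.sleep()` hands control back to the event loop,
    which can run other coroutines while this one waits.
    """
    await asyncio.sleep(latency)
    return {"id": record_id, "value": record_id * 10}


async def fetch_one_by_one(ids):
    # Async syntax, but each await finishes before the next call starts.
    return [await fetch_record(i) for i in ids]


async def fetch_all(ids):
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fetch_record(i) for i in ids))


def run_timed(coro_fn, ids):
    # Run a coroutine function on a fresh event loop and time it.
    start = time.perf_counter()
    results = asyncio.run(coro_fn(ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(5))
    seq, t_seq = run_timed(fetch_one_by_one, ids)
    con, t_con = run_timed(fetch_all, ids)
    assert seq == con
    print(f"one by one: {t_seq:.2f}s, gathered: {t_con:.2f}s")
```

The speed-up depends on every slow step being awaitable. A blocking call made inside a coroutine (for example, a synchronous HTTP client) stalls the whole event loop; it needs an async-capable client or must be pushed to a worker thread with `asyncio.to_thread` (Python 3.9+).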

Bio

Ong Chin Hwee is a data engineer, aspiring polymath and Industry 4.0 enthusiast who happens to be interested in things that fly (and stuff that burns to keep things flying). Hailing from a background in aerospace engineering and computational modelling, Chin Hwee has experience working on innovative projects in collaboration with academia and industry partners. Chin Hwee is a contributor to pandas and enjoys sharing her experiences at meetups and conferences.