The Real Python Podcast - Docker + Python for Data Science and Machine Learning

Docker + Python for Data Science and Machine Learning

05/08/20 • 55 min

The Real Python Podcast

Docker is a common tool for Python developers creating and deploying applications, but what do you need to know if you want to use Docker for data science and machine learning? What are the best practices if you want to start using containers for your scientific projects? This week we have Tania Allard on the show. She is a Sr. Developer Advocate at Microsoft focusing on Machine Learning, scientific computing, research and open source.

Tania has created a talk for PyCon US 2020, which is now online. The talk is titled “Docker and Python: Making them Play Nicely and Securely for Data Science and ML.” Her talk draws on her expertise in improving processes, reproducibility, and transparency in research and data science. We discuss a variety of tools for making your containers more secure and your results reproducible.

Tania is passionate about mentoring, open source, and its community. She is an organizer for Mentored Sprints for Diverse Beginners, and she talks about the upcoming online sprints for PyCon US 2020. We also discuss her plans to start a podcast.

Topics:

  • 00:00:00 – Introduction
  • 00:01:43 – Microsoft Senior Developer Advocate Role
  • 00:04:07 – PyCon 2020 Talk - Docker and Python: making them play nicely
  • 00:05:34 – What is Docker?
  • 00:10:08 – Reproducibility of project results
  • 00:12:03 – What are the challenges of using Docker for machine learning?
  • 00:15:06 – Getting started suggestions
  • 00:16:26 – What metadata should be included?
  • 00:17:48 – Creating images through stages
  • 00:21:16 – What about your data?
  • 00:22:40 – Kubernetes: Orchestrating containers
  • 00:24:37 – Continuing stages into testing
  • 00:25:37 – What are tools for testing security?
  • 00:27:07 – Challenges in using containers for ML
  • 00:28:52 – What types of databases?
  • 00:29:39 – Are you doing initial research on a local machine?
  • 00:30:59 – An example of a recent ML project
  • 00:32:16 – Papermill: parameterizing and executing notebooks
  • 00:33:16 – NLP: Natural Language Processing
  • 00:33:58 – Kaggle: Help us better understand COVID-19
  • 00:34:42 – What are other best practices for data intensive projects?
  • 00:39:13 – Resources to get started in machine learning?
  • 00:40:30 – Mentored Sprints for Diverse Beginners
  • 00:45:34 – Tania’s upcoming podcast
  • 00:48:38 – A visiting fellow at the Alan Turing Institute
  • 00:49:08 – Weight lifting
  • 00:50:16 – Craft beer
  • 00:52:09 – What is something you thought you knew in Python but were wrong about?
  • 00:53:50 – What are you excited about in the world of Python?
  • 00:54:42 – Thank you and Goodbye


Previous Episode

AsyncIO + Music, Origins of Black, and Managing Python Releases

Want to learn more about AsyncIO in Python, with an example where you can see and hear events being triggered in real-time? This week we have Łukasz Langa on the show. Łukasz has created a talk for PyCon 2020 online about using AsyncIO with Music.

In his talk he shows live examples of coroutines, gathering, the event loop and events being triggered to create a piece of music. We also talk about his role as the release manager for Python 3.8 and 3.9. Łukasz provides background on the origins of his very popular, uncompromising code formatter, Black, and the types of problems it can solve inside of an organization.

Łukasz previously worked for Facebook, which is where he started Black. He talks about recently moving back to Poland. We discuss his current work for EdgeDB, building a new generation object-relational database.

Topics:

  • 00:00:00 – Introduction
  • 00:01:32 – Łukasz’s background
  • 00:03:22 – Leaving Facebook and moving back to Poland
  • 00:05:26 – Starting work with EdgeDB
  • 00:06:07 – What is EdgeDB?
  • 00:12:28 – AsyncIO + Music PyCon 2020 talk
  • 00:18:56 – More AsyncIO resources
  • 00:23:36 – Comparing the event loop to a game loop
  • 00:27:12 – Coroutines and gather
  • 00:30:00 – A conversation with Glyph
  • 00:33:40 – Bigger ideas for the AsyncIO MIDI sequencer
  • 00:35:41 – Using uvloop as a replacement for the built-in reference AsyncIO loop
  • 00:39:13 – Thoughts on MIDI 2.0
  • 00:46:30 – Origins of Black
  • 00:53:51 – Black grows in popularity
  • 00:58:35 – What is involved in being the Python 3.9 release manager?
  • 01:02:22 – The Python language summit
  • 01:07:44 – Is the beta on schedule?
  • 01:09:27 – How did you get the role of Release Manager?
  • 01:15:09 – What are you excited about in the world of Python?
  • 01:19:02 – If you were learning Python from scratch, what would you do differently?
  • 01:22:18 – What is something you thought you knew about Python, but were wrong about?
  • 01:26:05 – Goodbye and Thanks

Next Episode

Leveling Up Your Python Literacy and Finding Python Projects to Study

In your quest to become a better developer, how do you find Python code that is at your reading level? What are good code bases or projects to study? What are the things holding you back from leveling up your Python literacy? This week we have Cecil Phillip on the show to discuss all of these common questions. Cecil is a Senior Cloud Advocate at Microsoft.

Cecil has been learning Python in the open on Twitch with Brian Clark. They run a weekly event on Twitch, where they are live-streaming an interactive Python course. Cecil has a background in multiple languages and technologies, and now he’s learning Python, bringing an audience along the way!

We start things off with a listener question and jump into a conversation about building up your Python skills. Then we’ll discuss common Python language stumbling blocks. Next we consider the importance of making personal projects, and documenting that code.

We also touch on some unique skills employers are looking for. And we discuss working through impostor syndrome. Cecil talks about his podcast “Away from the Keyboard” and his plans to start it back up.

In the show notes this week you’ll find links to resources we discuss, and several more that we didn’t have time to cover individually.

Want your question featured on the show? Send us your question at realpython.com/podcast-question and we might feature it on a future episode of the show.

Topics:

  • 00:00:00 – Intro
  • 00:01:52 – Cecil’s role at Microsoft
  • 00:03:35 – Twitch Stream with Brian Clark
  • 00:05:07 – Learning in front of an audience
  • 00:13:05 – Listener’s question
  • 00:14:46 – Finding code that’s at your level
  • 00:20:31 – Understanding more complex syntax in Python
  • 00:23:40 – Breaking down complexity
  • 00:29:17 – Translation of code
  • 00:31:55 – Importance of making projects and comments
  • 00:36:28 – Finding community
  • 00:41:23 – Open source contributing
  • 00:42:25 – Dealing with impostor syndrome
  • 00:49:09 – Looking for that first position
  • 01:00:58 – More project resources in show notes
  • 01:02:55 – Cecil’s podcast - Away from the keyboard
  • 01:08:29 – What are you excited about in the world of Python?
  • 01:10:14 – What is something you thought you knew about Python but were wrong about?
  • 01:12:01 – What’s the next thing you want to learn in Python?
  • 01:13:37 – Read the actual Python docs
  • 01:15:24 – Thanks and goodbye
