PyTorch Developer Podcast

Edward Yang, Team PyTorch

The PyTorch Developer Podcast is a place for the PyTorch dev team to do bite-sized (10-20 min) episodes about all sorts of internal development topics in PyTorch.

Top 10 PyTorch Developer Podcast Episodes

Goodpods has curated a list of the 10 best PyTorch Developer Podcast episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to PyTorch Developer Podcast for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite PyTorch Developer Podcast episode by adding your comments to the episode page.

Inductor - Post-grad FX passes

04/12/24 • 24 min

The post-grad FX passes in Inductor run after AOTAutograd has functionalized and normalized the input program into separate forward/backward graphs. As such, they can generally assume that the graph in question is functionalized, except for some mutations to inputs at the end of the graph. At the end of the post-grad passes, special passes reintroduce mutation into the graph before it goes into the rest of Inductor lowering, which is generally aware of mutation. The post-grad FX passes are varied, but they are typically domain-specific passes making local changes to specific parts of the graph.
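For flavor, here is a hedged sketch of what such a local pass can look like. The pattern being rewritten (folding a double negation) is made up for illustration and is not one of Inductor's actual passes; the torch.fx APIs are real:

```python
# Illustrative local FX pass: fold torch.neg(torch.neg(x)) into x.
# The pattern is hypothetical. A functionalized graph means we can
# match patterns without worrying about aliasing or mutation between
# the two ops.
import torch
import torch.fx

def fold_double_neg(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    for node in list(gm.graph.nodes):
        if (node.op == "call_function" and node.target is torch.neg
                and isinstance(node.args[0], torch.fx.Node)
                and node.args[0].op == "call_function"
                and node.args[0].target is torch.neg):
            node.replace_all_uses_with(node.args[0].args[0])
            gm.graph.erase_node(node)
    gm.graph.eliminate_dead_code()  # removes the now-unused inner neg
    gm.recompile()
    return gm

def f(x):
    return torch.neg(torch.neg(x)) + 1

gm = torch.fx.symbolic_trace(f)
fold_double_neg(gm)
print(gm.code)  # both negs are gone; only x + 1 remains
```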

CUDA graph trees

03/24/24 • 20 min

CUDA graph trees are the internal implementation of CUDA graphs used in PT2 when you say mode="reduce-overhead". Their primary innovation is that they allow memory to be reused across multiple CUDA graphs, as long as those graphs form a tree structure of potential paths you can go down with the CUDA graph. This greatly reduces the memory usage of CUDA graphs in PT2. There are some operational implications to using CUDA graphs, which are described in the podcast.
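Turning this on requires nothing beyond the compile mode; a minimal usage sketch (requires a CUDA device):

```python
# Minimal usage of CUDA graph trees: they are an internal detail of
# mode="reduce-overhead"; you never construct them directly.
import torch

model = torch.nn.Linear(64, 64).cuda()
compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(32, 64, device="cuda")
for _ in range(3):  # early iterations warm up and record the CUDA graph
    out = compiled(x)
torch.cuda.synchronize()
print(out.shape)
```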

Min-cut partitioner

03/17/24 • 15 min

The min-cut partitioner makes decisions about what to save for backwards when splitting the forward and backwards graph from the joint graph traced by AOTAutograd. Crucially, it doesn't actually do a "split"; instead, it decides how much of the joint graph should be used for backwards. I also talk about the backward retracing problem.
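The actual partitioner operates on the AOTAutograd joint graph with real tensor sizes and recomputation costs, but the core framing can be sketched as a toy flow network (networkx is used here purely for illustration, not by PyTorch):

```python
# Toy illustration of the min-cut framing (not PyTorch's implementation):
# decide which intermediates to save for backwards by cutting the joint
# graph between the forward inputs and the loss, with edge capacities
# standing in for tensor sizes.
import networkx as nx

G = nx.DiGraph()
G.add_edge("input", "a", capacity=1024)  # large activation
G.add_edge("a", "b", capacity=16)        # small bottleneck
G.add_edge("b", "c", capacity=1024)      # large again
G.add_edge("c", "loss", capacity=1024)

cut_cost, (fwd_side, bwd_side) = nx.minimum_cut(G, "input", "loss")
saved = [(u, v) for u in fwd_side for v in G[u] if v in bwd_side]
print(cut_cost)  # 16: cheapest to save only the bottleneck
print(saved)     # [('a', 'b')]; everything past it is recomputed
```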

TH

06/16/21 • 11 min

What is TH? Why might you care? What is so horrible about it? What the heck is the generic/ folder? Why are we porting everything to C++? What are some downsides of having ported all our TH code to C++?

Further reading.

  • The TH to ATen porting guide has lots of explanations of old school TH idioms https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide
  • Old notes about refcounting in TH https://github.com/pytorch/pytorch/blob/master/aten/src/README.md

TorchScript

06/15/21 • 19 min

There is a really good TorchScript overview at https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md, and in this 20 min podcast I want to give you some of the highlights from that document.
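For orientation, a minimal TorchScript example; torch.jit.script compiles the function into the IR that the overview document describes:

```python
# torch.jit.script compiles Python (including control flow) into
# TorchScript IR rather than tracing one execution path.
import torch

@torch.jit.script
def clipped_sum(x: torch.Tensor, threshold: float) -> torch.Tensor:
    if x.max() > threshold:  # the branch is preserved in the IR
        x = x.clamp(max=threshold)
    return x.sum()

print(clipped_sum(torch.randn(4), 1.0))
print(clipped_sum.graph)  # inspect the TorchScript IR
```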


CMake

06/14/21 • 17 min

Why is PyTorch's build so g-dang complicated? How can you avoid having to deal with cmake at all? If you do have to deal with cmake, what are the most important things to know? And if you were going to improve our cmake, how would you go about doing it?

Liner notes.

  • multiple build systems: cmake, buck, xplat buck, ovrsource buck, bazel
    • tools/build_variables.bzl is read from cmake! append_filelist (see the sketch after these notes)
      • but not used uniformly for all components! (ouch!)
  • mashed together ATen and Caffe2 build systems (e.g., main library libtorch_cpu is defined in caffe2/CMakeLists.txt)
  • cmake: not very much syntax, "everything is a function". This means you can look up constructs relatively easily; e.g., even if() is a command
  • the general cmake model: "set a bunch of variables, run a bunch of commands". cmake is VERY GREPPABLE
    • but not everything is in CMakeLists.txt; check *.cmake too
    • the directory structure makes no sense, you really need to grep.
      (doing a lot of set PARENT_SCOPE to propagate stuff)
    • renaming a file? grep for it
    • primary hazard of refactoring: need to make sure all the variables
      are set up at the new location
  • many directories are not recursively globbed; beware when adding new directories
  • old school cmake: literally everything is stuffed in variables (CMAKE_CXX_FLAGS). new school cmake: attach things to targets, things propagate when you depend on targets (public/private dependencies)
  • add_library: the most important thing
  • don't randomly change things and pray: have hypotheses and test them
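To make the build_variables.bzl note concrete: the filelists are plain Starlark (Python-syntax) lists shared by all the build systems. A hypothetical sketch of the shape (the variable name and entries are illustrative, not copied from the real file):

```python
# Hypothetical sketch of a filelist in tools/build_variables.bzl
# (Starlark is Python syntax; this variable name is made up).
# cmake pulls lists like this in via append_filelist, and the Buck/
# Bazel builds read the same lists, so a new source file ideally
# only needs to be registered in one place.
example_jit_sources = [
    "torch/csrc/jit/frontend/lexer.cpp",
    "torch/csrc/jit/ir/ir.cpp",
    "torch/csrc/jit/runtime/interpreter.cpp",
]
```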

Code generation

06/04/21 • 16 min

Why does PyTorch use code generation as part of its build process? Why doesn't it use C++ templates? What things is code generation used for? What are the pros/cons of using code generation? What are some other ways to do the same things we currently do with code generation? (A toy sketch of the schema-driven style follows the outline below.)

Outline:

  • High level: reduce the amount of code in PyTorch, easier to develop
  • Strongly typed python
  • Stuff we're using codegen for
    • Meta point: stuff c++ metaprogramming can't do
    • C++ APIs (functions, methods on classes)
      • Especially for forwarding (operator dot doko)
      • Prototypes for c++ to implement
    • YAML files used by external frameworks for binding (accidental)
    • Python arg parsing
    • pyi generation
    • Autograd classes for saving saved data
    • Otherwise complicated constexpr computation (e.g., parsing JIT
      schema)
  • Pros
    • Better surface syntax (native_functions.yaml, jit schema,
      derivatives.yaml)
    • Better error messages (template messages famously bad)
    • Easier to organize complicated code; esp nontrivial input
      data structure
    • Easier to debug by looking at generated code
  • Cons
    • Not as portable (template can be used by anyone)
    • Less good modeling for C++ type based metaprogramming (we've replicated a crappy version of C++ type system in our codegen)
  • Counterpoints in the design space
    • C++ templates: just as efficient
    • Boxed fallback: simpler, less efficient
  • Open question: can you have best of both worlds, e.g., with partially evaluated interpreters?
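As promised above, a toy sketch of the schema-driven style: turn a simplified, native_functions.yaml-flavored schema string into a C++ prototype. The grammar and type table here are drastically simplified and hypothetical; PyTorch's real codegen handles far more:

```python
# Toy schema-driven codegen (hypothetical grammar, not PyTorch's real
# one): parse "name(Type arg, ...) -> Type" and emit a C++ prototype.
import re

CPP_TYPES = {"Tensor": "at::Tensor", "Scalar": "at::Scalar", "int": "int64_t"}

def cpp_prototype(schema: str) -> str:
    name, args, ret = re.match(r"(\w+)\((.*)\) -> (\w+)", schema).groups()
    params = []
    for arg in args.split(", "):
        if arg == "*":  # marker: remaining arguments are keyword-only
            continue
        ty, ident = arg.split(" ")[:2]
        default = ""
        if "=" in ident:
            ident, default_val = ident.split("=")
            default = f" = {default_val}"
        params.append(f"const {CPP_TYPES[ty]}& {ident}{default}")
    return f"{CPP_TYPES[ret]} {name}({', '.join(params)});"

print(cpp_prototype("add(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"))
# at::Tensor add(const at::Tensor& self, const at::Tensor& other,
#                const at::Scalar& alpha = 1);
```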

Why is autograd so complicated

06/03/21 • 15 min

Why is autograd so complicated? What are the constraints and features that go into making it complicated? What's up with it being written in C++? What's with derivatives.yaml and code generation? What's going on with views and mutation? What's up with hooks and anomaly mode? What's reentrant execution? Why is it relevant to checkpointing? What's the distributed autograd engine?
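Two of the features mentioned (hooks and anomaly mode) are easy to demo from the public API; a minimal sketch:

```python
# Minimal demo of two autograd features: tensor hooks and anomaly mode.
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
x.register_hook(lambda grad: print("grad wrt x:", grad))  # runs per backward

y = (x * x).sum()
y.backward()  # hook prints: grad wrt x: tensor([4., 6.])

# Anomaly mode records forward stack traces so that a NaN produced
# during backward can be attributed to the forward op responsible.
with torch.autograd.detect_anomaly():
    z = x.sqrt().sum()
    z.backward()  # hook fires again; grads accumulate into x.grad
```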


__torch_function__

06/02/21 • 17 min

What is __torch_function__? Why would I want to use it? What does it have to do with keeping extra metadata on Tensors or torch.fx? How is it implemented? Why is __torch_function__ a really popular way of extending functionality in PyTorch? What makes it different from the dispatcher extensibility mechanism? What are some downsides of it being written this way? What are we doing about it?
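A minimal example of the protocol; the wrapper class here is illustrative (it shows the standard metadata-carrying use case), while __torch_function__ itself is the real public hook:

```python
# A non-Tensor wrapper that carries extra metadata through torch.*
# calls via __torch_function__. The class is illustrative; the
# protocol is real.
import torch

class TaggedTensor:
    def __init__(self, data, tag):
        self.data, self.tag = data, tag

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap arguments, run the real op, re-wrap with the first tag.
        tags = [a.tag for a in args if isinstance(a, TaggedTensor)]
        plain = [a.data if isinstance(a, TaggedTensor) else a for a in args]
        return TaggedTensor(func(*plain, **kwargs), tags[0])

t = TaggedTensor(torch.ones(2), tag="from-loader")
out = torch.add(t, torch.ones(2))
print(out.tag, out.data)  # from-loader tensor([2., 2.])
```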


Higher order operators

04/21/24 • 17 min

Higher order operators are a special form of operators in torch.ops which have relaxed input argument requirements: in particular, they can accept any form of argument, including Python callables. Their name is based on their most common use case, which is to represent higher order functions like control flow operators. However, they are also used to implement other variants of basic operators, and can be used to smuggle in Python data that is quite unusual. They are implemented using a Python dispatcher.
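The flagship example is torch.cond, which takes Python callables as branch arguments (exposed as torch.cond in recent PyTorch releases; check your version):

```python
# torch.cond is a higher order operator: its true/false branches are
# Python callables, and torch.compile keeps both branches in the
# graph instead of specializing on the data-dependent predicate.
import torch

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

@torch.compile
def f(x):
    return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))

print(f(torch.ones(3)))   # sin branch
print(f(-torch.ones(3)))  # cos branch
```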

FAQ

How many episodes does PyTorch Developer Podcast have?

PyTorch Developer Podcast currently has 83 episodes available.

What topics does PyTorch Developer Podcast cover?

The podcast is about Deep Learning, Podcasts, Technology and Machine Learning.

What is the most popular episode on PyTorch Developer Podcast?

The episode title 'Higher order operators' is the most popular.

What is the average episode length on PyTorch Developer Podcast?

The average episode length on PyTorch Developer Podcast is 16 minutes.

How often are episodes of PyTorch Developer Podcast released?

Episodes of PyTorch Developer Podcast are typically released every 3 days.

When was the first episode of PyTorch Developer Podcast?

The first episode of PyTorch Developer Podcast was released on May 4, 2021.
