Sven Mika

08/19/22 • 34 min

TalkRL: The Reinforcement Learning Podcast

Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University.

Featured References

RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning

Ray: Documentation

RLlib: Abstractions for Distributed Reinforcement Learning
Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

Episode sponsor: Anyscale
Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.

Previous Episode

Karol Hausman and Fei Xia

Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments.

Fei Xia is a Research Scientist with Google Research. Fei Xia is mostly interested in robot learning in complex and unstructured environments. Previously he has been approaching this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring using foundation models for those challenges.

Featured References

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan

Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

Additional References

Large-scale simulation for embodied perception and robot learning, Xia 2021
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, Kalashnikov et al 2018
MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale, Kalashnikov et al 2021
ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, Xia et al 2020
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills, Chebotar et al 2021
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, Zeng et al 2022

Next Episode

John Schulman

John Schulman is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.

Featured References

WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

Additional References

Our approach to alignment research, OpenAI 2022
Training Verifiers to Solve Math Word Problems, Cobbe et al 2021
UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation, John Schulman 2017
Proximal Policy Optimization Algorithms, Schulman 2017
Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, Schulman 2016

TalkRL: The Reinforcement Learning Podcast - Sven Mika

Transcript

Sven 00:00:00.880

There's a rise in interest in RL in finance. We have JPM, for example, as well as other companies that we're seeing moving into the space and trying RL on financial decision making. Ray was actually developed because of the need to write a reinforcement learning library.

Robin 00:00:20.620

TalkRL. The TalkRL podcast is all reinforcement learning, all the time, featuring brilliant guests, both research and applied. Join