TalkRL: The Reinforcement Learning Podcast - Natasha Jaques 2

Natasha Jaques 2

03/14/23 • 46 min

TalkRL: The Reinforcement Learning Podcast

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!

Dr Natasha Jaques is a Senior Research Scientist at Google Brain.

Featured References

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine

Additional References

Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019
Learning to summarize from human feedback, Nisan Stiennon et al 2020
Training language models to follow instructions with human feedback, Long Ouyang et al 2022

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!

Dr Natasha Jaques is a Senior Research Scientist at Google Brain.

Featured References

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine

Additional References

Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019
Learning to summarize from human feedback, Nisan Stiennon et al 2020
Training language models to follow instructions with human feedback, Long Ouyang et al 2022

Previous Episode

Jacob Beck and Risto Vuorio

Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning. Jacob and Risto are Ph.D. students at Whiteson Research Lab at University of Oxford.

Featured Reference

A Survey of Meta-Reinforcement Learning
Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson

Additional References

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, Luisa Zintgraf et al
Mastering Diverse Domains through World Models (Dreamerv3), Hafner et al
Unsupervised Meta-Learning for Reinforcement Learning (MAML), Gupta et al
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices (DREAM), Liu et al
RL2: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al
Learning to reinforcement learn, Wang et al

Next Episode

Jeff Clune

AI Generating Algos, Learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and Open Endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!

Professor Jeff Clune is Associate Professor of Computer Science at University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at Vector Institute, and Senior Research Advisor at DeepMind.

Featured References

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [ Blog Post ]
Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune

Robots that can adapt like animals
Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret

Illuminating search spaces by mapping elites
Jean-Baptiste Mouret, Jeff Clune
Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley

First return, then explore
Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune