
Ep 10 - Accelerated training to become an AI safety researcher w/ Ryan Kidd (Co-Director, MATS)

11/08/23 • 76 min

Artificial General Intelligence (AGI) Show with Soroush Pour

We speak with Ryan Kidd, Co-Director of the ML Alignment & Theory Scholars (MATS) program, previously known as "SERI MATS".
MATS (https://www.matsprogram.org/) provides research mentorship, technical seminars, and connections to help new AI researchers get established and start producing impactful research towards AI safety & alignment.
Prior to MATS, Ryan completed a PhD in Physics at the University of Queensland (UQ) in Australia.
We talk about:
* What the MATS program is
* Who should apply to MATS (next *deadline*: Nov 17 midnight PT)
* Research directions being explored by MATS mentors, now and in the past
* Promising alignment research directions & ecosystem gaps, in Ryan's view
Hosted by Soroush Pour. Follow me for more AGI content:
* Twitter: https://twitter.com/soroushjp
* LinkedIn: https://www.linkedin.com/in/soroushjp/
== Show links ==
-- About Ryan --
* Twitter: https://twitter.com/ryan_kidd44
* LinkedIn: https://www.linkedin.com/in/ryan-kidd-1b0574a3/
* MATS: https://www.matsprogram.org/
* LISA: https://www.safeai.org.uk/
* Manifold: https://manifold.markets/
-- Further resources --
* Book: “The Precipice” - https://theprecipice.com/
* Ikigai - https://en.wikipedia.org/wiki/Ikigai
* Fermi paradox - https://en.wikipedia.org/wiki/Fermi_p...
* Ajeya Cotra - Bioanchors - https://www.cold-takes.com/forecastin...
* Chomsky hierarchy & LLM transformers paper + external memory - https://en.wikipedia.org/wiki/Chomsky...
* AutoGPT - https://en.wikipedia.org/wiki/Auto-GPT
* BabyAGI - https://github.com/yoheinakajima/babyagi
* Unilateralist's curse - https://forum.effectivealtruism.org/t...
* Jeffrey Ladish & team - fine-tuning to remove LLM safeguards - https://www.alignmentforum.org/posts/...
* Epoch AI trends - https://epochai.org/trends
* The demon "Moloch" - https://slatestarcodex.com/2014/07/30...
* AI safety fundamentals course - https://aisafetyfundamentals.com/
* Anthropic sycophancy paper - https://www.anthropic.com/index/towar...
* Promising technical alignment research directions
  * Scalable oversight
    * Recursive reward modelling - https://deepmindsafetyresearch.medium...
    * RLHF - could work for a while, but unlikely forever as we scale
  * Interpretability
    * Mechanistic interpretability
      * Paper: GPT-4 labelling GPT-2 - https://openai.com/research/language-...
    * Concept-based interpretability
      * ROME paper - https://rome.baulab.info/
    * Developmental interpretability
      * devinterp.com - http://devinterp.com
      * Timaeus - https://timaeus.co/
    * Internal consistency
      * Colin Burns research - https://arxiv.org/abs/2212.03827 (a rough code sketch follows this list)
  * Threat modelling / capabilities evaluation & demos
    * Paper: Can large language models democratize access to dual-use biotechnology? - https://arxiv.org/abs/2306.03809
    * ARC Evals - https://evals.alignment.org/
    * Palisade Research - https://palisaderesearch.org/
    * Paper: Situational awareness with Owain Evans - https://arxiv.org/abs/2309.00667
    * Gradient hacking - https://www.lesswrong.com/posts/uXH4r6MmKPedk8rMA/gradient-hacking
* Past scholars' work
  * Apollo Research - https://www.apolloresearch.ai/
  * Leap Labs - https://www.leap-labs.com/
  * Timaeus - https://timaeus.co/
* Other orgs mentioned
  * Redwood Research - https://redwoodresearch.org/
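For anyone curious what the "internal consistency" direction looks like in practice, here is a rough sketch of the contrast-consistent search (CCS) probe from the Colin Burns et al. paper linked above. This is an illustrative outline only: the tensor shapes, normalisation and training settings below are assumptions, not the paper's exact setup.

```python
# Rough sketch of a CCS-style probe (Burns et al. 2022, arXiv:2212.03827).
# Assumes h_pos / h_neg are (n_examples, hidden_dim) LLM hidden states for the
# "true" and "false" versions of the same statements, already mean-normalised.
import torch
import torch.nn as nn

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    consistency = (p_pos - (1.0 - p_neg)).pow(2)      # p(x+) should equal 1 - p(x-)
    confidence = torch.minimum(p_pos, p_neg).pow(2)   # discourage the trivial p = 0.5 answer
    return (consistency + confidence).mean()

def train_ccs_probe(h_pos: torch.Tensor, h_neg: torch.Tensor,
                    epochs: int = 1000, lr: float = 1e-3) -> nn.Module:
    probe = nn.Sequential(nn.Linear(h_pos.shape[1], 1), nn.Sigmoid())
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ccs_loss(probe(h_pos).squeeze(-1), probe(h_neg).squeeze(-1))
        loss.backward()
        opt.step()
    return probe
```

The notable property is that the probe is trained without any truth labels: the consistency and confidence terms alone pick out a direction in activation space.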
Recorded Oct 25, 2023

Previous Episode

Ep 9 - Scaling AI safety research w/ Adam Gleave (CEO, FAR AI)

We speak with Adam Gleave, CEO of FAR AI (https://far.ai). FAR AI’s mission is to ensure AI systems are trustworthy & beneficial. They incubate & accelerate research that's too resource-intensive for academia but not yet ready for commercialisation. Their work spans adversarial robustness, interpretability, preference learning, and more.
We talk to Adam about:
* The founding story of FAR as an AI safety org, and how it's different from the big commercial labs (e.g. OpenAI) and academia.
* Their current research directions & how they're going
* Promising agendas & notable gaps in AI safety research
Hosted by Soroush Pour. Follow me for more AGI content:
Twitter: https://twitter.com/soroushjp
LinkedIn: https://www.linkedin.com/in/soroushjp/
== Show links ==
-- About Adam --
Adam Gleave is the CEO of FAR, one of the most prominent not-for-profits focused on research towards AI safety & alignment. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell, a giant in the field of AI. Adam did his PhD on trustworthy machine learning and has dedicated his career to ensuring advanced AI systems act according to human preferences. Adam is incredibly knowledgeable about the world of AI, having worked directly as a researcher and now as leader of a sizable and growing research org.
-- Further resources --
* Adam
  * Website: https://www.gleave.me/
  * Twitter: https://twitter.com/ARGleave
  * LinkedIn: https://www.linkedin.com/in/adamgleave/
  * Google Scholar: https://scholar.google.com/citations?user=lBunDH0AAAAJ&hl=en&oi=ao
* FAR AI
  * Website: https://far.ai
  * Twitter: https://twitter.com/farairesearch
  * LinkedIn: https://www.linkedin.com/company/far-ai/
  * Job board: https://far.ai/category/jobs/
* AI safety training bootcamps:
  * ARENA: https://www.arena.education/
  * See also: MLAB, WMLB, https://aisafety.training/
* Research
  * FAR's adversarial attack on KataGo - https://goattack.far.ai/
* Ideas for impact mentioned by Adam
  * Consumer report for AI model safety
  * Agency model to support AI safety researchers
  * Compute cluster for AI safety researchers
* Donate to AI safety
  * FAR AI: https://www.every.org/far-ai-inc#/donate/card
  * ARC Evals: https://evals.alignment.org/
  * Berkeley CHAI: https://humancompatible.ai/
Recorded Oct 9, 2023

Next Episode

Ep 11 - Technical alignment overview w/ Thomas Larsen (Director of Strategy, Center for AI Policy)

We speak with Thomas Larsen, Director for Strategy at the Center for AI Policy in Washington, DC, to do a "speed run" overview of all the major technical research directions in AI alignment. A great way to quickly learn broadly about the field of technical AI alignment.
In 2022, Thomas spent ~75 hours putting together an overview of what everyone in technical alignment was doing. Since then, he's continued to be deeply engaged in AI safety. We talk to Thomas to share an updated overview to help listeners quickly understand the technical alignment research landscape.
We talk to Thomas about a huge breadth of technical alignment areas including:
* Prosaic alignment
* Scalable oversight (e.g. RLHF, debate, IDA; a rough reward-model code sketch follows this list)
* Interpretability
* Heuristic arguments, from ARC
* Model evaluations
* Agent foundations
* Other areas more briefly:
  * Model splintering
  * Out-of-distribution (OOD) detection
  * Low impact measures
  * Threat modelling
  * Scaling laws
  * Brain-like AI safety
  * Inverse reinforcement learning (IRL)
  * Cooperative AI
  * Adversarial training
  * Truthful AI
  * Brain-machine interfaces (Neuralink)
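For the scalable oversight item above, here is a rough, illustrative sketch of the reward-modelling step at the core of RLHF, using a pairwise (Bradley-Terry style) preference loss. The model, shapes and hyperparameters below are placeholder assumptions, not any particular lab's implementation.

```python
# Illustrative reward-model training step for RLHF-style preference learning.
# h_chosen / h_rejected are assumed to be pooled LLM embeddings of the
# human-preferred and dispreferred responses to the same prompt.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a pooled response embedding to a scalar reward."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.head(h).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximise P(chosen preferred) = sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

if __name__ == "__main__":
    rm = RewardModel(hidden_dim=768)
    opt = torch.optim.Adam(rm.parameters(), lr=1e-4)
    h_chosen, h_rejected = torch.randn(8, 768), torch.randn(8, 768)  # stand-in data
    opt.zero_grad()
    loss = preference_loss(rm(h_chosen), rm(h_rejected))
    loss.backward()
    opt.step()
```

The learned reward model is then used as the optimisation target for the policy (e.g. via PPO), which is where the "could work for a while, but unlikely forever as we scale" concern noted in the Ep 10 links above comes in.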
Hosted by Soroush Pour. Follow me for more AGI content:
Twitter: https://twitter.com/soroushjp
LinkedIn: https://www.linkedin.com/in/soroushjp/
== Show links ==
-- About Thomas --
Thomas studied Computer Science & Mathematics at the University of Michigan, where he first did ML research in the field of computer vision. After graduating, he completed the MATS AI safety research scholar program before doing a stint at MIRI as a Technical AI Safety Researcher. Earlier this year, he moved his work into AI policy by co-founding the Center for AI Policy, a nonprofit, nonpartisan organisation focused on getting the US government to adopt policies that would mitigate national security risks from AI. The Center for AI Policy is not connected to foreign governments or commercial AI developers and is instead committed to the public interest.
* Center for AI Policy - https://www.aipolicy.us
* LinkedIn - https://www.linkedin.com/in/thomas-larsen/
* LessWrong - https://www.lesswrong.com/users/thomas-larsen
-- Further resources --
* Thomas' post, "What Everyone in Technical Alignment is Doing and Why" https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is
* Please note this post is from Aug 2022. The podcast should be more up-to-date, but this post is still a valuable and relevant resource.
