
#195 – Sella Nevo on who's trying to steal frontier AI models, and what they could do with them
08/01/24 • 128 min
"Computational systems have literally millions of physical and conceptual components, and around 98% of them are embedded into your infrastructure without you ever having heard of them. And an inordinate amount of them can lead to a catastrophic failure of your security assumptions. And because of this, the Iranian secret nuclear programme failed to prevent a breach, most US agencies failed to prevent multiple breaches, most US national security agencies failed to prevent breaches. So ensuring your system is truly secure against highly resourced and dedicated attackers is really, really hard." —Sella Nevo
In today’s episode, host Luisa Rodriguez speaks to Sella Nevo — director of the Meselson Center at RAND — about his team’s latest report on how to protect the model weights of frontier AI models from actors who might want to steal them.
Links to learn more, highlights, and full transcript.
They cover:
- Real-world examples of sophisticated security breaches, and what we can learn from them.
- Why AI model weights might be such a high-value target for adversaries like hackers, rogue states, and other bad actors.
- The many ways that model weights could be stolen, from using human insiders to sophisticated supply chain hacks.
- The current best practices in cybersecurity, and why they may not be enough to keep bad actors away.
- New security measures that Sella hopes can mitigate the growing risks.
- Sella’s work using machine learning for flood forecasting, which has significantly reduced injuries and costs from floods across Africa and Asia.
- And plenty more.
Also, RAND is currently hiring for roles in technical and policy information security — check them out if you're interested in this field!
Chapters:
- Cold open (00:00:00)
- Luisa’s intro (00:00:56)
- The interview begins (00:02:30)
- The importance of securing the model weights of frontier AI models (00:03:01)
- The most sophisticated and surprising security breaches (00:10:22)
- AI models being leaked (00:25:52)
- Researching for the RAND report (00:30:11)
- Who tries to steal model weights? (00:32:21)
- Malicious code and exploiting zero-days (00:42:06)
- Human insiders (00:53:20)
- Side-channel attacks (01:04:11)
- Getting access to air-gapped networks (01:10:52)
- Model extraction (01:19:47)
- Reducing and hardening authorised access (01:38:52)
- Confidential computing (01:48:05)
- Red-teaming and security testing (01:53:42)
- Careers in information security (01:59:54)
- Sella’s work on flood forecasting systems (02:01:57)
- Luisa’s outro (02:04:51)
Producer and editor: Keiran Harris
Audio engineering team: Ben Cordell, Simon Monsour, Milo McGuire, and Dominic Armstrong
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore
"Computational systems have literally millions of physical and conceptual components, and around 98% of them are embedded into your infrastructure without you ever having heard of them. And an inordinate amount of them can lead to a catastrophic failure of your security assumptions. And because of this, the Iranian secret nuclear programme failed to prevent a breach, most US agencies failed to prevent multiple breaches, most US national security agencies failed to prevent breaches. So ensuring your system is truly secure against highly resourced and dedicated attackers is really, really hard." —Sella Nevo
In today’s episode, host Luisa Rodriguez speaks to Sella Nevo — director of the Meselson Center at RAND — about his team’s latest report on how to protect the model weights of frontier AI models from actors who might want to steal them.
Links to learn more, highlights, and full transcript.
They cover:
- Real-world examples of sophisticated security breaches, and what we can learn from them.
- Why AI model weights might be such a high-value target for adversaries like hackers, rogue states, and other bad actors.
- The many ways that model weights could be stolen, from using human insiders to sophisticated supply chain hacks.
- The current best practices in cybersecurity, and why they may not be enough to keep bad actors away.
- New security measures that Sella hopes can mitigate with the growing risks.
- Sella’s work using machine learning for flood forecasting, which has significantly reduced injuries and costs from floods across Africa and Asia.
- And plenty more.
Also, RAND is currently hiring for roles in technical and policy information security — check them out if you're interested in this field!
Chapters:
- Cold open (00:00:00)
- Luisa’s intro (00:00:56)
- The interview begins (00:02:30)
- The importance of securing the model weights of frontier AI models (00:03:01)
- The most sophisticated and surprising security breaches (00:10:22)
- AI models being leaked (00:25:52)
- Researching for the RAND report (00:30:11)
- Who tries to steal model weights? (00:32:21)
- Malicious code and exploiting zero-days (00:42:06)
- Human insiders (00:53:20)
- Side-channel attacks (01:04:11)
- Getting access to air-gapped networks (01:10:52)
- Model extraction (01:19:47)
- Reducing and hardening authorised access (01:38:52)
- Confidential computing (01:48:05)
- Red-teaming and security testing (01:53:42)
- Careers in information security (01:59:54)
- Sella’s work on flood forecasting systems (02:01:57)
- Luisa’s outro (02:04:51)
Producer and editor: Keiran Harris
Audio engineering team: Ben Cordell, Simon Monsour, Milo McGuire, and Dominic Armstrong
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore
Previous Episode

#194 – Vitalik Buterin on defensive acceleration and how to regulate AI when you fear government
"If you’re a power that is an island and that goes by sea, then you’re more likely to do things like valuing freedom, being democratic, being pro-foreigner, being open-minded, being interested in trade. If you are on the Mongolian steppes, then your entire mindset is kill or be killed, conquer or be conquered ... the breeding ground for basically everything that all of us consider to be dystopian governance. If you want more utopian governance and less dystopian governance, then find ways to basically change the landscape, to try to make the world look more like mountains and rivers and less like the Mongolian steppes." —Vitalik Buterin
Can ‘effective accelerationists’ and AI ‘doomers’ agree on a common philosophy of technology? Common sense says no. But programmer and Ethereum cofounder Vitalik Buterin showed otherwise with his essay “My techno-optimism,” which both camps agreed was basically reasonable.
Links to learn more, highlights, video, and full transcript.
Seeing his social circle divided and fighting, Vitalik hoped to write a careful synthesis of the best ideas from both the optimists and the apprehensive.
Accelerationists are right: most technologies leave us better off, the human cost of delaying further advances can be dreadful, and centralising control in government hands often ends disastrously.
But the fearful are also right: some technologies are important exceptions, AGI has an unusually high chance of being one of those, and there are options to advance AI in safer directions.
The upshot? Defensive acceleration: humanity should run boldly but also intelligently into the future — speeding up technology to get its benefits, but preferentially developing ‘defensive’ technologies that lower systemic risks, permit safe decentralisation of power, and help both individuals and countries defend themselves against aggression and domination.
Entrepreneur First is running a defensive acceleration incubation programme with $250,000 of investment. If these ideas resonate with you, learn about the programme and apply by August 2, 2024. You don’t need a business idea yet — just the hustle to start a technology company.
In addition to all of that, host Rob Wiblin and Vitalik discuss:
- AI regulation disagreements being less about AI in particular, and more whether you’re typically more scared of anarchy or totalitarianism.
- Vitalik’s updated p(doom).
- Whether the social impact of blockchain and crypto has been a disappointment.
- Whether humans can merge with AI, and if that’s even desirable.
- The most valuable defensive technologies to accelerate.
- How to trustlessly identify what everyone will agree is misinformation.
- Whether AGI is offence-dominant or defence-dominant.
- Vitalik’s updated take on effective altruism.
- Plenty more.
Chapters:
- Cold open (00:00:00)
- Rob’s intro (00:00:56)
- The interview begins (00:04:47)
- Three different views on technology (00:05:46)
- Vitalik’s updated probability of doom (00:09:25)
- Technology is amazing, and AI is fundamentally different from other tech (00:15:55)
- Fear of totalitarianism and finding middle ground (00:22:44)
- Should AI be more centralised or more decentralised? (00:42:20)
- Humans merging with AIs to remain relevant (01:06:59)
- Vitalik’s “d/acc” alternative (01:18:48)
- Biodefence (01:24:01)
- Pushback on Vitalik’s vision (01:37:09)
- How much do people actually disagree? (01:42:14)
- Cybersecurity (01:47:28)
- Information defence (02:01:44)
- Is AI more offence-dominant or defence-dominant? (02:21:00)
- How Vitalik communicates among different camps (02:25:44)
- Blockchain applications with social impact (02:34:37)
- Rob’s outro (03:01:00)
Producer and editor: Keiran Harris
Audio engineering team: Ben Cordell, Simon Monsour, Milo McGuire, and Dominic Armstrong
Transcriptions: Katy Moore
Next Episode

#196 – Jonathan Birch on the edge cases of sentience and why they matter
"In the 1980s, it was still apparently common to perform surgery on newborn babies without anaesthetic on both sides of the Atlantic. This led to appalling cases, and to public outcry, and to campaigns to change clinical practice. And as soon as [some courageous scientists] looked for evidence, it showed that this practice was completely indefensible and then the clinical practice was changed. People don’t need convincing anymore that we should take newborn human babies seriously as sentience candidates. But the tale is a useful cautionary tale, because it shows you how deep that overconfidence can run and how problematic it can be. It just underlines this point that overconfidence about sentience is everywhere and is dangerous." —Jonathan Birch
In today’s episode, host Luisa Rodriguez speaks to Dr Jonathan Birch — philosophy professor at the London School of Economics — about his new book, The Edge of Sentience: Risk and Precaution in Humans, Other Animals, and AI. (Check out the free PDF version!)
Links to learn more, highlights, and full transcript.
They cover:
- Candidates for sentience, such as humans with consciousness disorders, foetuses, neural organoids, invertebrates, and AIs.
- Humanity’s history of acting as if we’re sure that such beings are incapable of having subjective experiences — and why Jonathan thinks that certainty is completely unjustified.
- Chilling tales about overconfident policies that probably caused significant suffering for decades.
- How policymakers can act ethically given real uncertainty.
- Whether simulating the brain of the roundworm C. elegans or Drosophila (aka fruit flies) would create minds as sentient as the biological versions.
- How new technologies like brain organoids could replace animal testing, and how big the risk is that they could be sentient too.
- Why Jonathan is so excited about citizens’ assemblies.
- Jonathan’s conversation with the Dalai Lama about whether insects are sentient.
- And plenty more.
Chapters:
- Cold open (00:00:00)
- Luisa’s intro (00:01:20)
- The interview begins (00:03:04)
- Why does sentience matter? (00:03:31)
- Inescapable uncertainty about other minds (00:05:43)
- The “zone of reasonable disagreement” in sentience research (00:10:31)
- Disorders of consciousness: comas and minimally conscious states (00:17:06)
- Foetuses and the cautionary tale of newborn pain (00:43:23)
- Neural organoids (00:55:49)
- AI sentience and whole brain emulation (01:06:17)
- Policymaking at the edge of sentience (01:28:09)
- Citizens’ assemblies (01:31:13)
- The UK’s Sentience Act (01:39:45)
- Ways Jonathan has changed his mind (01:47:26)
- Careers (01:54:54)
- Discussing animal sentience with the Dalai Lama (01:59:08)
- Luisa’s outro (02:01:04)
Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore