
Chasing Real AGI: Inside ARC Prize 2025 with Chollet & Knoop

04/03/25 • 60 min

The MAD Podcast with Matt Turck

In this fascinating episode, we dive deep into the race towards true AI intelligence, AGI benchmarks, test-time adaptation, and program synthesis with star AI researcher (and philosopher) François Chollet, creator of Keras and the ARC-AGI benchmark, and Mike Knoop, co-founder of Zapier and now co-founder with François of both the ARC Prize and the research lab Ndea. With the launch of ARC Prize 2025 and ARC-AGI 2, they explain why existing LLMs fall short on true intelligence tests, how new models like O3 mark a step change in capabilities, and what it will really take to reach AGI.

We cover everything from the technical evolution of ARC 1 to ARC 2, the shift toward test-time reasoning, and the role of program synthesis as a foundation for more general intelligence. The conversation also explores the philosophical underpinnings of intelligence, the structure of the ARC Prize, and the motivation behind launching Ndea — a new AGI research lab that aims to build a "factory for rapid scientific advancement." Whether you're deep in the AI research trenches or just fascinated by where this is all headed, this episode offers clarity and inspiration.

Ndea

Website - https://ndea.com

X/Twitter - https://x.com/ndea

ARC Prize

Website - https://arcprize.org

X/Twitter - https://x.com/arcprize

François Chollet

LinkedIn - https://www.linkedin.com/in/fchollet

X/Twitter - https://x.com/fchollet

Mike Knoop

X/Twitter - https://x.com/mikeknoop

FIRSTMARK

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

(00:00) Intro

(01:05) Introduction to ARC Prize 2025 and ARC-AGI 2

(02:07) What is ARC and how it differs from other AI benchmarks

(02:54) Why current models struggle with fluid intelligence

(03:52) Shift from static LLMs to test-time adaptation

(04:19) What ARC measures vs. traditional benchmarks

(07:52) Limitations of brute-force scaling in LLMs

(13:31) Defining intelligence: adaptation and efficiency

(16:19) How O3 achieved a massive leap in ARC performance

(20:35) Speculation on O3's architecture and test-time search

(22:48) Program synthesis: what it is and why it matters

(28:28) Combining LLMs with search and synthesis techniques

(34:57) The ARC Prize structure: efficiency track, private vs. public

(42:03) Open source as a requirement for progress

(44:59) What's new in ARC-AGI 2 and human benchmark testing

(48:14) Capabilities ARC-AGI 2 is designed to test

(49:21) When will ARC-AGI 2 be saturated? AGI timelines

(52:25) Founding of Ndea and why now

(54:19) Vision beyond AGI: a factory for scientific advancement

(56:40) What Ndea is building and why it's different from LLM labs

(58:32) Hiring and remote-first culture at Ndea

(59:52) Closing thoughts and the future of AI research


Previous Episode

Why This Ex-Meta Leader is Rethinking AI Infrastructure | Lin Qiao, CEO, Fireworks AI

In 2022, Lin Qiao decided to leave Meta, where she was managing several hundred engineers, to start Fireworks AI. In this episode, we sit down with Lin for a deep dive on her work, starting with her leadership on PyTorch, now one of the most influential machine learning frameworks in the industry, powering research and production at scale across the AI industry.

Now at the helm of Fireworks AI, Lin is leading a new wave in generative AI infrastructure, simplifying model deployment and optimizing performance to empower all developers building with Gen AI technologies.

We dive into the technical core of Fireworks AI, uncovering their innovative strategies for model optimization, Function Calling in agentic development, and low-level breakthroughs at the GPU and CUDA layers.

Fireworks AI

Website - https://fireworks.ai

X/Twitter - https://twitter.com/FireworksAI_HQ

Lin Qiao

LinkedIn - https://www.linkedin.com/in/lin-qiao-22248b4

X/Twitter - https://twitter.com/lqiao

FIRSTMARK

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

(00:00) Intro

(01:20) What is Fireworks AI?

(02:47) What is PyTorch?

(12:50) Traditional ML vs GenAI

(14:54) AI’s enterprise transformation

(16:16) From Meta to Fireworks

(19:39) Simplifying AI infrastructure

(20:41) How Fireworks clients use GenAI

(22:02) How many models are powered by Fireworks

(30:09) LLM partitioning

(34:43) Real-time vs pre-set search

(36:56) Reinforcement learning

(38:56) Function calling

(44:23) Low-level architecture overview

(45:47) Cloud GPUs & hardware support

(47:16) VPC vs on-prem vs local deployment

(49:50) Decreasing inference costs and its business implications

(52:46) Fireworks roadmap

(55:03) AI future predictions

Next Episode

Snowflake CEO on Winning the AI Arms Race

In this episode, we sit down with Sridhar Ramaswamy, CEO of Snowflake, for an in-depth conversation about the company’s transformation from a cloud analytics platform into a comprehensive AI data cloud. Sridhar shares insights on Snowflake’s shift toward open formats like Apache Iceberg and why monetizing storage was, in his view, a strategic misstep.

We also dive into Snowflake’s growing AI capabilities, including tools like Cortex Analyst and Cortex Search, and discuss how the company scaled AI deployments at an impressive pace. Sridhar reflects on lessons from his previous startup, Neeva, and offers candid thoughts on the search landscape, the future of BI tools, real-time analytics, and why partnering with OpenAI and Anthropic made more sense than building Snowflake’s own foundation models.

Snowflake

Website - https://www.snowflake.com

X/Twitter - https://x.com/snowflakedb

Sridhar Ramaswamy

LinkedIn - https://www.linkedin.com/in/sridhar-ramaswamy

X/Twitter - https://x.com/RamaswmySridhar

FIRSTMARK

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

(00:00) Intro and current market tumult

(02:48) The evolution of Snowflake from IPO to Today

(07:22) Why Snowflake’s earliest adopters came from financial services

(15:33) Resistance to change and the philosophical gap between structured data and AI

(17:12) What is the AI Data Cloud?

(23:15) Snowflake’s AI agents: Cortex Search and Cortex Analyst

(25:03) How did Sridhar’s experience at Google and Neeva shape his product vision?

(29:43) Was Neeva simply ahead of its time?

(38:37) The Epiphany mafia

(40:08) The current state of search and Google’s conundrum

(46:45) “There’s no AI strategy without a data strategy”

(56:49) Embracing Open Data Formats with Iceberg

(01:01:45) The Modern Data Stack and the future of BI

(01:08:22) The role of real-time data

(01:11:44) Current state of enterprise AI: from PoCs to production

(01:17:54) Building your own models vs. using foundation models

(01:19:47) Deepseek and open source AI

(01:21:17) Snowflake’s 1M Minds program

(01:21:51) Snowflake AI Hub
