muckrAIkers

Jacob Haimes and Igor Krawczuk

Join us as we dig a tiny bit deeper into the hype surrounding "AI" press releases, research papers, and more. Each episode, we highlight ongoing research and investigations, bringing much-needed contextualization, constructive critique, and even a smidge of goodwill teasing to the conversation as we try to find the meaning under all of this muck.


Top 10 muckrAIkers Episodes

Goodpods has curated a list of the 10 best muckrAIkers episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to muckrAIkers for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite muckrAIkers episode by adding your comments to the episode page.

muckrAIkers - NeurIPS 2024 Wrapped 🌯

12/30/24 • 86 min

What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, you wouldn't have captured my experience. In this episode we discuss the drama and takeaways from NeurIPS 2024.

Posters available at time of episode preparation can be found on the episode webpage.

EPISODE RECORDED 2024.12.22

  • (00:00) - Recording date
  • (00:05) - Intro
  • (00:44) - Obligatory mentions
  • (01:54) - SoLaR panel
  • (18:43) - Test of Time
  • (24:17) - And now: science!
  • (28:53) - Downsides of benchmarks
  • (41:39) - Improving the science of ML
  • (53:07) - Performativity
  • (57:33) - NopenAI and Nanthropic
  • (01:09:35) - Fun/interesting papers
  • (01:13:12) - Initial takes on o3
  • (01:18:12) - WorkArena
  • (01:25:00) - Outro

Links

Note: many workshop papers had not yet been published to arXiv as of preparing this episode; in these cases, the OpenReview submission page is provided instead.

  • NeurIPS statement on inclusivity
  • CTOL Digital Solutions article - NeurIPS 2024 Sparks Controversy: MIT Professor's Remarks Ignite "Racism" Backlash Amid Chinese Researchers’ Triumphs
  • (1/2) NeurIPS Best Paper - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
  • Visual Autoregressive Model report (this link now returns a 404 error)
  • Reuters article - ByteDance seeks $1.1 mln damages from intern in AI breach case, report says
  • CTOL Digital Solutions article - NeurIPS Award Winner Entangled in ByteDance's AI Sabotage Accusations: The Two Tales of an AI Genius
  • Reddit post on Ilya's talk
  • SoLaR workshop page

Referenced Sources

  • Harvard Data Science Review article - Data Science at the Singularity
  • Paper - Reward Reports for Reinforcement Learning
  • Paper - It's Not What Machines Can Learn, It's What We Cannot Teach
  • Paper - NeurIPS Reproducibility Program
  • Paper - A Metric Learning Reality Check

Improving Datasets, Benchmarks, and Measurements

  • Tutorial video + slides - Experimental Design and Analysis for AI Researchers (I think you need to have attended NeurIPS to access the recording, but I couldn't find a different version)
  • Paper - BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
  • Paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
  • Paper - A Systematic Review of NeurIPS Dataset Management Practices
  • Paper - The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
  • Paper - Benchmark Repositories for Better Benchmarking
  • Paper - Croissant: A Metadata Format for ML-Ready Datasets
  • Paper - Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
  • Paper - Evaluating Generative AI Systems is a Social Science Measurement Challenge
muckrAIkers - OpenAI's o1, aka. Strawberry

09/23/24 • 50 min

OpenAI's new model is out, and we are going to have to rake through a lot of muck to get the value out of this one!

⚠ Opt out of LinkedIn's GenAI scraping ➡️ https://lnkd.in/epziUeTi

  • (00:00) - Intro
  • (00:25) - Other recent news
  • (02:57) - Hot off the press
  • (03:58) - Why might someone care?
  • (04:52) - What is it?
  • (06:49) - How is it being sold?
  • (10:45) - How do they explain it, technically?
  • (27:09) - Reflection AI Drama
  • (40:19) - Why do we care?
  • (46:39) - Scraping away the muck

Note: at around 32 minutes, Igor says the incorrect Llama model version for the story he is telling. Jacob dubbed over those mistakes with the correct versioning.

Links relating to o1

Other stuff we mention

muckrAIkers - Open Source AI and 2024 Nobel Prizes

10/16/24 • 61 min

The Open Source AI Definition is out after years of drafting, will it reestablish brand meaning for the “Open Source” term? Also, the 2024 Nobel Prizes in Physics and Chemistry are heavily tied to AI; we scrutinize not only this year's prizes, but also Nobel Prizes as a concept.

  • (00:00) - Intro
  • (00:30) - Hot off the press
  • (03:45) - Open Source AI background
  • (10:30) - Definitions and changes in RC1
  • (18:36) - “Business source”
  • (22:17) - Parallels with legislation
  • (26:22) - Impacts of the OSAID
  • (33:58) - 2024 Nobel Prize Context
  • (37:21) - Chemistry prize
  • (45:06) - Physics prize
  • (50:29) - Takeaways
  • (52:03) - What’s the real muck?
  • (01:00:27) - Outro

Links

More Reading on Open Source AI

On Nobel Prizes

Other Sources

muckrAIkers - Understanding Claude 3.5 Sonnet (New)

10/30/24 • 60 min

Frontier developers continue their war on sane versioning schemas to bring us Claude 3.5 Sonnet (New), along with "computer use" capabilities. We discuss not only the new model, but also why Anthropic may have released this model and tool combination now.

  • (00:00) - Intro
  • (00:22) - Hot off the press
  • (05:03) - Claude 3.5 Sonnet (New) Two 'o' 3000
  • (09:23) - Breaking down "computer use"
  • (13:16) - Our understanding
  • (16:03) - Diverging business models
  • (32:07) - Why has Anthropic chosen this strategy?
  • (43:14) - Changing the frame
  • (48:00) - Polishing the lily

Links

Other Anthropic Relevant Media

Other Sources

muckrAIkers - Understanding AI World Models w/ Chris Canal

01/27/25 • 199 min

Chris Canal, co-founder of EquiStamp, joins muckrAIkers as our first ever podcast guest! In this ~3.5 hour interview, we discuss intelligence vs. competencies, the importance of test-time compute, moving goalposts, the orthogonality thesis, and much more.

A seasoned software developer, Chris started EquiStamp as a way to improve our current understanding of model failure modes and capabilities in late 2023. Now a key contractor for METR, EquiStamp evaluates the next generation of LLMs from frontier model developers like OpenAI and Anthropic.

EquiStamp is hiring, so if you're a software developer interested in a fully remote opportunity with flexible working hours, join the EquiStamp Discord server and message Chris directly; oh, and let him know muckrAIkers sent you!

  • (00:00) - Recording date
  • (00:05) - Intro
  • (00:29) - Hot off the press
  • (02:17) - Introducing Chris Canal
  • (19:12) - World/risk models
  • (35:21) - Competencies + decision making power
  • (42:09) - Breaking models down
  • (01:05:06) - Timelines, test time compute
  • (01:19:17) - Moving goalposts
  • (01:26:34) - Risk management pre-AGI
  • (01:46:32) - Happy endings
  • (01:55:50) - Causal chains
  • (02:04:49) - Appetite for democracy
  • (02:20:06) - Tech-frame based fallacies
  • (02:39:56) - Bringing back real capitalism
  • (02:45:23) - Orthogonality Thesis
  • (03:04:31) - Why we do this
  • (03:15:36) - Equistamp!

Links

  • EquiStamp
  • Chris's Twitter
  • METR Paper - RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
  • All Trades article - Learning from History: Preventing AGI Existential Risks through Policy by Chris Canal
  • Better Systems article - The Omega Protocol: Another Manhattan Project

Superintelligence & Commentary

  • Wikipedia article - Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
  • Reflective Altruism article - Against the singularity hypothesis (Part 5: Bostrom on the singularity)
  • Into AI Safety Interview - Scaling Democracy w/ Dr. Igor Krawczuk

Referenced Sources

  • Book - Man-made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility
  • Artificial Intelligence Paper - Reward is Enough
  • Wikipedia article - Capital and Ideology by Thomas Piketty
  • Wikipedia article - Pantheon

LeCun on AGI

  • "Won't Happen" - Time article - Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk
  • "But if it does, it'll be my research agenda latent state models, which I happen to research" - Meta Platforms Blogpost - I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI

Other Sources

  • Stanford CS Senior Project - Timing Attacks on Prompt Caching in Language Model APIs
  • TechCrunch article - AI researcher François Chollet founds a new AI lab focused on AGI
  • White House Fact Sheet - Ensuring U.S. Security and Economic Strength in the Age of Artificial Intelligence
  • New York Post
muckrAIkers - DeepSeek Minisode

02/10/25 • 15 min

DeepSeek R1 has taken the world by storm, causing a stock market crash and prompting further calls for export controls within the US. Since this story is still very much in development, with follow-up investigations and calls for governance being released almost daily, we thought it best to hold off for a little while longer to be able to tell the whole story. Nonetheless, it's a big story, so we provide a brief overview of all that's out there so far.

  • (00:00) - Recording date
  • (00:04) - Intro
  • (00:37) - DeepSeek drop and reactions
  • (04:27) - Export controls
  • (08:05) - Skepticism and uncertainty
  • (14:12) - Outro

Links
  • DeepSeek website
  • DeepSeek paper
  • Reuters article - What is DeepSeek and why is it disrupting the AI sector?

Fallout coverage

  • The Verge article - OpenAI has evidence that its models helped train China’s DeepSeek
  • The Signal article - Nvidia loses nearly $600 billion in DeepSeek crash
  • CNN article - US lawmakers want to ban DeepSeek from government devices
  • Fortune article - Meta is reportedly scrambling ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
  • Dario Amodei's blogpost - On DeepSeek and Export Controls
  • SemiAnalysis article - DeepSeek Debates
  • Ars Technica article - Microsoft now hosts AI model accused of copying OpenAI data
  • Wiz Blogpost - Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History

Investigations into "reasoning"

  • Blogpost - There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
  • Preprint - s1: Simple test-time scaling
  • Preprint - LIMO: Less is More for Reasoning
  • Blogpost - Reasoning Reflections
  • Preprint - Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH
muckrAIkers - The End of Scaling?

11/19/24 • 67 min

Multiple news outlets, including The Information, Bloomberg, and Reuters [see sources], are reporting an "end of scaling" for the current AI paradigm. In this episode we look into these articles, as well as a wide variety of economic forecasting, empirical analysis, and technical papers, to understand the validity and impact of these reports. We also use this as an opportunity to contextualize the realized versus promised fruits of "AI".

  • (00:23) - Hot off the press
  • (01:49) - The end of scaling
  • (10:50) - "Useful tools" and "agentic" "AI"
  • (17:19) - The end of quantization
  • (25:18) - Hedging
  • (29:41) - The end of upwards mobility
  • (33:12) - How to grow an economy
  • (38:14) - Transformative & disruptive tech
  • (49:19) - Finding the meaning
  • (56:14) - Bursting AI bubble and Trump
  • (01:00:58) - The muck
Links
  • The Information article - OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows
  • Bloomberg article - OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
  • Reuters article - OpenAI and others seek new path to smarter AI as current methods hit limitations
  • Paper on the end of quantization - Scaling Laws for Precision
  • Tim Dettmers Tweet on "Scaling Laws for Precision"

Empirical Analysis

  • WU Vienna paper - Unslicing the pie: AI innovation and the labor share in European regions
  • IMF paper - The Labor Market Impact of Artificial Intelligence: Evidence from US Regions
  • NBER paper - Automation, Career Values, and Political Preferences
  • Pew Research Center report - Which U.S. Workers Are More Exposed to AI on Their Jobs?

Forecasting

  • NBER/Acemoglu paper - The Simple Macroeconomics of AI
  • NBER/Acemoglu paper - Harms of AI
  • IMF report - Gen-AI: Artificial Intelligence and the Future of Work
  • Submission to Open Philanthropy AI Worldviews Contest - Transformative AGI by 2043 is <1% likely

Externalities and the Bursting Bubble

  • NBER paper - Bubbles, Rational Expectations and Financial Markets
  • Clayton Christensen lecture capture - Clayton Christensen: Disruptive innovation
  • The New Republic article - The “Godfather of AI” Predicted I Wouldn’t Have a Job. He Was Wrong.
  • Latent Space article - $2 H100s: How the GPU Rental Bubble Burst

On Productization

  • Palantir press release on introduction of Claude to US security and defense
  • Ars Technica article - Claude AI to process secret government data through new Palantir deal
  • OpenAI press release on partnering with Condé Nast
  • Candid Technology article - Shutterstock and Getty partner with OpenAI and BRIA
  • E2B
  • Stripe agents
  • Robopair

O...

muckrAIkers - Winter is Coming for OpenAI

10/22/24 • 82 min

Brace yourselves, winter is coming for OpenAI - at least, that's what we think. In this episode we look at OpenAI's recent massive funding round and ask "why would anyone want to fund a company that is set to lose a net 5 billion USD in 2024?" We scrape through a whole lot of muck to find the meaningful signals in all this news, and there is a lot of it, so get ready!

  • (00:00) - Intro
  • (00:28) - Hot off the press
  • (02:43) - Why listen?
  • (06:07) - Why might VCs invest?
  • (15:52) - What are people saying
  • (23:10) - How *is* OpenAI making money?
  • (28:18) - Is AI hype dying?
  • (41:08) - Why might big companies invest?
  • (48:47) - Concrete impacts of AI
  • (52:37) - Outcome 1: OpenAI as a commodity
  • (01:04:02) - Outcome 2: AGI
  • (01:04:42) - Outcome 3: best plausible case
  • (01:07:53) - Outcome 1*: many ways to bust
  • (01:10:51) - Outcome 4+: shock factor
  • (01:12:51) - What's the muck
  • (01:21:17) - Extended outro

Links

More on AI Hype (Dying)

Other Sources

muckrAIkers - SB1047

09/30/24 • 79 min

Why is Mark Ruffalo talking about SB1047, and what is it anyway? Tune in for our thoughts on the now vetoed California legislation that had Big Tech scared.

  • (00:00) - Intro
  • (00:31) - Updates from a relatively slow week
  • (03:32) - Disclaimer: SB1047 vetoed during recording (still worth a listen)
  • (05:24) - What is SB1047
  • (12:30) - Definitions
  • (17:18) - Understanding the bill
  • (28:42) - What are the players saying about it?
  • (46:44) - Addressing critiques
  • (55:59) - Open Source
  • (01:02:36) - Takeaways
  • (01:15:40) - Clarification on impact to big tech
  • (01:18:51) - Outro

Links

Additional SB1047 Related Coverage

Other Sources

muckrAIkers - How to Safely Handle Your AGI

12/02/24 • 58 min

While on the campaign trail, Trump made claims about repealing Biden's Executive Order on AI, but what will actually be changed when he gets into office? We take this opportunity to examine policies being discussed or implemented by leading governments around the world.

  • (00:00) - Intro
  • (00:29) - Hot off the press
  • (02:59) - Repealing the AI executive order?
  • (11:16) - "Manhattan" for AI
  • (24:33) - EU
  • (30:47) - UK
  • (39:27) - Bengio
  • (44:39) - Comparing EU/UK to USA
  • (45:23) - China
  • (51:12) - Taxes
  • (55:29) - The muck

Links
  • SFChronicle article - US gathers allies to talk AI safety as Trump's vow to undo Biden's AI policy overshadows their work
  • Trump's Executive Order on AI (the AI governance executive order at home)
  • Biden's Executive Order on AI
  • Congressional report brief which advises a "Manhattan Project for AI"

Non-USA

  • CAIRNE resource collection on CERN for AI
  • UK Frontier AI Taskforce report (2023)
  • International interim report (2024)
  • Bengio's paper - AI and Catastrophic Risk
  • Davidad's Safeguarded AI program at ARIA
  • MIT Technology Review article - Four things to know about China’s new AI rules in 2024
  • GovInsider article - Australia’s national policy for ethical use of AI starts to take shape
  • Future of Privacy forum article - The African Union’s Continental AI Strategy: Data Protection and Governance Laws Set to Play a Key Role in AI Regulation

Taxes

  • Macroeconomic Dynamics paper - Automation, Stagnation, and the Implications of a Robot Tax
  • CESifo paper - AI, Automation, and Taxation
  • GavTax article - Taxation of Artificial Intelligence and Automation

Perplexity Pages

  • CERN for AI page
  • China's AI policy page
  • Singapore's AI policy page
  • AI policy in Africa, India, Australia page

Other Sources

  • Artificial Intelligence Made Simple article - NYT's "AI Outperforms Doctors" Story Is Wrong
  • Intel report - Reclaim Your Day: The Impact of AI PCs on Productivity
  • Heise Online article - Users on AI PCs slower, Intel sees problem in unenlightened users
  • The Hacker News