
muckrAIkers
Jacob Haimes and Igor Krawczuk
Top 10 muckrAIkers Episodes
Goodpods has curated a list of the 10 best muckrAIkers episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to muckrAIkers for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite muckrAIkers episode by adding your comments to the episode page.

NeurIPS 2024 Wrapped 🌯
muckrAIkers
What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, you wouldn't have captured my experience. In this episode we discuss the drama and takeaways from NeurIPS 2024.
Posters that were available at the time of episode preparation can be found on the episode webpage.
EPISODE RECORDED 2024.12.22
- (00:00) - Recording date
- (00:05) - Intro
- (00:44) - Obligatory mentions
- (01:54) - SoLaR panel
- (18:43) - Test of Time
- (24:17) - And now: science!
- (28:53) - Downsides of benchmarks
- (41:39) - Improving the science of ML
- (53:07) - Performativity
- (57:33) - NopenAI and Nanthropic
- (01:09:35) - Fun/interesting papers
- (01:13:12) - Initial takes on o3
- (01:18:12) - WorkArena
- (01:25:00) - Outro
Links
Note: many workshop papers had not yet been published to arXiv as of preparing this episode; in these cases, the OpenReview submission page is provided instead.
- NeurIPS statement on inclusivity
- CTOL Digital Solutions article - NeurIPS 2024 Sparks Controversy: MIT Professor's Remarks Ignite "Racism" Backlash Amid Chinese Researchers’ Triumphs
- (1/2) NeurIPS Best Paper - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
- Visual Autoregressive Model report (this link now returns a 404 error)
- Don't worry, here it is on archive.is
- Reuters article - ByteDance seeks $1.1 mln damages from intern in AI breach case, report says
- CTOL Digital Solutions article - NeurIPS Award Winner Entangled in ByteDance's AI Sabotage Accusations: The Two Tales of an AI Genius
- Reddit post on Ilya's talk
- SoLaR workshop page
Referenced Sources
- Harvard Data Science Review article - Data Science at the Singularity
- Paper - Reward Reports for Reinforcement Learning
- Paper - It's Not What Machines Can Learn, It's What We Cannot Teach
- Paper - NeurIPS Reproducibility Program
- Paper - A Metric Learning Reality Check
Improving Datasets, Benchmarks, and Measurements
- Tutorial video + slides - Experimental Design and Analysis for AI Researchers (I think you need to have attended NeurIPS to access the recording, but I couldn't find a different version)
- Paper - BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
- Paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
- Paper - A Systematic Review of NeurIPS Dataset Management Practices
- Paper - The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
- Paper - Benchmark Repositories for Better Benchmarking
- Paper - Croissant: A Metadata Format for ML-Ready Datasets
- Paper - Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
- Paper - Evaluating Generative AI Systems is a Social Science Measurement Challenge
12/30/24 • 86 min
OpenAI's o1, aka. Strawberry
muckrAIkers
OpenAI's new model is out, and we are going to have to rake through a lot of muck to get the value out of this one!
⚠ Opt out of LinkedIn's GenAI scraping ➡️ https://lnkd.in/epziUeTi
- (00:00) - Intro
- (00:25) - Other recent news
- (02:57) - Hot off the press
- (03:58) - Why might someone care?
- (04:52) - What is it?
- (06:49) - How is it being sold?
- (10:45) - How do they explain it, technically?
- (27:09) - Reflection AI Drama
- (40:19) - Why do we care?
- (46:39) - Scraping away the muck
Note: at around 32 minutes, Igor cites the wrong Llama model version for the story he is telling. Jacob dubbed over those mistakes with the correct versioning.
Links relating to o1
- OpenAI blogpost
- System card webpage
- GitHub collection of o1 related media
- AMA Twitter thread
- Francois Chollet Tweet on reasoning and o1
- The academic paper doing something very similar to o1
Other stuff we mention
- OpenAI's huge valuation hinges on upending corporate structure
- Meta acknowledges it’s scraping all public posts for AI training
- White House announces new private sector voluntary commitments to combat image-based sexual abuse
- Sam Altman wants you to be grateful
- The Zuck is done apologizing
- IAPS report on technical safety research at AI companies
- Llama2 70B is "about as good" as GPT-4 at summarization tasks
09/23/24 • 50 min
Open Source AI and 2024 Nobel Prizes
muckrAIkers
The Open Source AI Definition is out after years of drafting; will it reestablish brand meaning for the “Open Source” term? Also, the 2024 Nobel Prizes in Physics and Chemistry are heavily tied to AI; we scrutinize not only this year's prizes, but also Nobel Prizes as a concept.
- (00:00) - Intro
- (00:30) - Hot off the press
- (03:45) - Open Source AI background
- (10:30) - Definitions and changes in RC1
- (18:36) - “Business source”
- (22:17) - Parallels with legislation
- (26:22) - Impacts of the OSAID
- (33:58) - 2024 Nobel Prize Context
- (37:21) - Chemistry prize
- (45:06) - Physics prize
- (50:29) - Takeaways
- (52:03) - What’s the real muck?
- (01:00:27) - Outro
Links
More Reading on Open Source AI
- Kairos.FM article - Open Source AI is a lie, but it doesn't have to be
- The Register article - The open source AI civil war approaches
- MIT Technology Review article - We finally have a definition for open-source AI
On Nobel Prizes
- Paper - Access to Opportunity in the Sciences: Evidence from the Nobel Laureates
- Physics prize - scientific background, popular info
- Chemistry prize - scientific background, popular info
- Reuters article - Google's Nobel prize winners stir debate over AI research
- Wikipedia article - Nobel disease
10/16/24 • 61 min
Understanding Claude 3.5 Sonnet (New)
muckrAIkers
Frontier developers continue their war on sane versioning schemas to bring us Claude 3.5 Sonnet (New), along with "computer use" capabilities. We discuss not only the new model, but also why Anthropic may have released this model and tool combination now.
- (00:00) - Intro
- (00:22) - Hot off the press
- (05:03) - Claude 3.5 Sonnet (New) Two 'o' 3000
- (09:23) - Breaking down "computer use"
- (13:16) - Our understanding
- (16:03) - Diverging business models
- (32:07) - Why has Anthropic chosen this strategy?
- (43:14) - Changing the frame
- (48:00) - Polishing the lily
Links
- Anthropic press release - Introducing Claude 3.5 Sonnet (New)
- Model Card Addendum
Other Anthropic Relevant Media
- Paper - Sabotage Evaluations for Frontier Models
- Anthropic press release - Anthropic's Updated RSP
- Alignment Forum blogpost - Anthropic's Updated RSP
- Tweet - Response to scare regarding Anthropic training on user data
- Anthropic press release - Developing a computer use model
- Simon Willison article - Initial explorations of Anthropic’s new Computer Use capability
- Tweet - ARC Prize performance
- The Information article - Anthropic Has Floated $40 Billion Valuation in Funding Talks
Other Sources
- LWN.net article - OSI readies controversial Open AI definition
- National Security Memorandum
- Framework to Advance AI Governance and Risk Management in National Security
- Reuters article - Mother sues AI chatbot company Character.AI, Google over son's suicide
- Medium article - A Small Step Towards Reproducing OpenAI o1: Progress Report on the Steiner Open Source Models
- The Guardian article - Google's solution to accidental algorithmic racism: ban gorillas
- TIME article - Ethical AI Isn’t to Blame for Google’s Gemini Debacle
- Latacora article - The SOC2 Starting Seven
- Grandview Research market trends - Robotic Process Automation Market Trends
10/30/24 • 60 min
Understanding AI World Models w/ Chris Canal
muckrAIkers
Chris Canal, co-founder of EquiStamp, joins muckrAIkers as our first-ever podcast guest! In this ~3.5-hour interview, we discuss intelligence vs. competencies, the importance of test-time compute, moving goalposts, the orthogonality thesis, and much more.
A seasoned software developer, Chris started EquiStamp as a way to improve our current understanding of model failure modes and capabilities in late 2023. Now a key contractor for METR, EquiStamp evaluates the next generation of LLMs from frontier model developers like OpenAI and Anthropic.
EquiStamp is hiring, so if you're a software developer interested in a fully remote opportunity with flexible working hours, join the EquiStamp Discord server and message Chris directly; oh, and let him know muckrAIkers sent you!
- (00:00) - Recording date
- (00:05) - Intro
- (00:29) - Hot off the press
- (02:17) - Introducing Chris Canal
- (19:12) - World/risk models
- (35:21) - Competencies + decision making power
- (42:09) - Breaking models down
- (01:05:06) - Timelines, test time compute
- (01:19:17) - Moving goalposts
- (01:26:34) - Risk management pre-AGI
- (01:46:32) - Happy endings
- (01:55:50) - Causal chains
- (02:04:49) - Appetite for democracy
- (02:20:06) - Tech-frame based fallacies
- (02:39:56) - Bringing back real capitalism
- (02:45:23) - Orthogonality Thesis
- (03:04:31) - Why we do this
- (03:15:36) - Equistamp!
Links
- EquiStamp
- Chris's Twitter
- METR Paper - RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
- All Trades article - Learning from History: Preventing AGI Existential Risks through Policy by Chris Canal
- Better Systems article - The Omega Protocol: Another Manhattan Project
Superintelligence & Commentary
- Wikipedia article - Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
- Reflective Altruism article - Against the singularity hypothesis (Part 5: Bostrom on the singularity)
- Into AI Safety Interview - Scaling Democracy w/ Dr. Igor Krawczuk
Referenced Sources
- Book - Man-made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility
- Artificial Intelligence Paper - Reward is Enough
- Wikipedia article - Capital and Ideology by Thomas Piketty
- Wikipedia article - Pantheon
LeCun on AGI
- "Won't Happen" - Time article - Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk
- "But if it does, it'll be my research agenda latent state models, which I happen to research" - Meta Platforms Blogpost - I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI
Other Sources
- Stanford CS Senior Project - Timing Attacks on Prompt Caching in Language Model APIs
- TechCrunch article - AI researcher François Chollet founds a new AI lab focused on AGI
- White House Fact Sheet - Ensuring U.S. Security and Economic Strength in the Age of Artificial Intelligence
- New York Post
01/27/25 • 199 min
DeepSeek Minisode
muckrAIkers
DeepSeek R1 has taken the world by storm, causing a stock market crash and prompting further calls for export controls within the US. Since this story is still very much in development, with follow-up investigations and calls for governance being released almost daily, we thought it best to hold off for a little while longer to be able to tell the whole story. Nonetheless, it's a big story, so we provide a brief overview of all that's out there so far.
- (00:00) - Recording date
- (00:04) - Intro
- (00:37) - DeepSeek drop and reactions
- (04:27) - Export controls
- (08:05) - Skepticism and uncertainty
- (14:12) - Outro
Links
- DeepSeek website
- DeepSeek paper
- Reuters article - What is DeepSeek and why is it disrupting the AI sector?
Fallout coverage
- The Verge article - OpenAI has evidence that its models helped train China’s DeepSeek
- The Signal article - Nvidia loses nearly $600 billion in DeepSeek crash
- CNN article - US lawmakers want to ban DeepSeek from government devices
- Fortune article - Meta is reportedly scrambling ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
- Dario Amodei's blogpost - On DeepSeek and Export Controls
- SemiAnalysis article - DeepSeek Debates
- Ars Technica article - Microsoft now hosts AI model accused of copying OpenAI data
- Wiz Blogpost - Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History
Investigations into "reasoning"
- Blogpost - There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
- Preprint - s1: Simple test-time scaling
- Preprint - LIMO: Less is More for Reasoning
- Blogpost - Reasoning Reflections
- Preprint - Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH
02/10/25 • 15 min
The End of Scaling?
muckrAIkers
Multiple news outlets, including The Information, Bloomberg, and Reuters [see sources], are reporting an "end of scaling" for the current AI paradigm. In this episode we look into these articles, as well as a wide variety of economic forecasting, empirical analysis, and technical papers, to understand the validity and impact of these reports. We also use this as an opportunity to contextualize the realized versus promised fruits of "AI".
- (00:23) - Hot off the press
- (01:49) - The end of scaling
- (10:50) - "Useful tools" and "agentic" "AI"
- (17:19) - The end of quantization
- (25:18) - Hedging
- (29:41) - The end of upwards mobility
- (33:12) - How to grow an economy
- (38:14) - Transformative & disruptive tech
- (49:19) - Finding the meaning
- (56:14) - Bursting AI bubble and Trump
- (01:00:58) - The muck
- The Information article - OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows
- Bloomberg article - OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
- Reuters article - OpenAI and others seek new path to smarter AI as current methods hit limitations
- Paper on the end of quantization - Scaling Laws for Precision
- Tim Dettmers Tweet on "Scaling Laws for Precision"
Empirical Analysis
- WU Vienna paper - Unslicing the pie: AI innovation and the labor share in European regions
- IMF paper - The Labor Market Impact of Artificial Intelligence: Evidence from US Regions
- NBER paper - Automation, Career Values, and Political Preferences
- Pew Research Center report - Which U.S. Workers Are More Exposed to AI on Their Jobs?
Forecasting
- NBER/Acemoglu paper - The Simple Macroeconomics of AI
- NBER/Acemoglu paper - Harms of AI
- IMF report - Gen-AI: Artificial Intelligence and the Future of Work
- Submission to Open Philanthropy AI Worldviews Contest - Transformative AGI by 2043 is <1% likely
Externalities and the Bursting Bubble
- NBER paper - Bubbles, Rational Expectations and Financial Markets
- Clayton Christensen lecture capture - Clayton Christensen: Disruptive innovation
- The New Republic article - The “Godfather of AI” Predicted I Wouldn’t Have a Job. He Was Wrong.
- Latent Space article - $2 H100s: How the GPU Rental Bubble Burst
On Productization
- Palantir press release on introduction of Claude to US security and defense
- Ars Technica article - Claude AI to process secret government data through new Palantir deal
- OpenAI press release on partnering with Condé Nast
- Candid Technology article - Shutterstock and Getty partner with OpenAI and BRIA
- E2B
- Stripe agents
- Robopair
11/19/24 • 67 min
Winter is Coming for OpenAI
muckrAIkers
Brace yourselves, winter is coming for OpenAI - at least, that's what we think. In this episode we look at OpenAI's recent massive funding round and ask "why would anyone want to fund a company that is set to lose net 5 billion USD for 2024?" We scrape through a whole lot of muck to find the meaningful signals in all this news, and there is a lot of it, so get ready!
- (00:00) - Intro
- (00:28) - Hot off the press
- (02:43) - Why listen?
- (06:07) - Why might VCs invest?
- (15:52) - What are people saying
- (23:10) - How *is* OpenAI making money?
- (28:18) - Is AI hype dying?
- (41:08) - Why might big companies invest?
- (48:47) - Concrete impacts of AI
- (52:37) - Outcome 1: OpenAI as a commodity
- (01:04:02) - Outcome 2: AGI
- (01:04:42) - Outcome 3: best plausible case
- (01:07:53) - Outcome 1*: many ways to bust
- (01:10:51) - Outcome 4+: shock factor
- (01:12:51) - What's the muck
- (01:21:17) - Extended outro
Links
- Reuters article - OpenAI closes $6.6 billion funding haul with investment from Microsoft and Nvidia
- Goldman Sachs report - GenAI: Too Much Spend, Too Little Benefit
- Apricitas Economics article - The AI Investment Boom
- Discussion of "The AI Investment Boom" on Y Combinator
- State of AI in 13 Charts
- Fortune article - OpenAI sees $5 billion loss in 2024 and soaring sales as big ChatGPT fee hikes planned, report says
More on AI Hype (Dying)
- Latent Space article - The Winds of AI Winter
- Article by Gary Marcus - The Great AI Retrenchment has Begun
- TimmermanReport article - AI: If Not Now, When? No, Really - When?
- MIT News article - Who Will Benefit from AI?
- Washington Post article - The AI Hype bubble is deflating. Now comes the hard part.
- Andreessen Horowitz article - Why AI Will Save the World
Other Sources
- Human-Centered Artificial Intelligence Foundation Model Transparency Index
- Cointelegraph article - Europe gathers global experts to draft ‘Code of Practice’ for AI
- Reuters article - Microsoft's VP of GenAI research to join OpenAI
- Twitter post from Tim Brooks on joining DeepMind
- Edward Zitron article - The Man Who Killed Google Search
10/22/24 • 82 min
SB1047
muckrAIkers
Why is Mark Ruffalo talking about SB1047, and what is it anyway? Tune in for our thoughts on the now vetoed California legislation that had Big Tech scared.
- (00:00) - Intro
- (00:31) - Updates from a relatively slow week
- (03:32) - Disclaimer: SB1047 vetoed during recording (still worth a listen)
- (05:24) - What is SB1047
- (12:30) - Definitions
- (17:18) - Understanding the bill
- (28:42) - What are the players saying about it?
- (46:44) - Addressing critiques
- (55:59) - Open Source
- (01:02:36) - Takeaways
- (01:15:40) - Clarification on impact to big tech
- (01:18:51) - Outro
Links
- SB1047 legislation page
- SB1047 CalMatters page
- Newsom vetoes SB1047
- CAIS newsletter on SB1047
- Prominent AI nerd letter
- Anthropic's letter
- SB1047 ~explainer
Additional SB1047 Related Coverage
- Opposition to SB1047 'makes no sense'
- Newsom on SB1047
- Andreessen Horowitz on SB1047
- Classy move by Dan
- Ex-OpenAI employee says Altman doesn't want regulation
09/30/24 • 79 min
How to Safely Handle Your AGI
muckrAIkers
While on the campaign trail, Trump made claims about repealing Biden's Executive Order on AI, but what will actually change when he takes office? We take this opportunity to examine policies being discussed or implemented by leading governments around the world.
- (00:00) - Intro
- (00:29) - Hot off the press
- (02:59) - Repealing the AI executive order?
- (11:16) - "Manhattan" for AI
- (24:33) - EU
- (30:47) - UK
- (39:27) - Bengio
- (44:39) - Comparing EU/UK to USA
- (45:23) - China
- (51:12) - Taxes
- (55:29) - The muck
Links
- SFChronicle article - US gathers allies to talk AI safety as Trump's vow to undo Biden's AI policy overshadows their work
- Trump's Executive Order on AI (the AI governance executive order at home)
- Biden's Executive Order on AI
- Congressional report brief which advises a "Manhattan Project for AI"
Non-USA
- CAIRNE resource collection on CERN for AI
- UK Frontier AI Taskforce report (2023)
- International interim report (2024)
- Bengio's paper - AI and Catastrophic Risk
- Davidad's Safeguarded AI program at ARIA
- MIT Technology Review article - Four things to know about China’s new AI rules in 2024
- GovInsider article - Australia’s national policy for ethical use of AI starts to take shape
- Future of Privacy forum article - The African Union’s Continental AI Strategy: Data Protection and Governance Laws Set to Play a Key Role in AI Regulation
Taxes
- Macroeconomic Dynamics paper - Automation, Stagnation, and the Implications of a Robot Tax
- CESifo paper - AI, Automation, and Taxation
- GavTax article - Taxation of Artificial Intelligence and Automation
Perplexity Pages
- CERN for AI page
- China's AI policy page
- Singapore's AI policy page
- AI policy in Africa, India, Australia page
12/02/24 • 58 min
FAQ
How many episodes does muckrAIkers have?
muckrAIkers currently has 13 episodes available.
What topics does muckrAIkers cover?
The podcast is about Mathematics, Podcasts, Technology, Science, and Artificial Intelligence.
What is the most popular episode on muckrAIkers?
The episode title 'OpenAI's o1, aka. Strawberry' is the most popular.
What is the average episode length on muckrAIkers?
The average episode length on muckrAIkers is 77 minutes.
How often are episodes of muckrAIkers released?
Episodes of muckrAIkers are typically released every 13 days.
When was the first episode of muckrAIkers?
The first episode of muckrAIkers was released on Sep 23, 2024.