muckrAIkers

Jacob Haimes and Igor Krawczuk

Join us as we dig a tiny bit deeper into the hype surrounding "AI" press releases, research papers, and more. Each episode, we highlight ongoing research and investigations, bringing much-needed contextualization, constructive critique, and even a smidge of goodwill teasing to the conversation as we try to find the meaning under all of this muck.


Top 10 muckrAIkers Episodes

Goodpods has curated a list of the 10 best muckrAIkers episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to muckrAIkers for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite muckrAIkers episode by adding your comments to the episode page.

muckrAIkers - NeurIPS 2024 Wrapped 🌯

12/30/24 • 86 min

What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, you wouldn't have captured my experience. In this episode we discuss the drama and takeaways from NeurIPS 2024.

Posters available at time of episode preparation can be found on the episode webpage.

EPISODE RECORDED 2024.12.22

  • (00:00) - Recording date
  • (00:05) - Intro
  • (00:44) - Obligatory mentions
  • (01:54) - SoLaR panel
  • (18:43) - Test of Time
  • (24:17) - And now: science!
  • (28:53) - Downsides of benchmarks
  • (41:39) - Improving the science of ML
  • (53:07) - Performativity
  • (57:33) - NopenAI and Nanthropic
  • (01:09:35) - Fun/interesting papers
  • (01:13:12) - Initial takes on o3
  • (01:18:12) - WorkArena
  • (01:25:00) - Outro

Links

Note: many workshop papers had not yet been published to arXiv as of preparing this episode; in these cases, the OpenReview submission page is provided instead.

  • NeurIPS statement on inclusivity
  • CTOL Digital Solutions article - NeurIPS 2024 Sparks Controversy: MIT Professor's Remarks Ignite "Racism" Backlash Amid Chinese Researchers’ Triumphs
  • (1/2) NeurIPS Best Paper - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
  • Visual Autoregressive Model report (this link now returns a 404 error)
  • Reuters article - ByteDance seeks $1.1 mln damages from intern in AI breach case, report says
  • CTOL Digital Solutions article - NeurIPS Award Winner Entangled in ByteDance's AI Sabotage Accusations: The Two Tales of an AI Genius
  • Reddit post on Ilya's talk
  • SoLaR workshop page

Referenced Sources

  • Harvard Data Science Review article - Data Science at the Singularity
  • Paper - Reward Reports for Reinforcement Learning
  • Paper - It's Not What Machines Can Learn, It's What We Cannot Teach
  • Paper - NeurIPS Reproducibility Program
  • Paper - A Metric Learning Reality Check

Improving Datasets, Benchmarks, and Measurements

  • Tutorial video + slides - Experimental Design and Analysis for AI Researchers (I think you need to have attended NeurIPS to access the recording, but I couldn't find a different version)
  • Paper - BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
  • Paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
  • Paper - A Systematic Review of NeurIPS Dataset Management Practices
  • Paper - The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
  • Paper - Benchmark Repositories for Better Benchmarking
  • Paper - Croissant: A Metadata Format for ML-Ready Datasets
  • Paper - Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
  • Paper - Evaluating Generative AI Systems is a Social Science Measurement Challenge
muckrAIkers - OpenAI's o1, aka. Strawberry

09/23/24 • 50 min

OpenAI's new model is out, and we are going to have to rake through a lot of muck to get the value out of this one!

⚠ Opt out of LinkedIn's GenAI scraping ➡️ https://lnkd.in/epziUeTi

  • (00:00) - Intro
  • (00:25) - Other recent news
  • (02:57) - Hot off the press
  • (03:58) - Why might someone care?
  • (04:52) - What is it?
  • (06:49) - How is it being sold?
  • (10:45) - How do they explain it, technically?
  • (27:09) - Reflection AI Drama
  • (40:19) - Why do we care?
  • (46:39) - Scraping away the muck

Note: at around 32 minutes, Igor says the incorrect Llama model version for the story he is telling. Jacob dubbed over those mistakes with the correct versioning.

Links relating to o1

Other stuff we mention

muckrAIkers - Open Source AI and 2024 Nobel Prizes

10/16/24 • 61 min

The Open Source AI Definition is out after years of drafting, will it reestablish brand meaning for the “Open Source” term? Also, the 2024 Nobel Prizes in Physics and Chemistry are heavily tied to AI; we scrutinize not only this year's prizes, but also Nobel Prizes as a concept.

  • (00:00) - Intro
  • (00:30) - Hot off the press
  • (03:45) - Open Source AI background
  • (10:30) - Definitions and changes in RC1
  • (18:36) - “Business source”
  • (22:17) - Parallels with legislation
  • (26:22) - Impacts of the OSAID
  • (33:58) - 2024 Nobel Prize Context
  • (37:21) - Chemistry prize
  • (45:06) - Physics prize
  • (50:29) - Takeaways
  • (52:03) - What’s the real muck?
  • (01:00:27) - Outro

Links

More Reading on Open Source AI

On Nobel Prizes

Other Sources

muckrAIkers - Understanding Claude 3.5 Sonnet (New)

10/30/24 • 60 min

Frontier developers continue their war on sane versioning schemas to bring us Claude 3.5 Sonnet (New), along with "computer use" capabilities. We discuss not only the new model, but also why Anthropic may have released this model and tool combination now.

  • (00:00) - Intro
  • (00:22) - Hot off the press
  • (05:03) - Claude 3.5 Sonnet (New) Two 'o' 3000
  • (09:23) - Breaking down "computer use"
  • (13:16) - Our understanding
  • (16:03) - Diverging business models
  • (32:07) - Why has Anthropic chosen this strategy?
  • (43:14) - Changing the frame
  • (48:00) - Polishing the lily

Links

Other Anthropic Relevant Media

Other Sources

muckrAIkers - Understanding AI World Models w/ Chris Canal

01/27/25 • 199 min

Chris Canal, co-founder of EquiStamp, joins muckrAIkers as our first ever podcast guest! In this ~3.5 hour interview, we discuss intelligence vs. competencies, the importance of test-time compute, moving goalposts, the orthogonality thesis, and much more.

A seasoned software developer, Chris started EquiStamp as a way to improve our current understanding of model failure modes and capabilities in late 2023. Now a key contractor for METR, EquiStamp evaluates the next generation of LLMs from frontier model developers like OpenAI and Anthropic.

EquiStamp is hiring, so if you're a software developer interested in a fully remote opportunity with flexible working hours, join the EquiStamp Discord server and message Chris directly; oh, and let him know muckrAIkers sent you!

  • (00:00) - Recording date
  • (00:05) - Intro
  • (00:29) - Hot off the press
  • (02:17) - Introducing Chris Canal
  • (19:12) - World/risk models
  • (35:21) - Competencies + decision making power
  • (42:09) - Breaking models down
  • (01:05:06) - Timelines, test time compute
  • (01:19:17) - Moving goalposts
  • (01:26:34) - Risk management pre-AGI
  • (01:46:32) - Happy endings
  • (01:55:50) - Causal chains
  • (02:04:49) - Appetite for democracy
  • (02:20:06) - Tech-frame based fallacies
  • (02:39:56) - Bringing back real capitalism
  • (02:45:23) - Orthogonality Thesis
  • (03:04:31) - Why we do this
  • (03:15:36) - Equistamp!

Links

  • EquiStamp
  • Chris's Twitter
  • METR Paper - RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
  • All Trades article - Learning from History: Preventing AGI Existential Risks through Policy by Chris Canal
  • Better Systems article - The Omega Protocol: Another Manhattan Project

Superintelligence & Commentary

  • Wikipedia article - Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
  • Reflective Altruism article - Against the singularity hypothesis (Part 5: Bostrom on the singularity)
  • Into AI Safety Interview - Scaling Democracy w/ Dr. Igor Krawczuk

Referenced Sources

  • Book - Man-made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility
  • Artificial Intelligence Paper - Reward is Enough
  • Wikipedia article - Capital and Ideology by Thomas Piketty
  • Wikipedia article - Pantheon

LeCun on AGI

  • "Won't Happen" - Time article - Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk
  • "But if it does, it'll be my research agenda latent state models, which I happen to research" - Meta Platforms Blogpost - I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI

Other Sources

  • Stanford CS Senior Project - Timing Attacks on Prompt Caching in Language Model APIs
  • TechCrunch article - AI researcher François Chollet founds a new AI lab focused on AGI
  • White House Fact Sheet - Ensuring U.S. Security and Economic Strength in the Age of Artificial Intelligence
  • New York Post
muckrAIkers - DeepSeek Minisode

02/10/25 • 15 min

DeepSeek R1 has taken the world by storm, causing a stock market crash and prompting further calls for export controls within the US. Since this story is still very much in development, with follow-up investigations and calls for governance being released almost daily, we thought it best to hold off for a little while longer to be able to tell the whole story. Nonetheless, it's a big story, so we provide a brief overview of all that's out there so far.

  • (00:00) - Recording date
  • (00:04) - Intro
  • (00:37) - DeepSeek drop and reactions
  • (04:27) - Export controls
  • (08:05) - Skepticism and uncertainty
  • (14:12) - Outro

Links
  • DeepSeek website
  • DeepSeek paper
  • Reuters article - What is DeepSeek and why is it disrupting the AI sector?

Fallout coverage

  • The Verge article - OpenAI has evidence that its models helped train China’s DeepSeek
  • The Signal article - Nvidia loses nearly $600 billion in DeepSeek crash
  • CNN article - US lawmakers want to ban DeepSeek from government devices
  • Fortune article - Meta is reportedly scrambling ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
  • Dario Amodei's blogpost - On DeepSeek and Export Controls
  • SemiAnalysis article - DeepSeek Debates
  • Ars Technica article - Microsoft now hosts AI model accused of copying OpenAI data
  • Wiz Blogpost - Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History

Investigations into "reasoning"

  • Blogpost - There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
  • Preprint - s1: Simple test-time scaling
  • Preprint - LIMO: Less is More for Reasoning
  • Blogpost - Reasoning Reflections
  • Preprint - Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH
muckrAIkers - The End of Scaling?

11/19/24 • 67 min

Multiple news outlets, including The Information, Bloomberg, and Reuters [see sources], are reporting an "end of scaling" for the current AI paradigm. In this episode we look into these articles, as well as a wide variety of economic forecasting, empirical analysis, and technical papers, to understand the validity and impact of these reports. We also use this as an opportunity to contextualize the realized versus promised fruits of "AI".

  • (00:23) - Hot off the press
  • (01:49) - The end of scaling
  • (10:50) - "Useful tools" and "agentic" "AI"
  • (17:19) - The end of quantization
  • (25:18) - Hedging
  • (29:41) - The end of upwards mobility
  • (33:12) - How to grow an economy
  • (38:14) - Transformative & disruptive tech
  • (49:19) - Finding the meaning
  • (56:14) - Bursting AI bubble and Trump
  • (01:00:58) - The muck
Links
  • The Information article - OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows
  • Bloomberg article - OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
  • Reuters article - OpenAI and others seek new path to smarter AI as current methods hit limitations
  • Paper on the end of quantization - Scaling Laws for Precision
  • Tim Dettmers Tweet on "Scaling Laws for Precision"

Empirical Analysis

  • WU Vienna paper - Unslicing the pie: AI innovation and the labor share in European regions
  • IMF paper - The Labor Market Impact of Artificial Intelligence: Evidence from US Regions
  • NBER paper - Automation, Career Values, and Political Preferences
  • Pew Research Center report - Which U.S. Workers Are More Exposed to AI on Their Jobs?

Forecasting

  • NBER/Acemoglu paper - The Simple Macroeconomics of AI
  • NBER/Acemoglu paper - Harms of AI
  • IMF report - Gen-AI: Artificial Intelligence and the Future of Work
  • Submission to Open Philanthropy AI Worldviews Contest - Transformative AGI by 2043 is <1% likely

Externalities and the Bursting Bubble

  • NBER paper - Bubbles, Rational Expectations and Financial Markets
  • Clayton Christensen lecture capture - Clayton Christensen: Disruptive innovation
  • The New Republic article - The “Godfather of AI” Predicted I Wouldn’t Have a Job. He Was Wrong.
  • Latent Space article - $2 H100s: How the GPU Rental Bubble Burst

On Productization

  • Palantir press release on introduction of Claude to US security and defense
  • Ars Technica article - Claude AI to process secret government data through new Palantir deal
  • OpenAI press release on partnering with Condé Nast
  • Candid Technology article - Shutterstock and Getty partner with OpenAI and BRIA
  • E2B
  • Stripe agents
  • Robopair

O...

muckrAIkers - Winter is Coming for OpenAI

10/22/24 • 82 min

Brace yourselves, winter is coming for OpenAI - at least, that's what we think. In this episode we look at OpenAI's recent massive funding round and ask "why would anyone want to fund a company that is set to lose a net 5 billion USD in 2024?" We scrape through a whole lot of muck to find the meaningful signals in all this news, and there is a lot of it, so get ready!

  • (00:00) - Intro
  • (00:28) - Hot off the press
  • (02:43) - Why listen?
  • (06:07) - Why might VCs invest?
  • (15:52) - What are people saying
  • (23:10) - How *is* OpenAI making money?
  • (28:18) - Is AI hype dying?
  • (41:08) - Why might big companies invest?
  • (48:47) - Concrete impacts of AI
  • (52:37) - Outcome 1: OpenAI as a commodity
  • (01:04:02) - Outcome 2: AGI
  • (01:04:42) - Outcome 3: best plausible case
  • (01:07:53) - Outcome 1*: many ways to bust
  • (01:10:51) - Outcome 4+: shock factor
  • (01:12:51) - What's the muck
  • (01:21:17) - Extended outro

Links

More on AI Hype (Dying)

Other Sources

muckrAIkers - SB1047

09/30/24 • 79 min

Why is Mark Ruffalo talking about SB1047, and what is it anyway? Tune in for our thoughts on the now vetoed California legislation that had Big Tech scared.

  • (00:00) - Intro
  • (00:31) - Updates from a relatively slow week
  • (03:32) - Disclaimer: SB1047 vetoed during recording (still worth a listen)
  • (05:24) - What is SB1047
  • (12:30) - Definitions
  • (17:18) - Understanding the bill
  • (28:42) - What are the players saying about it?
  • (46:44) - Addressing critiques
  • (55:59) - Open Source
  • (01:02:36) - Takeaways
  • (01:15:40) - Clarification on impact to big tech
  • (01:18:51) - Outro

Links

Additional SB1047 Related Coverage

Other Sources

muckrAIkers - How to Safely Handle Your AGI

12/02/24 • 58 min

While on the campaign trail, Trump made claims about repealing Biden's Executive Order on AI, but what will actually be changed when he gets into office? We take this opportunity to examine policies being discussed or implemented by leading governments around the world.

  • (00:00) - Intro
  • (00:29) - Hot off the press
  • (02:59) - Repealing the AI executive order?
  • (11:16) - "Manhattan" for AI
  • (24:33) - EU
  • (30:47) - UK
  • (39:27) - Bengio
  • (44:39) - Comparing EU/UK to USA
  • (45:23) - China
  • (51:12) - Taxes
  • (55:29) - The muck

Links
  • SFChronicle article - US gathers allies to talk AI safety as Trump's vow to undo Biden's AI policy overshadows their work
  • Trump's Executive Order on AI (the AI governance executive order at home)
  • Biden's Executive Order on AI
  • Congressional report brief which advises a "Manhattan Project for AI"

Non-USA

  • CAIRNE resource collection on CERN for AI
  • UK Frontier AI Taskforce report (2023)
  • International interim report (2024)
  • Bengio's paper - AI and Catastrophic Risk
  • Davidad's Safeguarded AI program at ARIA
  • MIT Technology Review article - Four things to know about China’s new AI rules in 2024
  • GovInsider article - Australia’s national policy for ethical use of AI starts to take shape
  • Future of Privacy forum article - The African Union’s Continental AI Strategy: Data Protection and Governance Laws Set to Play a Key Role in AI Regulation

Taxes

  • Macroeconomic Dynamics paper - Automation, Stagnation, and the Implications of a Robot Tax
  • CESifo paper - AI, Automation, and Taxation
  • GavTax article - Taxation of Artificial Intelligence and Automation

Perplexity Pages

  • CERN for AI page
  • China's AI policy page
  • Singapore's AI policy page
  • AI policy in Africa, India, Australia page

Other Sources

  • Artificial Intelligence Made Simple article - NYT's "AI Outperforms Doctors" Story Is Wrong
  • Intel report - Reclaim Your Day: The Impact of AI PCs on Productivity
  • Heise Online article - Users on AI PCs slower, Intel sees problem in unenlightened users
  • The Hacker News