
Episode 7: There Are Now 15 Competing Evaluation Metrics (ft. Dr. Jeremy Kahn). December 12, 2022
Explicit content warning
07/26/23 • 63 min
Emily and Alex are joined by Dr. Jeremy G. Kahn to discuss the distressingly large number of evaluation metrics for artificial intelligence, and some new AI hell.
Jeremy G. Kahn has a PhD in computational linguistics, with a focus on information-theoretic and empirical engineering approaches to dealing with natural language (in text and speech). He’s gregarious, polyglot, a semi-auto-didact, and occasionally prolix. He also likes comic books, coffee, progressive politics, information theory, lateral thinking, science fiction, science fact, linear thinking, bicycles, beer, meditation, love, play, and inquiry. He lives in Seattle with his wife Dorothy and son Elliott.
This episode was recorded on December 12, 2022.
Watch the video of this episode on PeerTube.
References:
XKCD: Standards
The Bender Rule
DJ Khaled - You Played Yourself
Jeff Kao's interrogation of public comment periods.
Emily's blog post response to NYT piece
Check out future streams on Twitch. Meanwhile, send us any AI Hell you see.
Our book, 'The AI Con,' comes out in May! Pre-order now.
Subscribe to our newsletter via Buttondown.
Follow us!
Emily
- Bluesky: emilymbender.bsky.social
- Mastodon: dair-community.social/@EmilyMBender
Alex
- Bluesky: alexhanna.bsky.social
- Mastodon: dair-community.social/@alex
- Twitter: @alexhanna
Music by Toby Menon.
Artwork by Naomi Pleasure-Park.
Production by Christie Taylor.
Previous Episode

Episode 6: Stochastic Parrot Galactica, November 23, 2022
Emily and Alex discuss MetaAI's bullshit science paper generator, Galactica, along with its defenders. Plus, where could AI actually help scientific research? And more Fresh AI Hell.
Watch the video of this episode on PeerTube.
References:
Imre Lakatos on research programs
Shah, Chirag and Emily M. Bender. 2022. Situating Search. Proceedings of the 2022 ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR ’22).
UW RAISE (Responsibility in AI Systems and Experiences)
Stochastic Parrots:
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. In Proceedings of FAccT 2021, pp.610-623.
The Octopus Paper:
Bender, Emily M. and Alexander Koller. 2020. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. ACL 2020.
Palestinian man arrested because of bad machine translation.
Katherine McKittrick, Dear Science and Other Stories
The Sokal Hoax
Safiya Noble, Algorithms of Oppression
Latanya Sweeney, "Discrimination in Online Ad Delivery"
Mehtab Khan and Alex Hanna, The Subjects and Stages of AI Dataset Development: A Framework for Dataset Accountability
(What is 'sealioning'?)
http://wondermark.com/1k62/
Grover:
Raji, Inioluwa Deborah, Emily M. Bender, Amandalynne Paullada, Emily Denton and Alex Hanna. 2021. AI and the Everything in the Whole Wide World Benchmark. Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks.
Ben Dickson's coverage of Grover:
Why we must rethink AI benchmarks, Ben Dickson, Tec
Next Episode

Episode 8: The ChatGPT Awakens, January 20, 2023
New year, new hype? As the world gets swept up in the fervor over ChatGPT of late 2022, Emily and Alex give a deep sigh and begin to unpack the wave of fresh enthusiasm over large language models and the "chat" format specifically.
Plus, more fresh AI hell.
This episode was recorded on January 20, 2023.
Watch the video of this episode on PeerTube.
References:
Situating Search (Shah & Bender 2022)
Related op-ed: https://iai.tv/articles/all-knowing-machines-are-a-fantasy-auid-2334
Piantadosi's thread showing ChatGPT writing a program to classify white males as good scientists
Find Anna Lauren Hoffman's publications (though not yet the one we were referring to) here: https://www.annaeveryday.com/publications
Sarah T. Roberts, Behind the Screen
Karen Hao's AI Colonialism series
Milagros Miceli: https://www.weizenbaum-institut.de/en/spezialseiten/persons-details/p/milagros-miceli/
Julian Posada: https://posada.website/
“This Isn’t Your Data, Friend”: Black Twitter as a Case Study on Research Ethics for Public Data (Klassen & Fiesler 2022)
No Humans Here: Ethical Speculation on Public Data, Unintended Consequences, and the Limits of Institutional Review (Pater, Fiesler & Zimmer 2022)
Casey Fiesler's publications: https://caseyfiesler.com/publications/
And TikTok: https://www.tiktok.com/@professorcasey
Where are human subjects in Big Data research? The emerging ethics divide. (Metcalf & Crawford 2016)
Mystery AI Hype Theater 3000 - Episode 7: There Are Now 15 Competing Evaluation Metrics (ft. Dr. Jeremy Kahn). December 12, 2022
Transcript
ALEX: Welcome everyone!...to Mystery AI Hype Theater 3000, where we seek catharsis in this age of AI hype! We find the worst of it and pop it with the sharpest needles we can find.
EMILY: Along the way, we learn to always read the footnotes. And each time we think we’ve reached peak AI hype -- the summit of bullshit mountain -- we discover there’s worse to come.
I’m Emily M. Bender, a professor of linguistics at the University of Washington.
ALEX: And I’m Alex Hanna,