
Deep Papers
Arize AI
Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.
Top 10 Deep Papers Episodes
Goodpods has curated a list of the 10 best Deep Papers episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to Deep Papers for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite Deep Papers episode by adding your comments to the episode page.

07/31/23 • 30 min
Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. This episode is led by Aparna Dhinakaran (Chief Product Officer, Arize AI) and Michael Schiff (Chief Technology Officer, Arize AI), as they discuss the paper "Llama 2: Open Foundation and Fine-Tuned Chat Models."
In this paper reading, we explore how the authors developed Llama 2, a collection of pretrained and fine-tuned large language models ranging from 7 billion to 70 billion parameters. Their fine-tuned model, Llama 2-Chat, is specifically designed for dialogue use cases and showcases superior performance on various benchmarks. In human evaluations for helpfulness and safety, Llama 2-Chat emerges as a promising alternative to closed-source models. Discover the authors’ approach to fine-tuning and safety improvements, which aims to enable responsible development and contributions to this rapidly evolving field.
Full transcript and more here: https://arize.com/blog/llama-2-open-foundation-and-fine-tuned-chat-models-paper-reading/
Follow AI__Pub on Twitter. Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.

07/21/23 • 42 min
Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Hosted by AI Pub creator Brian Burns and Arize AI founders Jason Lopatecki and Aparna Dhinakaran, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.
In this episode, we talk about Orca. Recent research focuses on improving smaller models through imitation learning using outputs from large foundation models (LFMs). Challenges include limited imitation signals, homogeneous training data, and a lack of rigorous evaluation, leading to overestimation of small model capabilities.
To address these issues, the authors introduce Orca, a 13-billion-parameter model that learns to imitate LFMs’ reasoning process. Orca leverages rich signals from GPT-4, surpassing state-of-the-art models by over 100% on complex zero-shot reasoning benchmarks. It also shows competitive performance on professional and academic exams without chain-of-thought (CoT) prompting. Learning from step-by-step explanations, generated by humans or advanced AI models, enhances model capabilities and skills.
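To make the idea concrete, here is a minimal sketch of how an explanation-tuning training example might be assembled: the student model is trained to reproduce the teacher's step-by-step explanation rather than just its final answer. The function, field names, and example text below are illustrative assumptions, not from the Orca paper or its code.

```python
# Hypothetical sketch of an explanation-tuning example in the spirit of Orca:
# train on the teacher LFM's full reasoning trace, not just the final answer.
def build_explanation_example(system_instruction: str, user_query: str, teacher_trace: str) -> dict:
    """Pack a (system, query) prompt with the teacher's detailed explanation as the target."""
    return {
        "prompt": f"{system_instruction}\n\nQuestion: {user_query}\nAnswer with your reasoning:",
        "target": teacher_trace,  # rich, step-by-step output collected from the teacher (e.g. GPT-4)
    }

example = build_explanation_example(
    system_instruction="You are a helpful assistant. Think step by step and justify your answer.",
    user_query="If a train travels 60 km in 45 minutes, what is its average speed in km/h?",
    teacher_trace="45 minutes is 0.75 hours, so the average speed is 60 / 0.75 = 80 km/h.",
)
print(example["prompt"])
```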
Full transcript and more here: https://arize.com/blog/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4-paper-reading/
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

04/04/25 • 26 min
This week we talk about modern AI benchmarks, taking a close look at Google's recent Gemini 2.5 release and its performance on key evaluations, notably Humanity's Last Exam (HLE). In the session, we cover Gemini 2.5's architecture, its advancements in reasoning and multimodality, and its impressive context window. We also discuss how benchmarks like HLE and ARC-AGI-2 help us understand the current state and future direction of AI.
Read it on the blog: https://arize.com/blog/ai-benchmark-deep-dive-gemini-humanitys-last-exam/
Sign up to watch the next live recording: https://arize.com/resource/community-papers-reading/
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

10/16/24 • 3 min
In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the Shrek Sampler.
This innovative entropy-based sampling technique is transforming how LLMs decode text. Harrison talks about how the method improves on traditional sampling strategies by leveraging entropy and varentropy to produce more dynamic and intelligent responses, and explores its potential to enhance open-source AI models and enable human-like reasoning in smaller language models.
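For a rough intuition of the approach, here is a minimal sketch, not the actual Shrek Sampler implementation: it measures the entropy and varentropy of the next-token distribution and adjusts the decoding strategy accordingly. The thresholds and temperature schedule are illustrative assumptions.

```python
# Illustrative entropy-based sampling sketch (not the Shrek Sampler's real code).
import numpy as np

def entropy_and_varentropy(logits: np.ndarray) -> tuple[float, float]:
    """Entropy and varentropy (variance of surprisal) of the softmax distribution."""
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    surprisal = -np.log(probs + 1e-12)
    entropy = float((probs * surprisal).sum())
    varentropy = float((probs * (surprisal - entropy) ** 2).sum())
    return entropy, varentropy

def sample_next_token(logits: np.ndarray, rng=np.random.default_rng()) -> int:
    entropy, varentropy = entropy_and_varentropy(logits)
    if entropy < 0.5 and varentropy < 0.5:
        return int(np.argmax(logits))        # model is confident: decode greedily
    temperature = 1.0 + 0.3 * entropy        # model is uncertain: sample more freely
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```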
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

03/01/24 • 45 min
This week, we discuss the implications of Text-to-Video Generation and speculate as to the possibilities (and limitations) of this incredible technology with some hot takes. Dat Ngo, ML Solutions Engineer at Arize, is joined by community member and AI Engineer Vibhu Sapra to review OpenAI’s technical report on their Text-To-Video Generation Model: Sora.
According to OpenAI, “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.” At the time of this recording, the model had not been widely released yet, but was becoming available to red teamers to assess risk, and also to artists to receive feedback on how Sora could be helpful for creatives.
At the end of our discussion, we also explore EvalCrafter: Benchmarking and Evaluating Large Video Generation Models. This recent paper proposed a new framework and pipeline to exhaustively evaluate the performance of the generated videos, which we look at in light of Sora.
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets
11/30/23 • 41 min
For this paper read, we’re joined by Samuel Marks, Postdoctoral Research Associate at Northeastern University, to discuss his paper, “The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets.” Samuel and his team curated high-quality datasets of true/false statements and used them to study in detail the structure of LLM representations of truth. Overall, they present evidence that language models linearly represent the truth or falsehood of factual statements and also introduce a novel technique, mass-mean probing, which generalizes better and is more causally implicated in model outputs than other probing techniques.
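As a rough illustration of the technique, here is a minimal sketch of mass-mean probing under simple assumptions: the probe direction is just the difference between the mean activation over true statements and the mean activation over false statements. The data loading and centering details here are assumptions, not the paper's exact procedure.

```python
# Minimal mass-mean probing sketch; activations are assumed to be precomputed.
import numpy as np

def fit_mass_mean_probe(acts_true: np.ndarray, acts_false: np.ndarray) -> np.ndarray:
    """acts_true / acts_false: (n_statements, hidden_dim) residual-stream activations."""
    return acts_true.mean(axis=0) - acts_false.mean(axis=0)

def predict_truth(direction: np.ndarray, activations: np.ndarray) -> np.ndarray:
    """Label a statement 'true' when its centered activation projects positively onto the probe."""
    centered = activations - activations.mean(axis=0)  # simple centering; the paper's variant may differ
    return (centered @ direction) > 0.0
```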
Find the transcript and read more here: https://arize.com/blog/the-geometry-of-truth-emergent-linear-structure-in-llm-representation-of-true-false-datasets-paper-reading/
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

10/18/23 • 43 min
We discuss RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting. While researchers have successfully applied LLMs such as ChatGPT to reranking in an information retrieval context, such work has mostly been built on proprietary models hidden behind opaque API endpoints. This approach yields experimental results that are neither reproducible nor deterministic, threatening the veracity of outcomes that build on such shaky foundations. RankVicuna provides access to a fully open-source LLM and associated code infrastructure capable of performing high-quality reranking.
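To give a flavor of what listwise reranking with an LLM looks like, here is a minimal sketch under stated assumptions: the model sees the query and a numbered list of candidate passages and returns a permutation such as "[3] > [1] > [2]". The prompt wording, the `generate` callable, and the parsing are illustrative, not RankVicuna's actual prompts or code.

```python
# Illustrative zero-shot listwise reranking sketch (not RankVicuna's real prompt or parser).
import re

def build_listwise_prompt(query: str, passages: list[str]) -> str:
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Rank the following passages by relevance to the query.\n"
        f"Query: {query}\n{numbered}\n"
        "Answer with a ranking such as [2] > [1] > [3] and nothing else."
    )

def parse_ranking(output: str, n: int) -> list[int]:
    order = [int(m) - 1 for m in re.findall(r"\[(\d+)\]", output) if 1 <= int(m) <= n]
    deduped = list(dict.fromkeys(order))                        # drop duplicate mentions
    return deduped + [i for i in range(n) if i not in deduped]  # append anything the model omitted

def rerank(generate, query: str, passages: list[str]) -> list[str]:
    """`generate` is any text-in, text-out LLM call (e.g., a locally hosted open model)."""
    ranking = parse_ranking(generate(build_listwise_prompt(query, passages)), len(passages))
    return [passages[i] for i in ranking]
```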
Find the transcript and more here: https://arize.com/blog/rankvicuna-paper-reading/
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

01/18/23 • 47 min
Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Hosted by AI Pub creator Brian Burns and Arize AI founders Jason Lopatecki and Aparna Dhinakaran, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.
In this first episode, we’re joined by Long Ouyang and Ryan Lowe, research scientists at OpenAI and creators of InstructGPT. InstructGPT was one of the first major applications of Reinforcement Learning from Human Feedback to train large language models, and is the precursor to the now-famous ChatGPT. Listen to learn about the major ideas behind InstructGPT and the future of aligning language models to human intention.
Read OpenAI's InstructGPT paper here: https://openai.com/blog/instruction-following/
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

RAG vs Fine-Tuning
02/08/24 • 39 min
This week, we’re discussing "RAG vs Fine-Tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture." This paper explores a pipeline for fine-tuning and RAG, and presents the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4.
The authors propose a pipeline that consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. Overall, the results point to how systems built using LLMs can be adapted to respond and incorporate knowledge across a dimension that is critical for a specific industry, paving the way for further applications of LLMs in other industrial domains.
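As a rough outline of the kind of pipeline described, here is a minimal skeleton in which every helper (`extract_text`, `generate_qa_pairs`, `fine_tune`, `judge`) is a hypothetical placeholder you would supply; none of them come from the paper's code or any specific library.

```python
# Skeleton of the four pipeline stages discussed in the paper; all helpers are placeholders.
def run_pipeline(pdf_paths, extract_text, generate_qa_pairs, fine_tune, judge):
    documents = [extract_text(path) for path in pdf_paths]                 # stage 1: PDF -> text
    qa_pairs = [qa for doc in documents for qa in generate_qa_pairs(doc)]  # stage 2: synthetic Q&A
    model = fine_tune(qa_pairs)                                            # stage 3: fine-tune a base LLM
    scores = [judge(q, model(q), reference=a) for q, a in qa_pairs]        # stage 4: LLM-as-judge scoring
    return model, sum(scores) / len(scores)
```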
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

10/17/23 • 36 min
Join Arize Co-Founder & CEO Jason Lopatecki and ML Solutions Engineer Sally-Ann DeLucia as they discuss “Explaining Grokking Through Circuit Efficiency.” This paper makes and confirms novel predictions about grokking, providing significant evidence in favor of its proposed explanation. Most strikingly, the research demonstrates two novel and surprising behaviors: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalization to partial rather than perfect test accuracy.
Find the transcript and more here: https://arize.com/blog/explaining-grokking-through-circuit-efficiency-paper-reading/
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

FAQ
How many episodes does Deep Papers have?
Deep Papers currently has 47 episodes available.
What topics does Deep Papers cover?
The podcast covers mathematics, technology, and science.
What is the most popular episode on Deep Papers?
The most popular episode is 'Sora: OpenAI’s Text-to-Video Generation Model'.
What is the average episode length on Deep Papers?
The average episode length on Deep Papers is 37 minutes.
How often are episodes of Deep Papers released?
Episodes of Deep Papers are typically released about every two weeks.
When was the first episode of Deep Papers?
The first episode of Deep Papers was released on Jan 18, 2023.