
Latent Space: The AI Engineer Podcast
swyx + Alessio
www.latent.space

Top 10 Latent Space: The AI Engineer Podcast Episodes
Goodpods has curated a list of the 10 best Latent Space: The AI Engineer Podcast episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to Latent Space: The AI Engineer Podcast for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite Latent Space: The AI Engineer Podcast episode by adding your comments to the episode page.

Commoditizing the Petaflop — with George Hotz of the tiny corp
Latent Space: The AI Engineer Podcast
06/20/23 • 72 min
We are now launching our dedicated new YouTube and Twitter! Any help in amplifying our podcast would be greatly appreciated, and of course, tell your friends!
Notable follow-on discussions were collected on Twitter, Reddit (several threads), and Hacker News (several threads). Please don’t obsess too much over the GPT-4 discussion as it is mostly rumor; we spent much more time on tinybox/tinygrad, on which George is the foremost authority!
We are excited to share the world’s first interview with George Hotz on the tiny corp!
If you don’t know George, he was the first person to unlock the iPhone, jailbreak the PS3, went on to start Comma.ai, and briefly “interned” at the Elon Musk-run Twitter.
Tinycorp is the company behind the deep learning framework tinygrad, as well as the recently announced tinybox, a new $15,000 “luxury AI computer” aimed at local model training and inference, aka your “personal compute cluster”:
738 FP16 TFLOPS
144 GB GPU RAM
5.76 TB/s RAM bandwidth
30 GB/s model load bandwidth (big llama loads in around 4 seconds; see the quick check after this list)
AMD EPYC CPU
1600W (one 120V outlet)
Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)
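A quick sanity check on that ~4 second load figure (back-of-envelope only; assumes the 65B model is stored as FP16, i.e. 2 bytes per parameter):

```python
# Back-of-envelope check, not a benchmark: a 65B-parameter FP16 model
# is roughly 130 GB, and the spec above quotes 30 GB/s of load bandwidth.
params = 65e9
bytes_per_param = 2                              # FP16
model_size_gb = params * bytes_per_param / 1e9   # ~130 GB
load_bandwidth_gbps = 30                         # GB/s, per the tinybox spec
print(f"{model_size_gb / load_bandwidth_gbps:.1f} s")  # ~4.3 s
```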
(In the episode, we also talked about the future of the tinybox as the intelligence center of every home that will help run models, at-home robots, and more. Make sure to check the timestamps 👀 )
The tiny corp manifesto
There are three main theses to tinycorp:
If XLA/PrimTorch are CISC, tinygrad is RISC: CISC (Complex Instruction Set Computing) designs use larger instruction sets where a single instruction can execute many low-level operations. RISC (Reduced Instruction Set Computing) designs use smaller instruction sets where each instruction executes a single low-level operation, which keeps instructions simple, fast to decode, and easy to optimize. If you’ve used an Apple Silicon M1/M2 Mac or a Raspberry Pi, you’ve used a RISC (ARM-based) computer.
If you can’t write a fast ML framework for GPU, you can’t write one for your own chip: there are many “AI chip” companies out there, and they all started by taping out a chip. Some of them like Cerebras are still building, while others like Graphcore seem to be struggling. But building chips with higher TFLOPS isn’t enough: “There’s a great chip already on the market. For $999, you get a 123 TFLOP card with 24 GB of 960 GB/s RAM. This is the best FLOPS per dollar today, and yet...nobody in ML uses it.”, referring to the AMD RX 7900 XTX. NVIDIA’s lead is not only thanks to high-performing cards, but also thanks to a great developer platform in CUDA. Starting with chip development rather than the dev toolkit is much more capital-intensive, so tinycorp is starting by writing a framework for off-the-shelf hardware rather than taping out their own chip. (The tinybox specs above line up with exactly this card; see the arithmetic after these theses.)
Turing completeness considered harmful: Once you call into Turing-complete kernels, you can...
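As referenced above, the tinybox headline specs are consistent with six of those $999 cards; a quick check (an inference from the published numbers, not an official parts list):

```python
# Inference from public specs, not a bill of materials: six AMD RX 7900 XTX
# cards reproduce the tinybox headline numbers quoted earlier.
cards = 6
print(cards * 123)   # 738  -> matches "738 FP16 TFLOPS"
print(cards * 24)    # 144  -> matches "144 GB GPU RAM"
print(cards * 960)   # 5760 -> matches "5.76 TB/s RAM bandwidth"
```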

Agent Engineering with Pydantic + Graphs — with Samuel Colvin
Latent Space: The AI Engineer Podcast
02/06/25 • 64 min
Did you know that adding a simple Code Interpreter took o3 from 9.2% to 32% on FrontierMath? The Latent Space crew is hosting a hack night Feb 11th in San Francisco focused on CodeGen use cases, co-hosted with E2B and Edge AGI; watch E2B’s new workshop and RSVP here!
We’re happy to announce that today’s guest Samuel Colvin will be teaching his very first Pydantic AI workshop at the newly announced AI Engineer NYC Workshops day on Feb 22! 25 tickets left.
If you’re a Python developer, it’s very likely that you’ve heard of Pydantic. Every month, it’s downloaded more than 300,000,000 times, making it one of the top 25 PyPI packages. OpenAI uses it in its SDK for structured outputs, it’s at the core of FastAPI, and if you’ve followed our AI Engineer Summit conference, Jason Liu of Instructor has given two great talks about it: “Pydantic is all you need” and “Pydantic is STILL all you need”.
Now, Samuel Colvin has raised $17M from Sequoia to turn Pydantic from an open source project to a full stack AI engineer platform with Logfire, their observability platform, and PydanticAI, their new agent framework.
Logfire: bringing OTEL to AI
OpenTelemetry recently merged Semantic Conventions for LLM workloads, which provide standard definitions for tracking performance signals like gen_ai.server.time_per_output_token. In Samuel’s view, at least 80% of new apps being built today have some sort of LLM usage in them, and just as web observability platforms were replaced by cloud-first ones in the 2010s, Logfire wants to do the same for AI-first apps.
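As a rough illustration of what those conventions look like in practice, here is a sketch using the vanilla OpenTelemetry Python SDK (not Logfire’s actual instrumentation; the gen_ai.* attribute names follow the published semantic conventions but may shift between spec versions):

```python
from opentelemetry import trace

tracer = trace.get_tracer("my-llm-app")

# Record one LLM call as a span tagged with gen_ai.* attributes, so any
# OTEL-compatible backend can aggregate metrics like time per output token
# across models and providers.
with tracer.start_as_current_span("chat gpt-4o") as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    # ... make the actual model call here ...
    span.set_attribute("gen_ai.usage.input_tokens", 412)   # illustrative values
    span.set_attribute("gen_ai.usage.output_tokens", 87)
```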
If you’re interested in the technical details, Logfire migrated its backend from ClickHouse to DataFusion. We spent some time on the importance of picking open source tools that you understand and can actually contribute to upstream, rather than simply the most popular ones; listen in at ~43:19 for that part.
Agents are the killer app for graphs
Pydantic AI is their attempt at taking a lot of the learnings that LangChain and the other early LLM frameworks had, and putting Python best practices into it. At an API level, it’s very similar to the other libraries: you can call LLMs, create agents, do function calling, do evals, etc.
They define an “Agent” as a container with a system prompt, tools, structured result, and an LLM. Under the hood, each Agent is now a graph of function calls that can orchestrate multi-step LLM interactions. You can start simple, then move toward fully dynamic graph-based control flow if needed.
“We were compelled enough by graphs once we got them right that our agent implementation [...] is now actually a graph under the hood.”
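A minimal sketch of that Agent shape, based on PydanticAI’s public examples (the model string and fields are illustrative, and keyword names may differ between pydantic-ai versions):

```python
from pydantic import BaseModel
from pydantic_ai import Agent

class CityInfo(BaseModel):   # the "structured result"
    city: str
    country: str

# An Agent bundles an LLM, a system prompt, optional tools, and a result type.
agent = Agent(
    "openai:gpt-4o",
    system_prompt="Extract the city the user is talking about.",
    result_type=CityInfo,
)

result = agent.run_sync("I grew up near the Eiffel Tower.")
print(result.data)   # CityInfo(city='Paris', country='France')
```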
Why Graphs?
More natural for complex or multi-step AI workflows.
Easy to visualize and debug with mermaid diagrams.
Potential for distributed runs, or “waiting days” between steps in certain flows.
In parallel, you see folks like Emil Eifrem of Neo4j talk about GraphRAG as another place where graphs fit really well in the AI stack, so it might be time for more people to take them seriously.
Full Video Episode
Chapters
00:00:00 Introductions
00:00:24 Origins of Pydantic...


Open Operator, Serverless Browsers and the Future of Computer-Using Agents
Latent Space: The AI Engineer Podcast
02/28/25 • 61 min
Today's episode is with Paul Klein, founder of Browserbase. We talked about building browser infrastructure for AI agents, the future of agent authentication, and their open source framework Stagehand.
[00:00:00] Introductions
[00:04:46] AI-specific challenges in browser infrastructure
[00:07:05] Multimodality in AI-Powered Browsing
[00:12:26] Running headless browsers at scale
[00:18:46] Geolocation when proxying
[00:21:25] CAPTCHAs and Agent Auth
[00:28:21] Building “User take over” functionality
[00:33:43] Stagehand: AI web browsing framework
[00:38:58] OpenAI's Operator and computer use agents
[00:44:44] Surprising use cases of Browserbase
[00:47:18] Future of browser automation and market competition
[00:53:11] Being a solo founder
Transcript
Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.
swyx [00:00:12]: Hey, and today we are very blessed to have our friends, Paul Klein, for the fourth, the fourth, CEO of Browserbase. Welcome.
Paul [00:00:21]: Thanks guys. Yeah, I'm happy to be here. I've been lucky to know both of you for like a couple of years now, I think. So it's just like we're hanging out, you know, with three ginormous microphones in front of our face. It's totally normal hangout.
swyx [00:00:34]: Yeah. We've actually mentioned you on the podcast, I think, more often than any other Solaris tenant. Just because like you're one of the, you know, best performing, I think, LLM tool companies that have started up in the last couple of years.
Paul [00:00:50]: Yeah, I mean, it's been a whirlwind of a year, like Browserbase is actually pretty close to our first birthday. So we are one years old. And going from, you know, starting a company as a solo founder to... To, you know, having a team of 20 people, you know, a series A, but also being able to support hundreds of AI companies that are building AI applications that go out and automate the web. It's just been like, really cool. It's been happening a little too fast. I think like collectively as an AI industry, let's just take a week off together. I took my first vacation actually two weeks ago, and Operator came out on the first day, and then a week later, DeepSeek came out. And I'm like on vacation trying to chill. I'm like, we got to build with this stuff, right? So it's been a breakneck year. But I'm super happy to be here and like talk more about all the stuff we're seeing. And I'd love to hear kind of what you guys are excited about too, and share with it, you know?
swyx [00:01:39]: Where to start? So people, you've done a bunch of podcasts. I think I strongly recommend Jack Bridger's Scaling DevTools, as well as Turner Novak's The Peel. And, you know, I'm sure there's others. So you covered your Twilio story in the past, talked about StreamClub, you got acquired to Mux, and then you left to start Browserbase. So maybe we just start with what is Browserbase? Yeah.
Paul [00:02:02]: Browserbase is the web browser for your AI. We're building headless browser infrastructure, which are browsers that run in a server environment that's accessible to developers via APIs and SDKs. It's really hard to run a web browser in the cloud. You guys are probably running Chrome on your computers, and that's using a lot of resources, right? So if you want to run a web browser or thousands of web browsers, you can't just spin up a bunch of lambdas. You actually need to use a secure containerized environment. You have to scale it up and down. It's a stateful system. And that infrastructure is, like, super painful. And I know that firsthand, because at my last company, StreamClub, I was CTO, and I was building our own internal headless browser infrastructure. That's actually why we sold the company, is because Mux really wanted to buy our headless browser infrastructure that we'd built. And it's just a super hard problem. And I actually told my co-founders, I would never start another company unless it was a browser infrastructure company. And it turns out that's really necessary in the age of AI, when AI can actually go out and interact with websites, click on buttons, fill in forms. You need AI to do all of that work in an actual browser running somewhere on a server. And BrowserBase powers that.
swyx [00:03:08]: While you're talking about it, it occurred to me, not that you're going to be acquired or anything, but it occurred to me that it would be really funny if you became the Nikita Beer of headless browser companies. You just have one trick, ...


From RLHF to RLHB: The Case for Learning from Human Behavior - with Jeffrey Wang and Joe Reeve of Amplitude
Latent Space: The AI Engineer Podcast
06/08/23 • 49 min
Welcome to the almost 3k Latent Space explorers who joined us last month! We’re holding our first SF listener meetup with Practical AI next Monday; join us if you want to meet past guests and put faces to voices! All events are in /community.
Who among you regularly clicks the ubiquitous 👍/👎 buttons in ChatGPT/Bard/etc.?
Anyone? I don’t see any hands up.
OpenAI has told us how important reinforcement learning from human feedback (RLHF) is to creating the magic that is ChatGPT, but we know from our conversation with Databricks’ Mike Conover just how hard it is to get even 15,000 pieces of explicit, high-quality human responses.
We are shockingly reliant on good human feedback. Andrej Karpathy’s recent keynote at Microsoft Build on the State of GPT demonstrated just how much of the training process relies on contractors to supply the millions of items of human feedback needed to make a ChatGPT-quality LLM (highlighted by us in red):
But the collection of good feedback is an incredibly messy problem. First of all, if you have contractors paid by the datapoint, they are incentivized to blast through as many as possible without much thought. So you hire more contractors and double, maybe triple, your costs. OK, you say, let’s recruit missionaries, not mercenaries. People should volunteer their data! Then you run into the same problem we and any consumer review platform run into: the vast majority of people send nothing at all, and those who do disproportionately represent negative reactions. More subtle problems emerge when you try to capture subjective human responses: the reason ChatGPT responses tend to be inhumanly verbose is that humans have a well-documented “longer = better” bias when classifying responses in a “laboratory setting”.
The fix for this, of course, is to get out of the lab and learn from real human behavior, not artificially constructed human feedback. You don’t see a thumbs up/down button in GitHub Copilot, Codeium, or Codium. Instead, they work an implicit accept/reject event into the product workflow, such that you cannot help but give feedback while you use the product. This way you hear from all your users, in their natural environments, doing valuable tasks they are familiar with. The prototypical example here is Midjourney, which unobtrusively collects 1 of 9 types of feedback from every user as part of the workflow, in exchange for much faster first-draft image generations:
The best-known public example of AI product telemetry is in the Copilot-Explorer writeup, which describes how Copilot’s telemetry checks for the presence of generated code after intervals of 15 to 600 seconds, enabling GitHub to claim that 40% of code is generated by Copilot.
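A hypothetical sketch of what that kind of implicit-feedback telemetry can look like (not GitHub’s actual implementation; the editor object and event names are made up for illustration):

```python
import time

def track(event: str, properties: dict) -> None:
    """Stand-in for an analytics client's track() call."""
    print(event, properties)

def record_completion_outcome(editor, suggestion_id: str, suggested_text: str,
                              delay_seconds: int = 30) -> None:
    # Implicit feedback: instead of asking for a thumbs up/down, wait a while
    # and then check whether the suggested code survived in the buffer.
    time.sleep(delay_seconds)
    kept = suggested_text in editor.current_buffer()   # hypothetical editor API
    track("completion_outcome", {
        "suggestion_id": suggestion_id,
        "checked_after_seconds": delay_seconds,
        "accepted": kept,
    })
```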
This is fantastic and “obviously” the future of productized AI. Every AI application should figure out how to learn from all their real users, not some contractors in a foreign country. Most prompt engineers and prompt engineering tools tend to focus on pre-production prototyping, but they could also benefit from A/B testing their prompts in the real world.
In short, AI may need Analytics more than Analytics needs AI.
Amplitude’s Month of AI
This is why Amplitude is going hard on AI - and why we recently spent a weekend talking to Jeffrey Wang, cofounder and chief architect at Amplitude, and Joe Reeve, head of AI, recording a li...

Why AI Agents Don't Work (yet) - with Kanjun Qiu of Imbue
Latent Space: The AI Engineer Podcast
10/14/23 • 65 min
Thanks to the over 11,000 people who joined us for the first AI Engineer Summit! A full recap is coming, but you can 1) catch up on the fun and videos on Twitter and YouTube, 2) help us reach 1000 people for the first comprehensive State of AI Engineering survey and 3) submit projects for the new AI Engineer Foundation.
See our Community page for upcoming meetups in SF, Paris, NYC, and Singapore.
This episode had good interest on Twitter.
Last month, Imbue was crowned as AI’s newest unicorn foundation model lab, raising a $200m Series B at a >$1 billion valuation. As “stealth” foundation model companies go, Imbue (f.k.a. Generally Intelligent) has stood as an enigmatic group given they have no publicly released models to try out. However, ever since their $20m Series A last year their goal has been to “develop generally capable AI agents with human-like intelligence in order to solve problems in the real world”.
From RL to Reasoning LLMs
Along with their Series A, they announced Avalon, “A Benchmark for RL Generalization Using Procedurally Generated Worlds”. Avalon is built on top of the open source Godot game engine, and is ~100x faster than Minecraft to enable fast RL benchmarking and a clear reward with adjustable game difficulty.
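For context, this is the standard gym-style loop that procedurally generated RL benchmarks are driven with (a sketch; the environment id below is hypothetical, and Avalon’s actual Python wrappers may expose a somewhat different interface):

```python
import gymnasium as gym

# Hypothetical env id, for illustration only.
env = gym.make("Avalon-navigate-v0")

obs, info = env.reset(seed=0)
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()   # replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(total_reward)
```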
After a while, they realized that pure RL isn’t a good path to teach reasoning and planning. The agents were able to learn mechanical things like opening complex doors and climbing, but couldn’t progress to higher-level tasks. A pure RL world also doesn’t include a language explanation of the agent’s reasoning, which made it hard to understand why it made certain decisions. That pushed the team more towards the “models for reasoning” path:
“The second thing we learned is that pure reinforcement learning is not a good vehicle for planning and reasoning. So these agents were able to learn all sorts of crazy things: They could learn to climb like hand over hand in VR climbing, they could learn to open doors like very complicated, like multiple switches and a lever open the door, but they couldn't do any higher level things. And they couldn't do those lower level things consistently necessarily. And as a user, I do not want to interact with a pure reinforcement learning end to end RL agent. As a user, like I need much more control over what that agent is doing.”
Inspired by Chelsea Finn’s work on SayCan at Stanford, the team pivoted to have their agents do the reasoning in natural language instead. This development parallels the large leaps in reasoning that humans made with the development of the scientific method:
“We are better at reasoning now than we were 3000 years ago. An example of a reasoning strategy is noticing you're confused. Then when I notice I'm confused, I should ask:
What was the original claim that was made?
What evidence is there for this claim?
Does the evidence support the claim?
Is the claim correct?
This is like a reasoning strategy that was developed in like the 1600s, you know, with like the advent of science. So that's an example of a reasoning strategy. There are tons of them. We employ all the time, lots of heuristics that help us be better at reasoning. And we can generate data that's much more specific to them.“
The Full Stack Model Lab
One year later, it would seem that the pivot to reasoning has had tremendous success, and Imbue has now reached a >$1B valuation, with particip...

Snipd: The AI Podcast App for Learning
Latent Space: The AI Engineer Podcast
03/14/25 • 77 min
We are working with Amplify on the 2025 State of AI Engineering Survey to be presented at the AIE World’s Fair in SF! Join the survey to shape the future of AI Eng!
We first met Snipd over a year ago and were immediately impressed by the design, but we were doubtful about snipping as the app’s titular behavior:
Podcast apps are enormously sticky - Spotify spent almost $1b in podcast acquisitions and exclusive content just to get an 8% bump in market share among normies.
However, after a disappointing Overcast 2.0 rewrite with no AI features in the last 3 years, I finally bit the bullet and switched to Snipd.
It’s 2025, your podcast app should be able to let you search transcripts of your podcasts. Snipd is the best implementation of this so far.
And yet they keep shipping:
What impressed us wasn’t just how this tiny team of 4 was able to bootstrap a consumer AI app against massive titans and do so well; but also how seriously they think about learning through podcasts and improving retention of knowledge over time, aka “Duolingo for podcasts”.
As an educational AI podcast, that’s a mission we can get behind.
Full Video Pod
Find us on YouTube! This was the first pod we’ve ever shot outdoors!
Show Notes
Comparing Snipd transcription with our Bee episode
Gustav Söderström - Background Audio
Timestamps
[00:00:03] Takeaways from AI Engineer NYC
[00:00:17] Weather in New York.
[00:00:26] Swyx and Snipd.
[00:01:01] Kevin's AI summit experience.
[00:01:31] Zurich and AI.
[00:03:25] SigLIP authors join OpenAI.
[00:03:39] Zurich is very costly.
[00:04:06] The Snipd origin story.
[00:05:24] Introduction to machine learning.
[00:09:28] Snipd and user knowledge extraction.
[00:13:48] App's tech stack, Flutter, Python.
[00:15:11] How speakers are identified.
[00:18:29] The concept of "backgroundable" video.
[00:29:05] Voice cloning technology.
[00:31:03] Using AI agents.
[00:34:32] Snipd's future is multi-modal AI.
[00:36:37] Snipd and existing user behaviour.
[00:42:10] The app, summary, and timestamps.
[00:55:25] The future of AI and podcasting.
[01:14:55] Voice AI
Transcript
swyx [00:00:03]: Hey, I'm here in New York with Kevin Ben-Smith of Snipd. Welcome.
Kevin [00:00:07]: Hi. Hi. Amazing to be here.
swyx [00:00:09]: Yeah. This is our first ever, I think, outdoors podcast recording.
Kevin [00:00:14]: It's quite a location for the first time, I have to say.
swyx [00:00:18]: I was actually unsure because, you know, it's cold. It's like, I checked the temperature. It's like kind of one degree Celsius, but it's not that bad with the sun. No, it's quite nice. Yeah. Especially with our beautiful tea. With the tea. Yeah. Perfect. We're going to talk about Snips. I'm a Snips user. I'm a Snips user. I had to basically, you know, apart from Twitter, it's like the number one use app on my phone. Nice. When I wake up in the morning, I open Snips and I, you k...

Outlasting Noam Shazeer, crowdsourcing Chat + AI with >1.4m DAU, and becoming the "Western DeepSeek" — with William Beauchamp, Chai Research
Latent Space: The AI Engineer Podcast
01/26/25 • 75 min
One last Gold sponsor slot is available for the AI Engineer Summit in NYC. Our last round of invites is going out soon - apply here! If you are building AI agents or AI eng teams, this will be the single highest-signal conference of the year for you!
While the world melts down over DeepSeek, few are talking about the OTHER notable group of former hedge fund traders who pivoted into AI and built a remarkably profitable consumer AI business with a tiny but incredibly cracked engineering team: Chai Research. In short order they have:
Started a Chat AI company well before Noam Shazeer started Character AI, and outlasted his departure.
Crossed 1m DAU in 2.5 years - William updates us on the pod that they’ve now hit 1.4m DAU, another +40% from a few months ago. Revenue has crossed $22m.
Launched the Chaiverse model crowdsourcing platform - taking 3-4 week A/B testing cycles down to 3-4 hours, and deploying >100 models a week.
While they’re not paying million dollar salaries, you can tell they’re doing pretty well for an 11 person startup:
The Chai Recipe: Building infra for rapid evals
Remember how the central thesis of LMArena (formerly LMSYS) is that the only comprehensive way to evaluate LLMs is to let users try them out and pick winners?
At the core of Chai is a mobile app that looks like Character AI, but is actually the largest LLM A/B testing arena in the world, specialized in retaining chat users for Chai’s use cases (therapy, assistant, roleplay, etc.). It’s basically what LMArena would be if taken very, very seriously at one company (with $1m in prizes to boot):
Chai publishes occasional research on how they think about this, including talks at their Palo Alto office:
William expands upon this in today’s podcast (34 mins in):
Fundamentally, the way I would describe it is when you're building anything in life, you need to be able to evaluate it. And through evaluation, you can iterate, we can look at benchmarks, and we can say the issues with benchmarks and why they may not generalize as well as one would hope in the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5000 users, and the users can rate it.
And we can then have a really accurate ranking of like which model, or users finding more engaging or more entertaining. And it gets, you know, it's at this point now, where every day we're able to, I mean, we evaluate between 20 and 50 models, LLMs, every single day, right. So even though we've got only got a team of, say, five AI researchers, they're able to iterate a huge quantity of LLMs, right. So our team ships, let's just say minimum 100 LLMs a week is what we're able to iterate through. Now, before that moment in time, we might iterate through three a week, we might, you know, there was a time when even doing like five a month was a challenge, right? By being able to change the feedback loops to the point where it's not, let's launch these three models, let's do an A-B test, let's assign, let's do different cohorts, let's wait 30 days to see what the day 30 retention is, which is the kind of the, if you're doing an app, that's like A-B testing 101 would be, do a 30-day retention test, assign different treatments to different cohorts and come back in 30 days. So that's insanely slow. That's just, it's too slow. And so we were able to get that 30-day feedback loop all the way down to something like three hours.
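A toy sketch of that kind of ranking step (not Chai’s actual scoring code, just the shape of turning per-user ratings into a daily leaderboard):

```python
from math import sqrt
from statistics import mean, stdev

# model_id -> per-conversation user ratings collected today (illustrative data)
ratings = {
    "model_a": [4, 5, 3, 4, 5, 4],
    "model_b": [3, 3, 4, 2, 3, 3],
}

def score(xs: list[float]) -> tuple[float, float]:
    """Mean rating with a rough 95% confidence half-width."""
    half_width = 1.96 * stdev(xs) / sqrt(len(xs)) if len(xs) > 1 else float("inf")
    return mean(xs), half_width

for model_id in sorted(ratings, key=lambda m: score(ratings[m])[0], reverse=True):
    m, hw = score(ratings[model_id])
    print(f"{model_id}: {m:.2f} ± {hw:.2f}")
```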

Latent.Space 2024 Year in Review
Latent Space: The AI Engineer Podcast
12/31/24 • 112 min
Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World’s Fair 2025 in June.
Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!
Full YouTube Episode with Slides/Charts
Like and subscribe and hit that bell to get notifs!
Timestamps
00:00 Welcome to the 100th Episode!
00:19 Reflecting on the Journey
00:47 AI Engineering: The Rise and Impact
03:15 Latent Space Live and AI Conferences
09:44 The Competitive AI Landscape
21:45 Synthetic Data and Future Trends
35:53 Creative Writing with AI
36:12 Legal and Ethical Issues in AI
38:18 The Data War: GPU Poor vs. GPU Rich
39:12 The Rise of GPU Ultra Rich
40:47 Emerging Trends in AI Models
45:31 The Multi-Modality War
01:05:31 The Future of AI Benchmarks
01:13:17 Pionote and Frontier Models
01:13:47 Niche Models and Base Models
01:14:30 State Space Models and RWKV
01:15:48 Inference Race and Price Wars
01:22:16 Major AI Themes of the Year
01:22:48 AI Rewind: January to March
01:26:42 AI Rewind: April to June
01:33:12 AI Rewind: July to September
01:34:59 AI Rewind: October to December
01:39:53 Year-End Reflections and Predictions
Transcript
[00:00:00] Welcome to the 100th Episode!
[00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.
[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.
[00:00:19] Alessio: Yeah, I know.
[00:00:19] Reflecting on the Journey
[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. The answer
[00:00:32] swyx: was cursor and perplexity.
[00:00:34] Alessio: Yeah, I love mid journey. It's like, do you really not like anything else?
[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research-driven content. You know, we had like Tri Dao, we had, you know, Jeremy Howard, we had more folks like that.
[00:00:47] AI Engineering: The Rise and Impact
[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.
[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your Rise of the AI Engineer post just kind of got people to congregate, and then the AI Engineer Summit.
[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.
[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.
[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big So I think it's like kind of arrived as a meaningful and useful definition.
[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a...

AI Magic: Shipping 1000s of successful products with no managers and a team of 12 — Jeremy Howard of Answer.ai
Latent Space: The AI Engineer Podcast
08/16/24 • 58 min
Disclaimer: We recorded this episode ~1.5 months ago, timed for the FastHTML release. It then got bottlenecked by the Llama 3.1, Winds of AI Winter, and SAM 2 episodes, so we’re a little late. Since then FastHTML has been released, swyx is building an app in it for AINews, and Anthropic has also released their prompt caching API.
Remember when Dylan Patel of SemiAnalysis coined the GPU Rich vs GPU Poor war? (If not, see our pod with him.) The idea was that if you’re GPU poor you shouldn’t waste your time trying to solve GPU rich problems (i.e. pre-training large models) and are better off working on fine-tuning, optimized inference, etc. Jeremy Howard (see our “End of Finetuning” episode to catch up on his background) and Eric Ries founded Answer.AI to do exactly that: “Practical AI R&D”, which is very much in line with GPU-poor needs. For example, one of their first releases was a system based on FSDP + QLoRA that let anyone train a 70B model on two NVIDIA 4090s. Since then, they have come out with a long list of super useful projects (in no particular order, and non-exhaustive):
FSDP QDoRA: this is just as memory efficient and scalable as FSDP/QLoRA, and critically is also as accurate for continued pre-training as full weight training.
Cold Compress: a KV cache compression toolkit that lets you scale sequence length without impacting speed.
colbert-small: state of the art retriever at only 33M params
JaColBERTv2.5: a new state-of-the-art retriever across all Japanese benchmarks.
gpu.cpp: portable GPU compute for C++ with WebGPU.
Claudette: a better Anthropic API SDK.
They also recently released FastHTML, a new way to create modern interactive web apps. Jeremy recently released a 1-hour “Getting started” tutorial on YouTube; while this isn’t AI-related per se, it’s close to home for any AI Engineer looking to iterate quickly on new products.
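For a flavor of what a FastHTML app looks like, here is a hello-world sketch following the public quickstart (assumes `pip install python-fasthtml`; helper names may differ across versions):

```python
from fasthtml.common import *

app, rt = fast_app()

@rt("/")
def get():
    # Components are plain Python callables that render to HTML.
    return Titled("Hello from FastHTML",
                  P("A tiny interactive web app with no JavaScript required."))

serve()  # starts a local dev server (uvicorn under the hood)
```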
In this episode we broke down 1) how they recruit, 2) how they organize what to research, and 3) how the community comes together.
At the end, Jeremy gave us a sneak peek at something new that he’s working on that he calls dialogue engineering:
So I've created a new approach. It's not called prompt engineering. I'm creating a system for doing dialogue engineering. It's currently called AI magic. I'm doing most of my work in this system and it's making me much more productive than I was before I used it.
He explains it a bit more at ~44:53 in the pod, but we’ll just have to wait for the public release to figure out exactly what he means.
Timestamps
[00:00:00] ...

AGI is Being Achieved Incrementally (DevDay Recap - cleaned audio)
Latent Space: The AI Engineer Podcast
11/08/23 • 141 min
We left a high amount of background audio in the DevDay podcast, which many of you loved, but we definitely understand that some of you may have had trouble with it. Listener Klaus Breyer ran it through Auphonic with speech isolation and we figured we’d upload it as a backdated pod for people who prefer this. Of course it means that our speakers sound out of place, since they now sound like they are talking loudly in a quiet room. Let us know in the comments what you think!
Timestamps
The cleaned part is only Part II:
[00:55:09] Part II: Spot Interviews
[00:55:59] Jim Fan (Nvidia) - High Level Takeaways
[01:05:19] Raza Habib (Humanloop) - Foundation Model Ops
[01:13:32] Surya Dantuluri (Stealth) - RIP Plugins
[01:20:53] Reid Robinson (Zapier) - AI Actions for GPTs
[01:30:45] Div Garg (MultiOn) - GPT4V for Agents
[01:36:42] Louis Knight-Webb (Bloop.ai) - AI Code Search
[01:48:36] Shreya Rajpal (Guardrails) - Guardrails for LLMs
[01:59:00] Alex Volkov (Weights & Biases, ThursdAI) - "Keeping AI Open"
[02:09:39] Rahul Sonwalkar (Julius AI) - Advice for Founders
Get full access to Latent.Space at www.latent.space/subscribe

FAQ
How many episodes does Latent Space: The AI Engineer Podcast have?
Latent Space: The AI Engineer Podcast currently has 121 episodes available.
What topics does Latent Space: The AI Engineer Podcast cover?
The podcast is about Entrepreneurship, Podcasts, Technology and Business.
What is the most popular episode on Latent Space: The AI Engineer Podcast?
The episode title 'Commoditizing the Petaflop — with George Hotz of the tiny corp' is the most popular.
What is the average episode length on Latent Space: The AI Engineer Podcast?
The average episode length on Latent Space: The AI Engineer Podcast is 77 minutes.
How often are episodes of Latent Space: The AI Engineer Podcast released?
Episodes of Latent Space: The AI Engineer Podcast are typically released every 6 days, 14 hours.
When was the first episode of Latent Space: The AI Engineer Podcast?
The first episode of Latent Space: The AI Engineer Podcast was released on Feb 23, 2023.
