
Agent Engineering with Pydantic + Graphs — with Samuel Colvin
02/06/25 • 64 min
Did you know that adding a simple Code Interpreter took o3 from 9.2% to 32% on FrontierMath? The Latent Space crew is hosting a hack night Feb 11th in San Francisco focused on CodeGen use cases, co-hosted with E2B and Edge AGI; watch E2B’s new workshop and RSVP here!
We’re happy to announce that today’s guest Samuel Colvin will be teaching his very first Pydantic AI workshop at the newly announced AI Engineer NYC Workshops day on Feb 22! 25 tickets left.
If you’re a Python developer, it’s very likely that you’ve heard of Pydantic. Every month, it’s downloaded >300,000,000 times, making it one of the top 25 PyPI packages. OpenAI uses it in its SDK for structured outputs, it’s at the core of FastAPI, and if you’ve followed our AI Engineer Summit conference, Jason Liu of Instructor has given two great talks about it: “Pydantic is all you need” and “Pydantic is STILL all you need”.
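For the unfamiliar, here’s roughly what that structured-outputs usage looks like in the OpenAI SDK (a sketch against the SDK as of early 2025; the CalendarEvent schema is our own illustrative example):

```python
from openai import OpenAI
from pydantic import BaseModel

# Illustrative schema; any Pydantic model works here.
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,  # the Pydantic model doubles as the output schema
)
event = completion.choices[0].message.parsed  # a validated CalendarEvent instance
```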
Now, Samuel Colvin has raised $17M from Sequoia to turn Pydantic from an open source project into a full-stack AI engineering platform, with Logfire, their observability platform, and PydanticAI, their new agent framework.
Logfire: bringing OTEL to AI
OpenTelemetry recently merged Semantic Conventions for LLM workloads, which provide standard definitions for tracking performance metrics like gen_ai.server.time_per_output_token. In Sam’s view, at least 80% of new apps being built today have some sort of LLM usage in them, and just as web observability platforms got replaced by cloud-first ones in the 2010s, Logfire wants to do the same for AI-first apps.
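To make the conventions concrete, here’s a hand-rolled sketch using the vanilla OpenTelemetry Python SDK. The attribute names come from the GenAI semconv spec; the span name and token counts are illustrative, and in practice Logfire’s own instrumentation (e.g. logfire.instrument_openai()) records these for you:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-app")  # illustrative instrumentation scope name

# One LLM request = one span, annotated per the GenAI semantic conventions.
with tracer.start_as_current_span("chat gpt-4o") as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    # ... make the actual model call here ...
    span.set_attribute("gen_ai.usage.input_tokens", 120)   # illustrative values
    span.set_attribute("gen_ai.usage.output_tokens", 45)
```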
If you’re interested in the technical details, Logfire migrated its backend away from ClickHouse to DataFusion. We spent some time on the importance of picking open source tools you understand and can actually contribute to upstream, rather than simply the most popular ones; listen in at ~43:19 for that part.
Agents are the killer app for graphs
PydanticAI is their attempt at taking the learnings from LangChain and the other early LLM frameworks and applying Python best practices to them. At the API level, it’s very similar to the other libraries: you can call LLMs, create agents, do function calling, run evals, etc.
They define an “Agent” as a container with a system prompt, tools, a structured result type, and an LLM. Under the hood, each Agent is now a graph of function calls that orchestrates multi-step LLM interactions. You can start simple, then move toward fully dynamic graph-based control flow if needed.
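Concretely, that container looks something like this (a sketch against the PydanticAI API as of early 2025; the schema and tool are our own placeholders, so check the docs for current names):

```python
from pydantic import BaseModel
from pydantic_ai import Agent

class CityInfo(BaseModel):  # the structured result
    city: str
    country: str

agent = Agent(
    "openai:gpt-4o",                             # the LLM
    result_type=CityInfo,                        # structured result type
    system_prompt="Answer with a single city.",  # system prompt
)

@agent.tool_plain  # tools: plain functions the model can call
def country_population(country: str) -> int:
    """Placeholder lookup; wire this to real data."""
    return 67_000_000

result = agent.run_sync("What is the windiest big city in the UK?")
print(result.data)  # -> CityInfo(city=..., country=...)
```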
“We were compelled enough by graphs once we got them right that our agent implementation [...] is now actually a graph under the hood.”
Why Graphs?
More natural for complex or multi-step AI workflows.
Easy to visualize and debug with Mermaid diagrams (see the sketch after this list).
Potential for distributed runs, or “waiting days” between steps in certain flows.
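If you want to drop below the Agent abstraction, PydanticAI’s companion library pydantic_graph (as of early 2025) exposes the graph layer directly: nodes are dataclasses, and the return type hints define the edges. A minimal sketch adapted from the docs at the time, with illustrative node names and logic:

```python
from __future__ import annotations

from dataclasses import dataclass
from pydantic_graph import BaseNode, End, Graph, GraphRunContext

@dataclass
class DivisibleBy5(BaseNode[None, None, int]):
    foo: int

    async def run(self, ctx: GraphRunContext) -> Increment | End[int]:
        if self.foo % 5 == 0:
            return End(self.foo)    # terminal edge: the run ends here
        return Increment(self.foo)  # otherwise hand off to the next node

@dataclass
class Increment(BaseNode):
    foo: int

    async def run(self, ctx: GraphRunContext) -> DivisibleBy5:
        return DivisibleBy5(self.foo + 1)

fives_graph = Graph(nodes=(DivisibleBy5, Increment))
# The Mermaid diagram mentioned above comes for free:
print(fives_graph.mermaid_code(start_node=DivisibleBy5))
```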
In parallel, you see folks like Emil Eifrem of Neo4j talk about GraphRAG as another place where graphs fit really well in the AI stack, so it might be time for more people to take them seriously.
Full Video Episode
Chapters
00:00:00 Introductions
00:00:24 Origins of Pydantic
...
Previous Episode

The Agent Reasoning Interface: o1/o3, Claude 3, ChatGPT Canvas, Tasks, and Operator — with Karina Nguyen of OpenAI
Sponsorships and tickets for the AI Engineer Summit are selling fast! See the new website with speakers and schedules live!
If you are building AI agents or leading teams of AI Engineers, this will be the single highest-signal conference of the year for you, happening Feb 20-22 in NYC.
We’re pleased to share that Karina will be presenting OpenAI’s closing keynote at the AI Engineer Summit. We were fortunate to get some time with her today to introduce some of her work, and hope this serves as nice background for her talk!
There are very few early AI careers that have been as impactful as Karina Nguyen’s. After stints at Notion, Square, Dropbox, Primer, the New York Times, and UC Berkeley, she joined Anthropic as employee ~60 and worked on a wide range of research/product roles for Claude 1, 2, and 3. We’ll just let her LinkedIn speak for itself.
Now, as research manager and post-training lead in Model Behavior at OpenAI, she creates new interaction paradigms for reasoning interfaces and capabilities, like ChatGPT Canvas, Tasks, SimpleQA, streaming chain-of-thought for o1 models, and more, via novel synthetic model training.
Ideal AI Research+Product Process
In the podcast we got a sense of what Karina has found works for her and her team to be as productive as they have been:
Write PRD (Define what you want)
Funding (Get resources)
Prototype Prompted Baseline (See what’s possible)
Write and Run Evals (Get failures to hillclimb)
Model training (Exceed baseline without overfitting)
Bugbash (Find bugs and solve them)
Ship (Get users!)
We could turn this into a snazzy viral graphic but really this is all it is. Simple to say, difficult to do well. Hopefully it helps you define your process if you do similar product-research work.
Show Notes
Our Reasoning Price War post
Karina LinkedIn, Website, Twitter
Karina on Claude Artifacts
Karina on Claude 3 Benchmarks
Inspiration for Artifacts / Canvas from early UX work she did on GPT-3
“i really believe that things like canvas and tasks should and could have happened like 2 yrs ago, idk why we are lagging in the form factors” (tweet)
Our article on prompting o1 vs Karina’s Claude prompting principles
Next Episode

The AI Architect — Bret Taylor
If you’re in SF, join us tomorrow for a fun meetup at CodeGen Night!
If you’re in NYC, join us for AI Engineer Summit! The Agent Engineering track is now sold out, but 25 tickets remain for AI Leadership and 5 tickets for the workshops. You can see the full schedule of speakers and workshops at https://ai.engineer!
It’s exceedingly hard to introduce someone like Bret Taylor. We could recite his Wikipedia page, or his extensive work history through Silicon Valley’s greatest companies, but everyone else already does that.
As a podcast by AI engineers for AI engineers, we had the opportunity to do something a little different. We wanted to dig into what Bret sees from his vantage point at the top of our industry for the last 2 decades, and how that explains the rise of the AI Architect at Sierra, the leading conversational AI/CX platform.
“Across our customer base, we are seeing a new role emerge - the role of the AI architect. These leaders are responsible for helping define, manage and evolve their company's AI agent over time. They come from a variety of both technical and business backgrounds, and we think that every company will have one or many AI architects managing their AI agent and related experience.”
In our conversation, Bret Taylor confirms the Paul Buchheit legend that he rewrote Google Maps in a weekend, armed with only the help of a then-nascent Google Closure Compiler and no other modern tooling. But what we find remarkable is that he was the PM of Maps, not an engineer, though of course he still identifies as one. We find this theme recurring throughout Bret’s career and worldview. We think it is plain as day that AI leadership will have to be hands-on and technical, especially when the ground is shifting as quickly as it is today:
“There's a lot of power in combining product and engineering into as few people as possible... few great things have been created by committee.”
“If engineering is an order taking organization for product you can sometimes make meaningful things, but rarely will you create extremely well crafted breakthrough products. Those tend to be small teams who deeply understand the customer need that they're solving, who have a maniacal focus on outcomes.”
“And I think the reason why is if you look at like software as a service five years ago, maybe you can have a separation of product and engineering because most software as a service created five years ago. I wouldn't say there's like a lot of technological breakthroughs required for most business applications. And if you're making expense reporting software or whatever, it's useful... You kind of know how databases work, how to build auto scaling with your AWS cluster, whatever, you know, it's just, you're just applying best practices to yet another problem.
"When you have areas like the early days of mobile development or the early days of interactive web applications, which I think Google Maps and Gmail represent, or now AI agents, you're in this constant conversation with what the requirements of your customers and stakeholders are and all the different people interacting with it and the capabilities of the technology. And it's almost impossible to specify the requirements of a product when you're not sure of the limitations of the technology itself.”
This is the first time the difference between technical leadership for “normal” software and for “AI” software was articulated this clearly for us, and we’ll be thinking a lot about this going forward. We left a lot of nuggets in the conversation, so we hope you’ll just dive in with us (and thank Bret!).