
AI Engineering for Art — with comfyanonymous, of ComfyUI
01/04/25 • 55 min
Applications for the NYC AI Engineer Summit, focused on Agents at Work, are open!
When we first started Latent Space, in the lightning round we’d always ask guests: “What’s your favorite AI product?”. The majority would say Midjourney. The simple UI of prompt → very aesthetic image turned it into a $300M+ ARR bootstrapped business as it rode the first wave of AI image generation.
In open source land, the Stable Diffusion community was congregating around AUTOMATIC1111 as the de facto web UI. Unlike Midjourney, which offered some flags but was mostly prompt-driven, A1111 let users play with many more parameters, supported additional modalities like img2img, and allowed users to load in custom models. If you’re interested in some of the SD history, you can look at our episodes with Lexica, Replicate, and Playground.
One of the people involved with that community was comfyanonymous, who was also part of the Stability team in 2023. They decided to build an alternative called ComfyUI, which is now one of the fastest-growing open source projects in generative images and the preferred Day 1 partner for releases like Black Forest Labs’s Flux Tools. The idea behind it was simple: “Everyone is trying to make easy to use interfaces. Let me try to make a powerful interface that's not easy to use.”
Unlike its predecessors, ComfyUI does not have a single input text box. Everything is built around the idea of a node: there’s a text input node, a CLIP node, a checkpoint loader node, a KSampler node, a VAE node, etc. While daunting for simple image generation, the tool is amazing for more complex workflows, since you can break down every step of the process and then chain many of them together rather than manually switching between tools. You can also restart execution halfway through instead of from the beginning, which can save a lot of time when using larger models.
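To make the node idea concrete, here’s a rough sketch of what a minimal text-to-image graph looks like in ComfyUI’s API (JSON) format, written out as a Python dict. The node class names follow the stock nodes, but exact inputs can differ between versions, and the checkpoint filename is just a placeholder:

```python
# Minimal ComfyUI workflow in API format: each key is a node id, each node
# has a class_type and inputs; ["node_id", slot] wires one node's output
# into another node's input.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_v1-5.safetensors"}},  # placeholder checkpoint name
    "2": {"class_type": "CLIPTextEncode",                   # positive prompt
          "inputs": {"text": "a watercolor fox in a misty forest", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",                   # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
                     "model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0]}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "comfy_example"}},
}
```

Because every step is an explicit node, you can swap the sampler, re-run only from the KSampler onward, or feed the decoded image into a video or 3D node just by re-wiring the graph.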
To give you an idea of some of the new use cases that this type of UI enables:
Sketch something → Generate an image with SD from sketch → feed it into SD Video to animate
Generate an image of an object → Turn into a 3D asset → Feed into interactive experiences
Input audio → Generate audio-reactive videos
Their Examples page also includes some of the more common use cases like AnimateDiff. They recently launched the Comfy Registry, an online library of nodes that users can pull from rather than having to build everything from scratch. The project has >60,000 GitHub stars, and as the community grows, some of the projects that people build have gotten quite complex:
The most interesting thing about Comfy is that it’s not a UI, it’s a runtime. You can build full applications on top of image models simply by using Comfy. You can expose Comfy workflows as an endpoint and chain them together just like you chain a single node. We’re seeing the rise of AI Engineering applied to art.
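As a sketch of what that looks like in practice (assuming a local ComfyUI instance on its default port 8188, and the `workflow` dict from the sketch above), you can queue a workflow over HTTP and poll for its outputs:

```python
import time
import requests  # assumes the requests library is installed

COMFY_URL = "http://127.0.0.1:8188"  # default local ComfyUI server

# Queue the workflow; ComfyUI returns a prompt_id for this run.
resp = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow})
prompt_id = resp.json()["prompt_id"]

# Poll the history endpoint until the run finishes and its outputs appear.
while True:
    history = requests.get(f"{COMFY_URL}/history/{prompt_id}").json()
    if prompt_id in history:
        print(history[prompt_id]["outputs"])  # per-node outputs, e.g. saved image filenames
        break
    time.sleep(1)
```

Wrap that in a web handler and a Comfy workflow becomes an API in its own right, which is how people are starting to build full products on top of it.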
Major Tom’s ComfyUI Resources from the Latent Space Discord
Major shoutouts to Major Tom on the LS Discord, an image generation expert, who offered these pointers:
“best thing about comfy is the fact it supports almost immediately every new thing that comes out - unlike A1111 or forge, which still don't support flux cnet for instance. It will be perfect tool when conflicting nodes will be resolved”
AP Workflows from Alessandro Perilli are a nice example of an all-in-one train-evaluate-generate system built atop Comfy
ComfyUI YouTubers to learn from:
Previous Episode

Latent.Space 2024 Year in Review
Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World’s Fair 2025 in June.
Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!
Full YouTube Episode with Slides/Charts
Like and subscribe and hit that bell to get notifs!
Timestamps
00:00 Welcome to the 100th Episode!
00:19 Reflecting on the Journey
00:47 AI Engineering: The Rise and Impact
03:15 Latent Space Live and AI Conferences
09:44 The Competitive AI Landscape
21:45 Synthetic Data and Future Trends
35:53 Creative Writing with AI
36:12 Legal and Ethical Issues in AI
38:18 The Data War: GPU Poor vs. GPU Rich
39:12 The Rise of GPU Ultra Rich
40:47 Emerging Trends in AI Models
45:31 The Multi-Modality War
01:05:31 The Future of AI Benchmarks
01:13:17 Pionote and Frontier Models
01:13:47 Niche Models and Base Models
01:14:30 State Space Models and RWKV
01:15:48 Inference Race and Price Wars
01:22:16 Major AI Themes of the Year
01:22:48 AI Rewind: January to March
01:26:42 AI Rewind: April to June
01:33:12 AI Rewind: July to September
01:34:59 AI Rewind: October to December
01:39:53 Year-End Reflections and Predictions
Transcript
[00:00:00] Welcome to the 100th Episode!
[00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx for the 100th time today.
[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.
[00:00:19] Alessio: Yeah, I know.
[00:00:19] Reflecting on the Journey
[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round when we first started that we didn't like, and we tried to change the question. The answer
[00:00:32] swyx: was cursor and perplexity.
[00:00:34] Alessio: Yeah, I love Midjourney. It's like, do you really not like anything else?
[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had, like, Tri Dao, we had, you know, Jeremy Howard, we had more folks like that.
[00:00:47] AI Engineering: The Rise and Impact
[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.
[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your Rise of the AI Engineer post just kind of got people to congregate, and then the AI Engineer Summit.
[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.
[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.
[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big. So I think it's like kind of arrived as a meaningful and useful definition.
[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a...
Next Episode

Beating Google at Search with Neural PageRank and $5M of H200s — with Will Bryk of Exa.ai
Applications close Monday for the NYC AI Engineer Summit focusing on AI Leadership and Agent Engineering! If you applied, invites should be rolling out shortly.
The search landscape is experiencing a fundamental shift. Google built a >$2T company with the “10 blue links” experience, driven by PageRank as the core innovation for ranking. This was a big improvement over the previous directory-based experiences of AltaVista and Yahoo. Three decades later, Google is now stuck in this links-based experience, especially from a business model perspective.
This legacy architecture creates fundamental constraints:
Must return results in ~400 milliseconds
Required to maintain comprehensive web coverage
Tied to keyword-based matching algorithms
Cost structures optimized for traditional indexing
As we move from the era of links to the era of answers, the way search works is changing. You’re no longer showing a user links; the goal is to provide context to an LLM. This means moving from keyword-based search to a more semantic understanding of the content:
The link prediction objective can be seen as like a neural PageRank because what you're doing is you're predicting the links people share... but it's more powerful than PageRank. It's strictly more powerful because people might refer to that Paul Graham fundraising essay in like a thousand different ways. And so our model learns all the different ways.
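Exa hasn’t published its exact training recipe, but the “neural PageRank” idea described above roughly maps onto a contrastive link-prediction objective: embed the text surrounding a shared link and the linked document, and train so that true (context, document) pairs score higher than in-batch negatives. A hypothetical PyTorch-style sketch, with all names and shapes as stand-ins rather than Exa’s actual code:

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(context_emb: torch.Tensor,
                         doc_emb: torch.Tensor,
                         temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss: each context (text around a shared link) should match
    its linked document better than any other document in the batch."""
    context_emb = F.normalize(context_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    logits = context_emb @ doc_emb.T / temperature          # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# Stand-ins for encoder outputs over (link context, linked page) pairs.
ctx = torch.randn(8, 768, requires_grad=True)   # encoder(context_text)
doc = torch.randn(8, 768, requires_grad=True)   # encoder(document_text)
loss = link_prediction_loss(ctx, doc)
loss.backward()  # in training, this would update the shared text encoder
```

The intuition matches the quote: because people refer to the same essay in a thousand different ways, the encoder sees far more signal per document than a raw link count would provide.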
All of this is now powered by a $5M cluster with 144 H200s:
This architectural choice enables entirely new search capabilities:
Comprehensive result sets instead of approximations
Deep semantic understanding of queries
Ability to process complex, natural language requests
As search becomes more complex, time to results becomes a variable:
People think of searches as like, oh, it takes 500 milliseconds because we've been conditioned... But what if searches can take like a minute or 10 minutes or a whole day, what can you then do?
Unlike traditional search engines' fixed-cost indexing, Exa employs a hybrid approach:
Front-loaded compute for indexing and embeddings
Variable inference costs based on query complexity
Mix of owned infrastructure ($5M H200 cluster) and cloud resources
Exa sees a lot of competition from products like Perplexity and ChatGPT Search, which layer AI on top of traditional search backends, but Exa is betting that true innovation requires rethinking search from the ground up. For example, they recently launched Websets, a way to turn searches into structured output in grid format, allowing you to create lists and databases out of web pages. The company raised a $17M Series A to build towards this mission, so keep an eye out for them in 2025.
Chapters
00:00:00 Introductions
00:01:12 ExaAI's initial pitch and concept
00:02:33 Will's background at SpaceX and Zoox
00:03:45 Evolution of ExaAI (formerly Metaphor Systems)
00:05:38 Exa's link prediction technology
00:09:20 Meaning of the name "Exa"
00:10:36 ExaAI's new product launch and capabilities
00:13:33 Compute budgets and variable compute products
00:14:43 Websets as a B2B offering
00:19:28 How do you build a search engine?
00:22:43 What is Neural PageRank?
00:27:58 Exa use cases
00:35:00 Auto-prompting
00:38:42 Building agentic search
00:44:19 Is o1 on the path to AGI?
00:49:59 Company culture and nap pods
00:54:52 Economics of AI search and the future of search technology
Full YouTube Transcript
Please like and subscribe!
Show Notes
Transcript
Alessio [00:00:00]: Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of