
AI Inference: Good, Fast, and Cheap, with Lin Qiao & Dmytro Ivchenko of Fireworks AI
04/20/24 • 102 min
In this episode, we delve into the intricate world of AI inference with the cofounders of Fireworks AI. Discover the strategies behind optimizing AI performance, the importance of balancing latency and throughput, and the nuances of different AI architectures, from GPT-3 to Stable Diffusion. Learn about their partnership with Stability AI, their unique focus on reducing total cost of ownership, and their vision for a seamless developer experience.
RECOMMENDED PODCAST:
How Do You Use ChatGPT? with Dan Shipper
Dan Shipper talks to programmers, writers, founders, academics, tech executives, and others to walk through all of their ChatGPT use cases (including Nathan!). They even use ChatGPT together, live on the show. Listen to How Do You Use ChatGPT? from Dan Shipper and the team at Every, wherever you get your podcasts: https://link.chtbl.com/hdyuchatgpt
SPONSORS:
Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/
The Brave search API can be used to assemble a data set to train your AI models and to help with retrieval augmentation at inference time. With affordable, developer-first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave search API for free for up to 2,000 queries per month at https://bit.ly/BraveTCR
Plumb is a no-code AI app builder designed for product teams who care about quality and speed. What takes you weeks to hand-code today can be done confidently in hours. Check out https://bit.ly/PlumbTCR for early access.
Squad offers access to global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist.
CHAPTERS:
(00:00:00) Introduction
(00:08:34) Compute Stack
(00:19:23) Fireworks Product Philosophy
(00:24:11) Sponsors: Brave / Omneky
(00:25:40) Fine-tuning Strategy
(00:38:40) Sponsors: Plumb / Squad
(00:40:33) NVIDIA Stack Overview
(00:47:14) TensorFlow Triton Service
(00:55:25) Reduced Precision Advantages
(01:03:57) Different Deployment Scenarios
(01:08:27) Seeking Intuition on Sharding
(01:19:28) Announcing Stability AI Partnership
(01:32:00) Closing Remarks
Previous Episode

Reading Minds from Shared Latent Space
In this episode, we explore the cutting-edge world of AI-powered brain imaging with the MindEye2 Project. Discover how researchers are reconstructing images from fMRI data, offering unprecedented insights into the human mind. We discuss the implications for neuroscience, clinical diagnosis, and the future of AI-assisted brain decoding.
CHAPTERS:
(00:00) Introduction
(07:24) Evolution of Brain Decoding Tech
(10:33) From MindEye1 to MindEye2
(14:12) Sponsor: Omneky
(14:43) Foundation Models for Enhanced AI
(26:49) Future of Image Reconstruction from Brain Data
(34:10) Sponsors: Brave / Plumb / Squad
(44:23) Evaluating AI-Generated Reconstructions
(48:10) Understanding Semantic Info in the Brain
(51:47) Potential and Ethics of Brain Data Usage
(57:13) Foundation Models in Neuroscience
(01:02:46) Future of Brain-Computer Interfaces
Next Episode

Robotics Research Update, with Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind
Google DeepMind researchers Keerthana Gopalakrishnan and Ted Xiao discuss their latest breakthroughs in AI robotics, including models that enable robots to understand novel objects, learn from human demonstrations, and operate under ethical constraints. The conversation covers six groundbreaking papers that showcase rapid progress towards general-purpose robotics.
CHAPTERS:
(00:00) Introduction
(04:44) The Future of Robotics
(14:40) Sponsors: Brave / Omneky
(16:04) Inputs and Outputs
(24:03) A Leap Towards Generalist Robots
(32:08) Sponsors: Plumb / Squad
(33:52) Learning in Robotics
(41:12) Learning from On-the-Fly Examples
(41:57) Annotating Robot Actions
(50:17) Scaling and Safety
(01:03:05) Learning to Learn Faster
(01:08:43) Zero-Shot Learning
(01:15:15) Future Directions