
Commoditizing the Petaflop — with George Hotz of the tiny corp
06/20/23 • 72 min
We are now launching our new dedicated YouTube and Twitter accounts! Any help amplifying our podcast would be greatly appreciated, and of course, tell your friends!
Notable follow-on discussions were collected on Twitter, in three Reddit threads, and in two HN threads. Please don’t obsess too much over the GPT-4 discussion, as it is mostly rumor; we spent much more time on tinybox/tinygrad, on which George is the foremost authority!
We are excited to share the world’s first interview with George Hotz on the tiny corp!
If you don’t know George, he was the first person to unlock the iPhone and jailbreak the PS3; he went on to start Comma.ai, and briefly “interned” at the Elon Musk-run Twitter.
The tiny corp is the company behind the deep learning framework tinygrad, as well as the recently announced tinybox, a new $15,000 “luxury AI computer” aimed at local model training and inference, aka your “personal compute cluster”:
738 FP16 TFLOPS
144 GB GPU RAM
5.76 TB/s RAM bandwidth
30 GB/s model load bandwidth (a big LLaMA loads in around 4 seconds; see the back-of-envelope check after this list)
AMD EPYC CPU
1600W (one 120V outlet)
Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)
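Those numbers hang together on a back-of-envelope basis: a 65B-parameter model in FP16 is about 130 GB, which fits in the 144 GB of GPU RAM, loads in roughly 130 / 30 ≈ 4.3 seconds, and, since single-stream token generation is roughly memory-bandwidth bound, tops out around 5.76 TB/s ÷ 130 GB ≈ 44 tokens per second. A sketch of the arithmetic (our own sanity check, not an official tinycorp benchmark):

```python
# Back-of-envelope check of the tinybox spec sheet above
# (our own arithmetic, not official tinycorp numbers).

params = 65e9            # 65B-parameter LLaMA
bytes_per_param = 2      # FP16

model_gb = params * bytes_per_param / 1e9
print(f"model size:   {model_gb:.0f} GB (fits in 144 GB of GPU RAM)")   # ~130 GB

load_bw_gbs = 30         # 30 GB/s model load bandwidth
print(f"load time:    {model_gb / load_bw_gbs:.1f} s")                  # ~4.3 s

mem_bw_gbs = 5760        # 5.76 TB/s RAM bandwidth
# Single-stream decoding reads every weight once per token, so it is
# roughly memory-bandwidth bound:
print(f"decode limit: ~{mem_bw_gbs / model_gb:.0f} tokens/s")           # ~44 tok/s
```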
(In the episode, we also talked about the future of the tinybox as the intelligence center of every home that will help run models, at-home robots, and more. Make sure to check the timestamps 👀 )
The tiny corp manifesto
There are three main theses to the tiny corp:
If XLA/PrimTorch are CISC, tinygrad is RISC: CISC (Complex Instruction Set Computing) architectures have large instruction sets in which a single instruction can perform many low-level operations; RISC (Reduced Instruction Set Computing) architectures have small instruction sets in which each instruction performs one low-level operation, keeping the hardware simple and each instruction easy to optimize. If you’ve used an Apple Silicon M1/M2 Mac, a Raspberry Pi, or almost any smartphone, you’ve used a RISC (ARM) computer. Analogously, where XLA and PrimTorch expose far larger operator sets, tinygrad lowers everything to a small, fixed set of primitive ops (see the sketch after this list).
If you can’t write a fast ML framework for GPU, you can’t write one for your own chip: there are many “AI chip” companies out there, and they all started by taping out a chip. Some, like Cerebras, are still building; others, like Graphcore, seem to be struggling. But building chips with higher TFLOPS isn’t enough: “There’s a great chip already on the market. For $999, you get a 123 TFLOP card with 24 GB of 960 GB/s RAM. This is the best FLOPS per dollar today, and yet...nobody in ML uses it,” referring to the AMD RX 7900 XTX. NVIDIA’s lead comes not only from high-performing cards but also from a great developer platform in CUDA. Starting with chip development rather than the dev toolkit is far more capital-intensive, so the tiny corp is starting by writing a framework for off-the-shelf hardware rather than taping out its own chip.
Turing completeness considered harmful: once you call into Turing-complete kernels, you can no longer reason about them statically; in general you have to run a Turing-complete program to know what it does. Because tinygrad’s primitive ops have no branches or data-dependent loops, the whole compute graph can be inspected, fused, and optimized ahead of time (a toy illustration follows the sketch below).
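To make the RISC analogy concrete, here is a toy sketch of how a composite “CISC” operator like softmax decomposes into a handful of primitive element-wise and reduce ops. This is our illustration: the op names and NumPy backing are made up, and tinygrad’s real primitives differ.

```python
import numpy as np

# Toy "RISC-style" primitives in the spirit of tinygrad's small op set.
# (Illustrative only -- tinygrad's real primitive ops and names differ.)
def EXP(x):        return np.exp(x)
def SUB(x, y):     return x - y
def DIV(x, y):     return x / y
def REDUCE_MAX(x): return x.max(axis=-1, keepdims=True)
def REDUCE_SUM(x): return x.sum(axis=-1, keepdims=True)

def softmax(x):
    """One 'CISC' op expressed as five 'RISC' primitives."""
    x = SUB(x, REDUCE_MAX(x))   # subtract the max for numerical stability
    e = EXP(x)
    return DIV(e, REDUCE_SUM(e))

print(softmax(np.array([[1.0, 2.0, 3.0]])))  # ~[[0.09, 0.24, 0.67]]
```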
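The same toy setup shows why giving up Turing completeness pays off: a graph of such primitives has no control flow, so its exact cost can be computed by inspection, without running anything, and that is what makes searching over schedules and fusing kernels tractable. Again, this is an illustrative sketch, not tinygrad’s actual API.

```python
# Toy static cost model over an op graph (our illustration, not tinygrad's API).
# Because the graph has no branches or unbounded loops, total cost is a plain
# sum over nodes -- no execution needed.
SOFTMAX_GRAPH = [
    ("REDUCE_MAX", 1),  # (op name, FLOPs per element)
    ("SUB",        1),
    ("EXP",        1),
    ("REDUCE_SUM", 1),
    ("DIV",        1),
]

def static_flops(graph, num_elements):
    """Exact FLOP count, computed by inspecting the graph alone."""
    return sum(cost * num_elements for _, cost in graph)

print(static_flops(SOFTMAX_GRAPH, num_elements=4096))  # 20480, known statically
```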
Previous Episode

Emergency Pod: OpenAI's new Functions API, 75% Price Drop, 4x Context Length (w/ Alex Volkov, Simon Willison, Riley Goodside, Joshua Lochner, Stefania Druga, Eric Elliott, Mayo Oshin et al)
Full Transcript and show notes: https://www.latent.space/p/function-agents?sd=pf
Timestamps:
[00:00:00] Intro
[00:01:47] Recapping June 2023 Updates
[00:06:24] Known Issues with Long Context
[00:08:00] New Functions API
[00:10:45] Riley Goodside
[00:12:28] Simon Willison
[00:14:30] Eric Elliott
[00:16:05] Functions API and Agents
[00:18:25] Functions API vs Google Vertex JSON
[00:21:32] From English back to Code
[00:26:14] Embedding Price Drop and Pinecone Perspective
[00:30:39] Xenova and Huggingface Perspective
[00:34:23] Function Selection
[00:39:58] Designing Code Agents with Function API
[00:42:16] Models as Routers
[00:46:48] Prompt Engineering replaced by Finetuning
[00:52:15] The 2 Code x LLM Paradigms
[00:56:30] Smol Models for the future
[00:58:54] The Evolution of the GPT API
[01:03:27] Functions API Security vs Prompt Injection
[01:16:18] GPT Model Upgrades
[01:17:36] JSONformer
[01:21:03] Closing Comments - What We Want Next
Get full access to Latent.Space at www.latent.space/subscribe
Next Episode
[Cognitive Revolution] The Tiny Model Revolution with Ronen Eldan and Yuanzhi Li of Microsoft Research
Thanks to the over one million people who have checked out the Rise of the AI Engineer. It’s a long July 4 weekend in the US, and we’re celebrating with a podcast feed swap!
We’ve been big fans of Nathan Labenz and Erik Torenberg’s work at the Cognitive Revolution podcast for a while. It started around the same time we did and has done an incredible job of hosting discussions with top researchers and thinkers in the field, across a wide range of topics: computer vision (a special focus thanks to Nathan’s work at Waymark), GPT-4 (with exceptional insight due to Nathan’s time on the GPT-4 “red team”), healthcare/medicine/biotech (Harvard Medical School, Med-PaLM, Tanishq Abraham, Neal Khosla), investing and tech strategy (Sarah Guo, Elad Gil, Emad Mostaque, Sam Lessin), safety and policy, curators and influencers, and exceptional AI founders (Josh Browder, Eugenia Kuyda, Flo Crivello, Suhail Doshi, Jungwon Byun, Raza Habib, Mahmoud Felfel, Andrew Feldman, Matt Welsh, Anton Troynikov, Aravind Srinivas).
If Latent Space is for AI Engineers, then Cognitive Revolution covers the much broader field of ...