
Tech on the Rocks
Kostas, Nitay
Top 10 Tech on the Rocks Episodes
Goodpods has curated a list of the 10 best Tech on the Rocks episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to Tech on the Rocks for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite Tech on the Rocks episode by adding your comments to the episode page.

09/13/24 • 62 min
In this episode, Kostas and Nitay are joined by Amey Chaugule and Matt Green, co-founders of Denormalized. They delve into how Denormalized is building an embedded stream processing engine—think “DuckDB for streaming”—to simplify real-time data workloads. Drawing from their extensive backgrounds at companies like Uber, Lyft, Stripe, and Coinbase, Amey and Matt discuss the challenges of existing stream processing systems like Spark, Flink, and Kafka. They explain how their approach leverages Apache DataFusion to create a single-node solution that reduces the complexities inherent in distributed systems.
The conversation explores topics such as developer experience, fault tolerance, state management, and the future of stream processing interfaces. Whether you’re a data engineer, application developer, or simply interested in the evolution of real-time data infrastructure, this episode offers valuable insights into making stream processing more accessible and efficient.
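To make the single-node idea concrete, here is a minimal sketch in plain Python of what an embedded, in-process stream aggregation looks like. It is illustrative only; Denormalized's real engine builds on Apache DataFusion and exposes a richer API.

```python
# Minimal sketch of an "embedded" stream aggregation: everything runs
# in-process, with state in local memory, with no cluster, brokers, or
# coordinators to operate. Watermarks and fault tolerance are omitted.
from collections import defaultdict

def tumbling_window_counts(events, window_secs=10):
    """Yield (window_start, key, count) as each window closes.

    `events` is an iterable of (timestamp, key) pairs, assumed here to
    arrive in timestamp order for simplicity.
    """
    counts = defaultdict(int)   # in-process operator state
    window_start = None
    for ts, key in events:
        start = ts - (ts % window_secs)
        if window_start is None:
            window_start = start
        if start != window_start:            # window closed: emit and reset
            for k, c in counts.items():
                yield (window_start, k, c)
            counts.clear()
            window_start = start
        counts[key] += 1
    for k, c in counts.items():              # flush the final window
        yield (window_start, k, c)

# Usage: page views counted per 10-second window
events = [(1, "home"), (3, "home"), (12, "cart"), (14, "home")]
for row in tumbling_window_counts(events):
    print(row)   # (0, 'home', 2), (10, 'cart', 1), (10, 'home', 1)
```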
Contacts & Links
Amey Chaugule
Matt Green
Denormalized
Denormalized Github Repo
Chapters
00:00 Introduction and Background
12:03 Building an Embedded Stream Processing Engine
18:39 The Need for Stream Processing in the Current Landscape
22:45 Interfaces for Interacting with Stream Processing Systems
26:58 The Target Persona for Stream Processing Systems
31:23 Simplifying Stream Processing Workloads and State Management
34:50 State and Buffer Management
37:03 Distributed Computing vs. Single-Node Systems
42:28 Cost Savings with Single-Node Systems
47:04 The Power and Extensibility of DataFusion
55:26 Integrating Data Store with DataFusion
57:02 The Future of Streaming Systems
01:00:18 Outro

Proving Code Correctness: FizzBee and the Future of Formal Methods in Software Design with FizzBee's creator JP
Tech on the Rocks
10/08/24 • 61 min
In this episode, we chat with JP, creator of FizzBee, about formal methods and their application in software engineering. We explore the differences between coding and engineering, discussing how formal methods can improve system design and reliability. JP shares insights from his time at Google and explains why tools like FizzBee are crucial for distributed systems. We delve into the challenges of adopting formal methods in industry, the potential of FizzBee to make these techniques more accessible, and how it compares to other tools like TLA+. Finally, we discuss the future of software development, including the role of LLMs in code generation and the ongoing importance of human engineers in system design.
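For readers new to formal methods, the toy below shows the core move model checkers such as FizzBee and TLA+ make at much greater scale: exhaustively enumerate the interleavings of a concurrent system and check an invariant in every reachable state. The spec is plain Python, not FizzBee's actual language.

```python
# A toy exhaustive model checker: enumerate every interleaving of two
# processes doing a non-atomic read-then-write increment, and check the
# invariant that both increments take effect.
from itertools import permutations

def run(schedule):
    """Execute one interleaving and return the final counter value."""
    counter = 0
    local = {}
    for pid, step in schedule:
        if step == "read":
            local[pid] = counter       # read shared state into a register
        else:
            counter = local[pid] + 1   # write back a possibly stale value
    return counter

# p0's steps are indices 0-1, p1's are indices 2-3.
steps = [(0, "read"), (0, "write"), (1, "read"), (1, "write")]
violations = set()
for order in permutations(range(4)):
    # Keep each process's own steps in program order.
    if order.index(0) < order.index(1) and order.index(2) < order.index(3):
        result = run([steps[i] for i in order])
        if result != 2:                # invariant: both increments land
            violations.add(result)

print(violations)  # {1}: the classic lost update, found by exhaustion
```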
Links
FizzBee
FizzBee Github Repo
FizzBee Blog
Chapters
00:00 Introduction and Overview
02:42 JP's Experience at Google and the Growth of the Company
04:51 The Difference Between Engineers and Coders
06:41 The Importance of Rigor and Quality in Engineering
10:08 The Limitations of QA and the Need for Formal Methods
14:00 The Role of Best Practices in Software Engineering
14:56 Design Specification Languages for System Correctness
21:43 The Applicability of Formal Methods in Distributed Systems
31:20 Getting Started with FizzBee: A Practical Example
36:06 Common Assumptions and Misconceptions in Distributed Systems
43:23 The Role of FizzBee in the Design Phase
48:04 The Future of FizzBee: LLMs and Code Generation
58:20 Getting Started with FizzBee: Tutorials and Online Playground

09/27/24 • 53 min
In this episode, we chat with Dean Pleban, CEO of DagsHub, about machine learning operations. We explore the differences between DevOps and MLOps, focusing on data management and experiment tracking. Dean shares insights on versioning various components in ML projects and discusses the importance of user experience in MLOps tools. We also touch on DagsHub's integration of AI in their product and Dean's vision for the future of AI and machine learning in industry.
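As a concrete illustration of the versioning theme, here is a hedged sketch of logging an experiment so its results stay pinned to exact code, data, and parameters. The helper is hypothetical, not DagsHub's API; DagsHub builds on tools like Git, DVC, and MLflow for the real thing.

```python
# Sketch of "version every component": an ML experiment is reproducible
# only if code, data, and parameters are all pinned, so record a
# fingerprint of each alongside the metrics.
import hashlib
import json

def log_experiment(code_rev, data_blobs, params, metrics):
    """Return one immutable record tying results to their exact inputs."""
    h = hashlib.sha256()
    for name in sorted(data_blobs):        # order-independent data hash
        h.update(name.encode())
        h.update(data_blobs[name])
    return {
        "code": code_rev,                  # e.g. a git commit SHA
        "data": h.hexdigest(),
        "params": json.dumps(params, sort_keys=True),
        "metrics": metrics,
    }

# Usage with hypothetical inputs:
run = log_experiment(
    code_rev="9f2c1e7",
    data_blobs={"train.csv": b"x,y\n1,2\n"},
    params={"lr": 0.01, "epochs": 10},
    metrics={"accuracy": 0.91},
)
print(run["data"][:12])   # same code+data+params => same fingerprint
```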
Links
DagsHub
The MLOps Podcast
Dean on LinkedIn
Chapters
00:00 Introduction and Background
03:03 Challenges of Managing Machine Learning Projects
10:00 The Concept of Experiments in Machine Learning
12:51 Data Curation and Validation for High-Quality Data
27:07 Connecting the Components of Machine Learning Projects with DagsHub
29:12 The Importance of Data and Clear Interfaces
43:29 Incorporating Machine Learning into DagsHub
51:27 The Future of ML and AI

Unifying structured and unstructured data for AI: Rethinking ML infrastructure with Nikhil Simha and Varant Zanoyan
Tech on the Rocks
08/30/24 • 61 min
In this episode, we dive deep into the future of data infrastructure for AI and ML with Nikhil Simha and Varant Zanoyan, two seasoned engineers from Airbnb and Facebook. Nikhil and Varant share their journey from building real-time data systems and ML infrastructure at tech giants to launching their own venture.
The conversation explores the intricacies of designing developer-friendly APIs, the complexities of handling both batch and streaming data, and the delicate balance between customer needs and product vision in a startup environment.
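One way to picture the batch-plus-streaming API challenge: define a feature once, then compute it both as an offline backfill and as an online update. The sketch below is a hypothetical illustration of that idea, not Chronon's real API.

```python
# Sketch of "define a feature once, compute it in batch and streaming."
# The CountFeature class and its methods are invented for illustration.
from collections import defaultdict

class CountFeature:
    """Count of events per key, usable in two modes."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.state = defaultdict(int)

    def backfill(self, historical_events):
        """Batch mode: fold the full history, e.g. to build training data."""
        for e in historical_events:
            self.state[self.key_fn(e)] += 1
        return dict(self.state)

    def update(self, event):
        """Streaming mode: apply one event, keeping state fresh for serving."""
        k = self.key_fn(event)
        self.state[k] += 1
        return k, self.state[k]

# The same definition serves both the offline and the online path:
feature = CountFeature(key_fn=lambda e: e["user"])
print(feature.backfill([{"user": "a"}, {"user": "a"}, {"user": "b"}]))
print(feature.update({"user": "a"}))   # ('a', 3)
```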
Contacts & Links
Nikhil Simha
Varant Zanoyan
Chronon project
Chapters
00:00 Introduction and Past Experiences
04:38 The Challenges of Building Data Infrastructure for Machine Learning
08:01 Merging Real-Time Data Processing with Machine Learning
14:08 Backfilling New Features in Data Infrastructure
20:57 Defining Failure in Data Infrastructure
26:45 The Choice Between SQL and Data Frame APIs
34:31 The Vision for Future Improvements
38:17 Introduction to Chronon and Open Source
43:29 The Future of Chronon: New Computation Paradigms
48:38 Balancing Customer Needs and Vision
57:21 Engaging with Customers and the Open Source Community
01:01:26 Potential Use Cases and Future Directions

Stream processing, LSMs and leaky abstractions with Chris Riccomini
Tech on the Rocks
08/23/24 • 53 min
In this episode, we chat with Chris Riccomini about the evolution of stream processing and the challenges of building applications on streaming systems. We also chat about leaky abstractions, good and bad API designs, what Chris loves and hates about Rust, and finally about his exciting new project that involves object storage and LSMs.
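For context on that last topic, here is a toy sketch of the log-structured merge (LSM) pattern: buffer writes in a memtable, flush immutable sorted runs (in SlateDB's case, objects on S3), and serve reads newest-first. It is a conceptual sketch, not SlateDB's code.

```python
# Toy LSM: compaction, the write-ahead log, and bloom filters are all
# omitted. Each flushed run stands in for an immutable object that a
# system like SlateDB would write to object storage.

class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.sorted_runs = []            # newest last; each is a sorted snapshot

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: persist an immutable sorted run, start a fresh memtable.
            run = dict(sorted(self.memtable.items()))
            self.sorted_runs.append(run)
            self.memtable = {}

    def get(self, key):
        # Check newest data first: memtable, then runs newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sorted_runs):
            if key in run:
                return run[key]
        return None

db = ToyLSM()
db.put("a", 1); db.put("b", 2)    # second put triggers a flush
db.put("a", 3)                    # newer value shadows the flushed one
print(db.get("a"), db.get("b"))   # 3 2
```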
Connect with Chris at:
LinkedIn
X
Blog
Materialized View Newsletter - His newsletter
The Missing README - His book
SlateDB - His latest OSS Project
Chapters
00:00 Introduction and Background
04:05 The State of Stream Processing Today
08:53 The Limitations of SQL in Streaming Systems
14:00 Prioritizing the Developer Experience in Stream Processing
18:15 Improving the Usability of Streaming Systems
27:54 The Potential of State Machine Programming in Complex Systems
32:41 The Power of Rust: Compiling and Language Bindings
34:06 The Shift from Sidecar to Embedded Libraries Driven by Rust
35:49 Building an LSM on Object Storage: Cost-Effective State Management
39:47 The Unbundling and Composable Nature of Databases
47:30 The Future of Data Systems: More Companies and Focus on Metadata

Semantic Layers: The Missing Link Between AI and Data with David Jayatillake from Cube
Tech on the Rocks
02/20/25 • 59 min
In this episode, we chat with David Jayatillake, VP of AI at Cube, about semantic layers and their crucial role in making AI work reliably with data.
We explore how semantic layers act as a bridge between raw data and business meaning, and why they're more practical than pure knowledge graphs.
David shares insights from his experience at Delphi Labs, where they achieved 100% accuracy in natural language data queries by combining semantic layers with AI, compared to just 16% accuracy with direct text-to-SQL approaches.
We discuss the challenges of building and maintaining semantic layers, the importance of proper naming and documentation, and how AI can help automate their creation.
Finally, we explore the future of semantic layers in the context of AI agents and enterprise data systems, and learn about Cube's upcoming AI-powered features for 2025.
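To see why the semantic-layer approach constrains the model so effectively, consider this hedged sketch: the LLM only picks named measures and dimensions, and the layer, not the model, emits the SQL. The semantic model and compile step below are invented for illustration and are not Cube's actual API.

```python
# Sketch of routing an LLM through a semantic layer instead of raw
# text-to-SQL: the model chooses from a closed vocabulary of named
# measures and dimensions, and SQL generation stays deterministic.

SEMANTIC_MODEL = {
    "measures":   {"revenue": "SUM(orders.amount)"},
    "dimensions": {"month": "DATE_TRUNC('month', orders.created_at)"},
    "table": "orders",
}

def compile_query(measure, dimension):
    """The layer owns SQL generation, so output is valid by construction."""
    if measure not in SEMANTIC_MODEL["measures"]:
        raise ValueError(f"unknown measure: {measure}")    # caught before the DB
    if dimension not in SEMANTIC_MODEL["dimensions"]:
        raise ValueError(f"unknown dimension: {dimension}")
    m = SEMANTIC_MODEL["measures"][measure]
    d = SEMANTIC_MODEL["dimensions"][dimension]
    return (f"SELECT {d} AS {dimension}, {m} AS {measure} "
            f"FROM {SEMANTIC_MODEL['table']} GROUP BY 1")

# The LLM's job shrinks to picking names from the model; for the
# question "how much did we make each month?" it might return:
llm_choice = {"measure": "revenue", "dimension": "month"}
print(compile_query(**llm_choice))
```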
Chapters
00:00 Introduction to AI and Semantic Layers
05:09 The Evolution of Semantic Layers Before and After AI
09:48 Challenges in Implementing Semantic Layers
14:11 The Role of Semantic Layers in Data Access
18:59 The Future of Semantic Layers with AI
23:25 Comparing Text to SQL and Semantic Layer Approaches
27:40 Limitations and Constraints of Semantic Layers
30:08 Understanding LLMs and Semantic Errors
35:03 The Importance of Naming in Semantic Layers
37:07 Debugging Semantic Issues in LLMs
38:07 The Future of LLMs as Agents
41:53 Discovering Services for LLM Agents
50:34 What's Next for Cube and AI Integration

03/06/25 • 58 min
Summary
In this episode, Apurva Mehta, co-founder and CEO of Responsive, recounts his extensive journey in stream processing—from his early work at LinkedIn and Confluent to his current venture at Responsive.
He explains how stream processing evolved from simple event ingestion and graph indexing to powering complex, stateful applications such as search indexing, inventory management, and trade settlement.
Apurva clarifies the often-misunderstood concept of “real time,” arguing that for many applications low latency (often in the one- to two-second range) is a more accurate expectation than the instantaneous response people often assume. He delves into the challenges of state management, discussing the limitations of embedded state stores like RocksDB and traditional databases (e.g., Postgres) when faced with high update rates and complex transactional requirements.
The conversation also covers the trade-offs between SQL-based streaming interfaces and more flexible APIs, and how Responsive is innovating by decoupling state from compute—leveraging remote state solutions built on object stores (like S3) with specialized systems such as SlateDB—to improve elasticity, cost efficiency, and operational simplicity in mission-critical applications.
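The decoupling idea can be pictured as a state interface whose backend is swappable: embedded state dies with its worker, while remote state survives restarts and rescaling. The classes below are an illustrative sketch, not Responsive's API.

```python
# Sketch of "decouple state from compute": a stateful operator talks to
# a state interface, and the backend can be an embedded local store
# (RocksDB-style) or a remote one backed by object storage. With remote
# state, a restarted or rescaled worker reconnects instead of rebuilding
# a local store from a changelog.

class EmbeddedState:
    """State lives with the worker; lost on reassignment, must be rebuilt."""
    def __init__(self):
        self._kv = {}
    def put(self, k, v): self._kv[k] = v
    def get(self, k): return self._kv.get(k)

class RemoteState:
    """State lives in a shared remote store keyed by task; survives workers."""
    _shared_backend = {}          # stand-in for an object-store-backed DB
    def __init__(self, task_id):
        self._kv = RemoteState._shared_backend.setdefault(task_id, {})
    def put(self, k, v): self._kv[k] = v
    def get(self, k): return self._kv.get(k)

def process(event, state):
    """A stateful operator is indifferent to where its state lives."""
    count = (state.get(event["key"]) or 0) + 1
    state.put(event["key"], count)
    return count

worker1 = RemoteState(task_id="task-7")
process({"key": "order-42"}, worker1)
worker2 = RemoteState(task_id="task-7")        # "new" worker after a rescale
print(process({"key": "order-42"}, worker2))   # 2: state survived the move
```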
Chapters
00:00 Introduction to Apurva Mehta and Streaming Background
08:50 Defining Real-Time in Streaming Contexts
14:18 Challenges of Stateful Stream Processing
19:50 Comparing Streaming Processing with Traditional Databases
26:38 Product Perspectives on Streaming vs Analytical Systems
31:10 Operational Rigor and Business Opportunities
38:31 Developers' Needs: Beyond SQL
45:53 Simplifying Infrastructure: The Cost of Complexity
51:03 The Future of Streaming Applications

10/22/24 • 61 min
From GPU computing pioneer to Kubernetes architect, Brian Grant takes us on a fascinating journey through his career at the forefront of systems engineering. In this episode, we explore his early work on GPU compilers in the pre-CUDA era, where he tackled unique challenges in high-performance computing when graphics cards weren't yet designed for general computation. Brian then shares insights from his time at Google, where he helped develop Borg and later became the original lead architect of Kubernetes. He explains key architectural decisions that shaped Kubernetes, from its extensible resource model to its approach to service discovery, and why they chose to create a rich set of abstractions rather than a minimal interface. The conversation concludes with Brian's thoughts on standardization challenges in cloud infrastructure and his vision for moving beyond infrastructure as code, offering valuable perspective on both the history and future of distributed systems.
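The declarative model Brian describes can be reduced to a toy reconciliation loop: users state desired resources, and a controller diffs desired against actual state and acts on the difference. This is a conceptual sketch only; real Kubernetes controllers watch an API server and handle far more.

```python
# Toy reconciliation loop: drive actual state toward desired state.

def reconcile(desired, actual):
    """One controller pass: diff desired vs. actual replica counts."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    for name in actual.keys() - desired.keys():
        actions.append(("stop", name, actual[name]))   # garbage-collect
    return actions

desired = {"web": 3, "worker": 2}
actual  = {"web": 1, "cron": 1}
print(reconcile(desired, actual))
# [('start', 'web', 2), ('start', 'worker', 2), ('stop', 'cron', 1)]
```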
Links:
Brian Grant on LinkedIn
Chapters
00:00 Introduction and Background
03:11 Early Work in High-Performance Computing
06:21 Challenges of Building Compilers for GPUs
13:14 Influential Innovations in Compilers
31:46 The Future of Compilers
33:11 The Rise of Niche Programming Languages
34:01 The Evolution of Google's Borg and Kubernetes
39:06 Challenges of Managing Applications in a Dynamically Scheduled Environment
48:12 The Need for Standardization in Application Interfaces and Management Systems
01:00:55 Driving Network Effects and Creating Cohesive Ecosystems

Optimizing SQL with LLMs: Building Verified AI Systems at Espresso AI with Ben Lerner
Tech on the Rocks
01/03/25 • 66 min
In this episode, we chat with Ben, founder of Espresso AI, about his journey from building Excel Python integrations to optimizing data warehouse compute costs.
We explore his experience at companies like Uber and Google, where he worked on everything from distributed systems to ML and storage infrastructure.
We learn about the evolution of his latest venture, which started as a C++ compiler optimization project and transformed into a system for optimizing Snowflake workloads using ML.
Ben shares insights about applying LLMs to SQL optimization, the challenges of verified code transformation, and the importance of formal verification in ML systems. Finally, we discuss his practical approach to choosing ML models and the critical lesson he learned about talking to users before building products.
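The verified-transformation idea follows a propose-then-verify pattern: a model may suggest any rewrite, but only rewrites that pass an equivalence check are applied. In the sketch below the verifier simply compares results on a sample SQLite database, a stand-in for the formal verification discussed in the episode, and the candidate rewrite stands in for a hypothetical model call.

```python
# Propose-then-verify: never trust a model-suggested rewrite until an
# equivalence check passes; fall back to the original otherwise.
import sqlite3

def verified_rewrite(conn, original_sql, candidate_sql):
    """Accept the candidate only if it agrees with the original here."""
    baseline = conn.execute(original_sql).fetchall()
    rewritten = conn.execute(candidate_sql).fetchall()
    return candidate_sql if sorted(rewritten) == sorted(baseline) else original_sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

original = "SELECT x FROM t WHERE x + 0 > 1"
candidate = "SELECT x FROM t WHERE x > 1"           # what a model might propose
print(verified_rewrite(conn, original, candidate))  # accepted: same results
```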
Chapters
00:00 Ben's Journey: From Startups to Big Tech
13:00 The Importance of Timing in Entrepreneurship
19:22 Consulting Insights: Learning from Clients
23:32 Transitioning to Big Tech: Experiences at Uber and Google
30:58 The Future of AI: End-to-End Systems and Data Utilization
35:53 Transitioning Between Domains: From ML to Distributed Systems
44:24 Espresso's Mission: Optimizing SQL with ML
51:26 The Future of Code Optimization and AI

Evolving Data Infrastructure for the AI Era: AWS, Meta, and Beyond with Roy Ben-Alta
Tech on the Rocks
11/21/24 • 63 min
In this episode, we chat with Roy Ben-Alta, co-founder of Oakminer AI and former director at Meta AI Research, about his fascinating journey through the evolution of data infrastructure and AI. We explore his early days at AWS when cloud adoption was still controversial, his experience building large language models at Meta, and the challenges of training and deploying AI systems at scale. Roy shares valuable insights about the future of data warehouses, the emergence of knowledge-centric systems, and the critical role of data engineering in AI. We'll also hear his practical advice on building AI companies today, including thoughts on model evaluation frameworks, vendor lock-in, and the eternal "build vs. buy" decision. Drawing from his extensive experience across Amazon, Meta, and now as a founder, Roy offers a unique perspective on how AI is transforming traditional data infrastructure and what it means for the future of enterprise software.
Chapters
00:00 Introduction to Roy Ben-Alta and AI Background
04:07 Warren Buffett Experience and MBA Insights
06:45 Lessons from Amazon and Meta Leadership
09:15 Early Days of AWS and Cloud Adoption
12:12 Redshift vs. Snowflake: A Data Warehouse Perspective
14:49 Navigating Complex Data Systems in Organizations
31:21 The Future of Personalized Software Solutions
32:19 Building Large Language Models at Meta
39:27 Evolution of Data Platforms and Infrastructure
50:50 Engineering Knowledge and LLMs
58:27 Build vs. Buy: Strategic Decisions for Startups
FAQ
How many episodes does Tech on the Rocks have?
Tech on the Rocks currently has 16 episodes available.
What topics does Tech on the Rocks cover?
The podcast is about Infrastructure, Cloud, Data, Podcasts, and Technology.
What is the most popular episode on Tech on the Rocks?
The most popular episode on Tech on the Rocks is 'Proving Code Correctness: FizzBee and the Future of Formal Methods in Software Design with FizzBee's creator JP'.
What is the average episode length on Tech on the Rocks?
The average episode length on Tech on the Rocks is 61 minutes.
How often are episodes of Tech on the Rocks released?
Episodes of Tech on the Rocks are typically released about every two weeks (on average, every 14 days and 16 hours).
When was the first episode of Tech on the Rocks?
The first episode of Tech on the Rocks was released on Aug 23, 2024.