Streaming Audio: Apache Kafka® & Real-Time Data
Confluent, founded by the original creators of Apache Kafka®
Streaming Audio features all things Apache Kafka®, Confluent, real-time data, and the cloud. We cover frequently asked questions, best practices, and use cases from the Kafka community, from Kafka connectors and distributed systems to data mesh, data integration, and modern data architectures built with Confluent and cloud Kafka as a service. Join our hosts as they stream through a series of interviews, stories, and use cases with guests from the data streaming industry. Apache®, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Top 10 Streaming Audio: Apache Kafka® & Real-Time Data Episodes
Goodpods has curated a list of the 10 best Streaming Audio: Apache Kafka® & Real-Time Data episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to Streaming Audio: Apache Kafka® & Real-Time Data for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite Streaming Audio: Apache Kafka® & Real-Time Data episode by adding your comments to the episode page.
Ask Confluent #8: Guozhang Wang on Kafka Streams Standby Tasks
12/18/18 • 22 min
Gwen is joined in studio by special guest Guozhang Wang, Kafka Streams pioneer and engineering lead at Confluent. He’ll talk to us about standby tasks and how one deserializes message headers. In "Ask Confluent," Gwen Shapira (Data Architect, Confluent) and guests respond to a handful of questions and comments from Twitter, YouTube and elsewhere.
Consistent, Complete Distributed Stream Processing ft. Guozhang Wang
07/22/21 • 29 min
Stream processing has become an important part of the big data landscape as a new programming paradigm to implement real-time data-driven applications. One of the biggest challenges for streaming systems is to provide correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent) contributed to a leadership paper, along with other leaders in the Apache Kafka® community, on consistency and completeness in stream processing in Apache Kafka in order to shed light on what a reimagined, modern infrastructure looks like.
In his white paper, titled Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka, Guozhang covers the following topics:
- Streaming correctness challenges
- Stream processing with Kafka
- Exactly-once in Kafka Streams
For context, accurate, real-time stream processing is a better fit for modern organizations that are composed of vertically separated engineering teams. In the past, stream processing was considered an auxiliary system to batch-oriented systems and often suffered from issues around consistency and completeness. Modern streaming engines such as ksqlDB and Kafka Streams, by contrast, are designed to be authoritative: because they provide strong correctness guarantees, they can serve as the source of truth and are no longer treated as an approximation. There are two major umbrellas of correctness guarantees:
- Consistency: Ensuring unique and extant records, also referred to as exactly-once semantics
- Completeness: Ensuring the correct order of records
Guozhang also explains why he wrote this academic paper: he believes in the importance of sharing knowledge across the community and bringing industry experience back to academia (the paper was also published at SIGMOD 2021, one of the most important conferences in the data management research area). This will help foster the next generation of industry innovation and push the data streaming and data management industry one step forward. In Guozhang's own words, "Academic papers provide you this proof of concept design, which gets groomed into a big system."
EPISODE LINKS
- White Paper: Rethinking Distributed Stream Processing in Apache Kafka
- Blog: Rethinking Distributed Stream Processing in Apache Kafka
- Enabling Exactly-Once in Kafka Streams
- Why Kafka Streams Does Not Use Watermarks ft. Matthias Sax
- Streams and Tables: Two Sides of the Same Coin
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get $60 of free Confluent Cloud usage (details)
Scaling Apache Kafka in Retail with Microservices ft. Matt Simpson from Boden
05/27/20 • 42 min
Apache Kafka® is a powerful toolset for microservice architectures. In this podcast, we’ll cover how Boden, an online retail company that specializes in high-end fashion linked to the royal family, used streaming microservices to modernize their business.
Matt Simpson (Solutions Architect, Boden) shares a real-life use case showing how Kafka has helped Boden digitize their business, transitioning from catalogs to online sales, tracking stock, and identifying buying patterns. Matt also shares what he's learned through using Kafka as well as the challenges of being a product master. And lastly, what is Matt excited about for the future of Boden? Find out in this episode!
EPISODE LINKS
- Digital Transformation in Style: How Boden Innovates Retail Using Apache Kafka
- Learn about Boden
- ETL and Event Streaming Explained ft. Stewart Bryson
- Connecting Snowflake and Apache Kafka ft. Isaac Kunen
- Instagram for Kensington Palace
- Join the Confluent Community Slack
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Making Abstract Algebra Count in the World of Event Streaming ft. Sam Ritchie
04/22/20 • 46 min
During his time at Twitter, Sam Ritchie (Staff Research Engineer, Google) led the development of Summingbird, a project that helped Twitter ingest and process massive amounts of data. It relieved some key pain points, saving developers at Twitter from doing work twice, as was a natural consequence of the then-current Lambda Architecture. In this episode, Sam teaches us some abstract algebra and explains how it has informed his attempts to make stream processing programs easy to write in a more general way.
Streaming Data Integration – Where Development Meets Deployment ft. James Urquhart
04/15/20 • 55 min
Applications, development, deployment, and theory are all key pieces behind customer experience, event streaming, and improving systems and integration.
James Urquhart (Global Field CTO, VMware) is writing a book combining Wardley Mapping and Promise Theory to evaluate the future of event streaming and how it will become a more economical choice for users. James argues that reducing the cost of integration does not deter people from buying but instead encourages creativity to find more uses for integration. He stresses the importance of user experience and how knowing what users are going through helps mend products and workflows, which improves systems that bring economic value. The two then go into explanations of Promise Theory, the Jevons Paradox, and Geoffrey Moore's Core vs. Context theory.
Machine Learning with TensorFlow and Apache Kafka ft. Chris Mattmann
03/11/20 • 53 min
TensorFlow is an open source machine learning platform that can be used with Apache Kafka® for deep learning. Chris Mattmann, author of Machine Learning with TensorFlow, introduces us to TensorFlow as a Google technology that teaches computers how to think and make connections like humans do. For example, when there is a signifier that the mind processes, out comes a label to the object in front of you. TensorFlow is Google's version of wrangling various technologies to help group them together and work smoothly as large amounts of data flow through.
Chris also breaks down neural networks, how technology simulates cerebral processes that take place when our visual cortex receives a new image, plus a use case that involves Apache Kafka and event streaming to achieve TensorFlow's goals.
Distributed Systems Engineering with Apache Kafka ft. Gwen Shapira
03/04/20 • 48 min
As an engineering leader managing a team, Gwen Shapira talks through the steps she took to get to Confluent and how she got started working with Apache Kafka®. She shares what it's like being on the Project Management Committee (PMC) for the Apache Software Foundation as well as some of the responsibilities involved, such as choosing Kafka Improvement Proposals (KIPs), monitoring releases, and making contributions to the community.
For Gwen, part of finding Kafka was her willingness to take risks, learn all types of code bases, and leave companies for a new technology that showed promise and sparked her interest. Given that not only Kafka itself but also how people learn Kafka has changed, Gwen shares her best tips for approaching the project.
There are differences between distributed systems engineers and full stack engineers, and for anyone who wants to work at a company like Confluent, it’s important to showcase design and architecture knowledge, a knack for solving problems, and confidence in your ideas.
Streaming Call of Duty at Activision with Apache Kafka ft. Yaroslav Tkachenko
01/27/20 • 46 min
Call of Duty: Modern Warfare is the most played Call of Duty multiplayer of this console generation with over $1 billion in sales and almost 300 million multiplayer matches. Behind the scenes, Yaroslav Tkachenko (Software Engineer and Architect, Activision) gets to be on the team behind it all, architecting, designing, and implementing their next-generation event streaming platform, including a large-scale, near-real-time streaming data pipeline using Kafka Streams and Kafka Connect.
Learn about how his team ingests huge amounts of data, what the backend of their massive distributed system looks like, and the automated services involved for collecting data from each pipeline.
Confluent Platform 5.4 | What's New in This Release + Updates
01/22/20 • 14 min
A quick summary of new features, updates, and improvements in Confluent Platform 5.4, including Role-Based Access Control (RBAC), Structured Audit Logs, Multi-Region Clusters, Confluent Control Center enhancements, Schema Validation, and the preview for Tiered Storage. This release also includes pull queries and embedded connectors in preview as part of KSQL.
Kafka Streams in Action with Bill Bejeck
09/27/18 • 49 min
Tim Berglund interviews Bill Bejeck about the Kafka Streams API and his new book, Kafka Streams in Action.
FAQ
How many episodes does Streaming Audio: Apache Kafka® & Real-Time Data have?
Streaming Audio: Apache Kafka® & Real-Time Data currently has 270 episodes available.
What topics does Streaming Audio: Apache Kafka® & Real-Time Data cover?
The podcast is about Open Source, Cloud, Data, How To, Podcasts, Technology and Education.
What is the most popular episode on Streaming Audio: Apache Kafka® & Real-Time Data?
The episode titled 'Benchmarking Apache Kafka Latency at the 99th Percentile ft. Anna Povzner' is the most popular.
What is the average episode length on Streaming Audio: Apache Kafka® & Real-Time Data?
The average episode length on Streaming Audio: Apache Kafka® & Real-Time Data is 37 minutes.
How often are episodes of Streaming Audio: Apache Kafka® & Real-Time Data released?
Episodes of Streaming Audio: Apache Kafka® & Real-Time Data are typically released every 7 days.
When was the first episode of Streaming Audio: Apache Kafka® & Real-Time Data?
The first episode of Streaming Audio: Apache Kafka® & Real-Time Data was released on Jun 20, 2018.