Streaming Audio: Apache Kafka® & Real-Time Data - Consistent, Complete Distributed Stream Processing ft. Guozhang Wang

Consistent, Complete Distributed Stream Processing ft. Guozhang Wang

Streaming Audio: Apache Kafka® & Real-Time Data

07/22/21 • 29 min


Stream processing has become an important part of the big data landscape as a new programming paradigm for implementing real-time, data-driven applications. One of the biggest challenges for streaming systems is providing correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent), along with other leaders in the Apache Kafka® community, contributed to a paper on consistency and completeness in stream processing in Apache Kafka, in order to shed light on what a reimagined, modern infrastructure looks like.

In his white paper, titled Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka, Guozhang covers the following topics:

  • Streaming correctness challenges
  • Stream processing with Kafka
  • Exactly-once in Kafka Streams

For context, accurate, real-time data stream processing is a better fit for modern organizations composed of vertically separated engineering teams. In the past, stream processing was considered an auxiliary system to conventional batch-oriented systems and often suffered from problems with consistency and completeness. Modern streaming engines such as ksqlDB and Kafka Streams, by contrast, are designed to be authoritative: by providing strong correctness guarantees, they serve as the source of truth rather than as an approximation. These correctness guarantees fall under two major umbrellas:

  1. Consistency: Ensuring unique and extant records, also referred to as exactly-once semantics
  2. Completeness: Ensuring the correct order of records
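
As a rough illustration of the two guarantees, the toy sketch below counts records per time window. It is not Kafka Streams code and does not reflect how Kafka implements either guarantee; the function name `windowed_counts`, the record shape `(record_id, event_time)`, and the window size are all assumptions made for this example. Skipping a redelivered record keeps each record counted once, and assigning records to windows by their own event time keeps results correct even when records arrive out of order.

```python
from collections import defaultdict


def windowed_counts(records, window_size=10):
    """Count records per event-time window (toy model, not Kafka Streams).

    `records` is an iterable of (record_id, event_time) tuples in arrival
    order. Both the tuple shape and this function are hypothetical.
    """
    seen_ids = set()           # ids already processed
    counts = defaultdict(int)  # window start -> number of records

    for record_id, event_time in records:
        # Each record should affect the result once, even if it is
        # delivered again after a retry.
        if record_id in seen_ids:
            continue
        seen_ids.add(record_id)

        # Place the record by its event time, not its arrival position,
        # so a late record revises the window it belongs to.
        window_start = event_time - event_time % window_size
        counts[window_start] += 1

    return dict(counts)


records = [
    ("a", 1),
    ("b", 12),
    ("a", 1),  # duplicate delivery of record "a"
    ("c", 7),  # arrives after "b" but belongs to the earlier window
]
result = windowed_counts(records)
print(result)  # {0: 2, 10: 1}
```

The duplicate of record "a" is ignored, and record "c" updates the count for the window starting at 0 even though a record from a later window was processed first.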

Guozhang also explains why he wrote this academic paper: he believes in the importance of sharing knowledge across the community and bringing industry experience back to academia (the paper was also published at SIGMOD 2021, one of the most important conferences in the data management research area). Doing so helps foster the next generation of industry innovation and moves the data streaming and data management industry a step forward. In Guozhang's own words, "Academic papers provide you this proof of concept design, which gets groomed into a big system."

Transcript

Tim Berglund:
Guozhang Wang is a software engineer and a computer science Ph.D. who I work with here at Confluent. He's recently written a paper on consistency and completeness in stream processing in Apache Kafka and submitted it to an academic journal. Now interestingly, this isn't just a thing he did in his spare time. It's a thing that is actually a part of his work here. And you might wonder why would a product company, a cloud company do a thing like that? Well, I ask him that questio
