jackyfkc.github.io

教土豆学计算机

Kafka Streams

Kafka Streams is a client library for building mission-critical real-time applications and microservices, where the input and/or output data is stored in Kafka clusters.

Concepts

Architecture Overview

Stream Architecture

A topology contains an acyclic graph of sources, processors, and sinks.

Local State Stores

Stream Local State

Windowing A Stream

Developer Guide

Application Id

An identifier for the stream processing application. Must be unique within the Kafka cluster.

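This id is used in the following places to isolate resources used by the application from others: as the default prefix of the Kafka consumer and producer `client.id`, as the Kafka consumer `group.id` for coordination, as the name of the sub-directory in the state directory (cf. `state.dir`), and as the prefix of internal Kafka topic names.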

When an application is updated, it is recommended to change `application.id` unless it is safe to let the updated application re-use the existing data in internal topics and state stores. One pattern is to embed version information within `application.id`, e.g., `my-app-v1.0.0` vs. `my-app-v1.0.2`.
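The versioning pattern can be sketched with plain `java.util.Properties`. This is a minimal illustration, not the library's own API: the class `VersionedAppId` and its two helper methods are hypothetical names, and only the `application.id` key comes from the text above.

```java
import java.util.Properties;

// Sketch only: VersionedAppId and its methods are hypothetical helpers,
// not part of the Kafka Streams API.
final class VersionedAppId {

    // Builds an id such as "my-app-v1.0.0" from a base name and a version.
    static String versionedAppId(String baseName, String version) {
        return baseName + "-v" + version;
    }

    // Stores the id under the "application.id" configuration key.
    static Properties streamsProps(String baseName, String version) {
        Properties props = new Properties();
        props.setProperty("application.id", versionedAppId(baseName, version));
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProps("my-app", "1.0.2");
        System.out.println(props.getProperty("application.id"));
    }
}
```

Because a new version yields a new id, the updated application would start with fresh internal topics and state stores instead of re-using the old ones.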

Kafka Bootstrap Servers

A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.

One Kafka cluster only: Currently Kafka Streams applications can only talk to a single Kafka cluster specified by this config value. In the future Kafka Streams will be able to support connecting to different Kafka clusters for reading input streams and/or writing output streams.
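As a rough sketch of this setting, the host/port pairs are passed as one comma-separated string under the `bootstrap.servers` key. The class name `BootstrapServersSketch`, its helpers, and the broker hostnames below are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Sketch only: the class, its helpers, and the hostnames are hypothetical.
final class BootstrapServersSketch {

    // Joins "host:port" entries into the form host1:port1,host2:port2.
    static String joinBrokers(List<String> hostPorts) {
        return String.join(",", hostPorts);
    }

    // Stores the joined list under the "bootstrap.servers" configuration key.
    static Properties streamsProps(List<String> hostPorts) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", joinBrokers(hostPorts));
        return props;
    }

    public static void main(String[] args) {
        // All entries must belong to the same Kafka cluster.
        List<String> brokers = Arrays.asList(
                "broker1.example.com:9092",
                "broker2.example.com:9092");
        System.out.println(streamsProps(brokers).getProperty("bootstrap.servers"));
    }
}
```

Listing more than one broker of the same cluster only helps the initial connection succeed when one of them is down; it does not let the application reach a second cluster.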

ZooKeeper connection

ZooKeeper connection string (`zookeeper.connect`) for Kafka topic management.

In the 0.10.0 release, Kafka Streams needs to access ZooKeeper directly to create its internal topics. Internal topics are created when a state store is used, or when a stream is repartitioned for aggregation.

This ZooKeeper dependency, and with it `zookeeper.connect`, is temporary and is to be removed once KIP-4 is incorporated.
