Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used to build real-time data pipelines and streaming applications. Originally developed at LinkedIn and later donated to the Apache Software Foundation, Kafka is designed to handle high-throughput, low-latency data transmission across systems in a fault-tolerant and scalable way.

At its core, Kafka is a message broker. It allows systems to publish (write) and subscribe to (read) streams of records, similar to a message queue. However, unlike traditional message brokers, Kafka stores these records durably and allows consumers to replay them as needed. This design makes Kafka ideal for both real-time and historical data processing.
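
Because records are retained rather than deleted on consumption, a consumer can rewind and re-read them. Below is a minimal sketch of this replay capability using the standard Java client; the broker address, the topic name "events", and the group id are illustrative assumptions:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class ReplayExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "replay-demo");             // hypothetical group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Assign a partition explicitly and rewind to the earliest retained
                // offset, so previously consumed records are delivered again.
                TopicPartition partition = new TopicPartition("events", 0); // assumed topic
                consumer.assign(List.of(partition));
                consumer.seekToBeginning(List.of(partition));

                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }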

Kafka organizes messages into topics, and each topic is split into partitions. Producers write messages to topics, while consumers read from them. The use of partitions allows Kafka to scale horizontally by distributing load across multiple servers, called brokers, and supports parallel processing by multiple consumers. Kafka also maintains a configurable retention period, so data can be reprocessed or analyzed long after it's published.
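
As a sketch of how partitions and retention are configured together, the Java AdminClient can create a topic with both settings up front; the topic name, partition count, replication factor, and retention value below are illustrative assumptions:

    import java.util.Map;
    import java.util.Properties;
    import java.util.Set;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                // Six partitions let up to six consumers in one group read in parallel;
                // replication factor 3 assumes a cluster of at least three brokers.
                NewTopic topic = new NewTopic("orders", 6, (short) 3)
                        .configs(Map.of("retention.ms", "604800000")); // retain 7 days
                admin.createTopics(Set.of(topic)).all().get(); // block until created
            }
        }
    }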

Thanks to its high performance and reliability, Kafka is widely used in industries like finance, e-commerce, telecommunications, and more. It often serves as the backbone for data platforms, enabling event-driven architectures, microservices communication, log aggregation, and real-time analytics.

Apache Kafka is written in Java and Scala.

Kafka Topics

The Integration Platform uses three Kafka topics, as follows:

Main Event Topic

The main event topic receives messages describing individual data change notifications, where each message indicates that a particular record in a particular file was created, updated, or deleted. Each message includes the current state of the record's data, and update messages also include the original state of the record's data before the update took place. Each message is assigned a unique offset number within the topic, so the messages are stored in exactly the same sequence as the changes occurred in the origin application.
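
To make that structure concrete, here is a hypothetical sketch of such a change notification modeled as a Java record; every field name here is an illustrative assumption, not the platform's actual schema:

    import java.util.Map;

    // Hypothetical shape of a main-event-topic message.
    public record ChangeEvent(
            String file,                  // the file containing the changed record
            String recordId,              // the record that was created, updated, or deleted
            Operation operation,          // which kind of change occurred
            Map<String, Object> current,  // current state of the record's data
            Map<String, Object> original  // pre-update state; present only for UPDATE
    ) {
        public enum Operation { CREATE, UPDATE, DELETE }
    }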

Other special types of messages are planned but not currently implemented. For example, in the future, an origin application will be able to send "start transaction" and "stop transaction" messages that group related messages together into logical transactions, and consumer agents will be able to process those messages together as a real transaction.

Snapshot Request Topic

The snapshot request topic is written to in order to initiate a new snapshot of the origin application's data. These requests usually originate either from a target system agent, such as a SQL Replicator Agent or an ISAM Replicator Agent, or from system management tools.

The consumer of this topic is a snapshot agent.
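
A minimal sketch of how a management tool might publish such a request with the standard Java producer follows; the broker address, the topic name "snapshot-requests", and the JSON payload are assumptions, since the platform's actual request format is not documented here:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SnapshotRequestExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Hypothetical payload: which file to snapshot and who asked for it.
                String request = "{\"file\":\"CUSTOMERS\",\"requestedBy\":\"sql-replicator-1\"}";
                producer.send(new ProducerRecord<>("snapshot-requests", request));
                producer.flush(); // ensure the request is sent before the tool exits
            }
        }
    }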

Snapshot Response Topic

The snapshot response topic is used by the snapshot agent to send a completion message back to the originator of a snapshot request once the snapshot has been completed and uploaded to network storage, such as an S3 bucket provided by SeaweedFS or some other provider.
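
As a sketch, the original requester could listen for that completion message as shown below; the topic name "snapshot-responses", the group id, and the payload handling are illustrative assumptions:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SnapshotResponseListener {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "snapshot-requester");      // hypothetical group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("snapshot-responses")); // assumed topic name
                while (true) {
                    // Each response is assumed to carry the network-storage location
                    // of the completed snapshot, e.g. an S3 object URL.
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofSeconds(1))) {
                        System.out.println("Snapshot complete: " + record.value());
                    }
                }
            }
        }
    }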