Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used to build real-time data pipelines and streaming applications. Originally developed at LinkedIn and later donated to the Apache Software Foundation, Kafka is designed to handle high-throughput, low-latency data transmission across systems in a fault-tolerant and scalable way.

At its core, Kafka is a message broker. It allows systems to publish (write) and subscribe to (read) streams of records, similar to a message queue. However, unlike traditional message brokers, Kafka stores these records durably and allows consumers to replay them as needed. This design makes Kafka ideal for both real-time and historical data processing.
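
Because records are retained rather than deleted on consumption, a consumer can rewind and re-read them. Below is a minimal sketch of this replay capability using the standard Java client; the broker address, the topic name "events", and the group id are illustrative assumptions:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class ReplayExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "replay-demo");             // hypothetical group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Assign a partition explicitly and rewind to the earliest retained
                // offset, so previously consumed records are delivered again.
                TopicPartition partition = new TopicPartition("events", 0); // assumed topic
                consumer.assign(List.of(partition));
                consumer.seekToBeginning(List.of(partition));

                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }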

Kafka organizes messages into topics, and each topic is split into partitions. Producers write messages to topics, while consumers read from them. The use of partitions allows Kafka to scale horizontally by distributing load across multiple servers, called brokers, and supports parallel processing by multiple consumers. Kafka also maintains a configurable retention period, so data can be reprocessed or analyzed long after it's published.
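
As a sketch of how partitions and retention are configured together, the Java AdminClient can create a topic with both settings up front; the topic name, partition count, replication factor, and retention value below are illustrative assumptions:

    import java.util.Map;
    import java.util.Properties;
    import java.util.Set;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                // Six partitions let up to six consumers in one group read in parallel;
                // replication factor 3 assumes a cluster of at least three brokers.
                NewTopic topic = new NewTopic("orders", 6, (short) 3)
                        .configs(Map.of("retention.ms", "604800000")); // retain 7 days
                admin.createTopics(Set.of(topic)).all().get(); // block until created
            }
        }
    }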

Thanks to its high performance and reliability, Kafka is widely used in industries like finance, e-commerce, telecommunications, and more. It often serves as the backbone for data platforms, enabling event-driven architectures, microservices communication, log aggregation, and real-time analytics.

Apache Kafka is written in Java and Scala.

Kafka Topics

The Integration Platform uses three Kafka topics, as follows:

Main Event Topic

The main event topic receives messages describing individual data change notifications, where each message indicates that a particular record in a particular file was created, updated, or deleted. Each message includes the current state of the record's data, and update messages also include the original state of the record's data before the update took place. Each message is assigned a unique offset number within the topic, so the messages are stored in exactly the same sequence as the changes occurred in the origin application.
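
To make that structure concrete, here is a hypothetical sketch of such a change notification modeled as a Java record; every field name here is an illustrative assumption, not the platform's actual schema:

    import java.util.Map;

    // Hypothetical shape of a main-event-topic message.
    public record ChangeEvent(
            String file,                  // the file containing the changed record
            String recordId,              // the record that was created, updated, or deleted
            Operation operation,          // which kind of change occurred
            Map<String, Object> current,  // current state of the record's data
            Map<String, Object> original  // pre-update state; present only for UPDATE
    ) {
        public enum Operation { CREATE, UPDATE, DELETE }
    }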

Other special types of messages are planned but not currently implemented. For example, in the future, an origin application will be able to send "start transaction" and "stop transaction" messages that group related messages together into logical transactions, and consumer agents will be able to process those messages together as a real transaction.

Snapshot Request Topic

The snapshot request topic is written to in order to initiate a new snapshot of the origin application's data. These requests usually originate either from a target system agent, such as a SQL Replicator Agent or an ISAM Replicator Agent, or from system management tools.

The consumer of this topic is a snapshot agent.
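
A minimal sketch of how a management tool might publish such a request with the standard Java producer follows; the broker address, the topic name "snapshot-requests", and the JSON payload are assumptions, since the platform's actual request format is not documented here:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SnapshotRequestExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Hypothetical payload: which file to snapshot and who asked for it.
                String request = "{\"file\":\"CUSTOMERS\",\"requestedBy\":\"sql-replicator-1\"}";
                producer.send(new ProducerRecord<>("snapshot-requests", request));
                producer.flush(); // ensure the request is sent before the tool exits
            }
        }
    }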

Snapshot Response Topic

The snapshot response topic is used by the snapshot agent to send a completion message back to the originator of a snapshot request once the snapshot has been completed and uploaded to network storage, such as an S3 bucket provided by SeaweedFS or some other provider.
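
As a sketch, the original requester could listen for that completion message as shown below; the topic name "snapshot-responses", the group id, and the payload handling are illustrative assumptions:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SnapshotResponseListener {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "snapshot-requester");      // hypothetical group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("snapshot-responses")); // assumed topic name
                while (true) {
                    // Each response is assumed to carry the network-storage location
                    // of the completed snapshot, e.g. an S3 object URL.
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofSeconds(1))) {
                        System.out.println("Snapshot complete: " + record.value());
                    }
                }
            }
        }
    }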