Apache Kafka

Language: Java

Data

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It provides high-throughput, fault-tolerant, and scalable messaging between producers and consumers.

Kafka was originally developed at LinkedIn to handle real-time activity-stream data and later became an open-source project under the Apache Software Foundation. It enables applications to process and react to streams of events efficiently and reliably.

Installation

maven: Add the org.apache.kafka:kafka-clients dependency to pom.xml (see the snippet below)
gradle: Add implementation 'org.apache.kafka:kafka-clients:3.6.1' to build.gradle
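
A minimal pom.xml entry, assuming the same 3.6.1 version used in the Gradle line:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.6.1</version>
</dependency>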

Usage

Kafka lets developers produce and consume messages on topics, process streams with Kafka Streams, and integrate with other systems through Kafka Connect. Its messaging architecture is distributed, fault-tolerant, and scalable.

Creating a Kafka producer

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// send() is asynchronous: records are batched and transmitted in the background
producer.send(new ProducerRecord<>("test-topic", "key1", "Hello Kafka"));
producer.close(); // flushes any buffered records before shutting down

Creates a Kafka producer and sends a simple message to a topic.
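
Because send() is asynchronous, delivery failures surface in a callback (or via the returned Future). A minimal sketch of error-aware sending, assuming the producer configured above:

producer.send(new ProducerRecord<>("test-topic", "key1", "Hello Kafka"), (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace(); // e.g. TimeoutException when brokers are unreachable
    } else {
        System.out.println("Wrote to partition " + metadata.partition() + " at offset " + metadata.offset());
    }
});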

Creating a Kafka consumer

import org.apache.kafka.clients.consumer.*;
import java.time.Duration;
import java.util.*;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("test-topic"));
while (true) {
    // poll() returns a (possibly empty) batch of records, waiting up to 100 ms
    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
        System.out.println("Received: " + record.value());
    }
}

Creates a Kafka consumer that subscribes to a topic and continuously polls for messages.

Using Kafka Streams

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Kafka Streams needs default serdes (or per-operation overrides) to read and write records
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("input-topic");
source.mapValues(value -> value.toUpperCase()).to("output-topic");
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

Demonstrates a simple Kafka Streams application that transforms messages to uppercase and writes to another topic.
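
Streams applications usually close the topology when the JVM exits; a common one-line idiom, assuming the streams instance above:

Runtime.getRuntime().addShutdownHook(new Thread(streams::close));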

Configuring partitions and keys

producer.send(new ProducerRecord<>("test-topic", "key1", "Message 1"));
producer.send(new ProducerRecord<>("test-topic", "key2", "Message 2"));

Records with the same key always land on the same partition (via the default hashing partitioner), which preserves per-key ordering.
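
A partition can also be chosen explicitly with the ProducerRecord(topic, partition, key, value) constructor. A sketch, assuming test-topic has a partition 0:

// bypass key hashing and pin the record to partition 0
producer.send(new ProducerRecord<>("test-topic", 0, "key1", "Pinned message"));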

Using consumer groups

// Multiple consumers with the same group.id share the topic's partitions, enabling parallel processing.

Kafka scales consumption by adding consumers to a group: each partition is assigned to exactly one member, so records are split across the group and processed in parallel.
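
A minimal sketch of two group members, assuming the consumer props from the example above (group.id "test-group"); each consumer must run its poll loop on its own thread, since KafkaConsumer is not thread-safe:

// both consumers join group "test-group" and are assigned disjoint subsets of the partitions
KafkaConsumer<String, String> consumerA = new KafkaConsumer<>(props);
KafkaConsumer<String, String> consumerB = new KafkaConsumer<>(props);
consumerA.subscribe(Arrays.asList("test-topic"));
consumerB.subscribe(Arrays.asList("test-topic"));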

Handling offsets

consumer.commitSync();

Committing offsets manually records the group's progress, so that after a restart or rebalance, consumption resumes from the last committed position.
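
Manual commits only take effect with auto-commit disabled. A sketch of an at-least-once loop, assuming the consumer setup shown earlier; process() stands in for hypothetical application logic:

props.put("enable.auto.commit", "false");
// ... create the consumer and subscribe as above ...
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical processing step
    }
    consumer.commitSync(); // commit only after the whole batch is processed
}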

Error Handling

TimeoutException: Occurs when the producer cannot send a message in time. Check broker availability and network connectivity.
SerializationException: Ensure that the key and value serializers/deserializers match the data types.
CommitFailedException: Thrown when an offset commit cannot complete because the consumer group has rebalanced. Expect redelivery of uncommitted records, and keep the poll loop responsive so the consumer is not evicted from the group.
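
A sketch of catching these in a consumer loop; the exception classes live under org.apache.kafka.clients.consumer and org.apache.kafka.common.errors:

try {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("Received: " + record.value());
    }
    consumer.commitSync();
} catch (CommitFailedException e) {
    // the group rebalanced before the commit completed; uncommitted records will be redelivered
    System.err.println("Commit failed, records will be re-processed: " + e.getMessage());
} catch (SerializationException e) {
    // a record could not be deserialized; verify the configured deserializers match the data
    System.err.println("Undecodable record: " + e.getMessage());
}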

Best Practices

Use appropriate key selection to control partitioning and ordering.

Configure producer retries and acknowledgments for reliability (see the sketch after this list).

Use consumer groups for scaling processing.

Monitor consumer lag to detect processing bottlenecks.

Leverage Kafka Streams for complex transformations and stateful processing.
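
A sketch of the reliability settings referenced above; the values are example assumptions to tune per workload:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");                // wait for all in-sync replicas to acknowledge each write
props.put("enable.idempotence", "true"); // retried sends cannot introduce duplicates
props.put("retries", "3");               // retry transient failures (example value)
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);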