Language: Java
Serialization / Data
Apache Avro is a data serialization system that provides compact, fast, binary serialization for Java and other languages, and is widely used in big data ecosystems for storing and transmitting structured data efficiently. Created as part of the Hadoop project, it is language-agnostic and schema-based: schemas can evolve over time, rich data structures are supported, and it integrates seamlessly with Hadoop, Kafka, and other big data tools. Its compact binary format and dynamic typing make it efficient for data storage and streaming.
Add dependency in pom.xml:
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.12.3</version>
</dependency>

Or, with Gradle:

implementation 'org.apache.avro:avro:1.12.3'

Avro uses JSON-defined schemas to generate Java classes. It supports serialization, deserialization, and schema evolution. Avro can serialize data either in a compact binary format or in a readable JSON format.
{
"type": "record",
"name": "User",
"namespace": "com.example",
"fields": [
{"name": "name", "type": "string"},
{"name": "age", "type": "int"}
]
}

Defines a simple Avro record `User` with `name` and `age` fields.
# Using avro-tools
java -jar avro-tools-1.12.3.jar compile schema user.avsc src/main/java

Generates Java classes from the Avro schema for use in serialization/deserialization.
User user = User.newBuilder().setName("Alice").setAge(30).build();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DatumWriter<User> writer = new SpecificDatumWriter<>(User.class);
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
writer.write(user, encoder);
encoder.flush();
byte[] serializedData = out.toByteArray();

Serializes a `User` object to a compact binary format.
DatumReader<User> reader = new SpecificDatumReader<>(User.class);
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(serializedData, null);
User deserializedUser = reader.read(null, decoder);
System.out.println(deserializedUser.getName());

Deserializes binary data back into a `User` object.
Schema schema = new Schema.Parser().parse(new File("user.avsc"));
GenericRecord record = new GenericData.Record(schema);
record.put("name", "Bob");
record.put("age", 25);

Demonstrates dynamic usage of Avro records without generating Java classes.
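A generic record like the one above can be serialized and read back without any generated classes, using `GenericDatumWriter`/`GenericDatumReader`. A minimal self-contained sketch (the `GenericRoundTrip` class name and the inlined schema string are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class GenericRoundTrip {
    // The User schema inlined as a string so no .avsc file is needed
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"com.example\","
      + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"age\",\"type\":\"int\"}]}";

    static String roundTrip() throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        GenericRecord record = new GenericData.Record(schema);
        record.put("name", "Bob");
        record.put("age", 25);

        // Write the record in Avro's compact binary encoding
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(record, encoder);
        encoder.flush();

        // Read it back using the same schema
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        GenericRecord copy = reader.read(null, decoder);

        return copy.get("name").toString(); // Avro strings decode as Utf8; toString() gives "Bob"
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```

The generic API trades compile-time type safety for flexibility: the same code can handle any schema supplied at runtime.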
// Add a new optional field in schema without breaking old data
{ "name": "email", "type": ["null", "string"], "default": null }

Supports adding optional fields while maintaining backward and forward compatibility.
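Schema evolution works through Avro's schema resolution: pass both the writer's (old) schema and the reader's (new) schema to the datum reader, and the missing `email` field is filled from its default. A minimal sketch (the `OldToNew` class name and the inlined v1/v2 schemas are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class OldToNew {
    // v1 schema: no email field
    static final String V1 =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"}]}";
    // v2 schema: optional email with a default, so old data still resolves
    static final String V2 =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"email\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    static Object readOldDataWithNewSchema() throws IOException {
        Schema writerSchema = new Schema.Parser().parse(V1);
        Schema readerSchema = new Schema.Parser().parse(V2);

        // Write a record with the old (v1) schema
        GenericRecord old = new GenericData.Record(writerSchema);
        old.put("name", "Alice");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(writerSchema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(old, encoder);
        encoder.flush();

        // Read with the new (v2) schema: email resolves to its default, null
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        GenericRecord evolved =
            reader.read(null, DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        return evolved.get("email"); // null, supplied by the schema default
    }
}
```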
Use specific Java classes for better type safety when possible.
Leverage default values in schemas for schema evolution.
Use Avro binary format for compact storage and JSON format for readability during debugging.
Integrate with Kafka or Hadoop for streaming and batch data processing.
Validate schemas before serialization to prevent runtime errors.
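Two of the tips above, validating a record against its schema before writing and using the JSON encoding for debugging, can be sketched together as follows (the `JsonDebugDump` class name and inlined schema are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class JsonDebugDump {
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"age\",\"type\":\"int\"}]}";

    static String toJson() throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord record = new GenericData.Record(schema);
        record.put("name", "Alice");
        record.put("age", 30);

        // Validate before serializing to fail fast on malformed records
        if (!GenericData.get().validate(schema, record)) {
            throw new IllegalStateException("record does not match schema");
        }

        // jsonEncoder produces human-readable output instead of the binary encoding
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        return out.toString("UTF-8");
    }
}
```

Switching back to `binaryEncoder` restores the compact format; the writer code is otherwise unchanged.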