Schema registry introduction and configurations

Ranjeet Borate
2 min read · May 4, 2023


Apache Kafka has been gaining popularity in the developer community because of its publish-subscribe architecture, its scalability and its fault tolerance.

Figure: how Schema Registry works

Kafka has producers and consumers, which do what their names suggest. Kafka transmits data over the network as bytes. Producers and consumers never communicate with each other directly. Instead, a producer writes messages to a Kafka topic, and consumers read them from that topic.

The consumer reads the messages that the producer sends to the topic. However, it still needs to know what type of data it is about to deserialize. This is where Schema Registry comes in.
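The following small Python illustration is not Kafka code. It shows why the consumer needs that knowledge: the same four bytes produce three different values, depending on which "schema" the reader assumes.

```python
import struct

# Four bytes as they might arrive from a topic: Kafka only moves raw bytes.
raw = b"ABCD"

# Without a schema, the reader has to guess how to interpret them.
as_text = raw.decode("ascii")                       # a 4-character string
as_big_endian_int = struct.unpack(">i", raw)[0]     # 32-bit big-endian integer
as_little_endian_int = struct.unpack("<i", raw)[0]  # 32-bit little-endian integer

print(as_text, as_big_endian_int, as_little_endian_int)
# ABCD 1094861636 1145258561
```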

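Schema Registry is a separate application that runs outside the Kafka cluster. It handles schema distribution between producers and consumers. It persists each registered schema in a Kafka topic and keeps a copy in its local cache. It also assigns each schema a unique schema ID, which producers and consumers use to refer to that schema.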
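To make the ID assignment and client-side caching concrete, here is a minimal in-memory sketch. `ToySchemaRegistry` and `CachingClient` are hypothetical names. They are not Schema Registry's REST API or any Confluent client class, and they only model two behaviors. First, identical schemas get the same ID. Second, a client contacts the registry only when its own cache misses.

```python
import json


class ToySchemaRegistry:
    """Hypothetical in-memory stand-in for Schema Registry (not its real API)."""

    def __init__(self):
        self._ids_by_schema = {}  # canonical schema text -> schema ID
        self._schemas_by_id = {}  # schema ID -> schema definition

    def register(self, schema: dict) -> int:
        # Canonicalize so identical schemas always map to the same ID.
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids_by_schema:
            new_id = len(self._schemas_by_id) + 1
            self._ids_by_schema[key] = new_id
            self._schemas_by_id[new_id] = schema
        return self._ids_by_schema[key]

    def get_by_id(self, schema_id: int) -> dict:
        # Raises KeyError if no schema was registered under this ID.
        return self._schemas_by_id[schema_id]


class CachingClient:
    """Hypothetical client that remembers schema IDs it already obtained."""

    def __init__(self, registry: ToySchemaRegistry):
        self.registry = registry
        self.cache = {}  # canonical schema text -> schema ID
        self.remote_calls = 0

    def id_for(self, schema: dict) -> int:
        key = json.dumps(schema, sort_keys=True)
        if key not in self.cache:
            self.remote_calls += 1  # only contact the registry on a cache miss
            self.cache[key] = self.registry.register(schema)
        return self.cache[key]


user_schema = {"type": "record", "name": "User",
               "fields": [{"name": "name", "type": "string"}]}
registry = ToySchemaRegistry()
client = CachingClient(registry)
first = client.id_for(user_schema)
second = client.id_for(user_schema)
print(first, second, client.remote_calls)  # 1 1 1
```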

With Schema Registry in place, the producer first checks whether the schema is already in its local cache. If it isn't, the producer registers the schema with Schema Registry and caches the schema ID it gets back. If an identical schema is already registered, the registry simply returns the existing ID. The producer then serializes the message into binary format using the schema, prepends the schema ID, and produces the message to the topic.
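The "prepend the schema ID" step can be sketched using Confluent's wire format. In that format, byte 0 is a magic byte with value 0, the next 4 bytes are the schema ID in big-endian order, and the serialized payload follows. Two assumptions keep this sketch dependency-free. The payload is encoded as JSON here, whereas real Avro or Protobuf serializers write their own binary encoding. The schema ID `1` is assumed to be the value the registry returned.

```python
import json
import struct

MAGIC_BYTE = 0  # Confluent wire format: the first byte is always 0


def frame_message(schema_id: int, record: dict) -> bytes:
    """Prepend magic byte + 4-byte schema ID to a serialized record."""
    # JSON stands in for the real Avro/Protobuf binary encoding.
    payload = json.dumps(record, separators=(",", ":")).encode("utf-8")
    # ">BI": big-endian, 1 unsigned byte (magic) then a 4-byte unsigned ID.
    header = struct.pack(">BI", MAGIC_BYTE, schema_id)
    return header + payload


# schema_id=1 is assumed to be the ID Schema Registry returned for this schema.
message = frame_message(1, {"name": "alice"})
print(message)  # b'\x00\x00\x00\x00\x01{"name":"alice"}'
```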

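When the consumer receives the message, it reads the schema ID prepended to it. It looks up the matching schema definition in Schema Registry, caching it locally as well, and uses that schema to deserialize the message. If no schema exists for that ID, deserialization fails on the consumer side.

Schema Registry protects the agreement between producer and consumer earlier, at registration time. If a producer tries to register a new schema version that violates the configured compatibility rules, the registry rejects it. This is how the producer learns that the schema agreement would be broken.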
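On the consumer side, the same framing can be reversed. The following sketch makes three simplifying assumptions. `schemas_by_id` is a plain dict standing in for the registry lookup. `UnknownSchemaError` is a hypothetical exception name. The payload is again assumed to be JSON rather than Avro.

```python
import json
import struct

MAGIC_BYTE = 0


class UnknownSchemaError(Exception):
    """Hypothetical error for a schema ID the consumer cannot resolve."""


def deserialize(message: bytes, schemas_by_id: dict) -> tuple:
    """Split a Confluent-framed message and resolve its schema ID."""
    if len(message) < 5:
        raise ValueError("message too short for the 5-byte header")
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    if schema_id not in schemas_by_id:
        # With a real client, this is where the registry lookup fails
        # and deserialization errors out.
        raise UnknownSchemaError(f"no schema registered with ID {schema_id}")
    schema = schemas_by_id[schema_id]
    record = json.loads(message[5:].decode("utf-8"))  # JSON stand-in for Avro
    return schema, record


known = {1: {"type": "record", "name": "User",
             "fields": [{"name": "name", "type": "string"}]}}
schema, record = deserialize(b'\x00\x00\x00\x00\x01{"name":"alice"}', known)

try:
    deserialize(b"\x00\x00\x00\x00\x07{}", known)  # ID 7 was never registered
    unknown_id_rejected = False
except UnknownSchemaError:
    unknown_id_rejected = True
```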

That covers how Schema Registry works. Below are the configurations needed to start Schema Registry in distributed mode and in standalone mode.

Schema Registry in Distributed Mode:

To start Schema Registry in distributed mode, set the following properties in the Schema Registry properties file:

listeners=http://<HOSTNAME>:<PORT>
schema.registry.group.id=schema-registry-cluster
kafkastore.topic=schema-registry-topic

Here,

  • listeners: the URL on which Schema Registry listens for requests from Kafka producers, consumers and other clients.
  • schema.registry.group.id: the name of the cluster that this Schema Registry instance joins. To bring multiple Schema Registry instances into a single cluster, give each instance its own properties file and set this property to the same value in all of them.
  • kafkastore.topic: the name of the Kafka topic in which the schemas are stored.
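As a sanity check before starting several instances, a short script can confirm that their properties files agree on the cluster-defining values. This is a sketch under a few assumptions. The parser is deliberately minimal and handles only `key=value` lines and `#` or `!` comments. The check compares `kafkastore.topic` as well as the group ID, because members of one cluster share the same schemas topic. The hostnames and port 8081 in the example are hypothetical.

```python
CLUSTER_KEYS = ("schema.registry.group.id", "kafkastore.topic")


def parse_properties(text: str) -> dict:
    """Minimal .properties reader: key=value lines, '#'/'!' comments only."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def same_cluster(*property_texts: str) -> bool:
    """True if all instances set the same non-empty group ID and schemas topic."""
    if not property_texts:
        return False
    parsed = [parse_properties(text) for text in property_texts]
    for key in CLUSTER_KEYS:
        values = {props.get(key) for props in parsed}
        if len(values) != 1 or values & {None, ""}:
            return False
    return True


# Hypothetical instances: only the listener hostnames differ.
node_a = """listeners=http://sr-a.example.com:8081
schema.registry.group.id=schema-registry-cluster
kafkastore.topic=schema-registry-topic"""
node_b = node_a.replace("sr-a", "sr-b")
node_c = node_b.replace("schema-registry-cluster", "other-cluster")

print(same_cluster(node_a, node_b))  # True
print(same_cluster(node_a, node_c))  # False
```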

Schema Registry in Standalone Mode:

To start Schema Registry in standalone mode, make sure the property below is not set in any Schema Registry instance that you bring up in standalone mode.

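# Remove this property, or leave its value blank
schema.registry.group.id=********

Note that when this property is absent, Schema Registry falls back to its default group ID, schema-registry.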
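The standalone rule from this section can also be checked mechanically. The following minimal sketch encodes only what the section states: `schema.registry.group.id` must be absent or blank. `is_standalone_config` is a hypothetical helper, and the `localhost:8081` listener is an illustrative value.

```python
GROUP_ID_KEY = "schema.registry.group.id"


def is_standalone_config(text: str) -> bool:
    """Encode this section's rule: the group ID must be absent or blank."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        if key.strip() == GROUP_ID_KEY and value.strip():
            return False  # a non-blank group ID is set
    return True


without_group_id = "listeners=http://localhost:8081\nkafkastore.topic=schema-registry-topic"
blank_group_id = without_group_id + "\nschema.registry.group.id="
with_group_id = without_group_id + "\nschema.registry.group.id=schema-registry-cluster"

print(is_standalone_config(without_group_id))  # True
print(is_standalone_config(blank_group_id))    # True
print(is_standalone_config(with_group_id))     # False
```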
