The aim of this post is to describe the configuration required for a Flink application, deployed on a Kerberos-secured Hadoop/Yarn cluster, to connect to a Kerberos-secured Apache Kafka cluster using two different keytabs. The following steps worked for me; depending on your environment, the specifics may vary even if the general approach remains the same.
This post assumes that you are already able to connect to the Hadoop/Yarn cluster. Connecting to a secured Hadoop cluster is fairly straightforward and the official documentation explains it well.
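As a rough orientation before the detailed steps, the two-keytab setup boils down to letting Flink's own Kerberos integration handle the Hadoop/Yarn side while a separate JAAS configuration supplies the Kafka credentials. The snippet below is only a sketch: all paths, principals, and realm names are placeholders, and the exact mechanism for wiring in the JAAS file can differ between Flink versions.

```
# flink-conf.yaml -- first keytab, used for the Yarn/HDFS side
security.kerberos.login.keytab: /path/to/yarn-user.keytab
security.kerberos.login.principal: yarn-user@EXAMPLE.COM
security.kerberos.login.contexts: Client
# point the JVM at a separate JAAS file carrying the Kafka keytab
env.java.opts: -Djava.security.auth.login.config=/path/to/kafka-jaas.conf
```

```
// kafka-jaas.conf -- second keytab, used only by the Kafka client
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/path/to/kafka-user.keytab"
  principal="kafka-user@EXAMPLE.COM";
};
```

The Kafka client then picks up the `KafkaClient` login context when `security.protocol` is set to `SASL_PLAINTEXT` or `SASL_SSL` in the consumer/producer properties.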
I recently worked on a project that used Spark Structured Streaming with Confluent Schema Registry and Apache Kafka. Due to versioning constraints between the various components, I had to write a custom implementation of the KafkaAvroSerializer class to serialize Spark DataFrames into Avro format. The serialized data was then published to Kafka. This post is based on the examples in the Confluent documentation here.
In newer versions of Confluent Schema Registry, much of the implementation detailed below has been simplified and is easier to use. The standard recommended usage of the Confluent KafkaAvroSerializer is straightforward: you set it as one of the Kafka properties used when initializing a KafkaProducer:
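A minimal sketch of that standard setup is shown below. The broker and Schema Registry URLs are placeholders for illustration; the serializer class name is the one shipped in Confluent's kafka-avro-serializer artifact. The snippet only builds the properties object, with a comment marking where the producer would be constructed.

```java
import java.util.Properties;

public class ProducerConfigSketch {

    // Builds the Kafka producer properties for the standard Confluent setup.
    static Properties buildProps() {
        Properties props = new Properties();
        // Placeholder broker address.
        props.put("bootstrap.servers", "broker:9092");
        // Route both keys and values through the Confluent Avro serializer,
        // which registers/looks up schemas in the Schema Registry.
        props.put("key.serializer",
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("value.serializer",
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // Placeholder Schema Registry endpoint.
        props.put("schema.registry.url", "http://schema-registry:8081");
        return props;
        // These properties would then be passed to `new KafkaProducer<>(props)`.
    }
}
```

With this configuration, the serializer handles schema registration transparently, which is exactly the convenience the custom implementation described below has to reproduce.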
Towards the end of 2017, I found myself falling behind on my habit of reading books. It wasn't that I was reading less than usual or not reading at all. It was just that I spent most of my time reading newspapers and magazines; mostly the latter, which is something I enjoyed very much. As a result, when it came to reading books, I did not have much to be happy about.