Role : Kafka Developer
Location : Pune, India (with travel to onsite)
Exp : 10-15 years
Experience Required :
10+ years overall, with 5+ years in Kafka-based data streaming development. Must have delivered production-grade Kafka pipelines integrated with real-time data sources and downstream analytics platforms.
Overview :
We are looking for a Kafka Developer to design and implement real-time data ingestion pipelines using Apache Kafka. The role involves integrating with upstream flow record sources, transforming and validating data, and streaming it into a centralized data lake for analytics and operational intelligence.
Key Responsibilities :
- Develop Kafka producers to ingest flow records from upstream systems such as flow record exporters (e.g., IPFIX-compatible probes); a minimal producer sketch follows this list.
- Build Kafka consumers to stream data into Spark Structured Streaming jobs and downstream data lakes.
- Define and manage Kafka topic schemas using Avro and Schema Registry for schema evolution.
- Implement message serialization, transformation, enrichment, and validation logic within the streaming pipeline.
- Ensure exactly-once processing, checkpointing, and fault tolerance in streaming jobs.
- Integrate with downstream systems such as HDFS or Parquet-based data lakes, ensuring compatibility with ingestion standards.
- Collaborate with Kafka administrators to align topic configurations, retention policies, and security protocols.
- Participate in code reviews, unit testing, and performance tuning to ensure high-quality deliverables.
- Document pipeline architecture, data flow logic, and operational procedures for handover and support.
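
For illustration only, the following is a minimal sketch of the kind of producer described above: it publishes simplified flow records as Avro through Schema Registry. It assumes the confluent-kafka Python client; the topic name "flow-records", the reduced field set, the broker address, and the registry URL are hypothetical placeholders, not project specifics.

```python
import json

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

# Illustrative Avro schema for a simplified flow record; real IPFIX exports carry many more fields.
FLOW_SCHEMA = json.dumps({
    "type": "record",
    "name": "FlowRecord",
    "fields": [
        {"name": "src_ip", "type": "string"},
        {"name": "dst_ip", "type": "string"},
        {"name": "bytes", "type": "long"},
        {"name": "start_ts", "type": "long"},
    ],
})

# Register/serialize values against a Schema Registry instance (URL is a placeholder).
schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
avro_serializer = AvroSerializer(schema_registry, FLOW_SCHEMA)

producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": avro_serializer,
    "enable.idempotence": True,   # avoid duplicates on broker retries
    "acks": "all",                # wait for full in-sync-replica acknowledgement
})

def publish(record: dict) -> None:
    """Serialize one flow record with the registered schema and send it to Kafka."""
    producer.produce(topic="flow-records", key=record["src_ip"], value=record)
    producer.poll(0)  # serve delivery callbacks

if __name__ == "__main__":
    publish({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "bytes": 1500, "start_ts": 1700000000})
    producer.flush()
```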
Required Skills & Qualifications :
- Proven experience in developing Kafka producers and consumers for real-time data ingestion pipelines.
- Strong hands-on expertise in Apache Kafka, Kafka Connect, Kafka Streams, and Schema Registry.
- Proficiency in Apache Spark (Structured Streaming) for real-time data transformation and enrichment (see the streaming sketch after this list).
- Solid understanding of IPFIX, NetFlow, and network flow data formats; experience integrating with nProbe Cento is a plus.
- Experience with Avro, JSON, or Protobuf for message serialization and schema evolution.
- Familiarity with Cloudera Data Platform components such as HDFS, Hive, YARN, and Knox.
- Experience integrating Kafka pipelines with data lakes or warehouses using Parquet or Delta formats.
- Strong programming skills in Scala, Java, or Python for stream processing and data engineering tasks.
- Knowledge of Kafka security protocols including TLS/SSL, Kerberos, and access control via Apache Ranger.
- Experience with monitoring and logging tools such as Prometheus, Grafana, and Splunk.
- Understanding of CI/CD pipelines, Git-based workflows, and containerization (Docker/Kubernetes).
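
As a companion to the producer sketch, the snippet below illustrates the consumer side of the pipeline: a Spark Structured Streaming job reading the Kafka topic, validating records, and writing Parquet to a data lake with checkpointing. It is a sketch under assumptions, not the project's actual job: Spark 3.x with the spark-sql-kafka-0-10 package on the classpath, JSON-encoded values for brevity (an Avro pipeline would deserialize via the spark-avro package instead), and hypothetical topic, broker, and HDFS paths.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("flow-record-ingest").getOrCreate()

# Schema of the simplified flow record used in this sketch.
flow_schema = StructType([
    StructField("src_ip", StringType()),
    StructField("dst_ip", StringType()),
    StructField("bytes", LongType()),
    StructField("start_ts", LongType()),
])

# Read the Kafka topic as a streaming source.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "flow-records")
    .option("startingOffsets", "latest")
    .load()
)

# Parse, validate, and lightly enrich: cast the value payload, drop malformed or empty flows.
flows = (
    raw.select(from_json(col("value").cast("string"), flow_schema).alias("flow"))
    .select("flow.*")
    .where(col("bytes") > 0)
)

# Write Parquet files into the data lake; the checkpoint location provides restart and
# fault tolerance, and the idempotent file sink gives end-to-end exactly-once delivery.
query = (
    flows.writeStream
    .format("parquet")
    .option("path", "hdfs:///datalake/flow_records")
    .option("checkpointLocation", "hdfs:///checkpoints/flow_records")
    .outputMode("append")
    .start()
)

query.awaitTermination()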