Role Overview :
We are seeking a highly skilled Senior Kafka Data Engineer to design, build, and manage robust data pipelines that power both batch and real-time data processing across our enterprise data ecosystem. This role requires deep technical expertise in Cloudera, Azure Databricks, Kafka, and other cloud-based data platforms. The ideal candidate will be passionate about building scalable and high-performing data solutions, ensuring data quality, and enabling data-driven decision-making across the organization.
Key Responsibilities :
Data Pipeline Design & Development :
- Design, develop, test, and maintain end-to-end batch and streaming data pipelines using Cloudera, Apache Spark, Kafka, and Azure Data Services such as ADF, Databricks, and Cosmos DB.
- Build efficient ETL and ELT frameworks to transform raw data into structured, usable formats for downstream analytics and reporting.
- Implement data ingestion frameworks from multiple structured and unstructured sources (APIs, databases, streams, files, etc.).
- Automate and orchestrate complex data workflows using Azure Data Factory and Airflow (if applicable).
Performance Optimization & Data Quality :
- Optimize data pipelines for scalability, performance, reliability, and cost efficiency.
- Implement data validation, monitoring, and error-handling mechanisms to ensure high-quality data delivery.
- Perform root cause analysis on data issues and propose long-term solutions for stability and consistency.
Collaboration & Solution Design :
- Collaborate with Data Architects, Analysts, and Data Scientists to design data models that align with business requirements.
- Partner with business stakeholders to translate requirements into technical data pipeline solutions.
- Contribute to the development and implementation of data governance, metadata management, and lineage tracking practices.
Innovation & Continuous Improvement :
- Evaluate and integrate emerging technologies and tools in the data ecosystem (e.g., Delta Lake, Iceberg, Lakehouse architectures).
- Advocate for and implement DevOps and CI/CD practices for data pipelines using tools like Git, Azure DevOps, Jenkins, or similar.
- Contribute to data platform modernization initiatives, including migration to cloud-native or Lakehouse architectures.
Mentorship & Leadership :
- Provide technical leadership and mentorship to junior data engineers, ensuring adherence to best practices in coding, testing, and deployment.
- Review code and ensure compliance with established engineering and data management standards.
Qualifications & Skills :
Required Technical Skills :
- 8+ years of IT experience, with 5+ years in Data Engineering and cloud-based data platforms.
- Strong hands-on experience with the Cloudera/Hadoop ecosystem, Apache Spark, and Kafka (Confluent or Apache) for batch and streaming data.
- Expertise in Azure data services: Data Factory (ADF), Databricks, Cosmos DB, Synapse Analytics.
- Strong programming proficiency in Python or Scala, with advanced SQL skills.
- In-depth knowledge of NoSQL databases (Cosmos DB, MongoDB), including data modeling, indexing, and query optimization.
- Experience in building Lakehouse/Data Lake architectures and managing data across distributed storage environments.
- Familiarity with data security, compliance, and governance frameworks.
Preferred Skills :
- Knowledge of containerization and orchestration tools (Docker, Kubernetes).
- Familiarity with streaming frameworks like Structured Streaming, Flink, or Storm.
- Experience with data cataloging tools (e.g., Purview, Collibra, or Alation).
- Working knowledge of CI/CD pipelines and infrastructure-as-code (Terraform, ARM templates).
Soft Skills :
- Strong analytical and problem-solving abilities with a focus on optimization and data flow efficiency.
- Excellent communication and collaboration skills to work cross-functionally with engineering, analytics, and business teams.
- Demonstrated ability to mentor junior engineers and lead by example in an agile, fast-paced environment.
- Proactive mindset with a passion for continuous learning and innovation.

(ref : hirist.tech)