We are seeking a Data Engineer with strong Apache NiFi expertise to design and implement pipelines that move and transform data from Cloudera (HDFS / Hive / Impala) into Apache Iceberg tables, with downstream integration into Snowflake and Databricks. The ideal candidate will have hands-on experience with modern data lakehouse architectures and will play a critical role in enabling scalable, governed, and high-performance data platforms.
Key Responsibilities
Data Ingestion & Pipeline Development
Design, configure, and maintain NiFi data flows to extract, transform, and load data from Cloudera into Iceberg tables.
Implement streaming and batch ingestion pipelines with NiFi processors and custom scripting where needed.
Optimize NiFi workflows for scalability and reliability, and instrument them for monitoring.
Data Lakehouse Enablement
Build and manage Apache Iceberg-based datasets for structured, semi-structured, and unstructured data.
Manage schema evolution, partitioning, and metadata in Iceberg.
Develop integration flows from Iceberg to Snowflake and Databricks for analytics, ML, and reporting use cases.
Integration & Orchestration
Work with Snowflake to ingest curated data from Iceberg for enterprise reporting and commercial insights.
Collaborate with Databricks teams to enable advanced analytics and machine learning use cases.
Integrate NiFi pipelines with orchestration tools (Airflow, Oozie, or AWS / Azure / GCP schedulers).
Performance, Security & Governance
Tune NiFi flows and Snowflake / Databricks ingestion for performance and cost efficiency.
Implement role-based security and ensure compliance (HIPAA, GDPR, and SOX, where applicable).
Work with governance teams to enable lineage, metadata tracking, and auditability.
Qualifications
Bachelor’s degree in Computer Science, Information Systems, or a related field.
5+ years of data engineering experience, including at least 2 years working with Apache NiFi.
Strong experience with the Cloudera ecosystem (HDFS, Hive, Impala, Spark).
Hands-on expertise with Apache Iceberg (schema evolution, time travel, partitioning, compaction).
Working knowledge of Snowflake and Databricks integration patterns.
Proficiency in SQL and at least one programming language (Python, Java, or Scala).
Understanding of data lakehouse architectures and ETL / ELT best practices.
Data Engineer • Dombivli, India