About the Role :
We are looking for an experienced Data Engineer with strong hands-on expertise in building, optimizing, and maintaining large-scale data processing systems. The ideal candidate will have deep technical knowledge of Apache Spark, data lakehouse technologies, and streaming platforms, along with solid programming experience and exposure to modern orchestration and deployment tools.
Key Responsibilities :
- Design, build, and optimize scalable data pipelines for batch and streaming data using Apache Spark and Hive.
- Develop and maintain robust ETL / ELT frameworks for data ingestion, transformation, and integration across multiple data sources.
- Implement data lakehouse architectures using Iceberg, Delta Lake, or Apache Hudi to support advanced analytics and reporting.
- Work with streaming technologies like Kafka and Solace for real-time data processing and ingestion.
- Collaborate with cross-functional teams including Data Scientists, Analysts, and Application Developers to deliver reliable data solutions.
- Automate and orchestrate workflows using Airflow or Control-M and integrate SQL transformations using DBT (preferred).
- Ensure high performance, scalability, and reliability of data systems while adhering to data governance and quality standards.
- Deploy, monitor, and manage data applications within containerized environments using Docker and Kubernetes.
- Optimize data storage, access, and retrieval from object stores such as S3 or MinIO.
- Contribute to design reviews, code reviews, and continuous improvement of data engineering best practices.
Required Skills and Qualifications :
- 6-9 years of hands-on experience in data engineering and large-scale distributed systems.
- Strong expertise in Apache Spark (both batch and streaming).
- Proficiency in Python, Scala, or Java for building data pipelines.
- Experience with Hive and data transformation frameworks such as DBT (preferred).
- Solid understanding of SQL and performance tuning for large datasets.
- Experience working with Kafka, Solace, and object storage systems (S3, MinIO).
- Familiarity with Docker and Kubernetes for application deployment and container orchestration.
- Hands-on experience with data lakehouse formats such as Iceberg, Delta Lake, or Hudi.
- Knowledge of workflow orchestration tools (Airflow, Control-M) and CI / CD concepts.
- Excellent problem-solving, debugging, and analytical skills.
- Strong communication and collaboration abilities in a cross-functional environment.
Preferred Qualifications :
- Experience in cloud environments (AWS, Azure, or GCP).
- Understanding of data security, governance, and compliance frameworks.
- Exposure to data modeling, metadata management, and data catalog tools.
- Certification in Big Data or Cloud Platforms is a plus.