Description:
We are looking for an experienced Data Engineer with a strong background in Java and hands-on expertise in Python/PySpark, Big Data technologies, and Google Cloud Platform (GCP). The ideal candidate will design, develop, and maintain large-scale data processing systems while ensuring data quality, scalability, and performance. Exposure to machine learning (ML) pipelines and data model optimization is a strong advantage.
Key Responsibilities:
- Design, develop, and optimize data pipelines and ETL workflows using Java, Python, and PySpark (a minimal PySpark sketch follows this list).
- Work with Big Data frameworks (Hadoop, Spark, Hive, Kafka, etc.) for data ingestion and transformation.
- Implement and manage data solutions on GCP (BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage).
- Collaborate with data scientists and ML engineers to operationalize machine learning models.
- Optimize performance and scalability of distributed data processing systems.
- Ensure data quality, governance, and security throughout the data lifecycle.
- Work with cross-functional teams to translate business requirements into data solutions.
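For illustration, here is a minimal PySpark sketch of the kind of ETL pipeline this role covers: ingest raw files from Cloud Storage, apply a simple transformation, and write partitioned output. The bucket paths, column names, and job name are hypothetical placeholders, and reading gs:// paths assumes a GCS-enabled Spark environment such as Dataproc, where the connector is preinstalled.

```python
# Minimal PySpark ETL sketch: read raw events from Cloud Storage,
# clean and aggregate them, and write partitioned Parquet output.
# All bucket paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Ingest raw CSV files from a (hypothetical) GCS bucket; gs:// access
# assumes the GCS connector is on the classpath, as on Dataproc.
raw = (
    spark.read.option("header", True)
    .csv("gs://example-raw-bucket/events/2024-01-01/*.csv")
)

# Basic cleansing plus a simple daily aggregation.
daily_counts = (
    raw.filter(F.col("event_type").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Persist the transformed output, partitioned for downstream consumers.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "gs://example-curated-bucket/daily_event_counts/"
)

spark.stop()
```

In practice a job like this would be packaged and submitted to a cluster (for example via spark-submit or the Dataproc jobs API) rather than run interactively.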
Required Skills:
- Strong programming skills in Java and Python.
- Expertise in PySpark for distributed data processing.
- Experience with Big Data ecosystems (Hadoop, Spark, Hive, Kafka, etc.).
- Hands-on experience with GCP services (BigQuery, Dataflow, Dataproc, Pub/Sub).
- Good understanding of data warehousing, data modeling, and ETL design patterns.
- Exposure to machine learning concepts and model deployment workflows (see the scoring sketch after this list).
- Proficiency in SQL and working with NoSQL databases.
- Strong problem-solving and analytical skills.
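As a sketch of the model-deployment exposure listed above, the snippet below batch-scores new records with a previously trained and persisted Spark ML pipeline. The model path, input path, and column names are hypothetical assumptions, not details from this posting.

```python
# Illustrative batch-scoring sketch: load a persisted Spark ML
# PipelineModel and apply it to freshly prepared feature data.
# All paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("batch-scoring").getOrCreate()

# Features produced by an upstream ETL job (hypothetical location).
features = spark.read.parquet("gs://example-curated-bucket/features/latest/")

# A pipeline trained and saved earlier by the ML team (hypothetical path).
model = PipelineModel.load("gs://example-models-bucket/churn_model/v1/")

# Score the records and keep only what downstream consumers need.
scored = model.transform(features).select("customer_id", "prediction")

scored.write.mode("overwrite").parquet(
    "gs://example-curated-bucket/scores/churn/"
)

spark.stop()
```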
Good to Have:
- Experience with CI/CD, Airflow, or other orchestration tools (a minimal Airflow sketch follows this list).
- Familiarity with containerization (Docker, Kubernetes).
- Knowledge of other cloud platforms (AWS, Azure) is a plus.
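For the orchestration point above, here is a minimal Airflow sketch that schedules the kind of daily ETL job shown earlier. It assumes Airflow 2.4+ (for the schedule argument); the DAG id, schedule, and script path are illustrative placeholders.

```python
# Minimal Airflow DAG sketch: run a daily PySpark ETL job.
# Assumes Airflow 2.4+; the DAG id, schedule, and script path
# are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit the PySpark job; on GCP this could instead be a
    # DataprocSubmitJobOperator targeting a Dataproc cluster.
    run_etl = BashOperator(
        task_id="run_etl",
        bash_command="spark-submit /opt/jobs/daily_events_etl.py",
    )
```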
Education:
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
(ref: hirist.tech)