Role: Data Engineer
Type: Permanent (no third-party payroll)
Location: Pune, Hyderabad
Joining: 0 to 30 days / immediate joiners
Job Description:
We are seeking a skilled and motivated Data Engineer to design, build, and maintain robust data pipelines and infrastructure using Python, SQL, GCP, and AWS. You will play a critical role in enabling data-driven decision-making by ensuring data is accessible, reliable, and well-structured for analytics, reporting, and machine learning use cases.
Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL/ELT workflows using Python and SQL (see the sketch after this list).
- Build and optimize data models, data marts, and data lakes on cloud platforms (GCP and AWS).
- Work with tools and services such as BigQuery, Cloud Storage, Pub/Sub, Lambda, S3, Glue, and Redshift to manage end-to-end data workflows.
- Collaborate with data scientists, analysts, and product teams to understand data requirements and deliver clean, reliable, and timely datasets.
- Implement data quality, validation, and governance frameworks to ensure the accuracy and integrity of data.
- Monitor pipeline performance, troubleshoot issues, and optimize for speed and cost-efficiency.
- Set up infrastructure-as-code (IaC) where appropriate using tools such as Terraform or CloudFormation (optional, but a plus).
- Ensure data security and compliance with organizational and regulatory policies (e.g., GDPR, HIPAA).
- Maintain documentation of data processes, systems, and flows.
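
To give candidates a concrete flavor of the pipeline work in the first bullet, below is a minimal sketch of a daily ETL DAG using Apache Airflow's TaskFlow API (assumes Airflow 2.4+) together with pandas. The file paths, column names, and transform logic are hypothetical placeholders, not details of this role's actual systems:

from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_sales_etl():
    @task
    def extract() -> str:
        # Hypothetical source; in practice this step would pull from S3 or GCS.
        raw_path = "/tmp/raw_sales.csv"
        pd.DataFrame({"order_id": [1, 2], "amount": [120.0, -5.0]}).to_csv(
            raw_path, index=False
        )
        return raw_path

    @task
    def transform(raw_path: str) -> str:
        df = pd.read_csv(raw_path)
        df = df[df["amount"] > 0]  # simple data-quality gate: drop invalid rows
        clean_path = "/tmp/clean_sales.parquet"
        df.to_parquet(clean_path, index=False)  # requires pyarrow
        return clean_path

    @task
    def load(clean_path: str) -> None:
        # Hypothetical sink; in practice this step would load to BigQuery or Redshift.
        print(f"would load {clean_path} into the warehouse")

    load(transform(extract()))


daily_sales_etl()

In production the extract and load steps would read from and write to the cloud services named above (e.g., S3 or Cloud Storage, BigQuery or Redshift) rather than local files.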
Requirements:
- Strong proficiency in Python for data engineering (e.g., pandas, PySpark, Airflow).
- Advanced knowledge of SQL: analytical queries, optimization, indexing, joins, and CTEs (see the sketch below).
- Hands-on experience with cloud-native data services in GCP (BigQuery, Dataflow, Cloud Functions, Pub/Sub) and AWS (Redshift, S3, Glue, Lambda).
- Experience with data orchestration tools (e.g., Apache Airflow, Cloud Composer).
- Familiarity with CI/CD and version control systems (e.g., GitHub, GitLab, Jenkins).
- Experience with data warehousing concepts, dimensional modeling, and data lake architectures.
- Strong understanding of data security and governance best practices.
- Excellent problem-solving and communication skills.
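
As one illustration of the SQL depth expected (CTEs, window functions) together with the GCP tooling listed above, the following sketch runs an analytical query through the google-cloud-bigquery Python client. The project, dataset, table, and column names are assumptions for illustration only:

from google.cloud import bigquery

client = bigquery.Client()  # authenticates via Application Default Credentials

sql = """
WITH daily_revenue AS (
    SELECT
        DATE(order_ts) AS order_date,
        SUM(amount)    AS revenue
    FROM `my-project.sales.orders`  -- hypothetical project/dataset/table
    GROUP BY order_date
)
SELECT
    order_date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS revenue_7d_avg  -- 7-day moving average via a window function
FROM daily_revenue
ORDER BY order_date
"""

for row in client.query(sql).result():
    print(row.order_date, row.revenue, row.revenue_7d_avg)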