We are seeking a highly skilled and motivated Data Engineer to join our team. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data pipelines and data warehouse solutions, utilizing a modern, cloud-native data stack. You'll play a crucial role in transforming raw data into actionable insights, ensuring data quality, and maintaining the infrastructure required for seamless data flow.
Key Responsibilities
Develop, construct, test, and maintain robust, scalable, large-scale ETL pipelines using PySpark for processing and Apache Airflow for workflow orchestration (a brief illustrative sketch follows this list).
Design and implement both batch and streaming ETL processes to handle various data ingestion requirements.
Build and optimize data structures and schemas in cloud data warehouses like AWS Redshift.
Work extensively with AWS data services, including AWS EMR for big data processing, AWS Glue for serverless ETL, and Amazon S3 for data storage.
Implement and manage real-time data ingestion pipelines using technologies like Kafka and Debezium for Change Data Capture (CDC).
Interact with and integrate data from various relational and NoSQL databases such as MySQL, PostgreSQL, and MongoDB.
Monitor, troubleshoot, and optimize data pipeline performance and reliability.
Collaborate with data scientists, analysts, and other engineering teams to understand data needs and deliver high-quality, reliable data solutions.
Ensure data governance, security, and quality across all data platforms.
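For illustration only, not a requirement of the posting: a minimal sketch of the kind of pipeline this role owns, assuming Airflow with the apache-spark provider installed. The DAG name, schedule, connection ID, and script path are hypothetical placeholders.

```python
# Illustrative sketch only: a daily Airflow DAG that submits one PySpark job.
# DAG id, schedule, conn_id, and the application path are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_batch_etl",        # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # spark-submit the PySpark script through the configured Spark connection.
    run_etl = SparkSubmitOperator(
        task_id="run_pyspark_etl",
        application="/opt/jobs/etl_job.py",  # hypothetical PySpark script path
        conn_id="spark_default",
    )
```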
Required Skills & Qualifications
Technical Skills
Expert proficiency in developing ETL/ELT solutions using PySpark.
Strong experience with workflow management and scheduling tools, specifically Apache Airflow.
In-depth knowledge of AWS data services, including AWS EMR (Elastic MapReduce), AWS Glue, AWS Redshift, and Amazon S3.
Proven experience implementing and managing data streams using Kafka (see the streaming sketch after this list).
Familiarity with Change Data Capture (CDC) tools like Debezium.
Hands-on experience with diverse database technologies: MySQL, PostgreSQL, and MongoDB.
Solid understanding of data warehousing concepts, dimensional modeling, and best practices for both batch and real-time data processing.
Proficiency in a scripting language, preferably Python.
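Again for illustration only: a minimal PySpark Structured Streaming sketch that consumes Debezium-style change events from a Kafka topic, assuming the spark-sql-kafka connector is on the classpath. Broker address, topic name, and checkpoint path are hypothetical.

```python
# Illustrative sketch only: stream Debezium CDC events from Kafka with PySpark.
# Requires the spark-sql-kafka connector; all names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("cdc_stream_sketch").getOrCreate()

# Each Kafka record's value carries a JSON change event emitted by Debezium.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")    # hypothetical broker
    .option("subscribe", "dbserver1.inventory.orders")   # hypothetical topic
    .load()
    .select(col("value").cast("string").alias("event_json"))
)

# Print raw events for demonstration; a real pipeline would parse the JSON
# payload and upsert the changes into the warehouse.
query = (
    events.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/cdc_checkpoint")  # hypothetical path
    .start()
)
query.awaitTermination()
```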
General Qualifications
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Excellent problem-solving, analytical, and communication skills.
Ability to work independently and collaboratively in a fast-paced, dynamic environment.
Nice to Have (Preferred Skills)
Experience with Infrastructure as Code (e.g., Terraform, CloudFormation).
Knowledge of containerization technologies (Docker, Kubernetes).
Familiarity with CI/CD pipelines.
Data Engineer • Dehra Dun, Uttarakhand, India