We are seeking a highly skilled and motivated Data Engineer to join our team. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data pipelines and data warehouse solutions using a modern, cloud-native data stack. You'll play a crucial role in transforming raw data into actionable insights, ensuring data quality, and maintaining the infrastructure required for seamless data flow.
Key Responsibilities
Design, build, test, and maintain robust, scalable, large-scale ETL pipelines using PySpark for processing and Apache Airflow for workflow orchestration.
Design and implement both batch and streaming ETL processes to handle varied data ingestion requirements.
Build and optimize data structures and schemas in cloud data warehouses such as AWS Redshift.
Work extensively with AWS data services, including AWS EMR for big data processing, AWS Glue for serverless ETL, and Amazon S3 for data storage.
Implement and manage real-time data ingestion pipelines using technologies like Kafka and Debezium for Change Data Capture (CDC).
Interact with and integrate data from various relational and NoSQL databases such as MySQL, PostgreSQL, and MongoDB.
Monitor, troubleshoot, and optimize data pipeline performance and reliability.
Collaborate with data scientists, analysts, and other engineering teams to understand data needs and deliver high-quality, reliable data solutions.
Ensure data governance, security, and quality across all data platforms.
Required Skills & Qualifications
Technical Skills
Expert proficiency in developing ETL/ELT solutions using PySpark.
Strong experience with workflow management and scheduling tools, specifically Apache Airflow.
In-depth knowledge of AWS data services, including:
AWS EMR (Elastic MapReduce)
AWS Glue
AWS Redshift
Amazon S3
Proven experience implementing and managing data streams using Kafka.
Familiarity with Change Data Capture (CDC) tools like Debezium.
Hands-on experience with diverse database technologies: MySQL, PostgreSQL, and MongoDB.
Solid understanding of data warehousing concepts, dimensional modeling, and best practices for both batch and real-time data processing.
Proficiency in a scripting language, preferably Python.
General Qualifications
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Excellent problem-solving, analytical, and communication skills.
Ability to work independently and collaboratively in a fast-paced, dynamic environment.
Nice to Have (Preferred Skills)
Experience with Infrastructure as Code (e.g., Terraform, CloudFormation).
Knowledge of containerization technologies (Docker, Kubernetes).
Familiarity with CI/CD pipelines.
Data Engineer • Tirupur, Tamil Nadu, India