Talent.com
Freelance Data Engineer
upGrad · Ludhiana, Punjab, India
5 days ago
Job description

We are seeking a highly skilled and motivated Data Engineer to join our team. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data pipelines and data warehouse solutions using a modern, cloud-native data stack. You'll play a crucial role in transforming raw data into actionable insights, ensuring data quality, and maintaining the infrastructure required for seamless data flow.

Key Responsibilities

- Develop, construct, test, and maintain robust, scalable, large-scale ETL pipelines using PySpark for processing and Apache Airflow for workflow orchestration.
- Design and implement both batch ETL and streaming ETL processes to handle varied data ingestion requirements.
- Build and optimize data structures and schemas in cloud data warehouses such as AWS Redshift.
- Work extensively with AWS data services, including AWS EMR for big data processing, AWS Glue for serverless ETL, and Amazon S3 for data storage.
- Implement and manage real-time data ingestion pipelines using technologies such as Kafka and Debezium for Change Data Capture (CDC).
- Interact with and integrate data from various relational and NoSQL databases, such as MySQL, PostgreSQL, and MongoDB.
- Monitor, troubleshoot, and optimize data pipeline performance and reliability.
- Collaborate with data scientists, analysts, and other engineering teams to understand data needs and deliver high-quality, reliable data solutions.
- Ensure data governance, security, and quality across all data platforms.
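Several of the responsibilities above centre on CDC pipelines built with Kafka and Debezium. As a rough illustration of the merge step such a pipeline performs downstream, here is a minimal pure-Python sketch (not PySpark) that applies Debezium-style change events to an in-memory table keyed by primary key. The event shape and field names are simplified assumptions, not the full Debezium envelope.

```python
# Sketch of CDC apply logic: Debezium emits change events with an "op"
# field (c=create, u=update, d=delete, r=snapshot read) plus "before" and
# "after" row payloads; a downstream consumer merges them into a target
# table. Here the table is just a dict keyed by primary key -- illustrative
# only, not a production sink.

def apply_cdc_event(table: dict, event: dict) -> dict:
    """Apply one Debezium-style change event to `table` (pk -> row)."""
    op = event["op"]
    if op in ("c", "u", "r"):          # upsert the new row image
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":                    # delete: drop the old row image
        table.pop(event["before"]["id"], None)
    return table

events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "shipped"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

state: dict = {}
for e in events:
    apply_cdc_event(state, e)

print(state)  # one row remains: id 1, now "shipped"; id 2 was deleted
```

In a real pipeline the same upsert/delete semantics would typically be expressed as a streaming merge in PySpark or in the warehouse, rather than in application code.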

Required Skills & Qualifications

Technical Skills

- Expert proficiency in developing ETL/ELT solutions using PySpark.
- Strong experience with workflow management and scheduling tools, specifically Apache Airflow.
- In-depth knowledge of AWS data services, including AWS EMR (Elastic MapReduce), AWS Glue, AWS Redshift, and Amazon S3.
- Proven experience implementing and managing data streams using Kafka.
- Familiarity with Change Data Capture (CDC) tools such as Debezium.
- Hands-on experience with diverse database technologies: MySQL, PostgreSQL, and MongoDB.
- Solid understanding of data warehousing concepts, dimensional modeling, and best practices for both batch and real-time data processing.
- Proficiency in a scripting language, preferably Python.

General Qualifications

- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Excellent problem-solving, analytical, and communication skills.
- Ability to work independently and collaboratively in a fast-paced, dynamic environment.

Nice to Have (Preferred Skills)

- Experience with Infrastructure as Code tools (e.g., Terraform, CloudFormation).
- Knowledge of containerization technologies (Docker, Kubernetes).
- Familiarity with CI/CD pipelines.
