We are seeking an experienced Full Stack Data Engineer with 5–6 years of industry experience. The ideal candidate will have a proven track record of working on live projects, preferably within the manufacturing or energy sectors. The candidate will play a key role in developing and maintaining scalable data solutions using PySpark, SQL, and modern data engineering frameworks.
Key Responsibilities
- Develop and deploy end-to-end data pipelines and solutions integrating with various data sources and systems.
- Collaborate with cross-functional teams to understand data requirements and deliver effective BI and analytical solutions.
- Implement data ingestion, transformation, and processing workflows using Spark (PySpark / Scala) and SQL.
- Develop and maintain data models and ETL / ELT processes, ensuring high performance, scalability, reliability, and data quality.
- Build and maintain APIs and data services to support analytics, reporting, and application integration.
- Ensure data quality, integrity, and security across all stages of the data lifecycle.
- Monitor, troubleshoot, and optimize pipeline performance in a cloud-based environment.
- Write clean, modular, and well-documented Python / Scala / SQL / PySpark code.
- Integrate data from various sources including APIs, relational / non-relational databases, IoT devices, and external providers.
- Ensure adherence to data governance, security, and compliance policies.
Required Skills & Experience
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
- 5–6 years of hands-on experience in Data Engineering, with a strong focus on Apache Spark (PySpark).
- Strong programming skills in Python / PySpark and / or Scala, with deep understanding of Apache Spark.
- Strong SQL skills for data manipulation, analysis, and performance tuning.
- Strong understanding of data architecture, data modeling, ETL / ELT processes, and data warehousing concepts.
- Experience building and maintaining ETL / ELT pipelines in production environments.
- Experience working with structured and unstructured data, including JSON, Parquet, Avro, and time-series data.
- Familiarity with cloud-based data platforms (Azure / AWS / GCP preferred).
- Familiarity with CI / CD pipelines and tools like Azure DevOps, Git, and DevOps practices for data engineering.
- Excellent problem-solving skills, attention to detail, and ability to work independently or as part of a team.
- Strong communication skills for interaction with technical and non-technical stakeholders.