Job Title : Data Engineer
Experience : 2-4 years
Key Responsibilities :
- Architect and lead the development of scalable, reusable Python-based ETL pipelines to ingest, process, and manage data from complex enterprise systems such as SAP, OCPLM, and others.
- Drive data transformation strategies, ensuring raw data is cleaned, enriched, and optimized for downstream analytics, machine learning models, and visualization platforms.
- Collaborate cross-functionally with data scientists, analysts, and supply chain stakeholders to enable predictive and prescriptive analytics solutions that inform critical business decisions.
- Mentor and guide junior data engineers, promoting best practices in data pipeline development, automation, and code quality across the team.
- Lead initiatives in data automation, leveraging Python, cloud-native orchestration tools, and CI/CD practices to enhance efficiency and reliability.
- Oversee cloud data engineering efforts, particularly within Azure Data Lake and Databricks environments, ensuring solutions are secure, cost-effective, and scalable.
- Establish and enforce data governance standards, including data integrity, quality assurance, lineage tracking, and documentation throughout the pipeline lifecycle.
- Stay abreast of advancements in AI/ML, providing structured, production-ready data to accelerate model development and deployment in supply chain contexts.
- Influence architectural decisions, tool selection, and long-term data strategy in alignment with organizational goals.
Required Skills & Qualifications :
- 2-4 years of experience in data engineering or analytical programming roles.
- Strong programming skills in Python for data transformation and automation.
- Good command of SQL for querying and joining large datasets.
- Familiarity with Azure Data Lake, Databricks, or similar cloud platforms.
- Solid analytical thinking and problem-solving ability when working with complex, real-world datasets.
- Understanding of ML concepts such as regression, classification, and forecasting (hands-on experience not mandatory).
- Ability to understand enterprise data structures from systems like SAP or OCPLM, even without direct system access.
Tools & Technologies :
- Python, Pandas, NumPy, SQL
- Azure Data Lake, Databricks
- ML libraries : scikit-learn, XGBoost (as needed)
(ref : hirist.tech)