Key Responsibilities:
- Dataiku Leadership: Lead data engineering initiatives, leveraging Dataiku's capabilities for data preparation, analysis, visualization, and deployment of data-driven solutions.
- Data Pipeline Development: Design, develop, and optimize scalable, robust data pipelines to support business intelligence and advanced analytics projects, including automation of ETL/ELT processes from diverse data sources.
- Data Modeling & Architecture: Apply best practices in data modeling (dimensional, Kimball, Inmon) to create efficient, scalable database architectures that ensure data integrity and performance.
- ETL/ELT Expertise: Implement, manage, and optimize ETL/ELT workflows using a range of tools to maintain reliable, high-quality data flow and accessibility.
- Gen AI Integration: Explore and implement solutions using LLM Mesh or similar frameworks to integrate Generative AI capabilities into data engineering processes.
- Programming & Scripting: Use Python and SQL extensively for data manipulation, automation, and development of custom data solutions.
- Cloud Platform Deployment: Deploy and manage scalable data solutions on AWS or Azure, leveraging cloud services for performance and cost efficiency.
- Data Quality & Governance: Ensure that integrated data sources remain high-quality, consistent, and accessible; implement and follow data governance best practices.
- Collaboration & Mentorship: Work closely with data scientists, analysts, and other stakeholders to translate data requirements into effective solutions; mentor junior team members as needed.
- Performance Optimization: Continuously monitor and optimize data pipeline and system performance to meet business needs.
Required Skills & Experience:
- Proficiency in Dataiku for data prep, visualization, and building end-to-end data pipelines and applications.
- Strong expertise in data modeling techniques such as dimensional modeling (Kimball, Inmon).
- Extensive experience with ETL/ELT tools and processes (e.g., Dataiku built-in tools, Apache Airflow, Talend, SSIS).
- Familiarity with LLM Mesh or similar Generative AI frameworks.
- Advanced skills in Python programming and SQL querying for data manipulation and automation.
- Hands-on experience with cloud platforms such as AWS or Azure for scalable data deployments.
- Understanding of Generative AI concepts and potential applications.
- Excellent analytical, problem-solving, communication, and interpersonal skills.
Bonus Skills (Nice to Have):
- Experience with big data technologies such as Spark, Hadoop, or Snowflake.
- Knowledge of data governance and security best practices.
- Familiarity with MLOps principles and tools.
- Contributions to open-source projects in data engineering or AI.
Education:
Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related quantitative field.
Skills Required:
Data Warehousing, Big Data, Cloud Computing