We are looking for an experienced Data Scientist to join us for a 6-month engagement. The ideal candidate brings expertise in data engineering, time series forecasting, machine learning frameworks, and large-scale data processing, and will collaborate with internal teams to design, develop, and deploy data science solutions for business-critical use cases.
Job Description:
- Develop, optimize, and maintain scalable data pipelines and workflows using Python, SQL, and PySpark.
- Manage and configure Databricks workspaces, clusters, and notebooks.
- Implement Delta Lake-based solutions for data versioning, ACID transactions, and time travel.
- Perform advanced time series forecasting, anomaly detection, and model development, with experiment and model tracking via MLflow.
- Apply strong feature engineering and machine learning techniques to real-world datasets.
- Ensure data quality and validation across different stages of the pipeline.
- Build reusable, modular, and configuration-driven code using OOP principles.
- Write unit tests, integration tests, and validations to ensure robust pipeline execution.
- Collaborate with cross-functional teams to identify use cases and deliver scalable solutions.
- Maintain clear communication with stakeholders through insights and technical presentations.
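To make the feature engineering and data quality bullets above concrete, here is a minimal, self-contained sketch of lag-based feature construction for a forecasting model, followed by a simple null-check validation gate. All function names, column names, and the sample data are illustrative assumptions, not part of this posting:

```python
def lag_features(series, lags=(1, 7)):
    """Build lagged copies of a time series as model features.

    Returns one row per timestep for which every requested lag exists,
    with the current value stored under "y" as the forecasting target.
    """
    rows = []
    max_lag = max(lags)
    for t in range(max_lag, len(series)):
        rows.append({f"lag_{k}": series[t - k] for k in lags} | {"y": series[t]})
    return rows

def validate_no_nulls(rows):
    """Simple data-quality gate: fail fast if any feature value is missing."""
    for i, row in enumerate(rows):
        for key, value in row.items():
            if value is None:
                raise ValueError(f"null in row {i}, column {key}")
    return True

# Illustrative daily sales series.
sales = [10, 12, 11, 13, 14, 15, 16, 18]
features = lag_features(sales, lags=(1, 2))
validate_no_nulls(features)
# features[0] -> {"lag_1": 12, "lag_2": 10, "y": 11}
```

In a production pipeline these steps would typically run over Spark DataFrames rather than Python lists, with the validation wired in as a stage gate, but the shape of the logic is the same.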
Mandatory Skills: Python, SQL, Git / GitHub, Spark / PySpark, Databricks, Delta Lake, ETL / ELT, YAML, MLflow, Feature Engineering, Time Series Analysis, Data Quality & Validation, OOP, Testing.
Preferred Skills: Docker, Monitoring & Logging, Problem-Solving, Collaboration, Communication, Ownership, Adaptability, Continuous Learning.
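The "reusable, modular, and configuration-driven code using OOP principles" responsibility can be sketched as a small step registry driven by a config. The config is shown here as a plain dict; in this role it would typically be loaded from a YAML file (e.g. with `yaml.safe_load`). All step names and parameters are hypothetical:

```python
class Step:
    """Base class for a configurable, reusable pipeline step."""
    def __init__(self, params):
        self.params = params

    def run(self, data):
        raise NotImplementedError

class Scale(Step):
    """Multiply every value by a configurable factor."""
    def run(self, data):
        factor = self.params.get("factor", 1.0)
        return [x * factor for x in data]

class Clip(Step):
    """Clamp every value into a configurable [low, high] range."""
    def run(self, data):
        lo, hi = self.params["low"], self.params["high"]
        return [min(max(x, lo), hi) for x in data]

# Registry maps config names to step classes, so new steps are added
# without touching the pipeline driver.
REGISTRY = {"scale": Scale, "clip": Clip}

def build_pipeline(config):
    """Instantiate steps from a config (in practice parsed from YAML)."""
    return [REGISTRY[c["name"]](c.get("params", {})) for c in config["steps"]]

config = {"steps": [
    {"name": "scale", "params": {"factor": 2.0}},
    {"name": "clip", "params": {"low": 0.0, "high": 5.0}},
]}

data = [1.0, 3.0, -2.0]
for step in build_pipeline(config):
    data = step.run(data)
# data -> [2.0, 5.0, 0.0]
```

The design choice here is that behavior changes (reordering steps, tuning parameters) live entirely in configuration, which keeps the code reusable and makes each step independently unit-testable.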