Responsibilities:
- Lead, manage, and mentor a high-performing team of data engineers
- Design, develop, and implement data pipelines, ETL processes, and data integration solutions
- Take ownership of data pipeline projects from inception to deployment, managing scope, timelines, and risks
- Develop and maintain data models for biopharma scientific data, data dictionaries, and other documentation to ensure data accuracy and consistency
- Optimize large datasets for query performance
- Collaborate with global multi-functional teams, including research scientists, to understand data requirements and design solutions that meet business needs
- Implement data security and privacy measures to protect sensitive data
- Leverage cloud platforms (AWS preferred) to build scalable and efficient data solutions
- Collaborate with Data Architects, Business SMEs, Software Engineers, and Data Scientists to design and develop end-to-end data pipelines that meet fast-paced business needs across geographic regions
- Identify and resolve data-related challenges
- Adhere to best practices for coding, testing, and designing reusable code/components
- Explore new tools and technologies that will help improve ETL platform performance
- Participate in sprint planning meetings and provide estimates for technical implementation
Basic Qualifications:
- Doctorate degree OR
- Master's degree with 4-6 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field OR
- Bachelor's degree with 6-8 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field OR
- Diploma with 10-12 years of experience in Computer Science, IT, Computational Chemistry, Computational Biology/Bioinformatics, or a related field
Preferred Qualifications:
- 3+ years of experience implementing and supporting biopharma scientific research data analytics software platforms
Functional Skills:
Must-Have Skills:
- Proficiency in SQL and Python for data engineering, test automation frameworks (pytest), and scripting tasks
- Hands-on experience with big data technologies and platforms such as Databricks and Apache Spark (PySpark, SparkSQL), workflow orchestration, and performance tuning of big data processing
- Excellent problem-solving skills and the ability to work with large, complex datasets
- Ability to engage with business collaborators and mentor the team in developing data pipelines and data models
Good-to-Have Skills:
- A passion for tackling complex challenges in drug discovery with technology and data
- Good understanding of data modeling, data warehousing, and data integration concepts
- Solid experience with RDBMS (e.g., Oracle, MySQL, SQL Server, PostgreSQL)
- Knowledge of cloud data platforms (AWS preferred)
- Experience with data visualization tools (e.g., Dash, Plotly, Spotfire)
- Experience with diagramming and collaboration tools such as Miro or Lucidchart for process mapping and brainstorming
- Experience writing and maintaining technical documentation in Confluence
- Understanding of data governance frameworks, tools, and best practices
Professional Certifications:
- Databricks Certified Data Engineer Professional preferred
Soft Skills:
- Excellent critical-thinking and problem-solving skills
- Good communication and collaboration skills
- Demonstrated ability to work effectively in a team setting
- Demonstrated presentation skills
Skills Required:
Data Engineering, Data Modeling, Databricks, SQL, Python, AWS