Key Responsibilities
Data Pipeline Development & Optimization
- Collaborate with business and data stakeholders to gather data requirements and ensure compliance with data governance policies
- Develop and maintain CI/CD-enabled ETL pipelines using Databricks and AWS services such as S3, Glue, Lambda, and EMR
- Optimize data pipelines for performance, scalability, and cost-efficiency
- Troubleshoot pipeline failures and resolve bottlenecks to maintain data pipeline health
Documentation & Data Governance
- Maintain documentation on data definitions, lineage, quality rules, and transformation logic
- Ensure data privacy and audit readiness by adhering to data governance standards
- Support data quality assessment, cleansing, and standardization initiatives
Validation & Compliance
- Assist in validation and compliance readiness, including development of IQ/OQ/PQ protocols and validation plans
- Ensure compliance with GxP, 21 CFR Part 11, and Annex 11 guidelines
- Configure and maintain audit trails, user access controls, and role-based privileges
Change Management & Support
- Support change control activities following ITIL standards, ensuring traceability of system updates and patches
- Use Jira and ServiceNow for managing project tasks, incidents, and service requests
Agile & Collaboration
- Participate in Agile/SAFe ceremonies, including PI Planning, backlog grooming, and sprint planning
- Provide story estimates and collaborate with Scrum teams to deliver user stories
- Practice Agile methodologies such as Kanban and Lean within product development teams
Cloud Integration & Automation
- Automate and optimize data frameworks and development processes for cost efficiency
- Collaborate on clinical system integrations and REST API development (MuleSoft, Python)
- Design secure, scalable, and cost-optimized AWS-based solutions using tools such as AWS Cost Explorer
Qualifications
Basic Qualifications
- Bachelor's degree
- 2 to 6 years of relevant experience
Must-Have Skills
- Ability to analyze complex data and identify and remediate issues
- Scripting skills in PowerShell or Python
- Proficiency in Databricks and data lake technologies
- Experience with RESTful APIs and MuleSoft integration
- Familiarity with clinical system integrations
- Strong communication and stakeholder management skills
- Agile methodology experience with Jira and Confluence
- Understanding of cost optimization on AWS
Good-to-Have Skills
- IS security awareness
- Experience with Agile project methodologies
- Familiarity with AWS, Azure, or GCP and their IAM services
- Knowledge of IGA, RBAC, and incident response in identity management
Professional Certifications
- Microsoft, AWS, or GCP cloud certification (preferred)
- Security certification (preferred)
- Microsoft Azure certification (preferred)
Skills Required
Databricks, AWS, S3, Glue, Lambda, EMR