Job Title: Senior Data Engineer
Experience: 5–8 years
Industry: Pharmaceutical / Biotechnology
Location: Bangalore
Employment Type: Full-Time
Overview
Our client – one of the largest data science companies – is seeking to hire a Senior Data Engineer with 5–8 years of hands-on experience in designing, building, and maintaining scalable, production-grade data platforms and pipelines. The ideal candidate will have a proven track record of delivering robust data solutions on cloud-native ecosystems, with strong client-facing engagement skills to gather requirements, present technical solutions, and drive stakeholder alignment. Demonstrated expertise in the end-to-end data engineering lifecycle, cloud infrastructure, and DevOps practices, along with clear communication with both business and technical audiences, is essential. Experience in pharmaceutical, biotechnology, or medical device data environments is a strong advantage but not mandatory.
The ideal candidate will embrace the Company’s Decision Sciences Lifecycle and ways of working, acting as a trusted client-facing partner to enable long-term business, financial, and operational outcomes. They will design, implement, and support secure, performant data solutions using modern cloud tools, CI / CD pipelines, and a global delivery model. The role involves requirements elicitation, solution architecture presentation, pipeline development, performance optimization, and structured problem-solving in client engagements. Candidates must deliver high-quality, monitored, and well-documented solutions while leading offshore / nearshore team execution through mentoring, code reviews, and day-to-day delivery coordination.
Candidates must have hands-on, recent experience and strong proficiency in the following core data engineering domains:
Cloud-Native Data Pipeline Development & Orchestration
- Design and development of scalable ETL / ELT pipelines using Apache Spark (Databricks, Azure Synapse, AWS EMR), Apache Airflow, dbt, Prefect, or equivalent orchestration platforms
- Hands-on implementation of batch and streaming ingestion from diverse sources (databases, APIs, flat files, message queues, SaaS platforms, etc.)
- Advanced proficiency in Python (PySpark, pandas), SQL, and version-controlled pipeline development (Git) with CI / CD integration (GitHub Actions, Azure DevOps, Jenkins)
- Expertise with lakehouse formats (Delta Lake, Apache Iceberg, Hudi), schema evolution, partitioning, incremental processing, and performance tuning
- Implementation of data quality frameworks (Great Expectations, Monte Carlo, Deequ), lineage tracking, and automated testing
- Client-facing experience translating business requirements into technical designs and presenting pipeline architecture, runbooks, and monitoring dashboards
Cloud Infrastructure Management & Data Platform Engineering
- Deep hands-on experience with at least one major cloud provider (AWS, Azure, or GCP) using core data services: S3 / ADLS Gen2 / GCS, Glue / Athena, Snowflake, Databricks, Synapse, Redshift / Serverless
- Infrastructure-as-Code proficiency using Terraform, CloudFormation, or Azure ARM / Bicep for provisioning data lakes, warehouses, networking, IAM, encryption, and logging
- Implementation of security and governance controls (VPC / peering, KMS encryption, access policies, audit trails) suitable for sensitive data environments
- Setup of observability, alerting, and cost governance using CloudWatch, Azure Monitor, Prometheus + Grafana, or equivalent
- Containerization (Docker) and orchestration (Kubernetes, ECS / EKS / AKS / GKE) for data services when required
- Ability to explain infrastructure decisions and cost implications to non-technical client stakeholders
Additional Competencies Considered Strong Advantages
- Experience handling pharma, biotech, or medical device data (clinical, RWE, safety, manufacturing, IoT / device telemetry)
- Exposure to Agentic AI or LLM-powered implementations in data engineering workflows (e.g., automated metadata enrichment, smart data cataloging, self-healing pipelines)
- LLM modeling and GenAI capability (e.g., prompt engineering for data transformation, RAG patterns for documentation, or GenAI-assisted code generation)
- Familiarity with Agile methodologies, JIRA, Confluence, and leading client-facing sprint ceremonies