Databricks SQL Engineer (Pharma Domain) - Contract (6 Months)
Location : Mumbai / Pune / Bangalore / Chennai / Ahmedabad / Noida, India (Chennai Most Preferred)
Experience : 5-10 Years
Shift Timings : Day shift IST, starting from 12:00 PM IST
Work Mode : Contract (Fixed Term Contract)
Duration : 6 Months
Job Summary :
We are seeking an experienced and highly skilled Databricks SQL Engineer with a strong understanding of the Pharmaceutical or Life Sciences domain to join our offshore Data Engineering team on a 6-month contract. In this role, you will be pivotal in designing, building, and optimizing efficient, scalable SQL-based data models and pipelines within the Databricks ecosystem, leveraging Databricks SQL, Spark SQL, and Delta Lake. You will be responsible for transforming raw data into valuable analytical insights, directly supporting critical decision-making processes across various pharma-related business functions. Your domain expertise will be crucial in understanding and addressing the specific data challenges and requirements of the pharmaceutical industry.
Responsibilities :
- Data Modeling & Schema Design : Design and implement efficient and scalable data models within Databricks SQL and Delta Lake, optimized for analytical workloads specific to the pharmaceutical domain (e.g., clinical trial data, drug development data, sales & marketing data, supply chain data, regulatory data).
- ETL / ELT Pipeline Development : Develop and maintain robust ETL / ELT pipelines using Databricks SQL and Spark SQL to ingest, transform, cleanse, and load data from various source systems relevant to the pharma industry (e.g., transactional databases, data lakes, external data providers, APIs).
- Performance Optimization : Optimize Databricks SQL queries and Spark SQL jobs for performance and scalability to ensure efficient data processing and query execution on large datasets.
- Data Quality & Governance : Implement data quality checks and validation rules within the data pipelines to ensure the accuracy, consistency, and reliability of data, adhering to pharma-specific data governance standards.
- Delta Lake Implementation & Management : Leverage the features of Delta Lake for data reliability, versioning, and ACID properties within the Databricks environment.
- Collaboration with Data Scientists & Analysts : Collaborate closely with Data Scientists and Business Analysts to understand their analytical requirements and translate them into efficient SQL-based data solutions.
- Pharma Domain Expertise : Apply your understanding of pharmaceutical or life sciences business processes, data nuances, and regulatory requirements to design relevant and insightful data models and pipelines.
- Documentation : Create and maintain clear and comprehensive technical documentation for data models, ETL / ELT pipelines, and data transformations.
- Troubleshooting & Support : Identify, troubleshoot, and resolve data-related issues and performance bottlenecks within the Databricks environment.
- Adherence to Best Practices : Follow data engineering best practices for coding standards, data governance, security, and performance.
- Integration with Downstream Systems : Ensure seamless integration of processed data with downstream analytical tools and reporting platforms used within the pharma organization.
Technical Skills & Qualifications :
- 5-10 years of relevant experience in Data Engineering.
- Strong proficiency in Databricks SQL and Spark SQL for data manipulation, transformation, and querying.
- Hands-on experience with Delta Lake and its features (ACID transactions, time travel, schema evolution).
- Solid understanding of data warehousing concepts, dimensional modeling, and schema design principles.
- Experience in building and optimizing ETL / ELT pipelines for large-scale data processing.
- Strong SQL skills with experience in writing complex queries and performance tuning.
- Proven experience working with large datasets and distributed computing environments (Spark).
- Understanding of data quality principles and implementation of data validation processes.
- Excellent analytical and problem-solving skills.
- Strong communication (written and verbal) and collaboration skills.
- Willingness to work in the Day shift IST, starting from 12:00 PM IST.
Preferred Skills :
- Strong understanding of the Pharmaceutical or Life Sciences domain, including familiarity with relevant data types (clinical trial data, drug discovery data, sales data, supply chain data).
- Experience with other data engineering tools and technologies within the Azure Databricks ecosystem (Databricks Workflows, Databricks Delta Live Tables).
- Experience with data integration tools beyond SQL.
- Familiarity with cloud data warehousing solutions (Azure Synapse Analytics, Snowflake).
- Experience with scripting languages like Python for data engineering tasks.
- Knowledge of data governance frameworks and tools.
- Experience with visualization tools (Power BI, Tableau) and their integration with Databricks.
- Understanding of data security and compliance requirements within the pharmaceutical industry.
(ref : hirist.tech)