Roles & Responsibilities:
- Architect and maintain robust, scalable data pipelines using Databricks, Spark, and Delta Lake for both batch and real-time data processing (a minimal PySpark sketch follows this list).
- Lead technology evaluation and adoption initiatives to improve productivity, scalability, and data delivery.
- Optimize data processing performance through Spark tuning, job scheduling, and efficient resource utilization.
- Develop innovative solutions to enhance data ingestion, transformation, lineage tracking, and observability.
- Build metadata-driven frameworks to promote pipeline consistency and reuse (see the config-driven sketch after this list).
- Promote a culture of engineering excellence, continuous improvement, and experimentation.
- Collaborate with architecture, platform, governance, and analytics teams to support the enterprise data strategy.
- Define and monitor SLOs, KPIs, and data quality metrics for production systems (the config-driven sketch after this list includes a basic quality check).
- Translate business requirements into scalable, governed data products in partnership with stakeholders.
- Mentor and guide engineers to adopt modern engineering tools and practices.
- Work closely with DevOps, architects, and analysts to ensure alignment of engineering strategies with business objectives.
- Stay current on data technology trends and best practices to continually enhance the data platform architecture.
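The pipeline responsibilities above lend themselves to a concrete illustration. Below is a minimal PySpark sketch of a batch write and a Structured Streaming aggregation into Delta tables, of the kind this role would own; all paths, table names, and columns (the S3 bucket, `bronze.orders`, `order_id`) are hypothetical placeholders, not a prescribed design.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

# Batch: read raw JSON, deduplicate, stamp ingestion time, land in Delta.
raw = spark.read.json("s3://example-bucket/raw/orders/")  # hypothetical path
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("ingested_at", F.current_timestamp())
)
cleaned.write.format("delta").mode("append").saveAsTable("bronze.orders")

# Streaming: incrementally aggregate the same table with Structured Streaming.
agg = (
    spark.readStream.format("delta").table("bronze.orders")
         .groupBy("customer_id")
         .agg(F.count("order_id").alias("order_count"))
)
query = (
    agg.writeStream.format("delta")
       .outputMode("complete")
       .option("checkpointLocation", "s3://example-bucket/checkpoints/orders_agg/")
       .toTable("silver.orders_by_customer")
)
```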
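The metadata-driven framework and data quality bullets can likewise be sketched in a few lines: pipeline definitions live in configuration, and one generic function applies the same ingestion logic and a basic quality gate to every table. The table list, key columns, and 1% threshold below are assumed examples only.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical pipeline metadata; in practice this might live in a Delta
# table or a YAML file rather than inline.
PIPELINES = [
    {"source": "s3://example-bucket/raw/orders/",  "target": "bronze.orders",  "key": "order_id"},
    {"source": "s3://example-bucket/raw/returns/", "target": "bronze.returns", "key": "return_id"},
]

def ingest(cfg: dict) -> None:
    """One reusable ingestion step, driven entirely by metadata."""
    df = spark.read.json(cfg["source"]).dropDuplicates([cfg["key"]])

    # A simple data quality metric (null rate on the business key) of the
    # sort that could feed SLO/KPI monitoring for production systems.
    total = df.count()
    nulls = df.filter(F.col(cfg["key"]).isNull()).count()
    null_rate = nulls / total if total else 0.0
    if null_rate > 0.01:  # illustrative threshold
        raise ValueError(f"{cfg['target']}: key null rate {null_rate:.2%} exceeds 1%")

    df.write.format("delta").mode("append").saveAsTable(cfg["target"])

for cfg in PIPELINES:
    ingest(cfg)
```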
Must-Have Skills:
- Strong hands-on experience with Databricks, PySpark, SparkSQL, Apache Spark, AWS, Python, and SQL.
- Deep understanding of workflow orchestration, job performance tuning, and big data processing (a few illustrative tuning settings follow this list).
- Proficient with AWS services relevant to data engineering.
- Knowledge of enterprise-wide data architecture patterns such as Data Fabric or Data Mesh.
- Demonstrated ability to learn and apply new technologies quickly.
- Strong problem-solving, analytical, and teamwork skills.
- Experience with the Scaled Agile Framework (SAFe), Agile delivery, and DevOps practices.
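A few of the levers behind "job performance tuning" are shown below, assuming an active SparkSession named `spark` (predefined on Databricks). The values are illustrative starting points to measure against, not recommendations.

```python
# Common Spark tuning knobs (values are illustrative, not prescriptive).
spark.conf.set("spark.sql.adaptive.enabled", "true")   # adaptive query execution
spark.conf.set("spark.sql.shuffle.partitions", "400")  # size shuffles to data volume
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", # broadcast small join sides
               str(64 * 1024 * 1024))

# Compact small files and co-locate rows on a frequently filtered column;
# OPTIMIZE / ZORDER are Delta Lake commands available on Databricks.
spark.sql("OPTIMIZE bronze.orders ZORDER BY (customer_id)")
```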
Good-to-Have Skills:
- Industry expertise in the biotech or pharmaceutical sectors.
- Experience writing APIs to enable data access for consumers (a minimal endpoint sketch follows this list).
- Familiarity with SQL/NoSQL databases and vector databases for LLM use cases.
- Experience with OLAP and OLTP data modeling and performance tuning.
- Exposure to software engineering best practices, including Git, CI/CD pipelines (e.g., Jenkins, Maven), and DevOps automation.
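For the API item above, here is a minimal sketch of a data access endpoint using FastAPI (one common Python choice, not mandated by this role). The in-memory dict stands in for a real lookup against a curated table; run with `uvicorn module:app` to serve it.

```python
from fastapi import FastAPI

app = FastAPI()

# Hypothetical serving data; in practice this would query a governed
# Delta table (e.g., via Databricks SQL) rather than a dict.
ORDERS = {"o-1001": {"customer_id": "c-42", "status": "shipped"}}

@app.get("/orders/{order_id}")
def get_order(order_id: str) -> dict:
    """Expose curated data to downstream consumers over HTTP."""
    return ORDERS.get(order_id, {"error": "not found"})
```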
Education & Certifications:
- 12 to 17 years of experience in Computer Science, Information Technology, or a related field.
- AWS Certified Data Engineer (preferred)
- Databricks Certification (preferred)
- SAFe Certification (preferred)
Soft Skills:
- Excellent analytical and troubleshooting capabilities.
- Strong written and verbal communication skills.
- Able to work effectively in global, distributed teams.
- Highly self-motivated and proactive.
- Capable of managing multiple priorities simultaneously.
- Strong team player with a focus on collaboration and shared success.
- Quick learner with strong organizational and presentation skills.
Skills Required:
Databricks, PySpark, Apache Spark, SparkSQL, AWS, Python