Job Title: Data Engineer / Senior Data Engineer
Location: Bangalore
Experience: 5+ years
Job Type: Hybrid, Full-time
Availability: Immediate joiners or candidates with a notice period of less than 10 days.

Purpose:
As a Data Engineer at LogixHealth, you will work with a globally distributed team of engineers to design and build cutting-edge solutions that directly improve the healthcare industry. You'll contribute to our fast-paced, collaborative environment and bring your expertise to continue delivering innovative technology solutions, while mentoring others.

Duties and Responsibilities:
- Contribute to the creation of a self-service data platform for reporting and analytics
- Design and build data solutions using Databricks, SQL, Python, Spark, and Delta Lake in the Azure ecosystem (Blob Storage, Data Factory, Event Hubs)
- Adhere to ETL/ELT best practices (data quality management, data processing, data partitioning, maintainability, and reusability)
- Collaborate with engineers, product, and business leaders to ensure the data platform is integrated with other systems and technologies (Tableau, Power BI, APIs, custom applications)
- Establish CI/CD processes, test frameworks, infrastructure-as-code tooling, and monitoring/alerting (Git, Terraform, Azure DevOps / GitHub Actions / Jenkins, Azure Monitor / Datadog)
- Adhere to the Code of Conduct and be familiar with all compliance policies and procedures stored in LogixGarden relevant to this position

Qualifications:
To perform this job successfully, an individual must be able to perform each duty satisfactorily. The requirements listed below are representative of the knowledge, skills, and/or abilities required. Reasonable accommodation may be made to enable individuals with disabilities to perform the duties.

Education (Degrees, Certificates, Licenses, Etc.):
BS (or higher: MS/PhD) degree in Computer Science or a related field, or equivalent technical experience.

Experience:
- 5+ years of strong hands-on experience with Apache Spark and Databricks, building scalable data pipelines and distributed data processing systems in cloud environments
- Deep expertise in the Databricks ecosystem, including:
  - Delta Lake
  - Delta Live Tables (DLT)
  - Unity Catalog
  - Workflow orchestration (Jobs)
- Strong programming experience in PySpark / Spark (Python or Scala preferred) for large-scale data engineering workflows
- Proven experience designing high-performance Spark jobs and applying optimization techniques (partitioning, caching, AQE, join strategies, skew handling)
- Experience integrating Databricks with the following (good to have):
  - Azure Data Factory
  - Event Hubs / streaming pipelines
  - External orchestration tools such as Airflow
- Working knowledge of cloud data platforms (Azure preferred), including Blob Storage and NoSQL databases
- Experience with relational databases (MS SQL, PostgreSQL, MySQL) is good to have
- Exposure to data governance, security, and compliance (Unity Catalog, RBAC, data lineage)

Core Skills (Needed):

Expert-level Spark (PySpark/Scala):
- DataFrames, Spark SQL, Structured Streaming
- Performance tuning & debugging (see the tuning sketch at the end of this posting)
- Handling large-scale datasets (TB+ scale)

Databricks Expertise:
- Notebooks, Jobs, Workflows
- Delta Lake (ACID, schema evolution, optimization)
- Delta Live Tables (pipeline design & orchestration)
- Unity Catalog (data governance, access control)

Data Engineering on Databricks:
- Batch + streaming pipelines (see the streaming sketch below)
- Medallion architecture (Bronze/Silver/Gold)
- Incremental processing & CDC patterns (see the CDC sketch below)
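For illustration, here is a minimal PySpark sketch of the tuning techniques named above (AQE, partitioning, caching, broadcast joins, skew handling). It assumes a Databricks or delta-spark environment; the table paths and column names (/mnt/bronze/claims, provider_id, and so on) are hypothetical placeholders, not LogixHealth systems.

```python
# Sketch only: illustrates AQE, broadcast joins, caching, and output
# partitioning. Paths and columns are assumed placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Adaptive Query Execution: coalesces shuffle partitions and
    # rewrites skewed joins at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)

claims = spark.read.format("delta").load("/mnt/bronze/claims")        # large fact table
providers = spark.read.format("delta").load("/mnt/bronze/providers")  # small dimension

# Broadcast the small dimension so the large side avoids a shuffle.
joined = claims.join(F.broadcast(providers), "provider_id")

# Cache only when the result feeds multiple downstream actions.
joined.cache()

daily = joined.groupBy("provider_id", "service_date").agg(
    F.sum("billed_amount").alias("billed_total")
)

# Partition output by a low-cardinality column that queries filter on.
(daily.write.format("delta")
    .mode("overwrite")
    .partitionBy("service_date")
    .save("/mnt/silver/claims_daily"))
```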
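Next, a hedged sketch of the incremental processing / CDC pattern in a medallion layout: a Delta Lake MERGE that upserts the latest change per key from Bronze into Silver. The _op and _commit_ts columns, key names, and paths are assumptions about the source change feed, not a prescribed design.

```python
# Sketch only: CDC upsert from Bronze into Silver via Delta MERGE.
# Assumes a Databricks / delta-spark runtime; schema is hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("cdc-sketch").getOrCreate()

# Bronze: raw change rows (inserts/updates/deletes) landed by ingestion.
# Keep only the latest change per key, assuming a monotonically
# increasing _commit_ts column from the source.
changes = (
    spark.read.format("delta").load("/mnt/bronze/patients_cdc")
    .withColumn("rn", F.row_number().over(
        Window.partitionBy("patient_id").orderBy(F.col("_commit_ts").desc())))
    .filter("rn = 1")
    .drop("rn")
)

silver = DeltaTable.forPath(spark, "/mnt/silver/patients")

# MERGE applies updates, inserts, and deletes in one atomic commit
# (Delta's ACID guarantees).
(silver.alias("t")
    .merge(changes.alias("s"), "t.patient_id = s.patient_id")
    .whenMatchedDelete(condition="s._op = 'DELETE'")
    .whenMatchedUpdateAll(condition="s._op != 'DELETE'")
    .whenNotMatchedInsertAll(condition="s._op != 'DELETE'")
    .execute())
```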
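Finally, a streaming sketch of Bronze ingestion: a Structured Streaming read from Event Hubs into a Delta table with checkpointing. This uses the Event Hubs Kafka-compatible endpoint rather than the dedicated Event Hubs connector; the namespace, topic, connection string placeholder, and paths are all hypothetical.

```python
# Sketch only: stream raw events into a Bronze Delta table.
# Namespace, topic, and paths are assumed placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bronze-stream-sketch").getOrCreate()

raw = (
    spark.readStream.format("kafka")
    # Event Hubs exposes a Kafka-compatible endpoint on port 9093.
    .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")
    .option("subscribe", "claims-events")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    # On Databricks the login module class is shaded as
    # kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule.
    .option(
        "kafka.sasl.jaas.config",
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        'username="$ConnectionString" password="<EVENT_HUBS_CONNECTION_STRING>";',
    )
    .load()
)

# Land the payload as-is; parsing and validation belong in Silver.
(raw.selectExpr("CAST(value AS STRING) AS body", "timestamp")
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/claims")
    .outputMode("append")
    .start("/mnt/bronze/claims_events"))
```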