Data Engineer – Databricks + GCP (5+ yrs)
Remote
Contract
Skills: PySpark, Databricks, Delta Lake, SQL
- Build ETL pipelines
- Manage Delta Lake
- Data integrations
- Improve performance
Key Responsibilities
- Design, build, and maintain robust ETL / ELT pipelines using PySpark on Databricks (a brief illustrative sketch follows this list).
- Manage and administer Delta Lake tables, including schema evolution, partitioning, ACID-compliant data storage, versioning, and data retention.
- Ingest data from diverse sources (relational databases, flat files, APIs, streaming systems), whether structured, semi-structured, or unstructured, and transform them into analytics-ready datasets.
- Implement data integrations across systems and platforms, including cloud storage (e.g., AWS S3 / Azure Data Lake), data warehouses / lakehouses, and data catalogs.
- Optimize data-processing workflows for performance and cost by tuning Spark jobs, partitioning, caching, query patterns, and resource utilization.
- Build pipelines for both batch and streaming data workflows; implement data quality checks, validation, error handling, logging, and monitoring.
- Collaborate with data analysts, data scientists, and business stakeholders to translate analytical requirements into scalable data solutions.
- Maintain metadata, documentation, and version control; work with DevOps / CI-CD pipelines for deployment and orchestration of workflows (e.g., via Databricks Workflows, Git, Airflow).
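For illustration only, here is a minimal sketch of the kind of PySpark / Delta Lake batch pipeline work described above. All paths, table names, and column names are hypothetical and not part of this role's actual codebase.

```python
# Illustrative sketch of a PySpark batch ETL step on Databricks.
# Hypothetical source path, columns, and target table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl_sketch").getOrCreate()

# Ingest a raw source (here: CSV files in cloud storage).
raw = (
    spark.read.option("header", "true")
    .csv("s3://example-bucket/raw/orders/")  # hypothetical location
)

# Basic transformations plus a simple data quality filter.
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("order_id").isNotNull())
)

# Write to a partitioned Delta table, allowing schema evolution on append.
(
    orders.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .partitionBy("order_date")
    .saveAsTable("analytics.orders")  # hypothetical target table
)
```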