We are seeking a skilled Databricks Architect to design, implement, and optimize scalable data solutions within our cloud-based data platform. This role requires extensive knowledge of Databricks (Azure/AWS), data engineering, and a deep understanding of data architecture principles, with the ability to drive strategy, best practices, and hands-on implementation for high-performance data processing and analytics solutions.
Responsibilities:
- Solution Architecture: Design and architect end-to-end data solutions using Databricks on Azure/AWS, covering data ingestion, processing, and storage.
- Delta Lake Implementation: Leverage Delta Lake and the Lakehouse architecture to create robust, unified data structures that support advanced analytics and machine learning (see the Delta Lake sketch after this list).
- Data Processing Development: Design, develop, and automate large-scale, high-performance data processing systems (batch and/or streaming) to drive business growth and enhance the product experience (see the streaming sketch after this list).
- Performance Tuning: Ensure optimal performance of data pipelines and workloads by implementing best practices for resource management, auto-scaling, and query optimization in Databricks (see the tuning sketch after this list).
- Engineering Best Practices: Advocate for high-quality software engineering practices in building scalable data infrastructure and pipelines.
- Architecture/Solution Development: Develop architectures and solutions for large data projects using Databricks.
- Project Leadership: Lead data engineering projects to ensure pipelines are reliable, efficient, testable, and maintainable.
- Data Modeling: Design data models optimized for storage, retrieval, and critical product and business requirements (see the modeling sketch after this list).
- Logging Architecture: Understand and influence logging to support data flow, implementing logging best practices as needed.
- Standardization and Tooling: Contribute to shared data engineering tools and standards to boost productivity and quality for data engineers across the company.
- Collaboration: Work closely with leadership, engineers, program managers, and data scientists to understand and meet data needs.
- Partner Education: Use data engineering expertise to identify gaps and improve existing logging and processes for partners.
- Data Governance: Collaborate with stakeholders to build data lineage, data governance, and data cataloging using Unity Catalog (see the governance sketch after this list).
- Agile Project Management: Lead projects using agile methodologies.
- Communication: Communicate effectively with stakeholders at all organizational levels.
- Team Development: Recruit, retain, and develop team members, preparing them for increased responsibilities and challenges.
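The sketches below illustrate a few of the responsibilities above. They are minimal, hedged examples rather than prescribed implementations. First, a bronze-to-silver Delta Lake flow, assuming a Databricks notebook where `spark` is predefined; all paths, table names, and columns are hypothetical.

```python
from pyspark.sql import functions as F

# Land raw CSV in a bronze Delta table (path and table names are illustrative).
raw = (spark.read
       .option("header", "true")
       .csv("/mnt/landing/orders/"))

raw.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")

# Clean and deduplicate into a silver table that serves analytics and ML.
silver = (spark.table("bronze.orders")
          .dropDuplicates(["order_id"])
          .withColumn("order_ts", F.to_timestamp("order_ts")))

silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")
```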
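For the batch/streaming responsibility, a Structured Streaming sketch that incrementally ingests JSON files into Delta. The schema, paths, and `availableNow` trigger are assumptions chosen for the example.

```python
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Streaming file sources require an explicit schema (fields are hypothetical).
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

stream = (spark.readStream
          .schema(schema)
          .json("/mnt/landing/events/"))       # hypothetical landing path

(stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/chk/events/")  # enables exactly-once recovery
    .trigger(availableNow=True)   # process the backlog, then stop (incremental batch)
    .toTable("bronze.events"))
```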
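For performance tuning, a sketch of two common levers on Databricks: routine Delta maintenance and an autoscaling job-cluster spec. Table, column, retention, and node-type values are illustrative.

```python
# Compact small files and co-locate a frequently filtered key.
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")

# Remove stale files outside the 7-day retention window.
spark.sql("VACUUM silver.orders RETAIN 168 HOURS")

# Autoscaling cluster spec (field names follow the Databricks Jobs API;
# the node type shown is an Azure example).
cluster_spec = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 2, "max_workers": 8},
}
```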
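For data modeling, a sketch of a query-optimized fact table in Delta; the dimensional design and partition column are illustrative choices, not a mandated model.

```python
spark.sql("""
    CREATE TABLE IF NOT EXISTS gold.fact_orders (
        order_id     STRING,
        customer_key BIGINT,
        order_date   DATE,
        amount       DECIMAL(18, 2)
    )
    USING DELTA
    PARTITIONED BY (order_date)  -- partition on the most common filter column
""")
```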
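For governance, a sketch of Unity Catalog securables and least-privilege grants; the catalog, schema, and group names are hypothetical.

```python
# Create securables, then grant an analyst group read-only access.
spark.sql("CREATE CATALOG IF NOT EXISTS analytics")
spark.sql("CREATE SCHEMA IF NOT EXISTS analytics.sales")
spark.sql("GRANT USE CATALOG ON CATALOG analytics TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA analytics.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON SCHEMA analytics.sales TO `data_analysts`")
```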
Requirements:
- 10+ years of relevant industry experience.
- ETL Expertise: Skilled in custom ETL design, implementation, and maintenance.
- Data Modeling: Experience in designing and developing data models for reporting systems.
- Databricks Proficiency: Hands-on experience with Databricks SQL workloads.
- Data Ingestion: Expertise in ingesting data from offline files (e.g., CSV, TXT, JSON) as well as API, database, and CDC sources, with prior project experience in each (a CDC sketch follows this list).
- Pipeline Observability: Skilled in setting up robust observability for end-to-end pipelines and Databricks on Azure/AWS.
- Database Knowledge: Proficient in relational databases and SQL query authoring.
- Programming and Frameworks: Experience with Java, Scala, Spark, PySpark, Python, and Databricks.
- Cloud Platforms: Cloud experience required (Azure/AWS preferred).
- Data Scale Handling: Experience working with large-scale data.
- Pipeline Design and Operations: Proven experience in designing, building, and operating robust data pipelines.
- Performance Monitoring: Skilled in deploying high-performance pipelines with reliable monitoring and logging.
- Cross-Team Collaboration: Able to work effectively across teams to establish overarching data architecture and provide guidance.
- ETL Optimization: Ability to optimize ETL pipelines to reduce data transfer and storage costs.
- Auto Scaling: Skilled in using Databricks SQL's auto-scaling feature to adjust worker counts based on workload.
Tech Stack:
- Cloud Platform: Azure/AWS.
- Azure/AWS: Databricks SQL Serverless, Databricks SQL, Databricks workspaces, Databricks notebooks, Databricks job scheduling, Data Catalog.
- Data Architecture: Delta Lake, Lakehouse concepts.
- Data Processing: Spark Structured Streaming.
- File Formats: CSV, Avro, Parquet.
- CI/CD: CI/CD for ETL pipelines.
- Governance Model: Databricks SQL unified governance model (Unity Catalog) across clouds, supporting open formats and APIs.
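As a hedged illustration of the CDC ingestion requirement, a Delta `MERGE` upsert; the change-feed source table, join key, and the `op` column convention are assumptions for this sketch.

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "silver.customers")
changes = spark.table("bronze.customer_changes")   # hypothetical CDC feed

# Apply inserts, updates, and deletes keyed on customer_id.
(target.alias("t")
   .merge(changes.alias("c"), "t.customer_id = c.customer_id")
   .whenMatchedDelete(condition="c.op = 'DELETE'")
   .whenMatchedUpdateAll(condition="c.op = 'UPDATE'")
   .whenNotMatchedInsertAll(condition="c.op = 'INSERT'")
   .execute())
```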
Skills Required:
Data Modeling, Spark, Databricks, Azure, AWS, ETL