Role Overview
The Architect - Data Engineering is responsible for overseeing the strategy, design, development, and management of data infrastructure and pipelines within an organization. The role demands strong technical leadership and collaboration with other teams to ensure the efficient collection, storage, processing, and analysis of large datasets. The Architect typically leads a team of data engineers, associate data architects, and analysts, ensuring that data workflows are scalable, reliable, and meet the business's requirements.
Responsibilities
- Lead the design, development, and maintenance of data pipelines and ETL processes (see the sketch after this list).
- Architect and implement scalable data solutions using Databricks and AWS.
- Optimize data storage and retrieval systems using Rockset, ClickHouse, and CrateDB.
- Develop and maintain data APIs using FastAPI.
- Orchestrate and automate data workflows using Airflow.
- Collaborate with data scientists and analysts to support their data needs.
- Ensure data quality, security, and compliance across all data systems.
- Mentor junior data engineers and promote best practices in data engineering.
- Evaluate and implement new data technologies to improve the data infrastructure.
- Participate in cross-functional projects and provide technical leadership.
- Manage and optimize data storage solutions using AWS S3, implementing best practices for data lakes and data warehouses.
- Implement and manage Databricks Unity Catalog for centralized data governance and access control across the organization.
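To illustrate the kind of pipeline and orchestration work described above, here is a minimal sketch of an Airflow DAG that runs a PySpark transformation and lands the result in S3. It assumes Airflow 2.4+ (TaskFlow API) and a Spark environment with the S3A connector configured; all DAG, bucket, and column names are invented for illustration and are not part of any actual codebase.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["etl"])
def daily_orders_etl():
    """Hypothetical daily ETL: deduplicate raw orders and write curated Parquet to S3."""

    @task
    def transform_orders():
        # Import inside the task so the DAG file parses even where PySpark
        # is not installed on the scheduler.
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("daily_orders_etl").getOrCreate()

        # Placeholder buckets/paths; assumes the S3A connector is configured.
        raw = spark.read.json("s3a://example-raw-bucket/orders/")

        cleaned = (
            raw.dropDuplicates(["order_id"])
            .filter(F.col("amount") > 0)
            .withColumn("ingested_at", F.current_timestamp())
        )

        cleaned.write.mode("overwrite").parquet("s3a://example-curated-bucket/orders/")
        spark.stop()

    transform_orders()


daily_orders_etl()
```

In practice a team on this stack might instead submit the job to a Databricks cluster via the Databricks provider's operators rather than running Spark inside the Airflow worker; the TaskFlow sketch above simply keeps the example self-contained.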
Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 10+ years of experience in data engineering, with at least 6 years in a lead role
- Strong proficiency in Python, PySpark, and SQL
- Extensive experience with Databricks and AWS cloud services
- Hands-on experience with Airflow for workflow orchestration
- Familiarity with FastAPI for building high-performance APIs
- Experience with columnar databases like Rockset, ClickHouse, and CrateDB
- Solid understanding of data modeling, data warehousing, and ETL processes
- Experience with version control systems (e.g., Git) and CI/CD pipelines
- Excellent problem-solving skills and ability to work in a fast-paced environment
- Strong communication skills and ability to work effectively in cross-functional teams
- Knowledge of data governance, security, and compliance best practices
- Proficiency in designing and implementing data lake architectures using AWS S3
- Experience with Databricks Unity Catalog or similar data governance and metadata management tools
Preferred Qualifications
- Experience with real-time data processing and streaming technologies
- Familiarity with machine learning workflows and MLOps
- Certifications in Databricks, AWS
- Experience implementing data mesh or data fabric architectures
- Knowledge of data lineage and metadata management best practices
Tech Stack
Databricks, Python, PySpark, SQL, Airflow, FastAPI, AWS (S3, IAM, ECR, Lambda), Rockset, ClickHouse, CrateDB
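Since the stack pairs FastAPI with analytical stores such as ClickHouse, here is a minimal, hypothetical sketch of the kind of data API the role involves. The endpoint, model, and in-memory stand-in for the database are invented for illustration; a real service would run an aggregate query against the analytical store instead.

```python
from datetime import date

from fastapi import FastAPI, HTTPException, Query
from pydantic import BaseModel

app = FastAPI(title="Illustrative data API")


class DailyRevenue(BaseModel):
    day: date
    revenue: float


# Hypothetical in-memory stand-in for an analytical store such as ClickHouse;
# a production endpoint would query the database instead.
_FAKE_STORE = {
    date(2024, 1, 1): 1250.0,
    date(2024, 1, 2): 980.5,
}


@app.get("/metrics/daily-revenue", response_model=DailyRevenue)
def daily_revenue(day: date = Query(..., description="UTC calendar day")) -> DailyRevenue:
    """Return total revenue for one day, or 404 if no data exists."""
    if day not in _FAKE_STORE:
        raise HTTPException(status_code=404, detail="no data for that day")
    return DailyRevenue(day=day, revenue=_FAKE_STORE[day])
```

Saved as app.py, this runs locally with `uvicorn app:app --reload` and answers queries like `/metrics/daily-revenue?day=2024-01-01`.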
We Offer
- Opportunity to work on high-impact business challenges from top global clientele.
- Vast opportunities for self-development, including online university access and sponsored certifications.
- Sponsored Tech Talks, industry events, and seminars to foster innovation and learning.
- Generous benefits package including health insurance, retirement benefits, flexible work hours, and more.
- Supportive work environment with forums to explore passions beyond work.

This role presents an exciting opportunity for a motivated individual to contribute to the development of cutting-edge solutions while advancing their career in a dynamic and collaborative environment.
Skills Required
Airflow, Git, PySpark, Databricks, FastAPI, SQL, Python, AWS