Role Overview :
We are seeking a Senior Python Data Engineer (SE4) who can take ownership of data platform design, architecture, and AI integration strategies.
This role goes beyond hands-on engineering: it requires leading projects, mentoring engineers, and shaping scalable data and AI ecosystems.
You will drive the integration of LLMs into enterprise-grade data pipelines, orchestrate large-scale data processing solutions, and ensure enterprise data governance while working with cross-functional teams.
Responsibilities :
- Lead the design and implementation of enterprise-grade distributed data platforms using PySpark and SparkSQL.
- Architect and optimize scalable ETL / ELT pipelines across batch and streaming environments.
- Drive LLM-powered solution design using LangChain, including Agents, Toolkits, Document Loaders, and SparkSQLToolkit.
- Integrate vector databases and design RAG pipelines for NLP and generative AI applications.
- Define standards for data governance, security, lineage, and compliance across cloud platforms.
- Collaborate with architects, data scientists, and business stakeholders to translate business needs into technical solutions.
- Mentor SE2 / SE3 engineers, providing technical leadership and code reviews.
- Own performance, scalability, and cost optimization on Databricks, AWS EMR, and MLflow.
Skills :
- Strong programming expertise in Python with advanced knowledge of data engineering design patterns.
- Deep experience with PySpark, SparkSQL, and distributed data processing.
- Hands-on experience in architecting and deploying data solutions on Databricks, AWS EMR, or Azure Data platforms.
- Strong understanding of LangChain and LLM integration frameworks.
- Expertise in SQL, advanced data modeling, and distributed databases.
- Proven ability to lead teams, own projects, and deliver end-to-end.
Skills :
- Experience with Google ADK, prompt engineering, and advanced LLM fine-tuning.
- Familiarity with multi-cloud ecosystems (AWS, Azure, GCP).
- Knowledge of MLOps practices, MLflow, Unity Catalog, and feature stores.
- Exposure to data security, compliance frameworks, and metadata management.
(ref : hirist.tech)