Description:
Key Responsibilities:
- Design, build, and maintain scalable ETL/ELT pipelines to process structured and unstructured data.
- Develop and optimize data workflows using PySpark, Python, and cloud data services.
- Implement data ingestion, cleansing, transformation, and validation processes (sketched in the example after this list).
- Work extensively on Azure cloud services (preferred), including Azure Databricks, Data Lake, Data Factory, Synapse, etc.
- Collaborate with cross-functional teams to understand data requirements and deliver efficient data models.
- Manage and optimize relational and non-relational databases such as SQL Server, PostgreSQL, and NoSQL stores.
- Ensure data quality, reliability, security, and compliance across all pipelines.
- Support CI/CD and version control processes for data engineering workflows.
- Troubleshoot and improve the performance of data systems and pipelines.
- Document data processes, architecture, and best practices.
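As referenced above, here is a minimal PySpark sketch of the kind of ingest-cleanse-validate-load step these responsibilities describe. The storage paths, column names, and validation check are hypothetical placeholders, not details of the actual role.

```python
# Minimal sketch of an ingest -> cleanse -> validate -> load step.
# All paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Ingest: read raw semi-structured data from a Data Lake container (hypothetical path).
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/events/")

# Cleanse: drop rows missing the key, normalise a string column, deduplicate.
clean = (
    raw.dropna(subset=["event_id"])
       .withColumn("country", F.upper(F.trim(F.col("country"))))
       .dropDuplicates(["event_id"])
)

# Validate: a simple row-count check before publishing.
if clean.count() == 0:
    raise ValueError("Validation failed: no rows survived cleansing")

# Load: write the curated output as partitioned Parquet (hypothetical path).
clean.write.mode("overwrite").partitionBy("country").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/events/"
)
```

In practice a step like this would typically run as an Azure Databricks job, with richer validation rules than a single row-count check.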
Required Skills & Experience:
- 2+ years of experience in data engineering or related roles.
- Strong experience with PySpark and Python for data processing.
- Hands-on experience with Azure cloud services (preferred).
- Experience with Azure Databricks for large-scale data transformations.
- Knowledge of another cloud platform: AWS or GCP.
- Proficiency in relational databases such as SQL Server and PostgreSQL, and familiarity with NoSQL systems (see the sketch at the end of this section).
- Solid understanding of distributed computing, data warehousing concepts, and data pipeline design.
- Experience working with version control (Git) and CI/CD tools.
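As a companion to the relational-database skills above, here is a minimal sketch of pulling a PostgreSQL table into Spark over JDBC. The host, database, table, and credentials are hypothetical placeholders; a real job would pull secrets from a vault or cluster-scoped configuration.

```python
# Hypothetical sketch: reading a PostgreSQL table into Spark over JDBC,
# the kind of relational-to-lake hand-off the skills above imply.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example-jdbc").getOrCreate()

# Connection details are placeholders, not real endpoints or credentials.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://example-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "reader")
    .option("password", "***")
    .load()
)

# Register the frame as a temp view so it can be queried with Spark SQL.
orders.createOrReplaceTempView("orders")
spark.sql("SELECT COUNT(*) AS n FROM orders").show()
```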