Job Purpose :
As a Data Engineer - ETL / Spark, you will be responsible for designing, developing, and maintaining data integration workflows and pipelines.
You will work with cross-functional teams to enable seamless data movement, transformation, and storage, ensuring data is accurate, reliable, and optimized for analytics and business intelligence use.
Responsibilities :
ETL / ELT Development & Maintenance :
- Design, develop, and maintain efficient ETL workflows using modern tools and frameworks.
- Optimize and automate data pipelines to ensure high performance and scalability.
- Monitor and troubleshoot ETL processes to ensure reliability and data quality.
Big Data Engineering with Spark :
- Develop and optimize Spark-based applications for large-scale data processing.
- Implement distributed data processing workflows leveraging Hadoop, Spark, or other big data ecosystems.
- Ensure Spark jobs are performance-tuned and cost-optimized.
Data Management & Integration :
- Work with structured and unstructured data from multiple sources, ensuring proper cleansing, transformation, and loading.
- Integrate data from APIs, databases, cloud services, and third-party platforms.
- Ensure compliance with data governance, security, and privacy standards.
Collaboration & Business Enablement :
- Partner with data analysts, data scientists, and business stakeholders to understand requirements and deliver data solutions.
- Support data-driven initiatives by providing reliable and timely datasets.
- Document processes, data flows, and technical specifications for reference and knowledge sharing.
Requirements :
- Bachelor's / Master's degree in Computer Science, Information Technology, or a related field.
- 4-8 years of hands-on experience in ETL development and data engineering.
- Strong expertise in Apache Spark (PySpark / Scala / Java) for distributed data processing.
- Proficiency with ETL tools (e.g., Informatica, Talend, Databricks, AWS Glue, or similar).
- Strong knowledge of SQL and experience with relational databases (e.g., Oracle, SQL Server, PostgreSQL, MySQL).
- Experience with cloud platforms (AWS, Azure, or GCP) and cloud-native data services.
- Familiarity with data warehousing concepts and modern architectures (Snowflake, Redshift, BigQuery).
- Understanding of data governance, data quality, and security best practices.
- Excellent problem-solving, communication, and collaboration skills.
Good to Have :
- Experience with streaming platforms (Kafka, Kinesis, Spark Streaming).
- Knowledge of containerization and orchestration (Docker, Kubernetes, Airflow).
- Exposure to machine learning pipelines and data science collaboration.
- Certifications in AWS / Azure / GCP data services.
(ref : hirist.tech)