About the role
We are seeking Senior Data Engineers who are passionate about data and analytics to join our data engineering team. The ideal candidate will have a strong background in handling large volumes of data with Apache Spark to build and enhance bespoke systems that harness data. The key focus of the position is to build and maintain systems that capture and store data on behalf of the business.
Key Responsibilities
- Design, develop, and maintain ETL processes and data pipelines using AWS Glue with PySpark (see the sketch after this list).
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver high-quality data solutions.
- Optimize and tune data pipelines for performance and scalability.
- Ensure data quality and integrity through robust testing and validation processes.
- Implement data governance and security best practices.
- Monitor and troubleshoot data pipelines to ensure continuous data flow and address any issues promptly.
- Stay up-to-date with the latest trends and technologies in data engineering.
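For context on the day-to-day work, here is a minimal sketch of the kind of AWS Glue PySpark job this role involves. The catalog database, table name, column names, and S3 path are hypothetical placeholders, not a reference to our actual systems.

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job bootstrap: resolve arguments and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw events from the Glue Data Catalog
# (database and table names are hypothetical).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_events_db", table_name="clickstream"
)

# Use plain PySpark for the transformation step: drop incomplete rows
# and derive a partition column from the event timestamp.
df = (
    raw.toDF()
    .dropna(subset=["event_id", "event_ts"])
    .withColumn("event_date", F.to_date("event_ts"))
)

# Write the curated data back to S3 as partitioned Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glue_context, "curated"),
    connection_type="s3",
    connection_options={
        "path": "s3://example-bucket/curated/clickstream/",
        "partitionKeys": ["event_date"],
    },
    format="parquet",
)
job.commit()
```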
Required Qualifications
- B.E./B.Tech, preferably in Computer Science Engineering, with relevant work experience.
- 7+ years of experience in handling data and designing ETL pipelines, including a mandatory 4+ years of experience writing Spark applications.
- Experience with AWS services such as Glue, Athena, S3, and Redshift is a plus.
- Exposure to data modeling, data analytics, and design for both batch processing and real-time streaming is good to have.
- Solid understanding of data mapping, data processing patterns, distributed computing, and building applications for real-time and batch analytics.
- Strong programming skills in design and implementation using Python, PySpark, and SQL.
- Good exposure to database architecture with SQL.
- Experience with multiple file formats such as Avro, Parquet, ORC, and JSON (see the snippet below).
- Experience developing, constructing, testing, and maintaining architectures for data lakes, data pipelines, data warehouses, and large-scale data processing systems on Databricks.
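As a quick illustration of the file-format experience listed above, the snippet below shows PySpark reading JSON and writing the same data out as Parquet, ORC, and Avro. The S3 paths are placeholder examples, and the Avro write assumes the spark-avro package is on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-examples").getOrCreate()

# Read semi-structured JSON from a landing zone (path is a placeholder).
df = spark.read.json("s3://example-bucket/landing/orders/")

# Write the same dataset in columnar formats for analytics workloads.
df.write.mode("overwrite").parquet("s3://example-bucket/curated/orders_parquet/")
df.write.mode("overwrite").orc("s3://example-bucket/curated/orders_orc/")

# Avro requires the external spark-avro package (org.apache.spark:spark-avro).
df.write.mode("overwrite").format("avro").save("s3://example-bucket/curated/orders_avro/")
```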