We are seeking a talented and experienced Big Data Engineer to join our growing team in Pune. The ideal candidate will have a strong passion for data and a proven track record of designing, building, and maintaining scalable and efficient big data pipelines and platforms. You will be responsible for leveraging cutting-edge technologies to process, transform, and analyze large datasets, enabling data-driven insights for our business.
Responsibilities:
- Design, develop, and maintain robust and scalable big data pipelines using Python and PySpark.
- Implement data ingestion, transformation, and loading processes from various sources into our data lake and data warehouse.
- Work extensively with Hadoop, Apache Spark, and Databricks environments to process and analyze large volumes of data.
- Utilize Delta Tables, JSON, and Parquet file formats effectively for efficient data storage and retrieval.
- Develop and optimize data processing jobs for performance and cost efficiency.
- Collaborate with data scientists, analysts, and other engineers to understand data requirements and deliver appropriate solutions.
- Troubleshoot and resolve issues related to data pipelines, data quality, and platform performance.
- Contribute to the continuous improvement of our big data architecture and best practices.
- Stay updated with the latest big data technologies and trends.
Skills and Qualifications:
- 4+ years of hands-on experience as a Big Data Engineer.
- Proficiency with Python and PySpark for data manipulation, scripting, and pipeline development.
- In-depth knowledge of the Hadoop and Apache Spark ecosystems.
- Extensive experience with Databricks, including notebook development, job orchestration, and Delta Lake management.
- Extensive experience with Delta Tables, including their features, optimizations, and practical application.
- Strong practical experience with JSON and Parquet file formats, including their structure, advantages, and usage in big data environments.
- Experience with AWS data analytics services such as Athena, Glue, Redshift, and EMR is good to have.
- Familiarity with data warehousing concepts and principles is a significant plus.
- Knowledge of NoSQL databases (e.g., MongoDB, Cassandra) and RDBMS databases (e.g., MySQL, PostgreSQL), including their strengths, weaknesses, and appropriate use cases.
- Ability to solve complex data processing and transformation problems efficiently and effectively.
- Good communication skills, with the ability to articulate technical concepts clearly to both technical and non-technical stakeholders.
- Strong analytical and problem-solving abilities.