Job Description: Software Data Engineer
As a Software Engineer specializing in Telecom Products, you will be a vital member of our R&D team, contributing to the development of cutting-edge telecommunications solutions. Your role will involve leveraging your extensive experience to design and implement robust software solutions for scalable telecom products and features.
Position Summary
We are seeking a skilled Big Data Engineer with deep expertise in PySpark, Python, MySQL, Hadoop and Linux. The ideal candidate should have extensive experience deploying and managing these tools within on-premises environments (no public cloud). Proficiency in cluster management, performance optimization, and monitoring is essential for this role.
Key Responsibilities:
Develop and implement data ingestion pipelines and ETL processes using PySpark and Python, capable of managing and processing large volumes of data ranging into the millions of records.
Design and develop APIs using Python to facilitate data integration and processing.
Deploy, configure, and maintain Hadoop ecosystem components including HDFS, YARN, Hive, and HBase in on-premises environments.
Optimize the performance of Big Data applications and queries, ensuring efficient resource utilization.
Manage and monitor cluster health, including tuning and troubleshooting performance issues.
Ensure data security, integrity, and availability within the Big Data infrastructure.
Collaborate with cross-functional teams to integrate data analytics solutions and support business requirements.
Required Skills and Qualifications:
3+ years of proven experience with Big Data technologies such as Hadoop, PySpark, and MySQL in on-premises environments, including handling large-scale datasets with millions of records.
Strong programming skills in Python, with experience in developing APIs.
Hands-on experience with Linux.
Expertise in deploying, managing, and optimizing Big Data clusters in a local data center.
Proficiency in monitoring and performance tuning of Hadoop and PySpark clusters.
Ability to troubleshoot and resolve complex issues related to data processing and infrastructure.
Excellent problem-solving skills and the ability to work independently as well as part of a team.
Strong communication skills with the ability to convey technical concepts to non-technical stakeholders.
Experience with Docker.
Preferred Skills:
Experience with Apache Airflow, Kafka, Apache Flink, and Apache NiFi for data orchestration and streaming.
Certification in Hadoop or related Big Data technologies.
Application Process
Interested candidates should submit a resume and a cover letter outlining their qualifications and experience in Big Data engineering and on-premises cluster management. The cover letter must include details of at least one relevant project involving large data volumes (millions of records), with a brief description of the project, your role, and the technologies used. This is a mandatory requirement for our screening process.
Python Developer • Mumbai, Maharashtra, India