Job Description: Software Data Engineer
As a Software Engineer specializing in Telecom Products, you will be a vital member of our R&D team, contributing to the development of cutting-edge telecommunications solutions. Your role will involve leveraging your extensive experience to design and implement robust software solutions for scalable telecom products and features.
Position Summary
We are seeking a skilled Big Data Engineer with deep expertise in PySpark, Python, MySQL, Hadoop, and Linux. The ideal candidate will have extensive experience deploying and managing these tools in on-premises environments (no public cloud). Proficiency in cluster management, performance optimization, and monitoring is essential for this role.
Key Responsibilities:
- Develop and implement data ingestion pipelines and ETL processes using PySpark and Python, capable of managing and processing large volumes of data ranging into the millions of records.
- Design and develop APIs using Python to facilitate data integration and processing.
- Deploy, configure, and maintain Hadoop ecosystem components including HDFS, YARN, Hive, and HBase in on-premises environments.
- Optimize the performance of Big Data applications and queries, ensuring efficient resource utilization.
- Manage and monitor cluster health, including tuning and troubleshooting performance issues.
- Ensure data security, integrity, and availability within the Big Data infrastructure.
- Collaborate with cross-functional teams to integrate data analytics solutions and support business requirements.
Required Skills and Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3+ years of proven experience working with Big Data technologies such as Hadoop, PySpark, and MySQL in on-premises environments, handling large-scale data volumes including datasets with millions of records.
- Strong programming skills in Python, with experience in developing APIs.
- Hands-on experience with Linux.
- Expertise in deploying, managing, and optimizing Big Data clusters in a local data center.
- Proficiency in monitoring and performance tuning of Hadoop and PySpark clusters.
- Ability to troubleshoot and resolve complex issues related to data processing and infrastructure.
- Excellent problem-solving skills and the ability to work independently as well as part of a team.
- Strong communication skills, with the ability to convey technical concepts to non-technical stakeholders.
- Experience with Docker.
Preferred Skills:
- Experience with Apache Airflow, Kafka, Apache Flink, and Apache NiFi for data orchestration and streaming.
- Certification in Hadoop or related Big Data technologies.
Application Process
Interested candidates should submit their resume and a cover letter outlining their qualifications and experience in Big Data engineering and on-premises cluster management. The cover letter must include details of at least one relevant project involving large data volumes (millions of records), with a brief description of the project, your role, and the technologies used. This is a mandatory requirement for our screening process.