Job Description

Responsibilities:
- Designing, executing, and managing large, complex distributed data systems.
- Monitoring performance and optimizing existing projects.
- Researching and integrating the Big Data tools and frameworks required to deliver the requested capabilities.
- Understanding business and data requirements and implementing scalable solutions.
- Creating reusable components and data tools that help all teams in the company integrate with our data platform.
Requirements:
- 1-4 years of experience in big data technologies (Apache Hadoop) and relational databases (MS SQL).
- Proficiency in at least one of the following programming languages: Java, Python, or Scala.
- Expertise in SQL.
- Proficiency in Apache Spark.
- Hands-on knowledge of working with DataFrames, Datasets, RDDs, and the Spark SQL / PySpark / Scala APIs, with a deep understanding of performance optimizations.
- Good understanding of distributed storage (HDFS / S3).
- Strong analytical and quantitative skills; comfortable working with very large data sets.
- Experience with integrating data across multiple data sources.
- Good understanding of distributed computing principles.

It would be good if you have the following skills:
- Experience with message queues (e.g., Apache Kafka).
- Experience with MPP systems (e.g., Redshift / Snowflake).
- Experience with NoSQL storage (e.g., MongoDB).

(ref: hirist.tech)