Evaluate the domain, financial, and technical feasibility of solution ideas with the help of key stakeholders
Design, develop, and maintain highly scalable data processing applications
Write efficient, reusable, and well-documented code
Deliver big data projects using Spark, Scala, Python, and SQL
Maintain and tune existing Spark applications, and identify opportunities for further optimization
Work closely with QA, Operations, and other teams to deliver error-free software on time
Actively lead and participate in daily agile/scrum meetings
Take responsibility for Apache Spark development and implementation
Translate complex technical and functional requirements into detailed designs
Investigate alternatives for data storage and processing to ensure implementation of the most streamlined solutions
Serve as a mentor for junior staff members by conducting technical training sessions and reviewing project outputs
Qualifications:
Engineering graduate with a computer science background preferred, with 8+ years of software development experience with Hadoop framework components (HDFS, Spark, Scala, PySpark)
Excellent verbal, written, and presentation skills
Ability to present and defend a solution with technical facts and business acumen
Understanding of data-warehousing and data-modeling techniques
Strong data engineering skills
Knowledge of Core Java, Linux, SQL, and any scripting language
6+ years of experience using Python/Scala, Spark, and SQL
Knowledge of shell scripting is a plus
Knowledge of Core and Advanced Java is a plus
Experience developing and tuning Spark applications, with an excellent understanding of Spark architecture, DataFrames, and performance tuning
Strong knowledge of database concepts, systems architecture, and data structures is a must
Process-oriented, with strong analytical and problem-solving skills
Experience writing standalone Python applications using the PySpark API