Responsibilities:
- Able to write code for a given scenario.
- Knowledge of Spark-related queries.
- Must have hands-on Flink coding and pipeline-level troubleshooting experience.
- Core Python (e.g., applying validation rules to a CSV file, string comparison, collections, and basic constructs).
- Spark optimization and the spark-submit command.
- SQL is a must (e.g., joins, aggregates, and window functions).
- Excellent communication skills.
- Basics of streaming pipelines: SparkSession, stream processing, and transformation of streaming data.
- Spark Streaming, Kafka, Hive, etc.; Spark or any streaming technology.
- Overall Spark concepts: sessions, contexts, etc.
- Hands-on project experience with Spark Streaming or Flink streaming with Kafka.
- Azure cloud and its services.
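The core-Python expectations above (validation rules on a CSV file, string comparison, collections, basic constructs) can be sketched as a small self-contained example; the rule names and the `id`/`age` columns are hypothetical, chosen only for illustration:

```python
import csv
import io
from collections import Counter

def validate_rows(csv_text):
    """Return (valid_rows, error_counts) for the given CSV text.

    Hypothetical rules: every row must have a non-empty "id" and an
    "age" that parses as a non-negative integer.
    """
    valid, errors = [], Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # String comparison / emptiness check on the "id" field.
        if not row.get("id", "").strip():
            errors["missing_id"] += 1
            continue
        # Numeric validation on the "age" field.
        try:
            if int(row["age"]) < 0:
                raise ValueError
        except (KeyError, ValueError):
            errors["bad_age"] += 1
            continue
        valid.append(row)
    return valid, errors

data = "id,age\n1,34\n,22\n3,-5\n4,41\n"
rows, errs = validate_rows(data)
print(len(rows), dict(errs))  # → 2 {'missing_id': 1, 'bad_age': 1}
```

A `Counter` keeps the per-rule error tally compact; the same shape scales to more rules by adding further checks inside the loop.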
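For the spark-submit item above, a typical cluster submission with common tuning flags might look like the following; the application file, resource sizes, and connector version are placeholders, not values prescribed by this posting:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  --driver-memory 2g \
  --conf spark.sql.shuffle.partitions=200 \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
  my_streaming_job.py
```

The executor/driver sizing flags and `spark.sql.shuffle.partitions` are the levers most often discussed under "Spark optimization"; `--packages` pulls the Kafka connector needed by a streaming job.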
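For the SQL item above, a compact illustration of a join combined with an aggregate and a window function, run here through SQLite (3.25+ is needed for window-function support); the table and column names are hypothetical:

```python
import sqlite3

# In-memory database with two illustrative tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Raj');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 80.0), (12, 2, 30.0);
""")

# JOIN the tables, then use window functions: SUM(...) OVER for a
# per-customer aggregate and RANK() OVER to order each customer's orders.
rows = conn.execute("""
    SELECT c.name,
           o.amount,
           SUM(o.amount) OVER (PARTITION BY c.id) AS total_per_customer,
           RANK() OVER (PARTITION BY c.id ORDER BY o.amount DESC) AS rnk
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    ORDER BY c.name, rnk
""").fetchall()

for r in rows:
    print(r)
# → ('Ana', 80.0, 130.0, 1)
#   ('Ana', 50.0, 130.0, 2)
#   ('Raj', 30.0, 30.0, 1)
```

The same SQL runs unchanged in Spark SQL, which is why this kind of question pairs naturally with the Spark items above.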
Some of the following questions may be asked during evaluation:
- Difference between a Spark Streaming session and a batch session.
- Spark Structured Streaming: the spark.readStream() method used to read data in a Spark Streaming application.
- Use of the writeStream method and the arguments it supports for writing a DataFrame to a given sink, such as format, path (location), and outputMode.
- The action that must be called to start reading data from a Kafka topic: start().
- Printing the output of a streaming operation to the terminal using format("console").
Skills Required
Apache Flink, Apache Spark, Spark Streaming, Kafka
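The evaluation topics above (spark.readStream(), the writeStream arguments, start(), and the console sink) can be sketched in PySpark. This is a minimal sketch, not a runnable job on its own: the broker address and topic name are assumptions, and it requires the spark-sql-kafka connector on the classpath:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("streaming-sketch")  # hypothetical app name
         .getOrCreate())

# readStream returns a streaming DataFrame; nothing executes until start().
# "localhost:9092" and the topic "events" are placeholder values.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

# writeStream configures the sink: format (console/parquet/kafka/...),
# outputMode (append/complete/update), and options such as path and
# checkpointLocation. Calling start() launches the streaming query,
# and format("console") prints each micro-batch to the terminal.
query = (events.selectExpr("CAST(value AS STRING) AS value")
         .writeStream
         .format("console")
         .outputMode("append")
         .start())

query.awaitTermination()
```

Unlike a batch session, where an action such as show() triggers one finite computation, the streaming query runs continuously after start() until stopped, which is the core of the batch-vs-streaming question above.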