Proven experience with SQL, Spark, and the Hadoop ecosystem
- Hands-on experience with multi-terabyte data volumes, from ingestion through to consumption.
- Work with business stakeholders to identify and document high-impact business problems and potential solutions.
- Good understanding of Data Lake / Lakehouse architecture, with experience of or exposure to Hadoop distributions (Cloudera, Hortonworks) and/or AWS.
- Work on the end-to-end data lifecycle across the ingestion, transformation, and consumption layers. Well versed in APIs and their usage.
- The suitable candidate will be proficient in Spark, Spark Streaming, Hive, and SQL.
- The suitable candidate will also demonstrate experience with big data infrastructure, including MapReduce, Hive, HDFS, YARN, HBase, and Oozie.
- The candidate will additionally demonstrate substantial experience with, and deep knowledge of, relational databases.
- Strong debugging skills for troubleshooting code issues; experience using Git for version control.
- Create technical design documentation for projects / pipelines.
(ref: hirist.tech)