We are looking for a PySpark developer with an ETL background to design and build solutions on one of our customer programs. The goal is to build a standardized data and curation layer that integrates data across internal and external sources, provides analytical insights, and integrates with the customer's critical systems.
Roles and Responsibilities
- Ability to design, build, and unit test applications in Spark / PySpark
- In-depth knowledge of Hadoop, Spark, and similar frameworks
- Ability to understand existing ETL logic and convert it into Spark / PySpark
- Good implementation experience with OOP concepts
- Knowledge of Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs
- Experience in processing large amounts of structured and unstructured data, including integrating data from multiple sources
- Experience working with Bitbucket and CI/CD processes
- Knowledge of Agile methodology for delivering projects
- Good communication skills
Skills
- Minimum 2 years of extensive experience in the design, build, and deployment of PySpark-based applications
- Expertise in handling complex, large-scale Big Data environments
- Minimum 2 years of experience with Hive, YARN, and HDFS
- Experience working with ETL products, e.g. Ab Initio, Informatica, DataStage, etc.
- Hands-on experience writing complex SQL queries and exporting and importing large amounts of data using utilities
- Experience: 6 to 10 years (EARLY JOINERS ONLY)
Location: Pune, Chennai, or Hyderabad
Note: We can also consider candidates with good hands-on experience in Spark and Scala.