We are looking for a PySpark developer (with an ETL background) to design and build solutions on one of our customer programs. The goal is to build a standardized data and curation layer that integrates data across internal and external sources, provides analytical insights, and integrates with the customer's critical systems.
Roles and Responsibilities
- Ability to design, build, and unit test applications in Spark/PySpark
- In-depth knowledge of Hadoop, Spark, and similar frameworks
- Ability to understand existing ETL logic and convert it into Spark/PySpark
- Good implementation experience with OOP concepts
- Knowledge of Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs
- Experience in processing large amounts of structured and unstructured data, including integrating data from multiple sources
- Experience working with Bitbucket and CI/CD processes
- Knowledge of Agile methodology for project delivery
- Good communication skills
Skills
- Minimum 2 years of experience in designing, building, and deploying PySpark-based applications
- Expertise in handling complex large-scale Big Data environments
- Minimum 2 years of experience with the following: Hive, YARN, HDFS
- Experience working with ETL products, e.g. Ab Initio, Informatica, DataStage
- Hands-on experience writing complex SQL queries and exporting and importing large amounts of data using utilities
Experience: 6 to 10 Years (EARLY JOINERS ONLY)
Location: Pune OR Chennai OR Hyderabad
Note: We can also consider candidates with good hands-on experience in Spark and Scala.