Job Description
We are looking for a Python & PySpark developer and data engineer who can design and build solutions for one of our Fortune 500 client programs in the realm of Financial Master & Reference Data Management. This high-visibility, fast-paced key initiative will integrate data across internal and external sources, provide analytical insights, and integrate with the customer's critical systems.
Key Responsibilities
Design, build, and unit test applications on the Spark framework using Python.
Build Python- and PySpark-based applications on data in both relational (Oracle, SQL Server) and NoSQL (e.g., Redis, Valkey) databases.
Build ingestion pipelines to load data into an in-memory caching database (an ingestion sketch follows this list).
Build data pipelines to load and export data to and from the Master Data Management (MDM) or Reference Data Management (RDM) platform using PySpark.
Hands-on experience with at least one MDM implementation using TIBCO, Informatica, GoldenSource, or a custom-built MDM/RDM platform.
Experience designing and building data APIs and their interactions with data consumers.
Optimize performance of Spark applications using configurations for the SparkContext, Spark SQL, and DataFrame APIs (a tuning sketch follows this list).
Build Python programs using standard and third-party libraries, e.g., pandas, requests, json, flask, pickle.
Experience processing large volumes of structured data, including integrating data from multiple sources.
Design and build real-time applications using REST APIs, JSON, and XML (a data API sketch follows this list).
Build solutions on AWS using Glue ETL jobs and Lambda functions in Python (a Lambda sketch follows this list).
Create and maintain integration and regression testing frameworks on Jenkins, integrated with Bitbucket and/or Git repositories.
Participate in the agile development process; document and communicate issues and bugs related to data standards in scrum meetings.
Work collaboratively with onsite and offshore teams.
Develop and review technical documentation for delivered artifacts.
Solve complex data-driven scenarios and triage defects and production issues.
Participate in code releases and production deployments.
Challenge and inspire team members to achieve business results in a fast-paced, rapidly changing environment.
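The sketches below are illustrative only, not part of the role description. First, a minimal PySpark application of the kind described above, with common Spark SQL and DataFrame tuning configurations; the JDBC URL, table names, paths, and configuration values are hypothetical placeholders, not anything prescribed by this role.

```python
# Minimal PySpark sketch: read from a relational source, join with a file-based
# source, and apply common Spark SQL / DataFrame performance configurations.
# All names (tables, paths, JDBC URL) are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("mdm-reference-pipeline")
    # Illustrative tuning knobs; real values depend on cluster and data size.
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Relational source (Oracle as an example; SQL Server works the same way).
accounts = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/REFDATA")  # placeholder
    .option("dbtable", "REF.ACCOUNTS")                          # placeholder
    .option("fetchsize", "10000")
    .load()
)

# External source landed as Parquet (path is a placeholder).
prices = spark.read.parquet("s3://example-bucket/landing/prices/")

# Broadcast the smaller side to avoid shuffling the large table.
enriched = accounts.join(F.broadcast(prices), on="instrument_id", how="left")

# Spark SQL view for downstream consumers / ad hoc validation queries.
enriched.createOrReplaceTempView("enriched_accounts")
spark.sql("""
    SELECT instrument_id, COUNT(*) AS n
    FROM enriched_accounts
    GROUP BY instrument_id
    HAVING COUNT(*) > 1
""").show()

enriched.write.mode("overwrite").parquet("s3://example-bucket/curated/accounts/")
```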
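Next, one way the in-memory cache ingestion could look, assuming Redis (Valkey is protocol-compatible) and the redis-py client; the key scheme, host, and DataFrame columns are assumptions for illustration.

```python
# Sketch: load a curated DataFrame into Redis/Valkey partition by partition,
# so each executor opens its own connection. Keys and columns are hypothetical.
import json
import redis  # redis-py; also works against Valkey, which speaks the same protocol
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-ingestion").getOrCreate()
df = spark.read.parquet("s3://example-bucket/curated/accounts/")  # placeholder

def write_partition(rows):
    # One client per partition; a pipeline batches network round trips.
    client = redis.Redis(host="cache-host", port=6379)  # placeholder host
    pipe = client.pipeline(transaction=False)
    for row in rows:
        key = f"account:{row['account_id']}"  # hypothetical key scheme
        pipe.set(key, json.dumps(row.asDict()), ex=86400)  # 24h TTL
    pipe.execute()

df.foreachPartition(write_partition)
```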
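A bare-bones sketch of a real-time data API of the sort mentioned above, assuming Flask in front of the Redis-backed cache; the endpoint path and key scheme are hypothetical.

```python
# Sketch: a small Flask data API that serves cached reference data as JSON.
# Endpoint, host, and key scheme are hypothetical placeholders.
import json
import redis
from flask import Flask, jsonify, abort

app = Flask(__name__)
cache = redis.Redis(host="cache-host", port=6379)  # placeholder host

@app.route("/accounts/<account_id>", methods=["GET"])
def get_account(account_id):
    raw = cache.get(f"account:{account_id}")
    if raw is None:
        abort(404, description="account not found")
    return jsonify(json.loads(raw))

if __name__ == "__main__":
    app.run(port=8080)
```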
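Finally, a minimal AWS Lambda handler in Python of the shape Glue/Lambda solutions often take: an S3-triggered function that starts a Glue ETL job. The bucket, job name, and argument names are assumptions for illustration.

```python
# Sketch: S3-triggered Lambda that kicks off a Glue ETL job for each new file.
# Bucket, job name, and argument names are hypothetical placeholders.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # S3 put events carry the bucket and object key of the new file.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="mdm-reference-pipeline",  # placeholder Glue job name
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )
    return {"status": "started", "records": len(event["Records"])}
```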
Requirements
BE / B.Tech / B.Sc. in Computer Science, Statistics, or Econometrics from an accredited college or university.
Minimum of 4 years of experience designing, building, and deploying PySpark-based applications.
Hands-on experience with Redis, Valkey, and OpenSearch databases.
In-depth knowledge of core Python data structures: lists, dictionaries, tuples.
Expertise in Python libraries, e.g., pandas, requests, json, flask, pickle.
Good understanding of Informatica PowerCenter for building data pipelines.
Understanding of Master Data Management processes and domain is preferred.
Hands-on experience writing complex SQL queries and exporting/importing large volumes of data using database utilities.
Ability to build abstracted, modular, reusable code components.
Hands-on experience generating and parsing XML and JSON documents and REST API requests/responses (a parsing sketch follows this list).
Able to quickly adapt and learn.
Able to jump into an ambiguous situation and take the lead on resolution.
Able to communicate and coordinate across various teams.
Comfortable tackling new challenges and new ways of working.
Ready to move away from traditional methods and adapt to agile ones.
Comfortable challenging your peers and leadership team.
Able to prove yourself quickly and decisively.
Excellent communication skills and strong customer centricity.
Strong target and solution orientation.
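As a hedged illustration of the XML/JSON handling called for above, using only the Python standard library; the document shape and field names are hypothetical.

```python
# Sketch: parse an XML security record, reshape it with core data structures
# (dicts and lists), and emit a JSON payload suitable for a REST request body.
# The XML shape and field names are hypothetical.
import json
import xml.etree.ElementTree as ET

XML_DOC = """
<securities>
  <security id="ABC123"><name>Example Corp</name><currency>USD</currency></security>
  <security id="XYZ789"><name>Sample Ltd</name><currency>EUR</currency></security>
</securities>
"""

root = ET.fromstring(XML_DOC)
records = [
    {
        "id": sec.get("id"),
        "name": sec.findtext("name"),
        "currency": sec.findtext("currency"),
    }
    for sec in root.findall("security")
]

payload = json.dumps({"securities": records}, indent=2)
print(payload)  # ready to POST, e.g. via requests.post(url, data=payload)
```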
Data Engineer • Delhi, India