Lead the design, development, and implementation of high-performance, scalable, and resilient data pipelines for both batch and real-time (stream) data processing.
Utilize your strong expertise in Apache Spark with Python to build efficient data transformation and processing jobs.
Ensure data pipeline solutions are fault-tolerant and reliable, guaranteeing data quality and integrity throughout the data lifecycle.
Work extensively with cloud-native data services, drawing on at least 3 years of hands-on experience with AWS services such as S3, DMS, Redshift, Glue, Lambda, Kinesis, and MSK, or equivalent services from Azure or GCP.
Develop and optimize complex SQL queries and work with various NoSQL technologies to manage and retrieve data efficiently.
Collaborate with cross-functional teams, including data scientists, analysts, and other engineers, to understand data requirements and deliver impactful data solutions.
Participate in data modeling, schema design, and data governance initiatives.
Monitor, troubleshoot, and optimize existing data pipelines and infrastructure for performance and cost efficiency.
Contribute to best practices for data engineering, ensuring maintainability, scalability, and security of our data platforms.
Leverage your experience across multiple domains to adapt and apply best practices to diverse data challenges.
What We're Looking For:
6 to 10 years of progressive experience building data solutions in Big Data environments.
A strong ability to build robust, scalable, fault-tolerant, and reliable data pipelines.
Mandatory hands-on experience with Apache Spark using Python for both batch and stream data processing.
Solid knowledge and practical experience in both batch and stream data processing methodologies.
Demonstrated exposure to working on data projects across multiple domains.
Strong hands-on capabilities with both SQL and NoSQL technologies.
At least 3 years of hands-on experience with AWS services such as S3, DMS, Redshift, Glue, Lambda, Kinesis, and MSK, or similar data-focused services from Azure or GCP.
Excellent problem-solving, analytical, and debugging skills.
Ability to work independently and collaboratively in a remote team environment.