Job Description
We are seeking a highly experienced, hands-on Lead / Senior Data Engineer to architect, develop, and optimize data solutions in a cloud-native environment. The ideal candidate will have 7–12 years of experience and strong technical expertise in AWS Glue, PySpark, and Python, along with experience designing robust data pipelines and frameworks for large-scale enterprise systems. Prior exposure to the financial domain or regulated environments is a strong advantage.
Key Responsibilities:
Solution Architecture: Design scalable and secure data pipelines using AWS Glue, PySpark, and related AWS services (EMR, S3, Lambda, etc.).
Leadership & Mentorship: Guide junior engineers, conduct code reviews, and enforce best practices in development and deployment.
ETL Development: Lead the design and implementation of end-to-end ETL processes for structured and semi-structured data.
Framework Building: Develop and evolve data frameworks, reusable components, and automation tools to improve engineering productivity.
Performance Optimization: Optimize large-scale data workflows for performance, cost, and reliability.
Data Governance: Implement data quality, lineage, and governance strategies in compliance with enterprise standards.
Collaboration: Work closely with product, analytics, compliance, and DevOps teams to deliver high-quality solutions aligned with business goals.
CI/CD Automation: Set up and manage continuous integration and deployment pipelines using AWS CodePipeline, Jenkins, or GitLab.
Documentation & Presentations: Prepare technical documentation and present architectural solutions to stakeholders across levels.
Requirements
Required Qualifications:
7–12 years of experience in data engineering or related fields.
Strong expertise in Python programming with a focus on data processing.
Extensive experience with AWS Glue (both Glue Jobs and Glue Studio / Notebooks).
Deep hands-on experience with PySpark for distributed data processing.
Solid AWS knowledge: EMR, S3, Lambda, IAM, Athena, CloudWatch, Redshift, etc.
Proven experience architecting and managing complex ETL workflows.
Proficiency with Apache Airflow or similar orchestration tools.
Hands-on experience with CI/CD pipelines and DevOps best practices.
Familiarity with data quality, data lineage, and metadata management.
Strong experience working in Agile/Scrum teams.
Excellent communication and stakeholder engagement skills.
Preferred / Good to Have:
Experience in financial services, capital markets, or compliance systems.
Knowledge of data modeling, data lakes, and data warehouse architecture.
Familiarity with SQL (Athena / Presto / Redshift Spectrum).
Exposure to ML pipeline integration or event-driven architecture is a plus.
Benefits
Flexible work culture and remote options
Opportunity to lead cutting-edge cloud data engineering projects
Skill-building in large-scale, regulated environments
Data Engineer • Hyderabad, Telangana, India