Description
GSPANN is hiring Senior Data Engineers with expertise in Amazon Web Services (AWS) to design, build, and optimize scalable data pipelines and architectures on cloud platforms. Candidates must be proficient in Python, Hive, and Airflow, with strong problem-solving and data engineering expertise.
Role and Responsibilities
- Actively contribute to all phases of the software development lifecycle, including requirements gathering, functional and technical design, development, testing, roll-out, and support.
- Solve complex business problems by applying a disciplined development methodology.
- Build scalable, flexible, efficient, and supportable solutions using appropriate technologies.
- Analyze source and target system data, and map transformations that meet business requirements.
- Collaborate with clients and onsite coordinators throughout project phases.
- Design and implement product features in partnership with business and technology stakeholders.
- Anticipate, identify, and resolve issues related to data management to improve data quality.
- Clean, prepare, and optimize large-scale data for ingestion and consumption.
- Support new data management initiatives and restructure existing data architectures as needed.
- Implement automated workflows and routines using workflow scheduling tools such as Apache Airflow.
- Apply continuous integration, test-driven development, and production deployment frameworks.
- Review and contribute to design, code, test plans, and dataset implementations by other data engineers, ensuring adherence to standards.
- Analyze and profile data to design scalable and reliable solutions.
- Troubleshoot data issues, perform root cause analysis, and proactively resolve product-related problems.
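To illustrate the kind of "clean, prepare, and optimize" work described above, here is a minimal sketch in plain Python (the role would typically use PySpark at scale; the record schema with `user_id`, `event_ts`, and `amount` is hypothetical):

```python
from datetime import datetime

def clean_records(raw_records):
    """Drop malformed rows and normalize fields before ingestion.

    Hypothetical schema: each record is a dict with 'user_id' (int-like),
    'event_ts' (ISO-8601 string), and 'amount' (numeric string).
    """
    cleaned = []
    for rec in raw_records:
        # Skip rows with missing keys or unparsable values rather than
        # failing the whole batch, a common pattern in ingestion pipelines.
        try:
            cleaned.append({
                "user_id": int(rec["user_id"]),
                "event_ts": datetime.fromisoformat(rec["event_ts"]),
                "amount": round(float(rec["amount"]), 2),
            })
        except (KeyError, ValueError, TypeError):
            continue
    return cleaned
```

In a production pipeline this step would usually run as a task in an Airflow DAG, writing the cleaned output back to S3 for downstream consumers.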
Skills and Experience
- 6+ years of experience developing data and analytics solutions.
- Bachelor's degree in Computer Science or a related field.
- Expertise in building data lake solutions with AWS technologies such as Elastic MapReduce (EMR), Simple Storage Service (S3), Apache Hive, and PySpark.
- Strong knowledge of relational SQL.
- Proficiency in scripting languages, particularly Python.
- Experience with source control tools such as GitHub and related development processes.
- Hands-on experience with workflow scheduling tools such as Apache Airflow.
- In-depth knowledge of AWS cloud services, including S3 and EMR, as well as platforms such as Databricks.
- Strong problem-solving and analytical mindset with a passion for data solutions.
- Proven track record in designing, developing, and testing data pipelines.
- Comfortable working in Agile teams.
- Ability to influence and communicate effectively with both technical and business stakeholders, verbally and in writing.
- Ability to quickly learn new programming languages, technologies, and frameworks.