About the Role
We're seeking an experienced Infrastructure Engineer to join our platform team, which runs the large-scale data processing and analytics infrastructure behind 5B+ events per day and 5M+ DAU. We're looking for someone who can help us scale gracefully while optimizing for performance, cost, and resiliency.
Key Responsibilities
- Design, implement, and manage our AWS infrastructure, with a strong emphasis on automation, resilience, and cost efficiency.
- Implement and manage stream processing frameworks (Kafka).
- Handle orchestration and ETL workloads, employing services like AWS Glue, Athena, Redshift, or Apache Airflow.
- Develop alerting and resolution processes/pipelines for P0 and P1 issues.
- Monitor, debug, and resolve production issues related to data and infrastructure in real time.
- Implement IAM controls, logging, alerting, and security best practices across all components.
- Provide deployment automation (Docker) and collaborate with application engineers to enable smooth delivery.
Required Skills
- 3+ years of experience with AWS services (VPC, EC2, S3, Security Groups, RDS, MSK).
- Ability to handle workloads of 5 billion events/day and 1M+ concurrent users gracefully.
- Familiar with scripting (Python, Terraform) and automation practices (Infrastructure as Code).
- Familiar with networking fundamentals, Linux, scaling strategies, backup routines, and CDC pipelines.
- Collaborative team player: able to work with engineers, data analysts, and stakeholders.
Preferred Tools
- AWS: EC2, S3, VPC, Security Groups, RDS, DocumentDB, MSK, Glue, Athena, CloudWatch
- Infrastructure as Code: Terraform
- Scripted automation: Python, Bash
- Container orchestration: Docker, ECS, or EKS
- Workflow orchestration: Apache Airflow
- Streaming framework: Apache Kafka
- Other: Linux, Git, security best practices (IAM, Security Groups, ACM)
Skills Required
Apache Airflow, Terraform, Docker, Linux, AWS Glue, Kafka, Python, AWS