Lead Engineer - Cloud Reliability

Confidential
Bengaluru / Bangalore
9 days ago
Job description

Company Summary

DISH Network Technologies India Pvt. Ltd. is a technology subsidiary of EchoStar. Our organization is at the forefront of technology, serving as a disruptive force and driving innovation and value on behalf of our customers.

Our product portfolio includes Boost Mobile (consumer wireless), DISH TV (direct broadcast satellite), Sling TV (over-the-top service provider), Hughes (global satellite connectivity solutions) and Hughesnet (satellite internet).

Our facilities in India are some of EchoStar's largest development centers outside the U.S. As a hub for technological convergence, our engineering talent is a catalyst for innovation in multimedia network and communications development.

EchoStar does not charge any fees to job applicants or candidates at any point during the recruitment and hiring process. We strongly advise all individuals to be vigilant and disregard any unsolicited requests for payment or personal financial information related to an EchoStar employment opportunity.

Department Summary

Our Technology teams challenge the status quo and reimagine capabilities across industries. Whether through research and development, technology innovation or solution engineering, our people play vital roles in connecting consumers with the products and platforms of tomorrow.

Job Duties and Responsibilities

System Reliability & Performance:

Design, implement, and maintain monitoring, alerting, and logging solutions for webMethods, GemFire, AWS services, and Kubernetes clusters to proactively identify and resolve issues.

Develop and implement automation for operational tasks, incident response, and system provisioning/de-provisioning.

Participate in on-call rotations to respond to critical incidents, troubleshoot complex problems, and perform root cause analysis (RCA).

Identify and eliminate toil through automation and process improvements.

Conduct performance tuning and capacity planning for all supported platforms.

Platform Expertise:

webMethods: Support, maintain, and optimize webMethods Integration Server, Universal Messaging, API Gateway, and related components, including webMethods upgrades, patching, and configuration management.

GemFire: Administer and optimize GemFire clusters, ensuring high availability, data consistency, and performance for critical applications. Troubleshoot GemFire-related issues, including cache misses, replication problems, and member failures.

AWS Cloud: Manage and optimize AWS cloud resources (EC2, S3, RDS, VPC, IAM, CloudWatch, Lambda, etc.) for scalability, security, and cost-efficiency.

Rancher Kubernetes: Administer, troubleshoot, and optimize Kubernetes clusters managed by Rancher, working with Helm charts, Kubernetes operators, ingress controllers, and network policies.

Collaboration & Best Practices:

Collaborate closely with development teams to ensure new features and services are designed for reliability, scalability, and observability.

Implement and champion SRE best practices, including SLO/SLA definition, error budgeting, chaos engineering, and blameless post-mortems.

Develop and maintain documentation for systems, processes, and runbooks.

Mentor junior engineers and contribute to a culture of continuous learning and improvement.

Skills, Experience and Requirements

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

8+ years of experience in an SRE, DevOps, or highly technical operations role.

Deep expertise in at least two, and strong proficiency in all, of the following:

webMethods Integration Platform (Integration Server, Universal Messaging, API Gateway).

VMware GemFire (or other distributed in-memory data grids like Apache Geode, Redis Enterprise).

AWS cloud services (EC2, S3, RDS, VPC, CloudWatch, EKS, etc.).

Kubernetes administration, particularly with Rancher and EKS.

Strong scripting and programming skills: Python, Go, Java, Bash.

Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.

Proficiency with CI/CD pipelines (e.g., Jenkins, GitLab CI, AWS CodePipeline).

Experience with monitoring and logging tools (e.g., Dynatrace, Prometheus, Grafana, ELK Stack, Datadog, Splunk).

Solid understanding of networking concepts (TCP/IP, DNS, load balancing, VPNs).

Excellent problem-solving, analytical, and communication skills.

Ability to work effectively in a fast-paced, collaborative environment.

Nice to have skills

Experience with other integration platforms or message brokers.

Knowledge of other distributed databases or caching technologies.

AWS Certifications.

Kubernetes Certifications (CKA, CKAD, CKS).

Experience with chaos engineering principles and tools.

Familiarity with agile methodologies.

Benefits

  • Insurance and Wellbeing
  • Financial & Retiral Benefit Program
  • Mental Wellbeing
  • Employee Stock Purchase Program (ESPP)
  • Professional Development Reimbursement
  • Time Off
  • Team Outings

Skills Required

Networking Concepts, Java, Bash, Python, Go
