Site Reliability Engineer - AIOps / Observability Services

Intraedge Technologies Ltd., Hyderabad

30+ days ago

Job description

L2 Observability / AIOps:

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.

SRE ensures that internally critical and externally visible systems have reliability and uptime appropriate to users' needs, along with a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance.

SRE is a mindset and a set of engineering approaches focused on optimizing existing systems, building infrastructure, and eliminating work through automation.

As a Site Reliability Engineer with a focus on observability, you will build and operate next-generation observability platforms.

As an SRE with an observability focus, you will:

Explore the complex IT estates of our clients to understand their observability / AIOps opportunities and identify areas for improvement.
Collaborate to architect unified observability and AIOps strategies which employ leading AI technology.
Implement enterprise observability / AIOps technology and processes.
Amplify observability / AIOps outcomes by accelerating adoption across technology and business teams. Responsibilities include:
Architecting observability solutions that close the identified gaps and reduce MTTD and MTTR in line with organizational objectives.
Developing API-driven micro-services that combine into large and complex platforms.
Planning and executing highly parallel distributed object storage transformations and migrations.
Maintaining automated test suites using CI / CD tools.
Participating in collaborative projects with small software engineering teams.
Developing automation, processes, and tools that make our services simpler and more robust.
Participating in troubleshooting, capacity planning and analysis, and performance analysis.
Advising management on service onboarding strategy and execution.

What we are looking for:

Entrepreneurs who seek challenging problems to solve.

Creativity, initiative and acute attention to detail.

Thirst for innovation and solving problems at lightning speed.

Passion for automating everything repetitive.

Obsession with software scalability and performance under high loads.

Love for using and contributing to open-source software.

Please bring to the table:

Experience in architecting complex IT solutions.

Understanding of the observability dimensions (metrics, logs, traces).

Excellent communication and stakeholder management skills.

Development experience and comfort working in multiple languages (Python, Java, Go, and Ruby a plus).

Experience working in collaborative coding environments (peer review, continuous integration, etc.).

7+ years of application development experience.

Experience working in distributed remote teams across multiple time zones.

Experience in large scale operations environments.

7+ years of experience with Linux / Unix development or systems administration.

3+ years of experience with networking systems and technologies.

Deep understanding of network performance and security.

Ability to identify tasks that require automation and to implement that automation.

Experience with configuration management tools such as Puppet, Chef, or SaltStack.

Hands-on operational experience in high-volume or critical production service environments, including distributed systems, capacity planning, and continuous deployment.

BA / BS in Computer Science preferred, or equivalent experience (advanced degrees preferred).

We have opportunities to work with and learn:

Object storage: MinIO / S3 / etc.

Data collection: OpenTelemetry / Grafana Alloy / etc.

Message bus: Kafka / NSQ / etc.

Scaling databases: relational database technologies at large scale.

Scheduling & orchestration.

Cloud platforms: AWS / Azure.

(ref: hirist.tech)
