Position: Lead Software Engineer, SRE, Order Management Systems
Key Responsibilities:
Ensure high availability and reliability of mission-critical systems such as OMS, Drop Ship, microservices, and supply chain integrations.
Provide technical guidance to the team on IBM Sterling Order Management solutions, including support for new order fulfillment features such as buy online, ship to store and buy online, ship from store.
Work with Care Center associates to understand the issues they and customers face, in order to improve the order management, capture, and tracking process. Build solutions for those issues using UI technologies such as AngularJS and back-end technologies such as microservices and the interop servlet.
Reduce technical debt and stabilize the performance of the Sterling Order Management system. This includes fine-tuning SQL queries and conducting extensive reviews of code and configuration. Work with different engineering teams to recommend fixes so that system code is bug-free and the system can handle peak load without issues.
Develop and implement observability solutions using tools such as OpenSearch, AppDynamics (APPD), Grafana, Prometheus, and New Relic.
Optimize system performance by analyzing trends, KPIs, and historical incidents.
Work with engineering teams to improve system architecture for reliability and fault tolerance.
Enhance system alerting and proactive monitoring to identify issues before they impact operations.
Develop automation scripts and tools to eliminate manual interventions in system operations.
Build self-healing and auto-remediation workflows to minimize human intervention in production issues.
Lead incident response efforts, ensuring quick resolution and root cause analysis.
Establish and enhance post-mortem and preventive action processes to drive continuous improvements.
Create custom dashboards and analytics solutions to provide visibility into system performance.
Implement intelligent alerting mechanisms that reduce noise and improve incident detection accuracy.
Define and track critical business and system SLIs, SLOs, and error budgets.
Essential skills in data analysis, including SQL queries, Excel / Google Sheets, critical thinking, and data visualization.
Ability to leverage data-driven insights to optimize system performance and reliability.
Experience presenting analytical findings to technical and non-technical stakeholders.
Understand how to use the power of data to drive decision-making and solve complex problems.
Identify operational bottlenecks and implement best practices for incident resolution.
Work closely with vendors, business, and engineering teams to enhance SOPs and operational workflows.
Improve collaboration and communication across engineering, support, and product teams.
Mentor and guide team members in scaling their expertise and becoming SMEs in key areas.
Lead Software Engineer • Pune, India