DevOps Engineer

Octopos, Kanpur, IN

12 hours ago

Job description

We're seeking an experienced DevOps/Site Reliability Engineer to join our team and take ownership of testing, deployment, and infrastructure operations for Octopos, our multi-platform point-of-sale SaaS solution. You'll be responsible for building robust CI/CD pipelines, managing our database infrastructure, and ensuring high availability for our retail customers, who depend on us 24/7. This is a fully remote position.

CI/CD & Deployment Pipeline

Design and implement comprehensive CI/CD pipelines for our diverse tech stack (React, Laravel, Node.js, React Native)
Manage multi-platform deployments including web, Android (Capacitor), and Windows (Electron)
Manage Google Play Store releases including APK/AAB uploads, versioning, and staged rollouts
Handle App Store submissions and TestFlight distributions
Create and maintain staging environments that accurately mirror production
Implement automated testing strategies across all applications
Establish deployment rollback procedures and blue-green deployment strategies

Infrastructure & Monitoring

Implement and maintain comprehensive monitoring using Grafana dashboards and alerting

Set up centralized logging infrastructure (ELK stack or similar) for all applications

Monitor and maintain production servers ensuring 99.9% uptime for POS operations

Design custom metrics and KPIs specific to POS operations (transaction success rates, hardware connectivity)

Manage incident response and on-call rotations

Optimize application performance and resource utilization

Ensure infrastructure security and meet PCI compliance requirements

Database Management

Design and implement a multi-node MySQL cluster for high availability

Create and manage automated backup strategies with point-in-time recovery

Monitor database performance and implement optimization strategies

Plan and execute database migrations with zero downtime

Implement disaster recovery procedures

Testing & Quality Assurance

Build automated testing frameworks for React, Laravel, and Node.js applications

Implement E2E testing for critical POS workflows including payment processing

Create testing strategies for hardware integration (payment terminals, printers, scanners)

Establish code quality gates and coverage requirements

Documentation & Knowledge Transfer

Create and maintain comprehensive documentation for all infrastructure, deployment processes, and runbooks

Develop disaster recovery playbooks and incident response procedures

Document monitoring alerts, thresholds, and escalation procedures

Maintain architectural diagrams and system dependency documentation

Create video tutorials and guides for common operational tasks

Required Qualifications

Technical Skills

3+ years of DevOps/SRE experience with production systems

Strong experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins)

Hands-on experience with Grafana, Prometheus, and alerting systems

Experience with centralized logging solutions (ELK, Splunk, or similar)

Proficiency in containerization (Docker) and orchestration (Kubernetes/Docker Compose)

Expertise in MySQL administration including replication and clustering

Experience with Infrastructure as Code (Terraform, Ansible, or similar)

Solid understanding of Linux system administration

Proficiency in scripting (Bash, Python, or similar)

Application-Specific Experience

Experience deploying React/Node.js applications at scale

Familiarity with Laravel deployment and optimization

Experience managing mobile app releases and versioning strategies

Understanding of Electron app packaging and distribution

Knowledge of WebSocket implementations and real-time systems

Soft Skills

Excellent technical writing and documentation skills

Experience training and mentoring junior engineers

Strong communication skills for cross-functional collaboration

Ability to explain complex technical concepts to non-technical stakeholders

Work Schedule & On-Call Requirements

Core Hours

Must be available during US Pacific Time business hours (9 AM to 5 PM PST/PDT)

This is a fully remote position

On-Call Responsibilities

As our POS platform serves retail businesses operating 7 days a week, this role includes participation in an on-call rotation to ensure 24/7 system reliability.

On-Call Structure:

Participate in a rotating on-call schedule

Response time: 15-minute acknowledgment, 30-minute engagement during on-call periods

Average incident volume: 1 incident every 2 months

Severity-based response (P1: immediate, P2: 30 minutes, P3: next business day)

On-Call Compensation:

Standby Pay: Additional compensation for on-call availability (paid whether or not incidents occur)

Incident Response Pay: 1.5x hourly rate for incident response during nights/weekends

Compensatory Time: Time off provided after significant weekend incidents

Company-provided phone and laptop dedicated for on-call use

Post-incident review process to minimize repeat issues and alert fatigue

Support Structure:

Comprehensive runbooks and automated remediation for common issues

Clear escalation procedures to senior leadership and vendor support

Robust monitoring to minimize false positives

Regular rotation reviews to ensure fair distribution

What We Offer

Opportunity to architect infrastructure for a growing SaaS platform

Work with a diverse, modern technology stack

Direct impact on system reliability affecting thousands of daily transactions

Competitive on-call compensation package

Professional development budget for certifications and training

Salary of 12+ LPA (lakhs per annum)

Requirements

Strong written and verbal communication skills

Demonstrated experience in creating technical documentation

Ability to work during US Pacific Time business hours

Willingness to participate in compensated on-call rotation

Self-motivated with excellent troubleshooting skills

Experience working in fast-paced, agile environments

Commitment to knowledge sharing and team development
