Software Platform Engineer

Data Dynamics, Pune, Maharashtra, India
28 days ago
Job description

Overview

We are seeking a skilled Platform Engineer to drive the development, deployment, and supportability of our Kubernetes-based microservices platform, which customers deploy on-premises. You will build comprehensive observability, enable log and report extraction for service cases where we have no real-time access to the customer environment, and reduce our over-reliance on Kafka by integrating Redis and batch processing. This role requires expertise in Kubernetes, Azure DevOps, C++ service support, deployment sizing, and designing for reliability, availability, and serviceability (RAS).

Responsibilities

  • Build Comprehensive Observability: Implement centralized metrics, logging, and tracing (e.g., Prometheus, Fluentd, OpenTelemetry) for .NET, Python, Java, C++, Kafka, and Redis, ensuring supportability in on-premises environments.
  • Enable Log/Report Extraction: Design customer-facing tools (e.g., CLI scripts, Helm chart options) to collect and export logs/metrics from on-premises deployments for service cases, without real-time access.
  • Optimize Kafka Usage: Audit and optimize Kafka configurations (e.g., topics, partitions, compression) to reduce metadata streaming overhead, monitored with Prometheus or Azure Monitor.
  • Implement Alternatives: Integrate Redis (e.g., Azure Cache for Redis) for metadata caching/pub-sub and batch processing (e.g., Azure Data Factory, Kubernetes Jobs) for high-volume data, reducing Kafka dependency.
  • Troubleshoot Customer Environments: Debug issues in on-premises customer deployments for services (C++, .NET, Python, Java), Kafka, and Redis, using exported logs and metrics.
  • Enhance Product Supportability: Build Azure DevOps pipelines and installers (e.g., Helm charts) for consistent, supportable deployments, with documentation for customer support.
  • Contribute to RAS: Own serviceability by building observability and diagnostic tools; support reliability/availability via Kubernetes optimization, autoscaling, and fault-tolerant designs.
  • Enforce Standards: Implement and enforce structured logging (e.g., JSON with correlation IDs) and resource sizing standards via Azure DevOps pipelines.
  • Optimize Deployment Sizing: Set Kubernetes resource requests/limits and autoscaling policies (e.g., HPA, VPA) for services, Kafka, Redis, and batch jobs, based on profiling.
  • Evaluate Service Meshes: Assess service meshes (e.g., Linkerd) for improving microservice and data platform observability and communication.
  • Support C++ Services: Assist developers in containerizing, deploying, and debugging C++ services, ensuring integration with observability, Kafka, Redis, or batch workflows.
  • Automate with Azure DevOps: Build CI/CD pipelines in Azure DevOps for automated builds, tests, and deployments, integrating with AKS, Kafka, and Redis.

Qualifications

  • Experience: 3–5 years with Kubernetes, Azure DevOps (AKS, pipelines), and Kafka administration.
  • Technical Skills:
      • Expert in Kubernetes (CKA/CKAD preferred) and Azure DevOps (YAML pipelines, AKS integration).
      • Proficient in observability tools (e.g., Prometheus, Grafana, Fluentd, OpenTelemetry, Azure Monitor) for metrics, logs, and tracing.
      • Experience with on-premises Kubernetes deployments and log/report extraction for service cases.
      • Proficient in Kafka optimization (e.g., topic management, consumer groups) and monitoring.
      • Knowledge of Redis (e.g., Azure Cache for Redis, pub/sub) and batch processing (e.g., Azure Data Factory, Kubernetes Jobs).
      • Familiarity with C++ build systems (e.g., CMake) and debugging (e.g., gdb) in Kubernetes.
      • Proficiency in Kubernetes resource management and autoscaling (e.g., HPA, VPA).
      • Scripting skills (e.g., Python, Bash) for automation, diagnostics, and log extraction.
  • Customer Focus: Proven ability to troubleshoot on-premises customer environments and build supportable deployment and observability tools.
  • Standards Enforcement: Experience enforcing logging, sizing, and data platform standards via Azure DevOps pipelines.
  • RAS Expertise: Ability to design for serviceability (observability, diagnostics) and contribute to reliability/availability through platform optimization.

Nice-to-Haves

  • Experience with service meshes (e.g., Linkerd, Istio) and their integration with Azure.
  • Familiarity with .NET, Python, or Java for developer collaboration.
  • Knowledge of air-gapped Kubernetes deployments (e.g., Kubeadm, K3s).