Job Summary :
We are seeking an experienced Messaging Middleware Administrator with strong expertise in Kafka (Apache, Confluent, MSK) and RabbitMQ to join our infrastructure and platform team. The ideal candidate will have a proven background in administering, optimizing, and troubleshooting messaging platforms in production environments. This role involves ensuring high availability, scalability, and security of messaging systems while working closely with cross-functional teams. The candidate will also play a key role in incident resolution, monitoring, and mentoring junior team members.
Key Responsibilities :
- Kafka & RabbitMQ Administration : Install, configure, upgrade, and maintain Kafka clusters (Apache, Confluent, MSK) and RabbitMQ in production and non-production environments.
- System Monitoring & Optimization : Monitor cluster health, latency, throughput, and resource utilization. Proactively tune configurations to optimize performance and reliability.
- Incident Management : Diagnose and resolve production issues related to brokers, queues, topics, ZooKeeper / KRaft, schema registry, and connectors. Provide on-call support as required.
- High Availability & Disaster Recovery : Implement and manage backup, failover, and disaster recovery strategies for messaging systems.
- Security & Compliance : Ensure secure access controls, encryption, authentication (AuthN), and authorization (AuthZ) for Kafka and RabbitMQ.
- Automation & CI / CD Integration : Automate provisioning, scaling, and monitoring tasks using scripting (Python, Bash, Ansible, or Terraform) and integrate with CI / CD pipelines.
- Collaboration : Partner with developers, DevOps, and cloud engineers to design and deliver reliable messaging solutions.
- Documentation : Maintain detailed technical documentation, standard operating procedures, and architecture diagrams.
- Mentorship : Guide and mentor junior engineers in best practices for managing messaging middleware.
Must-Have Qualifications & Skills :
- 7+ years of overall IT experience, including 3+ years dedicated to Kafka and RabbitMQ administration.
- Strong hands-on experience with Apache Kafka, Confluent Kafka, Amazon MSK, and RabbitMQ in large-scale production setups.
- In-depth knowledge of Kafka components : brokers, ZooKeeper / KRaft, schema registry, connectors, partitions, and replication.
- Proven expertise in monitoring tools (Prometheus, Grafana, ELK, Datadog, etc.) and middleware performance metrics.
- Experience troubleshooting message delivery, consumer lag, cluster instability, and connectivity issues.
- Familiarity with cloud environments (AWS / Azure / GCP) and managed messaging services.
- Strong problem-solving skills with a proactive and analytical mindset.
- Excellent communication skills with the ability to collaborate across engineering and operations teams.
Good-to-Have Skills :
- Experience with other messaging systems (ActiveMQ, Pulsar, or IBM MQ).
- Scripting & automation knowledge (Python, Bash, Ansible, or Terraform).
- Exposure to DevOps practices, CI / CD pipelines, and Kubernetes.
- Knowledge of security protocols (TLS, SSL, Kerberos, SASL, OAuth).
- Prior experience in capacity planning and large-scale cluster migrations.
Personal Attributes :
- Strong ownership mindset with attention to detail.
- Ability to work effectively under pressure and manage critical incidents.
- Passion for continuous learning and keeping up with evolving technologies.
- Collaborative attitude and willingness to mentor and support team members.