Staff Engineer - Reliability and Developer Experience Platforms - Software Engineering
Who we are
Wayfair is seeking a passionate and driven Staff Engineer to join our “Reliability and Developer Experience Platforms” team. In this pivotal role, you will be instrumental in shaping and executing our strategy for Reliability, Observability, and Platform Insights across Wayfair's extensive distributed systems. You will be at the forefront of building and engineering 'Platform as a Product' offerings and driving enablement initiatives that empower our engineering organization to build and operate highly reliable and efficient systems at scale. Given the broad reach of the technologies we use, you will have the opportunity to grow your network and skills through exposure to new ideas and to people who work on a diverse set of cutting-edge technologies. If you are fascinated by engineering extremely large and diverse data systems and passionate about troubleshooting challenging technical problems in a rapidly innovating cloud environment, you could be a great fit.
What You’ll Do :
- Play a key role in developing and driving a multi-year technology strategy for the Reliability and Developer Experience Platforms, which handle reliability complexities across Wayfair's large-scale systems.
- Lead multiple software development teams, architecting solutions at scale to empower the business and owning all aspects of the SDLC: design, build, deliver, and maintain.
- Indirectly manage several architects and managers by offering coaching, guidance, and mentorship to support individual and team growth.
- Inspire, coach, mentor, and support technology team members in order to grow and retain engineering talent.
- Lead your team and peers by example. As a senior member of the team, your methodologies, technical and operational excellence practices, and system designs will help to continuously improve our platforms.
- Identify, propose, and drive initiatives to advance the technical skills, standards, practices, architecture, and documentation of our platforms.
- Facilitate technical brainstorming and decision-making with an appreciation for trade-offs.
- Continuously rethink and push past the status quo, even when that challenges your own or our established ideas.
What You’ll Need :
- As a Senior Architect or Staff Engineer, you have focused on collaboratively building impactful products and platforms, working closely with both Product Management and Engineering stakeholders. This collaborative approach has been driven by a deep understanding of market opportunities, enabling the strategic development of solutions designed to achieve significant adoption.
- A results-oriented and collaborative individual with a pragmatic approach and a continuous improvement mindset, with hands-on experience in driving impactful software transformations within complex, high-growth environments involving cross-continentally owned products.
- 16+ years of experience in engineering, of which at least 10 years were spent building and driving adoption of highly performant products and platforms.
- Proven experience in making critical architectural and design decisions for large-scale platforms, demonstrating a strong understanding of the trade-offs between time-to-market and long-term flexibility, coupled with a track record of designing and building high-scale, generalizable products that deliver an outstanding user experience.
- Capability to communicate and collaborate across the wider organization, influencing decisions with and without direct authority, always with inclusive, adaptable, and persuasive communication.
- Demonstrated analytical and decision-making skills, effectively integrating technical feasibility with business requirements to drive informed and impactful outcomes.
We Are a Match Because You Have :
- Deep Expertise in Scalable, Observable Platform Engineering: Proven track record in architecting scalable backend systems using modern patterns (microservices, event-driven), with built-in observability via tools like Datadog, Prometheus, and OpenTelemetry. Experienced in building resilient, self-monitoring services and critical components like API gateways and ingestion layers.
- Strong SRE Mindset for Developer Platforms: Applies SRE principles (SLOs, error budgets, automation) to platform products, ensuring reliability and reducing toil for developers. Led SLO definition and implementation for CI/CD systems.
- Infrastructure as Code & Automation Expertise: Extensive use of Terraform and automation tools to build reusable, self-service infrastructure on platforms like Kubernetes, empowering developers with reliable, easy-to-consume environments.
- Developer-Centric Tools & API Design: Built intuitive APIs, CLI tools, and self-service deployment systems with deep empathy for developer workflows, significantly improving developer experience and productivity.
- Robust Software Engineering & Developer Journey Focus: Strong coding skills and deep understanding of the developer lifecycle, consistently driving platform features that align with developer needs and workflows.