Senior Site Reliability Engineering Manager, Observability (March 2024)

London

Mar 28, 2026

About Senior Site Reliability Engineering Manager, Observability

Who We Are

The name ThousandEyes was born from two big ideas: the power to see what’s not ordinarily possible, and the ability to collect intelligence from vantage points as diverse and global as the Internet. As organizations depend on cloud services, the Internet has become their de facto network, connecting cloud applications to users. Our Internet and cloud intelligence platform is like a ‘Google Maps of the Internet’, providing the only collectively powered view of digital experiences end to end. We enable our customers, made up of the world’s largest and fastest-growing brands, to identify problems before they impact revenue, brand reputation, or employee productivity.

In August 2020, Cisco Systems completed the acquisition of ThousandEyes, which now forms the ThousandEyes Business Unit within Cisco’s Network Services Business Group and is a foundational component of Cisco’s growing Observability business.

About The Role

This role is the Senior Site Reliability Engineering Manager for the Observability SRE team at ThousandEyes. The Observability team is responsible for providing a world-class developer experience whenever engineers need to understand and observe platform behavior. Beyond visibility, the team turns that visibility into action, relentlessly pursuing the goal of a platform that is resilient, fault tolerant, and self-healing.

What You'll Do

As a senior engineering manager leading the Observability team, you will be responsible for the design, development, and operation of our internal observability platform. Working with a team of strong, mission-focused engineers, you’ll bring a user-focused perspective to delivering observability as a platform, with the aim of running the best observability platform in the industry.

Qualifications

Proven site reliability engineering management experience, or experience delivering an internal developer platform focused on production operations, ideally managing 4+ engineers

Can provide strong technical vision for your team and ensure consistent delivery on objectives

Have experience formulating a team's technical strategy and roadmap; you've collaborated and partnered effectively with several other teams to execute on shared goals

5+ years of experience building and supporting mission-critical services with a focus on automation, observability, availability, and performance

Experience building infrastructure and operating services in production environments that require high availability and reliability

You have worked on large-scale distributed systems, including multi-tiered architectures

Understand how to balance tactical needs with strategic growth and quality-based initiatives that can span multiple quarters

Preferred Qualifications

Cloud-native observability using Kubernetes, Prometheus, OpenTelemetry, and other industry-standard or CNCF technologies

Operated a cloud service at significant scale

Delivered an engineering-wide platform for service visibility

Owned incident response processes, post-mortem practices, or service best-practice standards

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.

Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.
