Principal Site Reliability Engineering Manager - Viva Engage - September 2024
San Francisco
Sep 4, 2025
About Principal Site Reliability Engineering Manager - Viva Engage

  What is Viva Engage?

  Viva Engage is the industry-defining social network for the enterprise. We provide a platform for millions of employees, including those from 85% of Fortune 500 companies, to build community and culture, share knowledge, and connect with their leaders and each other.

  Why Viva Engage?

  Acquired by Microsoft in 2012, Viva Engage combines the benefits of a startup - rapid innovation, cutting-edge technology, outsized individual impact - with the advantages of working for one of the most successful software companies in the world. We believe in mission-driven work and our platform has become more indispensable than ever as it fosters connection and a sense of belonging among remote teams. #VivaEngage

  You will have:

  Autonomy and freedom to innovate

  Choice of the best of open source and Microsoft-internal technology

  The ability to experiment, A/B test, and make data-driven decisions

  Tons of opportunity for outsized impact as part of a small but mighty team on a rapidly growing product needed now more than ever

  As Principal Site Reliability Engineering Manager in Viva Engage, you will have two critical accountabilities:

  The first is leading efforts to fully embrace site reliability engineering principles while building critical infrastructure, optimizing existing systems, and eliminating toil. You will oversee efforts that combine software and systems engineering to build, scale, and operate the large-scale conversation platform that powers Viva Engage experiences. With our origins as a startup but now part of Microsoft, your purview spans our own open-source-based tech stack, Azure managed services, and M365 technology.

  The second expectation is to improve overall reliability for Viva Engage. This means guiding engineering teams to develop missing capabilities, and driving changes to our culture and processes to make reliability a critical aspect of how we work. We have been growing rapidly to become a critical workload for many of the world’s largest organizations and are looking for you to help us get to the next level.

  You should have a well-established playbook developed through years of experience operating world-class systems on a huge scale. You should be able to paint a vision of the future and build consensus across the organization while still being able to dive into details. The day-to-day responsibilities include a blend of technical, hands-on leadership with demonstrated people management and partnership skills.

  Location: This is a U.S.-based position. Relocation does not apply and is not provided for the role.

  Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

  Responsibilities

  Mentor engineers within the infrastructure team and in partner teams in improving service reliability and evangelize reliability practices across the organization

  Drive accountability across the entire engineering organization with well-defined processes, metrics, and goals for reliability. This may include retooling existing rituals and creating new ones.

  Collaborate across various teams to provide input into capacity planning; failure/reliability analysis; performance analysis; security and customer privacy analysis

  Participate in the incident manager on-call rotation to coordinate responses to incidents that impact Service Level Agreements (SLAs), keeping relevant stakeholders and leadership apprised of incident impact and resolution status

  In addition, you have people management responsibilities including driving employee growth and development, executing projects, and managing performance, while continuing to evolve our infrastructure

  Embody our culture (https://careers.microsoft.com/v2/global/en/culture) and values (https://www.microsoft.com/en-us/about/corporate-values)

  Qualifications

  Required/Minimum Qualifications:

  8+ years technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

  o OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

  o OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

  o OR Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

  3+ years of people management experience leading Site Reliability Engineers or livesite teams.

  6+ years of experience in a Site Reliability Engineering role building and operating systems with world-class reliability at huge scale (100m+ Monthly Active Usage).

  6+ years technical engineering experience building large-scale distributed systems using, but not limited to, Golang, Java, Python, containers and container orchestration systems (such as Docker, Kubernetes, Apache Mesos), infrastructure as code (such as Terraform), databases (such as Postgres, data sharding), and cloud platforms (such as Microsoft Azure, Amazon Web Services, Google Cloud Platform).

  Additional/Preferred Qualifications:

  Demonstrated experience growing and coaching people, and acting as a role model for others.

  6+ years technical engineering experience coding in languages including, but not limited to, Golang, Java, or Python.

  6+ years of experience with containers and container orchestration systems

  6+ years of experience operating and evolving large-scale distributed systems in a cloud infrastructure (such as Kubernetes, Apache Mesos, Docker)

  6+ years of experience with infrastructure as code (Terraform)

  6+ years of experience with large-scale databases (Postgres, data sharding)

  6+ years of experience with Linux, Ubuntu, Microsoft Azure, Amazon Web Services, or Google Cloud Platform is preferred.

  Site Reliability Engineering M5 - The typical base pay range for this role across the U.S. is USD $133,600 - $256,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $173,200 - $282,200 per year.

  Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

  Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).
