Principal Site Reliability Engineering Manager- Viva Engage
Principal Site Reliability Engineering Manager- Viva Engage-February 2024
San Francisco
Feb 10, 2026
About Principal Site Reliability Engineering Manager- Viva Engage

  What is Viva Engage?

  Viva Engage is the industry-defining social network for the enterprise. We provide a platform for millions of employees, including those from 85% of Fortune 500 companies, to build community and culture, share knowledge, and connect with their leaders and each other.

  Why Viva Engage?

  Acquired by Microsoft in 2012, Viva Engage combines the benefits of a startup - rapid innovation, cutting-edge technology, outsized individual impact - with the advantages of working for one of the most successful software companies in the world. We believe in mission-driven work and our platform has become more indispensable than ever as it fosters connection and a sense of belonging among remote teams. #VivaEngage

  You will have:

  Autonomy and freedom to innovate

  Choice of the best of open source and Microsoft-internal technology

  The ability to experiment, A/B test, and make data-driven decisions

  Tons of opportunity for outsized impact as part of a small but mighty team on a rapidly-growing product needed now more than ever

  As Principal Site Reliability Engineering Manager in Viva Engage, you will have two critical accountabilities:

  The first is leading efforts to fully embrace site reliability engineering principles while building critical infrastructure, optimizing existing systems, and eliminating toil. You will oversee efforts that combine software and systems engineering to build, scale, and operate the large-scale conversation platform that powers Viva Engage experiences. With our origins as a startup but now part of Microsoft, your purview spans our own open-source-based tech stack, Azure managed services, and M365 technology.

  The second expectation is to improve overall reliability for Viva Engage. This means guiding engineering teams to develop missing capabilities, and driving changes to our culture and processes to make reliability a critical aspect of how we work. We have been growing rapidly to become a critical workload for many of the world’s largest organizations and are looking for you to help us get to the next level.

  You should have a well-established playbook developed through years of experience operating world-class systems on a huge scale. You should be able to paint a vision of the future and build consensus across the organization while still being able to dive into details. The day-to-day responsibilities include a blend of technical, hands-on leadership with demonstrated people management and partnership skills.

  Location: This is a U.S.-based position. Relocation is not provided for the role.

  Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

  Responsibilities

  Mentor engineers within the infrastructure team and in partner teams in improving service reliability and evangelize reliability practices across the organization

  Drive accountability across the entire engineering organization with well-defined processes, metrics, and goals for reliability. This may include retooling existing rituals and creating new ones.

  Collaborate across various teams to provide input into capacity planning; failure/reliability analysis; performance analysis; security and customer privacy analysis

  Participate in the incident manager on-call rotation to coordinate responses to incidents that impact Service Level Agreements (SLAs), keeping relevant stakeholders and leadership apprised of incident impact and resolution status

  In addition, you have people management responsibilities including driving employee growth and development, executing projects, and managing performance, while continuing to evolve our infrastructure

  Embody our culture (https://careers.microsoft.com/v2/global/en/culture) and values (https://www.microsoft.com/en-us/about/corporate-values)

  Qualifications

  Required/Minimum Qualifications:

  8+ years technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

  o OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

  o OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

  o OR Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, systems administration, or Site Reliability Engineering

  3+ years of people management experience leading Site Reliability Engineers or livesite teams.

  6+ years of experience in a Site Reliability Engineering role building and operating systems with world-class reliability at huge scale (100m+ Monthly Active Usage).

  6+ years technical engineering experience building large-scale distributed systems using, but not limited to, Golang, Java, Python, containers and container orchestration systems (such as Docker, Kubernetes, Apache Mesos), infrastructure as code (such as Terraform), databases (such as Postgres, data sharding), and cloud platforms (such as Microsoft Azure, Amazon Web Services, Google Cloud Platform).

  Additional/Preferred Qualifications:

  Demonstrated experience growing and coaching people, and acting as a role model for others.

  6+ years technical engineering experience with coding in languages including, but not limited to, Golang, Java, or Python.

  6+ years of experience with containers and container orchestration systems

  6+ years of experience operating and evolving large-scale distributed systems in a cloud infrastructure (such as Kubernetes, Apache Mesos, Docker)

  6+ years of experience with infrastructure as code (Terraform)

  6+ years of experience with large-scale databases (Postgres, data sharding)

  6+ years of experience with Linux, Ubuntu, Microsoft Azure, Amazon Web Services, or Google Cloud Platform is preferred.

  Site Reliability Engineering M5 - The typical base pay range for this role across the U.S. is USD $133,600 - $256,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $173,200 - $282,200 per year.

  Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

  Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).
