Principal Site Reliability Engineer
Principal Site Reliability Engineer-March 2024
Vancouver
Mar 19, 2026
About Principal Site Reliability Engineer

  What is Viva Engage?

  Viva Engage is the industry-defining social network for the enterprise. We provide a platform for millions of employees, including those from 85% of Fortune 500 companies, to build community and culture, share knowledge, and connect with their leaders and each other.

  Why Viva Engage?

  Acquired by Microsoft in 2012, Viva Engage combines the benefits of a startup (rapid innovation, cutting-edge technology, outsized individual impact) with the advantages of working for one of the most successful software companies in the world. We believe in mission-driven work, and in this post-COVID world our platform has become more indispensable than ever as it fosters connection and a sense of belonging among remote teams. #VivaEngage

  You will have:

  Autonomy and freedom to innovate

  Choice of the best of open source and Microsoft-internal technology

  The ability to experiment, A/B test, and make data-driven decisions

  Opportunity for outsized impact as part of a small but mighty team on a rapidly growing product needed now more than ever

  As a Principal Site Reliability Engineer in Viva Engage, you will have two critical accountabilities:

  The first is driving efforts to fully embrace site reliability engineering principles while building critical infrastructure, optimizing existing systems, and eliminating toil. You will lead efforts that combine software and systems engineering to build, scale, and operate the large-scale conversation platform that powers Viva Engage experiences.

  The second is improving overall reliability for Viva Engage. This means guiding and influencing peers to develop missing capabilities, and driving changes to our culture and processes to make reliability a critical aspect of how we work. We have grown rapidly to become a critical workload for many of the world’s largest organizations and are looking for you to help us get to the next level.

  Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

  Responsibilities

  Develop and execute on the observability and telemetry strategy

  Own the telemetry and monitoring infrastructure

  Continually seek deeper insights into the performance, reliability & scalability of our systems

  Improve service reliability for the entire Viva Engage team by reducing mean time to recovery (MTTR)

  Help all of Viva Engage prevent service incidents altogether

  Qualifications

  Required/Minimum Qualifications

  8+ years technical experience in software engineering, network engineering, or systems administration

  OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration

  OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration

  OR Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.

  6+ years of experience building large scale distributed systems.

  6+ years of experience in a Site Reliability Engineering role building and operating systems with world-class reliability at huge scale

  Preferred Qualifications/Attributes

  Knowledge of log and metrics pipelines (ELK stack or cloud services)

  Troubleshooting skills and the ability to trace a request through an entire stack

  Microservices development, deployment, and monitoring

  Curiosity about reliability and performance at all levels of the stack

  Experience with large datasets and data migrations

  Azure | AWS | GCP automation 

  Site Reliability Engineering IC5 - The typical base pay range for this role across Canada is CAD $132,800 - CAD $247,200 per year.

  Find additional pay information here:

  https://careers.microsoft.com/v2/global/en/canada-pay-information.html

  Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).
