Do you want to work on the cutting edge of distributed systems, storage, networking & cluster technologies and be part of the infrastructure for private and hybrid cloud? Do you want to contribute to the critical solutions on which Enterprise Cloud Deployments depend, while being a part of a fast-paced and energetic team? Our products like Azure Stack-HCI (https://azure.microsoft.com/en-us/overview/azure-stack/hci/) enable innovative approaches to building hybrid datacenters that are resilient to hardware and software faults, scale to leverage the capabilities of the latest hardware, and allow you to build highly performant solutions that are best-in-class, and yet low cost.
Some of the opportunities you will have by being part of our journey include: making our software stack resilient to faults to meet service availability expectations, leveraging new classes of hardware to drive up performance, and instrumenting product code to gather data-driven insights that ensure we continue to delight our customers. You will be part of a team delivering solutions that allow hosting solution providers to handle ever-growing storage capacities and increased performance expectations while reducing their maintenance costs through improved diagnostics. If you want to learn about storage, high availability, file management, file systems, caching, performance, multi-tiered storage hierarchies and, of course, OS internals, this is the place for you. Cloud scale, virtualization, elasticity, flexible provisioning, containers: building a platform for the next decade of cloud services is an exciting adventure that we would love for you to be a part of.
Team / Work Details / Job Summary
Our team's work directly stems from the #1 priority of the overall E+P Azure organization, which is:
Azure, and the OS Platforms powering it, is the world's most stable, reliable, secure, and performant cloud for Mission Critical workloads. Quality is the feeling of craftsmanship that gives customers an innate sense of confidence they can trust their businesses with our services. H+S builds systems and platforms which assist the entire quality lifecycle, from shifting left in design, development, and testing, through deployment and operations.
The position is for Reliability Engineering in the Server Fundamentals Team
Top-level work: ensure Windows Server vNext is reliable, at scale, for customers. The position will have a special focus on ARM64 Server, as this will be our first time offering the OS on Azure as a guest OS in VMs
The team's core work is around characterizing Reliability, identifying and strengthening areas of weakness, and building resiliency in the Core Platform. The team builds an arsenal of workloads, tools, tests and monitoring mechanisms to do this effectively and at scale. Careful analysis of Reliability issues seen in customer deployments, both in Azure and on-prem, fuels this work. Debugging issues identified by our workloads and validation systems, and working with peer engineering teams to prioritize and address core issues, are critical
Create automated infrastructure to do the above work at scale.
Create automated infrastructure to catch such reliability issues as close to development as possible
Analysis and investigations, development and coding, and data sleuthing and data engineering are the most common activities of the role. Exact splits are tough to estimate and will vary depending on the phase of the project cycle and where we are in the release cycle. A rough approximation would be:
Development and testing: 25%
Data Science: 25%
Analysis: 50%
Team members get exposed to various aspects of Windows Server, Core OS and integration with Azure
Responsibilities
In our team you will be exposed to cutting-edge development from user-mode to kernel-mode in C/C++/C#
We pride ourselves on developing ingenious validation tools to deliver on reliability, stress, and fault-injection (to name a few) metrics and to perform workload modelling and analysis. We invest heavily in telemetry, and you will be responsible for the entire data pipeline, from instrumenting our product code to mining big data smartly
You will be responsible for defining the metrics and measures of product success
You will code and implement troubleshooters that ship in the box, deeply understand our platform, instrument them, and ultimately derive insights that will directly help shape what gets built in iterative cycles
Your work will directly help gauge product quality in the brave new world of Windows as a service, validate and inform design choices, build an understanding of how our customers use and deploy our solutions, and help shape our future product investments
Qualifications
Required Qualifications
Bachelor’s degree or higher in Computer Science, EE, or other technical/engineering discipline
4+ years of software development experience with solid design and coding skills
Team player with highly effective collaboration skills across teams and groups
Passion for learning and implementing the newest technologies
Preferred Qualifications
Strong computer science background in operating systems and data structures
Knowledge of Storage Technologies
Experience with data analytics, big data, and telemetry to provide a deeper understanding of end-to-end customer experiences
Solid interpersonal skills with a strong desire to work in a collaborative environment on a fast-paced team
Motivation, passion and drive to build high-quality products that delight our customers
#IDCAzureEPHiring
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).