Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud
Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud-March 2024
Santa Clara
Mar 27, 2026
About Software Engineering Manager - Cloud Infrastructure Services, DGX Cloud

NVIDIA's invention of the GPU ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company”. We are looking to grow our company, and grow our teams with the smartest people in the world. We are looking for you. NVIDIA GPUs are widely used for deep learning, both in the research community and in industry, to help solve many big data problems such as computer vision, speech recognition, translation, life sciences, image recognition, and natural language processing. NVIDIA GPU Cloud (NGC) is a GPU-accelerated platform that runs everywhere. Data scientists and researchers can now rapidly build, train, and deploy neural network models to address some of the most complicated AI challenges. In this environment, the NVIDIA GPU Cloud computing team is looking for leaders to work on a world-class deep learning platform.

As a Site Reliability Engineering leader, you will manage the operations of our observability platform focused on multi-colo distributed NVIDIA GPU cloud clusters. You will lead all aspects of cluster operational excellence planning and grow your team. You thrive in a fast-paced, iterative engineering environment and have experience delivering scalable distributed systems. Most importantly, you have a track record of earning the respect of past teams as both a technical leader and a manager. The NVIDIA DGX Cloud Computing team works across the company, in areas such as information retrieval, artificial intelligence, natural language processing, distributed computing, large-scale system design, life sciences, and image processing; the list goes on and grows every day in machine learning. Operating with scale and speed, our world-class software engineers are just getting started, and as a manager, you will guide the way in ensuring the reliability of both our internally critical and our externally visible systems.

What you'll be doing:

Manage a team of Site Reliability Engineers, including task planning and code reviews.

Define team strategy and roadmap, and drive adoption of test infrastructure across several product areas in the DGX Cloud Computing environment.

Drive technical projects and provide leadership in an innovative and fast-paced environment.

Be responsible for the overall planning, execution, and success of technical projects.

Work closely with product management teams to ensure best-in-class product development.

Contribute hands-on to technical projects for DGX Cloud Computing Services.

Interact with key internal stakeholders to provide operational and financial clarity on technical spend.

Drive decision-making, visibility, and operational rigor across business analytics initiatives such as budget and project portfolio reporting. Lead efforts related to executive reporting, dashboards, and operational CTO metrics, focusing on continuous improvement and evolution to maximize the quality of decision-making and executive visibility.

What we need to see:

10+ years of overall engineering experience, including 3+ years of leadership. Bachelor's or Master's degree in Computer Science, or equivalent experience.

Experience with containers, virtualization environments, and cluster solutions. Experience managing technical support / DevOps teams. Comfortable committing to excellence and delivering projects on tight deadlines.

Strong knowledge of Unix/Linux. Experience in at least two of the following programming languages: Perl, Python, Go.

Experience implementing tools, processes, internal instrumentation, and methodologies, and resolving blockers.

Experience in designing and implementing large-scale distributed systems.

Demonstrated people management and leadership skills, with a proven track record of mentoring and coaching team members.

Ability to quickly learn and evaluate new technologies.

Ability to influence and establish relationships with other software and IT functional groups such as development, server, storage and security teams.

Ways to stand out from the crowd:

Experience using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker.

Experience running Grafana, OpenTelemetry, Prometheus, and similar observability-focused tools.

Interest in crafting, analyzing, and fixing large-scale distributed systems.

We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you!

The base salary range is 200,000 USD - 385,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
