This job was posted by https://illinoisjoblink.illinois.gov. For more information, please see: https://illinoisjoblink.illinois.gov/jobs/11604160
Department
BSD CTD - Platform Engineering - Ops
About the Department
The Center for Translational Data Science (CTDS) at the University of Chicago is a research center whose mission is to develop the discipline of translational data science and apply it to impactful problems in biology, medicine, healthcare, and the environment. We envision a world in which researchers have ready access to the data and tools they need to make data-driven discoveries that increase our scientific knowledge and improve the quality of life. We architect ecosystems of large-scale commons of research data, computing resources, applications, tools, and services that the broader research community can use to work with data at scale, pursue scientific inquiry, and accelerate discovery. Learn more at https://gdc.cancer.gov/, https://gen3.org/, https://stats.gen3.org/, and https://ctds.uchicago.edu/.
This at-will position is wholly or partially funded by contractual grant funding, which is renewed under provisions set by the grantor of the contract. Employment will be contingent upon the continued receipt of these grant funds and satisfactory job performance.
Job Summary
As the Lead Cloud Operations Engineer, you will play a pivotal role in designing, configuring, managing, and supporting our expansive cloud computing infrastructure. You will provide technical leadership to a team responsible for overseeing the operations of 25,000+ cores and 15+ PB of storage holding controlled-access and open-access cancer genomics data. Innovation is key, as you will lead efforts to optimize infrastructure setup, configuration, and refresh, fostering collaboration and efficiency across various subsystems. Serving as both a technical leader and a project manager, you will guide the administration of operating systems, implement upgrades, and maintain security measures. Your advanced experience in infrastructure operations and system administration will drive the success of our dynamic and expanding infrastructure.
Responsibilities
Lead the design, configuration, management, and support of our large-scale cloud computing infrastructure.
Oversee operations of 25,000+ cores and 15+ PB of storage, built primarily on commodity hardware running GNU/Linux.
Track, implement, and lead security and compliance activities in cooperation with the information security team.
Respond promptly to operational incidents, identify and address operational risks, process RMAs, and regularly report operational data and statistics.
Innovate and foster innovation within the team and the infrastructure, including optimizing our infrastructure setup, refresh, and configuration to improve operational efficiency across various subsystems.
Lead support of the rapid growth of our existing large physical infrastructure in a hybrid model with public clouds, in a secure, stable, and maintainable manner.
Maintain broad technical knowledge of existing and emerging technologies, including developments in hardware offerings and public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.
Ensure and optimize operational efficiency through automation, innovation, and inter- and intra-team collaboration.
Lead and implement the design, setup, provisioning, and deployment of new systems to support multi-cloud and hybrid-cloud architecture and expansion.
Lead enhancement of infrastructure and application monitoring.
Mentor, coach, and train other team members, serving as their source of technical leadership.
Lead technical aspects of projects for the systems administration team, including delegating tasks and organizing and managing meetings.
Serve as scrum master/project manager for day-to-day work: preparing team sprints, tracking velocity, notifying partners of changes and deviations, etc.
Solves complex problems to configure, install, upgrade, and maintain server applications and hardware. Works to safeguard the integrity of computer software. Implements operating system enhancements to improve the reliability and performance of the system.
Guides the administration of operating systems, maintains security, and implements backup procedures for the organization's information systems and peripheral equipment, such as servers, desktops, printers, and storage devices.
Provides expertise in planning and installing necessary patches and upgrades for servers and their associated storage, network, communications, and peripheral subsystems. Installs and maintains an appropriate level of intrusion detection, monitoring, and auditing software as required.
Tracks compliance and maintains documentation for hardware, software, and service inventories for management reports.
Performs other related work as needed.
Minimum Qualifications
Education:
Minimum requirements include a college or university degree in a related field.
---
Work Exp