This job was posted by https://illinoisjoblink.illinois.gov. For more information, please see: https://illinoisjoblink.illinois.gov/jobs/11604166
Department
BSD CTD - User Services - GDC
About the Department
The Center for Translational Data Science (CTDS) at the University of Chicago is a research center whose mission is to develop the discipline of translational data science and apply it to impactful problems in biology, medicine, healthcare, and the environment. We envision a world in which researchers have ready access to the data needed and the tools required to make data-driven discoveries that increase our scientific knowledge and improve the quality of life. We architect ecosystems of large-scale commons of research data, computing resources, applications, tools, and services for the broader research community to use data at scale to pursue scientific inquiry and accelerate discovery. Learn more at https://gdc.cancer.gov/, https://gen3.org/, https://stats.gen3.org/, and https://ctds.uchicago.edu/.
This at-will position is wholly or partially funded by contractual grant funding which is renewed under provisions set by the grantor of the contract. Employment will be contingent upon the continued receipt of these grant funds and satisfactory job performance.
Job Summary
The Lead Data Quality Engineer is a problem solver with an extensive background working in data integrity and testing to ensure high-quality data and metadata are distributed to the cancer research community. This is an opportunity to elevate your leadership skills working with one of the world's largest collections of harmonized cancer genomic data. This role focuses on the Genomic Data Commons, which is at the forefront of both cutting-edge research and production systems supporting cancer research. Your role will be as the lead engineer for data quality and integrity, joining a team of engineers developing innovative technologies in the pursuit of discovery through data-driven cancer research. You will focus on leading data quality efforts related to data integration, higher-level data products, and distribution to the cancer research community, working as a leader across multiple teams to build and automate frameworks such as anomaly detection, reporting, and alerting to ensure data quality. You will be the subject matter expert not only in the data itself but also in the systems used to interrogate the data and understand gaps in data quality. Data and metadata quality has a broad scope, so you are expected to work collaboratively and exemplify leadership across teams to determine priorities and the best methods for achieving objectives.
Responsibilities
Lead the design of the data QA infrastructure and execution of testing protocols to validate pipelines, integrated datasets, and data products.
Use a combination of exploratory, regression, and automated testing to ensure data quality standards. Assess appropriate inclusion/exclusion of data based on project requirements.
Lead the team in the evaluation, maintenance, and development of data dictionaries, and utilize data specifications and code to validate data as it relates to quality.
Lead the team in data release planning and implementation based on sponsor and collaborator requirements and data availability.
Proactively identify potential data issues and downstream impact. Identify existing data issues and perform research and root cause analyses to determine resolution. Work collaboratively with software engineers, bioinformaticians, and partners to achieve and verify resolution.
Establish and maintain processes and standards to improve data quality assurance and implement efficiencies in data management.
Define measurements and metrics to conduct and present routine data reports to the project team and partners.
Lead the data acquisition and integration planning efforts, including data modeling, data dictionary definitions, and data harmonization pipeline development.
Maintain a deep understanding of multiple genomic datasets and the technical data management software and processes of the underlying system.
Define data quality and integrity criteria and implement a comprehensive data quality management plan to lead key data QC efforts through team collaboration for all phases of the data management life cycle.
Technical Writing - Use knowledge and expertise to create, edit, and enhance system documentation, user documentation, scientific manuscripts, reporting, grant proposals and reports, and presentation materials.
Stay abreast of existing and emerging technologies and QC tools in the cancer genomics space.
Leads in the development of new systems, features, and tools. Solves complex problems and identifies opportunities for technical improvement and performance optimization. Reviews and tests code to ensure appropriate standards are met.
Utilizes in-depth technical knowledge of existing and emerging technologies, including public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.
Acts as a technical consultant and resource for faculty research, teaching, and/or administrative projects.