R01HG011787
Project Grant
Overview
Grant Description
A Unified Quantitative Modeling Strategy for Multiplex Assays of Variant Effect - Project Summary / Abstract
A central goal of genomics is to understand the relationship between genotype and phenotype. In recent years, the ability to quantitatively study genotype-phenotype maps has been revolutionized by the development of multiplex assays of variant effect (MAVEs), which measure molecular phenotypes for thousands to millions of genotypic variants in parallel. MAVE is an umbrella term that includes massively parallel reporter assays for studies of DNA or RNA regulatory sequences, as well as deep mutational scanning assays of proteins or structural RNAs.
The rapid adoption of MAVE techniques across multiple genomic disciplines has created an acute need for computational methods that can robustly and reproducibly infer quantitative genotype-phenotype (G-P) maps from the large datasets that MAVEs produce. Here, we propose a unified conceptual and computational framework for quantitatively modeling G-P maps from MAVE data. This proposal is motivated by our realization that accounting for the noise and nonlinearities that are omnipresent in MAVE experiments requires explicit modeling of both the MAVE measurement process and the G-P map of interest. This joint inference strategy is more computationally demanding than most MAVE analysis methods, but it is feasible using modern deep learning frameworks.
Our extensive preliminary data show that this modeling strategy is able to recover high-precision G-P maps even in the presence of major confounding effects, and thus has the potential to benefit MAVE studies in multiple areas of genomics.
Aim 1 will develop methods for modeling the measurement processes that arise in diverse MAVE experimental designs.
Aim 2 will develop general methods for modeling genetic interactions within G-P maps and will use these methods in conjunction with new experiments to elucidate the molecular mechanism of a recently approved drug that targets alternative mRNA splicing.
Aim 3 will develop methods for inferring G-P maps that reflect biophysical models of gene regulation, including both thermodynamic (i.e., quasi-equilibrium) and kinetic (i.e., non-equilibrium steady-state) models. These methods will then be used, in conjunction with new MAVE experiments, to develop a biophysical model for how a pleiotropic transcription factor regulates gene expression throughout the Escherichia coli genome.
Aim 4 will study and develop methods for treating gauge freedoms and sloppy modes in the above classes of models, thereby facilitating the comparison, interpretation, and exploration of inferred G-P maps.
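As a rough illustration of the gauge freedoms mentioned in Aim 4, the sketch below uses a toy additive G-P map, phi(x) = theta0 + sum over positions of theta[l][x_l]. The additive form, the zero-sum gauge, the function names, and the parameter values are all assumptions chosen for illustration; they are not taken from the grant text and are not the MAVE-NN API. The sketch shows that two different parameter sets can give identical predictions for every sequence, and that fixing a gauge makes them directly comparable.

```python
from itertools import product

ALPHABET = "ACGT"  # illustrative alphabet (assumption)

def additive_phenotype(theta0, theta, seq):
    """Toy additive G-P map: theta0 plus one per-position, per-character term."""
    return theta0 + sum(theta[l][c] for l, c in enumerate(seq))

def fix_zero_sum_gauge(theta0, theta):
    """Shift each position's parameters to zero mean, absorbing the means into theta0."""
    new_theta0 = theta0
    new_theta = []
    for pos in theta:
        mean = sum(pos.values()) / len(pos)
        new_theta0 += mean
        new_theta.append({c: v - mean for c, v in pos.items()})
    return new_theta0, new_theta

# Parameter set A (made-up values for a length-2 sequence).
theta0_a = 0.5
theta_a = [
    {"A": 1.0, "C": 0.0, "G": -1.0, "T": 2.0},
    {"A": 0.0, "C": 3.0, "G": 1.0, "T": 0.0},
]

# Parameter set B: a gauge transformation of A. Each position is shifted by a
# constant, and the constant term compensates, so predictions do not change.
shifts = [0.7, -1.2]
theta0_b = theta0_a - sum(shifts)
theta_b = [{c: v + s for c, v in pos.items()} for pos, s in zip(theta_a, shifts)]

# Compare predictions over all 16 possible sequences.
seqs = ["".join(p) for p in product(ALPHABET, repeat=2)]
max_pred_diff = max(
    abs(additive_phenotype(theta0_a, theta_a, s) - additive_phenotype(theta0_b, theta_b, s))
    for s in seqs
)

# After gauge fixing, the two parameter sets coincide (up to rounding error).
fixed_a = fix_zero_sum_gauge(theta0_a, theta_a)
fixed_b = fix_zero_sum_gauge(theta0_b, theta_b)
```

Here `max_pred_diff` is zero up to floating-point error even though the raw parameters differ, which is why raw parameter values cannot be compared across fits without first choosing a gauge.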
All of the computational techniques we develop will be incorporated into a robust and easy-to-use Python package called MAVE-NN. We will benchmark MAVE-NN on a diverse array of MAVE datasets, including published datasets and data generated as part of this project.
In all, this work will fill a major need in the analysis of MAVE experiments, yielding a robust, flexible, and scalable computational platform that will help accelerate the use of MAVEs for understanding the effects of human genetic variation at the genomic scale.
Awardee
Funding Goals
NHGRI supports the development of resources and technologies that will accelerate genome research and its application to human health and genomic medicine. A critical part of the NHGRI mission continues to be the study of the ethical, legal and social implications (ELSI) of genome research. NHGRI also supports the training and career development of investigators and the dissemination of genome information to the public and to health professionals. The Small Business Innovation Research (SBIR) program is used to increase private sector commercialization of innovations derived from federal research and development, to increase small business participation in federal research and development, and to foster and encourage participation of socially and economically disadvantaged small business concerns and women-owned small business concerns in technological innovation. The Small Business Technology Transfer (STTR) program is used to foster scientific and technological innovation through cooperative research and development carried out between small business concerns and research institutions, to foster technology transfer between small business concerns and research institutions, to increase private sector commercialization of innovations derived from federal research and development, and to foster and encourage participation of socially and economically disadvantaged small business concerns and women-owned small business concerns in technological innovation.
Grant Program (CFDA)
Awarding / Funding Agency
Place of Performance
Cold Spring Harbor, New York 11724-2209, United States
Geographic Scope
Single Zip Code
Related Opportunity
Analysis Notes
Amendment: Since the initial award, total obligations have increased 309%, from $787,904 to $3,225,506.
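The 309% figure can be checked from the two obligation amounts quoted above. The helper name `percent_increase` is hypothetical and used only for this check.

```python
def percent_increase(initial, current):
    """Relative increase from `initial` to `current`, in percent."""
    return (current - initial) / initial * 100.0

initial_obligation = 787_904    # obligation at initial award, in dollars
total_obligation = 3_225_506    # total obligations to date, in dollars

increase = percent_increase(initial_obligation, total_obligation)
print(f"{increase:.0f}%")  # prints "309%"
```

Note that this is the increase over the initial amount; the current total is roughly 4.1 times the initial obligation.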
Cold Spring Harbor Laboratory was awarded Project Grant R01HG011787 (Unified Quantitative Modeling Strategy Multiplex Assays of Variant Effect), worth $3,225,506, from the National Human Genome Research Institute in June 2022, with work to be completed primarily in Cold Spring Harbor, New York, United States. The grant has a duration of 4 years 9 months and was awarded through assistance program 93.172 Human Genome Research. The Project Grant was awarded through grant opportunity NIH Research Project Grant (Parent R01 Clinical Trial Not Allowed).
Status
(Ongoing)
Last Modified 5/5/25
Period of Performance
6/15/22
Start Date
3/31/27
End Date
Funding Split
$3.2M
Federal Obligation
$0.0
Non-Federal Obligation
$3.2M
Total Obligated
Additional Detail
Award ID FAIN
R01HG011787
SAI Number
R01HG011787-2479717293
Award ID URI
SAI UNAVAILABLE
Awardee Classifications
Other
Awarding Office
75N400 NIH National Human Genome Research Institute
Funding Office
75N400 NIH National Human Genome Research Institute
Awardee UEI
GV31TMFLPY88
Awardee CAGE
0DHK5
Performance District
NY-03
Senators
Kirsten Gillibrand
Charles Schumer
Budget Funding
| Federal Account | Budget Subfunction | Object Class | Total | Percentage |
|---|---|---|---|---|
| National Human Genome Research Institute, National Institutes of Health, Health and Human Services (075-0891) | Health research and training | Grants, subsidies, and contributions (41.0) | $1,593,389 | 100% |