U24HG012343
Cooperative Agreement
Overview
Grant Description
A Comprehensive Genomic Community Resource of Transcriptional Regulation - Project Summary/Abstract
The Human Genome Project (HGP) completed the first draft of the human genome sequence two decades ago. The HGP revealed that human complexity arises from only approximately 20,000 coding genes, roughly the same number as much simpler organisms such as nematodes. Intricate patterns of transcriptional regulation mediated by non-coding regulatory elements specify the myriad cell types and states required for human complexity. Genome-wide association studies have subsequently identified thousands of disease-associated variants, many of which interrupt the function of these non-coding elements to disrupt transcriptional regulation. Thus, in order to better understand human physiology and pathophysiology, comprehensive atlases of regulatory elements are essential.
Many previous efforts, including the International Human Epigenome Consortium (IHEC), the FANTOM Consortium, the Roadmap Epigenomics Project, and the ENCODE Project, have aimed to build comprehensive collections of regulatory elements, as well as computational models to better predict regulatory activity and understand the sequence features underlying regulatory function. ENCODE (2003-2022) is a large-scale consortium effort which aims to annotate every functional non-coding element of the human genome. During our work on the project, we built a registry of approximately 1 million human candidate cis-regulatory elements (cCREs). We further developed deep-learning approaches which model the transcription factor motif syntax that underlies element function at base-pair resolution and built two web-based resources, SCREEN and Factorbook, to make our results accessible to the scientific community.
Here, we propose to extend this framework to build the Community Resource for Transcriptional Regulation (CRTR), a comprehensive atlas of non-coding regulatory elements and machine-learning models which will encompass community and consortium deep-sequencing data, both bulk and single cell, across a broad array of cell types and states. Our project has five aims.
First, we aim to curate community and consortium data for inclusion in CRTR and perform uniform processing and quality control. Second, we aim to train deep-learning sequence models on bulk epigenetic datasets to identify transcription factor motif syntax driving regulatory element activity in distinct tissues and cell types. Third, we aim to train sequence models on single-cell datasets to identify transcription factor motif syntax driving transcriptional regulation in high-resolution cell states and during cell state transitions. Fourth, we aim to use the aforementioned results to build comprehensive benchmark datasets and machine-learning model collections, which will aid future analysts in designing new models to predict regulatory readouts. Fifth, we aim to build a state-of-the-art web-based user interface to enable users to perform integrative analyses and in silico experimentation with CRTR, and to hold workshops and other outreach to maximize the impact of the resource and its accessibility to the broader scientific community.
Funding Goals
NHGRI supports the development of resources and technologies that will accelerate genome research and its application to human health and genomic medicine. A critical part of the NHGRI mission continues to be the study of the ethical, legal and social implications (ELSI) of genome research. NHGRI also supports the training and career development of investigators and the dissemination of genome information to the public and to health professionals. The Small Business Innovation Research (SBIR) program is used to increase private sector commercialization of innovations derived from federal research and development, to increase small business participation in federal research and development, and to foster and encourage participation of socially and economically disadvantaged small business concerns and women-owned small business concerns in technological innovation. The Small Business Technology Transfer (STTR) program is used to foster scientific and technological innovation through cooperative research and development carried out between small business concerns and research institutions, to foster technology transfer between small business concerns and research institutions, to increase private sector commercialization of innovations derived from federal research and development, and to foster and encourage participation of socially and economically disadvantaged small business concerns and women-owned small business concerns in technological innovation.
Grant Program (CFDA)
Awarding / Funding Agency
Place of Performance
Worcester, Massachusetts 01655, United States
Geographic Scope
Single Zip Code
Related Opportunity
Analysis Notes
Amendment: Since the initial award, total obligations have increased 357%, from $833,468 to $3,806,210.
University Of Massachusetts Medical School was awarded A Comprehensive Genomic Community Resource of Transcriptional Regulation Cooperative Agreement U24HG012343 worth $3,806,210 from National Human Genome Research Institute in June 2022 with work to be completed primarily in Worcester, Massachusetts, United States. The grant has a duration of 4 years 9 months and was awarded through assistance program 93.172 Human Genome Research. The Cooperative Agreement was awarded through grant opportunity Genomic Community Resources (U24 Clinical Trial Not Allowed).
Status
(Ongoing)
Last Modified 5/5/25
Period of Performance
Start Date: 6/1/22
End Date: 3/31/27
Funding Split
Federal Obligation: $3.8M
Non-Federal Obligation: $0.0
Total Obligated: $3.8M
Additional Detail
Award ID FAIN
U24HG012343
SAI Number
U24HG012343-3466235988
Award ID URI
SAI UNAVAILABLE
Awardee Classifications
Public/State Controlled Institution Of Higher Education
Awarding Office
75N400 NIH National Human Genome Research Institute
Funding Office
75N400 NIH National Human Genome Research Institute
Awardee UEI
MQE2JHHJW9Q8
Awardee CAGE
6R004
Performance District
MA-02
Senators
Edward Markey
Elizabeth Warren
Budget Funding
Federal Account | Budget Subfunction | Object Class | Total | Percentage |
---|---|---|---|---|
National Human Genome Research Institute, National Institutes of Health, Health and Human Services (075-0891) | Health research and training | Grants, subsidies, and contributions (41.0) | $1,845,671 | 100% |