R01ES032470
Project Grant
Overview
Grant Description
Data Science Tools to Identify Robust Exposure-Phenotype Associations for Precision Medicine - Project Summary/Abstract
Phenotypic variability across demographically diverse populations is driven by environmental factors. The overall goal of this proposal is to deploy data science approaches to drive the discovery of associations between exposures (E) and phenotypes (P) in demographically diverse populations. We lack the data science methods needed to associate, replicate, and prioritize exposure variables of the exposome (E) with phenotypes (P) and disease incidence (D), which are required for the delivery of precision medicine.
Observational studies are fraught with four unsolved data science challenges. First, E-based studies are limited to associating a few hypothesized exposure-phenotype (E-P) pairs at a time, leading to a fragmented literature of environmental associations. Second, machine learning (ML) approaches for feature selection and prediction hold promise; however, most extant E-based cohorts contain missing data, which challenges the use of ML to detect complex E-P associations. Third, biases such as confounding and study design influence associations and hinder translation. Fourth, there are few well-powered data resources that systematically document longitudinal E-P and E-D associations across massive precision medicine cohorts.
Three gaps follow from these challenges. First, it is difficult to systematically associate many exposures with multiple phenotypes and to replicate these associations across cohorts (Aim 1). Second, the "vibration of effects," the degree to which associations change as a function of study design (e.g., analytic method, sample size) and model choice, is a hidden bias in observational studies (Aim 2). Third, an outstanding question is the degree to which environmental differences lead to health disparities. To address these challenges and gaps, we propose to:
Aim 1: Develop and test machine learning methods to associate multiple environmental exposure indicators with multiple phenotypes (EP-WAS). We hypothesize that exposures will explain a significant amount of phenotypic variation in populations, and we will deposit all data and models in a novel EP-WAS catalog.
Aim 2: Quantify how study design influences associations between exposure biomarkers and phenotypes. We will scale up, extend, and test a method called "vibration of effects" (VOE) to measure how study criteria influence the stability of associations, that is, how reproducible associations are as a function of analytic choice.
Aim 3: Leverage EP-WAS and VOE to disentangle biological, demographic, and environmental influences on phenotypic disparities in hypercholesterolemia. We will deploy the EP-WAS and VOE packaged libraries in the largest cohort study to partition phenotypic variation in hypercholesterolemia factors across demographic groups. We will equip the biomedical community with data science approaches for robust, data-driven discovery and interpretation of exposure-phenotype factors in observational datasets, which are required for the identification of environmental health disparities.
For the first time, investigators will ascertain the collective role of the environment in heart disease at scale, just in time for the All of Us program.
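The "vibration of effects" idea in Aim 2 can be illustrated with a minimal sketch. It assumes a continuous phenotype, an ordinary linear model, and that "model choice" means which adjustment covariates are included; the functions `ols`, `vibration_of_effects`, and `summarize` and the toy variables `age` and `sex` are hypothetical illustrations, not the project's packaged VOE library. The sketch refits the same exposure-phenotype association once for every subset of candidate covariates and reports how far the exposure coefficient moves. It uses only the Python standard library so it runs without extra packages.

```python
from itertools import combinations
from statistics import median


def ols(X, y):
    """Ordinary least squares: solve the normal equations (X^T X) b = X^T y
    with Gauss-Jordan elimination and partial pivoting (stdlib only)."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def vibration_of_effects(exposure, phenotype, covariates):
    """Refit phenotype ~ 1 + exposure + S for every subset S of the candidate
    covariates and return {subset: exposure coefficient}."""
    names = sorted(covariates)
    n = len(exposure)
    results = {}
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            X = [[1.0, exposure[i]] + [covariates[c][i] for c in subset]
                 for i in range(n)]
            results[subset] = ols(X, phenotype)[1]  # index 1 = exposure term
    return results


def summarize(results):
    """Describe how much the exposure coefficient 'vibrates' across models."""
    coefs = list(results.values())
    return {
        "n_models": len(coefs),
        "min": min(coefs),
        "max": max(coefs),
        "range": max(coefs) - min(coefs),
        "median": median(coefs),
        # True if some specifications give a positive and others a negative estimate.
        "sign_changes": min(coefs) < 0 < max(coefs),
    }


if __name__ == "__main__":
    # Hypothetical toy data: phenotype = 2 * exposure + 3 * age, and exposure
    # rises with age, so age confounds the unadjusted association.
    exposure = [1, 1, 2, 3, 3, 4]
    age = [1, 2, 3, 4, 5, 6]
    sex = [0, 1, 0, 1, 0, 1]
    phenotype = [2 * e + 3 * a for e, a in zip(exposure, age)]
    res = vibration_of_effects(exposure, phenotype, {"age": age, "sex": sex})
    for subset, beta in res.items():
        print(subset, round(beta, 3))
    print(summarize(res))
```

On the toy data, the models that adjust for age recover the coefficient of 2 used to generate the phenotype, while the unadjusted model returns 6.5, so the estimate spans at least 4.5 units across the four specifications. That spread is the kind of instability VOE is meant to surface; a real analysis would likely use a regression library and track p-values alongside effect sizes.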
Funding Goals
TO FOSTER UNDERSTANDING OF HUMAN HEALTH EFFECTS OF EXPOSURE TO ENVIRONMENTAL AGENTS IN THE HOPE THAT THESE STUDIES WILL LEAD TO: THE IDENTIFICATION OF AGENTS THAT POSE A HAZARD AND THREAT OF DISEASE, DISORDERS AND DEFECTS IN HUMANS, THE DEVELOPMENT OF EFFECTIVE PUBLIC HEALTH OR DISEASE PREVENTION STRATEGIES, THE OVERALL IMPROVEMENT OF HUMAN HEALTH EFFECTS DUE TO ENVIRONMENTAL AGENTS, THE DEVELOPMENT OF PRODUCTS AND TECHNOLOGIES DESIGNED TO BETTER STUDY OR AMELIORATE THE EFFECTS OF ENVIRONMENTAL AGENTS, AND THE SUCCESSFUL TRAINING OF RESEARCH SCIENTISTS IN ALL AREAS OF ENVIRONMENTAL HEALTH RESEARCH.
SUPPORTED GRANT PROGRAMS FOCUS ON THE FOLLOWING AREAS:
(1) UNDERSTANDING BIOLOGICAL RESPONSES TO ENVIRONMENTAL AGENTS BY DETERMINING HOW CHEMICAL AND PHYSICAL AGENTS CAUSE PATHOLOGICAL CHANGES IN MOLECULES, CELLS, TISSUES, AND ORGANS, AND BECOME MANIFESTED AS RESPIRATORY DISEASE, NEUROLOGICAL, BEHAVIORAL AND DEVELOPMENTAL ABNORMALITIES, CANCER, AND OTHER DISORDERS;
(2) DETERMINING THE MECHANISMS OF TOXICITY OF UBIQUITOUS AGENTS LIKE METALS, NATURAL AND SYNTHETIC CHEMICALS, PESTICIDES, AND MATERIALS SUCH AS NANOPARTICLES, AND NATURAL TOXIC SUBSTANCES, AND THEIR EFFECTS ON VARIOUS HUMAN ORGAN SYSTEMS, ON METABOLISM, ON THE ENDOCRINE AND IMMUNE SYSTEMS, AND ON OTHER BIOLOGICAL FUNCTIONS;
(3) DEVELOPING AND INTEGRATING SCIENTIFIC KNOWLEDGE ABOUT POTENTIALLY TOXIC AND HAZARDOUS CHEMICALS BY CONCENTRATING ON TOXICOLOGICAL RESEARCH, TESTING, TEST DEVELOPMENT, VALIDATION AND RISK ESTIMATION;
(4) IDENTIFYING INTERACTIONS BETWEEN ENVIRONMENTAL STRESSORS AND GENETIC SUSCEPTIBILITY AND UNDERSTANDING BIOLOGIC MECHANISMS UNDERLYING THESE INTERACTIONS, INCLUDING THE STUDY OF ENVIRONMENTAL INFLUENCES ON EPIGENOMICS AND TRANSCRIPTIONAL REGULATION;
(5) CONDUCTING ENVIRONMENTAL PUBLIC HEALTH RESEARCH, INCLUDING IN AREAS OF ENVIRONMENTAL JUSTICE AND HEALTH DISPARITIES, THAT REQUIRES COMMUNITIES AS ACTIVE PARTICIPANTS IN ALL STAGES OF RESEARCH, DISSEMINATION, AND EVALUATION TO ADVANCE BOTH THE SCIENCE AND THE DEVELOPMENT OF PRACTICAL MATERIALS FOR USE IN COMMUNITIES, WITH A FOCUS ON TRANSLATING RESEARCH FINDINGS INTO TOOLS, MATERIALS, AND RESOURCES THAT CAN BE USED TO PREVENT, REDUCE, OR ELIMINATE ADVERSE HEALTH OUTCOMES CAUSED BY ENVIRONMENTAL EXPOSURES;
(6) EXPANDING AND IMPROVING THE SBIR PROGRAM TO INCREASE PRIVATE SECTOR COMMERCIALIZATION OF INNOVATIONS DERIVED FROM FEDERAL RESEARCH AND DEVELOPMENT, TO INCREASE SMALL BUSINESS PARTICIPATION IN FEDERAL RESEARCH AND DEVELOPMENT, AND TO FOSTER AND ENCOURAGE PARTICIPATION OF SOCIALLY AND ECONOMICALLY DISADVANTAGED SMALL BUSINESS CONCERNS AND WOMEN-OWNED SMALL BUSINESS CONCERNS IN TECHNOLOGICAL INNOVATION;
(7) EXPANDING AND IMPROVING THE STTR PROGRAM TO STIMULATE AND FOSTER SCIENTIFIC AND TECHNOLOGICAL INNOVATION THROUGH COOPERATIVE RESEARCH AND DEVELOPMENT CARRIED OUT BETWEEN SMALL BUSINESS CONCERNS AND RESEARCH INSTITUTIONS, TO FOSTER TECHNOLOGY TRANSFER BETWEEN SMALL BUSINESS CONCERNS AND RESEARCH INSTITUTIONS, TO INCREASE PRIVATE SECTOR COMMERCIALIZATION OF INNOVATIONS DERIVED FROM FEDERAL RESEARCH AND DEVELOPMENT, AND TO FOSTER AND ENCOURAGE PARTICIPATION OF SOCIALLY AND ECONOMICALLY DISADVANTAGED SMALL BUSINESS CONCERNS AND WOMEN-OWNED SMALL BUSINESS CONCERNS IN TECHNOLOGICAL INNOVATION;
(8) PROVIDING SUPPORT FOR BROADLY BASED MULTI-DISCIPLINARY RESEARCH AND TRAINING PROGRAMS IN ENVIRONMENTAL HEALTH. THESE PROGRAMS INCLUDE THE ENVIRONMENTAL HEALTH SCIENCES CORE CENTERS, WHICH SERVE AS NATIONAL FOCAL POINTS AND RESOURCES FOR RESEARCH AND MANPOWER DEVELOPMENT. THROUGH THESE PROGRAMS, NIEHS EXPECTS TO ACHIEVE THE LONG-RANGE GOAL OF DEVELOPING NEW CLINICAL AND PUBLIC HEALTH APPLICATIONS TO IMPROVE DISEASE PREVENTION, DIAGNOSIS, AND THERAPY. ADDITIONAL CENTERS PROGRAMS DEVELOPED IN RECENT YEARS INCLUDE THE CENTERS FOR OCEANS AND HUMAN HEALTH (CO-FUNDED WITH NSF), CHILDREN'S ENVIRONMENTAL HEALTH CENTERS (CO-FUNDED WITH US EPA), THE AUTISM CENTERS OF EXCELLENCE (CO-FUNDED WITH OTHER NIH INSTITUTES), AND THE HUMAN HEALTH EXPOSURE ANALYSIS RESOURCE (HHEAR) PROGRAM;
(9) SUPPORTING RESEARCH TRAINING PROGRAMS WHICH SERVE TO INCREASE THE POOL OF TRAINED RESEARCH MANPOWER WITH NEEDED EXPERTISE IN THE ENVIRONMENTAL HEALTH SCIENCES THROUGH SUPPORT OF INDIVIDUAL AND INSTITUTIONAL NATIONAL RESEARCH SERVICE AWARDS (NRSAS);
(10) THE OUTSTANDING NEW ENVIRONMENTAL SCIENTIST PROGRAM, WHICH PROVIDES FIRST-TIME RESEARCH GRANT FUNDING TO OUTSTANDING JUNIOR SCIENTISTS IN THE FORMATIVE STAGES OF THEIR CAREERS WHO ARE PROPOSING TO MAKE A LONG-TERM COMMITMENT TO ENVIRONMENTAL HEALTH SCIENCES RESEARCH AND TO ADDRESS THE ADVERSE EFFECTS OF ENVIRONMENTAL EXPOSURES ON HUMAN BIOLOGY, HUMAN PATHOPHYSIOLOGY AND HUMAN DISEASE.
Grant Program (CFDA)
93.113 Environmental Health
Awarding / Funding Agency
National Institute of Environmental Health Sciences
Place of Performance
Boston, Massachusetts 02115-6030, United States
Geographic Scope
Single Zip Code
Related Opportunity
Research Project Grant (Parent R01 Clinical Trial Not Allowed)
Analysis Notes
Amendment: Since the initial award, total obligations have increased 401%, from $697,752 to $3,495,686.
President and Fellows of Harvard College was awarded Data Science for Robust Exposure-Phenotype Associations, Project Grant R01ES032470, worth $3,495,686, from the National Institute of Environmental Health Sciences in September 2021, with work to be completed primarily in Boston, Massachusetts, United States.
The grant has a duration of 4 years 9 months and was awarded through assistance program 93.113 Environmental Health. The Project Grant was awarded through grant opportunity Research Project Grant (Parent R01 Clinical Trial Not Allowed).
Status
(Ongoing)
Last Modified 8/20/25
Period of Performance
Start Date: 9/10/21
End Date: 6/30/26
Funding Split
Federal Obligation: $3.5M
Non-Federal Obligation: $0
Total Obligated: $3.5M
Additional Detail
Award ID FAIN
R01ES032470
SAI Number
R01ES032470-224768928
Award ID URI
SAI UNAVAILABLE
Awardee Classifications
Private Institution Of Higher Education
Awarding Office
75NV00 NIH National Institute of Environmental Health Sciences
Funding Office
75NV00 NIH National Institute of Environmental Health Sciences
Awardee UEI
JDLVAVGYJQ21
Awardee CAGE
3Q2L2
Performance District
MA-07
Senators
Edward Markey
Elizabeth Warren
Budget Funding
Federal Account | Budget Subfunction | Object Class | Total | Percentage |
---|---|---|---|---|
National Institute of Environmental Health Sciences, National Institutes of Health, Health and Human Services (075-0862) | Health research and training | Grants, subsidies, and contributions (41.0) | $1,512,736 | 100% |