2311632
Project Grant
Overview
Grant Description
GOALI: Frameworks: At-Scale Heterogeneous Data Based Adaptive Development Platform for Machine-Learning Models for Material and Chemical Discovery - This project seeks to establish a new technological paradigm and the software infrastructure necessary for the development of machine learning (ML) models capable of predicting the properties of unseen molecular and materials systems/structures, thus enabling modeling of atomic behavior and the computational discovery of new molecules and materials at significantly higher throughput than afforded by existing first principles (quantum) methods.
ML-enabled materials discovery is poised to play a critical role in addressing modern societal challenges such as energy sustainability and, as such, the technology and infrastructure developed by this project are expected to have a transformative impact across many scientific and engineering domains. The platform facilitates access, sharing, and discovery of vast amounts of first principles and experimental data, removing inefficiencies and accelerating scientific discovery by enabling the development of ML models on a scale previously inaccessible.
To achieve these goals, this project is carried out in partnership with Amazon Web Services (AWS), providing the necessary know-how for the development of specialized open-source tools for training ML models at scale. This project is committed to the advancement of diversity, equity, and inclusiveness in higher education, and as such it incorporates a variety of mechanisms to include underrepresented and low-income students (high-school and undergraduate) in its research activities across the four participating universities (New York University, University of Minnesota, University of Florida, and Brigham Young University), in addition to the mentoring of graduate students, the development of teaching materials, and workshops aimed at industrial outreach and training.
To assure alignment between the platform/software and community needs, this project is supported by an advisory board of experts in cyberinfrastructure development, machine learning, material and chemical sciences, and STEM outreach who evaluate and provide strategic advice to the PIs.
The key technological advance underlying this work is the foundation model, an approach for building ML systems in which a model trained on extremely large amounts of diverse and easily available data can be adapted to diverse applications with a small amount of additional model fitting (fine-tuning). This project thus focuses on the development of a foundation model, called FERMAT, for molecular and material property prediction, and ML interatomic potentials for modeling atomic behavior. FERMAT is to be delivered via an integrated adaptive platform in the form of a software package and an online framework for developing and deploying specialized ML models for materials and chemistry applications, called FERMAT APPS.
In collaboration with AWS, this project seeks to develop open-source software for training foundation models like FERMAT at scale on large amounts of highly heterogeneous and multi-modal data. The high data needs will be met by leveraging and significantly expanding the ColabFit Exchange, an online repository of first principles and experimental data optimized for training of ML models, in cooperation with a large number of materials and molecular data repositories, standards organizations, and existing cyberinfrastructures.
FERMAT and any ML model derived from it are designed to support uncertainty quantification (based on information geometry, Bayesian, and frequentist approaches) to ensure the robustness of predictions. As guiding target applications, this project considers two problems of scientific interest: 2D material-driven catalysis and the prediction of molecular crystal polymorphs.
This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Materials Research within the Directorate for Mathematical and Physical Sciences. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria. - Subawards are planned for this award.
Awardee
Funding Goals
The goal of this funding opportunity, "Cyberinfrastructure for Sustained Scientific Innovation", is identified in the link: HTTPS://WWW.NSF.GOV/PUBLICATIONS/PUB_SUMM.JSP?ODS_KEY=NSF22632
Grant Program (CFDA)
Awarding / Funding Agency
Place of Performance
New York, New York 10012-1019, United States
Geographic Scope
Single Zip Code
Related Opportunity
Analysis Notes
Amendment: Since the initial award, total obligations have increased 100%, from $2,250,000 to $4,500,000.
New York University was awarded ML-Enabled Adaptive Development Platform Material Chemical Discovery Project Grant 2311632, worth $4,500,000, from the NSF Office of Advanced Cyberinfrastructure in October 2023, with work to be completed primarily in New York, New York, United States. The grant has a duration of 5 years and was awarded through assistance program 47.070 Computer and Information Science and Engineering. The Project Grant was awarded through grant opportunity Cyberinfrastructure for Sustained Scientific Innovation.
Status
(Ongoing)
Last Modified 9/22/23
Period of Performance
10/1/23
Start Date
9/30/28
End Date
Funding Split
$4.5M
Federal Obligation
$0.0
Non-Federal Obligation
$4.5M
Total Obligated
Additional Detail
Award ID FAIN
2311632
SAI Number
None
Award ID URI
SAI EXEMPT
Awardee Classifications
Private Institution Of Higher Education
Awarding Office
490509 OFC OF ADV CYBERINFRASTRUCTURE
Funding Office
490509 OFC OF ADV CYBERINFRASTRUCTURE
Awardee UEI
NX9PXMKW5KW8
Awardee CAGE
72061
Performance District
NY-10
Senators
Kirsten Gillibrand
Charles Schumer
Budget Funding
| Federal Account | Budget Subfunction | Object Class | Total | Percentage |
|---|---|---|---|---|
| Research and Related Activities, National Science Foundation (049-0100) | General science and basic research | Grants, subsidies, and contributions (41.0) | $4,500,000 | 100% |