2411188
Project Grant
Overview
Grant Description
Frameworks: Globus Search: Search and analysis over globally distributed data
Increasing data sizes, greater hardware specialization, faster networks, and larger collaborative teams result in modern research employing ever-more distributed cyberinfrastructure (CI).
It is now commonplace for data to be produced in multiple locations (e.g., in research laboratories or on supercomputers), analyzed in others (e.g., local, campus, or national CI), and shared, published, or archived in yet others.
This increasingly distributed CI is enabling exciting discoveries in many domains, but also leads to difficulties for researchers who must manage, discover, and act upon large volumes of distributed data.
Growing amounts of valuable research time are spent on mundane but necessary data management tasks; crucial data are lost; important provenance information cannot be determined; and analyses are repeated.
To tackle these problems, this project will build Globus Search, a new capability integrated into the widely used Globus platform, that will enable the creation of, and search within and across, distributed Globus collections.
By thus allowing researchers to easily discover data regardless of location, group data into “virtual” collections, and act on virtual collections irrespective of where individual files reside, Globus Search will allow even the largest and most distributed teams to organize, navigate, and operate on their data.
Globus has emerged as an essential tool for alleviating the numerous frictions associated with managing, accessing, moving, and sharing data within and across the many distinct data collections that constitute the modern CI experience.
However, an implicit assumption has always been that researchers know where data reside: an assumption that becomes increasingly untenable as data and CI grow in complexity.
This project will implement a suite of new capabilities including methods to crawl parallel and distributed storage systems; capture events (e.g., file creation, modification, deletion) from these storage systems; extract metadata from within diverse scientific file formats; communicate events securely and reliably to the cloud-hosted service; index files and metadata in a secure and accessible manner; and develop new interfaces for navigating distributed data collections, creating virtual collections, and acting on these virtual collections.
Leveraging the hybrid cloud/local service deployment approach that has proven so successful for other Globus services, Globus Search will build on powerful, scalable, and robust cloud-hosted search services to deliver a rich search experience to users via the Globus web app, command line interface, and Python and JavaScript libraries.
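The crawl, metadata-extraction, and indexing steps described above can be illustrated with a minimal sketch. It is not the Globus Search implementation or API: the function names (`crawl`, `build_index`, `search`, `demo`), the record fields, and the in-memory dictionary standing in for a cloud-hosted index are all assumptions made for illustration. It walks a local directory only, using the Python standard library, and extracts only file-system metadata rather than metadata from within scientific file formats.

```python
import os
import tempfile
from pathlib import Path


def crawl(root):
    """Walk a directory tree and yield one metadata record per file."""
    root = Path(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = path.stat()
            yield {
                # Path relative to the crawl root, with forward slashes.
                "path": path.relative_to(root).as_posix(),
                "size_bytes": info.st_size,
                "modified": info.st_mtime,
                "extension": path.suffix.lower(),
            }


def build_index(records):
    """Group file paths by extension (a hypothetical stand-in for a search index)."""
    index = {}
    for record in records:
        index.setdefault(record["extension"], []).append(record["path"])
    return index


def search(index, extension):
    """Return the sorted paths of all indexed files with the given extension."""
    return sorted(index.get(extension.lower(), []))


def demo():
    """Crawl a throwaway directory and look up files by extension."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "run1").mkdir()
        (root / "run1" / "a.h5").write_bytes(b"1234")
        (root / "run1" / "notes.txt").write_text("hello")
        (root / "b.H5").write_bytes(b"")
        index = build_index(crawl(root))
        return search(index, ".h5")
```

Calling `demo()` crawls a temporary directory and returns the files whose extension matches `.h5`, ignoring case. A real deployment would also need event capture, secure transport to the hosted service, and access control, none of which this sketch attempts.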
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the foundation's intellectual merit and broader impacts review criteria.
Subawards are not planned for this award.
Awardee
Funding Goals
The goal of this funding opportunity, "Cyberinfrastructure for Sustained Scientific Innovation", is identified in the link: HTTPS://WWW.NSF.GOV/PUBLICATIONS/PUB_SUMM.JSP?ODS_KEY=NSF22632
Grant Program (CFDA)
Awarding / Funding Agency
Place of Performance
Chicago, Illinois 60637-5418, United States
Geographic Scope
Single Zip Code
Related Opportunity
University of Chicago was awarded Project Grant 2411188, "Globus Search: Empowering Distributed Data Discovery and Management", worth $3,594,788, from the NSF Office of Advanced Cyberinfrastructure in August 2024, with work to be completed primarily in Chicago, Illinois, United States. The grant has a duration of 4 years and was awarded through assistance program 47.070 Computer and Information Science and Engineering. The Project Grant was awarded through grant opportunity Cyberinfrastructure for Sustained Scientific Innovation.
Status
(Ongoing)
Last Modified 7/23/24
Period of Performance
Start Date
8/1/24
End Date
7/31/28
Funding Split
Federal Obligation
$3.6M
Non-Federal Obligation
$0.0
Total Obligated
$3.6M
Activity Timeline
Additional Detail
Award ID FAIN
2411188
SAI Number
None
Award ID URI
SAI EXEMPT
Awardee Classifications
Private Institution Of Higher Education
Awarding Office
490509 OFC OF ADV CYBERINFRASTRUCTURE
Funding Office
490509 OFC OF ADV CYBERINFRASTRUCTURE
Awardee UEI
ZUE9HKT2CLC9
Awardee CAGE
5E688
Performance District
IL-01
Senators
Richard Durbin
Tammy Duckworth