DESC0025059
Project Grant
Overview
Grant Description
Red-teaming and evaluating foundation models for safety, trust, and effective artificial intelligence with application of retrieval augmented generation.
Awardee
Grant Program (CFDA)
Awarding Agency
Funding Agency
Place of Performance
Blacksburg,
Virginia
24060-6305
United States
Geographic Scope
Single Zip Code
Related Opportunity
Harmonia Holdings Group was awarded Project Grant DESC0025059 worth $199,797 from the Office of Science in July 2024 with work to be completed primarily in Blacksburg, Virginia, United States. The grant has a duration of 9 months and was awarded through assistance program 81.049 Office of Science Financial Assistance Program. The Project Grant was awarded through grant opportunity FY 2024 Phase I Release 2.
SBIR Details
Research Type
SBIR Phase I
Title
Red-Teaming and Evaluating Foundation Models for Safety, Trust, and Effective Artificial Intelligence with Application of Retrieval Augmented Generation
Abstract
This work fulfills the need for measures, tooling, and methodologies that enable organizations building static and dynamic artificial intelligence (AI) systems to combine human-in-the-loop review and an AI judge in an experimentation framework (XF) to evaluate the accuracy and soundness of AI system responses, and to verify that an AI system is trustworthy and secure against attack methods (e.g., prompt injection, jailbreaking, data poisoning, backdoor attacks). Organizations implementing AI systems cannot easily answer, "Have we done enough evaluation to establish trustworthiness?" Establishing those qualities is a prerequisite to AI deployment in use cases for reducing global nuclear threats, as well as for business, education, and personal AI use cases. Today's approach includes traditional software testing methods (e.g., unit, integration, system, performance), human-guided ethical testing, and red-teaming, where a tester thinks like an adversary. But the probabilistic nature of multimodal foundation models (MMFMs) using generative AI (GenAI), and pitfalls such as oversampling and bias in model training, require new methods. We create an experimentation framework (XF) focusing on GenAI, which represents most MMFMs. We distinguish between testing LLMs only and testing LLMs used in conjunction with Retrieval Augmented Generation (RAG), the process of extracting relevant information from grounding documents to feed to the LLM for customized content generation. LLMs + RAG can be multimodal by extracting relevant information from videos, audio, text, and images to provide to the LLM. We provide a RAG solution that is agnostic to the choice of LLMs, identify composite metric scores (e.g., harmonic mean of Faithfulness, Answer Relevancy, Context Precision, and Context Recall metrics), and research how to use human review and automation in tandem, which can be transformative.
Our solution works stand-alone in an air-gapped environment suitable for classified proliferation detection use cases. In response to the gravity of AI trustworthiness for society and security, the U.S. Government is driving requirements associated with AI, starting with Executive Order (EO) 14110 on the Safe, Secure, and Trustworthy Development and Use of AI. Our XF provides a way to operationalize evaluating AI systems for parts of EO 14110, such as Section 4, "Ensuring the Safety and Security of AI Technology." XF, when realized through Phases II and III, contributes tooling and methodologies to help mitigate, for AI, the recurring pattern in which powerful new technologies bring both beneficial uses and the dangers of unintended consequences and misuse. A broad range of groups within both the Federal Government and commercial industry will benefit, including organizations applying MMFMs (LLMs and RAG). XF's benefit is to enable those organizations to measure when their AI systems are trustworthy enough to deploy in support of their critical organizational missions; for example, to establish effectiveness in analyzing images, video, documents, and data that are increasingly streamed or ingested by organizations to derive insights, indicators (e.g., proliferation evidence), and summaries for intelligence, military, agricultural, financial, and other uses; and to support Section 508 accessibility of media.
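The abstract names a composite metric built as the harmonic mean of Faithfulness, Answer Relevancy, Context Precision, and Context Recall, but gives no implementation details. The following is a minimal sketch of how such a score could be computed, not the awardee's method. The function name `composite_score`, its parameter names, the assumption that every metric lies in [0, 1], and the equal weighting of the four metrics are all illustrative assumptions.

```python
def composite_score(faithfulness, answer_relevancy,
                    context_precision, context_recall):
    """Harmonic mean of four RAG evaluation metrics.

    Hypothetical helper: assumes each metric is a float in [0, 1]
    and that all four metrics are weighted equally.
    """
    scores = {
        "faithfulness": faithfulness,
        "answer_relevancy": answer_relevancy,
        "context_precision": context_precision,
        "context_recall": context_recall,
    }
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    values = list(scores.values())
    # The harmonic mean is defined as zero when any component is zero.
    if any(v == 0.0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


if __name__ == "__main__":
    # One weak metric pulls the composite well below the arithmetic mean.
    print(composite_score(1.0, 1.0, 1.0, 0.25))
```

A harmonic mean penalizes a single low metric more heavily than an arithmetic mean would, so a system cannot reach a high composite score while, for example, scoring poorly on Context Recall.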
Topic Code
C58-04a
Solicitation Number
DE-FOA-0003202
Status
(Complete)
Last Modified 8/27/24
Period of Performance
7/22/24
Start Date
4/21/25
End Date
Funding Split
$199.8K
Federal Obligation
$0.0
Non-Federal Obligation
$199.8K
Total Obligated
Additional Detail
Award ID FAIN
DESC0025059
SAI Number
None
Award ID URI
SAI EXEMPT
Awardee Classifications
Small Business
Awarding Office
892430 SC CHICAGO SERVICE CENTER
Funding Office
892401 SCIENCE
Awardee UEI
SJHANNQ8XZT6
Awardee CAGE
4UPA9
Performance District
VA-09
Senators
Mark Warner
Timothy Kaine