
Tools for Test & Evaluation on Incompletely-Labeled Satellite Imagery Datasets

ID: OSD252-D03 • Type: SBIR / STTR Topic

Description

OUSD (R&E) CRITICAL TECHNOLOGY AREA(S): Trusted AI and Autonomy

OBJECTIVE: Develop tooling for rapid performance analysis of diverse AI/ML models on overhead imagery datasets with label noise, over wide ranges of environmental conditions.

DESCRIPTION: **All work will be conducted on classified data at the TS/SCI level.** Deployment and maintenance of AI/ML models at scale requires frequent, reliable test and evaluation. The trustworthiness of such evaluation depends on two factors: well-defined performance metrics and well-defined test data. Performance metrics are extensively studied, as are techniques for splitting datasets into training and test partitions, so the emphasis of this topic is the development of tools to support the use of such test partitions. A key, unresolved issue in the use of test data is the presence of label noise, such as missing or incomplete annotations; poorly sized, oriented, or aligned localizations (bounding boxes); or mislabelled annotation names (misclassifications). Such noise is accepted to be present in most AI/ML datasets and has been shown to have potentially significant impacts on evaluation quality across a range of tasks and fields [1, 2], but few methods exist to mitigate these impacts. Tools to identify and correct label noise are available, but they require significant human effort for large-scale data, making them insufficient for most applications. Forgoing a labeled test set entirely and estimating performance from training data or unlabeled data may also be viable [3, 4, 5], but this approach has not been extensively studied, particularly in the domain of overhead imagery, where errors may be highly correlated with environmental conditions (for example, difficulty annotating cloudy imagery). A flexible system to reliably estimate true AI/ML model performance without the need for perfect test data would fill a significant capability gap both for organizations working with large numbers of models and large amounts of data (such as NGA) and for organizations without the time or human capital to exhaustively assess datasets used for AI/ML development (such as academic labs and smaller commercial entities). Such a system would greatly enhance organizations' confidence in post-deployment model performance without incurring additional cost and delay. The primary interest of this topic is in tools for the evaluation of object detection and automated target recognition tasks, but other computer vision tasks, such as image classification, semantic segmentation, and change detection, are within scope.

PHASE I: Demonstrate a basic software package for the estimation of AI/ML model performance on computer vision detection tasks for overhead satellite imagery. Methods should be robust to noise in, or the absence of, test data. Prototype metrics to quantify the accuracy of the developed methodology (that is, the gap between observed performance on imperfect data and the actual performance of the model). Methods should generalize to multiple real overhead imagery datasets over multiple architectures. Deliver all software packages and a summary report of performance investigations.

PHASE II: Apply the methods and tooling of Phase I to overhead imagery. Provide more detailed analyses of performance, such as evaluations for various combinations of collection geometries and environmental conditions. Develop techniques to extrapolate performance estimates to novel data which may be unlabeled, partially labeled, and/or labeled under a different ontology. Software from Phase II should be compatible with multiple overhead imagery datasets over multiple model architectures. Demonstrate tooling and methodologies on novel detector development without preexisting large datasets on at least three challenge problems developed in concert with the NGA sponsor. Deliver all software packages and a summary report of performance investigations. Performers are expected to accomplish this work at the TS/SCI level.

PHASE III DUAL USE APPLICATIONS: Enhance the software from Phase II to function with minimal human intervention on large-scale datasets and AI/ML models, in line with commercial applications. Extensively document the usage and functionality of all developed software. Deliver all software packages and a summary report of performance investigations.

REFERENCES:
1. Northcutt, Curtis G., Anish Athalye, and Jonas Mueller. "Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks." arXiv preprint arXiv:2103.14749 (2021).
2. Wang, Yueye, et al. "Impact of Gold-Standard Label Errors on Evaluating Performance of Deep Learning Models in Diabetic Retinopathy Screening: Nationwide Real-World Validation Study." Journal of Medical Internet Research 26 (2024): e52506.
3. Deng, Weijian, and Liang Zheng. "Are Labels Always Necessary for Classifier Accuracy Evaluation?" Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021.
4. Okanovic, Patrik, et al. "All Models Are Wrong, Some Are Useful: Model Selection with Limited Labels." arXiv preprint arXiv:2410.13609 (2024).
5. Białek, Jakub, et al. "Estimating Model Performance Under Covariate Shift Without Labels."

KEYWORDS: test and evaluation; machine learning; artificial intelligence; computer vision; trusted AI.
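To make the two ideas at the heart of this topic concrete, the sketch below (not part of the topic documents) simulates a toy classifier evaluated against a test set whose labels are partially corrupted, showing how the metric observed on noisy labels diverges from the true metric, and then reports mean model confidence as a crude stand-in for the more sophisticated label-free estimation approaches cited in [3, 4, 5]. Everything here is a synthetic assumption chosen for illustration (the noise rate, the toy model, the use of accuracy); the topic itself targets object detection on overhead imagery, where the same gap would be measured with detection metrics such as mAP.

```python
# Illustrative sketch only: label noise biasing an observed metric, plus the
# simplest label-free signal (mean confidence). All data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_samples = 10, 5000

# Toy "model": predictions match the true label ~80% of the time, and the
# softmax confidence is inflated at the predicted class.
true_labels = rng.integers(0, n_classes, size=n_samples)
correct_mask = rng.random(n_samples) < 0.80
wrong_preds = (true_labels + rng.integers(1, n_classes, size=n_samples)) % n_classes
model_preds = np.where(correct_mask, true_labels, wrong_preds)

logits = rng.normal(0.0, 1.0, size=(n_samples, n_classes))
logits[np.arange(n_samples), model_preds] += 3.0
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
confidence = probs.max(axis=1)

# (a) Label noise in the *test annotations*: flip a fraction of the ground-truth
#     labels and compare the metric observed on noisy labels with the true one.
noise_rate = 0.10
flip_mask = rng.random(n_samples) < noise_rate
noisy_labels = np.where(
    flip_mask,
    (true_labels + rng.integers(1, n_classes, size=n_samples)) % n_classes,
    true_labels,
)
true_acc = (model_preds == true_labels).mean()
observed_acc = (model_preds == noisy_labels).mean()
print(f"true accuracy            : {true_acc:.3f}")
print(f"accuracy on noisy labels : {observed_acc:.3f}  (gap {true_acc - observed_acc:+.3f})")

# (b) A label-free estimate: mean max-softmax confidence, a crude proxy for the
#     approaches studied in the references (no test labels used at all).
print(f"mean-confidence estimate : {confidence.mean():.3f}")
```

Quantifying the gap in (a), and shrinking it without the exhaustive relabeling effort the description calls out as impractical, is essentially the Phase I "accuracy of the developed methodology" metric described above.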

Overview

Response Deadline
May 21, 2025 (Past Due)
Posted
April 3, 2025
Open
April 3, 2025
Set Aside
Small Business (SBA)
Place of Performance
Not Provided

Program
SBIR Phase I / II
Structure
Contract
Phase Detail
Phase I: Establish the technical merit, feasibility, and commercial potential of the proposed R/R&D efforts and determine the quality of performance of the small business awardee organization.
Phase II: Continue the R/R&D efforts initiated in Phase I. Funding is based on the results achieved in Phase I and the scientific and technical merit and commercial potential of the project proposed in Phase II. Typically, only Phase I awardees are eligible for a Phase II award.
Duration
6 Months - 1 Year
Size Limit
500 Employees
On 4/3/25, the Office of the Secretary of Defense issued SBIR / STTR Topic OSD252-D03, Tools for Test & Evaluation on Incompletely-Labeled Satellite Imagery Datasets, with responses due 5/21/25.

Documents

Posted documents for SBIR / STTR Topic OSD252-D03

Question & Answer


Contract Awards

Prime contracts awarded through SBIR / STTR Topic OSD252-D03

Incumbent or Similar Awards

Potential Bidders and Partners

Awardees that have won contracts similar to SBIR / STTR Topic OSD252-D03

Similar Active Opportunities

Open contract opportunities similar to SBIR / STTR Topic OSD252-D03
