Level 2 (Self)
TECHNOLOGY AREAS
Chem Bio Defense
MODERNIZATION PRIORITIES
Biotechnology
KEYWORDS
Host-pathogen interaction, protein-protein interaction prediction, protein language model, biological threat characterization, computational biology, drug repurposing, biodefense
OBJECTIVE
Develop and demonstrate a capability to rapidly characterize host-pathogen interactions from pathogen protein sequence alone, enabling timely medical countermeasure prioritization and force health protection against novel or emerging biological threats.
DESCRIPTION
When novel or emerging pathogens (bacteria, viruses, parasites) are encountered, characterizing their interactions with human hosts currently requires weeks to months of experimental work and often yields an incomplete understanding. This capability gap limits rapid therapeutic response and countermeasure development. Recent advances in protein language models and large-scale protein-protein interaction (PPI) prediction make computational threat characterization feasible. This topic seeks to develop and validate an operationally deployable capability that can characterize any pathogen (naturally emerging, accidentally released, or engineered) from protein sequence data alone. The system must: (1) predict host-pathogen protein interactions with high accuracy across viral, bacterial, and parasitic pathogen classes; (2) demonstrate zero-shot prediction capability on previously unseen pathogens; (3) provide comprehensive functional annotation of both pathogen and host proteins; (4) generate ranked mechanistic hypotheses about infection pathways through automated analysis; and (5) complete core predictions within 15 minutes and full characterization reports within one hour on standard computing hardware. Proposers must demonstrate rigorous evaluation methods to ensure the system generalizes to unseen pathogens rather than memorizing training data. Performance must be benchmarked against established protein interaction databases and validated experimentally using standard binding assay techniques. The end-state capability enables rapid biological threat characterization to support medical countermeasure prioritization and force health protection.
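As an illustration of the kind of data separation the evaluation requirement points to, the sketch below sorts test pairs into the C1/C2/C3 classes described by Park and Marcotte (see REFERENCES): C1 pairs have both proteins present in training data, C2 pairs have exactly one, and C3 pairs have neither. The function name, identifiers, and toy data are hypothetical. Matching on exact protein identifiers is a simplifying assumption; a real split would likely also need to account for sequence similarity between training and test proteins.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def stratify_test_pairs(
    train_pairs: Iterable[Pair], test_pairs: Iterable[Pair]
) -> Dict[str, List[Pair]]:
    """Assign each test pair to C1, C2, or C3 by overlap with training proteins."""
    # Every protein identifier that appears anywhere in the training pairs.
    seen = {protein for pair in train_pairs for protein in pair}
    classes: Dict[str, List[Pair]] = {"C1": [], "C2": [], "C3": []}
    for a, b in test_pairs:
        n_seen = int(a in seen) + int(b in seen)
        # 2 seen proteins -> C1, 1 -> C2, 0 -> C3
        label = {2: "C1", 1: "C2", 0: "C3"}[n_seen]
        classes[label].append((a, b))
    return classes


# Toy data with made-up identifiers (not real proteins).
train = [("hostA", "virX"), ("hostB", "virY")]
test = [("hostA", "virY"), ("hostA", "virZ"), ("hostC", "virZ")]
strata = stratify_test_pairs(train, test)
for label, pairs in strata.items():
    print(label, pairs)
```

Reporting performance separately per class, with emphasis on C3, is one way to show that results reflect generalization rather than memorized proteins.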
PHASE I
This topic is soliciting Direct to Phase II proposals only.
Feasibility Requirements: Proposers must demonstrate that Phase I feasibility has been achieved through prior work. Required documentation includes:
Benchmark Performance Data: Quantified PPI prediction results on at least one pathogen class, with rigorous data separation methods.
Zero-Shot Validation: Demonstrated recovery of known host-pathogen interactions without training on that specific pathogen system.
Pipeline Demonstration: At least one complete end-to-end run from pathogen sequence input to mechanistic characterization report, meeting timing requirements.
Functional Annotation Capability: Operational tools for protein functional prediction, including gene ontology terms, subcellular localization, and pathway enrichment analysis.
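One possible way to quantify "recovery of known host-pathogen interactions" is recall among the top-k ranked predictions against a reference interactome. The sketch below is a minimal example under that assumption; the metric choice, the function name `recall_at_k`, and the toy identifiers are illustrative and are not prescribed by this topic.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

ScoredPair = Tuple[Tuple[str, str], float]


def recall_at_k(
    scored_pairs: Iterable[ScoredPair],
    known_pairs: Iterable[Tuple[str, str]],
    k: int,
) -> float:
    """Fraction of known interactions found among the k highest-scoring predictions."""
    # Treat pairs as unordered so (host, pathogen) matches (pathogen, host).
    known: Set[FrozenSet[str]] = {frozenset(p) for p in known_pairs}
    if not known:
        raise ValueError("known_pairs must not be empty")
    ranked: List[ScoredPair] = sorted(
        scored_pairs, key=lambda item: item[1], reverse=True
    )
    top = {frozenset(pair) for pair, _score in ranked[:k]}
    return len(top & known) / len(known)


# Toy predictions with made-up identifiers and scores.
scored = [
    (("P1", "H1"), 0.95),
    (("P2", "H9"), 0.80),
    (("P3", "H3"), 0.75),
    (("P5", "H5"), 0.10),
]
known = [("H1", "P1"), ("P3", "H3"), ("P4", "H4")]

print(recall_at_k(scored, known, k=2))
print(recall_at_k(scored, known, k=3))
```

For a zero-shot claim, the reference interactions for the evaluated pathogen would need to be withheld from all training data.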
PHASE II
DP2 Program Structure
DP2 Base Period (9 months): Scale and validate the computational pipeline across expanded pathogen coverage including higher-consequence agents. Deliver comprehensive experimental validation of novel predicted interactions.
DP2 Option Period (9 months): Complete transition-ready software delivery with full documentation, demonstrate drug repurposing capability, and provide final performance characterization across the full threat spectrum.
Phase II represents a major research and development effort that scales the validated Phase I pipeline into a deployable threat-characterization capability, with comprehensive experimental validation, druggability and drug-repurposing demonstration, and extension to higher-consequence pathogens. The Phase II effort culminates in a well-defined deliverable prototype (an end-to-end software pipeline and accompanying validation dataset) that can be transitioned to an operational user.
Phase II fixed payable milestones for this program should include:
DP2 Base Period
Month 2: Updated system architecture report and expanded pathogen coverage plan
Month 4: Evaluation dataset acquisition report covering higher-consequence pathogens and biological toxins
Month 6: Interim performance report with comprehensive benchmarking results
Month 9: Base period final report with experimental validation of ≥25 novel interactions (≥30% hit rate) and drug repurposing demonstration
DP2 Option Period
Month 12: Live demonstration to DARPA with prospective characterization run on Government-selected pathogen
Month 15: Final software delivery with source code and documentation
Month 18: Final report with transition plan and performance characterization
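The Month 9 validation target can be checked with simple arithmetic. The sketch below assumes one particular reading of that milestone: at least 25 experimentally confirmed novel interactions, and a hit rate (confirmed divided by tested) of at least 30%. Both that reading and the function name are assumptions made for illustration; the topic text does not define how the hit rate is computed.

```python
def meets_validation_milestone(
    n_tested: int,
    n_confirmed: int,
    min_confirmed: int = 25,
    min_hit_rate: float = 0.30,
) -> bool:
    """Check confirmed-interaction count and hit rate against assumed thresholds."""
    if n_tested <= 0 or not 0 <= n_confirmed <= n_tested:
        raise ValueError("require n_tested > 0 and 0 <= n_confirmed <= n_tested")
    hit_rate = n_confirmed / n_tested
    return n_confirmed >= min_confirmed and hit_rate >= min_hit_rate


# Hypothetical campaign outcomes (illustrative numbers, not results).
print(meets_validation_milestone(n_tested=80, n_confirmed=25))   # 31.25% hit rate
print(meets_validation_milestone(n_tested=100, n_confirmed=25))  # 25% hit rate
print(meets_validation_milestone(n_tested=60, n_confirmed=20))   # too few confirmed
```

Under this reading, a campaign that confirms exactly 25 interactions can test at most 83 candidates and still reach a 30% hit rate.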
PHASE III DUAL USE APPLICATIONS
Military Applications: Biosurveillance and rapid threat characterization, medical countermeasure prioritization, force health protection for deployed personnel, and intelligence analysis support. Commercial Applications: Drug discovery and repurposing, vaccine target identification, diagnostic biomarker development, veterinary and agricultural biosecurity, and integration with existing bioinformatics platforms.
REFERENCES
Hallee, L. et al. "Protein-Protein Interaction Prediction is Achievable with Large Language Models," bioRxiv 2023.06.07.544109; doi: https://doi.org/10.1101/2023.06.07.544109
Gleghorn, J.P. et al. University of Delaware, Biomedical Engineering host-pathogen biophysical characterization publications.
Park, Y., Marcotte, E.M. "Flaws in evaluation schemes for pair-input computational predictions" (C3 data stratification standard for PPI evaluation), Nature Methods, 2012.
Evans, R. et al. "Protein complex prediction with AlphaFold-Multimer," bioRxiv, 2022 (structural baseline for benchmarking).
PHI-base (pathogen-host interactions database); VirHostNet; HPIDB; IntAct.
Gordon, D.E. et al. "A SARS-CoV-2 protein interaction map reveals targets for drug repurposing," Nature, 2020 (reference interactome for zero-shot validation).
Reactome; KEGG; Gene Ontology Consortium (pathway enrichment reference databases).