OUSD (R&E) MODERNIZATION PRIORITY: Artificial Intelligence/Machine Learning, Network Command

TECHNOLOGY AREA(S): Sensors

OBJECTIVE: A military operation simulator with high-speed multi-domain capabilities and AI/ML and XR interfaces, in which AI-enabled Command and Control (C2) agents shall learn by executing simulated Multi-Domain Operations (MDO).

DESCRIPTION: Recent assessments by the US Army's Future Study Program have shown that no capable simulation systems meet the requirements of AI for C2 in MDO. Although government-owned and commercial C2 simulation systems are available, none offers the necessary combination of very high-speed execution, multi-domain richness, and specialized interfaces for AI/ML applications. The speed and complexity of MDO against a peer adversary are likely to exceed the cognitive abilities of a human command staff using conventional, largely manual C2 processes. At the same time, emerging applications of Artificial Intelligence (AI) techniques such as Deep Reinforcement Learning (DRL) [1][2] begin to suggest the potential to support C2 of MDO. Recently there has been growing interest in the DOD community, including military departments, unified combatant commands, and defense agencies such as DARPA, in researching and developing C2 AI techniques, specifically DRL-based techniques that can learn to seek, create, and jointly exploit Windows of Superiority (WoS), a key element of the MDO paradigm. To converge multi-domain friendly assets on a WoS, the C2 agents will learn to perform complex (re)planning on shortened timelines, quickly offer suggestions, and test alternative Courses of Action (COAs). Developing these agents will require a simulator engine of appropriate fidelity, since DRL-derived policies are fundamentally limited by the experiences available to them.
This topic looks at developing a simulation environment that can generate scenarios covering all relevant domains and capabilities an AI-enabled C2 system is expected to manage, rapidly produce large amounts of training data for ML algorithms, run much faster than real time, and support massive parallelization in order to make the learning process tractable within operational timelines. From an operational perspective for future MDO, it is envisioned that a comprehensive AI-based C2 system will create high-fidelity simulations of combat scenarios within a short period of time. AI agents will be trained in the simulator and deployed in the field to generate predictions, decisions, and commands at multiple levels of abstraction. These AI-enabled solutions will also work collaboratively with humans within command posts to ensure that data collection, processing, exploitation, and dissemination are efficient and timely, enabling rapid and accurate decision-making. Current C2 simulation environments such as OpSim [3], DXTRS [4], and OneSAF [5] mostly provide war gaming and Course of Action (COA) implementation in the traditional physical domains and are not tailored toward developing AI applications. They have no provision to communicate or interface with AI algorithms, adjust resources, scale the computation to generate experiences, or incorporate humans into the AI-C2 loop. In summary, the goal of this SBIR is to research and develop an integrated simulated battle space that addresses current limitations in training and testing AI systems for C2 with and without a human in the loop.

PHASE I: The Phase I research effort shall focus on conceptualizing a brigade-level, model-based C2 simulation environment prototype with Land, Air, and Sea domains that runs 1,000 times faster than real time/actual mission time.
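Running many independent simulator instances in parallel is one route to the throughput targets above. The following is a minimal sketch only, under stated assumptions: `ToySimulator` is a hypothetical stand-in for the envisioned C2 simulator, and the random/no-op policy is a placeholder for a trained agent.

```python
# Sketch: parallel experience collection from independent simulator instances,
# one per worker process. ToySimulator is an illustrative placeholder, not a
# real C2 simulator.
import multiprocessing as mp
import random

class ToySimulator:
    """Placeholder simulator: each step returns (observation, reward, done)."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return {"tick": 0}

    def step(self, action):
        self.t += 1
        # Toy dynamics: episode ends after 10 simulated steps.
        return {"tick": self.t}, self.rng.random(), self.t >= 10

def rollout_worker(seed):
    """Run one independent simulator instance and return its trajectory."""
    sim = ToySimulator(seed)
    sim.reset()
    trajectory, done = [], False
    while not done:
        obs, reward, done = sim.step(action=None)  # placeholder policy
        trajectory.append((obs, reward))
    return trajectory

if __name__ == "__main__":
    # Each worker runs its own simulator instance; experience is gathered
    # centrally, mirroring the per-node parallel data collection requirement.
    with mp.Pool(processes=4) as pool:
        trajectories = pool.map(rollout_worker, range(4))
    print([len(t) for t in trajectories])
```

In a multi-node setting the same pattern would be replicated per node, with a distributed framework aggregating the collected experience for the learner.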
This simulation environment will consist of both a stochastic simulator driven by a provided COA and an OpenAI Gym-compatible iterative interface for training DRL algorithms that allows every entity in a simulation to be controlled as a separate agent. The vendor shall allow the user to modify the observations, actions, rewards, metrics, and interactions produced by the simulator. The software shall be designed to execute multiple independent instances on each node of a multi-node system and collect experience through parallel data collection. A typical unit of measurement for evaluating C2 environment performance is the time required to perform a C2 function, known after Boyd's Observe-Orient-Decide-Act (OODA) loop as the OODA time. Training data suggest that various C2 environments execute an OODA task in 5 to 30 seconds, depending on the complexity of the task and the C2 environment [6][7]. The performer shall develop a proof of concept showing that AI agents trained in the simulator produce similar or improved OODA times. Further, the performers shall produce experimental/analytical results demonstrating the ability of AI agents trained in the simulator to produce improved values for intermediate goals such as casualties, fuel and ammunition consumption, and movement when compared to COAs designed by expert COA designers. In addition, the deliverable for Phase I shall include detailed documentation of the problem description, current limitations, conceptual design, architectural overview, methodology, modules, and analytical/experimental critical functions, as well as a detailed prototype development plan for Phase II (TRL 2).

PHASE II: The initial part of Phase II shall involve building a prototype based on the concept/methodology conceived in Phase I and meeting the performance criteria described in Phase I.
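The Gym-compatible, per-entity interface described above might take a shape like the following sketch. All names here (`C2MultiAgentEnv`, the entity identifiers, the `reward_fn` hook) are illustrative assumptions rather than a prescribed API; the per-agent dict convention follows common multi-agent extensions of the Gym interface.

```python
# Sketch of a Gym-style multi-agent interface: every simulated entity is
# addressed as a separate agent, and reward shaping is a user-replaceable
# hook. Hypothetical names, not a prescribed API.
import random

class C2MultiAgentEnv:
    """Each entity gets its own observation/action/reward; the reward_fn
    hook lets the user redefine rewards without touching the simulator."""
    def __init__(self, entity_ids, reward_fn=None, horizon=20, seed=0):
        self.entity_ids = list(entity_ids)
        self.reward_fn = reward_fn or (lambda eid, obs: 0.0)
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        # One observation dict per entity.
        return {eid: {"pos": (0, 0), "t": 0} for eid in self.entity_ids}

    def step(self, actions):
        """actions: dict mapping entity id -> action. Returns per-entity
        observations, rewards, and done flags, multi-agent Gym style."""
        self.t += 1
        obs = {eid: {"pos": (self.t, 0), "t": self.t} for eid in self.entity_ids}
        rewards = {eid: self.reward_fn(eid, obs[eid]) for eid in self.entity_ids}
        done = self.t >= self.horizon
        dones = {eid: done for eid in self.entity_ids}
        return obs, rewards, dones, {}

# Usage: penalize elapsed time via a user-supplied reward hook.
env = C2MultiAgentEnv(["uav_1", "tank_2"], reward_fn=lambda eid, o: -o["t"])
obs = env.reset()
obs, rewards, dones, info = env.step({eid: "hold" for eid in obs})
print(rewards)  # {'uav_1': -1, 'tank_2': -1}
```

A standard DRL loop can then treat each entity id as an independent agent, which is the iterative, per-entity control the topic calls for.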
Further, the simulator will be extended to the cyber, electronic warfare (EW), and space domains, with the ability to depict communication and information flows at very high resolution. The overall fidelity and realism of the simulation will be increased by incorporating weather, sensors, terrain interactions, and environmental attributes. Phase II shall also involve development of a next-generation XR user interface that can alter the battlespace by receiving input from the human user, handling human-in/on-the-loop interactions. The user latency of the interface shall be less than 7 ms. The Phase II deliverable shall be an end-to-end software prototype comprising a multi-domain high-fidelity simulation environment, an AI interface, and a low-latency XR user interface. At the end of Phase II, DRL-based agents shall be implemented in the simulation, and at least 70% of AI re-planning recommendations on scenarios jointly developed by concept writers and stakeholders shall be assessed as reasonable by expert human jurors (TRL 5).

PHASE III DUAL USE APPLICATIONS: The software shall be extended to improve the run time to 10,000 times faster than real time. The integrated system shall have the capability to simulate MDO at multiple echelons, including squad, platoon, brigade, division, and corps. The simulation system shall be implemented on DOD's advanced supercomputing capability and evaluated using DRL algorithms and human participants on scenarios jointly developed by concept writers and stakeholders. In terms of the Army's modernization priorities, this software infrastructure will contribute to the three core tenets of multi-domain operations (calibrated force posture, multi-domain formations, and convergence) and is critical for multiple Cross Functional Teams (CFTs), including Network Command, Control, Communication, and Intelligence (C3I), Next Generation Combat Vehicle (NGCV), and Air and Missile Defense (AMD).
Other commercial applications include R&D and operational simulation infrastructure for planning and decision-making during humanitarian assistance, disaster response, and emergency management. The C2 simulation environments could also be used to improve training for pilots, air traffic controllers, and other complex, data-intensive professions involving civilian safety and lives.

REFERENCES:
1. Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemyslaw Debiak, Christy Dennison, et al., "Dota 2 with Large Scale Deep Reinforcement Learning," arXiv:1912.06680
2. Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michał Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al., "Grandmaster level in StarCraft II using multi-agent reinforcement learning," Nature, volume 575, pages 350-354 (2019)
3. Surdu, John R., Gary D. Haines, and Udo W. Pooch, "OpSim: A purpose-built distributed simulation for the mission operational environment," SIMULATION SERIES 31 (1999): 69-74
4. https://usacac.army.mil/sites/default/files/documents/cact/august%20newsletter.pdf
5. https://asc.army.mil/web/portfolio-item/peo-stri-one-semi-automated-forces-pdm-onesaf/
6. J. R. Boyd, "The essence of winning and losing," 1996

KEYWORDS: Command and Control, Simulator, Artificial Intelligence, Machine Learning, Human-in-the-Loop