Large-scale data storage and processing systems are needed to record, store, distribute, and process data from nuclear physics experiments conducted at large facilities in the US, such as Brookhaven National Laboratory's (BNL) Relativistic Heavy Ion Collider (RHIC), the Continuous Electron Beam Accelerator Facility (CEBAF) at the Thomas Jefferson National Accelerator Facility (TJNAF), and the Facility for Rare Isotope Beams (FRIB) at Michigan State University (MSU). The Electron-Ion Collider (EIC), undergoing design and construction at BNL, is anticipated to produce data at rates that will also challenge current computing and storage resources. Experiments at such facilities are extremely complex, involving thousands of detector elements that produce raw experimental data at rates in excess of several tens of GB/s, resulting in anticipated annual raw data sets of 50 to 100 petabytes (PB) at RHIC now and at the EIC in the future. A single experiment can produce reduced data sets of many hundreds of terabytes (TB), which are then distributed to institutions worldwide for analysis; with the increasing data generation rates at these facilities, multi-PB reduced data sets will soon be common. Increased adoption and implementation of streaming readout protocols will only accelerate data acquisition rates and the resulting volume of raw data. Beyond accelerator-based experiments, next-generation neutrino experiments are also anticipated to produce data at rates and volumes that begin to challenge current computing and storage resource capabilities.

Similarly, high-performance computing (HPC) simulations are essential to develop the theory needed to guide and interpret nuclear physics experiments. These simulations can also generate hundreds of TB of raw data, which must be analyzed with custom software pipelines and archived for future analysis. They include models of nuclear collisions and of the astrophysical events responsible for the creation of atomic nuclei, as well as calculations of the structure of atomic nuclei and nucleons. Specific use examples are given in Ref. [1]. Research on the management and storage of such large data sets will be required to support these large-scale nuclear physics activities.

All applications must explicitly show relevance to the DOE Nuclear Physics (NP) program and must be informed by the state of the art in nuclear physics applications, commercially available products, and emerging technologies. An application based on merely incremental improvements or little innovation will be considered nonresponsive unless context is supplied that convincingly shows its potential for significant impact or value to the DOE nuclear physics program. Applications that largely duplicate research previously funded by the Office of Nuclear Physics and/or the Office of Advanced Scientific Computing Research will be considered nonresponsive to this topic. Applicants are strongly encouraged to review recent SBIR/STTR awards from the Office of Nuclear Physics, available at https://science.osti.gov/sbir/Awards (Release 1, DOE Funding Program: Nuclear Physics or Advanced Scientific Computing Research). The subtopics below call for innovations that will advance the nation's capability to perform nuclear physics research and, more specifically, improve DOE NP facilities and the wider NP community's programs.
Although applicants may wish to gather information from and collaborate with experts at DOE National Laboratories to establish feasibility for their innovations, DOE expects all applicants to address commercialization opportunities for their product or service in adjacent markets such as medicine, homeland security, the environment, and industry. Applications using the resources of a third party (such as a DOE laboratory) must include a letter of certification from an authorized official of that organization.

Please note: following award, all DOE SBIR/STTR grant projects requiring high-performance computing support are eligible to apply to use the resources of the DOE National Energy Research Scientific Computing Center (NERSC), the primary scientific computing facility for the DOE. If you think you will need NERSC computing capabilities during your Phase I or Phase II project, you may be eligible for this free resource. Learn more about NERSC and how to apply for NERSC resources following the award of a Phase I or Phase II project at http://www.nersc.gov/users/accounts/allocations/request-form/.

Applications are sought only in the following subtopics:

a. Tools for Large Scale, Widely Distributed Nuclear Physics Data Processing

A trend in nuclear physics is to maximize the use of distributed storage and computing resources by constructing end-to-end data handling and distribution systems, with the aim of achieving fast data processing and/or increased data accessibility across many disparate computing facilities. Such facilities include local computing resources, university-based clusters, major DOE-funded computing resources, and commercial cloud offerings. Applications are sought for:

1. Software techniques to improve the effectiveness of storing, retrieving, and moving such large volumes of data (> 1 PB/day), possibly including but not limited to automated data replication, data transfers from multiple sources, or network bandwidth scheduling to achieve the lowest wait time or fastest data processing;

2. Effective new approaches to data mining or data analysis through data discovery or restructuring. Examples of such approaches might include fast information retrieval through advanced metadata searches or in-situ data reduction and repacking for remote analysis and data access.

Open-source software solutions are strongly encouraged. Applications must clearly indicate how Phase I research and development will result in a working prototype or method that will be completed by the end of Phase II. The prototype or method must be suitable for testing in a nuclear physics application and/or at a nuclear physics facility. Applications not meeting this requirement will be considered nonresponsive and declined without review.

Questions: Contact Michelle Shinn, Michelle.Shinn@science.doe.gov, or the NP SBIR/STTR Topic Associate for Software and Data Management, Gulshan Rai, Gulshan.Rai@science.doe.gov.

b. Applications of AI/ML to Nuclear Physics

As discussed above, analysis of experimental, theoretical, and simulation data is a central task in the NP community. In medium-scale experiments, data sets are collected in which each event has a large number of independent parameters or attributes. The manipulation of these complex data sets into summaries suitable for the extraction of physics parameters and model comparison is a difficult and time-consuming task.
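As a concrete, if simplified, illustration of this reduction step, the sketch below condenses synthetic per-event records with several attributes into a selection count, a histogram, and summary statistics. It is only a sketch: the attribute names, cuts, and distributions are hypothetical placeholders, not any experiment's actual schema.

    import numpy as np

    # Minimal sketch: reduce per-event records with many attributes into compact
    # summaries (a selection count, a histogram, and per-attribute statistics).
    # All attribute names, cuts, and distributions below are hypothetical.
    rng = np.random.default_rng(0)
    n_events = 100_000

    # Synthetic stand-in for event-level data.
    events = {
        "track_multiplicity": rng.poisson(30, n_events),
        "total_energy_gev":   rng.gamma(shape=5.0, scale=20.0, size=n_events),
        "vertex_z_cm":        rng.normal(0.0, 10.0, n_events),
    }

    # Placeholder event selection on two attributes.
    selected = (np.abs(events["vertex_z_cm"]) < 15.0) & (events["track_multiplicity"] > 10)

    # Summaries suitable for downstream fitting or model comparison.
    energy_hist, energy_edges = np.histogram(
        events["total_energy_gev"][selected], bins=50, range=(0.0, 400.0)
    )
    stats = {name: (vals[selected].mean(), vals[selected].std()) for name, vals in events.items()}

    print("selected events:", int(selected.sum()))
    print("per-attribute mean/std:", stats)

In practice this kind of reduction is repeated across billions of events and many more attributes, which is what makes automated, ML-assisted approaches of interest.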
Currently, both national laboratory and university-based groups carrying out experimental and simulation analyses maintain local computing clusters running domain-specific software, often written by nuclear physicists. Likewise, theoretical groups, after generating data on a multitude of HPC platforms, perform analysis on HPC resources and analysis machines at computing facilities and on local clusters. Concurrently, the data science community has developed tools and techniques that apply machine learning (ML) and artificial intelligence (AI) to pattern finding and classification in large data sets, promising new avenues for analyses of this kind. These tools are generally open source and can be deployed effectively on platforms ranging from distributed computing resources provided by commercial cloud services to leadership computing facilities. Applying these ML and AI technologies to the analysis of nuclear physics data requires the development of domain-specific tools, including specific AI algorithms and techniques for the preparation and staging of large training sets. Sources of such data are described in Topic C55-24 (Nuclear Instrumentation, Detection Systems and Techniques). Applications are sought to develop:

1. ML and AI technologies that address a specific application domain in experimental, simulation, and/or theoretical nuclear physics data analysis. Applications should address performance and plan to demonstrate feasibility to non-experts in computer systems with working prototypes and comprehensive tutorials and/or documentation.

2. ML and AI technologies implemented within high-performance computing simulations to encapsulate essential physics, increasing the fidelity of simulations and/or reducing the time to solution.

Applicants are strongly encouraged to consult the references and open literature to best understand the tools already in use by the community. Open-source software solutions are strongly encouraged. Applications must clearly indicate how Phase I research and development will result in a working prototype or method that will be completed by the end of Phase II. The prototype or method must be suitable for testing in a nuclear physics application and/or at a nuclear physics facility. Applications not meeting this requirement will be considered nonresponsive and declined without review.

Questions: Contact Michelle Shinn, Michelle.Shinn@science.doe.gov, or the NP SBIR/STTR Topic Associate for Software and Data Management, Gulshan Rai, Gulshan.Rai@science.doe.gov.

c. Heterogeneous Concurrent Computing

Computationally demanding theory calculations, as well as detector simulations and data analysis tasks, are significantly accelerated through the use of general-purpose Graphics Processing Units (GPUs). Computing based on Field-Programmable Gate Arrays (FPGAs) is also being explored by the community. The ability to exploit these hardware solutions for concurrent computing has been significantly constrained by the effort required to port software to these computing environments. Applications are sought to develop:

1. Cross-compilation or source-to-source translation tools able to convert conventional as well as heavily templatized code into high-performance implementations for heterogeneous architectures (see the illustrative sketch below).
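For context only, the sketch below shows a lightweight single-source pattern, dispatching the same array kernel to NumPy on the CPU or CuPy on a GPU when the latter is available. It is not source-to-source translation and does not address heavily templatized codes; the kernel, synthetic data, and pion-like mass value are illustrative assumptions.

    import numpy as np

    # Lightweight single-source pattern: the same array kernel runs on the CPU
    # (NumPy) or on a GPU (CuPy), chosen at import time. This only illustrates
    # the portability goal; it is not source-to-source translation and does not
    # handle complicated templatized C++ code.
    try:
        import cupy as xp      # GPU path, if CuPy and a CUDA device are available
        backend = "cupy (GPU)"
    except ImportError:
        xp = np                # CPU fallback
        backend = "numpy (CPU)"

    def invariant_mass(px, py, pz, e):
        """Toy kernel: invariant mass from four-momentum components (GeV)."""
        m2 = e**2 - (px**2 + py**2 + pz**2)
        return xp.sqrt(xp.maximum(m2, 0.0))

    # Synthetic four-momenta standing in for reconstructed particle candidates,
    # assuming a pion-like rest mass of 0.140 GeV for the energy component.
    rng = np.random.default_rng(1)
    px, py, pz = (xp.asarray(rng.normal(0.0, 1.0, 1_000_000)) for _ in range(3))
    e = xp.sqrt(px**2 + py**2 + pz**2 + 0.140**2)

    masses = invariant_mass(px, py, pz, e)
    print(backend, "mean mass:", float(masses.mean()))

Such array-level dispatch works only for simple, regular kernels; the tools sought in this subtopic target the far harder problem of translating large, complex codebases.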
Utilizing High Performance Computing (HPC) and Leadership Computing Facilities (LCFs) is of growing relevance and importance to experimental NP as well. Most HPC and LCF systems are evolving toward hybrid CPU-GPU architectures oriented toward machine learning. Existing analysis codes do not sufficiently expose the concurrency necessary to exploit the high performance of these architectures. NP analysis problems do have the potential data concurrency needed to perform well on multi- and many-core architectures, but current codes struggle to achieve high efficiency in both thread scaling and vector utilization. NP experimental groups are increasingly invited and encouraged to use such facilities, and DOE is assessing the needs of computationally demanding experimental activities, such as data analysis, detector simulation, and error estimation, in projecting future computing requirements. Applications are sought to develop:

1. Tools and technologies that facilitate efficient use of large-scale CPU-GPU hybrid systems for the data-intensive workflows characteristic of experimental NP. Such tools can provide the capability to utilize heterogeneous LCF architectures, and/or they can manage the embarrassingly parallel nature of the analysis workloads and support the simultaneous scheduling of GPU-intensive and CPU-only workflows on the same nodes with sophisticated workflow management tools. Ideally, tools should be designed, and interfaces constructed, so as to abstract low-level computational performance details away from users who are not computer scientists, while providing effective and expressive application programming interfaces for users who wish to perform such optimizations themselves.

Open-source software solutions are strongly encouraged. Applications must clearly indicate how Phase I research and development will result in a working prototype or method that will be completed by the end of Phase II. The prototype or method must be suitable for testing in a nuclear physics application and/or at a nuclear physics facility. Applications not meeting this requirement will be considered nonresponsive and declined without review.

Questions: Contact Michelle Shinn, Michelle.Shinn@science.doe.gov, or the NP SBIR/STTR Topic Associate for Software and Data Management, Gulshan Rai, Gulshan.Rai@science.doe.gov.

d. Other

In addition to the specific subtopics listed above, the Department invites applications in other areas that fall within the scope of the general description at the beginning of this topic.