Open-Source and User-Friendly Record Linkage/De-duplication Tool

ID: CDC/NCBDDD 020 • Type: SBIR / STTR Topic

Description

Phase I SBIR proposals will be accepted.
Fast-Track proposals will not be accepted.
Phase I clinical trials will not be accepted.
Number of anticipated awards: 1
Budget (total costs): Phase I: up to $243,500 for up to 6 months; Phase II of up to $1,000,000 and a Phase II duration of up to 2 years.
PROPOSALS THAT EXCEED THE BUDGET OR PROJECT DURATION LISTED ABOVE MAY NOT BE FUNDED.

Background

Record linkage (or de-duplication) is an essential component of many CDC-supported projects and programs. If an individual is reported as a case by more than one data source, or reported at multiple times, it is vital to link records so that an individual will not be counted as multiple incident cases. There are powerful algorithms that can automatically detect matches in many situations. However, these software tools are often proprietary or require programming/coding skills that may not be available in every state or jurisdiction. A free and easy-to-use solution would strengthen public health expertise, as the same tools could be used across programs, and users who cannot write code could use the same underlying packages and algorithms as more technically inclined users.

Motivating example: CDC's Autism and Developmental Disabilities Monitoring (ADDM) Network currently supports autism surveillance in different states. States receive information from various medical and educational providers, and states must link records to ensure each child is counted once and that all critical data elements are linked to the child's record. The ADDM surveillance program uses The Link King, a SAS-based record linkage program, for data linkages. There are several beneficial attributes of this tool: it uses high-performing algorithms, is free (but requires a paid SAS subscription), and it has a graphical user interface that allows easy use by non-coders.
However, it is no longer actively supported or developed (the team received permission to host an archival copy at www.the-link-king.party). Future updates to SAS, Microsoft Windows, or any dependency could jeopardize the functioning of the tool, and therefore the surveillance program.

Project Goals

Short term project goals

- Understand basic needs and use cases for record linkage in public health applications
- Develop an R package that provides an R Shiny front-end to a high-performance record linkage package (such as fastLink, RecordLinkage, or csvdedupe). Functionality should include the ability to facilitate linkage parameters (select variables used for linkages), identify data sets to be used, manually verify and review results, and export the resulting matched and non-matched data.
- Create documentation to instruct users on its use (such as a getting started vignette)
- Create a public GitHub repository for the code, as well as for tracking issues and feature requests from end-users

Phase I Activities and Expected Deliverables

During the Phase I period, the activities can include, but are not limited to:

The following deliverables should be produced by the end of the project period:

- R Package providing interface to record linkage/de-duplication program(s)
- Includes documentation (built into package, and vignette)
- Package and materials hosted on CRAN
- Source code maintained on a public GitHub repository
- Demonstration to CDC/public health community
- Summary of potential enhancements and community feedback/requests

Impact

This project could have both long- and short-term impact on CDC surveillance programs and other projects. Most immediately, it will provide a sustainable solution for the ADDM Network, as the current record linkage software is effectively abandonware and requires SAS licenses.
Other free options (summarized here and here) often lack easy-to-use interfaces, are not updated, or are only available in programming languages that would add complexity to (or be incompatible with) a public health program. Commercial tools could be expensive (as shown here) or require uploading sensitive data to a cloud-based service, which might violate public health data privacy requirements. Proprietary software could also be custom-tailored to each surveillance system and include this functionality. For example, the ADDM Network discontinued a $500,000 annual contract to build and maintain a proprietary data system that included rudimentary record linkage functionality. Other customizable products have linkage/de-duplication functionality, such as Conduent's Maven software, but can be expensive and encourage fragmentation between different systems by virtue of requiring software licenses/contracts.

More broadly, this tool could fill similar gaps in functionality in other CDC and public health programs without having to resort to custom-developed software. There are already thousands of R users at CDC, and they would be able to easily integrate this tool into other systems that could benefit, such as during Epi Aids, when simple tools are needed immediately. When we designed our current data system, we spoke with other surveillance programs and often heard that record linkage / de-duplication processes were lacking in performance (such as when a basic matching algorithm is integrated into custom software) or were deemed responsibilities that were left up to the states to complete without explicit support from CDC. If selected, this project would have a high likelihood of success, as the core record linkage algorithms are already available; this project would make them easier to use by non-programmers and better integrate them into typical public health / surveillance workflows.
Commercialization Potential

Many open source software projects have successful commercial models through selling professional services, including enhanced support, customized features, consultation, training, or analytic capacity. This record linkage tool could become part of a suite of widely-used data management and analytic tools that are commonly deployed in the public health community. The developer would be well-positioned to offer premium support and technical services to programs that use the tools or need custom solutions built upon an open-source platform.
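
Illustrative sketch

The de-duplication problem described in the Background (one individual reported by several sources must be counted once) can be illustrated with a minimal sketch. This is not the method used by The Link King or by the R packages named above; it is a simple rule-based stand-in written in Python with only the standard library. The records, field names, the 0.85 threshold, and the helper names (`name_similarity`, `is_match`, `link_records`) are all hypothetical and chosen for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical case reports from two sources; every value is made up.
records = [
    {"id": 1, "source": "clinic", "name": "Jordan Smith", "dob": "2013-04-02"},
    {"id": 2, "source": "school", "name": "Jordon Smith", "dob": "2013-04-02"},
    {"id": 3, "source": "clinic", "name": "Maria Lopez", "dob": "2012-11-19"},
    {"id": 4, "source": "school", "name": "Alex Chen", "dob": "2013-04-02"},
]


def name_similarity(a, b):
    """Return a 0..1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(r1, r2, threshold=0.85):
    """Assumed rule: same date of birth and sufficiently similar names."""
    same_dob = r1["dob"] == r2["dob"]
    return same_dob and name_similarity(r1["name"], r2["name"]) >= threshold


def link_records(recs, threshold=0.85):
    """Group record ids that appear to refer to the same individual."""
    parent = {r["id"]: r["id"] for r in recs}

    def find(x):
        # Follow parent pointers to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair; merge the groups of any pair judged a match.
    for r1, r2 in combinations(recs, 2):
        if is_match(r1, r2, threshold):
            parent[find(r2["id"])] = find(r1["id"])

    groups = {}
    for r in recs:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(groups.values())


linked = link_records(records)
print(f"{len(records)} reports -> {len(linked)} individuals: {linked}")
```

Comparing every pair of records is only practical for small inputs, and a fixed threshold on a single name score is far cruder than the probabilistic matching offered by dedicated record linkage packages. The sketch is meant only to show why a misspelled name from a second source should not produce a second incident case, and why a manual review step for borderline pairs is useful.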

Overview

Response Deadline
Oct. 28, 2021 Past Due
Posted
July 12, 2021
Open
July 12, 2021
Set Aside
Small Business (SBA)
Place of Performance
Not Provided

Program
SBIR Phase I / II
Structure
Contract or Grant
Phase Detail
Phase I: Establish the technical merit, feasibility, and commercial potential of the proposed R/R&D efforts and determine the quality of performance of the small business awardee organization.
Phase II: Continue the R/R&D efforts initiated in Phase I. Funding is based on the results achieved in Phase I and the scientific and technical merit and commercial potential of the project proposed in Phase II. Typically, only Phase I awardees are eligible for a Phase II award.
Duration
6 Months - 1 Year
Size Limit
500 Employees
On 7/12/21, the Centers for Disease Control and Prevention issued SBIR / STTR Topic CDC/NCBDDD 020 for Open-Source and User-Friendly Record Linkage/De-duplication Tool, due 10/28/21.
