precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions

Metadata Updated: July 29, 2022

The precisionFDA Truth Challenge V2 aimed to assess the state of the art in variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods outperformed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants. This dataset includes the FASTQ files provided to participants, the submitted variant callsets as VCF files, and the benchmarking results, along with challenge submission metadata.
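
The evaluation described above compares each submitted callset against a benchmark set. As a rough illustration only, the sketch below computes precision, recall, and F1 from true-positive, false-positive, and false-negative counts, which are the summary metrics commonly reported when benchmarking small variants. The function name `summarize_callset` and the counts are hypothetical; they are not taken from the challenge results or from any challenge tooling.

```python
def summarize_callset(tp: int, fp: int, fn: int) -> dict:
    """Return precision, recall, and F1 for one callset in one region set.

    tp, fp, fn are counts of true-positive, false-positive, and
    false-negative variant calls relative to a benchmark set.
    """
    # Guard against empty denominators (e.g. a region with no calls).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only (not challenge data).
metrics = summarize_callset(tp=90, fp=10, fn=30)
print(metrics)
```

In practice these counts would be produced per stratification (for example, difficult-to-map regions or the MHC) by a benchmarking tool, so the same function could be applied once per region set.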

Access & Use Information

Public: This dataset is intended for public access and use.
License: See https://www.nist.gov/open/license for license information.

Downloads & Resources

References

https://doi.org/10.1101/2020.11.13.380741

Dates

Metadata Created Date: March 11, 2021
Metadata Updated Date: July 29, 2022
Data Update Frequency: irregular

Metadata Source

Harvested from NIST

Additional Metadata

Resource Type: Dataset
Publisher: National Institute of Standards and Technology
Maintainer:
Identifier: ark:/88434/mds2-2336
Data First Published: 2020-12-29
Language: en
Data Last Modified: 2021-01-26 00:00:00
Category: Bioscience:Genomic measurements
Public Access Level: public
Bureau Code: 006:55
Metadata Context: https://project-open-data.cio.gov/v1.1/schema/data.json
Schema Version: https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby: https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id: d9f50fc1-b2ed-4437-ab83-efa8c56efa57
Harvest Source Id: 74e175d9-66b3-4323-ac98-e2a90eeb93c0
Harvest Source Title: NIST
Homepage URL: https://data.nist.gov/od/id/mds2-2336
License: https://www.nist.gov/open/license
Program Code: 006:045
Related Documents: https://doi.org/10.1101/2020.11.13.380741
Source Datajson Identifier: True
Source Hash: 24bc420f27c78b3e3735fd0b2846041fa2436478
Source Schema Version: 1.1