DNA Databases
  • Post last modified: 2023-12-09

DNA databases, also known as DNA databanks, are stored collections of genetic profiles. They may be public or private, and their uses include the analysis of genetic diseases, genetic fingerprinting in criminology, and genetic genealogy.

DNA databases

International Nucleotide Sequence Database Collaboration

The International Nucleotide Sequence Database Collaboration (INSDC) comprises the following three primary databases, which synchronize new and updated data daily. They also work with the Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.

GenBank (National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health), funded by the United States government.

European Nucleotide Archive (European Bioinformatics Institute).

DNA Data Bank of Japan (National Institute of Genetics).
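
As an illustration of how a record in one of these databases might be retrieved programmatically, here is a minimal sketch that builds a request URL for NCBI's E-utilities `efetch` service and parses a FASTA response. The helper names `genbank_fasta_url` and `parse_fasta` are hypothetical, and the accession in the usage note is only an example; consult the NCBI documentation for usage limits before sending real requests.

```python
from typing import Tuple
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch endpoint.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fasta_url(accession: str) -> str:
    """Build an efetch URL requesting one nucleotide record as FASTA text.

    Hypothetical helper: it only builds the URL and performs no network access.
    """
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession, e.g. "J02459.1"
        "rettype": "fasta",   # FASTA format
        "retmode": "text",    # plain text rather than XML
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def parse_fasta(text: str) -> Tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # Drop the leading ">" from the header and join the sequence lines.
    return lines[0][1:], "".join(lines[1:])
```

To fetch a record, the URL returned by `genbank_fasta_url("J02459.1")` could be passed to `urllib.request.urlopen`, with the decoded response body handed to `parse_fasta`.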

Other DNA databases

RefSeq (NCBI Reference Sequence Database) is a comprehensive, integrated, non-redundant, well-annotated set of reference sequences, including genomic, transcript, and protein sequences.

Ensembl, based at the European Molecular Biology Laboratory’s European Bioinformatics Institute, is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools, available for all supported species, include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP).
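
Ensembl data can also be queried over its public REST service, which is not among the tools listed above. The sketch below builds a gene lookup URL and formats the genomic span from a response, assuming the JSON response includes `seq_region_name`, `start` and `end` fields. The helper names `ensembl_lookup_url` and `gene_span` are hypothetical.

```python
from typing import Any, Mapping
from urllib.parse import quote

# Base URL of the Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Build a lookup-by-symbol URL that asks for a JSON response.

    Hypothetical helper: it only builds the URL and performs no network access.
    """
    return (
        f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
        "?content-type=application/json"
    )


def gene_span(record: Mapping[str, Any]) -> str:
    """Format a lookup record as 'region:start-end'.

    Assumes the record carries seq_region_name, start and end keys.
    """
    return f"{record['seq_region_name']}:{record['start']}-{record['end']}"
```

The decoded JSON body returned for `ensembl_lookup_url("BRCA2")` could be passed straight to `gene_span` to obtain a browser-style coordinate string.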

The International Genome Sample Resource (IGSR) maintains the largest public catalogue of human variation and genotype data, covering the common human genetic variation identified by the 1000 Genomes Project along with data added since.

EggNOG v5.0 is a database of orthology relationships, functional annotation, and gene evolutionary histories. It is a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses.

OMIM®: Online Mendelian Inheritance in Man® is an online catalog of human genes and genetic disorders, updated daily, with a particular focus on the gene-phenotype relationship.

The GENCODE project aims to identify and classify all gene features in the human and mouse genomes with high accuracy based on biological evidence, and to release these annotations for the benefit of biomedical research and genome interpretation.

GeneCards, of the Weizmann Institute of Science, is a searchable integrative database that provides comprehensive, user-friendly information on all annotated and predicted human genes. The knowledgebase automatically integrates gene-centric data from ~150 web sources, including genomic, transcriptomic, proteomic, genetic, clinical and functional information.

The Dfam database is an open collection of transposable element DNA sequence alignments, consensus sequences, and genome annotations.

NDB (Nucleic Acid Database) contains information about experimentally determined nucleic acids and complex assemblies. It supports searches based on annotations relating to sequence, structure, and function, and lets users download, analyze, and learn about nucleic acids.

GIX (Gene Information eXtension) is a browser extension that retrieves information about a gene product directly on any webpage when you double-click an official gene name, synonym or supported accession.

BioGRID ORCS (BioGRID Open Repository of CRISPR Screens) is a curated repository of CRISPR screens.

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource for understanding high-level functions and utilities of biological systems, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
