Biological Databases – Introduction
  • Post last modified: 2023-12-09

Biological Databases

Biological databases, also called bioinformatics databases, are organized, computerized stores of biological information that provide a standardized way to search and update data. They can be thought of as libraries of data collected from scientific experiments, published literature and computational analysis. They hold biological data such as DNA sequences, protein sequences and molecular structures in an organized form, which makes them a convenient system for storing, searching and retrieving data. Many biological databases are free to use, and together they contain a huge and varied collection of biological data.

 

Database Management

Online biological databases provide an interface for recording, storing, analyzing and retrieving biological data easily and efficiently. This is achieved with software known as a Database Management System (DBMS), which supplies the tools to manipulate the data: inserting, updating and deleting records. Scientists and researchers from all over the world enter their experimental data and results into biological databases so that they are available to a wider audience.
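As a rough illustration of the insert, update, delete and retrieve operations a DBMS provides, the sketch below uses SQLite through Python's standard sqlite3 module. The table name, column names and the DEMO accession values are made up for this example and do not come from any real biological database.

```python
import sqlite3


def demo_crud():
    """Run insert, update, delete and select on a throwaway in-memory table."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    # Hypothetical schema: one row per submitted sequence.
    conn.execute(
        "CREATE TABLE sequences ("
        "accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
    )
    # Insert: two made-up records.
    conn.execute(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        ("DEMO0001", "Escherichia coli", "ATGGCGTAA"),
    )
    conn.execute(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        ("DEMO0002", "Homo sapiens", "ATGCCCTGA"),
    )
    # Update: correct the organism name of the first record.
    conn.execute(
        "UPDATE sequences SET organism = ? WHERE accession = ?",
        ("Escherichia coli K-12", "DEMO0001"),
    )
    # Delete: remove the second record.
    conn.execute("DELETE FROM sequences WHERE accession = ?", ("DEMO0002",))
    # Retrieve: read back whatever is left.
    rows = conn.execute(
        "SELECT accession, organism, sequence FROM sequences ORDER BY accession"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo_crud():
        print(row)
```

Real biological databases are far larger and usually accessed through web interfaces or dedicated tools, but the underlying operations are the same in kind.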

 

Uses of Biological Databases

Biological databases act as stores of information. They help reduce redundancy in the data. They help scientists understand biological phenomena. With the proper tools, existing databases can be used to generate new data, e.g., predicting protein structure with artificial intelligence. Biological databases help researchers make new discoveries and advances in medicine, agriculture, etc.

 

Types of Biological Databases

Biological databases are of different types, depending on the nature of the information and the manner (complexity) of data storage. Data come in several formats, such as text, sequence data, structures and links, and these need to be considered when creating a database.

There are three basic types of biological databases:

1- Primary databases are archival databases. They archive the experimental results submitted by scientists. A primary database is populated with experimentally derived data such as genome sequences and macromolecular structures. The data entered remain uncurated (no modifications are made to the data) and are made accessible to users without any change.

Each entry is given an accession number when it is entered into the database, and the same entry can later be retrieved using that number. An accession number identifies each entry uniquely and never changes.

Examples of primary databases: nucleic acid sequence databases like GenBank and DDBJ, and protein structure databases like the Protein Data Bank (PDB).

2- Secondary databases contain data derived by analyzing the primary databases. Computational algorithms are applied to the primary data, and the meaningful, informative results are stored in the secondary database. The data here are highly curated (processed before being presented in the database). A secondary database therefore holds more refined knowledge than the primary database it draws on.

Examples of secondary databases: InterPro (protein families, motifs, and domains) and UniProt Knowledgebase (sequence and functional information on proteins).

3- Composite databases combine several (usually more than two) primary database resources. This lessens the tedious task of searching through multiple databases that refer to the same data. The approach used, for instance the search algorithm employed, differs considerably from one composite database to another. For example, DrugBank offers details on drugs and their targets, BioGraph integrates assorted biomedical knowledge, and BioModels is a repository of mathematical models of biological and biomedical systems. Many composite databases also provide users with tools and software for data analysis.
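The relationship between the three types can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: the class PrimaryArchive, the functions derive_annotations and composite_search, the TOY accession prefix and the "TATA" motif are all hypothetical names chosen for this example, and real resources such as GenBank, UniProt or DrugBank are organized very differently.

```python
class PrimaryArchive:
    """Toy primary database: stores submissions exactly as received."""

    def __init__(self, prefix="TOY"):
        self._prefix = prefix
        self._records = {}

    def submit(self, sequence):
        # Assign the next accession number; entries are never removed here,
        # so a number is never reused or changed.
        accession = f"{self._prefix}{len(self._records) + 1:06d}"
        self._records[accession] = sequence
        return accession

    def fetch(self, accession):
        # Retrieve the unmodified entry by its accession number.
        return self._records[accession]

    def items(self):
        return self._records.items()


def derive_annotations(archive, motif="TATA"):
    """Toy secondary database: results computed from the primary entries."""
    annotations = {}
    for accession, sequence in archive.items():
        seq = sequence.upper()
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        annotations[accession] = {
            "length": len(seq),
            "gc_content": round(gc, 3),
            "has_motif": motif in seq,
        }
    return annotations


def composite_search(sources, accession):
    """Toy composite database: one query answered from several sources."""
    return {
        name: source[accession]
        for name, source in sources.items()
        if accession in source
    }


if __name__ == "__main__":
    archive = PrimaryArchive()
    acc = archive.submit("ggccTATAat")
    derived = derive_annotations(archive)
    sources = {"primary": dict(archive.items()), "secondary": derived}
    print(acc, composite_search(sources, acc))
```

The primary tier returns the submission untouched, the secondary tier adds computed information keyed by the same accession number, and the composite tier saves the user from querying each source separately.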

The diversity of databases makes it challenging to identify which one should be used to solve a particular problem, because database nomenclature is not standardized and data formats vary. The choice of a database depends on the intended use and takes considerable training.

 

For more details: Bioinformatics Database Resources, DOI:10.4018/978-1-5225-1871-6.ch004

 

 
