===Storage & redundancy===
Records in sequence databases are deposited from a wide range of sources, from individual researchers to large genome sequencing centers. As a result, the sequences themselves, and especially the biological annotations attached to them, may vary in quality. There is much redundancy, as multiple labs may submit numerous sequences that are identical, or nearly identical, to others in the databases.

Many sequence annotations are based not on laboratory experiments, but on the results of sequence similarity searches against previously annotated sequences. Once a sequence has been annotated based on similarity to others and has itself been deposited in the database, it can in turn become the basis for future annotations. This can lead to a transitive annotation problem, because there may be several such annotation transfers by sequence similarity between a particular database record and actual wet lab experimental information. Therefore, care must be taken when interpreting the annotation data from sequence databases.
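
The transitive annotation problem can be made concrete by counting how many similarity-based transfers separate a record from experimental evidence. The following is a minimal sketch, assuming a hypothetical toy table in which each record either is marked as experimentally characterised or names the record its annotation was copied from. The record identifiers and the <code>annotation_distance</code> helper are illustrative only and do not correspond to any real database schema.

```python
# Hypothetical toy data: record id -> record its annotation was copied from.
# None means the annotation rests on a wet lab experiment.
ANNOTATION_SOURCE = {
    "rec_A": None,      # experimentally characterised
    "rec_B": "rec_A",   # annotated by similarity to rec_A
    "rec_C": "rec_B",   # annotated by similarity to rec_B
    "rec_D": "rec_C",   # annotated by similarity to rec_C
    "rec_E": "rec_F",   # rec_E and rec_F only cite each other
    "rec_F": "rec_E",
}


def annotation_distance(record_id, sources):
    """Count similarity-based transfers between a record and experimental evidence.

    Returns None when the chain never reaches an experimental record
    (unknown record, or a cycle of records annotated from each other).
    """
    steps = 0
    seen = set()
    current = record_id
    while True:
        if current in seen or current not in sources:
            return None
        seen.add(current)
        parent = sources[current]
        if parent is None:
            return steps
        steps += 1
        current = parent


for rec in sorted(ANNOTATION_SOURCE):
    print(rec, annotation_distance(rec, ANNOTATION_SOURCE))
```

In this toy table, a record with a larger distance has had its annotation copied more times, so an error introduced at any earlier step would have been carried along with it.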
===Scoring methods===
Most current database search algorithms rank alignments by a score, which is usually computed with one particular scoring system. Because a single scoring system does not suit every problem, the usual remedy is to make a variety of scoring systems available, so that one can be chosen to suit the specific problem.
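
To illustrate why the choice of scoring system matters, the sketch below ranks the same two hits under two different scoring functions. It assumes ungapped nucleotide alignments of equal length. The sequences, the function names and the numeric scores (including the transition and transversion penalties) are hypothetical values chosen for illustration, not parameters of any real search tool.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def simple_score(a, b):
    """Match/mismatch scoring: +1 for identical bases, -1 otherwise."""
    return 1 if a == b else -1


def transition_aware_score(a, b):
    """Illustrative scoring that tolerates transitions (purine<->purine,
    pyrimidine<->pyrimidine) and penalises transversions heavily."""
    if a == b:
        return 1
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return 0 if same_class else -3


def alignment_score(query, hit, pair_score):
    """Sum per-position scores of an ungapped, equal-length alignment."""
    if len(query) != len(hit):
        raise ValueError("this sketch only handles ungapped, equal-length alignments")
    return sum(pair_score(q, h) for q, h in zip(query, hit))


def rank_hits(query, hits, pair_score):
    """Return hit names ordered from highest to lowest alignment score."""
    return sorted(
        hits,
        key=lambda name: alignment_score(query, hits[name], pair_score),
        reverse=True,
    )


QUERY = "ACGTAC"
HITS = {
    "hit_1": "ACGTAA",  # one transversion (C -> A)
    "hit_2": "GCGTAT",  # two transitions (A -> G, C -> T)
}

print(rank_hits(QUERY, HITS, simple_score))            # ['hit_1', 'hit_2']
print(rank_hits(QUERY, HITS, transition_aware_score))  # ['hit_2', 'hit_1']
```

The two scoring functions order the same pair of hits differently, which is why the top of a ranked list depends on the scoring system as well as on the sequences.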
===Alignment statistics===
Search algorithms often produce an ordered list of hits, but a place in that list does not by itself establish biological significance.

==See also==