Digital genetic sequences may be analyzed using the tools of bioinformatics to attempt to determine their function.
=== Genetic testing ===
The DNA in an organism's genome can be analyzed to diagnose vulnerabilities to inherited diseases, and can also be used to determine a child's paternity (genetic father) or a person's ancestry. Normally, every person carries two variants of every gene, one inherited from their mother and the other from their father. The human genome is believed to contain around 20,000–25,000 genes. In addition to studying chromosomes to the level of individual genes, genetic testing in a broader sense includes biochemical tests for the possible presence of genetic diseases, or of mutant forms of genes associated with an increased risk of developing genetic disorders.

Genetic testing identifies changes in chromosomes, genes, or proteins. Usually, testing is used to find changes that are associated with inherited disorders. The results of a genetic test can confirm or rule out a suspected genetic condition, or help determine a person's chance of developing or passing on a genetic disorder. Several hundred genetic tests are currently in use, and more are being developed.
=== Sequence alignment ===
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be due to functional, structural, or evolutionary relationships between the sequences. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as insertion or deletion mutations (indels) introduced in one or both lineages in the time since they diverged from one another. In sequence alignments of proteins, the degree of similarity between amino acids occupying a particular position in the sequence can be interpreted as a rough measure of how conserved a particular region or sequence motif is among lineages. The absence of substitutions, or the presence of only very conservative substitutions (that is, the substitution of amino acids whose side chains have similar biochemical properties) in a particular region of the sequence, suggests that this region has structural or functional importance. Although DNA and RNA nucleotide bases are more similar to each other than are amino acids, the conservation of base pairs can indicate a similar functional or structural role.
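The reading of mismatches as point mutations and gaps as indels can be illustrated with a small global alignment. The following is a minimal sketch in Python using the Needleman-Wunsch dynamic-programming method, which is one common alignment technique and is not named in the text above. The function names and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration, not recommended parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (aligned_a, aligned_b, score).

    The scoring values are illustrative assumptions, not tuned parameters.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def classify_columns(aligned_a, aligned_b):
    """Count alignment columns as matches, mismatches, or gaps."""
    gaps = sum(1 for p, q in zip(aligned_a, aligned_b) if p == "-" or q == "-")
    matches = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q)
    mismatches = len(aligned_a) - matches - gaps
    return {"matches": matches, "mismatches": mismatches, "gaps": gaps}


aligned_a, aligned_b, total = needleman_wunsch("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(total, classify_columns(aligned_a, aligned_b))
```

Under the common-ancestor interpretation, each mismatch column counted here would correspond to a candidate point mutation and each gap column to a candidate indel. Practical tools typically use substitution matrices and affine gap penalties rather than the flat scores assumed in this sketch.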
Computational phylogenetics makes extensive use of sequence alignments in the construction and interpretation of phylogenetic trees, which are used to classify the evolutionary relationships between homologous genes represented in the genomes of divergent species. The degree to which sequences in a query set differ is qualitatively related to the sequences' evolutionary distance from one another. Roughly speaking, high sequence identity suggests that the sequences in question have a comparatively young most recent common ancestor, while low identity suggests that the divergence is more ancient. This approximation, which reflects the "molecular clock" hypothesis that a roughly constant rate of evolutionary change can be used to extrapolate the elapsed time since two genes first diverged (that is, the coalescence time), assumes that the effects of mutation and selection are constant across sequence lineages. Therefore, it does not account for possible differences among organisms or species in the rates of DNA repair or the possible functional conservation of specific regions in a sequence. (In the case of nucleotide sequences, the molecular clock hypothesis in its most basic form also discounts the difference in acceptance rates between silent mutations that do not alter the meaning of a given codon and other mutations that result in a different amino acid being incorporated into the protein.) More statistically accurate methods allow the evolutionary rate on each branch of the phylogenetic tree to vary, thus producing better estimates of coalescence times for genes.
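The extrapolation from sequence difference to elapsed time can be made concrete with a short calculation. The sketch below, in Python, assumes a strict molecular clock and uses the Jukes-Cantor (JC69) correction for multiple substitutions at the same site. JC69 is one simple substitution model brought in here for illustration and is not mentioned in the text above. The function names and the rate of 1e-9 substitutions per site per year are hypothetical.

```python
import math


def p_distance(x, y):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are ignored.
    """
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for p, q in pairs if p != q) / len(pairs)


def jukes_cantor_distance(p):
    """JC69-corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    if p >= 0.75:
        raise ValueError("sequences are saturated; JC69 distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time(d, rate_per_site_per_year):
    """Strict-clock estimate: both lineages accumulate changes, so d = 2 * rate * t."""
    return d / (2.0 * rate_per_site_per_year)


# Toy aligned sequences and a hypothetical rate, for illustration only.
p = p_distance("ACGTACGTACGTACGT", "ACGTACGAACGTACGT")
d = jukes_cantor_distance(p)
print(p, d, divergence_time(d, 1e-9))
```

Because a single constant rate is assumed for every lineage, this estimate inherits the limitations described above. Methods that let the rate vary from branch to branch replace the single `rate_per_site_per_year` value with a rate per branch.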
=== Sequence motifs ===
Frequently the primary structure encodes motifs that are of functional importance. Some examples of sequence motifs are: the C/D and H/ACA boxes of snoRNAs, the Sm binding site found in spliceosomal RNAs such as U1, U2, U4, U5, U6, U12 and U3, the Shine-Dalgarno sequence, the Kozak consensus sequence and the RNA polymerase III terminator.
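Short motifs such as these are often located by pattern matching. The following Python sketch scans a sequence with regular expressions for two of the motifs listed above. The patterns are deliberately simplified assumptions: `AGGAGG` is the commonly cited core of the Shine-Dalgarno sequence, and `[AG]CCAUGG` keeps only the most conserved positions of the Kozak consensus. Real motifs are degenerate and are usually modelled with position weight matrices rather than exact patterns. The names `MOTIF_PATTERNS` and `find_motifs` are hypothetical.

```python
import re

# Simplified, illustrative patterns (assumptions, not authoritative definitions).
MOTIF_PATTERNS = {
    "Shine-Dalgarno": "AGGAGG",
    "Kozak": "[AG]CCAUGG",
}


def find_motifs(sequence, patterns=MOTIF_PATTERNS):
    """Return (motif_name, start_index, matched_text) for every occurrence.

    DNA input is accepted: the sequence is upper-cased and T is read as U.
    A lookahead is used so that overlapping occurrences are also reported.
    """
    rna = sequence.upper().replace("T", "U")
    hits = []
    for name, pattern in patterns.items():
        for m in re.finditer("(?=(" + pattern + "))", rna):
            hits.append((name, m.start(), m.group(1)))
    return hits


# Toy sequence for illustration; in nature the two motifs occur in
# different kinds of organisms and would not share one transcript.
for hit in find_motifs("GGAGGAGGUUGACCAUGGCU"):
    print(hit)
```

Start indices are zero-based positions in the normalized RNA string.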
=== Sequence entropy ===
In bioinformatics, a sequence entropy, also known as sequence complexity or information profile, is a numerical sequence providing a quantitative measure of the local complexity of a DNA sequence, independently of the direction of processing. Manipulating information profiles enables the analysis of sequences using alignment-free techniques, for example in the detection of motifs and rearrangements.

== See also ==