MUSCLE (alignment software)

MUltiple Sequence Comparison by Log-Expectation (MUSCLE) is computer software for multiple sequence alignment of protein and nucleotide sequences. It is in the public domain. The method was published by Robert C. Edgar in two papers in 2004. The first paper, published in Nucleic Acids Research, introduced the sequence alignment algorithm. The second paper, published in BMC Bioinformatics, presented more technical details. Versions of MUSCLE up to version 3 use a progressive-refinement method. Since version 5, MUSCLE uses a hidden Markov model similar to that of ProbCons.

History

Robert C. Edgar graduated from University College London in 1982 with a BSc in Physics and a PhD in particle physics. After graduating he pursued software development, founding his own company, Parity Software, in 1988. Since 2001, Edgar has contributed to or been the sole creator of multiple software programs, including MUSCLE and USEARCH. The two original MUSCLE papers have been cited more than 58,979 times combined: “MUSCLE: multiple sequence alignment with high accuracy and high throughput” has received more than 49,052 citations, while “MUSCLE: a multiple sequence alignment method with reduced time and space complexity” has been cited over 9,936 times.

Muscle Versions History

Muscle5

Overview

In late 2021, Edgar released Muscle5 (also referred to as Muscle v5), an updated version of the MUSCLE software. It introduces several innovations aimed at improving alignment accuracy and reducing the bias found in other multiple sequence alignment (MSA) algorithms. Traditional tools such as Clustal Omega, MAFFT, and earlier versions of MUSCLE rely on progressive alignment strategies that produce a single alignment. Muscle5, in contrast, generates an ensemble of high-accuracy alignments by perturbing the parameters of a hidden Markov model and permuting the guide tree. At its core, the algorithm is a parallelized reimplementation of ProbCons and is designed to scale efficiently to large datasets. Muscle5 has demonstrated improved benchmark performance compared with leading MSA methods across several datasets, including BAliBASE, BRAliBASE, and PREFAB.

Ensembles

A key innovation in Muscle5 is the use of alignment ensembles, which provide unbiased measures of confidence in alignments. Each individual MSA (replicate) in the ensemble uses fixed but independently chosen parameters for the hidden Markov model and guide tree, allowing results to be averaged over a diverse set of replicates. This enables biologists to assess how sensitive their downstream analyses are to alignment uncertainty by comparing results across the ensemble.
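To illustrate how an ensemble can be turned into a per-column confidence score, the sketch below assumes a simple agreement measure: each column of a chosen reference alignment is scored by the fraction of replicate alignments that contain exactly the same column (the same residue from each sequence, or the same gaps). This is a hypothetical simplification for illustration only; Muscle5's own confidence measures may be defined differently. Alignments are assumed to be lists of equal-length gapped strings with the sequences in the same order, and `alignment_columns` and `column_confidence` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

# A column is identified by which residue of each sequence it holds (None = gap).
Column = Tuple[Optional[int], ...]


def alignment_columns(rows: List[str]) -> List[Column]:
    """Convert gapped, equal-length rows into columns of residue indices."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    next_residue = [0] * len(rows)  # next ungapped residue index per sequence
    columns: List[Column] = []
    for c in range(len(rows[0])):
        col: List[Optional[int]] = []
        for r, row in enumerate(rows):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(next_residue[r])
                next_residue[r] += 1
        columns.append(tuple(col))
    return columns


def column_confidence(reference: List[str], replicates: List[List[str]]) -> List[float]:
    """Hypothetical score: fraction of replicates reproducing each reference column."""
    replicate_sets = [set(alignment_columns(rep)) for rep in replicates]
    return [
        sum(col in cols for cols in replicate_sets) / len(replicates)
        for col in alignment_columns(reference)
    ]
```

For example, with the reference `["AC-GT", "ACTGT"]` and two replicates, one identical to it and one (`["A-CGT", "ACTGT"]`) that places the first sequence's `C` one column later, the columns all replicates agree on score 1.0 and the two columns they disagree on score 0.5, giving `[1.0, 0.5, 0.5, 1.0, 1.0]`.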

Old algorithm

The MUSCLE algorithm (before v5) proceeds in three stages: the draft progressive, improved progressive, and refinement stages.

Stage 1: Draft Progressive

In this first stage, the algorithm produces a multiple alignment, emphasizing speed over accuracy. It begins by computing the k-mer distance for every pair of input sequences to create a distance matrix. UPGMA clusters the distance matrix to produce a binary tree. From this tree a progressive alignment is constructed, beginning with the creation of a profile for each leaf of the tree. At every internal node, the two child profiles are aligned pairwise, creating a new profile that is assigned to that node. This continues until the root of the tree holds a multiple sequence alignment of all input sequences.

Given N input sequences with average length L, the time complexity of the draft progressive stage is $O(N^2 \cdot L + N \cdot L^2)$, where O denotes the asymptotic upper bound. The pairwise k-mer distance calculation accounts for the $O(N^2 \cdot L)$ term, and the progressive alignment steps account for the $O(N \cdot L^2)$ term. The space complexity is $O(N \cdot L)$, as the algorithm maintains profiles and alignments for each sequence across the tree.

Progressive and iterative methods differ in their ability to handle sequences of low similarity, with iterative methods providing more accurate results. They also differ in their computational requirements. Early versions of MUSCLE had moderate CPU demands compared with other programs, although these were higher than those of purely progressive methods. Beyond alignment scores, MUSCLE was less computationally demanding in both execution time and memory usage.
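The first two steps of the draft progressive stage, building the k-mer distance matrix and clustering it with UPGMA into a binary guide tree, can be sketched as follows. This is a minimal illustration, not MUSCLE's implementation: it assumes a distance of one minus the fraction of k-mers two sequences share (counted with multiplicity and normalized by the number of k-mers in the shorter sequence), uses plain residues with a hypothetical default of k = 3, and omits the profile-profile alignment that follows. The helper names `kmer_distance`, `distance_matrix`, and `upgma` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple, Union

# A guide tree is either a leaf (sequence index) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """1 - (shared k-mers / k-mers in the shorter sequence); assumed measure."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    shared = sum(min(n, cy[kmer]) for kmer, n in cx.items())
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:  # a sequence shorter than k has no k-mers to compare
        return 1.0
    return 1.0 - shared / denom


def distance_matrix(seqs: List[str], k: int = 3) -> List[List[float]]:
    """Symmetric N x N matrix of pairwise k-mer distances."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return m


def upgma(dist: List[List[float]]) -> Tree:
    """Naive UPGMA: repeatedly merge the closest pair of clusters."""
    clusters: Dict[int, Tuple[Tree, int]] = {i: (i, 1) for i in range(len(dist))}
    # Pairwise cluster distances keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i, j in combinations(range(len(dist)), 2)
    }
    next_id = len(dist)
    while len(clusters) > 1:
        a, b = min(d, key=d.__getitem__)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from each remaining cluster to a+b.
        merged = {
            c: (size_a * d[min(a, c), max(a, c)] + size_b * d[min(b, c), max(b, c)])
            / (size_a + size_b)
            for c in clusters
        }
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for c, v in merged.items():
            d[c, next_id] = v  # existing ids are always smaller than next_id
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    root, _ = next(iter(clusters.values()))
    return root
```

For the toy sequences `ACGTACGT`, `ACGTACGA`, and `TTGCATGC`, the first two share five of six 3-mers (distance 1/6) and the third shares none, so `upgma(distance_matrix(seqs))` joins sequences 0 and 1 first and returns `(2, (0, 1))`. The matrix step does linear work for each of the quadratically many pairs, in line with the k-mer term of the complexity above; the UPGMA loop here rescans every remaining pair at each merge, which keeps it readable but makes it cubic in N, so a more efficient clustering routine would be preferable for large inputs.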

Integration

MUSCLE is widely supported across bioinformatics platforms. It is fully integrated into programs such as CodonCode Aligner, DNASTAR's Lasergene, Geneious, and MacVector, and is available as a plug-in for Sequencher, MEGA, UGENE, and AliView. It can also be accessed as a web service through the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) or through T-Coffee. Users can also download MUSCLE and run it on their own computers via the .

Source: Wikipedia
