Details of implementation are important for performance, particularly for decoding. For encoding, no clear advantage is gained by using a
linked list, so using an
array to store the list is acceptable, with worst-case performance
O(nk), where n is the length of the data to be encoded and k is the number of values (generally a constant for a given implementation). The typical performance is better because frequently used symbols are more likely to be at the front and will produce earlier hits. This is also the idea behind a
Move-to-front self-organizing list. However, for decoding, we can use specialized data structures to greatly improve performance.
Python This is a possible implementation of the move-to-front algorithm in
Python. from collections.abc import Generator, Iterable class MoveToFront: """ >>> mtf = MoveToFront() >>> list(mtf.encode("Wikipedia")) [87, 105, 107, 1, 112, 104, 104, 3, 102] >>> mtf.decode([87, 105, 107, 1, 112, 104, 104, 3, 102]) 'Wikipedia' >>> list(mtf.encode("wikipedia")) [119, 106, 108, 1, 113, 105, 105, 3, 103] >>> mtf.decode([119, 106, 108, 1, 113, 105, 105, 3, 103]) 'wikipedia' """ def __init__(self, common_dictionary: Iterable[int] = range(256)): """ Instead of always transmitting an "original" dictionary, it is simpler to just agree on an initial set. Here we use the 256 possible values of a byte. """ # consume the iterable so it can be used multiple times self.common_dictionary = list(common_dictionary) def encode(self, plain_text: str) -> Generator[int]: # Changing the common dictionary is a bad idea. Make a copy. dictionary = list(self.common_dictionary) # Read in each character for c in plain_text.encode("latin-1"): # Change to bytes for 256. # Find the rank of the character in the dictionary [O(k)] rank = dictionary.index(c) # the encoded character yield rank # Update the dictionary [Θ(k) for insert] dictionary.pop(rank) dictionary.insert(0, c) def decode(self, compressed_data: Iterable[int]) -> str: """ Inverse function that recover the original text """ dictionary = list(self.common_dictionary) plain_text = [] # Read in each rank in the encoded text for rank in compressed_data: # Remove the character of that rank from the dictionary e = dictionary.pop(rank) plain_text.append(e) # Insert the character at the beginning of dictionary dictionary.insert(0, e) return bytes(plain_text).decode("latin-1") # Return original string In this example we can see the MTF code taking advantage of the three repetitive 's in the input word. The common dictionary here, however, is less than ideal since it is initialized with more commonly used
ASCII printable characters put after little-used control codes, against the MTF code's design intent of keeping what's commonly used in the front. If one rotates the dictionary to put the more-used characters in earlier places, a better encoding can be obtained: from itertools import chain def block32(x): return range(x, x+32) class MoveToFrontMoreCommon(MoveToFront): """ >>> mtf = MoveToFrontMoreCommon() >>> list(mtf.encode("Wikipedia")) [55, 10, 12, 1, 17, 9, 9, 3, 7] """ def __init__(self): super().__init__( chain( # Sort the ASCII blocks: block32(ord("a") - 1), # first lowercase, block32(ord("A") - 1), # then uppercase, block32(ord("!") - 1), # punctuation/number, block32(0), # control codes, range(128, 256), # and finally the non-ASCII stuff ) ) if __name__ == "__main__": import doctest doctest.testmod() ==Use in practical data compression algorithms==