Besides compressing digital data, entropy coding can also be used to measure the similarity between a stream of data and existing classes of data. This is done by generating an entropy coder/compressor for each class of data; unknown data is then classified by feeding it, uncompressed, to each compressor and seeing which one yields the best compression. The coder that compresses the unknown data best was probably trained on data most similar to it. This approach is grounded in the concept of
normalized compression distance, a parameter-free, universal similarity metric based on compression that approximates the uncomputable
normalized information distance.

== See also ==
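The classification-by-compression procedure described above can be sketched in Python, using the general-purpose `zlib` compressor as a stand-in for a per-class entropy coder (an illustrative approximation; a real system would use coders actually trained on each class, and the sample data below is made up for demonstration):

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed representation of data."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(unknown: bytes, classes: dict) -> str:
    """Assign the unknown data to the class whose sample it is
    closest to under NCD (i.e., compresses best together with)."""
    return min(classes, key=lambda name: ncd(unknown, classes[name]))


# Hypothetical class samples: repetitive data over two different alphabets.
classes = {
    "a-like": b"aaab aaba abaa aaaa " * 50,
    "z-like": b"zzzy zzyz zyzz zzzz " * 50,
}
print(classify(b"aaba aaab aaaa", classes))
```

Because the unknown string shares its structure with the "a-like" sample, concatenating the two adds little new information for the compressor, giving a smaller NCD than against the "z-like" sample.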