Content sniffing

Content sniffing, also known as media type sniffing or MIME sniffing, is the practice of inspecting the content of a byte stream to attempt to deduce the file format of the data within it. Content sniffing is generally used to compensate for a lack of accurate metadata that would otherwise be required to enable the file to be interpreted correctly. It typically combines several methods that rely on the redundancy found in most file formats: looking for file signatures and magic numbers, and applying heuristics such as searching for well-known representative substrings, using byte frequency and n-gram tables, and Bayesian inference.
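As an illustration of the signature-based approach, the following Python sketch checks the first bytes of a buffer against a small table of well-known magic numbers and falls back to a crude text heuristic. The function name sniff_media_type, the table SIGNATURES, and the 512-byte inspection window are assumptions made for this example; it is not the algorithm used by any particular browser or library, and real sniffers cover far more formats.

```python
import codecs

# A few widely documented file signatures (magic numbers).
# This table is illustrative and far from complete.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_media_type(data: bytes) -> str:
    """Guess a media type from leading bytes (hypothetical helper)."""
    # Step 1: signature matching on the start of the stream.
    for magic, media_type in SIGNATURES:
        if data.startswith(magic):
            return media_type

    # Step 2: crude heuristic. Treat the data as text if the first
    # 512 bytes (an arbitrary window) contain no NUL byte and form
    # valid UTF-8. The incremental decoder tolerates a multi-byte
    # character cut off at the end of the window.
    head = data[:512]
    if b"\x00" not in head:
        try:
            codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
            return "text/plain"
        except UnicodeDecodeError:
            pass

    # Step 3: give up and report generic binary data.
    return "application/octet-stream"
```

For example, sniff_media_type(b"%PDF-1.7\n") reports application/pdf regardless of what any accompanying metadata claims, which is exactly why sniffing can both repair missing metadata and override correct metadata.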

Charset sniffing
Numerous web browsers use a more limited form of content sniffing to attempt to determine the character encoding of text files for which the MIME type is already known. This technique is known as charset sniffing or codepage sniffing and, for certain encodings, can be exploited to bypass security restrictions. For instance, Internet Explorer 7 may be tricked into running JScript in circumvention of its policy by letting the browser guess that an HTML file is encoded in UTF-7. This bug is made worse by a feature of UTF-7 that permits multiple encodings of the same text and, specifically, alternative representations of ASCII characters. Most encodings do not allow such evasive representations of ASCII characters, so charset sniffing is less dangerous in general: because scripting and markup languages are, by historical accident, ASCII-centric, characters outside the ASCII repertoire are harder to use to circumvent security boundaries, and misinterpretations of character sets tend to produce results no worse than the display of mojibake.
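The UTF-7 problem can be made concrete with a short Python sketch using the standard library's utf-7 codec. The payload below is a commonly cited illustrative string, not one taken from a specific exploit; the variable names are assumptions for this example. It shows that a byte sequence containing no literal angle brackets becomes markup once a consumer guesses UTF-7.

```python
# Bytes that contain only "harmless" ASCII characters: no literal
# "<" or ">" appears anywhere in the payload.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Interpreted as ASCII, the payload is inert text.
as_ascii = payload.decode("ascii")

# Interpreted as UTF-7, "+ADw-" and "+AD4-" decode to "<" and ">",
# so the same bytes turn into a script element.
as_utf7 = payload.decode("utf-7")

# UTF-7 allows more than one byte sequence for the same ASCII
# character: both of these decode to "<".
direct = b"<".decode("utf-7")
shifted = b"+ADw-".decode("utf-7")

print(as_ascii)
print(as_utf7)
print(direct == shifted)
```

A filter that only looks for literal angle brackets in the raw bytes would pass this payload, which is why a wrong encoding guess here has consequences beyond garbled text.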