International Chemical Identifier

The International Chemical Identifier (InChI) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web. It was initially developed by the International Union of Pure and Applied Chemistry (IUPAC) and the National Institute of Standards and Technology (NIST) from 2000 to 2005; the format and algorithms are non-proprietary. Since May 2009, it has been developed by the InChI Trust, a nonprofit charity from the United Kingdom which works to implement and promote the use of InChI.

Generation
To avoid generating different InChIs for tautomeric structures, an input chemical structure is first normalized to reduce it to its so-called core parent structure. This may involve changing bond orders, rearranging formal charges and possibly adding and removing protons. Different input structures may give the same result; for example, acetic acid and acetate would both give the same core parent structure, that of acetic acid.

A core parent structure may be disconnected, consisting of more than one component, in which case each sublayer of the InChI usually consists of one part per component, separated by semicolons (periods for the chemical formula sublayer). One way this can happen is that all metal atoms are disconnected during normalization; so, for example, the InChI for tetraethyllead will have five components, one for lead and four for the ethyl groups.

The first, main, layer of the InChI refers to this core parent structure, giving its chemical formula, non-hydrogen connectivity without bond order (/c sublayer) and hydrogen connectivity (/h sublayer). The /q portion of the charge layer gives its charge, and the /p portion tells how many protons (hydrogen ions) must be added to or removed from it to regenerate the original structure. If present, the stereochemical layer, with sublayers /b, /t, /m and /s, gives stereochemical information, and the isotopic layer /i (which may contain sublayers /h, /b, /t, /m and /s) gives isotopic information. These are the only layers which can occur in a standard InChI.

If the user wants to specify an exact tautomer, a fixed hydrogen layer /f can be appended, which may contain various additional sublayers. This cannot be done in standard InChI, so different tautomers will have the same standard InChI (for example, alanine will give the same standard InChI whether input in a neutral or a zwitterionic form). Finally, a nonstandard reconnected /r layer can be added, which effectively gives a new InChI generated without breaking bonds to metal atoms. This may contain various sublayers, including /f.
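The relationship between an input structure and its core parent structure can be seen in the identifier text itself. The following is a minimal Python sketch, not the InChI normalization algorithm: the helper names main_layer and proton_delta are hypothetical, and the two input strings are the standard InChIs commonly listed for acetic acid and acetate, used here as assumed illustrative inputs.

```python
def main_layer(inchi: str) -> str:
    """Return the formula, /c and /h sublayers of an InChI string."""
    # Drop the "InChI=1S" (or "InChI=1") prefix, then split on the delimiter.
    _, _, body = inchi.partition("/")
    parts = body.split("/")
    kept = [parts[0]]  # the chemical formula carries no prefix letter
    for part in parts[1:]:
        # The main layer ends at the first segment that is not /c or /h.
        if part[:1] not in ("c", "h"):
            break
        kept.append(part)
    return "/".join(kept)


def proton_delta(inchi: str) -> int:
    """Return the /p value (protons added or removed), or 0 if absent."""
    for part in inchi.split("/")[1:]:
        if part.startswith("p"):
            return int(part[1:])
    return 0


# Assumed inputs: InChIs commonly listed for acetic acid and acetate.
acetic_acid = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"
acetate = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1"

same_parent = main_layer(acetic_acid) == main_layer(acetate)
removed = proton_delta(acetate)
```

With these inputs the two main layers compare equal, and the /p sublayer of the acetate string records the single proton that was removed from the core parent structure.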
Format and layers
Every InChI starts with the string InChI= followed by the version number, currently 1. For a standard InChI, this is followed by the letter S; standard InChI is a fully standardized InChI flavor maintaining the same level of attention to structure details and the same conventions for drawing perception. The remaining information is structured as a sequence of layers and sub-layers, with each layer providing one specific type of information. The layers and sub-layers are separated by the delimiter / and start with a characteristic prefix letter (except for the chemical formula sub-layer of the main layer). The six layers with important sublayers are:

• Main layer (always present)
  • Chemical formula (no prefix). This is the only sublayer that must occur in every InChI. Atom numbers used throughout the InChI follow the formula's element order, skipping hydrogen atoms. For example, /C10H16N5O13P3 implies that atoms numbered 1–10 are carbons, 11–15 are nitrogens, 16–28 are oxygens, and 29–31 are phosphorus.
  • Atom connections (/c). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer describes which atoms are connected by bonds to which other ones. Bond orders are not recorded here; the configuration of double bonds is given later in the stereochemical layer (/b).
  • Hydrogen atoms (/h). Describes how many hydrogen atoms are connected to each of the other atoms.
• Charge layer
  • charge sublayer (/q)
  • proton sublayer (/p for protons)
• Stereochemical layer
  • double bonds and cumulenes (/b).
  • tetrahedral stereochemistry of atoms and allenes (/t, /m). First /t describes the relative configuration, which implies a preference for one of the mirror forms. Then /m is used to choose whether to mirror the molecule described by /t, if an absolute configuration is requested.
  • type of stereochemistry information (/s). /s1 for absolute, /s2 for relative (unspecified mix of chiralities), /s3 for racemic (equal mix of both chiralities).
• Isotopic layer (/i), may include sublayers:
  • sublayer /h for isotopic hydrogen
  • sublayers /b, /t, /m, /s for isotopic stereochemistry
• Fixed-H layer (/f) for tautomeric hydrogens; contains some or all of the above types of layers except atom connections; may end with an /o sublayer.
• Reconnected layer (/r); contains the whole InChI of a structure with reconnected metal atoms.

The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find identifiers that match only in certain layers.

Standard InChI adds the following constraints:

• The /f, /o, and /r (sub)layers are never included in standard InChI.
• If stereochemistry is specified, it can only be absolute (/s1). Unknown stereo designations are treated as undefined.
• Organometallic connectivity does not include bonds to the metal.
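The delimiter-prefix structure and the atom-numbering rule can be illustrated with a short Python sketch. The function names split_layers and atom_number_ranges are hypothetical, and the code assumes a single-component formula with no multipliers or periods; it is a text-level illustration, not a substitute for the InChI software.

```python
import re


def split_layers(inchi: str) -> list[tuple[str, str]]:
    """Split an InChI into (prefix, content) pairs in order of appearance."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, *segments = inchi[len("InChI="):].split("/")
    layers = [("version", version)]
    for index, segment in enumerate(segments):
        # Only the chemical formula lacks a lowercase prefix letter.
        if index == 0 and not segment[:1].islower():
            layers.append(("formula", segment))
        else:
            layers.append((segment[:1], segment[1:]))
    return layers


def atom_number_ranges(formula: str) -> dict[str, range]:
    """Map each non-hydrogen element to the atom numbers it occupies."""
    ranges = {}
    next_number = 1
    for element, count_text in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        count = int(count_text) if count_text else 1
        if element == "H":
            continue  # hydrogens are skipped when numbering atoms
        ranges[element] = range(next_number, next_number + count)
        next_number += count
    return ranges


# Methanol InChI and the formula used as an example in this section.
layers = split_layers("InChI=1S/CH4O/c1-2/h2H,1H3")
numbering = atom_number_ranges("C10H16N5O13P3")
```

For the formula C10H16N5O13P3, the resulting ranges reproduce the numbering given above: carbons 1–10, nitrogens 11–15, oxygens 16–28 and phosphorus 29–31. Because each layer keeps its prefix, a layer-restricted search can compare only the entries of interest, for example the formula and /c pairs.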
InChIKey
The condensed, 27-character InChIKey is a hashed version of the full InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds. The InChIKey currently consists of three parts separated by hyphens, of 14, 10 and one character(s), respectively, like xxxxxxxxxxxxxx-yyyyyyyyfv-p.

A few additional lengths are used in RInChI:

• 28 (14 × 2) bits yield a 6-character hash; only the truncated 4-character form is used.
• 56 (14 × 4) bits yield a 12-character hash, the truncated form being 10 characters.
• 78 (65 + 14 - 1) bits yield a 17-character hash, with one bit used twice.

The first 80 bits of the SHA-256 hash of an empty string are e3 b0 c4 42 98 fc 1c 14 9a fb. This results in the following base26 strings for this hash: UHFF, UHFFFAOY, UHFFFADPSC, UHFFFADPSCTJ, UHFFFADPSCTJAU, UHFFFADPSCTJAUYIS.
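The 14-10-1 layout and the empty-string hash bytes quoted above can be checked with Python's standard library. This is a sketch under stated assumptions: the name split_inchikey and the pattern are hypothetical, the sample key is the one widely listed for water and serves only as a format example, and the code does not reproduce the base26 encoding step.

```python
import hashlib
import re

# Three hyphen-separated blocks of 14, 10 and 1 uppercase letters.
INCHIKEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key: str) -> tuple[str, str, str]:
    """Return the three blocks of an InChIKey, or raise ValueError."""
    match = INCHIKEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    return match.groups()


# Assumed sample: the key widely listed for water.
sample_key = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
blocks = split_inchikey(sample_key)

# First 80 bits (10 bytes) of the SHA-256 digest of the empty string.
empty_prefix = hashlib.sha256(b"").digest()[:10].hex(" ")
```

The value of empty_prefix is the byte sequence e3 b0 c4 42 98 fc 1c 14 9a fb, and the three block lengths plus two hyphens add up to the 27 characters of the key.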
AuxInfo
The auxiliary information (AuxInfo) string is produced by InChI software alongside the InChI string. For example, for (±)-borneol (stereo type /s2) the software produces:

AuxInfo=1/0/N:1,2,3,4,5,6,7,8,9,10,11/E:(1,2)/rA:13cCCCCCCCCCCOHH/rB:;;;s4;;s4s6;s6;s1s2s7;n3s5s8s9;P8;P7;s8;/rC:2.0857,-1.1788,0;3.0905,.273,0;2.6864,-1.7772,0;4.5619,-2.283,0;3.6719,-2.2295,0;5.2528,-.9411,0;4.5862,-1.4963,0;4.4381,-.864,0;3.0628,-.7814,0;3.6539,-1.3571,0;3.6343,-.1809,0;5.5343,-1.9585,0;4.8482,.1078,0;

"AuxInfo contains, in particular, atom non-stereo equivalence information, mapping input atom positions to output positions, and 'reversibility' information for re-drawing the structure." The reversibility information can be used to regenerate the source structure (such as a MOLFILE with 2D or 3D coordinates) without needing an InChI. The InChI user guide describes the format in detail. The parts seen here are:

• 1/0 refers to InChI version 1, normalization type 0.
• /N: maps InChI's atom numbering to the input's atom numbering.
• /E: describes the equivalence between atoms.
• /rA: describes reversibility information for atoms.
• /rB: describes reversibility information for bonds.
• /rC: describes reversibility information for coordinates. Here 2D coordinates are used; a more realistic depiction for this molecule would be 3D.

The full complement of tags is: 1/0/N/E/gE/it/iN/I/E/gE/it/iN/CRV/rA/rB/rC.
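The tag structure described above can be separated with plain string operations. The sketch below is an assumption-laden illustration rather than a full AuxInfo reader: parse_auxinfo and atom_order are hypothetical names, the input is a shortened form of the example string above, and the /N handling assumes a single component with comma-separated atom numbers.

```python
def parse_auxinfo(auxinfo: str) -> dict:
    """Split an AuxInfo string into version, normalization type and tags."""
    prefix = "AuxInfo="
    if not auxinfo.startswith(prefix):
        raise ValueError("not an AuxInfo string")
    version, normalization, *fields = auxinfo[len(prefix):].split("/")
    tags = []
    for field in fields:
        # Each tagged field looks like "N:1,2,3"; tags may repeat, so the
        # result is kept as an ordered list rather than a dictionary.
        tag, _, value = field.partition(":")
        tags.append((tag, value))
    return {"version": version, "normalization": normalization, "tags": tags}


def atom_order(parsed: dict) -> list[int]:
    """Return the /N atom numbers (single-component case only)."""
    for tag, value in parsed["tags"]:
        if tag == "N":
            return [int(number) for number in value.split(",")]
    return []


# Shortened form of the example string (the /rB and /rC parts are omitted).
sample = "AuxInfo=1/0/N:1,2,3,4,5,6,7,8,9,10,11/E:(1,2)/rA:13cCCCCCCCCCCOHH"
parsed = parse_auxinfo(sample)
order = atom_order(parsed)
```

Keeping the tags as an ordered list preserves repeats such as the second /E, /gE, /it and /iN group that appears in the full complement of tags.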
Derived formats
RInChI
RInChI (Reaction InChI, International chemical identifier for reactions) is a standard method for using InChI to describe chemical reactions. An RInChI string consists of several sets of InChI strings for the reactants, products, and agents as well as information required to tag them as such. Layers that do not involve InChI parts are separated with / as in InChI. Layers that do are separated with <>. Multiple InChI parts are separated with !.

MInChI
An example of a relatively complex (nested) Mixfile is provided below.

{
  "mixfileVersion": 1,
  "name": "37% wt. Formaldehyde in Water with 10-15% Methanol",
  "contents": [
    {
      "contents": [
        {
          "name": "formaldehyde",
          "quantity": 37,
          "units": "w/w%",
          "inchi": "InChI=1S/CH2O/c1-2/h1H2"
        },
        {
          "name": "water",
          "inchi": "InChI=1S/H2O/h1H2"
        }
      ]
    },
    {
      "name": "methanol",
      "quantity": [10, 15],
      "units": "%",
      "inchi": "InChI=1S/CH4O/c1-2/h2H,1H3"
    }
  ]
}

The corresponding MInChI is:

MInChI=0.00.1S/CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2/n{{1&3}&2}/g{{37wf-2&}&10:15pp0}

• The first part MInChI=0.00.1S is the version.
• The second part /CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2 encodes the list of molecules.
• The third part /n{{1&3}&2} encodes the order and nesting relation.
• The final part /g{{37wf-2&}&10:15pp0} encodes the proportions.

It is also possible to create Mixfiles with missing chemical formulae and generate MInChI from them; the third part of MInChI is intended to adapt to such situations. For more examples, readers can visit the MInChI Demo page. The "Create MInChI" button generates MInChI. Right-clicking on a node and choosing "copy branch" produces its Mixfile representation in the clipboard.
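The four parts of the MInChI string can be separated mechanically. The following Python sketch uses a hypothetical helper, split_minchi, and assumes the layout of the example above: a version, then components joined by &, then one /n{...} part and one /g{...} part. It does not interpret the nesting or the concentration codes.

```python
def split_minchi(minchi: str) -> dict:
    """Split a MInChI string into version, components, /n and /g parts."""
    # Locate the indexing (/n) and concentration (/g) parts by their braces.
    n_start = minchi.index("/n{")
    g_start = minchi.index("/g{", n_start)
    version, _, component_text = minchi[:n_start].partition("/")
    return {
        "version": version,
        "components": component_text.split("&"),
        "indexing": minchi[n_start + 2:g_start],
        "concentration": minchi[g_start + 2:],
    }


minchi = (
    "MInChI=0.00.1S/CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2"
    "/n{{1&3}&2}/g{{37wf-2&}&10:15pp0}"
)
parts = split_minchi(minchi)
```

In the result, components 1, 2 and 3 are the InChI bodies of formaldehyde, methanol and water, so the indexing part {{1&3}&2} groups formaldehyde with water and places methanol beside that group, mirroring the nesting of the Mixfile.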
History
Name
The format was originally called IChI (IUPAC Chemical Identifier), then renamed in July 2004 to INChI (IUPAC-NIST Chemical Identifier), and renamed again in November 2004 to InChI (IUPAC International Chemical Identifier), a trademark of IUPAC.

Continuing development
Scientific direction of the InChI standard is carried out by the IUPAC Division VIII Subcommittee, and funding of subgroups investigating and defining the expansion of the standard is provided by both IUPAC and the InChI Trust. The InChI Trust funds the development, testing and documentation of the InChI. Extensions are currently being defined to handle polymers and mixtures, Markush structures, isotopologues and isotopomers, reactions, organometallics, and nanomaterials; once accepted by the Division VIII Subcommittee, they will be added to the algorithm. The continuing development of the standard has been supported since 2010 by the not-for-profit InChI Trust, of which IUPAC is a member. Version 1.06 was released in December 2020.

Version history
The InChI Trust has developed software to generate the InChI, InChIKey and other identifiers.
Adoption
The InChI has been adopted by many databases, large and small, including ChemSpider, ChEMBL, Golm Metabolome Database, and PubChem. However, adoption is not straightforward, and many databases show discrepancies between the chemical structures and the InChIs they contain, which is a problem for linking databases.