Overview Two limiting cases of X-ray crystallography—"small-molecule" (which includes continuous inorganic solids) and "macromolecular" crystallography—are often used. Small-molecule crystallography typically involves crystals with fewer than 100 atoms in their
asymmetric unit; such crystal structures are usually so well resolved that the atoms can be discerned as isolated "blobs" of electron density. In contrast, macromolecular crystallography often involves tens of thousands of atoms in the unit cell. Such crystal structures are generally less well-resolved; the atoms and chemical bonds appear as tubes of electron density, rather than as isolated atoms. In general, small molecules are also easier to crystallize than macromolecules; however, X-ray crystallography has proven possible even for viruses and proteins with hundreds of thousands of atoms, through improved crystallographic imaging and technology. The technique of single-crystal X-ray crystallography has three basic steps. The first—and often most difficult—step is to obtain an adequate crystal of the material under study. The crystal should be sufficiently large (typically larger than 0.1 mm in all dimensions), pure in composition and regular in structure, with no significant internal
imperfections such as cracks or
twinning. In the second step, the crystal is placed in an intense beam of X-rays, usually of a single wavelength (
monochromatic X-rays), producing the regular pattern of reflections. The angles and intensities of diffracted X-rays are measured, with each compound having a unique diffraction pattern. As the crystal is gradually rotated, previous reflections disappear and new ones appear; the intensity of every spot is recorded at every orientation of the crystal. Multiple data sets may have to be collected, with each set covering slightly more than half a full rotation of the crystal and typically containing tens of thousands of reflections. In the third step, these data are combined computationally with complementary chemical information to produce and refine a model of the arrangement of atoms within the crystal. The final, refined model of the atomic arrangement—now called a
crystal structure—is usually stored in a public database.
Crystallization . Crystals used in X-ray crystallography may be smaller than a millimeter across. Although crystallography can be used to characterize the disorder in an impure or irregular crystal, crystallography generally requires a pure crystal of high regularity to solve the structure of a complicated arrangement of atoms. Pure, regular crystals can sometimes be obtained from natural or synthetic materials, such as samples of metals, minerals or other macroscopic materials. The regularity of such crystals can sometimes be improved with macromolecular crystal
annealing and other methods. However, in many cases, obtaining a diffraction-quality crystal is the chief barrier to solving its atomic-resolution structure. Small-molecule and macromolecular crystallography differ in the range of possible techniques used to produce diffraction-quality crystals. Small molecules generally have few degrees of conformational freedom, and may be crystallized by a wide range of methods, such as
chemical vapor deposition and
recrystallization. By contrast, macromolecules generally have many degrees of freedom and their crystallization must be carried out so as to maintain a stable structure. For example, proteins and larger
RNA molecules cannot be crystallized if their tertiary structure has been
unfolded; therefore, the range of crystallization conditions is restricted to solution conditions in which such molecules remain folded. Protein crystals are almost always grown in solution. The most common approach is to lower the solubility of its component molecules very gradually; if this is done too quickly, the molecules will precipitate from solution, forming a useless dust or amorphous gel on the bottom of the container. Crystal growth in solution is characterized by two steps:
nucleation of a microscopic crystallite (possibly having only 100 molecules), followed by
growth of that crystallite, ideally to a diffraction-quality crystal. The solution conditions that favor the first step (nucleation) are not always the same conditions that favor the second step (subsequent growth). The solution conditions should
disfavor the first step (nucleation) but
favor the second (growth), so that only one large crystal forms per droplet. If nucleation is favored too much, a shower of small crystallites will form in the droplet, rather than one large crystal; if favored too little, no crystal will form whatsoever. Other approaches involve crystallizing proteins under oil, where aqueous protein solutions are dispensed under liquid oil, and water evaporates through the layer of oil. Different oils have different evaporation permeabilities, therefore yielding changes in concentration rates from different percipient/protein mixture. It is difficult to predict good conditions for nucleation or growth of well-ordered crystals. In practice, favorable conditions are identified by
screening; a very large batch of the molecules is prepared, and a wide variety of crystallization solutions are tested. Hundreds, even thousands, of solution conditions are generally tried before finding the successful one. The various conditions can use one or more physical mechanisms to lower the solubility of the molecule; for example, some may change the pH, some contain salts of the
Hofmeister series or chemicals that lower the dielectric constant of the solution, and still others contain large polymers such as
polyethylene glycol that drive the molecule out of solution by entropic effects. It is also common to try several temperatures for encouraging crystallization, or to gradually lower the temperature so that the solution becomes supersaturated. These methods require large amounts of the target molecule, as they use high concentration of the molecule(s) to be crystallized. Due to the difficulty in obtaining such large quantities (
milligrams) of crystallization-grade protein, robots have been developed that are capable of accurately dispensing crystallization trial drops that are in the order of 100
nanoliters in volume. This means that 10-fold less protein is used per experiment when compared to crystallization trials set up by hand (in the order of 1
microliter). Several factors are known to inhibit crystallization. The growing crystals are generally held at a constant temperature and protected from shocks or vibrations that might disturb their crystallization. Impurities in the molecules or in the crystallization solutions are often inimical to crystallization. Conformational flexibility in the molecule also tends to make crystallization less likely, due to entropy. Molecules that tend to self-assemble into regular helices are often unwilling to assemble into crystals. Crystals can be marred by
twinning, which can occur when a unit cell can pack equally favorably in multiple orientations; although recent advances in computational methods may allow solving the structure of some twinned crystals. Having failed to crystallize a target molecule, a crystallographer may try again with a slightly modified version of the molecule; even small changes in molecular properties can lead to large differences in crystallization behavior. For proteins, impurities have been shown to sometimes enhance and sometimes disrupt crystal growth. Oscillation with audible sound sometimes works. These issues may be related to the
phase separation characteristic of protein crystalation.
Data collection Mounting the crystal The crystal is mounted for measurements so that it may be held in the X-ray beam and rotated. There are several methods of mounting. In the past, crystals were loaded into glass capillaries with the crystallization solution (the
mother liquor). Crystals of small molecules are typically attached with oil or glue to a glass fiber or a loop, which is made of nylon or plastic and attached to a solid rod. Protein crystals are scooped up by a loop, then flash-frozen with
liquid nitrogen. This freezing reduces the radiation damage of the X-rays, as well as thermal motion (the Debye-Waller effect). However, untreated protein crystals often crack if flash-frozen; therefore, they are generally pre-soaked in a cryoprotectant solution before freezing. This pre-soak may itself cause the crystal to crack, ruining it for crystallography. Generally, successful cryo-conditions are identified by trial and error. The capillary or loop is mounted on a
goniometer, which allows it to be positioned accurately within the X-ray beam and rotated. Since both the crystal and the beam are often very small, the crystal must be centered within the beam to within ~25 micrometers accuracy, which is aided by a camera focused on the crystal. The most common type of goniometer is the "kappa goniometer", which offers three angles of rotation: the ω angle, which rotates about an axis perpendicular to the beam; the κ angle, about an axis at ~50° to the ω axis; and, finally, the φ angle about the loop/capillary axis. When the κ angle is zero, the ω and φ axes are aligned. The κ rotation allows for convenient mounting of the crystal, since the arm in which the crystal is mounted may be swung out towards the crystallographer. The oscillations carried out during data collection (mentioned below) involve the ω axis only. An older type of goniometer is the four-circle goniometer, and its relatives such as the six-circle goniometer.
Recording the reflections The relative intensities of the reflections provides information to determine the arrangement of molecules within the crystal in atomic detail. The intensities of these reflections may be recorded with
photographic film, an area detector (such as a
pixel detector) or with a
charge-coupled device (CCD) image sensor. The peaks at small angles correspond to low-resolution data, whereas those at high angles represent high-resolution data; thus, an upper limit on the eventual resolution of the structure can be determined from the first few images. Some measures of diffraction quality can be determined at this point, such as the
mosaicity of the crystal and its overall disorder, as observed in the peak widths. Some pathologies of the crystal that would render it unfit for solving the structure can also be diagnosed quickly at this point. One set of spots is insufficient to reconstruct the whole crystal; it represents only a small slice of the full three dimensional set. To collect all the necessary information, the crystal must be rotated step-by-step through 180°, with an image recorded at every step; actually, slightly more than 180° is required to cover
reciprocal space, due to the curvature of the
Ewald sphere. However, if the crystal has a higher symmetry, a smaller angular range such as 90° or 45° may be recorded. The rotation axis should be changed at least once, to avoid developing a "blind spot" in reciprocal space close to the rotation axis. It is customary to rock the crystal slightly (by 0.5–2°) to catch a broader region of reciprocal space. Multiple data sets may be necessary for certain
phasing methods. For example,
multi-wavelength anomalous dispersion phasing requires that the scattering be recorded at least three (and usually four, for redundancy) wavelengths of the incoming X-ray radiation. A single crystal may degrade too much during the collection of one data set, owing to radiation damage; in such cases, data sets on multiple crystals must be taken.
Crystal symmetry, unit cell, and image scaling The recorded series of two-dimensional diffraction patterns, each corresponding to a different crystal orientation, is converted into a three-dimensional set. Data processing begins with
indexing the reflections. This means identifying the dimensions of the unit cell and which image peak corresponds to which position in reciprocal space. A byproduct of indexing is to determine the symmetry of the crystal, i.e., its
space group. Some space groups can be eliminated from the beginning. For example, reflection symmetries cannot be observed in chiral molecules; thus, only 65 space groups of 230 possible are allowed for protein molecules which are almost always chiral. Indexing is generally accomplished using an
autoindexing routine. Having assigned symmetry, the data is then
integrated. This converts the hundreds of images containing the thousands of reflections into a single file, consisting of (at the very least) records of the
Miller index of each reflection, and an intensity for each reflection (at this state the file often also includes error estimates and measures of partiality (what part of a given reflection was recorded on that image)). A full data set may consist of hundreds of separate images taken at different orientations of the crystal. These have to be merged and scaled using peaks that appear in two or more images (
merging) and scaling so there is a consistent intensity scale. Optimizing the intensity scale is critical because the relative intensity of the peaks is the key information from which the structure is determined. The repetitive technique of crystallographic data collection and the often high symmetry of crystalline materials cause the diffractometer to record many symmetry-equivalent reflections multiple times. This allows calculating the symmetry-related
R-factor, a reliability index based upon how similar are the measured intensities of symmetry-equivalent reflections, thus assessing the quality of the data.
Initial phasing The intensity of each diffraction 'spot' is proportional to the modulus squared of the
structure factor. The structure factor is a
complex number containing information relating to both the
amplitude and
phase of a
wave. In order to obtain an interpretable
electron density map, both amplitude and phase must be known (an electron density map allows a crystallographer to build a starting model of the molecule). The phase cannot be directly recorded during a diffraction experiment: this is known as the
phase problem. Initial phase estimates can be obtained in a variety of ways: • '''
Ab initio phasing
or direct methods''' – This is usually the method of choice for small molecules (R = \frac{\sum_{\text{all reflections}} \left|F_\text{obs} - F_\text{calc} \right|}{\sum_{\text{all reflections}} \left|F_\text{obs} \right|}, where
F is the
structure factor. A similar quality criterion is
Rfree, which is calculated from a subset (~10%) of reflections that were not included in the structure refinement. Both
R factors depend on the resolution of the data. As a rule of thumb,
Rfree should be approximately the resolution in angstroms divided by 10; thus, a data-set with 2 Å resolution should yield a final
Rfree ~ 0.2. Chemical bonding features such as stereochemistry, hydrogen bonding and distribution of bond lengths and angles are complementary measures of the model quality. In iterative model building, it is common to encounter phase bias or model bias: because phase estimations come from the model, each round of calculated map tends to show density wherever the model has density, regardless of whether there truly is a density. This problem can be mitigated by maximum-likelihood weighting and checking using
omit maps. It may not be possible to observe every atom in the asymmetric unit. In many cases,
crystallographic disorder smears the electron density map. Weakly scattering atoms such as hydrogen are routinely invisible. It is also possible for a single atom to appear multiple times in an electron density map, e.g., if a protein sidechain has multiple (<4) allowed conformations. In still other cases, the crystallographer may detect that the covalent structure deduced for the molecule was incorrect, or changed. For example, proteins may be cleaved or undergo post-translational modifications that were not detected prior to the crystallization.
Disorder A common challenge in refinement of crystal structures results from crystallographic disorder. Disorder can take many forms but in general involves the coexistence of two or more species or conformations. Failure to recognize disorder results in flawed interpretation. Pitfalls from improper modeling of disorder are illustrated by the discounted hypothesis of
bond stretch isomerism. Disorder is modelled with respect to the relative population of the components, often only two, and their identity. In structures of large molecules and ions, solvent and counterions are often disordered.
Applied computational data analysis The use of computational methods for the powder X-ray diffraction data analysis is now generalized. It typically compares the experimental data to the simulated diffractogram of a model structure, taking into account the instrumental parameters, and refines the structural or microstructural parameters of the model using
least squares based minimization algorithm. Most available tools allowing phase identification and structural refinement are based on the
Rietveld method, some of them being open and free software such as FullProf Suite, Jana2006, MAUD, Rietan, GSAS, etc. while others are available under commercial licenses such as Diffrac.Suite TOPAS, Match!, etc. Most of these tools also allow
Le Bail refinement (also referred to as profile matching), that is, refinement of the cell parameters based on the Bragg peaks positions and peak profiles, without taking into account the crystallographic structure by itself. More recent tools allow the refinement of both structural and microstructural data, such as the FAULTS program included in the FullProf Suite, which allows the refinement of structures with planar defects (e.g. stacking faults, twinnings, intergrowths).
Deposition of the structure Once the model of a molecule's structure has been finalized, it is often deposited in a
crystallographic database such as the
Cambridge Structural Database (for small molecules), the
Inorganic Crystal Structure Database (ICSD) (for inorganic compounds) or the
Protein Data Bank (for protein and sometimes nucleic acids). Many structures obtained in private commercial ventures to crystallize medicinally relevant proteins are not deposited in public crystallographic databases. == Contribution of women to X-ray crystallography ==