Given a template and an alignment, the information contained therein must be used to generate a three-dimensional structural model of the target, represented as a set of Cartesian coordinates for each atom in the protein. Three major classes of model generation methods have been proposed.
===Fragment assembly===
The original method of homology modeling relied on the assembly of a complete model from conserved structural fragments identified in closely related solved structures. For example, a modeling study of serine proteases in mammals identified a sharp distinction between "core" structural regions conserved in all experimental structures in the class, and variable regions typically located in the loops where the majority of the sequence differences were localized. Thus, unsolved proteins could be modeled by first constructing the conserved core and then substituting variable regions from other proteins in the set of solved structures. Current implementations of this method differ mainly in the way they deal with regions that are not conserved or that lack a template. The variable regions are often constructed with the help of a protein fragment library.
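The core-then-variable-region idea can be illustrated with a minimal Python sketch. It is not taken from any particular modeling package: the function names, the use of alpha carbon coordinates only, the 1.0 Å spread cutoff, and the rule of copying a variable region from the first template that covers it are all assumptions made for illustration. It also assumes the templates are already superposed and aligned position by position, with `None` marking a gap.

```python
from itertools import groupby
from math import dist


def classify_positions(templates, cutoff=1.0):
    """Mark each aligned position as core (True) or variable (False).

    A position counts as core only if every template has coordinates there
    and the largest pairwise distance between them is within `cutoff`
    (an illustrative threshold, not a published value).
    """
    core = []
    for column in zip(*templates):
        if any(point is None for point in column):
            core.append(False)
            continue
        spread = max(dist(a, b) for a in column for b in column)
        core.append(spread <= cutoff)
    return core


def segments(core):
    """Group consecutive positions into (is_core, start, end) runs, end exclusive."""
    runs, start = [], 0
    for is_core, run in groupby(core):
        length = len(list(run))
        runs.append((is_core, start, start + length))
        start += length
    return runs


def assemble(templates, core):
    """Build a model: average the core, borrow variable regions from a donor."""
    model = [None] * len(core)
    for is_core, start, end in segments(core):
        if is_core:
            for i in range(start, end):
                column = [t[i] for t in templates]
                model[i] = tuple(sum(axis) / len(column) for axis in zip(*column))
        else:
            # Assumption: take the first template with no gaps in this region.
            # A real method would rank donors, e.g. by similarity to the target.
            donor = next(
                (t for t in templates
                 if all(t[i] is not None for i in range(start, end))),
                None,
            )
            if donor is not None:
                model[start:end] = donor[start:end]
    return model
```

Positions for which no template offers a complete variable region are left as `None`; these correspond to the regions that lack a template and would need a fragment library or a dedicated loop-modeling step.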
===Segment matching===
The segment-matching method divides the target into a series of short segments, each of which is matched to its own template fitted from the Protein Data Bank. Thus, sequence alignment is done over segments rather than over the entire protein. Selection of the template for each segment is based on sequence similarity, comparisons of alpha carbon coordinates, and predicted steric conflicts arising from the van der Waals radii of the divergent atoms between target and template.
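As a rough illustration of how the three selection criteria might be combined, the following Python sketch scores candidate segments by sequence identity, alpha carbon RMSD against a set of guide positions, and a crude clash count. The scoring function, its weights, the dictionary layout of a candidate, and the single 3.0 Å distance cutoff used in place of per-atom van der Waals radii are all assumptions for this example, not the published procedure.

```python
from math import dist, sqrt


def sequence_identity(seq_a, seq_b):
    """Fraction of identical residues between two equal-length segments."""
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched alpha carbon traces."""
    squared = sum(dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return sqrt(squared / len(coords_a))


def count_clashes(segment, environment, min_separation=3.0):
    """Count atom pairs closer than `min_separation`.

    A single cutoff is a simplification; a fuller treatment would use
    per-atom van der Waals radii.
    """
    return sum(
        1 for p in segment for q in environment if dist(p, q) < min_separation
    )


def pick_segment(target_seq, guide_ca, environment, candidates,
                 w_rmsd=0.1, w_clash=0.5):
    """Return the candidate with the best combined score.

    `guide_ca` is a hypothetical input: approximate alpha carbon positions
    for the target segment, e.g. taken from an aligned template.
    Each candidate is a dict with "seq" and "ca" keys (illustrative layout).
    """
    def score(candidate):
        return (
            sequence_identity(target_seq, candidate["seq"])
            - w_rmsd * ca_rmsd(guide_ca, candidate["ca"])
            - w_clash * count_clashes(candidate["ca"], environment)
        )

    return max(candidates, key=score)
```

With these illustrative weights, a candidate with perfect sequence identity can still lose to a slightly less similar one if it collides with atoms already placed in the model, which is the trade-off the selection criteria are meant to capture.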
===Satisfaction of spatial restraints===
The most common current homology modeling method takes its inspiration from calculations required to construct a three-dimensional structure from data generated by NMR spectroscopy. One or more target-template alignments are used to construct a set of geometrical criteria that are then converted to probability density functions for each restraint. Restraints applied to the main protein internal coordinates, namely protein backbone distances and dihedral angles, serve as the basis for a global optimization procedure that originally used conjugate gradient energy minimization to iteratively refine the positions of all heavy atoms in the protein. This method has been dramatically expanded to apply specifically to loop modeling, which can be extremely difficult due to the high flexibility of loops in proteins in aqueous solution. A more recent expansion applies the spatial-restraint model to electron density maps derived from cryoelectron microscopy studies, which provide low-resolution information that is not usually itself sufficient to generate atomic-resolution structural models. To address the problem of inaccuracies in initial target-template sequence alignment, an iterative procedure has also been introduced to refine the alignment on the basis of the initial structural fit. The most commonly used software in spatial restraint-based modeling is MODELLER, and a database called ModBase has been established for reliable models generated with it.

==Loop modeling==