Ligand-based methods are applied when the three-dimensional structure of the target receptor is unknown. Instead of modeling ligand–receptor interactions directly, these approaches rely on the structural and physicochemical properties of known active ligands to predict how new compounds may bind.

Pharmacophore modeling is commonly applied in ligand-based drug design when the three-dimensional structure of the target protein is unknown or when multiple active ligands are available. The general workflow involves several steps:
• Selection of active compounds: A representative and structurally diverse set of biologically active ligands is selected. Structural diversity helps ensure that the resulting model captures essential interaction features rather than compound-specific characteristics.
• Conformer generation: Because ligands are flexible, multiple low-energy conformations (conformers) are generated for each molecule. This step aims to approximate the bioactive conformation, which is usually unknown in ligand-based approaches.
• Molecular alignment and superposition: The generated conformers are superimposed to identify common spatial arrangements of key chemical features. By aligning shared pharmacophoric elements across active ligands, a pharmacophore hypothesis is constructed.

Unlike approaches that rely on a single reference structure, pharmacophore modeling integrates information from multiple active compounds. This generally improves robustness and predictive performance, particularly when dealing with chemically diverse ligands. However, because multiple alignments and feature combinations are possible, pharmacophore modeling does not usually yield a single unique solution. Generated hypotheses must therefore be validated using external test sets or experimental data.
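As a rough illustration of how a pharmacophore hypothesis might be applied once it has been built, the sketch below represents a hypothesis as a set of target distances between labeled features and checks whether any conformer of a molecule reproduces them within a tolerance. The feature labels, coordinates, tolerance and function names are hypothetical and chosen only for illustration; actual pharmacophore software uses richer feature definitions and alignment procedures.

```python
import math
from itertools import combinations

def pairwise_distances(features):
    """Return {(label_a, label_b): distance} for every pair of features.

    `features` maps a feature label (e.g. "donor") to an (x, y, z) point.
    Labels are sorted so that pair keys are comparable between molecules.
    """
    return {
        (a, b): math.dist(features[a], features[b])
        for a, b in combinations(sorted(features), 2)
    }

def matches_hypothesis(conformer_features, hypothesis, tolerance=1.0):
    """True if the conformer reproduces every inter-feature distance
    of the hypothesis to within `tolerance` (same length unit as the input)."""
    observed = pairwise_distances(conformer_features)
    return all(
        pair in observed and abs(observed[pair] - target) <= tolerance
        for pair, target in hypothesis.items()
    )

def molecule_matches(conformers, hypothesis, tolerance=1.0):
    """A flexible molecule matches if at least one conformer matches."""
    return any(matches_hypothesis(c, hypothesis, tolerance) for c in conformers)

# Hypothetical hypothesis derived from one aligned reference arrangement.
reference = {"acceptor": (0, 0, 0), "donor": (3, 0, 0), "hydrophobe": (0, 4, 0)}
hypothesis = pairwise_distances(reference)

# Two hypothetical conformers of a candidate molecule.
extended = {"acceptor": (0, 0, 0), "donor": (6, 0, 0), "hydrophobe": (0, 4, 0)}
folded = {"acceptor": (1, 1, 0), "donor": (4.2, 1, 0), "hydrophobe": (1, 5.1, 0)}

candidate_is_hit = molecule_matches([extended, folded], hypothesis)
```

Checking several conformers per molecule reflects the point made above: the bioactive conformation is unknown, so a candidate is retained if any of its low-energy conformers fits the hypothesis.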
=== Shape-based virtual screening ===
Shape-based molecular similarity approaches are established and widely used virtual screening techniques. The highly optimized screening platform ROCS (Rapid Overlay of Chemical Structures) is considered the de facto industry standard for shape-based, ligand-centric virtual screening. It uses atom-centered Gaussian functions to define the molecular volumes of small organic molecules. Because the choice of query conformation is comparatively unimportant, shape-based screening is well suited to ligand-based modeling: the availability of a bioactive conformation for the query is not the limiting factor, and screening performance depends mainly on the selection of the query compound(s).
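The idea of describing molecular volume with Gaussians can be sketched as follows. This is a minimal first-order approximation (pairwise atom overlaps only) that scores a fixed relative orientation with a shape Tanimoto coefficient; it is not the ROCS implementation, which also optimizes the overlay. The amplitude `p`, the way Gaussian widths are derived from atomic radii, and the toy coordinates are assumptions made for this example.

```python
import math

P = 2.7  # Gaussian amplitude; an assumed value for this sketch

def alpha_from_radius(radius, p=P):
    """Gaussian width chosen so that the integral of p*exp(-alpha*r^2)
    equals the hard-sphere volume (4/3)*pi*radius^3."""
    return math.pi * (3.0 * p / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)

def gaussian_overlap(center_a, center_b, alpha_a, alpha_b, p=P):
    """Closed-form overlap integral of two spherical Gaussians."""
    d2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    s = alpha_a + alpha_b
    return p * p * math.exp(-alpha_a * alpha_b * d2 / s) * (math.pi / s) ** 1.5

def overlap_volume(mol_a, mol_b):
    """Sum of pairwise atom-atom overlaps (first-order approximation).

    Each molecule is a list of ((x, y, z), radius) tuples.
    """
    return sum(
        gaussian_overlap(ca, cb, alpha_from_radius(ra), alpha_from_radius(rb))
        for ca, ra in mol_a
        for cb, rb in mol_b
    )

def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto: O_AB / (O_AA + O_BB - O_AB) for the given poses."""
    o_ab = overlap_volume(mol_a, mol_b)
    o_aa = overlap_volume(mol_a, mol_a)
    o_bb = overlap_volume(mol_b, mol_b)
    return o_ab / (o_aa + o_bb - o_ab)

# Toy three-atom "molecule" and a copy translated by 1.0 along y.
query = [((0.0, 0.0, 0.0), 1.7), ((1.5, 0.0, 0.0), 1.7), ((3.0, 0.0, 0.0), 1.7)]
shifted = [((x, y + 1.0, z), r) for (x, y, z), r in query]

self_score = shape_tanimoto(query, query)
shifted_score = shape_tanimoto(query, shifted)
```

A molecule compared with itself scores 1, and the score drops as the overlay worsens; a screening tool would search over rotations and translations for the pose that maximizes this score before ranking database compounds.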
=== Field-based virtual screening ===
Field-based virtual screening methods extend shape-based similarity approaches by considering not only molecular shape but also the physicochemical interaction fields that govern ligand–receptor recognition. Rather than focusing solely on structural overlap, these methods compare molecular interaction potentials, such as:
• Electrostatic fields
• Hydrophobic fields
• Steric fields
• Hydrogen-bonding potentials

By evaluating these properties in three-dimensional space, field-based approaches aim to capture the underlying interaction patterns responsible for biological activity, while remaining largely independent of the specific chemical scaffold of the query molecule. Compared to purely shape-based methods, field-based screening can provide a more nuanced description of molecular similarity, particularly when structurally distinct compounds share similar interaction profiles.
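To make the comparison of interaction fields concrete, the following sketch samples a simple Coulomb-type electrostatic potential on a shared grid and compares two molecules with a Carbo-type similarity index (a normalized dot product of the sampled field values). The point charges, grid, distance clamp and function names are illustrative assumptions; real field-based tools use more elaborate field definitions and also optimize the alignment.

```python
import math

def electrostatic_potential(charges, point, min_distance=0.5):
    """Coulomb-like potential (arbitrary units) at `point`.

    `charges` is a list of ((x, y, z), partial_charge) tuples. Distances are
    clamped at `min_distance` to avoid the singularity at an atom position.
    """
    return sum(
        q / max(math.dist(position, point), min_distance)
        for position, q in charges
    )

def sample_field(charges, grid):
    """Evaluate the potential at every grid point."""
    return [electrostatic_potential(charges, point) for point in grid]

def carbo_similarity(field_a, field_b):
    """Normalized dot product of two sampled fields, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(field_a, field_b))
    norm = math.sqrt(sum(a * a for a in field_a) * sum(b * b for b in field_b))
    return dot / norm if norm else 0.0

# Shared sampling grid (both molecules are assumed to be pre-aligned).
grid = [(x, y, z) for x in (-2, 0, 2, 4) for y in (-2, 0, 2) for z in (-2, 0, 2)]

# Hypothetical molecules: different atom layouts, similar or reversed polarity.
mol_a = [((0.0, 0.0, 0.0), -0.5), ((2.0, 0.0, 0.0), 0.5)]
mol_b = [((0.2, 0.0, 0.0), -0.4), ((1.0, 0.0, 0.0), 0.0), ((2.1, 0.0, 0.0), 0.4)]
mol_c = [((0.0, 0.0, 0.0), 0.5), ((2.0, 0.0, 0.0), -0.5)]

sim_ab = carbo_similarity(sample_field(mol_a, grid), sample_field(mol_b, grid))
sim_ac = carbo_similarity(sample_field(mol_a, grid), sample_field(mol_c, grid))
```

Here `mol_a` and `mol_b` have different numbers of atoms but a similar charge distribution, so their fields correlate positively, whereas `mol_c` has the same geometry as `mol_a` with reversed polarity and scores -1. This mirrors the point above that field similarity is largely independent of the scaffold.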
=== Quantitative structure–activity relationship ===
Quantitative structure–activity relationship (QSAR) models are predictive models that relate molecular descriptors to biological activity using a dataset of known active and inactive compounds. In contrast to qualitative structure–activity relationship (SAR) analysis, which identifies trends within structural classes, QSAR provides a quantitative mathematical relationship between molecular properties and measured biological responses.
==== Traditional QSAR methods ====
Traditional QSAR approaches typically rely on predefined molecular descriptors, such as physicochemical properties, topological indices, or electronic parameters, and apply statistical modeling techniques including:
• Ordinary least squares (OLS)
• Multiple linear regression (MLR)
• Partial least squares (PLS)

These methods assume a linear relationship between descriptors and biological activity. QSAR models are widely used to prioritize compounds in lead discovery and optimization.
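The linear assumption can be illustrated with a one-descriptor ordinary least squares fit. The sketch below is a minimal example under stated assumptions: the descriptor (a logP-like value), the activity values and the compound names are invented toy numbers, not measured data, and a practical QSAR model would use several descriptors and a proper validation scheme.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for activity = slope * descriptor + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented training data: one descriptor value and one activity per compound.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.2, 7.8]

slope, intercept = fit_simple_linear(descriptor, activity)

def predict(x):
    """Predicted activity for a new descriptor value."""
    return slope * x + intercept

# Rank hypothetical untested compounds by predicted activity, best first.
candidates = {"cmpd_A": 1.5, "cmpd_B": 3.5, "cmpd_C": 2.5}
ranked = sorted(candidates, key=lambda name: predict(candidates[name]), reverse=True)
```

Sorting candidates by predicted activity is the prioritization step mentioned above; with more descriptors the same idea generalizes to multiple linear regression or PLS.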
==== Machine learning in QSAR ====
Machine learning (ML) approaches can be viewed as an extension of QSAR methodology. Like classical QSAR models, ML-based methods use molecular descriptors or structural representations (such as molecular fingerprints) as input. Commonly used algorithms include:
• Decision trees
• Support vector machines (SVM)
• Random forests
• k-Nearest neighbors (k-NN)
• Artificial neural networks

These models typically output either:
• A predicted activity value (regression), or
• A probability that a compound is active (classification), which can then be used for ranking in virtual screening.

== Structure-based methods ==