Polygenic score

In genetics, a polygenic score (PGS) is a number that summarizes the estimated effect of many genetic variants on an individual's phenotype. The PGS is also called the polygenic index (PGI) or genome-wide score; in the context of disease risk, it is called a polygenic risk score or genetic risk score. The score reflects an individual's estimated genetic predisposition for a given trait and can be used as a predictor for that trait. It gives an estimate of how likely an individual is to have a given trait based only on genetics, without taking environmental factors into account; and it is typically calculated as a weighted sum of trait-associated alleles.

Background

DNA in living organisms is the molecular genetic code for life. Although polygenic risk scores from studies in humans have gained the most attention, the basic idea was first introduced for selective plant and animal breeding. Similar to modern approaches to constructing a polygenic risk score, the breeding value of an individual animal or plant was calculated as the combination of several single-nucleotide polymorphisms (SNPs), each weighted by its individual effect on a trait.

Human DNA contains about 3 billion bases. The human genome can be broadly separated into coding and non-coding sequences, where the coding genome encodes instructions for genes, including some of the sequence that codes for proteins. Genome-wide association studies (GWASes) enable mapping of phenotypes to variations in nucleotide bases in human populations. Improvements in methodology and studies with large cohorts have enabled mapping many traits, some of which are diseases, to the human genome. Learning which variations influence which specific traits, and how strongly they do so, is the key target for constructing polygenic scores in humans.

The methods were first considered for humans after the year 2000, notably in a 2007 proposal that such scores could be used in human genetics to identify individuals at high risk for disease. The concept was successfully applied in 2009 by researchers who organized a genome-wide association study of schizophrenia with the objective of constructing scores of risk propensity. That study was the first to use the term polygenic score for a prediction drawn from a linear combination of SNP genotypes, which was able to explain 3% of the variance in schizophrenia.

Calculation with genome-wide association study

A PRS is constructed from the estimated effect sizes derived from a genome-wide association study (GWAS). In a GWAS, single-nucleotide polymorphisms (SNPs) are tested for an association between cases and controls (see graphic above). The results from a GWAS estimate the strength of the association at each SNP (i.e., the effect size at the SNP) as well as a p-value for statistical significance. A typical score is then calculated by adding up the risk-modifying alleles across a large number of SNPs, with the number of such alleles at each SNP multiplied by the weight for that SNP.

In mathematical form, the estimated polygenic score \hat{S} is obtained as the sum across m SNPs of the number of risk-increasing alleles X_{j} carried at SNP j, weighted by the estimated effect size \hat{\beta}_{j}:

\hat{S} = \sum_{j=1}^{m} X_{j} \hat{\beta}_{j}

This idea can be generalized to the study of any trait and is an instance of the more general technique of regression analysis.

Key considerations

Methods for generating polygenic scores in humans are an active area of research. Penalized regression can be used to construct polygenic scores. Using prior information, penalized regression assigns probabilities based on: 1) how many genetic variants are expected to affect a trait and 2) the distribution of their effect sizes. These methods, in effect, "penalize" the large coefficients in a regression model and shrink them conservatively. One popular tool for this approach is "PRS-CS". Another approach is to use Bayesian methods, first proposed in 2001, that directly incorporate genetic features of a given trait and genomic features like linkage disequilibrium. One Bayesian method uses "linkage disequilibrium prediction" or LDpred.

More approaches for developing polygenic risk scores continue to be described. For example, incorporating effect sizes from populations of different ancestral backgrounds can improve the predictive ability of scores. Incorporating knowledge of the functional roles of specific genomic regions can improve the utility of scores. Studies have examined the performance of these methods on standardized datasets.
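The weighted sum \hat{S} = \sum_{j=1}^{m} X_{j} \hat{\beta}_{j} can be illustrated with a minimal sketch. The function name, the effect sizes and the genotype below are all hypothetical values chosen for illustration, and the sketch assumes hard-called diploid genotypes (0, 1 or 2 copies of the effect allele); it is not taken from any particular PRS tool.

```python
def polygenic_score(allele_counts, weights):
    """Return the weighted sum S = sum_j X_j * beta_j over m SNPs.

    allele_counts: copies of the effect allele per SNP (assumed 0, 1 or 2).
    weights: estimated per-allele effect sizes, one per SNP.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    for x in allele_counts:
        if x not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
    return sum(x * b for x, b in zip(allele_counts, weights))


# Hypothetical effect sizes for m = 4 SNPs (illustrative values only).
beta_hat = [0.5, -0.25, 1.0, 0.125]
# Effect-allele counts for one hypothetical individual.
genotype = [2, 1, 0, 1]

score = polygenic_score(genotype, beta_hat)
print(score)  # 2*0.5 + 1*(-0.25) + 0*1.0 + 1*0.125 = 0.875
```

A negative weight, as for the second SNP here, simply lowers the score for each copy of that allele carried.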

Application to humans

As the number of genome-wide association studies has exploded, along with rapid advances in methods for calculating polygenic scores, the most obvious application of these scores is in clinical settings for disease prediction or risk stratification. It is important not to over- or under-state the value of polygenic scores. A key advantage of quantifying polygenic contribution for each individual is that the genetic liability does not change over an individual's lifespan. However, while a disease may have strong genetic contributions, the risk arising from one's genetics has to be interpreted in the context of environmental factors. For example, even if an individual has a high genetic risk for alcoholism, that risk is lessened if that individual is never exposed to alcohol.

Several authors have noted that some causal variants for some conditions, but not others, are shared between Europeans and other groups across different continents, for example for BMI and type 2 diabetes in African populations and for schizophrenia in Chinese populations. Other researchers argue that polygenic under-prediction in non-European populations should galvanize new GWAS that prioritize greater genetic diversity, in order to maximize the potential health benefits of predictive polygenic scores. Significant scientific efforts are being made to this end.

Embryo genetic screening is common, with millions of embryos biopsied and tested each year worldwide. Genotyping methods have been developed so that the embryo genotype can be determined to high precision. Testing for aneuploidy and monogenic diseases has become increasingly established over decades, whereas tests for polygenic diseases have begun to be employed more recently, having first been used in embryo selection in 2019. The use of polygenic scores for embryo selection has been criticised due to alleged ethical and safety issues as well as limited practical utility.
However, trait-specific evaluations claiming the contrary have been put forth, and ethical arguments for PGS-based embryo selection have also been made. The topic continues to be an active area of research not only within genomics but also within clinical applications and ethics.

As of 2019, polygenic scores for well over a hundred phenotypes have been developed from genome-wide association statistics. These include scores that can be categorized as anthropometric, behavioural, cardiovascular, non-cancer illness, psychiatric/neurological, and response to treatment/medication.

Examples of disease prediction performance

When predicting disease risk, a PGS gives a continuous score that estimates the risk of having or getting the disease within some pre-defined time span. A common metric for evaluating such continuous estimates of yes/no questions (see Binary classification) is the area under the ROC curve (AUC). Some example results of PGS performance, as measured in AUC (0 ≤ AUC ≤ 1, where a larger number implies better prediction), include:

• In 2018, AUC ≈ 0.64 for coronary disease using ~120,000 British individuals.
• In 2019, AUC ≈ 0.63 for breast cancer, developed from ~95,000 case subjects and ~75,000 controls of European ancestry.
• In 2019, AUC ≈ 0.71 for hypothyroidism for ~24,000 case subjects and ~463,000 controls of European ancestry.

Note that these results use purely genetic information as input; including additional information such as age and sex often greatly improves the predictions. The coronary disease predictor and the hypothyroidism predictor above achieve AUCs of ~0.80 and ~0.78, respectively, when also including age and sex.

Polygenic risk scores have since shown promise for disease prediction across other traits. Most current use is through consumer genetic testing, where a number of private companies report PRS for a number of diseases and traits.
Consumers download their genotype (genetic variant) data and upload them into online PRS calculators, e.g. Scripps Health, Impute.me or Color Genomics. The most frequently reported motivation for individuals to seek out PRS reports is general curiosity (98.2%), and reactions are generally mixed, with common misinterpretations. It is speculated that personal use of PRS could contribute to treatment choices, but more data are needed.

Challenges and risks in clinical contexts

At a fundamental level, the use of polygenic scores in a clinical context will face technical issues similar to those of existing tools. For example, if a tool is not validated in a diverse population, then it may exacerbate disparities through unequal efficacy across populations. This is especially important in genetics where, as of 2018, a majority of studies had been done in Europeans. Other challenges include how precisely the polygenic risk score can be calculated and how precise it needs to be for clinical utility. Since monogenic genetic testing is far more mature than polygenic scoring, it offers a reference point for approximating the clinical impact of polygenic scores. While some studies have found negative effects of returning monogenic genetic results to patients, the majority of studies have found that negative consequences are minor.

Benefits in humans

Unlike many other clinical laboratory or imaging methods, an individual's germ-line genetic risk can be calculated at birth for a variety of diseases after sequencing their DNA once. Recognizing an increased genetic burden earlier can allow clinicians to intervene earlier and avoid delayed diagnoses. Polygenic scores can be combined with traditional risk factors to increase clinical utility. For example, polygenic risk scores may help improve diagnosis of diseases. This is especially evident in distinguishing type 1 from type 2 diabetes.
Likewise, a polygenic risk score based approach may reduce invasive diagnostic procedures, as demonstrated in celiac disease. Polygenic scores may also empower individuals to alter their lifestyles to reduce their risk of disease. While there is some evidence for behavior modification as a result of knowing one's genetic predisposition, more work is required to evaluate risk-modifying behaviors across a variety of disease states. Polygenic scores can identify a subset of the population at high risk that could benefit from screening. Several clinical studies are under way in breast cancer, and heart disease is another area that could benefit from a polygenic score based screening program.
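
To make the AUC values quoted in the disease prediction examples more concrete, the sketch below computes AUC directly from its pairwise interpretation: the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The function name and the scores are hypothetical and chosen only for illustration; the quadratic loop is meant for clarity, not for cohorts of realistic size.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (case, control) pairs in which the case has
    the higher score; tied pairs contribute one half.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical polygenic scores (illustrative values only).
cases = [0.9, 0.4, 0.7]
controls = [0.1, 0.5, 0.3, 0.7]

example_auc = auc(cases, controls)
print(round(example_auc, 3))  # 9.5 of 12 pairs favour the case: 0.792
```

Under this reading, an AUC of 0.5 corresponds to a score that ranks cases above controls no more often than chance, and an AUC of 1 to a score that ranks every case above every control.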

Applications in non-human species

The benefit of polygenic scores is that they can be used to make predictions for crops, animal breeding, and humans alike. Although the same basic concepts underlie these areas of prediction, they face different challenges that require different methodologies. The ability to produce very large families in nonhuman species, accompanied by deliberate selection, leads to a smaller effective population size, higher degrees of linkage disequilibrium among individuals, and a higher average genetic relatedness among individuals within a population. For example, members of plant and animal breeds that humans have effectively created, such as modern maize or domestic cattle, are all technically "related". In human genomic prediction, by contrast, unrelated individuals in large populations are selected to estimate the effects of common SNPs. Because of the smaller effective population size in livestock, the mean coefficient of relationship between any two individuals is likely high, and common SNPs will tag causal variants at greater physical distance than in humans; this is the major reason for lower SNP-based heritability estimates in humans compared to livestock. In both cases, however, sample size is key for maximizing the accuracy of genomic prediction.

While modern genomic prediction scoring in humans is generally referred to as a "polygenic score" (PGS) or a "polygenic risk score" (PRS), in livestock the more common term is "genomic estimated breeding value", or GEBV (similar to the more familiar "EBV", but with genotypic data). Conceptually, a GEBV is the same as a PGS: a linear function of genetic variants that are each weighted by the apparent effect of the variant. Despite this, polygenic prediction in livestock is useful for a fundamentally different reason than in humans.
In humans, a PRS is used for the prediction of individual phenotype, while in livestock a GEBV is typically used to predict the offspring's average value of a phenotype of interest in terms of the genetic material it inherited from a parent. In this way, a GEBV can be understood as the average value expected among the offspring of an individual animal or a pair of animals. GEBVs are also typically communicated in the units of the trait of interest. For example, a typical use of a GEBV in dairy cow breeding and selection is to express the expected increase in milk production of the offspring of a specific parent compared to the offspring of a reference population.
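
The livestock use described above can be sketched in a few lines. The example assumes a purely additive model in which the expected value for offspring is the average of the two parents' GEBVs; the function names, marker effects (in kg of milk per allele copy) and genotypes are hypothetical, and real GEBVs are normally expressed relative to a reference population rather than as raw sums.

```python
def gebv(allele_counts, effects):
    """Linear function of marker genotypes, weighted by per-allele effects."""
    if len(allele_counts) != len(effects):
        raise ValueError("need exactly one effect per marker")
    return sum(x * b for x, b in zip(allele_counts, effects))


def expected_offspring_value(sire_gebv, dam_gebv):
    """Mid-parent average, assuming purely additive inheritance."""
    return (sire_gebv + dam_gebv) / 2.0


# Hypothetical per-allele marker effects in kg of milk (illustrative only).
effects_kg = [40.0, -10.0, 25.0]
sire_genotype = [2, 0, 1]
dam_genotype = [1, 1, 0]

sire_value = gebv(sire_genotype, effects_kg)  # 80 + 0 + 25 = 105.0
dam_value = gebv(dam_genotype, effects_kg)    # 40 - 10 + 0 = 30.0
offspring_value = expected_offspring_value(sire_value, dam_value)
print(offspring_value)  # (105.0 + 30.0) / 2 = 67.5
```

Because the result stays in the units of the trait, it can be read directly as an expected difference in milk yield under the stated assumptions.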

Source: Wikipedia ↗
