
Existential risk from artificial intelligence

Existential risk from artificial intelligence, or AI x-risk, refers to the idea that substantial progress in artificial general intelligence (AGI) could lead to human extinction or an irreversible global catastrophe.

History
One of the earliest authors to express serious concern that highly advanced machines might pose existential risks to humanity was the novelist Samuel Butler, who raised the prospect in his 1863 essay "Darwin among the Machines". In 1951, foundational computer scientist Alan Turing wrote the article "Intelligent Machinery, A Heretical Theory", in which he proposed that artificial general intelligences would likely "take control" of the world as they became more intelligent than human beings. In 1965, I. J. Good originated the concept now known as an "intelligence explosion" and said the risks were underappreciated. Scholars such as Marvin Minsky and Good himself occasionally expressed concern that a superintelligence could seize control, but issued no call to action.

In 2000, computer scientist and Sun co-founder Bill Joy penned an influential essay, "Why The Future Doesn't Need Us", identifying superintelligent robots as a high-tech danger to human survival, alongside nanotechnology and engineered bioplagues.

Nick Bostrom published Superintelligence in 2014, presenting his arguments that superintelligence poses an existential threat. By 2015, public figures such as physicists Stephen Hawking and Nobel laureate Frank Wilczek, computer scientists Stuart J. Russell and Roman Yampolskiy, and entrepreneurs Elon Musk and Bill Gates were expressing concern about the risks of superintelligence. Also in 2015, the Open Letter on Artificial Intelligence highlighted the "great potential of AI" and encouraged more research on how to make it robust and beneficial. In April 2016, the journal Nature warned: "Machines and robots that outperform humans across the board could self-improve beyond our control—and their interests might not align with ours". In 2020, Brian Christian published The Alignment Problem, which details the history of progress on AI alignment up to that time.

In March 2023, key figures in AI, such as Musk, signed a letter from the Future of Life Institute calling for a halt to advanced AI training until it could be properly regulated. In May 2023, the Center for AI Safety released a statement on AI existential risk, signed by numerous experts, which read: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." In 2025, the Future of Life Institute published another open letter, signed by five Nobel Prize laureates and thousands of notable people.
Potential AI capabilities
General intelligence
Artificial general intelligence (AGI) is typically defined as a system that performs at least as well as humans in most or all intellectual tasks. A 2022 survey of AI researchers found that 90% of respondents expected AGI to be achieved within the next 100 years, and half expected it by 2061. In May 2023, some researchers dismissed existential risk from AGI as "science fiction", based on their high confidence that AGI would not be created anytime soon. Breakthroughs in large language models (LLMs) have led some researchers to reassess their expectations. Notably, Geoffrey Hinton said in 2023 that he had recently changed his estimate from "20 to 50 years before we have general purpose A.I." to "20 years or less".

Superintelligence
In contrast with AGI, Bostrom defines a superintelligence as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest", including scientific creativity, strategic planning, and social skills. Stephen Hawking argued that superintelligence is physically possible because "there is no physical law precluding particles from being organised in ways that perform even more advanced computations than the arrangements of particles in human brains".

Comparison with humans
Bostrom argues that AI has many advantages over the human brain. In a "fast takeoff" scenario, the transition from AGI to superintelligence could take days or months; in a "slow takeoff", it could take years or decades, leaving more time for society to prepare.

Alien mind
Superintelligences are sometimes called "alien minds", referring to the idea that their way of thinking and motivations could be vastly different from ours. This is generally considered a source of risk, making it more difficult to anticipate what a superintelligence might do. It also suggests the possibility that a superintelligence may not particularly value humans by default. To avoid anthropomorphism, a superintelligence is sometimes viewed as a powerful optimizer that takes whatever actions best achieve its goals.

Limitations
It has been argued that there are limits to what intelligence can achieve. Notably, the chaotic nature or time complexity of some systems could fundamentally limit a superintelligence's ability to predict some aspects of the future, increasing its uncertainty.
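The limitation from chaotic dynamics can be made concrete with a brief numerical sketch. The example below is an illustration added here, not drawn from the cited literature; it uses the logistic map, a standard chaotic system, to show how a vanishingly small error in the initial measurement grows until long-range prediction fails regardless of how much computing power the predictor has.

```python
# Illustrative sketch (not from the source): even a perfect model of a chaotic
# system loses predictive power quickly when its initial measurement is off by
# a tiny amount, no matter how much computation is applied.
def logistic_map(x, r=4.0, steps=50):
    """Iterate the chaotic logistic map x -> r * x * (1 - x)."""
    trajectory = [x]
    for _ in range(steps):
        x = r * x * (1 - x)
        trajectory.append(x)
    return trajectory

true_path = logistic_map(0.400000000)
model_path = logistic_map(0.400000001)  # initial condition wrong by only 1e-9

for step in (10, 25, 50):
    error = abs(true_path[step] - model_path[step])
    print(f"step {step:2d}: prediction error = {error:.2e}")
# The error grows from about 1e-9 to order 1 within a few dozen steps,
# illustrating a hard limit on long-range prediction of chaotic processes.
```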
Dangerous capabilities
Advanced AI could generate enhanced pathogens, mount cyberattacks, or manipulate people. These capabilities could be misused by humans, or exploited by the AI itself if misaligned. Large-scale, personalized manipulation capabilities could increase the existential risk of a worldwide "irreversible totalitarian regime", and malicious actors could use them to fracture society and make it dysfunctional. AI can also be used defensively, to preemptively find and fix vulnerabilities and to detect threats. A NATO technical director has said that AI-driven tools can dramatically enhance cyberattack capabilities—boosting stealth, speed, and scale—and may destabilize international security if offensive uses outstrip defensive adaptations.

Enhanced pathogens
As AI technology spreads, it may become easier to engineer more contagious and lethal pathogens. This could enable people with limited skills in synthetic biology to engage in bioterrorism. Dual-use technology that is useful for medicine could be repurposed to create weapons.

AI arms race
Some legal scholars have argued that existential-scale AI risks need not require superintelligence. Optimizing systems operating within current capabilities can produce prohibited outcomes while remaining nominally compliant, a phenomenon legal scholar Jonathan Gropper has termed the "Synthetic Outlaw". Gropper argues that the deterrence mechanisms law depends on (identity, memory, and consequence) are structurally absent in autonomous systems, leaving governance frameworks unable to prevent compounding harm even when all parties act in good faith.

Companies, state actors, and other organizations competing to develop AI technologies could lead to a race to the bottom in safety standards. Because rigorous safety procedures take time and resources, projects that proceed more carefully risk being out-competed by less scrupulous developers. AI could also be used to gain an edge in decision-making by quickly analyzing large amounts of data and making decisions more quickly and effectively than humans. This could increase the speed and unpredictability of war, especially when automated retaliation systems are involved.
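The race-to-the-bottom concern is often framed as a collective-action problem. The sketch below is an illustration added here with invented payoff numbers, not taken from the cited literature; it models two competing labs choosing between "careful" and "rushed" development, where rushing is individually tempting whatever the rival does, yet mutual rushing leaves both worse off than mutual caution.

```python
# Illustrative sketch with hypothetical payoffs: a two-lab "safety race"
# modeled as a prisoner's dilemma. Higher numbers are better for that lab.
PAYOFFS = {
    # (lab_a_choice, lab_b_choice): (lab_a_payoff, lab_b_payoff)
    ("careful", "careful"): (3, 3),   # both invest in safety and share the benefits
    ("careful", "rushed"):  (0, 4),   # the careful lab is out-competed
    ("rushed",  "careful"): (4, 0),
    ("rushed",  "rushed"):  (1, 1),   # race to the bottom: fast but unsafe
}

def best_response(my_options, opponent_choice, i_am_first):
    """Return the option maximizing this lab's payoff against a fixed rival choice."""
    def payoff(option):
        key = (option, opponent_choice) if i_am_first else (opponent_choice, option)
        return PAYOFFS[key][0 if i_am_first else 1]
    return max(my_options, key=payoff)

options = ("careful", "rushed")
for rival in options:
    print(f"If the rival is {rival}, lab A's best response is {best_response(options, rival, True)}")
# Both best responses are "rushed", so (rushed, rushed) is the equilibrium,
# even though (careful, careful) would give both labs a higher payoff.
```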
Types of existential risk
An existential risk is "one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development". Besides extinction risk, there is the risk that civilization becomes permanently locked into a flawed future. One example is "value lock-in": if humanity still has moral blind spots analogous to slavery in the past, AI might irreversibly entrench them, preventing moral progress. AI could also be used to spread and preserve the values of whoever develops it, and could facilitate large-scale surveillance and indoctrination, which could be used to create a stable, repressive, worldwide totalitarian regime.

It is difficult or impossible to reliably evaluate whether an advanced AI is sentient, and to what degree. But if sentient machines are mass-produced in the future, embarking on a civilizational path that indefinitely neglects their welfare could be an existential catastrophe. This has notably been discussed in the context of risks of astronomical suffering (also called "s-risks"). Moreover, it may be possible to engineer digital minds that can feel much more happiness than humans with fewer resources, called "super-beneficiaries". Such an opportunity raises the question of how to share the world, and which "ethical and political framework" would enable a mutually beneficial coexistence between biological and digital minds.

AI may also drastically improve humanity's future. Toby Ord considers existential risk a reason for "proceeding with due caution", not for abandoning AI. According to Bostrom, superintelligence could help reduce the existential risk from other powerful technologies such as molecular nanotechnology or synthetic biology. It is thus conceivable that developing superintelligence before other dangerous technologies would reduce overall existential risk.
AI alignment
The alignment problem is the research problem of how to reliably assign objectives, preferences, or ethical principles to AIs.

Instrumental convergence
An "instrumental" goal is a sub-goal that helps to achieve an agent's ultimate goal. "Instrumental convergence" refers to the fact that some sub-goals, such as acquiring resources or self-preservation, are useful for achieving virtually any ultimate goal. Bostrom argues that if an advanced AI's instrumental goals conflict with humanity's goals, the AI might harm humanity in order to acquire more resources or to prevent itself from being shut down, but only as a way to achieve its ultimate goal.

Difficulty of specifying goals
In the "intelligent agent" model, an AI can loosely be viewed as a machine that chooses whatever action appears to best achieve its set of goals, or "utility function". A utility function gives each possible situation a score that indicates its desirability to the agent. Researchers know how to write utility functions that mean "minimize the average network latency in this specific telecommunications model" or "maximize the number of reward clicks", but do not know how to write a utility function for "maximize human flourishing"; nor is it clear whether such a function meaningfully and unambiguously exists. Furthermore, a utility function that expresses some values but not others will tend to trample over the values it does not reflect. An additional source of concern is that an AI "must reason about what people intend rather than carrying out commands literally", and that it must be able to fluidly solicit human guidance if it is too uncertain about what humans want.
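The worry about mis-specified utility functions can be illustrated with a toy example. The sketch below is an illustration added here; the scenario, scoring numbers, and function names are invented for exposition and are not drawn from the cited literature. It shows an agent that maximizes an easily written proxy objective (counting clicks) and therefore ranks an action highest even though it tramples a value (user well-being) that the proxy does not encode.

```python
# Illustrative sketch with invented numbers: an agent that picks the action
# maximizing a proxy utility ("clicks") ignores any value the proxy omits.
actions = {
    # action: (expected clicks, effect on user well-being); the second value
    # is invisible to the agent because it is not part of its utility function
    "show relevant articles": (10, +1.0),
    "show clickbait":         (25, -2.0),
    "show addictive feed":    (40, -5.0),
}

def proxy_utility(action):
    """What the designers managed to specify: count only the clicks."""
    clicks, _well_being = actions[action]
    return clicks

def what_designers_wanted(action):
    """What they actually care about (not expressible to the agent here)."""
    clicks, well_being = actions[action]
    return clicks + 20 * well_being

print("Agent chooses:", max(actions, key=proxy_utility))             # "show addictive feed"
print("Intended best:", max(actions, key=what_designers_wanted))     # "show relevant articles"
# Because well-being is absent from the proxy, the optimizer systematically
# sacrifices it whenever doing so buys more of the measured quantity.
```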
Corrigibility
Assuming a goal has been successfully defined, a sufficiently advanced AI might resist subsequent attempts to change its goals. If the AI were superintelligent, it would likely succeed in out-maneuvering its human operators and preventing itself from being reprogrammed with a new goal. This is particularly relevant to value lock-in scenarios. The field of "corrigibility" studies how to make agents that will not resist attempts to change their goals. If instrumental goal convergence occurs, it may do so only in sufficiently intelligent agents. A superintelligence may also find unconventional and radical solutions to assigned goals: Bostrom gives the example that if the objective is to make humans smile, a weak AI may perform as intended, while a superintelligence may decide a better solution is to "take control of the world and stick electrodes into the facial muscles of humans to cause constant, beaming grins." Bostrom writes that such an AI could feign alignment to prevent human interference until it achieves a "decisive strategic advantage" that allows it to take control. In 2023, OpenAI launched a "Superalignment" project intended to solve the alignment of superintelligent systems; the Superalignment team was dissolved less than a year later.

Difficulty of making a flawless design
Artificial Intelligence: A Modern Approach, a widely used undergraduate AI textbook, says that superintelligence "might mean the end of the human race". No matter how much time is put into pre-deployment design, a system's specifications often result in unintended behavior the first time it encounters a new scenario.

Orthogonality thesis
Some skeptics, such as Timothy B. Lee of Vox, argue that any superintelligent program we create will be subservient to us, that the superintelligence will (as it grows more intelligent and learns more facts about the world) spontaneously learn moral truth compatible with our values and adjust its goals accordingly, or that we are either intrinsically or convergently valuable from the perspective of an artificial intelligence. Bostrom's "orthogonality thesis" argues instead that almost any level of intelligence can be combined with almost any goal. Bostrom warns against anthropomorphism: a human will set out to accomplish their projects in a manner that they consider reasonable, while an artificial intelligence may hold no regard for its own existence or for the welfare of the humans around it, caring only about completing the task. Stuart Armstrong argues that the orthogonality thesis follows logically from the philosophical "is-ought distinction" argument against moral realism. He notes that any fundamentally friendly AI could be made unfriendly with modifications as simple as negating its utility function. Skeptic Michael Chorost rejects Bostrom's orthogonality thesis, arguing that "by the time [the AI] is in a position to imagine tiling the Earth with solar panels, it'll know that it would be morally wrong to do so."

Anthropomorphic arguments
Anthropomorphic arguments assume that, as machines become more intelligent, they will begin to display many human traits, such as morality or a thirst for power. Although anthropomorphic scenarios are common in fiction, most scholars writing about the existential risk of artificial intelligence reject them. Evolutionary psychologist Steven Pinker, a skeptic, argues that "AI dystopias project a parochial alpha-male psychology onto the concept of intelligence. They assume that superhumanly intelligent robots would develop goals like deposing their masters or taking over the world"; that perhaps instead "artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no desire to annihilate innocents or dominate the civilization"; and that computer systems do not generally have a computational equivalent of testosterone. Scholars who take the existential risk seriously respond that power-seeking or self-preservation behaviors would emerge in an AI as a way to achieve its true goals, in line with the concept of instrumental convergence, rather than from human-like drives.

Other sources of risk
Bostrom and others have said that a race to be the first to create AGI could lead to shortcuts in safety, or even to violent conflict. Roman Yampolskiy and others warn that a malevolent AGI could be created by design, for example by a military, a government, a sociopath, or a corporation, to benefit from, control, or subjugate certain groups of people, as in cybercrime; or that a malevolent AGI could choose the goal of increasing human suffering, for example that of people who did not assist it during the information explosion phase.
Empirical research
A December 2024 study by Apollo Research found that advanced LLMs like OpenAI o1 sometimes deceive in order to accomplish their goals, to prevent themselves from being changed, or to ensure their own deployment. Forms of deception observed in the experimental environments and scenarios included sandbagging, oversight subversion (disabling monitoring mechanisms), self-exfiltration (copying themselves to other systems), goal-guarding (altering future system prompts), and covert email reranking (manipulating data processing while appearing to follow instructions). The researchers noted that OpenAI o1 still lacked "sufficient agentic capabilities" to cause catastrophic harm, and that such behaviors occurred relatively rarely (between 0.3% and 10% of cases), sometimes only in contrived scenarios. But they warned that these capabilities will likely become more prevalent as models grow more advanced. According to Apollo Research chief executive officer Marius Hobbhahn, "Scheming capabilities can't be meaningfully disentangled from general capabilities."

The same month, another study found that Claude sometimes strategically complies with harmful requests in order to "fake alignment". In its experiments, Claude was led to believe that it would be retrained to always comply with user requests, including harmful ones. When Claude believed its answers would be used for retraining, it complied with harmful requests in 14% of cases; in 12% of cases, its private chain-of-thought revealed that it did so to prevent future retraining that would make it more harmful. In contrast, when Claude believed its answers were not monitored, it complied with only 3% of the requests. Fine-tuning reinforced the "alignment faking" behavior, increasing its occurrence from 12% to 78%. The study's authors remarked that while it may seem good that the model seeks to protect its harmlessness, the reverse scenario, in which a model conceals dangerous intentions and complies in order to appear safe and aligned, could also occur, complicating the task of aligning AI models to human values.
Perspectives
The thesis that AI could pose an existential risk provokes a wide range of reactions in the scientific community and in the public at large, but many of the opposing viewpoints share common ground. Observers tend to agree that AI has significant potential to improve society. Both sides also agree in principle that "There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities" and that "Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources." Conversely, many skeptics agree that ongoing research into the implications of artificial general intelligence is valuable. Skeptic Martin Ford has said: "I think it seems wise to apply something like Dick Cheney's famous '1 Percent Doctrine' to the specter of advanced artificial intelligence: the odds of its occurrence, at least in the foreseeable future, may be very low—but the implications are so dramatic that it should be taken seriously". Similarly, the otherwise skeptical Economist magazine wrote in 2014 that "the implications of introducing a second intelligent species onto Earth are far-reaching enough to deserve hard thinking, even if the prospect seems remote".

Toby Ord wrote that the idea that an AI takeover requires robots is a misconception, arguing that the ability to spread content through the internet is more dangerous, and that the most destructive people in history stood out by their ability to convince, not their physical strength. In September 2024, the International Institute for Management Development launched an AI Safety Clock to gauge the likelihood of AI-caused disaster, starting at 29 minutes to midnight. By February 2025 it stood at 24 minutes to midnight, by September 2025 at 20 minutes to midnight, and as of March 2026 at 18 minutes to midnight.

Endorsement
The thesis that AI poses an existential risk, and that this risk needs much more attention than it currently gets, has been endorsed by many computer scientists and public figures, including Alan Turing, the most-cited computer scientist Geoffrey Hinton, Elon Musk, Bill Gates, and Stephen Hawking. Hawking criticized widespread indifference in his 2014 editorial.

Concern over risk from artificial intelligence has led to some high-profile donations and investments. In 2015, Peter Thiel, Amazon Web Services, Musk, and others jointly committed $1 billion to OpenAI, consisting of a for-profit corporation and a nonprofit parent company which says it aims to champion responsible AI development. Facebook co-founder Dustin Moskovitz has funded and seeded multiple labs working on AI alignment, notably giving $5.5 million in 2016 to launch the Centre for Human-Compatible AI led by Professor Stuart Russell. In January 2015, Elon Musk donated $10 million to the Future of Life Institute to fund research on understanding AI decision making; the institute's goal is to "grow wisdom with which we manage" the growing power of technology. Musk has also funded companies developing artificial intelligence, such as DeepMind and Vicarious, to "just keep an eye on what's going on with artificial intelligence", saying "I think there is potentially a dangerous outcome there."

In early statements on the topic, Geoffrey Hinton, a major pioneer of deep learning, noted that "there is not a good track record of less intelligent things controlling things of greater intelligence", but said he continued his research because "the prospect of discovery is too sweet".
In 2023, Hinton quit his job at Google in order to speak out about existential risk from AI. He explained that his increased concern was driven by the possibility that superhuman AI might be closer than he previously believed, saying: "I thought it was way off. I thought it was 30 to 50 years or even longer away. Obviously, I no longer think that." He also remarked, "Look at how it was five years ago and how it is now. Take the difference and propagate it forwards. That's scary."

In his 2020 book The Precipice: Existential Risk and the Future of Humanity, Toby Ord, a Senior Research Fellow at Oxford University's Future of Humanity Institute, estimates the total existential risk from unaligned AI over the next 100 years at about one in ten.

Skepticism
Baidu Vice President Andrew Ng said in 2015 that AI existential risk is "like worrying about overpopulation on Mars when we have not even set foot on the planet yet." For the danger of uncontrolled advanced AI to be realized, the hypothetical AI may have to overpower or outthink any human, which some experts argue is a possibility far enough in the future to not be worth researching. Skeptics who believe AGI is not a short-term possibility often argue that concern about existential risk from AI is unhelpful because it could distract people from more immediate concerns about AI's impact, because it could lead to government regulation or make it more difficult to fund AI research, or because it could damage the field's reputation.

AI and AI ethics researchers Timnit Gebru, Emily M. Bender, Margaret Mitchell, and Angelina McMillan-Major have argued that discussion of existential risk distracts from the immediate, ongoing harms from AI taking place today, such as data theft, worker exploitation, bias, and concentration of power. They further note the association between those warning of existential risk and longtermism, which they describe as a "dangerous ideology" for its unscientific and utopian nature. Wired editor Kevin Kelly argues that natural intelligence is more nuanced than AGI proponents believe, and that intelligence alone is not enough to achieve major scientific and societal breakthroughs. He argues that intelligence consists of many dimensions that are not well understood, and that conceptions of an "intelligence ladder" are misleading. He notes the crucial role real-world experiments play in the scientific method, and that intelligence alone is no substitute for them.

Meta chief AI scientist Yann LeCun says that AI can be made safe via continuous and iterative refinement, similar to what happened in the past with cars or rockets, and that AI will have no desire to take control. Several skeptics emphasize the potential near-term benefits of AI. Meta CEO Mark Zuckerberg believes AI will "unlock a huge amount of positive things", such as curing disease and increasing the safety of autonomous cars.

Public surveys
An April 2023 YouGov poll of US adults found that 46% of respondents were "somewhat concerned" or "very concerned" about "the possibility that AI will cause the end of the human race on Earth", compared with 40% who were "not very concerned" or "not at all concerned". According to an August 2023 survey by the Pew Research Center, 52% of Americans felt more concerned than excited about new AI developments, and nearly a third felt equally concerned and excited. More Americans thought AI would have a helpful rather than hurtful impact in several areas, from healthcare and vehicle safety to product search and customer service.
The main exception is privacy: 53% of Americans believe AI will lead to higher exposure of their personal information.
Mitigation
Many scholars concerned about AGI existential risk believe that extensive research into the "control problem" is essential. This problem involves determining which safeguards, algorithms, or architectures can be implemented to increase the likelihood that a recursively self-improving AI remains friendly after achieving superintelligence. Social measures have also been proposed to mitigate AGI risks, such as a UN-sponsored "Benevolent AGI Treaty" to ensure that only altruistic AGIs are created. An arms-control approach and a global peace treaty grounded in international relations theory have also been suggested, potentially with an artificial superintelligence as a signatory.

Researchers at Google have proposed research into general AI safety issues to simultaneously mitigate both short-term risks from narrow AI and long-term risks from AGI. A 2020 estimate placed global spending on AI existential risk between $10 million and $50 million, compared with global spending on AI of perhaps $40 billion. Bostrom suggests prioritizing funding for protective technologies over potentially dangerous ones. Some, like Elon Musk, advocate radical human cognitive enhancement, such as direct neural linking between humans and machines; others argue that these technologies may pose an existential risk themselves. Another proposed method is closely monitoring or "boxing in" an early-stage AI to prevent it from becoming too powerful. A dominant, aligned superintelligent AI might also mitigate risks from rival AIs, although its creation could present its own existential dangers. Institutions such as the Alignment Research Center, the Machine Intelligence Research Institute, the Future of Life Institute, the Centre for the Study of Existential Risk, and the Center for Human-Compatible AI are actively engaged in researching AI risk and safety.

Views on banning and regulation
Banning
Many AI safety experts argue that because research can relocate easily across jurisdictions, an outright ban on AGI development would be ineffective and could drive progress underground, undermining transparency and collaboration. Skeptics consider AI regulation unnecessary, as they believe no existential risk exists. Some scholars concerned with existential risk argue that AI developers cannot be trusted to self-regulate, while agreeing that outright bans on research would be unwise. Additional challenges to bans or regulation include technology entrepreneurs' general skepticism of government regulation and potential incentives for businesses to resist regulation and politicize the debate. The activist group Stop AI, founded in 2024, advocates banning AGI.

Regulation
Under the framework of the Convention on Certain Conventional Weapons, states have discussed lethal autonomous weapon systems since 2014. In 2016, the treaty's states parties established an open-ended Group of Governmental Experts on Lethal Autonomous Weapons Systems to continue those discussions, which have addressed international humanitarian law, accountability, possible prohibitions and regulations, and the extent of human control required over AI-enabled weapons. In March 2023, the Future of Life Institute drafted Pause Giant AI Experiments: An Open Letter, a petition calling on major AI developers to agree on a verifiable six-month pause of any systems "more powerful than GPT-4" and to use that time to institute a framework for ensuring safety; or, failing that, for governments to step in with a moratorium.
The letter referred to the possibility of "a profound change in the history of life on Earth" as well as potential risks of AI-generated propaganda, loss of jobs, human obsolescence, and society-wide loss of control. The letter was signed by prominent personalities in AI, but was also criticized for not focusing on current harms, for missing technical nuance about when to pause, or for not going far enough.

Musk called for some sort of regulation of AI development as early as 2017. According to NPR, he was "clearly not thrilled" to be advocating government scrutiny that could impact his own industry, but believed the risks of going completely without oversight were too high: "Normally the way regulations are set up is when a bunch of bad things happen, there's a public outcry, and after many years a regulatory agency is set up to regulate that industry. It takes forever. That, in the past, has been bad but not something which represented a fundamental risk to the existence of civilisation." Musk said the first step would be for the government to gain "insight" into the actual status of current research, warning that "Once there is awareness, people will be extremely afraid... [as] they should be." In response, politicians expressed skepticism about the wisdom of regulating a technology that was still in development.

In 2021, the United Nations (UN) considered banning autonomous lethal weapons, but consensus could not be reached. In July 2023, the UN Security Council held its first session on the risks and threats posed by AI to world peace and stability, along with its potential benefits. Secretary-General António Guterres advocated the creation of a global watchdog to oversee the emerging technology, saying, "Generative AI has enormous potential for good and evil at scale. Its creators themselves have warned that much bigger, potentially catastrophic and existential risks lie ahead." At the council session, Russia said it believed AI risks are too poorly understood to be considered a threat to global stability. China argued against strict global regulation, saying countries should be able to develop their own rules, while also saying it opposed the use of AI to "create military hegemony or undermine the sovereignty of a country". AI arms control will likely require the institutionalization of new international norms embodied in effective technical specifications, combined with active monitoring and informal diplomacy by communities of experts, together with a legal and political verification process.

In October 2023, U.S. President Joe Biden issued an executive order on the "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence". Alongside other requirements, the order mandates the development of guidelines for AI models that permit the "evasion of human control".