There are many useful abilities that can be described as showing some form of intelligence, and evaluating them separately gives better insight into the comparative success of artificial intelligence in different areas. AI, like electricity or the steam engine, is a
general-purpose technology, and there is no consensus on how to characterize which tasks it tends to excel at. Some versions of
Moravec's paradox observe that humans are more likely to outperform machines in areas such as physical dexterity that have been the direct target of natural selection. While projects such as
AlphaZero have succeeded in generating their own knowledge from scratch, many other machine learning projects require large training datasets. Researcher
Andrew Ng has suggested, as a "highly imperfect rule of thumb", that "almost anything a typical human can do with less than one second of mental thought, we can probably now or in the near future automate using AI." Games provide a high-profile benchmark for assessing rates of progress; many games have a large professional player base and a well-established competitive rating system.
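A "well-established competitive rating system" typically means an Elo-style rating. As a minimal sketch, the standard Elo update can be written as follows (the K-factor of 32 is a common but illustrative choice, not something the text specifies):

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """One standard Elo update for player A after a game against player B.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k (the K-factor) controls how quickly ratings move; 32 is a common choice.
    """
    # Expected score of A under the Elo logistic model (400-point scale).
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    return rating_a + k * (score_a - expected_a)

# Two equally rated players: the expected score is 0.5,
# so a win moves the winner up by k/2 = 16 points.
print(elo_update(1500, 1500, 1.0))  # 1516.0
```

Because the update is zero-sum between the two players and larger upsets move ratings further, such systems give a stable scale on which human and machine play can be compared over time.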
AlphaGo brought the era of classical board-game benchmarks to a close in 2016, when DeepMind's AlphaGo program defeated the world's best professional Go player,
Lee Sedol. Games of
imperfect knowledge provide new challenges to AI in the area of
game theory; the most prominent milestone in this area was
Libratus' poker victory in 2017.
E-sports continue to provide additional benchmarks;
Facebook AI,
DeepMind, and others have engaged with the popular
StarCraft franchise of videogames.

Broad classes of outcome for an AI test may be given as:
• optimal: it is not possible to perform better (note: some of these entries were solved by humans)
• super-human: performs better than all humans
• high-human: performs better than most humans
• par-human: performs similarly to most humans
• sub-human: performs worse than most humans
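These classes can be made concrete by comparing an AI system's score against a sample of human scores on the same test. The tolerance band and the use of a score sample below are illustrative assumptions; the text defines the classes only informally:

```python
def outcome_class(ai_score, human_scores, optimal_score=None, tol=0.05):
    """Classify an AI system's test score into the broad outcome classes
    listed above. `tol` is a hypothetical tolerance band around the median
    used to decide "similar to most humans" (par-human)."""
    if optimal_score is not None and ai_score >= optimal_score:
        return "optimal"
    if all(ai_score > h for h in human_scores):
        return "super-human"
    # Fraction of the human sample that the AI outperforms.
    frac_beaten = sum(ai_score > h for h in human_scores) / len(human_scores)
    if frac_beaten > 0.5 + tol:
        return "high-human"
    if frac_beaten >= 0.5 - tol:
        return "par-human"
    return "sub-human"

humans = [60, 70, 75, 80, 90]
print(outcome_class(95, humans))            # super-human: beats all 5
print(outcome_class(78, humans))            # high-human: beats 3 of 5
print(outcome_class(75, [60, 70, 80, 90]))  # par-human: beats 2 of 4
print(outcome_class(65, humans))            # sub-human: beats 1 of 5
```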
Optimal
• Tic-tac-toe
• Connect Four: 1988
• Checkers (aka 8x8 draughts): Weakly solved (2007)
• Rubik's Cube: Mostly solved (2010)
• Heads-up limit hold'em poker: Statistically optimal in the sense that "a human lifetime of play is not sufficient to establish with statistical significance that the strategy is not an exact solution" (2015)
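"Solved" here means the game-theoretic value of the starting position is known and attainable by perfect play. For a game as small as tic-tac-toe this can be checked by exhaustive minimax; the sketch below (not the method used by any of the solvers listed above) confirms that optimal play from both sides is a draw:

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, indexed 0..8.
WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
        (0, 3, 6), (1, 4, 7), (2, 5, 8),
        (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WINS:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Minimax value from X's perspective: +1 X wins, 0 draw, -1 O wins.
    `board` is a 9-character string of 'X', 'O', '.'; `player` moves next."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if '.' not in board:
        return 0  # board full, no winner: draw
    vals = []
    for i, cell in enumerate(board):
        if cell == '.':
            nxt = board[:i] + player + board[i + 1:]
            vals.append(value(nxt, 'O' if player == 'X' else 'X'))
    # X maximizes the value, O minimizes it.
    return max(vals) if player == 'X' else min(vals)

print(value('.........', 'X'))  # 0 -> tic-tac-toe is a draw under optimal play
```

The full game tree has well under a million positions, so the memoized search finishes almost instantly; checkers, by contrast, needed years of distributed computation for even a weak solution.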
Super-human
• Othello (aka reversi): c. 1997
• Backgammon: c. 1995–2002
• Chess: Supercomputer (c. 1997); Personal computer (c. 2006); Mobile phone (c. 2009); Computer defeats human + computer (c. 2017)
• Jeopardy!: Question answering, although the machine did not use speech recognition (2011)
• Arimaa: 2015
• Shogi: c. 2017
• Heads-up no-limit hold'em poker: 2017
• Six-player no-limit hold'em poker: 2019
• Gran Turismo Sport: 2022
High-human
• Crosswords: c. 2012
• Freeciv: 2016
• Dota 2: 2018
• Bridge card-playing: According to a 2009 review, "the best programs are attaining expert status as (bridge) card players", excluding bidding.
• StarCraft II: 2019
• Mahjong: 2019
• Stratego: 2022
• No-Press Diplomacy: 2022
• Hanabi: 2022
Par-human
• Optical character recognition for ISO 1073-1:1976 and similar special characters
• Classification of images
• Handwriting recognition
• Facial recognition
• Visual question answering
• SQuAD 2.0 English reading-comprehension benchmark (2019)
• Some school science exams (2019)
• Some tasks based on Raven's Progressive Matrices

Sub-human
• Optical character recognition for printed text (nearing par-human for Latin-script typewritten text)
• Object recognition
• Various robotics tasks that may require advances in robot hardware as well as AI, including:
  • Stable bipedal locomotion: bipedal robots can walk, but are less stable than human walkers (as of 2017)
  • Humanoid soccer
• Speech recognition: "nearly equal to human performance" (2017)
• Explainability: current medical systems can diagnose certain medical conditions well, but cannot explain to users why they made the diagnosis
• Many tests of fluid intelligence (2020)
• Bongard visual cognition problems, such as the Bongard-LOGO benchmark (2020)
• Visual Commonsense Reasoning (VCR) benchmark (as of 2020)
• Stock market prediction: financial data collection and processing using machine learning algorithms
• Angry Birds video game, as of 2020
• Various tasks that are difficult to solve without contextual knowledge, including:
  • Translation
  • Word-sense disambiguation

== Proposed tests of artificial intelligence ==