Experimental decades: 1910s–1980s

Radio Rex was the first voice-activated toy, patented in 1916 and released in 1922. It was a wooden toy in the shape of a dog that would come out of its house when its name was called. In 1952,
Bell Labs presented "Audrey", the Automatic Digit Recognition machine. It occupied a six-foot-high relay rack, consumed substantial power, had streams of cables and exhibited the myriad maintenance problems associated with complex vacuum-tube circuitry. It could recognize the fundamental units of speech, phonemes, but was limited to the accurate recognition of digits spoken by designated talkers. It could therefore be used for voice dialing, although in most cases push-button dialing was cheaper and faster than speaking the consecutive digits. Another early tool that could perform digital speech recognition was the
IBM Shoebox voice-activated calculator, presented to the general public during the
1962 Seattle World's Fair after its initial market launch in 1961. This early computer, developed almost 20 years before the introduction of the first
IBM Personal Computer in 1981, was able to recognize 16 spoken words and the digits 0 to 9. The first
natural language processing computer program, the chatbot
ELIZA was developed by MIT professor
Joseph Weizenbaum in the 1960s. It was created to "demonstrate that the communication between man and machine was superficial". ELIZA used pattern matching and substitution of keywords into scripted responses to simulate conversation, giving an illusion of understanding on the part of the program. Weizenbaum's own secretary reportedly asked him to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." This gave its name to the
ELIZA effect, the tendency to unconsciously assume computer behaviors are analogous to human behaviors; that is, anthropomorphisation, a phenomenon present in human interactions with virtual assistants. The next milestone in the development of voice recognition technology was achieved in the 1970s at the
Carnegie Mellon University in
Pittsburgh, Pennsylvania, with substantial support from the
United States Department of Defense and its
DARPA agency, which funded five years of a Speech Understanding Research program aiming to reach a minimum vocabulary of 1,000 words. Companies and academic institutions including IBM, Carnegie Mellon University (CMU) and Stanford Research Institute took part in the program. The result was "Harpy", which mastered about 1,000 words (the vocabulary of a three-year-old) and could understand sentences. It could process speech that followed pre-programmed vocabulary, pronunciation, and grammar structures to determine which sequences of words made sense together, thus reducing speech recognition errors. In 1986, the Tangora, a voice-recognizing typewriter, arrived as an upgrade of the Shoebox. Named after the world's fastest typist at the time, it had a vocabulary of 20,000 words and used prediction to decide the most likely result based on what was said previously. IBM's approach was based on a
hidden Markov model, which adds statistics to digital signal processing techniques. The method makes it possible to predict the most likely
phonemes to follow a given phoneme. Still, each speaker had to individually train the typewriter to recognize their voice and had to pause between each word. In 1983, Gus Searcy invented the "Butler In A Box", an electronic voice home controller system.
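The statistical idea behind IBM's hidden-Markov-model approach can be illustrated with a much simpler first-order Markov chain: count how often each phoneme follows another in training data, then predict the most frequent successor. The sketch below is illustrative only; the ARPAbet-style phoneme symbols and the tiny training vocabulary are invented for the example and are far simpler than a real HMM, which also models the probability of the acoustic signal given each phoneme.

```python
from collections import defaultdict

def train_transitions(sequences):
    """Count phoneme-to-phoneme transitions from training sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def most_likely_next(counts, phoneme):
    """Return the most frequently observed successor of a phoneme."""
    followers = counts[phoneme]
    if not followers:
        return None
    return max(followers, key=followers.get)

# Toy training data: phoneme sequences for a few words
# (symbols chosen purely for illustration).
data = [
    ["HH", "AH", "L", "OW"],   # "hello"
    ["HH", "AH", "L", "P"],    # "help"
    ["HH", "AW", "S"],         # "house"
    ["Y", "EH", "L", "OW"],    # "yellow"
]

model = train_transitions(data)
print(most_likely_next(model, "HH"))  # "AH" (seen twice, vs "AW" once)
print(most_likely_next(model, "L"))   # "OW" (seen twice, vs "P" once)
```

Scaling this counting idea up, and combining it with acoustic probabilities, is what let systems like the Tangora pick the most plausible word sequence rather than relying on exact template matches.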
Birth of smart virtual assistants: 1990s–2010s

In the 1990s, digital speech recognition technology became a feature of the personal computer, with
IBM,
Philips and
Lernout & Hauspie competing for customers. The market launch of the first
smartphone IBM Simon in 1994 laid the foundation for smart virtual assistants as we know them today. In 1997, Dragon's
NaturallySpeaking software could recognize natural human speech, without pauses between words, and transcribe it into a document at a rate of 100 words per minute. A version of NaturallySpeaking is still available for download and is still used today, for instance, by many doctors in the US and the UK to document their medical records. In 2001,
Colloquis publicly launched
SmarterChild, on platforms like
AIM and
MSN Messenger. While entirely text-based, SmarterChild was able to play games, check the weather, look up facts, and converse with users to an extent. The first modern digital virtual assistant installed on a smartphone was
Siri, which was introduced as a feature of the
iPhone 4S on 4 October 2011.
Apple Inc. developed Siri following the 2010 acquisition of
Siri Inc., a
spin-off of
SRI International, a research institute financed by
DARPA and the
United States Department of Defense. Its aim was to aid in tasks such as sending a text message, making phone calls, checking the weather or setting up an alarm. Over time, it has developed to provide restaurant recommendations, search the internet, and provide driving directions. In November 2014, Amazon announced Alexa alongside the Echo. In 2016,
Salesforce debuted Einstein, developed from a set of technologies underlying the Salesforce platform. Einstein was replaced by Agentforce, an
agentic AI, in September 2024. In April 2017, Amazon released a service for building
conversational interfaces for any type of virtual assistant or interface.
Large Language Models: 2020s–present

In the 2020s, artificial intelligence (AI) systems like
ChatGPT have gained popularity for their ability to generate human-like responses to text-based conversations. In February 2020, Microsoft introduced its Turing Natural Language Generation (T-NLG), which was then the "largest language model ever published at 17 billion parameters." On November 30, 2022, ChatGPT was launched as a prototype and quickly garnered attention for its detailed responses and articulate answers across many domains of knowledge. The advent of ChatGPT and its introduction to the wider public increased interest and competition in the space. In February 2023, Google began introducing an experimental service called "Bard", based on its
LaMDA program to generate text responses to questions asked based on information gathered from the
web. While ChatGPT and other generalized chatbots based on the latest
generative AI are capable of performing various tasks associated with virtual assistants, there are also more specialized forms of such technology designed to target more specific situations or needs.

Method of interaction