===In-car systems===
Voice commands may be used to initiate phone calls, select radio stations, or play music. Voice-recognition capabilities vary across car makes and models. Some models offer natural-language speech recognition, allowing the driver to use full sentences and common phrases in a conversational style; with such systems, fixed commands are not required.
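The distinction between fixed commands and natural-language input can be illustrated with a minimal sketch of a fixed-command interface (the command table and action names below are hypothetical, not from any real in-car system): the recognizer's text output is normalized and looked up in a small table of known phrases.

```python
def normalize(text: str) -> str:
    """Lowercase and strip punctuation so 'Call Home!' matches 'call home'."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

# Hypothetical fixed-command table: recognized phrase -> action identifier.
COMMANDS = {
    "call home": "dial_contact:home",
    "play music": "media_play",
    "tune to fm": "radio_fm",
}

def dispatch(recognized_text: str) -> str:
    """Return the action for a recognized utterance, or a fallback marker."""
    return COMMANDS.get(normalize(recognized_text), "not_understood")

print(dispatch("Call Home!"))    # dial_contact:home
print(dispatch("open sunroof"))  # not_understood
```

A natural-language system replaces the exact-match lookup with intent classification, so the driver need not memorize the phrase table.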
===Education===
Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech, as distinguished from assessment by a human listener. Also called speech verification, pronunciation evaluation, and pronunciation scoring, the main application of this technology is computer-aided pronunciation teaching (CAPT) when combined with computer-aided instruction for computer-assisted language learning (CALL), speech remediation, or accent reduction. Pronunciation assessment does not determine unknown speech (as in dictation or automatic transcription) but instead compares speech to a reference model of the words spoken, sometimes along with seemingly inconsequential prosody such as intonation, pitch, tempo, rhythm, and stress.

Pronunciation assessment is also used in reading tutoring, for example in products such as Microsoft Teams and Amira Learning. It can also be used to help diagnose and treat speech disorders such as apraxia.

Assessing intelligibility is essential for avoiding inaccuracies from accent bias, especially in high-stakes assessments, from words with multiple correct pronunciations, and from phoneme coding errors in digital pronunciation dictionaries. In 2022, researchers found that some newer speech-to-text systems, based on end-to-end reinforcement learning to map audio signals directly into words, produce word and phrase confidence scores closely correlated with listener intelligibility. In the Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels.
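The idea of comparing speech to a reference model can be sketched with a simplified "goodness of pronunciation" style score (an illustrative technique, not any product's actual algorithm): given per-frame phoneme posterior probabilities from an acoustic model and the expected phoneme for each frame from a forced alignment, each phoneme is scored by its average log posterior, and low scores flag likely mispronunciations.

```python
import math

def phoneme_scores(frame_posteriors, alignment):
    """frame_posteriors: one dict per frame mapping phoneme -> posterior.
    alignment: the expected phoneme label for each frame.
    Returns the average log posterior per distinct phoneme in the alignment."""
    totals, counts = {}, {}
    for post, ph in zip(frame_posteriors, alignment):
        totals[ph] = totals.get(ph, 0.0) + math.log(post[ph])
        counts[ph] = counts.get(ph, 0) + 1
    return {ph: totals[ph] / counts[ph] for ph in totals}

# Toy example: the learner's /th/ frames get low posteriors.
posteriors = [
    {"k": 0.9, "ae": 0.05, "th": 0.05},
    {"k": 0.1, "ae": 0.8,  "th": 0.1},
    {"k": 0.1, "ae": 0.2,  "th": 0.7},
    {"k": 0.4, "ae": 0.5,  "th": 0.1},  # weak /th/ frame
]
alignment = ["k", "ae", "th", "th"]
scores = phoneme_scores(posteriors, alignment)
# /th/ scores well below /k/ and /ae/ here, flagging it for feedback.
```

Real CAPT systems add per-phoneme thresholds and prosodic features on top of such acoustic scores.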
===Health care===
====Medical documentation====
In the health care sector, speech recognition can be implemented in front-end or back-end medical documentation processes. In front-end speech recognition, the provider dictates into a speech-recognition engine, the recognized words are displayed as they are spoken, and the dictator is responsible for editing and signing off on the document. In back-end or deferred speech recognition, the provider dictates into a digital dictation system, the voice is routed through a speech-recognition machine, and the draft document is routed along with the original voice file to an editor, who edits the draft and finalizes the report.

One issue is that the American Recovery and Reinvestment Act of 2009 (ARRA) provides substantial financial benefits to physicians who use an Electronic Health Record (EHR) that complies with "Meaningful Use" standards, which require that substantial data be maintained by the EHR. Speech recognition is more naturally suited to generating narrative text, as part of a radiology or pathology interpretation, progress note, or discharge summary; the ergonomic gains of using speech recognition to enter structured discrete data (e.g., numeric values or codes from a list or a controlled vocabulary) are relatively minimal for people who are sighted and can operate a keyboard and mouse.

A more significant issue is that most EHRs have not been expressly tailored to take advantage of voice-recognition capabilities. A large part of a clinician's interaction with the EHR involves navigating a user interface that depends heavily on keyboard and mouse; voice-based navigation provides only modest ergonomic benefits. By contrast, many highly customized systems for radiology or pathology dictation implement voice "macros", where certain phrases (e.g., "normal report") automatically fill in a large number of default values and/or generate boilerplate, which varies with the type of exam (e.g., a chest X-ray vs. a gastrointestinal contrast series for a radiology system).
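A voice macro of this kind amounts to a lookup from a trigger phrase and exam type to boilerplate text. The sketch below is a hedged illustration with hypothetical phrases and templates, not the behavior of any real dictation product:

```python
# Hypothetical macro table: (trigger phrase, exam type) -> boilerplate.
MACROS = {
    ("normal report", "chest x-ray"): (
        "CHEST X-RAY: The lungs are clear. Heart size is normal. "
        "No acute osseous abnormality."
    ),
    ("normal report", "gi contrast series"): (
        "GI CONTRAST SERIES: Contrast passes freely. "
        "No stricture, ulceration, or filling defect."
    ),
}

def expand_macro(phrase: str, exam_type: str) -> str:
    """Return the boilerplate for a spoken macro phrase; when no macro
    matches, the phrase is treated as ordinary dictated text."""
    return MACROS.get((phrase.lower(), exam_type.lower()), phrase)

text = expand_macro("Normal report", "Chest X-ray")
# Expands to the full chest X-ray boilerplate above.
```

Keying the table on exam type is what lets the same spoken phrase produce different default findings for different studies.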
====Therapeutic use====
Prolonged use of speech-recognition software in conjunction with word processors has shown benefits for short-term memory restrengthening in brain AVM patients who have been treated with resection. Further research is needed to determine the cognitive benefits for individuals whose AVMs have been treated using radiologic techniques.
===Military===
====Aircraft====
Substantial efforts have been devoted to the test and evaluation of speech recognition in fighter aircraft. Of particular note have been the US programme in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), the programme in France for Mirage aircraft, and UK programmes dealing with a variety of aircraft platforms. In these programmes, speech recognizers have been operated successfully, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons-release parameters, and controlling flight displays.

Working with Swedish pilots flying the JAS-39 Gripen, Englund (2004) found that recognition deteriorated with increasing g-loads. The study concluded that adaptation greatly improved the results in all cases and that introducing models for breathing significantly improved recognition scores. Contrary to what might have been expected, no effects of the speakers' broken English were found. Spontaneous speech caused problems for the recognizer, as might have been expected; a restricted vocabulary, and above all a proper syntax, could thus be expected to improve recognition accuracy substantially.

The Eurofighter Typhoon employs a speaker-dependent system, requiring each pilot to create a template. The system is not used for safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for many other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a major benefit in reducing pilot workload, and allows the pilot to assign targets to his own aircraft with two voice commands, or to a wingman with only five commands.

Speaker-independent systems are under test for the F-35 Lightning II (JSF) and the Alenia Aermacchi M-346 Master lead-in fighter trainer. These systems have produced word accuracy scores in excess of 98%.
====Helicopters====
The problems of achieving high recognition accuracy under stress and noise are particularly relevant in the helicopter environment as well as in the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, both because of the high noise levels and because helicopter pilots, in general, do not wear a facemask, which would reduce acoustic noise in the microphone.

Substantial test and evaluation programmes have been carried out, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter. Voice applications include control of communication radios, navigation systems, and an automated target handover system.

The overriding issue for voice is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done both in speech recognition and in overall speech technology in order to consistently achieve performance improvements in operational settings.
====Air traffic control====
Training for air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a trainer to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller that simulates the dialog the controller would have with real pilots. Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as a pseudo-pilot, thus reducing training and support personnel.

In theory, air controller tasks are characterized by highly structured speech as the primary output, reducing the difficulty of the speech recognition task. In practice, this is rarely the case: FAA document 7110.65 details the phrases that should be used by air traffic controllers, and while it gives fewer than 150 examples of such phrases, the number of phrases supported by one simulation vendor's speech recognition system is in excess of 500,000.

The USAF, USMC, US Army, US Navy, and FAA, as well as international ATC training organizations such as the Royal Australian Air Force and the civil aviation authorities in Italy, Brazil, and Canada, use ATC simulators with speech recognition.
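The gap between a small set of example phrases and hundreds of thousands of supported utterances arises because each phraseology template expands across callsigns, altitudes, headings, and so on. The sketch below illustrates the idea with two simplified, unofficial templates (not the actual FAA phraseology grammar): a recognized controller utterance is matched against templates and decomposed into an intent plus slot values.

```python
import re

# Simplified illustrative templates; real phraseology is far richer.
TEMPLATES = [
    (r"(?P<callsign>[a-z]+ \d+),? climb and maintain (?P<altitude>\d+)",
     "climb"),
    (r"(?P<callsign>[a-z]+ \d+),? turn (?P<direction>left|right) heading (?P<heading>\d{3})",
     "turn"),
]

def parse_clearance(utterance: str):
    """Match a recognized utterance against known templates,
    returning (intent, slots) or None when nothing matches."""
    for pattern, intent in TEMPLATES:
        m = re.fullmatch(pattern, utterance.lower())
        if m:
            return intent, m.groupdict()
    return None

result = parse_clearance("Delta 42 climb and maintain 090")
# -> ("climb", {"callsign": "delta 42", "altitude": "090"})
```

A pseudo-pilot simulator would feed the extracted intent and slots to a speech synthesizer that produces the appropriate pilot readback.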
===People with disabilities===
Speech recognition programs can provide many benefits to people with disabilities. For individuals who are deaf or hard of hearing, speech recognition software can be used to generate captions of conversations. Individuals who are blind (see blindness and education) or have low vision can benefit from listening to textual content, as well as gain more functionality from a computer by issuing commands with their voice. Speech recognition is also used in deaf telephony, such as voicemail to text, relay services, and captioned telephone.

The use of voice recognition software, in conjunction with a digital audio recorder and a personal computer running word-processing software, has proven useful for restoring damaged short-term memory capacity in individuals who have had a stroke or undergone a craniotomy. Speech recognition has also proven very useful for those who have difficulty using their hands, due to causes ranging from mild repetitive stress injuries to disabilities that preclude the use of conventional computer input devices; such individuals can use voice commands and transcription to operate electronics hands-free.

Individuals with learning disabilities who struggle with thought-to-paper communication may benefit from the software, but its fallibility remains a significant consideration for many. In addition, speech-to-text technology is an effective aid for those with intellectual disabilities only if proper training and resources are provided (e.g., in the classroom setting). The technology can help those with dyslexia, but the potential benefits for other disabilities are still in question; mistakes made by the software hinder its effectiveness, since misheard words take more time to fix.
===Other domains===
ASR is now commonplace in the field of telephony, where it is predominantly used in contact centers by integrating it with IVR systems. It is becoming more widespread in computer gaming and simulation. Despite the high level of integration with word processing in general personal computing, ASR in the field of document production has not seen the expected increases in use. The improvement of mobile processor speeds has made speech recognition practical in smartphones, where speech is used mostly as part of a user interface, for creating predefined or custom speech commands.

Further application domains include:
* Aerospace (e.g., NASA's Mars Polar Lander used speech recognition technology from Sensory, Inc. in the Mars microphone on the lander)
* Automatic subtitling with speech recognition
* Automatic emotion recognition
* Automatic shot listing in audiovisual production
* Automatic translation
* E-discovery
* Hands-free computing
* Home automation
* Interactive voice response
* Mobile telephony, including mobile email
* Multimodal interaction
* Robotics
* Security, including usage with other biometric scanners for multi-factor authentication
* Speech to text
* Telematics (e.g., vehicle navigation systems)
* Transcription
* Video games, such as ''Tom Clancy's EndWar'' and ''Lifeline''
* Virtual assistants, such as Siri

==Performance==