Knowledge-Based Communication Prosthesis: Enabling Speech for Every Human
Over half a billion people worldwide have physical or cognitive impairments that prevent natural speech. For example, many people with neurological conditions such as cerebral palsy or ALS can see and hear fine and are fully social, but cannot generate speech. They depend on caregivers to painstakingly interpret grunts and gestures in response to prompted guesses. For someone with such a condition, generating an original sentence often means tediously cycling through an alphabet or symbol board, slowly assembling words into sentences.
Computers have enormous potential to augment human communication with speech, as eyeglasses do for vision and hearing aids for hearing. Computers have been able to generate comprehensible speech since the 1970s. The problem is the user interface. Someone who cannot control the muscles required for speech typically cannot operate a keyboard, mouse, or touchpad either. They can usually operate a single switch, with limited speed and accuracy. But they need a better interface than cycling through all the letters on a keyboard.
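To get a feel for why, consider a rough back-of-envelope estimate of unassisted letter scanning. The numbers below are assumptions for illustration, not measurements from any particular system:

```python
# Back-of-envelope estimate of unassisted linear letter scanning.
# All numbers are illustrative assumptions, not measured values.
scan_rate_s = 1.0          # assumed seconds the highlight rests on each item
steps_per_letter = 26 / 2  # on average the target letter is halfway through
letters_per_word = 5       # rough average English word length

seconds_per_word = scan_rate_s * steps_per_letter * letters_per_word
print(f"~{seconds_per_word:.0f} seconds per word")  # ~65: about a minute per word
```

At roughly a minute per word, even a short sentence takes several minutes of sustained effort. That bottleneck is exactly what prediction attacks.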
In 1981, Tom began developing software for this problem, which he called communication prosthesis (today this field is called Augmentative and Alternative Communication, or AAC). At the time there were systems for scanning keyboards, and some offered a small, fixed vocabulary of canned phrases. Tom realized that this problem could benefit from AI, specifically language models that could predict what you are trying to say before you finish typing it out. Today we all experience these AI models in the predictive keyboards on our cellphones, which didn’t exist at the time (according to the patent record, the algorithms for predictive typing on mobile devices weren’t invented until the mid-1990s, and statistical language models weren’t used for another decade).
So Tom built a way for a language model to drive completion of these precious one-button inputs, accelerating the generation of speech for people with cerebral palsy. The system offered words and sentences to complete the input, personalized for each user. It used a simple, personalized grammar and a knowledge base of entities and topics to dynamically suggest menu choices, making it possible for someone operating a one-button input device to produce useful speech in a reasonable time. It came with a personalization interface that let family members and caregivers extend and customize the language model; this was Tom’s first knowledge acquisition tool. The program was deployed on Apple IIe computers, which could almost fit on a wheelchair, and generated speech using the robotic synthesizers of the time. The AI and UI aspects of the project drove research that was published as Tom’s master’s thesis, which called the system “an intelligent communication assistant”.
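To make the idea concrete, here is a minimal Python sketch of this kind of knowledge-driven menu prediction for a one-switch scanner. It is an illustration of the technique, not Tom’s original implementation: the vocabulary, topic knowledge base, and function names are all hypothetical.

```python
# A minimal sketch (not the original 1981 system) of knowledge-driven
# menu prediction for a one-switch scanner. All data is illustrative.
from collections import Counter

# Personalized word counts; in a real system these would be learned from
# the user's own utterances via the caregiver personalization interface.
usage = Counter({"hello": 40, "help": 35, "hungry": 20, "home": 15,
                 "water": 30, "watch": 10, "wheelchair": 25})

# A toy knowledge base mapping topics to entities the user talks about.
topics = {"meals": {"hungry", "water"}, "mobility": {"wheelchair", "home"}}

def suggest(prefix, active_topic=None, k=4):
    """Rank completions of `prefix`, boosting words tied to the active topic."""
    scored = []
    for word, count in usage.items():
        if word.startswith(prefix):
            boost = 2.0 if active_topic and word in topics.get(active_topic, ()) else 1.0
            scored.append((count * boost, word))
    return [w for _, w in sorted(scored, reverse=True)[:k]]

def scan(menu, press_step):
    """Simulate one-switch scanning: the highlight advances automatically,
    and a single button press selects whatever is highlighted at that step."""
    return menu[press_step % len(menu)]

menu = suggest("h", active_topic="meals")
print("menu:", menu)               # ['hungry', 'hello', 'help', 'home']
print("selected:", scan(menu, 0))  # press on the first highlight -> 'hungry'
```

Because the menu is re-ranked with every keystroke and topic shift, a user of a system like this usually finds the intended word within the first few highlights instead of scanning the whole alphabet.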
Today, there are many AAC apps that do word and sentence completion, and a few that use language models. The approach that Tom called “semantic autocomplete” — completing on contextual meaning rather than statistical likelihood — was used again in the early Siri app and is finding its way into interfaces for modern semantic search.
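A toy sketch can show how this differs from frequency-based prediction: rather than ranking candidates by how common they are, a semantic autocompleter ranks them by how well their meaning fits the current context. The phrases and concept tags below are made up for illustration; a modern implementation would score the match with embeddings rather than hand-written tags.

```python
# A toy illustration of semantic autocomplete: rank candidate phrases by
# overlap with the conversational context's concepts, not by frequency.
# Phrases and concept tags are illustrative, hand-written stand-ins for
# what a modern system would derive from embeddings.
phrases = {
    "I'd like a glass of water": {"drink", "request"},
    "Please turn on the TV":     {"device", "request"},
    "I love you":                {"affection"},
    "Can we go outside?":        {"mobility", "request"},
}

def semantic_complete(context_concepts, k=2):
    """Return the k phrases whose concept tags best match the context."""
    ranked = sorted(phrases,
                    key=lambda p: len(phrases[p] & context_concepts),
                    reverse=True)
    return ranked[:k]

# Context: it is mealtime, and the user tends to make requests.
print(semantic_complete({"drink", "request"}))
# ["I'd like a glass of water", 'Please turn on the TV']
```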
The most exciting developments in AAC combine personalized, contextual predictive autocomplete with new high-tech input modalities. Tom is currently on the board of a company called Cognixion, which is the leader in high-tech AAC. They developed an iOS app that uses eye-gaze detection, built on the phone’s facial-recognition cameras, to accelerate the selection of items. The app allows severely disabled individuals to speak in natural speech, or to control a virtual assistant that can operate devices in the user’s home. Even more mind-blowing is their recently announced augmented reality headset for AAC. The system combines active projection of items over the user’s visual field with an EEG-based brain-computer interface that detects intent, allowing users to literally speak with their brains. The dream is becoming real: technology can free us from some of the most disabling conditions, allowing all of us to talk with our friends and family, control our physical environments, and participate in the digital world.