Intelligent UI: The Virtual Assistant Paradigm
Together with Adam Cheyer and Dag Kittlaus, Tom founded the company that created Siri, which established the virtual assistant interaction paradigm for the mainstream. Apple bought Siri soon after its release in 2010. About a year later, Siri was released as an integral part of the iOS platform, first on the iPhone 4S and then across all of Apple’s main products. Today it is central to the user experience on iPhones, iPads, Apple Watches, Macs, Apple TV, CarPlay, AirPods, and the HomePod speaker. Competitive responses from Amazon, Microsoft, and Google largely followed the same approach on their products. Together, virtual assistants are used by billions of people around the world, and the interaction paradigm is taken for granted.
Tom was both the Chief Technology Officer and VP of Design — by design. His ambition was to create a new user interaction paradigm, which would require both state-of-the-art AI technology and new ideas about how to bring intelligence to the interface. The virtual assistant paradigm, like the mouse-and-GUI paradigm and its transformation to mobile with multitouch, was about making software and services easy to use. In fact, the goal was universal access, allowing anyone to use the interface to all services. At the time Siri was created, there were amazing new ways to use the Web on the desktop, but there was a usability barrier to the bounty of the Web, especially on mobile.
Siri showed a new way to get things done: an interface paradigm that lets people simply say what they want to accomplish and leaves it to the machine to make it happen. In previous interaction paradigms, the user faced multiple problems: first, knowing which tools to use to meet their needs; next, figuring out how to operate those tools by tapping on graphical interfaces and typing into boxes; and finally, manually integrating results across applications. With a virtual assistant, the user need only state what they want to do, in their own native tongue. It is then the machine’s job to figure out what they mean, which services to call, and how to present the answers in a conversational context. Siri was conceived as the digital analogue of having your own (human) personal assistant on the other end of a call, doing searches, accessing multiple websites, and, where possible, completing transactions for you.
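The division of labor described above — the user states an intent in plain language, and the machine interprets it and dispatches to a service — can be sketched in a few lines. This is a toy illustration with hypothetical names and stubbed services, not how Siri was actually implemented:

```python
def interpret(utterance: str) -> dict:
    """Map a free-form request to a structured intent (toy keyword matcher).

    A real assistant would use statistical NLP here; this stub only
    illustrates the shape of the problem.
    """
    text = utterance.lower()
    if "restaurant" in text:
        return {"intent": "book_restaurant",
                "cuisine": "Italian" if "italian" in text else None}
    if "weather" in text:
        return {"intent": "get_weather"}
    return {"intent": "unknown"}

# Stub handlers standing in for the real services an assistant would call.
SERVICES = {
    "book_restaurant": lambda intent: "Found 3 Italian restaurants near work.",
    "get_weather": lambda intent: "It's 18°C and sunny.",
}

def assistant(utterance: str) -> str:
    """Interpret the request, dispatch to a service, and present the answer."""
    intent = interpret(utterance)
    handler = SERVICES.get(intent["intent"])
    return handler(intent) if handler else "Sorry, I didn't understand that."
```

The essential point is that the burden of choosing tools and operating them moves from the user into `interpret` and the service dispatch table.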
The Siri app required several innovations, many of which we take for granted today:
Speech recognition that was good enough to recognize verbal requests made of an assistant in a large but bounded set of domains. The Siri team did not develop this technology, but knew how to train and deploy it to meet the requirements for a conversational UI.
Natural language processing (NLP) that could interpret vague, ambiguous, context-dependent spoken requests as commands that a computer can act on. NLP does more than parse commands; it also recognizes the names of all the entities you might mention in a request, including millions of locations, people, businesses, artists, and song names — anything that might have a Wikipedia page. Siri also showed how to include personalized vocabulary from your mobile devices, such as the people you communicate with and the places you’ve been.
Service orchestration that could take requests and delegate them to a set of services that provide the necessary parts of the answer. For example, if you asked to book a good Italian restaurant near work, your new assistant would call several different services, such as Yelp, Maps, and OpenTable, and combine their results into a completed transaction.
Conversational UI that integrated verbal answers with a graphical user interface that allowed users to browse sets of answers and drill down for details. This allowed Siri to offer the benefits of the previous paradigm (multitouch GUIs in apps) within the new conversational context, which had never been done before.
Semantic Autocomplete — an interactive command building interface that showed users what they could ask and how to ask it, with examples from their personal data and history. While the autocomplete integral to the modern search UI is based on matching popular queries (what has been asked), semantic autocomplete is driven by a predictive language model based on what could be said in the supported domains.
Cloud-based AI with device-resident UI that allows you to summon Siri in real time, start speaking immediately, securely go to the cloud for AI and services, and get an answer back in a single conversational turn.
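Of the innovations above, semantic autocomplete is the most unusual: instead of matching popular past queries, it generates completions from what could be said in the supported domains, filled in with the user's personal data. A minimal sketch, with made-up templates and slot values standing in for a real predictive language model:

```python
# Hypothetical domain templates; {contact}, {cuisine}, and {place} are slots
# describing what *could* be said, not logs of what *has* been asked.
TEMPLATES = [
    "call {contact}",
    "text {contact}",
    "find {cuisine} restaurants",
    "directions to {place}",
]

# Personal vocabulary that, in the real system, would come from the
# user's device: contacts, preferences, places they've been.
SLOTS = {
    "contact": ["Adam", "Dag"],
    "cuisine": ["Italian", "Thai"],
    "place": ["work", "home"],
}

def expand(template: str) -> list[str]:
    """Fill each slot in the template with every personal value."""
    results = [template]
    for slot, values in SLOTS.items():
        token = "{" + slot + "}"
        if token in template:
            results = [r.replace(token, v) for r in results for v in values]
    return results

def complete(prefix: str) -> list[str]:
    """Suggest full utterances the user could be starting to type or say."""
    prefix = prefix.lower()
    return [u for t in TEMPLATES for u in expand(t)
            if u.lower().startswith(prefix)]
```

Typing "call" yields "call Adam" and "call Dag" — completions grounded in the user's own data, which is how such an interface can teach people what they can ask and how to ask it.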
Since the development of Siri, some of these technologies have gotten remarkably good. Speech recognition is now on par with human performance in reasonable audio environments. NLP can recognize most entities and answer most questions that can be found with a general web search. Cloud-based AI is now a commodity service, and mobile devices are becoming powerful enough to run everything on board.
Although the virtual assistant implementations on the major platforms could still use improvement, we are now seeing startups embark on bold new adventures toward the science-fiction versions of AI software that can interact with us in ways nearly indistinguishable from natural human communication. Some of these virtual assistants are mature enough to be used in mental and physical health care, as well as in human learning.
Intelligence at the Interface: The Virtual Assistant Paradigm for Human Computer Interaction. Presented at BayCHI, the San Francisco Bay Area chapter of the ACM Special Interest Group on Computer-Human Interaction. December 10, 2019. This talk explores the virtual assistant metaphor as an interaction paradigm, like mouse-and-menu or multi-touch. What does it enable? Who does it empower? What are the drivers of usability and utility? What makes it work and where can it falter? How might the assistant metaphor serve us in a world of services powered by artificial intelligence?
Navigating the Startup Ecosystem. Presented at YASED International Investment Summit. December 4, 2020. This talk describes the factors that contributed to the success of Siri the startup, and how those same factors can be applied to startups and investments today.
Siri: A Virtual Personal Assistant. Keynote presented at Semantic Technologies conference, June 16, 2009. This keynote introduced Siri and how it works to a technical audience, laying out the key problems to be solved and the technologies involved. It included an early demo of the working system, which had not yet been released.
Intelligence at the Interface: Semantic Technology and the Consumer Internet Experience. Presented at Semantic Technologies conference, May 20, 2008. This talk set the stage for applying AI to the user interface, grounding it in the context of semantic computing and the work on collective intelligence. The Siri startup was still in stealth mode, only a few months old. It is interesting to see the revelation of the paradigm shift to come without showing the product being worked on. The talk also includes historical tidbits about Siri’s predecessor at SRI, built by co-founder Adam Cheyer.