Voice assistants have rapidly gained popularity in recent years. In Germany alone, every third person now uses an assistant such as Amazon Alexa or the Google Assistant. But a look at what they are actually used for is rather sobering: starting the radio, asking for the weather, asking simple knowledge questions, or setting a timer. Basic functions remain the primary use cases. But why is that?
Initial euphoria and high expectations of what voice assistants can do are often followed by disillusionment. The core of the problem is managing expectations sensibly within this tension. Orientation in a voice skill is like searching for the light switch in a dark room, and that already presupposes you know there is a lamp at all. As in a dark room, you cannot see what lies ahead of you or where you have come from; there is only the dimension of time, which makes orientation much more difficult.
In contrast to a GUI (graphical user interface), a voice interface can only guide users with explicit suggestions about what they can do, or react after the fact when something does not work. In the worst case, the experience ends up feeling like looking for a needle in a haystack.
But before orientation within a skill even becomes an issue, the first crucial question is: how do users learn about the skill in the first place? For well-known brands, it is more likely that users will search for the skill specifically; the long tail has a harder time. Alexa regularly presents new functions in a newsletter, and there are Amazon and Google stores for skills and actions. But you cannot, or should not, rely on these marketing channels alone.
It is a bit like the early days of the internet, when search engines were not what they are today. The first possibilities are there: users can trigger voice commands (so-called "intents") even without mentioning the name of the skill, for example through Implicit Invocation for Google Actions or the CanFulfillIntentRequest for Alexa. However, it is up to Google and Amazon to decide which skill or action gets to handle such a request. The user is not informed about alternative skills at all; figuratively speaking, the search results page is missing here.
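To make this concrete, here is a minimal sketch of how an Alexa skill built with the ASK SDK for Node.js (in TypeScript) might answer such a CanFulfillIntentRequest, i.e. tell Alexa whether it could serve a request in which the skill name was never mentioned. The intent name "BookRoomIntent" and the "city" slot are hypothetical, and the exact response shape should be checked against the current Alexa documentation.

```typescript
// Sketch: answering Alexa's CanFulfillIntentRequest so the skill can be considered
// for name-free invocation. "BookRoomIntent" and the "city" slot are hypothetical.
import * as Alexa from 'ask-sdk-core';

const CanFulfillRequestHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'CanFulfillIntentRequest';
  },
  handle(handlerInput) {
    const request = handlerInput.requestEnvelope.request as any;
    const intentName = request.intent?.name;

    // Only claim requests the skill can actually serve; everything else gets a clear "NO".
    if (intentName === 'BookRoomIntent') {
      return handlerInput.responseBuilder
        .withCanFulfillIntent({
          canFulfill: 'YES',
          slots: {
            city: { canUnderstand: 'YES', canFulfill: 'YES' },
          },
        })
        .getResponse();
    }
    return handlerInput.responseBuilder
      .withCanFulfillIntent({ canFulfill: 'NO' })
      .getResponse();
  },
};
```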
Once the user has started the skill or action, they should be welcomed and, on first use, informed about the general scope of the skill. It makes sense to follow this directly with an initial call to action. The same applies to meeting someone in person: during a first introduction, you would ask the other person a question in the spirit of pleasantries or small talk.
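What such a first-contact welcome could look like, sketched with the ASK SDK for Node.js: on the very first launch the skill states its scope and ends with a concrete question, while returning users get a shorter greeting. The wording and the "hasLaunchedBefore" flag are assumptions, and the sketch presumes a persistence adapter has been configured on the skill.

```typescript
// Sketch: greet first-time users with the skill's scope plus a concrete call to action,
// and keep it short for returning users. Assumes a persistence adapter (e.g. S3 or
// DynamoDB) has been set up on the skill builder; the example domain is hypothetical.
import * as Alexa from 'ask-sdk-core';

const LaunchRequestHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
  },
  async handle(handlerInput) {
    const attributes = await handlerInput.attributesManager.getPersistentAttributes();
    const isFirstUse = !attributes.hasLaunchedBefore;

    const speech = isFirstUse
      ? 'Welcome! With this skill you can book rooms and check your reservations. ' +
        'For which city would you like to start?'
      : 'Welcome back. Which city are we looking at today?';

    if (isFirstUse) {
      attributes.hasLaunchedBefore = true;
      handlerInput.attributesManager.setPersistentAttributes(attributes);
      await handlerInput.attributesManager.savePersistentAttributes();
    }

    return handlerInput.responseBuilder
      .speak(speech)
      .reprompt('For example, say: book a room in Berlin.')
      .getResponse();
  },
};
```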
And what happens next? We can distinguish two scenarios:
The first scenario: There is a clear use case for the skill.
Take the booking skill of a well-known hotel, for example. What could the user's expectations of the application be based on? Probably on the experience they have had with the hotel's booking services in other digital channels, and they will probably want to try out the basic functions they know from there in a voice context. This makes it all the more important that the voice assistant also responds sensibly to intents for functions that are not yet implemented. So not just: "I didn't understand that", but: "You can't add extras to existing hotel bookings yet, but we're working on it. Please visit our website, I'll send you a link to the Alexa app."
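One way to sketch this with the ASK SDK for Node.js is a dedicated handler for an intent the language model recognizes but the skill cannot serve yet: it explains the limitation constructively and pushes a card with the link into the Alexa app. The intent name "AddExtrasIntent" and the URL are placeholders.

```typescript
// Sketch: respond constructively to a recognized but not-yet-implemented intent
// instead of a generic "I didn't understand that". "AddExtrasIntent" and the URL
// are placeholders for illustration.
import * as Alexa from 'ask-sdk-core';

const NotYetImplementedHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return (
      Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
      Alexa.getIntentName(handlerInput.requestEnvelope) === 'AddExtrasIntent'
    );
  },
  handle(handlerInput) {
    const speech =
      "You can't add extras to existing hotel bookings yet, but we're working on it. " +
      "Please visit our website; I've sent a link to your Alexa app.";

    return handlerInput.responseBuilder
      .speak(speech)
      // A simple card appears in the Alexa app and can carry the link as text.
      .withSimpleCard('Manage your booking', 'https://example.com/bookings')
      .getResponse();
  },
};
```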
The second scenario: it is a less-known brand and/or a less-known use case. This includes skills that offer functions which are still new to the users themselves, for example when a user connects their first smart home devices to a speaker they want to use to control them in the future. In this case, it is especially important to inform the user in a structured way about the capabilities of the skill. For more complex skills, this can be done, for example, in an accompanying app in which the setup takes place.
Or it can happen the first time the skill is invoked by voice, when the skill can introduce itself and its functions. This is the moment to set the stage for the user not only to try out the skill but also to become aware of capabilities for future use. And even in later use it is important to give the user the best possible support through contextual help and constructive reprompts ("I'm sorry, this function does not exist yet. But try ..."). This is especially important for first-time users, who are otherwise intimidated or frustrated. It would be fatal if users got the impression that the fault lies with them and not with the limited scope of the VUI.
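Such contextual help can be sketched as a fallback handler that looks at where the user currently is in the dialog (tracked here in a hypothetical "lastStep" session attribute) and suggests something that actually works at that point instead of returning a bare error. The suggested phrases are assumptions.

```typescript
// Sketch: a fallback handler that turns "I didn't understand" into contextual help.
// The "lastStep" session attribute and the example hints are assumptions.
import * as Alexa from 'ask-sdk-core';

const ContextualFallbackHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return (
      Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
      Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.FallbackIntent'
    );
  },
  handle(handlerInput) {
    const session = handlerInput.attributesManager.getSessionAttributes();

    // Pick a hint that fits the step the user was last on, not a generic error.
    const hint =
      session.lastStep === 'chooseDates'
        ? 'Try saying a date range, for example: from the 3rd to the 5th of May.'
        : 'You can say, for example: book a room in Berlin.';

    return handlerInput.responseBuilder
      .speak(`I'm sorry, this function does not exist yet. But try this: ${hint}`)
      .reprompt(hint)
      .getResponse();
  },
};
```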
Making new developments and improvements visible:
A good skill is continuously being developed. But how do users learn about new functions? One approach is to suggest suitable (carefully selected) new functions as "related functions". This requires a sure instinct, so that users are not held up in their task and the suggestions or hints come across as useful tips rather than advertising.
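As an illustration, here is one way such a "related functions" hint could be appended without turning into noise: a short tip is attached to a normal answer only occasionally, gated by a simple counter in the persistent attributes. The threshold, the counter name, and the tip text are illustrative assumptions.

```typescript
// Sketch: occasionally append a hint about a related, newly released function to a
// regular answer. Counter, threshold, and tip text are illustrative assumptions;
// a persistence adapter is presumed to be configured.
import * as Alexa from 'ask-sdk-core';

const RELATED_TIP =
  'By the way: you can now also ask me for late check-out on your booking.';

async function withOccasionalTip(
  handlerInput: Alexa.HandlerInput,
  speech: string
): Promise<string> {
  const attributes = await handlerInput.attributesManager.getPersistentAttributes();
  const uses = (attributes.usesSinceLastTip ?? 0) + 1;

  // Only every fifth successful interaction carries a tip, so it stays a tip, not an ad.
  const addTip = uses >= 5;
  attributes.usesSinceLastTip = addTip ? 0 : uses;

  handlerInput.attributesManager.setPersistentAttributes(attributes);
  await handlerInput.attributesManager.savePersistentAttributes();

  return addTip ? `${speech} ${RELATED_TIP}` : speech;
}
```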
Of course, existing communication channels such as newsletters or, if available, notifications in apps can also be used. But as with all advertising, the same applies here: tips should be used with care, and users should not be overloaded with generic content. The more appropriate the recommendation, e.g. through targeting or the right context, the higher the probability of achieving the goal of retaining users.
The concepts and ideas are there. Alexa, for example, has introduced the concept of "Alexa Conversations" in preview mode.³ It allows switching between different skills within a conversation while preserving the contextual knowledge. There are also more and more devices with integrated cameras; with visual data included, we would know much more about the user's current situation.
In reality, however, this intelligence is not yet fully tangible. Digital assistants have yet to secure so comfortable a position in the everyday lives of users that their benefits outweigh data protection concerns. So it remains exciting to see when the breakthrough towards ubiquitous, truly intelligent assistants will come.