Thinking for Voice: Context is Queen

When we think about designing and building for voice, we often say that we need to pay attention to the context at each stage of the conversation. There are many aspects to what "context" means, however, and it isn't always clear whether something is (or needs to be) contextual, or how contextual it should be.

Context for the comments

I ran into this sort of issue while working on an Action this week. Users can work on different kinds of files, and the answer to "what can I do" depends on which type of file they're using. Making this help contextual was obvious and straightforward - just provide commands that make sense for this file.

But is that all of the context that we need to be aware of? Possibly not. Some users may not have linked their account, so they only have limited access to some files. Other users may not be permitted to write to the file, just to read from it. Some features may be premium features.

If we give help, do we tell users about everything that is possible? Or just what they have access to?

In a visual world, we could display a large generic help page, covering all possible options available at that time, indicating which ones are restricted or require additional permissions. We might even use this to upsell the premium version while comparing the features available between the various versions.

We don't have that kind of luxury with voice, however.

With voice, we need to deliver our message succinctly, providing what the user needs at that moment, but no more than they want, while also making them aware of what else they can ask for if they wish. It is a delicate balance - too much information can be overwhelming, and users won't retain all of it. Too little, and they will get frustrated that they keep having to ask for more.

When I discussed my dilemma with others, they helped me realize that the message I had been using gave people the impression that they could do more than they actually could at that moment. I had been working with a concept borrowed from "greyed out" menu items, listing everything whether or not it was available, while users took the help as a statement of what they could do in the Action right then.

The difference is important, and making sure we understand what the user expects when they ask the question is important as well.
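To make the "only what they have access to" approach concrete, here is a minimal sketch in plain JavaScript. It is not taken from the Action itself: the command catalogue, the needsWrite and premium flags, and the canWrite and isPremium user properties are all hypothetical names chosen for illustration.

```javascript
// Hypothetical command catalogue; the flags are illustrative assumptions.
const commands = [
  { phrase: "Tell me about this file", needsWrite: false, premium: false },
  { phrase: "Get a value", needsWrite: false, premium: false },
  { phrase: "Add a record", needsWrite: true, premium: false },
  { phrase: "Set a value", needsWrite: true, premium: false },
];

// Keep only the commands this user can actually run right now.
function availableCommands(user, catalogue = commands) {
  return catalogue.filter(
    (c) => (!c.needsWrite || user.canWrite) && (!c.premium || user.isPremium)
  );
}

// Join phrases the way we would say them aloud: "A", "B", or "C".
function speakList(phrases) {
  const quoted = phrases.map((p) => `"${p}"`);
  if (quoted.length <= 1) return quoted.join("");
  if (quoted.length === 2) return quoted.join(" or ");
  return `${quoted.slice(0, -1).join(", ")}, or ${quoted[quoted.length - 1]}`;
}

// Build a short help message that only mentions what is available.
function helpText(user) {
  const phrases = availableCommands(user).map((c) => c.phrase);
  return `You can say things like ${speakList(phrases)}.`;
}
```

A read-only user would hear just the two read commands, while a user with write access would hear all four. The point is that the help never promises something the user can't do at that moment.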

(One question raised is how often users actually do something like this. It is a good question, and one I'll have to delve into another time.)

Crafting Context

So how did I fix my problem? Using the multivocal library, responses are just templates and can be keyed to an Intent, so it was easy to use a different template in each scenario. Multivocal's response configuration can also specify under what criteria the response is valid by evaluating environment settings using Handlebars.

Since I was already setting an isDemo environment setting in the existing User setting, I leveraged this. I made one response valid if the isDemo environment setting was true, and the other if isDemo was false. There are also references to configurations for suggestion chips and a card that links to the website - both of these are used in other responses, so they're just included here.

It looks something like this:

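  Local: {
    und: {
      Response: {
        "Intent.filetype.help": [
          // Shared pieces used by other responses as well
          {Base: {Ref: "Config/Local/und/linkCard"}},
          {Base: {Ref: "Config/Local/und/suggestions"}},
          // Full help when the user is not in demo mode
          {
            Criteria: "{{not User.isDemo}}",
            Template: {
              Text: enHelpText
            }
          },
          // Limited help when the user has a sample file open
          {
            Criteria: "{{User.isDemo}}",
            Template: {
              Text: enHelpDemoText
            }
          }
        ]
      }
    }
  }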

The enHelpText and enHelpDemoText are just JavaScript constants that contain the text for each scenario.

const enHelpText = `
  You currently have a file open.
  You can say things like:
   "Tell me about this file",
   "Add a record",
   "Set a value", or
   "Get a value".
  Visit example.com for more examples and help.
`;

const enHelpDemoText = `
  You currently have a sample file open,
  so your commands are more limited.
  You can say things like
   "Tell me about this file", or
   "Get a value".
  Visit example.com for details and further help.
`;
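For readers who don't use multivocal, the idea of criteria-gated responses can be sketched in plain JavaScript. This is only an illustration of the concept, not how multivocal works internally: the predicate functions stand in for the Handlebars Criteria expressions, and pickResponse, candidates, fullHelp, and demoHelp are hypothetical names.

```javascript
// Stand-in strings; in the Action these would be the full help texts.
const fullHelp = "full help";
const demoHelp = "demo help";

// Each candidate pairs a predicate (standing in for a Criteria
// expression) with the text to use when that predicate holds.
const candidates = [
  { criteria: (env) => !env.User.isDemo, text: fullHelp },
  { criteria: (env) => env.User.isDemo, text: demoHelp },
];

// Return the text of the first candidate whose criteria pass,
// or undefined if none of them do.
function pickResponse(env, list = candidates) {
  const match = list.find((c) => c.criteria(env));
  return match ? match.text : undefined;
}
```

Calling pickResponse with an environment whose User.isDemo is true selects the demo text, and anything else selects the full text. Because the two criteria are mutually exclusive, exactly one response is ever valid for a given user.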

In Conclusion

When thinking for voice, we have to keep context in mind for everything we say to people talking to our assistants. Not just the context of what their question was, but how skilled they are, what permissions they have, and dozens of other contextual factors. Above all, we need to keep this context in mind to make sure our replies guide them towards the information they want, understanding what state they're in, what else they can do, and how they may be able to learn more if necessary.

This post was based on a tweet I sent out, pondering the issue. My thanks to Cathy Pearl, Rebecca E, Jeremy Wilken, and Siddharth Shukla, for their input and discussion on the issue. Do you have thoughts on this? I'd love to hear them on the tweet above!
