How to Get Siri and Alexa to Understand What You're Saying
If talking out loud to Alexa (or your digital assistant of choice) feels unnatural, you’re not alone. I’ve had Siri for as long as she’s been alive, yet I can count on one hand the number of times I’ve spoken to her. It’s always seemed easier to open an app on my iPhone or type in a Google query and get exactly what I’m looking for, rather than navigating a line of verbal questioning that eventually leads to my desired answer—or not.
It’s also hard to get past the weirdness of speaking to an inanimate object. Though at least one in six US adults uses a voice assistant to order takeout, set morning alarms, control thermostats, and read the weather report, many of us still struggle with the idea of treating our voice assistants like real, live beings capable of human-like conversation, especially because they exist only as little black boxes (figuratively and, sometimes, literally). In fact, nearly 60 percent of Americans say they change the way they speak when talking to a robot.
With that in mind, we spoke with experts in speech recognition, language processing, and machine learning to determine exactly how we should be interacting with our voice assistants (Alexa, Google Assistant, and Siri) to get the information we want and avoid the dreaded “Sorry, I can’t help with that.” With practice, my own queries have become more like one side of a conversation than a shouting match. Here’s what you need to know:
Allow a little time for your request to be processed
Machines first take the sounds of our speech and translate them into words, similar to dictation. But they can’t actually do anything with those words unless they can make sense of the transcript. Humans can write down sounds we hear in different languages with a reasonable level of accuracy, but that doesn’t mean we can understand their meaning.
That’s what makes voice assistants appear “smart”—they process and understand natural human speech and respond accordingly. But this is actually a pretty simple task, says Candy Sidner, professor of computer science at Worcester Polytechnic Institute.
“[Voice assistants] are essentially programmed to do certain kinds of things, so they are breaking down utterances presented to them and then doing a search on the web,” she says.
Sidner adds that there’s always a gap between the end of a question and a machine’s response to account for processing time, especially when it has to understand speech; typing a query straight into Google doesn’t require that extra step. Make your questions as specific as possible to get the best single result, and give the assistant an opportunity to retrieve and relay an answer before following up or assuming Siri misheard you.
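For the curious, the two steps described above (turn sound into words, then make sense of the words) can be pictured as a tiny pipeline. This is only a toy sketch, not how any real assistant is built: the function names `transcribe`, `understand`, and `handle` are made up for illustration, and the "audio" here is already text so the example can run without a microphone.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    action: str  # what the assistant decides to do
    query: str   # the details it pulled out of the request


def transcribe(audio: str) -> str:
    # Stand-in for speech recognition. Assumption: the "audio" is
    # already text, so this step only tidies it up.
    return audio.strip().lower()


def understand(transcript: str) -> Intent:
    # Break the utterance into words and look for a known command.
    words = transcript.rstrip("?.!").split()
    if words[:3] == ["set", "an", "alarm"]:
        return Intent("set_alarm", " ".join(words[3:]))
    # Anything unrecognized falls back to a web search.
    return Intent("web_search", " ".join(words))


def handle(audio: str) -> Intent:
    # Step 1: sounds to words. Step 2: words to meaning.
    return understand(transcribe(audio))


print(handle("Set an alarm for 7 am"))
print(handle("Who wrote Dracula?"))
```

Even in this simplified form, the second step only runs once the first has finished, which is one reason a spoken request takes a beat longer than a typed one.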
Speak to Alexa like she’s your friend
Voice assistants are trained using human speech patterns. This means that talking at a higher volume or a slower pace, over-enunciating your words, or over-simplifying your questions will actually make your queries less successful, not more. Pretend Alexa, Siri, and Google Assistant are people sitting next to you, not voices on inanimate devices, and they’ll be more likely to process your queries correctly.
“When the system doesn’t understand, people tend to speak in robot-speak and get louder and more crisp, which is funny because the data is built on real, natural human speech,” says Cathy Pearl, head of conversation design outreach at Google. “The data model is really most accessible when you speak the most naturally, not yelling or enunciating too much.”
Don’t try to cover up your accent
Experts say that voice assistants are surprisingly responsive to users’ accents—if they’ve been trained using human speech particular to a language or region.
“The reason that speech recognition works as well as it does today is because we have years and years of anonymized, actual spoken utterances—things people have said,” Pearl says. “We have to think about different ways that people talk about and interact with the world when we localize to different countries.”
It might be more difficult for a voice assistant to parse speech from a non-native or non-American English speaker than to differentiate between users from New York and Alabama, but all three assistants offer several English accent options. If your device has a setting for your accent (British English, for example), you can switch to that mode for more accurate processing. In general, even without a special setting, you’ll get the best results if you speak naturally.
Alexa, Siri, and Google Assistant can understand different languages, too, if you configure them that way. Language support varies by assistant and is fairly limited. For example, Alexa speaks English (with five accent options) as well as Japanese and German. Google Assistant has several configurations available on enabled phones and tablets, and Google expects to have more than 30 languages enabled by the end of this year. Siri supports 20 languages, with a number of additional dialects in several of those languages.
Be willing to reword or repeat yourself
It’s easy to get annoyed when your voice assistant doesn’t understand your question the first time you ask, but humans aren’t always great at this, either.
“One thing a system could do if it doesn’t understand is to say ‘I didn’t get that. Can you say it a different way?’” says Alexander Rudnicky, professor emeritus at Carnegie Mellon University’s Language Technologies Institute. “If you’re a human, then that’s a reasonable way to try to work with a system. Just say it in another way.”
Where a human is likely to respond with “huh” or “what” or a blank stare, your assistant will at least acknowledge your request and say “sorry” when it needs more information, doesn’t understand you, can’t retrieve an answer, or hasn’t been trained on certain phrases or types of questions.
While voice assistants don’t require users to stick to a script, they may misinterpret a request or take incorrect action because of how a user phrases his or her question. For example, if you say to Google Assistant, “Play the new Jason Derulo song Colors,” she may recognize the artist first rather than the song and respond with, “Alright, here’s Jason Derulo on Spotify,” which isn’t exactly what you were asking for. If you reword the request to “Play Colors by Jason Derulo,” the response is, “Colors by Jason Derulo, sure. Playing on Spotify.”
Voice assistants generally respond best to simple, direct, and specific requests, so if you find that your device isn’t doing what you ask, try rephrasing your query.
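To see why wording can matter so much, here is a minimal sketch of a pattern-based request parser. It is an illustration under stated assumptions, not a description of how Google Assistant actually works: the pattern `PLAY_TITLE_BY_ARTIST`, the `KNOWN_ARTISTS` list, and the function `parse_play_request` are all hypothetical.

```python
import re

# Hypothetical pattern: "play <title> by <artist>"
PLAY_TITLE_BY_ARTIST = re.compile(
    r"^play (?P<title>.+) by (?P<artist>.+)$", re.IGNORECASE
)

# Hypothetical list of artists the toy parser knows about
KNOWN_ARTISTS = {"jason derulo"}


def parse_play_request(utterance: str) -> dict:
    text = utterance.strip().rstrip(".!?")

    # A direct "play X by Y" request names both song and artist.
    match = PLAY_TITLE_BY_ARTIST.match(text)
    if match:
        return {
            "intent": "play_song",
            "title": match.group("title"),
            "artist": match.group("artist"),
        }

    # Otherwise the parser latches onto the first thing it recognizes,
    # here an artist's name, and ignores the rest of the sentence.
    lowered = text.lower()
    for artist in KNOWN_ARTISTS:
        if artist in lowered:
            return {"intent": "play_artist", "artist": artist}

    return {"intent": "unknown"}


print(parse_play_request("Play the new Jason Derulo song Colors"))
print(parse_play_request("Play Colors by Jason Derulo"))
```

In this toy version, the first request plays the artist and the second plays the song, mirroring the two responses in the example above. Real assistants rely on far more sophisticated models, but the lesson carries over: a request phrased in a simple, expected shape leaves less room for misreading.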
Don’t expect complex or nuanced responses
Experts agree that although voice assistants are pretty good at responding to simple questions and getting to know basic user preferences, they lack the ability to understand context the way humans can. For example, Pearl explains, if you asked your best friend, “Who was at the party last night?” she or he would give you a different answer than a person you don’t know that well.
When a voice assistant can’t grasp context, it generally won’t be able to respond appropriately. If you ask Google Assistant, “Is Paddington 2 on Netflix yet?” she will say, “My apologies…I don’t understand.” In this case, the word “on” has multiple interpretations, Pearl says. If the user instead requests a specific action—“Can I stream Paddington 2 on Netflix?”—the context is clear and the assistant responds with “I looked for Paddington 2 on Netflix, but it either isn’t available or can’t be played right now.”
Although voice assistants can control our smart home devices, play music, give a weather report, and request an Uber, they have a lot left to learn about human conversation.
“In some ways these assistants are really smart,” Pearl says. “They know a lot of facts. But in some ways they’re very dumb. They don’t have a lot of common sense about how the world works.”