AI may soon eliminate the need for wake words in voice assistants. Researchers at Carnegie Mellon University have developed a machine learning model that can determine the direction a voice is aimed, letting a device infer that it is being addressed without requiring a specific trigger phrase or gesture.
This approach leverages the natural properties of sound as it interacts with the environment. When a speaker faces a device, the direct sound arrives first and is the loudest and clearest signal, while reflections off walls and other objects arrive later, quieter and more muffled. The model also exploits the fact that the frequency content of human speech varies with the direction a speaker is facing: lower frequencies radiate more omnidirectionally, while higher frequencies are attenuated when the speaker turns away.
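To make the frequency cue concrete, here is a minimal Python sketch, not the team's released code, that scores how "on-axis" a sound is by comparing high-band to low-band spectral energy. The band edges, function names, and test signals are all illustrative assumptions; a real system would combine many such cues in a learned model.

```python
import numpy as np

def band_energy(signal, sample_rate, low_hz, high_hz):
    """Sum the spectral power of `signal` between low_hz and high_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return spectrum[(freqs >= low_hz) & (freqs < high_hz)].sum()

def facing_score(signal, sample_rate):
    """Ratio of high-band to low-band energy.

    A higher ratio suggests the speaker is facing the microphone,
    because high frequencies attenuate more when speech is directed
    away. The band edges here are illustrative, not from the paper.
    """
    high = band_energy(signal, sample_rate, 2000, 8000)
    low = band_energy(signal, sample_rate, 100, 1000)
    return high / (low + 1e-12)  # guard against division by zero

# Toy demonstration: a "bright" tone pair stands in for speech aimed
# at the microphone; the same pair with the high tone attenuated
# stands in for speech directed away.
sr = 16000
t = np.arange(sr) / sr
facing = np.sin(2 * np.pi * 300 * t) + 0.8 * np.sin(2 * np.pi * 3000 * t)
away = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
print(facing_score(facing, sr))  # larger ratio
print(facing_score(away, sr))    # smaller ratio
```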
Because the software is “lightweight” enough to run on-device, audio never needs to be sent to the cloud, improving both privacy and efficiency. While a commercial product may not be imminent, the research team has released its code and data so others can build on the work.
Imagine telling a smart speaker to play music without uttering a wake word, and without accidentally activating other devices nearby. Because the approach relies on audio alone rather than camera-based gaze detection, it could also improve user privacy, bringing us closer to the futuristic vision of responsive voice assistants reminiscent of Star Trek.