Empower AI Agents with Voice: Discover Deepgram's Aura Technology

Deepgram has emerged as a leading startup in the voice recognition landscape. Today, the company proudly announced the launch of Aura, its innovative real-time text-to-speech API. Aura merges highly realistic voice models with a low-latency API, enabling developers to create engaging, conversational AI agents. Enhanced by large language models (LLMs), these agents can effectively act as customer service representatives in call centers and various customer-facing roles.
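For a sense of what "real-time text-to-speech API" means in practice, here is a minimal sketch of requesting synthesized speech over HTTP. The endpoint URL, voice model name, environment-variable name, and payload shape are assumptions for illustration only and are not confirmed by this article; consult Deepgram's own documentation for the authoritative API.

```python
# Minimal sketch: ask a hosted TTS API to synthesize a short reply.
# Endpoint, model id, and payload shape are assumptions (illustrative only).
import os
import requests

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # assumed env var name

url = "https://api.deepgram.com/v1/speak"   # assumed endpoint
params = {"model": "aura-asteria-en"}       # assumed voice model id
headers = {
    "Authorization": f"Token {DEEPGRAM_API_KEY}",
    "Content-Type": "application/json",
}
payload = {"text": "Thanks for calling. How can I help you today?"}

response = requests.post(url, params=params, headers=headers, json=payload)
response.raise_for_status()

# The response body is audio; write it to disk for playback.
with open("reply.mp3", "wb") as f:
    f.write(response.content)
```

In a conversational agent, the text passed to the TTS call would typically be the output of an LLM responding to a transcribed customer utterance.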

According to Deepgram co-founder and CEO Scott Stephenson, while high-quality voice models have been available, they often come with hefty price tags and long processing times. In contrast, existing low-latency models can sound robotic. Aura stands out by delivering human-like voice models with rapid response times—typically under half a second—at an affordable rate.

“Today, there's a clear demand for real-time voice AI bots that can listen, understand, and verbally respond,” he explained. He emphasized that a successful product like Aura requires a balance of accuracy (a must-have feature), low latency, and cost-effectiveness, especially in the context of the typically high costs associated with LLMs.

Deepgram says Aura's pricing is highly competitive at $0.015 per 1,000 characters, slightly undercutting most rivals: Google's WaveNet voices and Amazon's Polly Neural voices both cost $0.016 per 1,000 characters, and Amazon's higher pricing tiers sit well above that.
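To put those per-1,000-character rates in concrete terms, the quick arithmetic below scales them to one million characters of synthesized text (the vendor names and rates are exactly those quoted above).

```python
# Back-of-the-envelope cost comparison at the quoted per-1,000-character rates,
# scaled to one million characters of synthesized text.
rates_per_1k_chars = {
    "Deepgram Aura": 0.015,
    "Google WaveNet": 0.016,
    "Amazon Polly Neural": 0.016,
}

characters = 1_000_000
for name, rate in rates_per_1k_chars.items():
    cost = rate * characters / 1_000
    print(f"{name}: ${cost:.2f} per {characters:,} characters")
# Deepgram Aura comes to $15.00 versus $16.00 for the other two.
```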

“To achieve success, we needed to establish a solid price point across various segments alongside exceptional speed and accuracy,” Stephenson detailed. “This was our focus from the outset, which is why we spent four years developing the underlying infrastructure before launching any products.”

Currently, Aura offers approximately a dozen voice models, all meticulously trained using a unique dataset created in collaboration with voice actors. Each Aura model, like all of Deepgram’s offerings, has been developed in-house.

You can try Aura for yourself on Deepgram's website. I have tested it extensively and, while there are occasional unusual pronunciations, the speed of response is particularly impressive, complementing Deepgram's robust speech-to-text capabilities. The company highlights that Aura typically begins speaking in less than 0.3 seconds and that the LLM usually completes its response in just under a second.
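If you want to sanity-check that sub-0.3-second figure yourself, one rough approach is to time how long a streaming HTTP response takes to deliver its first chunk of audio. This sketch reuses the same assumed endpoint and model name as the earlier example and measures time to first byte, which is only an approximation of time to first audible speech.

```python
# Rough time-to-first-audio-byte measurement for a streaming TTS response.
# Endpoint and model id are the same assumptions as in the earlier sketch.
import os
import time
import requests

url = "https://api.deepgram.com/v1/speak"   # assumed endpoint
headers = {
    "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {"text": "Hello! This is a latency test."}

start = time.perf_counter()
with requests.post(url, params={"model": "aura-asteria-en"},
                   headers=headers, json=payload, stream=True) as r:
    r.raise_for_status()
    first_chunk = next(r.iter_content(chunk_size=1024))  # block until audio starts arriving
    print(f"Time to first audio byte: {time.perf_counter() - start:.3f}s")
```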

With Aura, Deepgram is well-positioned to expand its enterprise voice recognition business effectively.
