Hume’s EVI 2 Challenges GPT-4o Advanced Voice Mode with Emotionally Inflected Voice AI and a Developer API

When we last covered Hume, the innovative AI startup co-founded by former Google DeepMind scientist Alan Cowen, it was spring 2024, and the company had just secured $50 million in a Series B funding round to advance its unique voice AI technology.

Hume, named after the 18th-century Scottish philosopher David Hume, leverages cross-cultural voice recordings matched with self-reported emotional surveys to create an AI model that produces lifelike vocal expressions and understands nuances in various languages and dialects.

Recently, Hume released its enhanced Empathic Voice Interface 2 (EVI 2), featuring improvements designed to boost naturalness, emotional responsiveness, and customization while reducing costs for developers and businesses. EVI 2 offers a 40% reduction in latency and is 30% cheaper than its predecessor through the API.

Cowen emphasized the goal of enabling developers to integrate this technology into their applications, allowing for a trusted and personalized user experience. The new design allows voice assistants powered by EVI 2 to operate directly within apps, enhancing user interactions without needing a separate AI assistant.

The timing of EVI 2's launch positions Hume advantageously in a crowded AI market, demonstrating its capabilities ahead of competitors such as Anthropic and OpenAI. While OpenAI’s ChatGPT Advanced Voice Mode, based on the GPT-4o model, remains in limited release, Cowen asserts that EVI 2 excels at detecting and responding to emotion.

EVI 2 is designed for faster, more fluent conversations, boasting sub-second response times and supporting a diverse range of voice customizations. Key advancements include:

- Faster Response Times: EVI 2 reduces latency by 40%, yielding response times between 500 and 800 milliseconds for a more natural conversation flow.

- Emotional Intelligence: By integrating voice and language, EVI 2 can comprehend emotional context, ensuring appropriate and empathetic interactions.

- Customizable Voices: A new voice modulation method allows developers to adjust parameters like pitch and gender, offering versatile voice options without the risks of voice cloning.

- In-Conversation Prompts: Users can dynamically modify the AI's speaking style, fostering more engaging interactions.

- Multilingual Capabilities: EVI 2 currently supports English, with plans to add Spanish, French, and German by the end of 2024. Remarkably, the model has autonomously learned several languages from its data exposure.

Hume AI has also adjusted its pricing for EVI 2 to $0.072 per minute, a 30% decrease from the legacy model's cost. Enterprise users can take advantage of volume discounts, enhancing scalability for high-demand businesses.
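The quoted figures are easy to sanity-check: a 30% decrease to $0.072 per minute implies a legacy rate of roughly $0.103 per minute, and usage-based billing scales linearly with minutes (the 10,000-minute example below is hypothetical):

```python
# Quick check of the EVI 2 pricing figures quoted above.
EVI2_RATE = 0.072          # USD per minute
DISCOUNT = 0.30            # 30% cheaper than the legacy model

# Implied legacy (EVI 1) rate: evi2_rate = legacy_rate * (1 - discount)
legacy_rate = EVI2_RATE / (1 - DISCOUNT)
print(f"Implied legacy rate: ${legacy_rate:.3f}/min")

# Hypothetical monthly bill for an app serving 10,000 minutes of conversation
minutes = 10_000
monthly_cost = minutes * EVI2_RATE
print(f"EVI 2 monthly cost: ${monthly_cost:,.2f}")
```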

EVI 2 is currently available in beta and can be integrated via Hume’s API, with developers able to use the same configuration options as EVI 1 until it is phased out in December 2024.

Overall, EVI 2 embodies Hume AI's commitment to refining user experience through AI, focusing on emotional alignment and responsiveness. Future updates will include expanded language support and seamless integration with other large language models and tools, ensuring developers have a robust resource for their applications.

In addition to EVI 2, Hume AI continues to offer the Expression Measurement API and Custom Models API, enhancing the capabilities for developers engaged in emotionally responsive AI applications.
