Can You Hear Me Now? Harnessing AI-Coustics to Combat Noisy Audio Using Generative AI

Noisy interview and speech recordings are a significant challenge for audio engineers. A German startup, AI-coustics, aims to revolutionize audio clarity in video using innovative generative AI technology. The company has emerged from stealth mode with €1.9 million in funding. Co-founder and CEO Fabian Seipel explains that AI-coustics' approach transcends standard noise reduction, ensuring compatibility with any device and speaker.

"Our primary mission is to elevate every digital interaction—whether it's a conference call, a consumer device, or a casual social media video—to the clarity of a professional studio broadcast," Seipel shared in an interview.

Seipel, a trained audio engineer, co-founded AI-coustics in 2021 with Corvin Jaedicke, a lecturer in machine learning at the Technical University of Berlin. Their partnership sprang from shared experiences of subpar audio quality during online courses and tutorials.

“We are personally motivated to address the widespread issue of poor audio quality in digital communication,” Seipel stated. He revealed that his own experiences with hearing difficulties from early music production spurred the duo’s commitment to enhancing speech quality and intelligibility.

The demand for AI-driven noise reduction and voice enhancement software is already considerable. AI-coustics faces competition from companies like Insoundz, which utilizes generative AI to improve streamed and recorded speech, and Veed.io, an editing suite offering background noise removal tools.

However, Seipel asserts that AI-coustics stands out for how it develops its noise-reduction models. The startup trains its model on a collection of speech samples recorded in Berlin. Participants are paid for their contributions, although Seipel did not disclose the rates.

"We have created a unique method to simulate audio challenges—like noise, reverberation, and distortion—during our training process," Seipel explained.
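Seipel did not describe how the simulation works, so the following is only a generic sketch of the idea, not AI-coustics' method. It assumes a common training setup in which clean speech is deliberately degraded and the model learns to recover the clean version. The function names, the synthetic impulse response, and all parameter values are illustrative assumptions.

```python
import numpy as np

def add_noise(clean, snr_db, rng):
    """Mix white noise into the signal at a target signal-to-noise ratio (dB)."""
    noise = rng.standard_normal(clean.shape)
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10^(snr_db/10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def add_reverb(clean, sample_rate, decay_s, rng):
    """Convolve with a synthetic impulse response: exponentially decaying noise."""
    n = int(sample_rate * decay_s)
    t = np.arange(n) / sample_rate
    # exp(-6.9) is about 0.001, so the tail falls roughly 60 dB over decay_s.
    ir = rng.standard_normal(n) * np.exp(-6.9 * t / decay_s)
    ir[0] = 1.0                      # direct sound
    ir /= np.sqrt(np.sum(ir ** 2))   # unit energy keeps the level comparable
    return np.convolve(clean, ir)[: len(clean)]

def add_clipping(signal, threshold):
    """Hard-clip the waveform, a simple stand-in for distortion."""
    return np.clip(signal, -threshold, threshold)

def degrade(clean, sample_rate, rng):
    """Return a (degraded, clean) training pair. All settings are hypothetical."""
    wet = add_reverb(clean, sample_rate=sample_rate, decay_s=0.3, rng=rng)
    noisy = add_noise(wet, snr_db=10.0, rng=rng)
    threshold = 0.8 * np.max(np.abs(noisy))
    return add_clipping(noisy, threshold), clean

# Usage: a one-second 220 Hz tone stands in for a clean speech recording.
rng = np.random.default_rng(0)
sample_rate = 16_000
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
degraded, target = degrade(tone, sample_rate, rng)
```

In a setup like this, a model would be trained to map `degraded` back to `target`. Real pipelines typically use recorded noise and measured room responses rather than the synthetic ones assumed here.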

While some may raise concerns about how the startup compensates its contributors, particularly as the AI model may prove quite profitable, the more pressing issue is bias. Research indicates that speech recognition technologies can perpetuate biases, performing worse for some groups of speakers than for others. A study published in the Proceedings of the National Academy of Sciences found that leading speech recognition systems were twice as likely to misinterpret audio from Black speakers as from white speakers.

To combat this, Seipel emphasizes the importance of recruiting a diverse range of contributors for its speech samples: “Incorporating a wide variety of voices is essential to eliminate bias and ensure our technology works across all languages, speaker identities, ages, accents, and genders.”

To evaluate AI-coustics' performance, I uploaded three distinct video clips—including an interview with an 18th-century farmer—to the platform. The results were impressive; AI-coustics successfully enhanced the audio clarity, significantly reducing background noise.

For example, here’s the original 18th-century farmer clip:

[Embedded clip: original audio]

And here is the enhanced version:

[Embedded clip: enhanced audio]

Seipel envisions AI-coustics' technology being used for both real-time and recorded speech enhancement, with potential applications in devices like soundbars, smartphones, and headphones for automatic clarity improvement. Currently, the company offers a web app, an API for post-processing, and an SDK to integrate AI-coustics into existing workflows and hardware.

At present, AI-coustics has five enterprise clients and around 20,000 users, though not all are paying customers. The company plans to expand its four-person team and further improve its speech enhancement model in the coming months.

“Before securing our initial investment, we operated leanly to navigate the challenges of the VC market,” Seipel shared. “AI-coustics now benefits from a strong network of investors and mentors in Germany and the UK. Our robust technology foundation allows us to adapt across different markets effectively.”

When questioned about the potential job displacement due to advancements like AI-coustics, Seipel highlighted the technology's capability to streamline tedious tasks traditionally handled by human audio engineers.

“Content creators can save both time and money using AI-coustics to automate parts of the audio production process without sacrificing speech quality,” he noted. “Poor speech quality remains a common issue across various consumer and professional devices, as well as in content creation. Our technology has the potential to enhance any application involving speech recording, processing, or transmission.”

The recent funding round, comprising both equity and debt, was led by Connect Ventures, Inovia Capital, FOV Ventures, and Ableton CFO Jan Bohl.
