Project Gutenberg Releases 5,000 Free Audiobooks Made with Synthetic Speech

Open book repository Project Gutenberg has rapidly transformed thousands of titles into audiobooks using synthetic speech technology, now available for download or streaming across various platforms. While the selection is somewhat eclectic—reflecting the archive’s unique character—it represents a significant leap forward in making literature more accessible to all.

Traditionally, producing an audiobook requires substantial time and resources, including compensation for narrators, editing, and publishing. This often makes it financially unfeasible to create audiobooks for older and lesser-known titles, leaving many readers who prefer audio formats without options.

Project Gutenberg is committed to disseminating public domain literature in multiple formats, and addressing this gap has likely been on their agenda for years. However, it was not until their collaboration with MIT and Microsoft that they harnessed the power of AI-generated speech to breathe life into these books.

One challenge the team faced was that the Project Gutenberg archive is not consistently formatted. Its files come from many sources and often contain errors introduced by imperfect optical character recognition. Volunteers have done commendable editing work, but even well-edited files can trip up automated reading systems, which often narrate page numbers, footnotes, and other non-essential elements.

“Each e-book on Project Gutenberg has its unique HTML format filled with elements you wouldn’t want to hear, like tables and indices. The most challenging aspect was extracting the relevant text for narration,” explained Mark Hamilton, co-lead of the project, who is affiliated with both Microsoft and MIT.
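To make the extraction problem concrete, here is a minimal sketch in Python of pulling narratable text out of a book's HTML while skipping tables. It is not the team's parser, which the article does not describe; the `SKIP_TAGS` list and the function name `extract_narration_text` are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumption: a hypothetical list of tags whose contents should not be read aloud.
SKIP_TAGS = {"table", "script", "style"}


class NarrationTextExtractor(HTMLParser):
    """Collects text that appears outside any tag listed in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # how many skipped elements we are currently inside
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_narration_text(html):
    """Return the text of `html` with the contents of SKIP_TAGS removed."""
    parser = NarrationTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A fixed skip list like this only works when every book marks up its tables and indices the same way, which is exactly what Hamilton says the archive does not do.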

To tackle this, the team created a system that organized the archive by identifying similarly formatted book files and determining which clusters were best suited for automatic reading. This initial batch is indeed varied; for instance, it includes only one Dickens novel—the unfinished “Edwin Drood”—yet features a dozen volumes titled “Notes and Queries, Number 176, March 12, 1853: A Medium of Inter-communication for Literary Men, Artists, Antiquaries, Genealogists, etc.”
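The idea of grouping similarly formatted files can be illustrated with a short Python sketch. The article does not say how the team measured similarity, so this example substitutes a deliberately crude stand-in: two books land in the same group only if they use exactly the same set of HTML tags. The names `format_signature` and `group_by_format` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the set of tag names that appear in a document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def format_signature(html):
    # Assumption: the sorted set of tag names stands in for a book's "format".
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return tuple(sorted(collector.tags))


def group_by_format(books):
    """Map each format signature to the titles that share it.

    `books` is a dict of title -> HTML string.
    """
    groups = defaultdict(list)
    for title, html in books.items():
        groups[format_signature(html)].append(title)
    return dict(groups)
```

Once books are grouped this way, a parser written for one member of a group has a reasonable chance of working on the rest, which is why the first batch reflects what the parser could handle rather than which titles are best known.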

“We selected the books for the first batch based on the capabilities of our automated parser,” Hamilton added. “While we did our best, some notable titles didn’t make the cut. Now that we’ve released this first wave, we’re focused on refining the system to include more of the 60,000 books in future updates.”

For the narration, the team used machine learning and synthetic speech technologies, which have grown markedly more sophisticated in recent years. Automated audiobook production at scale had long been anticipated, and it is now a reality.

The project’s approach to creating engaging audiobooks involves an automatic speaker and emotion inference system, which dynamically adjusts the reading voice and tone based on contextual cues. This technique enriches passages with lively character interactions and emotional dialogue. Initially, the text is segmented into narration and dialogue, and the speaker for each dialogue section is identified. The system then predicts the emotion of each character’s dialogue in a self-supervised manner. Lastly, distinct voices and emotional tones are assigned to the narrator and the characters using a multi-style and context-based neural text-to-speech model.
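The first step of that pipeline, splitting text into narration and dialogue, can be sketched in a few lines of Python. This is an illustrative simplification that treats anything inside double quotation marks as dialogue; it is not the project's segmenter, and it does not attempt the later speaker or emotion steps. The function name `segment` and the `QUOTE_PATTERN` regex are assumptions.

```python
import re

# Assumption: dialogue is whatever sits between curly or straight double quotes.
QUOTE_PATTERN = re.compile(r'“([^”]*)”|"([^"]*)"')


def segment(text):
    """Split `text` into ("narration", ...) and ("dialogue", ...) pairs, in order."""
    segments = []
    pos = 0
    for match in QUOTE_PATTERN.finditer(text):
        # Text between the previous quote and this one is narration.
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("narration", before))
        # Exactly one of the two groups matched, depending on the quote style.
        spoken = match.group(1) if match.group(1) is not None else match.group(2)
        if spoken.strip():
            segments.append(("dialogue", spoken.strip()))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(("narration", rest))
    return segments
```

A rule this simple breaks on nested quotations and on speeches that run across paragraphs, which hints at why a real system needs more than pattern matching before it can assign voices.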

Listeners can enjoy the first 5,000 or so audiobooks for free on Spotify, Apple Podcasts, and the Internet Archive, and the project’s code is documented on GitHub.
