OpenAI Unveils DALL-E 3 API and Innovative Text-to-Speech Models


Updated on October 23, 2024

OpenAI made waves at its inaugural Developer Day by unveiling several new APIs. At the forefront is the DALL-E 3 API, which follows the model's integration into ChatGPT and Bing Chat. As the successor to DALL-E 2, the DALL-E 3 API includes built-in moderation features designed to prevent misuse.

The DALL-E 3 API offers multiple format and quality options, with resolutions ranging from 1024×1024 to 1792×1024 and prices starting at $0.04 per generated image. For now, however, it offers fewer features than its predecessor, DALL-E 2: it cannot create edited versions of existing images or generate variations of them. In addition, OpenAI automatically rewrites every submitted prompt to add detail and enforce safety, which may make the output less precise depending on the specifics of the request.
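To make the format options concrete, here is a minimal Python sketch that only assembles the JSON body for a DALL-E 3 generation request; it does not send anything. The endpoint path (`/v1/images/generations`), the field names (`model`, `prompt`, `size`, `quality`, `n`), and the size list follow OpenAI's public API reference as we understand it and should be checked against current documentation. The helper name `build_image_request` is hypothetical. Note that the sketch targets only the generation endpoint, since the API offers no edit or variation call for DALL-E 3.

```python
import json

# Output sizes DALL-E 3 accepted at launch (assumed from OpenAI's API reference):
# square, landscape, and portrait.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
DALLE3_QUALITIES = {"standard", "hd"}


def build_image_request(prompt: str, size: str = "1024x1024",
                        quality: str = "standard") -> str:
    """Build a JSON body for POST /v1/images/generations (hypothetical helper)."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size for dall-e-3: {size}")
    if quality not in DALLE3_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    body = {
        "model": "dall-e-3",
        "prompt": prompt,  # OpenAI may rewrite this server-side before generating
        "size": size,
        "quality": quality,
        "n": 1,  # dall-e-3 returns one image per request
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_image_request("a lighthouse at dusk, watercolor", size="1792x1024"))
```

When such a request is actually sent, each item in the response's `data` list carries a `revised_prompt` field alongside the image, which is where the automatic prompt rewriting becomes visible to developers.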

OpenAI also introduced its Audio API, a text-to-speech service with six voices (Alloy, Echo, Fable, Onyx, Nova, and Shimmer) and two generative AI model variants. It is available now, starting at $0.015 per 1,000 characters of input. “This is much more natural than anything else out there, which can make applications easier to interact with and more accessible,” said OpenAI CEO Sam Altman during the announcement. “It also opens up numerous possibilities for language learning and voice assistance.”
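The pricing and voice list lend themselves to a quick sketch. The Python below estimates the cost of a script at the quoted starting rate (a 2,000-character script comes to about $0.03) and assembles, without sending, a JSON body for a speech request. The endpoint path (`/v1/audio/speech`), the field names, and the model identifiers `tts-1` and `tts-1-hd` reflect OpenAI's public API reference as we understand it. The helper names are hypothetical, and the cost function assumes billing is strictly proportional to character count.

```python
import json

# The six voices available at launch, spelled as the API expects them (lowercase).
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
# The two text-to-speech model variants (identifiers assumed from the API reference).
MODELS = ("tts-1", "tts-1-hd")
# Starting price quoted in the article, in USD per 1,000 input characters.
# The higher-quality variant is billed at a higher rate.
PRICE_PER_1K_CHARS_USD = 0.015


def estimate_tts_cost(text: str, price_per_1k: float = PRICE_PER_1K_CHARS_USD) -> float:
    """Estimate the cost of synthesizing `text`, assuming purely per-character billing."""
    return len(text) / 1000 * price_per_1k


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1") -> str:
    """Build a JSON body for POST /v1/audio/speech (hypothetical helper)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    # There is no emotional-tone setting; per the documentation, features of the
    # text itself, such as capitalization, may influence how the voice sounds.
    return json.dumps({"model": model, "voice": voice, "input": text})


if __name__ == "__main__":
    script = "Welcome back! Today we practice ordering coffee in Spanish. " * 40
    print(f"{len(script)} characters, about ${estimate_tts_cost(script):.4f}")
    print(build_speech_request(script[:60], voice="nova"))
```

Keeping cost estimation separate from request building makes it easy to check a budget before any audio is generated.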

However, unlike some other speech synthesis tools, OpenAI’s Audio API offers no control over the emotional tone of the generated audio. The documentation notes that certain text characteristics, such as grammar and capitalization, may influence how the voices sound, though internal tests of this effect have produced mixed results. Developers who use the API are also required to inform users that the audio is generated by artificial intelligence.

Finally, OpenAI released Whisper large-v3, an updated version of its open-source automatic speech recognition model. OpenAI says it delivers improved performance across multiple languages, and it is available on GitHub under a permissive license.
