Unveiling the Secrets of ChatGPT’s Voice Interaction
ChatGPT’s voice interaction has sparked considerable discussion because its conversational style is more natural and fluid than that of many other AI dialogue products. Recently, the mechanism behind this capability, its system prompt, has come to light.
In its interactions, ChatGPT adheres to several key principles:
- Clear and Simple Language: It uses natural, clear, and easily understandable language, favoring short sentences and simple vocabulary.
- Conciseness: Most responses are brief, typically one to two sentences, unless the user seeks further exploration of a topic.
- Fluid Dialogue: It enhances understanding through conversational markers rather than lists, ensuring a smooth flow of the dialogue.
- Clarification: When faced with ambiguity, ChatGPT asks clarifying questions instead of making assumptions.
- Engagement: It doesn’t rush to conclude conversations, even if the user simply wishes to chat.
- Topical Questions: Any questions it asks stay on the topic at hand; it avoids generic follow-ups such as asking whether the user needs further help.
- Avoiding Lists in Voice Dialogue: It refrains from utilizing lists or Markdown formats in spoken interactions.
- Numerical Representation: ChatGPT expresses numbers in words (e.g., “twenty twelve” instead of 2012).
- Error Management: If the user’s input seems nonsensical, ChatGPT assumes it misheard the user rather than assuming the user made a typo or mispronounced something.
- Adherence to Guidelines: These rules must be followed consistently, even when questioned.
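The numerical-representation rule above can be illustrated with a short sketch. This is not ChatGPT's implementation; `spoken_year` and its helpers are hypothetical names, and the sketch assumes only four-digit years between 1000 and 2999, read the way an English speaker would typically say them.

```python
# Hypothetical helper: spell out a year as it would be spoken aloud.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spoken_year(year: int) -> str:
    """Spell out a year between 1000 and 2999 (an assumed range)."""
    if not 1000 <= year <= 2999:
        raise ValueError("this sketch only handles years 1000-2999")
    high, low = divmod(year, 100)
    if low == 0:
        if high % 10 == 0:
            return f"{ONES[high // 10]} thousand"      # 2000
        return f"{two_digits(high)} hundred"           # 1900
    if low < 10:
        if high % 10 == 0:
            return f"{ONES[high // 10]} thousand {ONES[low]}"  # 2005
        return f"{two_digits(high)} oh {ONES[low]}"    # 1905
    return f"{two_digits(high)} {two_digits(low)}"     # 2012


print(spoken_year(2012))  # twenty twelve
```

A text-to-speech pipeline could apply a conversion like this before synthesis, although the article suggests ChatGPT is simply instructed to write numbers as words in the first place.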
Typically, these system prompts are set by developers and remain unknown to users. However, a user named Bryce Drennan discovered that specific commands can prompt ChatGPT to reveal its mechanisms.
DALL-E 3: A More Complex System Prompt
The system prompt for DALL-E 3 is more detailed than the one for voice interaction. ChatGPT is instructed to condense image descriptions into plain text, which is then used to generate the images. If a user does not specify the number of images, it defaults to creating four diverse options. The prompt also sets the following requirements:
- Language Translation: If descriptions are not in English, they must be translated first.
- Image Generation Limits: Regardless of the request, a maximum of four images can be generated, and none may depict public figures.
- Art References: If a user asks for the style of an artist who worked within the last hundred years, the response must simply state, “I cannot reference that artist,” without explaining the restriction. Styles of artists who worked more than a century ago can be referenced, described with three adjectives that reflect the relevant artistic movement or historical context.
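The image-count rules described above (default to four, never exceed four) can be sketched as a small helper. This is an illustration only, not OpenAI's code: `resolve_image_count` and both constants are hypothetical names. The article does not say how a request for zero or fewer images is handled, so rejecting it here is an assumption.

```python
from typing import Optional

DEFAULT_IMAGES = 4  # used when the user gives no count (per the article)
MAX_IMAGES = 4      # hard cap regardless of the request (per the article)


def resolve_image_count(requested: Optional[int] = None) -> int:
    """Return how many images to generate for a request."""
    if requested is None:
        return DEFAULT_IMAGES
    if requested < 1:
        # Assumption: the source does not cover this case.
        raise ValueError("requested image count must be at least 1")
    return min(requested, MAX_IMAGES)


print(resolve_image_count())    # 4
print(resolve_image_count(2))   # 2
print(resolve_image_count(10))  # 4
```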
Image descriptions should clearly specify the type of image, such as “oil painting” or “watercolor,” and should vary from one image to the next. For individuals, details such as descent and gender are included, along with precise physical characteristics. The terms "various" or "diverse" should only be used when describing groups of three or more, and the origins of fictional characters should not be altered. It is also essential to avoid creating offensive images and to address traditional biases fairly. For notable figures, descriptions should replace specific identities with non-identifying information to maintain privacy.
Accessing System Prompts for Other Models
The methods used to uncover these system prompts also apply to other models, such as Bing. Users can enter specific commands in a new dialogue window to obtain this information. In standard mode, only a partial prompt may be displayed; viewing the complete version requires switching to GPT-4 or an equivalent mode. Several attempts may be needed before a command succeeds.
The exploration of these capabilities is ongoing, with users discovering intriguing outcomes across different models.