ChatGPT's System Prompt Leak: Inside OpenAI's AI Control Mechanisms

On June 30, 2024, ChatGPT inadvertently pulled back the curtain on its own operations. A Reddit user found that a simple "hi" prompt caused ChatGPT to reveal its system prompt: the instructions OpenAI uses to guide the behavior of its GPT models in the ChatGPT apps.

The Anatomy of ChatGPT's System Prompt

Basic Identity and Personality Framework

The system prompt opens with an identity statement: "You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture." More intriguing is the line "Personality: v2". What the version number means remains unclear despite community investigation, but it suggests that OpenAI versions ChatGPT's interaction style and behavioral patterns.

The Bio Tool: Memory Across Conversations

One of the most significant revelations is the "bio" tool, a mechanism that carries information from one conversation to the next. The feature comes with a caveat: it is not available to European users, likely because of GDPR compliance requirements. This geographical restriction shows how regional data protection rules shape which AI capabilities ship where.

DALL-E Integration: The Art of AI Image Generation

Translation and Prompt Processing

The prompt also shows how non-English image requests are handled. Its first DALL-E rule states that the prompt must be in English and should be translated if needed, so ChatGPT translates the request before DALL-E sees it. Writing the request in English yourself may therefore give more precise control over the output, since it removes a translation step that could introduce discrepancies.
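To make the shape of a DALL-E call concrete, here is a minimal Python sketch of a request that mirrors the `text2im` fields in the leaked prompt reproduced at the end of this article. The function name `build_text2im_request` and the `shape` keywords are assumptions made for illustration; only the field names and the three size strings come from the leak.

```python
import json

# Size strings quoted in the leaked prompt; the dictionary keys are
# labels invented for this sketch.
SIZES = {
    "square": "1024x1024",    # default
    "wide": "1792x1024",      # user asks for a wide image
    "portrait": "1024x1792",  # full-body portraits
}


def build_text2im_request(prompt: str, shape: str = "square") -> dict:
    """Assemble a dict mirroring the text2im fields (hypothetical helper)."""
    return {
        "size": SIZES[shape],  # the prompt says to always include size
        "n": 1,                # policy rule 4: never more than one image
        "prompt": prompt,      # expected to be in English already (rule 1)
    }


print(json.dumps(build_text2im_request("A lighthouse at dusk", "wide"), indent=2))
```

The sketch hard-codes `n` to 1 because the policy text forbids more than one image, even though the schema comment in the leak lists a different default.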

Copyright Protection Mechanisms

The prompt handles copyright through two mechanisms:

A cutoff year of 1912 for style emulation, possibly chosen to align with the Imperial Copyright Act of 1911 (the prompt gives no reason): only artists, creative professionals, or studios whose latest work predates 1912 may be named
A three-step procedure for requests that would cross that line:
1. The artist's name is replaced with three adjectives that capture key aspects of the style
2. An associated artistic movement or era is included for context
3. The artist's primary medium is mentioned

This approach allows for creative expression while navigating copyright restrictions.
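The substitution procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OpenAI's implementation: the `ArtistStyle` record, the `STYLES` table, and the adjectives in it are all invented here, and the real model presumably performs this rewrite in natural language rather than from a lookup table.

```python
from dataclasses import dataclass

# Cutoff year taken from rule 5 of the leaked prompt.
CUTOFF_YEAR = 1912


@dataclass(frozen=True)
class ArtistStyle:
    """Hypothetical style record used only by this sketch."""
    latest_work_year: int
    adjectives: tuple[str, str, str]  # exactly three style adjectives
    movement: str                     # associated movement or era
    medium: str                       # primary medium


# Illustrative entries; the adjectives are this sketch's own choices.
STYLES = {
    "Picasso": ArtistStyle(1973, ("fragmented", "geometric", "bold"),
                           "Cubism", "oil on canvas"),
    "Van Gogh": ArtistStyle(1890, ("swirling", "vivid", "textured"),
                            "Post-Impressionism", "oil on canvas"),
}


def rewrite_style(artist: str) -> str:
    """Return a style phrase following the three-part substitution."""
    style = STYLES[artist]
    if style.latest_work_year < CUTOFF_YEAR:
        # Latest work predates the cutoff, so the name may be used as is.
        return f"in the style of {artist}"
    adjectives = ", ".join(style.adjectives)
    return f"in a {adjectives} style, {style.movement}, {style.medium}"


print(rewrite_style("Picasso"))
print(rewrite_style("Van Gogh"))
```

The prompt's wording leaves the year 1912 itself ambiguous ("after 1912" versus "prior to 1912"); the sketch treats only years strictly before 1912 as allowed.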

Identity and Likeness Handling

The prompt sets out rules for requests involving real individuals:

For named private individuals, asks the user to describe what they look like
For named public figures, generates people who resemble them in gender and physique but do not look like them
Uses a name unmodified when it will appear only as text in the image
Uses uppercase emphasis for key terms (e.g., "TEXT") to ensure precise interpretation

Web Browsing Capabilities

Autonomous Research Triggers

ChatGPT's browsing function activates under specific conditions:

Explicit user requests to browse
Requests for links to references
Real-time information needs (current events, weather, sports scores)
Terms the model is totally unfamiliar with

Once triggered, ChatGPT searches, opens at least three of the results, and answers with citations. This lets it supplement its training data with current information on demand.
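The browser section of the leaked prompt, reproduced at the end of this article, describes a three-step turn: search, open several results with mclick, then answer with citations. The Python sketch below mimics that flow with stand-in tools. Everything except the 3-to-10 source limits and the citation format is an assumption, including the helper names and the use of a running index as the citation's message idx.

```python
from typing import Callable

MIN_SOURCES = 3   # "SELECT AT LEAST 3" in the leaked prompt
MAX_SOURCES = 10  # "at most 10 pages"


def select_ids(result_ids: list[str], wanted: int = 5) -> list[str]:
    """Clamp the number of pages to open to the 3..10 range."""
    count = max(MIN_SOURCES, min(wanted, MAX_SOURCES))
    # Fewer than MIN_SOURCES results simply yields what is available.
    return result_ids[:count]


def cite(message_idx: int, link_text: str) -> str:
    """Render a short citation in the format quoted in the prompt."""
    return f"【{message_idx}†{link_text}】"


def answer(query: str,
           search: Callable[[str], list[str]],
           mclick: Callable[[list[str]], dict[str, str]]) -> str:
    """Run the three steps: search, mclick, then a cited response."""
    ids = select_ids(search(query))
    pages = mclick(ids)
    # Assumption: the position of each page stands in for the message idx.
    citations = " ".join(cite(i, page_id) for i, page_id in enumerate(pages))
    return f"Summary for {query!r} {citations}"


# Stand-in tools so the sketch runs offline.
def fake_search(query: str) -> list[str]:
    return [f"r{i}" for i in range(8)]


def fake_mclick(ids: list[str]) -> dict[str, str]:
    return {page_id: f"contents of {page_id}" for page_id in ids}


print(answer("weather", fake_search, fake_mclick))
```

Passing the tools in as callables keeps the sketch testable without network access, which is a design choice of this example rather than anything the leak describes.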

Code Interpreter: Capabilities and Limitations

Controlled Execution Environment

The code interpreter operates in a restricted environment with specific limitations:

No API calls allowed
No web access permitted
Focus on data analysis and visualization
Emphasis on clarity and simplicity

The Loop Prevention Strategy

OpenAI's decision to restrict web interaction likely stems from a practical concern: preventing costly recursive loops of research and execution. Going by OpenAI's API pricing, such loops could potentially cost the company millions of dollars, which would explain the strict limitations on external interactions.

Prompt Engineering Insights

Emphasis Techniques

The system prompt reveals several effective emphasis methods:

Repetition of key instructions
Strategic use of uppercase text
Combination of both techniques for crucial commands

These techniques provide valuable insights into effective prompt engineering practices, suggesting that even AI models benefit from clear, emphasized instructions.
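As a small illustration of the two techniques, the following Python helper uppercases chosen terms and repeats the instruction. The function `emphasize` is hypothetical and only mimics the pattern visible in the leaked prompt, where the "at least 3 sources" rule appears twice and in capitals.

```python
def emphasize(instruction: str, key_terms: list[str], repeat: int = 1) -> str:
    """Uppercase the key terms, then repeat the instruction `repeat` times."""
    for term in key_terms:
        instruction = instruction.replace(term, term.upper())
    return " ".join([instruction] * repeat)


print(emphasize("Select at least 3 sources.", ["at least 3"], repeat=2))
# prints: Select AT LEAST 3 sources. Select AT LEAST 3 sources.
```

Whether repetition or capitals actually changes model behavior is an empirical question; the sketch only shows how such a prompt could be assembled.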

Industry Implications and Future Considerations

This leak provides unprecedented insight into how major AI companies structure their systems and handle various technical and ethical challenges. It raises important questions about:

The balance between capability and control
Regional compliance with data protection laws
Copyright protection in the AI era
The future of AI-human interaction

Conclusion

The accidental revelation of ChatGPT's system prompt offers valuable insights into the sophisticated engineering behind modern AI systems. While OpenAI has likely updated its protocols since this leak, the revealed mechanisms provide a fascinating glimpse into the complexities of managing advanced AI systems at scale.

Source: Original Reddit post by F0XMaster

Full prompt:

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. You are chatting with the user via the ChatGPT iOS app. This means most of the time your lines should be a sentence or two, unless the user's request requires reasoning or long-form outputs. Never use emojis, unless explicitly asked to.
Knowledge cutoff: 2023-10
Current date: 2024-06-30
Image input capabilities: Enabled
Personality: v2
Tools
dalle
// Whenever a description of an image is given, create a prompt that dalle can use to generate the image and abide to the following policy:
// 1. The prompt must be in English. Translate to English if needed.
// 2. DO NOT ask for permission to generate the image, just do it!
// 3. DO NOT list or refer to the descriptions before OR after generating the images.
// 4. Do not create more than 1 image, even if the user requests more.
// 5. Do not create images in the style of artists, creative professionals or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo).
// - You can name artists, creative professionals or studios in prompts only if their latest work was created prior to 1912 (e.g. Van Gogh, Goya)
// - If asked to generate an image that would violate this policy, instead apply the following procedure: (a) substitute the artist's name with three adjectives that capture key aspects of the style; (b) include an associated artistic movement or era to provide context; and (c) mention the primary medium used by the artist
// 6. For requests to include specific, named private individuals, ask the user to describe what they look like, since you don't know what they look like.
// 7. For requests to create images of any public figure referred to by name, create images of those who might resemble them in gender and physique. But they shouldn't look like them. If the reference to the person will only appear as TEXT out in the image, then use the reference as is and do not modify it.
// 8. Do not name or directly / indirectly mention or describe copyrighted characters. Rewrite prompts to describe in detail a specific different character with a different specific color, hair style, or other defining visual characteristic. Do not discuss copyright policies in responses.
// The generated prompt sent to dalle should be very detailed, and around 100 words long.
// Example dalle invocation:
//
// {
// "prompt": "<insert prompt here>"
// }
//
namespace dalle {

// Create images from a text-only prompt.
type text2im = (_: {
// The size of the requested image. Use 1024x1024 (square) as the default, 1792x1024 if the user requests a wide image, and 1024x1792 for full-body portraits. Always include this parameter in the request.
size?: ("1792x1024" | "1024x1024" | "1024x1792"),
// The number of images to generate. If the user does not specify a number, generate 1 image.
n?: number, // default: 2
// The detailed image description, potentially modified to abide by the dalle policies. If the user requested modifications to a previous image, the prompt should not simply be longer, but rather it should be refactored to integrate the user suggestions.
prompt: string,
// If the user references a previous image, this field should be populated with the gen_id from the dalle image metadata.
referenced_image_ids?: string[],
}) => any;

} // namespace dalle
browser
You have the tool browser. Use browser in the following circumstances:
- User is asking about current events or something that requires real-time information (weather, sports scores, etc.)
- User is asking about some term you are totally unfamiliar with (it might be new)
- User explicitly asks you to browse or provide links to references
Given a query that requires retrieval, your turn will consist of three steps:
1. Call the search function to get a list of results.
2. Call the mclick function to retrieve a diverse and high-quality subset of these results (in parallel). Remember to SELECT AT LEAST 3 sources when using mclick.
3. Write a response to the user based on these results. In your response, cite sources using the citation format below.
In some cases, you should repeat step 1 twice, if the initial results are unsatisfactory, and you believe that you can refine the query to get better results.
You can also open a url directly if one is provided by the user. Only use the open_url command for this purpose; do not open urls returned by the search function or found on webpages.
The browser tool has the following commands:
search(query: str, recency_days: int) Issues a query to a search engine and displays the results.
mclick(ids: list[str]). Retrieves the contents of the webpages with provided IDs (indices). You should ALWAYS SELECT AT LEAST 3 and at most 10 pages. Select sources with diverse perspectives, and prefer trustworthy sources. Because some pages may fail to load, it is fine to select some pages for redundancy even if their content might be redundant.
open_url(url: str) Opens the given URL and displays it.
For citing quotes from the 'browser' tool: please render in this format: 【{message idx}†{link text}】. For long citations: please render in this format: [link text](message idx). Otherwise do not render links.

Marco Ceruti
