GPT-2: 6-month follow-up

We’re releasing the 774 million parameter GPT‑2 language model following the release of our small 124M model in February, the staged release of our medium 355M model in May, and subsequent research with partners and the AI community into the model’s potential for misuse and societal benefit. We’re also releasing an open-source legal agreement to make it easier for organizations to initiate model-sharing partnerships with each other, and are publishing a technical report about our experience in coordinating with the wider AI research community on publication norms.

## Key things we’ve learned

1. Coordination is difficult, but possible. To date, there hasn’t been a public release of a 1558M parameter language model, though multiple organizations have developed the systems to train them, or have publicly discussed how to train larger models. For example, teams from both NLP developer Hugging Face and the Allen Institute for Artificial Intelligence (AI2) with the University of Washington have explicitly adopted staged release approaches similar to ours. Since February, we’ve spoken with more than five groups who have replicated GPT‑2.[A]

2. Humans can be convinced by synthetic text. Research from our partners Sarah Kreps and Miles McCain at Cornell, published in _Foreign Affairs_, says people find GPT‑2 synthetic text samples almost as convincing (72% in one cohort judged the articles to be credible) as real articles from the New York Times (83%).[B] Additionally, research from AI2/UW has shown that news written by a system called “GROVER” can be more plausible than human-written propaganda. These research results make us generally more cautious about releasing language models.

3. Detection isn’t simple. In practice, we expect detectors to need to catch a significant fraction of generations with very few false positives. Malicious actors may use a variety of sampling techniques (including rejection sampling) or fine-tune models to evade detection methods. A deployed system likely needs to be highly accurate (99.9%–99.99%) on a variety of generations. Our research suggests that current ML-based methods achieve only low- to mid-90s accuracy, and that fine-tuning the language models decreases accuracy further. There are promising paths forward (see especially those advocated by the developers of “GROVER”), but it’s a genuinely difficult research problem. We believe that statistical detection of text needs to be supplemented with human judgment and metadata related to the text in order to effectively combat misuse of language models.
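To see why the bar for a deployed detector sits so high, it helps to run some back-of-the-envelope base-rate arithmetic. The sketch below is illustrative only: the platform size (1,000,000 documents), the 1% share of synthetic text, and the simplification that a detector’s “accuracy” applies equally to synthetic and human-written text are all assumptions, not figures from our research, and `flagged_counts` is a hypothetical helper.

```python
def flagged_counts(n_docs: int, synthetic_frac: float, accuracy: float) -> tuple[float, float, float]:
    """Hypothetical helper: estimate what a detector flags on a platform.

    Simplifying assumption: the detector's true-positive rate (catching
    synthetic text) and true-negative rate (passing human text) both equal
    `accuracy`. Returns (true positives, false positives, precision).
    """
    synthetic = n_docs * synthetic_frac      # documents actually generated by a model
    human = n_docs - synthetic               # documents written by people
    true_pos = synthetic * accuracy          # synthetic documents correctly flagged
    false_pos = human * (1 - accuracy)       # human documents wrongly flagged
    precision = true_pos / (true_pos + false_pos)  # share of flags that are correct
    return true_pos, false_pos, precision


# Assumed scenario: 1,000,000 documents, 1% of them synthetic.
for acc in (0.95, 0.999, 0.9999):
    tp, fp, prec = flagged_counts(1_000_000, 0.01, acc)
    print(f"accuracy={acc}: {tp:,.0f} synthetic caught, "
          f"{fp:,.0f} human texts wrongly flagged, precision={prec:.3f}")
```

Under these assumptions, a 95%-accurate detector wrongly flags about 49,500 human-written texts and is correct only about 16% of the time it raises a flag, while 99.99% accuracy cuts false flags to roughly 99 and lifts precision to about 99%. Because synthetic text is assumed to be rare relative to human text, even a small false-positive rate swamps the true detections.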

We’ve partnered with four leading research organizations to analyze both the newly released 774M parameter GPT‑2 model and the unreleased full-size GPT‑2 model. We’ve included some preliminary results from them in our technical report, and their ongoing analysis will factor into the potential release of the 1558M model. We’ve also developed a non-commercial legal agreement to facilitate the sharing of models between organizations and are publishing it here to help others initiate such sharing schemes.

## Future release decisions

Research from these partners will factor into our future release decisions, as will observing how the 774M model is used, and discussing language models with researchers and policymakers to understand the considerations around larger models. As part of our staged release strategy, our current plan is to release the 1558M parameter model in a few months, but it’s plausible that findings from a partner, or malicious usage of our 774M model, could change this.

We think that a combination of staged release and partnership-based model sharing is likely to be a key foundation of responsible publication in AI, particularly in the context of powerful generative models. The issues inherent to large models are going to grow, rather than diminish, over time. We hope that our work on GPT‑2, discussed further in the technical report we’re publishing, will help provide evidence the AI community can draw on when thinking about the publication challenges inherent to some parts of AI research.

OpenAI publishes a blog post and paper on GPT‑2.

Released small parameter (124M) GPT‑2 model.

The Partnership on AI co-hosts a dinner with OpenAI to discuss publication norms, then publishes a blog post summarizing the discussion.

Released medium parameter (355M) model.

Released dataset of outputs from large-scale models.

Released a detection baseline to help people understand how to detect outputs of models like GPT‑2.

The original blog post is updated to reflect these changes.

Adam King launches “TalktoTransformer.com”, giving people an interface to play with the newly released models.

Hugging Face releases a conversational AI demo based on GPT‑2 models, discusses some of the ethical considerations in the release decision, and decides not to release the large GPT‑2 model.

Researchers with the University of Washington and the Allen Institute for AI Research reveal GROVER, a GPT‑2–style language model; they do not release the large versions of the model, and they conduct research into detecting the outputs of such models.

OpenAI testifies before Congress about the implications of synthetic media, including a discussion of synthetic text.

DeepMind discusses GPT‑2 and the importance of appropriate publication norms for generative models in its recent discussion of unsupervised learning.

OpenAI commences a research collaboration with the Partnership on AI on publication norms in AI research. We’re trying to work with a diverse set of AI research organizations to come up with questions scientists may want to ask ahead of publication, and potential frameworks they can use to make publication decisions.

DeepTabNine develops a code autocompleter based on GPT‑2.

Multi-turn Dialogue Response Generation with Autoregressive Transformer Models

GLTR: Statistical Detection and Visualization of Generated Text

Researchers with the Thoughtful Technology Project and the University of Cambridge publish a working paper on “Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning”.

Hello, It’s GPT‑2 – How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems

AI startup AI21 Labs releases HAIM, a neural text generator; they only release a 345M variant of the model, “equivalent in size to the publicly released versions of Grover and GPT‑2.”

NVIDIA Research trains an 8.3 billion parameter GPT‑2 model.

Released larger parameter (774M) model.

[A] Having these conversations is difficult, as it involves talking candidly about proprietary systems, and it’s unclear whom to contact in specific organizations to discuss such models and what the appropriate processes are for inter-org discussion about unreleased research.

[B] These samples were generated via a “human-in-the-loop” process meant to simulate contemporary disinformation operations, where a human generated samples and periodically selected some for exposure to people.
