Prompt Caching in the API

Many developers use the same context repeatedly across multiple API calls when building AI applications, like when making edits to a codebase or having long, multi-turn conversations with a chatbot. Today, we’re introducing Prompt Caching, allowing developers to reduce costs and latency. By reusing recently seen input tokens, developers can get a 50% discount and faster prompt processing times.

## Prompt Caching Availability & Pricing

Starting today, Prompt Caching is automatically applied on the latest versions of GPT‑4o, GPT‑4o mini, o1‑preview and o1‑mini, as well as fine-tuned versions of those models. Cached prompts are offered at a discount compared to uncached prompts.

Here's an overview of pricing:

| | Uncached Input Tokens | Cached Input Tokens | Output Tokens | | --- | --- | --- | --- | | GPT‑4o | | | | | gpt-4o-2024-08-06 | $2.50 | $1.25 | $10.00 | | GPT‑4o fine-tuning | $3.75 | $1.875 | $15.00 | | GPT‑4o mini | | | | | gpt-4o-mini-2024-07-18 | $0.15 | $0.075 | $0.60 | | GPT‑4o mini fine-tuning | $0.30 | $0.15 | $1.20 | | o1 | | | | | o1‑preview | $15.00 | $7.50 | $60.00 | | o1 mini | $3.00 | $1.50 | $12.00 |

## Monitoring Cache Usage

API calls to supported models will automatically benefit from Prompt Caching on prompts longer than 1,024 tokens. The API caches the longest prefix of a prompt that has been previously computed, starting at 1,024 tokens and increasing in 128-token increments. If you reuse prompts with common prefixes, we will automatically apply the Prompt Caching discount without requiring you to make any changes to your API integration.

Requests using Prompt Caching have a `cached_tokens` value within the `usage` field in the API response:

`1usage: {2 total_tokens: 2306,3 prompt_tokens: 2006,4 completion_tokens: 300,5 6 prompt_tokens_details: {7 cached_tokens: 1920,8 audio_tokens: 0,9 },10 completion_tokens_details: {11 reasoning_tokens: 0,12 audio_tokens: 0,13 }14}`

Caches are typically cleared after 5-10 minutes of inactivity and are always removed within one hour of the cache's last use. As with all API services, Prompt Caching is subject to our Enterprise privacy⁠ commitments. Prompt caches are not shared between organizations.

Prompt Caching is one of a variety of tools for developers to scale their applications in production while balancing performance, cost and latency. For more information, check out the Prompt Caching docs⁠(opens in a new window).

Our Research * Research Index * Research Overview * Research Residency * OpenAI for Science * Economic Research

Latest Advancements * GPT-5.3 Instant * GPT-5.3-Codex * GPT-5 * Codex

Safety * Safety Approach * Security & Privacy * Trust & Transparency

ChatGPT * Explore ChatGPT(opens in a new window) * Business * Enterprise * Education * Pricing(opens in a new window) * Download(opens in a new window)

Sora * Sora Overview * Features * Pricing * Sora log in(opens in a new window)

API Platform * Platform Overview * Pricing * API log in(opens in a new window) * Documentation(opens in a new window) * Developer Forum(opens in a new window)

For Business * Business Overview * Solutions * Contact Sales

Company * About Us * Our Charter * Foundation * Careers * Brand

Support * Help Center(opens in a new window)

More * News * Stories * Livestreams * Podcast * RSS

Terms & Policies * Terms of Use * Privacy Policy * Other Policies

(opens in a new window)(opens in a new window)(opens in a new window)(opens in a new window)(opens in a new window)(opens in a new window)(opens in a new window)

English United States

Prompt Caching in the API

The unpaid, unrecognised burden of the women-led care economy of India

Andrej Karpathy Transitions from Coding to Directing AI Agents

Musk and Hassabis Discuss AI's Impact on Scientific Discovery

Perfios Reports 46% Profit Increase to ₹104 Cr in FY25, Revenue Surpasses ₹700 Cr

Latest Briefs