Context Window
The maximum number of tokens an AI language model can process in a single inference call, determining the total amount of text, code, or structured data a model can consider when generating a response or taking an action.
The context window is the bounded memory of an AI language model during any single interaction. Everything the model can reference when generating output, whether a user’s instruction, a retrieved document, a codebase, or a conversation history, must fit within the context window. Information outside the window is invisible to the model during that inference call.
Tokens are the unit of measurement: roughly one token corresponds to 0.75 English words, though tokenisation efficiency varies significantly by language. Chinese, Japanese, and Korean scripts typically require more tokens per unit of information than Latin script under common tokenisers, which has material implications for APAC-focused AI products: equivalent content consumes more of the context window in these languages.
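The word-to-token relationship above can be sketched as a rough estimator. The 0.75 ratio is the heuristic stated in the text, not a measurement from any specific tokeniser; real counts come from the model's own tokeniser.

```python
# Rough token-count estimator for English text. The ratio below is the
# heuristic from the text (~0.75 English words per token), an assumption,
# not output from a real tokeniser.
WORDS_PER_TOKEN_EN = 0.75

def estimate_tokens_english(text: str) -> int:
    """Estimate token count for English text from its word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN_EN)

doc = "The context window is the bounded memory of a language model."
print(estimate_tokens_english(doc))  # 11 words -> roughly 15 tokens
```

A production system would call the model provider's tokeniser directly rather than rely on a word-count heuristic, since the ratio shifts by language and by content type (prose versus code versus tables).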
How Context Window Size Changed the AI Industry
Context window size has expanded rapidly since the first commercial language models. Early transformer models operated with windows of 512 to 1,024 tokens. GPT-3 extended this to 2,048 tokens; GPT-4 launched at 8,192 tokens, then reached 128,000 tokens in later versions. Anthropic’s Claude 3 operates with a 200,000-token context window. Google’s Gemini 1.5 Pro claimed 1 million tokens in early 2024. Moonshot AI, a Chinese AI company, built its entire product thesis around long-context capability: Kimi AI launched with a 200,000-token window and expanded to 1 million tokens, positioning it directly against global competitors on this dimension.
These numbers matter commercially because context window size determines which enterprise use cases a model can serve:
4,096 tokens (roughly 3,000 words): Sufficient for single-document summarisation, short-context question answering, and simple code completion. Inadequate for reading a full contract, a regulatory filing, or a multi-file codebase.
128,000 tokens (roughly 96,000 words): Sufficient for reading an entire novel, a long legal contract, or a multi-hundred-page regulatory submission. Permits whole-file code understanding but not whole-repository comprehension for large codebases.
1,000,000 tokens (roughly 750,000 words): Sufficient for reading an entire software repository of moderate size, a full case law database, or several quarters of corporate earnings transcripts simultaneously. Enables qualitatively different application types, particularly in legal AI, financial AI, and enterprise code intelligence.
Why Context Window Matters for AI Company Valuation
Context window capability is a meaningful valuation driver in AI company M&A because it determines which enterprise use cases a company’s model can credibly serve, and therefore which acquirer universe it belongs to.
An AI code tool company whose model can process only a single file at a time competes with dozens of plugin-based autocomplete tools. An AI code tool company whose model processes an entire codebase of 500,000 lines simultaneously serves a qualitatively different enterprise need — whole-repository refactoring, cross-file dependency analysis, codebase onboarding for new engineers — that the single-file tool cannot address regardless of how accurate its completions are. The latter product has a different acquirer universe and a different valuation ceiling.
The same logic applies across AI verticals. In legal AI, the difference between a 4,096-token and a 200,000-token context window is the difference between summarising a clause and reading an entire contract. In financial services AI, a 1 million-token window allows a model to read multiple years of earnings transcripts, regulatory filings, and analyst reports simultaneously — a capability that affects the product’s competitive position against human analysts in a way that shorter-context models cannot match.
In M&A due diligence, acquirers evaluate context window claims with scepticism: a model that technically supports 1 million tokens in a benchmark does not necessarily maintain coherence, accuracy, or attention to the full window in production. The “needle in a haystack” evaluation — whether a model can reliably retrieve a specific piece of information from a long document — is the relevant test, and performance degrades in non-trivial ways as context length increases for most current architectures. Acquirers conducting technical due diligence on an AI company with long-context claims will typically test this independently rather than accepting benchmark performance as representative of production behaviour.
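A needle-in-a-haystack check of the kind described above can be sketched as follows. The `call_model` function is a hypothetical stand-in for a real long-context model call, not any provider's actual interface; a real harness would sweep both needle depth and total context length and score retrieval accuracy across the grid.

```python
# Minimal needle-in-a-haystack sketch. `call_model` is a placeholder for
# a real long-context model call; everything else builds the test input.
FILLER = "Revenue was in line with expectations for the quarter. "
NEEDLE = "The passcode for the Q3 data room is 7391."

def build_haystack(total_sentences: int, needle_depth: float) -> str:
    """Embed the needle at a relative depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER] * total_sentences
    idx = int(needle_depth * (total_sentences - 1))
    sentences.insert(idx, NEEDLE + " ")
    return "".join(sentences)

def call_model(prompt: str) -> str:
    # Placeholder: a real harness would send `prompt` to the model with
    # the question "What is the passcode for the Q3 data room?" and
    # return the model's answer. This stub trivially succeeds.
    return "7391" if "7391" in prompt else "not found"

def needle_retrieved(total_sentences: int, depth: float) -> bool:
    haystack = build_haystack(total_sentences, depth)
    return "7391" in call_model(haystack)

# Sweep needle depth; with a real model, accuracy often dips mid-document.
for depth in (0.0, 0.5, 1.0):
    print(depth, needle_retrieved(5_000, depth))
```

The interesting result with a real model is the shape of the accuracy curve over depth and length, not any single pass/fail; published evaluations show mid-document positions at long context lengths are where retrieval most often fails.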
Context Window in AI M&A Due Diligence
When evaluating an AI company with long-context capability as part of an M&A process, acquirers focus on three questions:
Is the context window capability based on proprietary infrastructure or on a base model upgrade? A company that has extended context capability through proprietary training and inference infrastructure (custom attention mechanisms, ring attention, sliding window variants) has a more defensible technical moat than a company that extended context by switching to a newer base model with a larger window. The former reflects genuine technical depth; the latter reflects access to a public capability that competitors can also access.
What is the inference cost at long context windows in production? Self-attention compute scales roughly with the square of context length in standard transformer architectures, and key-value cache memory grows linearly as well — processing a 1 million-token context costs orders of magnitude more than processing a 4,000-token context. Companies that claim long-context capability without demonstrating sustainable unit economics at production scale are presenting a product capability that is not commercially viable for the enterprise use cases it ostensibly enables. The relevant question is not “does the model support 1 million tokens” but “what is the per-token inference cost at 100,000 to 1 million token contexts, and what customer price point does that require?”
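The quadratic scaling noted above can be made concrete with a back-of-envelope calculation; the ratios are illustrative of attention compute only, not any provider's actual pricing, which also reflects caching, batching, and architecture optimisations.

```python
# Back-of-envelope: self-attention compute grows with the square of
# context length, so the relative cost between two context sizes is
# (n2 / n1) ** 2. Illustrative only; real pricing differs.
def relative_attention_cost(n1: int, n2: int) -> float:
    """Ratio of quadratic attention compute at context n2 versus n1."""
    return (n2 / n1) ** 2

print(relative_attention_cost(4_000, 128_000))    # 32x longer -> 1024x
print(relative_attention_cost(4_000, 1_000_000))  # 250x longer -> 62500x
```

This is why "orders of magnitude" is not an exaggeration: a 250x increase in context length implies roughly a 62,500x increase in attention compute under standard architectures, and is the motivation for the custom attention mechanisms mentioned above.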
Does the company own the inference infrastructure that enables long context, or is it dependent on third-party API providers? An AI company that delivers long-context capability by routing through Anthropic or OpenAI APIs has an inference cost structure and context window capability that both depend on the API provider’s pricing and model updates. An acquirer buying this company is not acquiring the context window capability itself; they are acquiring customer relationships and a product layer that could be disrupted by API pricing changes or by the API provider launching a competing product. Companies with proprietary inference infrastructure have a meaningfully different acquisition proposition.
Context Window and APAC AI Differentiation
For APAC AI companies, context window capability has particular strategic significance in three markets:
Japan. Japanese regulatory and financial documents are among the most information-dense in the world, combining kanji, kana, and numerical data with substantial meaning packed into each character. A Japanese legal AI company or financial AI company whose model can process an entire regulatory filing or banking compliance document set simultaneously has a competitive position that a shorter-context model cannot replicate, regardless of overall language model quality. Fujitsu, NTT Data, and major Japanese financial institutions are all evaluating AI companies on context window performance in Japanese-language documents as a primary technical differentiator.
Korea. Korean government and chaebol corporate governance produces large document volumes in Hangul script. Samsung, SK, LG, and Hyundai group companies manage extensive regulatory reporting, contract documentation, and internal corporate governance materials that benefit from long-context AI processing. Korean AI companies with demonstrated long-context performance in Korean-language corpora are positioned for strategic acquisition by chaebol IT services groups seeking internal AI capability.
China. Moonshot AI’s Kimi AI product, built entirely around long-context capability in Mandarin, demonstrates the market thesis: a model that can read an entire research paper, legal brief, or engineering specification in Mandarin simultaneously — rather than processing it in chunks with diminishing coherence — is a qualitatively different product for Chinese enterprise users. The Kimi launch contributed to a broader Chinese AI company funding surge in 2024 that valued long-context capability as a standalone investment thesis rather than a feature.
AI code tool companies, legal AI platforms, and financial services AI companies considering a sale or fundraising process should be prepared to demonstrate context window performance under realistic enterprise conditions, not benchmark conditions. Amafi Advisory works with AI company founders across APAC to prepare for technical and commercial due diligence. For a confidential discussion of your AI company’s strategic options, contact our team.
Related terms: Synthetic Data — for how training data generation affects context window fine-tuning economics. Red-Teaming — for how long-context models introduce specific evaluation challenges (attention drift, coherence degradation, information retrieval failures). RAG — the retrieval architecture that determines what enters the context window; larger context windows improve RAG performance by allowing more retrieved passages per query. ARR — the primary financial metric alongside which context window capability is evaluated in AI company M&A. Due Diligence — for the full technical and commercial due diligence process in AI company acquisitions.