
Foundation Model

A large neural network trained on broad, unlabeled data at scale and designed to be adapted to a wide range of downstream tasks through fine-tuning, prompting, or retrieval augmentation. Foundation models are the architectural basis of most commercially deployed AI systems, and their licensing terms, compute dependencies, and performance characteristics are central to AI company valuation in M&A transactions.

The term foundation model was coined by Stanford researchers in 2021 to describe large neural networks trained on internet-scale datasets that can be adapted to many tasks without task-specific training from scratch. The practical definition has narrowed somewhat in commercial use: today’s foundation models are predominantly large language models (GPT-4, Claude 3, Gemini 1.5, Llama 3) and multimodal models (capable of processing text, images, audio, or video), though the category technically includes large vision models and biological sequence models used in drug discovery and genomics.

What distinguishes a foundation model from earlier machine learning approaches is the training paradigm. Previous AI systems were trained task-by-task on labeled datasets assembled for specific applications. A fraud detection model was trained on labeled fraud cases; a document classifier was trained on labeled documents. Foundation models invert this: training on vast corpora of unlabeled text or image data produces a general representation of language or visual information, and that general representation is then transferred to specific applications. The result is that a foundation model trained on a trillion tokens of general text can, with relatively modest adaptation effort, outperform models trained from scratch on smaller task-specific datasets for an enormous range of downstream applications.

The commercial consequence is that the barrier to building a capable AI product has dropped substantially. An AI company today can build a competitive contract review tool, customer service agent, or code generation assistant by fine-tuning or prompting an existing foundation model, without the years of data collection and model training that previously differentiated AI companies. This changes the M&A due diligence question from “is their model good?” to “is anything about this product defensible beyond the model?”


Foundation Model Categories Relevant to APAC AI M&A

Closed-source commercial models. OpenAI (GPT-4o, o1), Anthropic (Claude 3 Sonnet, Opus), and Google (Gemini 1.5 Pro, Gemini Flash) offer foundation models as API services. An AI company building on these models has no capital expenditure on model training, fast time-to-market, and access to state-of-the-art capabilities, but pays inference costs at commercial API rates, has no training data transparency, and faces vendor dependency risk if pricing or access changes.

Open-source models. Meta (Llama 3), Mistral, and a growing number of APAC-native open-weight releases (Qwen from Alibaba, ELYZA's Japanese models) are available for self-hosted deployment. Open-source models allow companies to run inference on their own infrastructure (eliminating per-query API costs for high-volume applications), fine-tune on proprietary data without sharing that data with a third-party API provider, and deploy commercially where the license permits. The tradeoff is model maintenance, infrastructure cost, and performance on general benchmarks that typically lags frontier closed-source models by six to eighteen months.

Domain-specific foundation models. A small number of AI companies have trained foundation models on domain-specific data: BloombergGPT (financial text), Med-PaLM 2 (medical knowledge), and various code-specialized models. In APAC, several companies have trained CJK-language-focused foundation models where general English-dominant models underperform. Domain-specific foundation models can offer genuine performance advantages on their target domain, but training and maintaining them requires infrastructure and ML engineering at a scale that most early-stage AI companies cannot afford.


Foundation Model Dependency in AI Company Valuation

When an AI company is evaluated in an M&A process, foundation model dependency is one of the first technical diligence questions. The valuation implications are significant.

Heavy dependence on a single closed-source API creates vendor risk. An AI company that routes 90% of its production inference through a single closed-source API (OpenAI, Anthropic, Google) has a business model that can be disrupted by pricing changes, access policy changes, model deprecation, or API availability incidents. Acquirers apply a risk discount to companies with undiversified foundation model dependencies, particularly for enterprise applications where SLA commitments to clients require API reliability that a third-party model provider cannot guarantee. The Anthropic terms of service, for example, allow API pricing changes with 30 days’ notice — a structural risk for any company with margin commitments to enterprise clients.
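The diversification that mitigates this vendor risk can be sketched as a simple routing layer that falls back to a second provider when the primary fails. Everything below is a hypothetical illustration: the provider names, stub call functions, and per-token rates are placeholders, not real SDK integrations or published prices.

```python
# Sketch of multi-provider routing to reduce single-vendor dependency.
# Provider names, call functions, and rates are illustrative stubs.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Provider:
    name: str
    call: Callable[[str], str]    # prompt -> completion
    cost_per_1k_tokens: float     # used for cost reporting, not routing here

class ModelRouter:
    """Try providers in priority order; fall back when one fails."""
    def __init__(self, providers: List[Provider]):
        self.providers = providers

    def complete(self, prompt: str) -> Tuple[str, str]:
        last_error = None
        for provider in self.providers:
            try:
                return provider.name, provider.call(prompt)
            except Exception as exc:  # outage, rate limit, model deprecation
                last_error = exc
        raise RuntimeError(f"all providers failed: {last_error}")

def simulated_outage(prompt: str) -> str:
    raise TimeoutError("primary API unavailable")

primary = Provider("closed-api", simulated_outage, 0.005)
fallback = Provider("self-hosted", lambda p: f"[fallback answer to: {p}]", 0.0002)
router = ModelRouter([primary, fallback])

served_by, answer = router.complete("Summarise this clause.")
# served_by == "self-hosted": traffic survives the primary outage
```

In diligence, the presence (or absence) of a layer like this is a quick proxy for how seriously the target has engineered around its API dependencies.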

Foundation model licensing must be clean for data-sensitive industries. Commercial API providers have different data policies regarding training data use. OpenAI’s default API terms historically allowed the use of API inputs for model training unless customers opted out; policies have changed over time and vary by product tier. For AI companies processing sensitive client data — financial records, medical information, legal documents, or proprietary business intelligence — foundation model data policies are a diligence item. APAC financial regulators (MAS, HKMA, FSA) have each issued guidance on data governance for AI systems that process customer financial data, and an AI company whose data flows to a third-party foundation model API in a non-compliant way has a regulatory risk that surfaces in acquisition diligence.

Open-source model choices affect infrastructure cost and gross margins. An AI company that has migrated from expensive closed-source APIs to open-source model hosting for high-volume applications typically shows structurally better gross margins. Acquirers model the infrastructure cost trajectory: a company running inference on Llama 3 70B on its own GPU infrastructure at $0.0002 per 1,000 tokens is in a fundamentally different margin position from one running equivalent inference on GPT-4o at commercial list rates. Migration difficulty varies by application; for structured output, function calling, and agentic workflows, the performance gap between frontier closed models and best-in-class open models has narrowed substantially since 2024.
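The margin arithmetic behind that comparison is straightforward to make concrete. The self-hosted rate below comes from the text; the API rate and monthly token volume are assumed figures for illustration only.

```python
# Back-of-envelope monthly inference cost: self-hosted open-source vs.
# closed-source API. Self-hosted rate is from the text; the API rate
# and volume are assumed for illustration.
def monthly_inference_cost(tokens_per_month: float, cost_per_1k_tokens: float) -> float:
    return tokens_per_month / 1_000 * cost_per_1k_tokens

TOKENS_PER_MONTH = 5_000_000_000   # assumed: 5B tokens/month, high-volume product
SELF_HOSTED_RATE = 0.0002          # $/1k tokens, Llama 3 70B on own GPUs (from text)
API_RATE = 0.005                   # $/1k tokens, assumed blended closed-API rate

self_hosted_cost = monthly_inference_cost(TOKENS_PER_MONTH, SELF_HOSTED_RATE)  # $1,000
api_cost = monthly_inference_cost(TOKENS_PER_MONTH, API_RATE)                  # $25,000
```

At these assumed figures the self-hosted deployment is 25x cheaper per month, which is the kind of structural gap acquirers look for when modeling gross-margin trajectory; real GPU amortization, utilization, and engineering headcount would of course need to be added on the self-hosted side.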

APAC-language performance requires explicit benchmark evidence. Foundation models trained predominantly on English-language data perform worse on Japanese, Korean, Traditional and Simplified Chinese, Vietnamese, and Bahasa Indonesia. For APAC AI companies serving customers whose workflows are in CJK languages, the choice of foundation model is not interchangeable. An acquirer evaluating an APAC AI company with Japanese or Korean enterprise clients should request benchmark evidence of the model’s performance on representative APAC-language tasks, not just English-language benchmarks, before accepting capability claims at face value.
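A minimal version of that benchmark check might look like the following. All scores here are invented placeholders, not measured results for any real model; the point is the shape of the analysis, comparing each APAC language against the English baseline.

```python
# Sketch of the diligence check described above: quantify the per-language
# score drop relative to English. Scores are hypothetical placeholders.
def language_gap(scores: dict, baseline_lang: str = "en") -> dict:
    """Score drop of each language relative to the baseline language."""
    base = scores[baseline_lang]
    return {lang: round(base - s, 3) for lang, s in scores.items() if lang != baseline_lang}

# Hypothetical per-language accuracy on a representative task suite:
model_scores = {"en": 0.86, "ja": 0.71, "ko": 0.68, "zh": 0.74}
gaps = language_gap(model_scores)
# A large gap on the customer's working language is a red flag in diligence.
```

An acquirer would run this comparison across every candidate foundation model on tasks drawn from the target's actual workload, not public English leaderboards.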


Foundation Models in APAC: The Domestic Alternative Layer

APAC has produced a meaningful set of foundation models specifically designed for the region’s language and regulatory environment. These are relevant to AI company diligence in two ways: as alternative foundation model choices for APAC-native AI companies, and as potential acquirers of AI companies that have built on top of their model ecosystems.

Alibaba’s Qwen models (Tongyi Qianwen in Chinese) are open-weight models with strong Mandarin Chinese performance, optimized for APAC commercial and enterprise use cases. Alibaba provides both API access through Alibaba Cloud and open weights for self-hosting. AI companies serving Mainland China enterprises that process primarily Chinese text have strong reasons to evaluate Qwen against global alternatives.

Naver’s HyperCLOVA X is a Korean-language foundation model trained on a large-scale Korean text corpus, with performance on Korean-language tasks that significantly outperforms general multilingual models. Korean AI companies in legal tech, compliance, or financial services that serve Korean-language enterprise clients should benchmark HyperCLOVA X explicitly.

ELYZA (Japan) has developed Japanese-language LLMs fine-tuned on large volumes of Japanese business and web text. For APAC AI companies targeting Japanese enterprise contracts, ELYZA models offer an alternative to prompting a general multilingual model in Japanese, which typically yields lower performance on Japanese-language tasks.

For acquirers evaluating APAC AI companies, foundation model choice is a proxy for how well the founding team understands their market. A company building AI products for Japanese enterprise clients on a US-centric foundation model without Japanese-language benchmarking has a technical blind spot that a better-informed team would have addressed.


Foundation Model Licensing and M&A Structural Considerations

Foundation model licensing is an emerging area of M&A complexity that most APAC founders and corporate development teams do not fully anticipate before entering a transaction.

Open-source licenses vary significantly in commercial terms. Llama 3’s license allows commercial use but imposes restrictions on companies with more than 700 million monthly active users; products with a smaller user base can use it freely. Mistral’s Apache 2.0 license allows unrestricted commercial use. Stable Diffusion’s CreativeML Open RAIL-M license restricts specific use cases (deepfake generation, defamatory content) while allowing commercial use generally. An AI company in an acquisition process should document exactly which open-source model weights are used in production, which licenses govern them, and whether any license restrictions apply to the acquirer’s planned use of the acquired product.
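The documentation exercise described above can be reduced to a simple per-model check. The terms encoded below are simplified from the restrictions named in the text (the Llama 3 MAU threshold, Mistral's Apache 2.0 grant) and should be verified against the actual license documents before being relied on in a transaction.

```python
# Illustrative license-diligence check. Terms are simplified from the text;
# always verify against the governing license documents.
LICENSES = {
    "llama-3": {"commercial_use": True, "mau_threshold": 700_000_000},
    "mistral-apache-2.0": {"commercial_use": True, "mau_threshold": None},
}

def needs_separate_license(model: str, monthly_active_users: int) -> bool:
    """True if the standard license grant does not cover this deployment."""
    terms = LICENSES[model]
    if not terms["commercial_use"]:
        return True
    threshold = terms["mau_threshold"]
    return threshold is not None and monthly_active_users > threshold

# A 10M-MAU product fits under the standard Llama 3 grant; an 800M-MAU
# acquirer would need a separate grant from Meta for the same weights.
```

Note the acquirer-side implication: a license that is clean for the target at its current scale can still be a closing issue if the buyer's user base crosses a threshold the founders never had to think about.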

Fine-tuned derivatives have complex IP considerations. When a company fine-tunes a foundation model on proprietary data to create a specialized derivative model, the IP status of that derivative depends on the license of the base model, the nature of the fine-tuning (whether it produces a derivative work, and in which jurisdiction), and whether any third-party training data was used that has its own license restrictions. Acquirers in APAC transactions involving fine-tuned models have increasingly requested detailed AI IP schedules from target companies, similar to the software IP schedules that are standard in technology M&A. Founders who have not maintained clean documentation of their training data provenance, model versioning, and license obligations will spend significant time in diligence reconstructing this information.

API-based foundation model dependencies should be disclosed early. If a product’s core functionality depends on API access to a foundation model that is not owned or controllable by the seller, that dependency is a disclosure obligation in most APAC jurisdictions’ M&A representations and warranties frameworks. Acquirers regularly request that material API agreements be assigned or novated as part of deal closing. Some foundation model providers have written provisions into their API terms prohibiting assignment without consent. Founders who have not considered the assignment implications of their API contracts before a deal process will encounter this issue as a closing condition.


Foundation Models and the AI Company Valuation Premium

The distinction between AI companies that have built defensible products on top of foundation models and those that are thin wrappers around foundation model APIs is the central valuation question in current AI M&A.

A product that calls a foundation model API and returns the response with minimal proprietary processing is a thin wrapper. The value of that product is almost entirely in the customer relationships, distribution, and workflow integrations — the model itself provides no defensible competitive advantage. Thin wrappers trade at 2-4x ARR when the customer base is strong, or at minimal premium to revenue multiples for comparable SaaS companies without AI differentiation.

A product that has built defensible value on top of a foundation model layer — through proprietary training data (see synthetic data, fine-tuning), retrieval infrastructure (see RAG), proprietary evaluation frameworks, or APAC-language optimization — has a genuine AI moat. These companies trade at 8-20x ARR depending on the strength of the data moat, the defensibility of the fine-tuned capability, and the NRR quality. The foundation model is infrastructure; the defensible asset is what the company has built on top of it.

For founders preparing for an acquisition, the most important preparation work is making that defensible layer explicit: document the proprietary training data, the evaluation benchmarks that demonstrate differentiation, and the APAC-specific optimizations that a general-purpose model cannot replicate. Acquirers will do this analysis in diligence regardless; presenting it proactively shortens the timeline and removes the uncertainty discount that ambiguous AI positioning creates.

Amafi Advisory advises AI company founders on sell-side M&A and fundraising advisory in Asia Pacific. For AI companies where foundation model architecture and IP questions are central to transaction positioning, get in touch to discuss how to present the AI layer in a transaction process.

Related terms

fine-tuning · RAG · inference cost · context window · synthetic data · embeddings · model card · training corpus