
AI Company Due Diligence: What Acquirers Examine

A comprehensive guide to AI company due diligence — covering IP and model ownership, training data rights, technical diligence, team retention, commercial quality, and regulatory compliance. For sell-side founders and buy-side corporate development teams.

Acquiring an AI company is not the same as acquiring a software company. The due diligence scope looks superficially similar — financial, commercial, legal, technical — but the specific questions within each workstream are fundamentally different. The asset being purchased is a trained model (or a pipeline of models), and validating that asset requires expertise and investigation that most standard M&A diligence frameworks do not address.

This guide covers AI company due diligence comprehensively — written for sell-side founders who need to understand what to prepare, and for buy-side corporate development teams building their AI acquisition playbook.

Amafi Advisory advises AI company founders and corporate acquirers on AI M&A transactions across Asia Pacific. Understanding what acquirers will examine — and preparing for it — is one of the highest-value activities a founder can undertake before running a sale process.

Why AI Company Due Diligence Is Different

In a conventional SaaS acquisition, the primary asset is the recurring revenue stream and the codebase that generates it. The diligence question is: does this revenue actually recur, and is the software maintainable?

In an AI acquisition, the primary asset is the trained model — and the model’s value depends on things that don’t appear in a P&L:

  • Who actually owns the model weights? Weights are the product of a training process that may have involved employees, contractors, academic collaborators, third-party base models, and cloud provider infrastructure. Ownership is not automatic.
  • Is the training data legally usable? Models are inseparable from the data that trained them. Data obtained in violation of copyright, privacy law, or contractual restrictions taints the model and may make it untransferable.
  • Can the model be reproduced and improved? A model that cannot be reproduced — because the training pipeline is undocumented or depends on one person’s expertise — is a depreciating asset, not a defensible one.
  • What happens when the model decays? AI models degrade as the world changes. Diligence must assess the ongoing cost and feasibility of model maintenance and retraining.

These questions are unique to AI M&A. Acquirers who skip them pay for it post-close.

IP and Model Ownership Diligence

The model ownership question is the centrepiece of AI-specific diligence. Acquirers systematically examine:

Who owns the model weights. This requires tracing back to every contributor to the training process: full-time employees (covered by employment IP assignment agreements), contractors (who must have had separate written IP assignment clauses — employment agreements do not automatically cover contractor work), academic collaborators (university IP policies often claim rights to research outputs), and interns. A complete IP ownership schedule must document every material contributor with confirmation of assignment.

Third-party model dependencies. If the company built on top of a pre-trained foundation model — whether a commercial API (OpenAI, Anthropic, Cohere) or an open-source model (LLaMA, Mistral, Falcon) — the licence terms of that base model govern what can be done with the resulting product. Commercial API terms often prohibit using outputs to train competing models. Open-source licences vary: MIT and Apache 2.0 are permissive; GPL and AGPL require that derivative works also be open-sourced, which is incompatible with most commercial acquisitions. Acquirers will map every model dependency and its licence.

Open-source licence risks. Even if the company’s own model is proprietary, open-source components in the training infrastructure or inference stack may carry restrictive licences. GPL contamination — where GPL-licensed code is incorporated into proprietary software — can require that the entire integrated codebase be open-sourced. For AI companies, which typically use rich open-source ecosystems, a systematic OSS licence audit is non-negotiable.
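To make the audit concrete, here is a minimal sketch of the classification step an OSS licence audit automates. The licence buckets and the sample dependency manifest are illustrative only — a real audit uses dedicated scanning tools and specialist counsel, not a hand-rolled script.

```python
# Illustrative triage of SPDX licence identifiers; not legal advice.
RESTRICTIVE = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0"}
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}

def classify_licence(spdx_id: str) -> str:
    """Bucket a licence identifier for audit triage."""
    if spdx_id in RESTRICTIVE:
        return "flag"      # copyleft: review for contamination risk
    if spdx_id in PERMISSIVE:
        return "ok"
    return "review"        # unknown or custom terms: manual review

def audit(dependencies: dict[str, str]) -> dict[str, list[str]]:
    """Group package names by audit bucket."""
    report: dict[str, list[str]] = {"flag": [], "ok": [], "review": []}
    for pkg, lic in sorted(dependencies.items()):
        report[classify_licence(lic)].append(pkg)
    return report

# Hypothetical dependency manifest for illustration.
deps = {"torch": "BSD-3-Clause", "some-gpl-lib": "GPL-3.0", "internal-tool": "Proprietary"}
print(audit(deps))
```

The point of the exercise is the output shape: anything in the "flag" bucket needs contamination analysis before the data room opens, and anything in "review" needs its terms read by a human.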

Contributor agreements. Academic or research-origin AI companies must have valid contributor licence agreements (CLAs) from all contributors who submitted code or training utilities to the company’s repositories.

Cloud provider terms. Enterprise agreements with major cloud providers (AWS, Google Cloud, Azure) and AI platform providers sometimes contain clauses granting the provider rights to anonymised model derivatives or usage data. Acquirers review all cloud and AI service agreements for these provisions.

Training Data Diligence

Training data diligence is the area where AI M&A most often surfaces material problems. The key questions:

Data provenance and licensing. Where did the training data come from? Acquirers require a complete data source inventory:

  • Proprietary first-party data: highest value, lowest risk.
  • Licensed third-party datasets: review licence terms for commercial transfer and sub-licensing rights.
  • Synthetic data: assess generation methodology and any dependencies on underlying licensed or copyrighted material.
  • Web-scraped data: the highest-risk category. Recent legal challenges to web scraping for AI training, including cases in the US and EU, have created material legal uncertainty around scraping-based training datasets.

Privacy compliance. If any training data includes personal information — including data about individuals collected from web scraping, customer interactions, or purchased datasets — it is subject to privacy law. In APAC transactions:

  • GDPR (EU): Applies extraterritorially to any personal data of EU residents. Training data involving EU personal data requires a valid legal basis under GDPR. The European Data Protection Board has issued guidance indicating that personal data used for AI model training must meet standard GDPR conditions.
  • Singapore PDPA: The Personal Data Protection Act governs personal data in Singapore; cross-border transfers require adequate protection safeguards.
  • Australian Privacy Act: Australia’s Privacy Act applies to AI companies processing Australian personal data; recent amendments have tightened requirements around automated decision-making.
  • South Korea PIPA: South Korea’s Personal Information Protection Act imposes strict requirements on personal data processing and cross-border transfer, particularly relevant for AI companies targeting Korean acquirers.

Synthetic data quality. Synthetic data is increasingly used to scale AI training datasets. Diligence must assess the generation methodology, whether synthetic data was generated using a licensed base dataset, and whether the quality and distribution of the synthetic data adequately represent real-world deployment conditions.

Data access agreements. For AI companies that access proprietary data through enterprise customer agreements (a common model in B2B AI), acquirers review whether those agreements permit the company to use customer data for model training, and whether the data access rights transfer with the acquisition.

Technical Diligence

Technical diligence for AI companies goes well beyond standard software architecture review:

Model architecture and reproducibility. Acquirers — particularly technical ones like Korean chaebols with strong internal engineering teams — will conduct a hands-on technical review. This includes: documented training pipelines with version-controlled code, the ability to reproduce model training results from documented checkpoints, experiment tracking (MLflow, Weights & Biases, or equivalent), and model card documentation covering intended use, performance characteristics, and known limitations.
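The reproducibility standard above can be sketched as a per-run manifest: every checkpoint tied to a code version, hyperparameters, a data fingerprint, and a seed. The field names below are illustrative — in practice this is what experiment trackers like MLflow or Weights & Biases record for you.

```python
# Minimal sketch of a reproducibility manifest; field names are illustrative.
import hashlib
import json

def training_manifest(git_commit: str, hyperparams: dict,
                      data_fingerprint: str, seed: int) -> dict:
    """Bundle everything needed to reproduce a training run."""
    manifest = {
        "git_commit": git_commit,
        "hyperparams": hyperparams,
        "data_fingerprint": data_fingerprint,
        "seed": seed,
    }
    # A stable hash of the manifest doubles as a run identifier:
    # identical inputs always yield the same run_id.
    canonical = json.dumps(manifest, sort_keys=True).encode()
    manifest["run_id"] = hashlib.sha256(canonical).hexdigest()[:12]
    return manifest

m = training_manifest(
    git_commit="abc1234",
    hyperparams={"lr": 3e-4, "batch_size": 256, "epochs": 10},
    data_fingerprint="sha256:f00d",
    seed=42,
)
print(m["run_id"])
```

A company that can produce a manifest like this for every shipped checkpoint passes the reproducibility question quickly; one that cannot faces a long technical diligence conversation.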

Performance benchmarking. Independent validation of performance claims is standard. Acquirers will run the model on held-out test datasets, benchmark against publicly available alternatives, and test performance under distribution shift (i.e., data that looks different from the training set). Inflated benchmark claims are a significant red flag.
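The distribution-shift test reduces to a simple comparison: evaluate the same model on an in-distribution held-out set and on a shifted set, and flag a large accuracy gap. The toy model and datasets below are stand-ins for the real artefacts under review.

```python
# Minimal sketch of a distribution-shift check; model and data are toys.
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def shift_gap(model, held_out, shifted) -> float:
    """Accuracy drop from held-out data to distribution-shifted data."""
    acc_in = accuracy([model(x) for x, _ in held_out], [y for _, y in held_out])
    acc_shift = accuracy([model(x) for x, _ in shifted], [y for _, y in shifted])
    return acc_in - acc_shift

# Toy "model": predicts positive when the feature exceeds a threshold.
model = lambda x: x > 0.5
held_out = [(0.9, True), (0.1, False), (0.8, True), (0.2, False)]
shifted = [(0.6, False), (0.4, True), (0.7, False), (0.3, True)]
print(shift_gap(model, held_out, shifted))  # large gap is a red flag
```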

Infrastructure costs and GPU dependencies. AI companies often have significant compute costs that behave differently from software COGS. Diligence examines: current monthly GPU spend (cloud vs. on-premise), inference cost per query or per unit of output, how COGS scales with usage growth, GPU reservation and lease obligations, and whether the company is dependent on specific hardware (e.g., specific NVIDIA GPU generations) with limited supply availability.
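The per-query economics acquirers rebuild are simple arithmetic. A back-of-envelope sketch, with entirely hypothetical numbers:

```python
# Back-of-envelope inference unit economics; all figures are hypothetical.
def gross_margin(price_per_query: float, gpu_hour_cost: float,
                 queries_per_gpu_hour: float) -> float:
    """Gross margin per query after compute cost."""
    cost_per_query = gpu_hour_cost / queries_per_gpu_hour
    return (price_per_query - cost_per_query) / price_per_query

# e.g. $0.01 per query, $2.50 per GPU-hour, 1,000 queries per GPU-hour
m = gross_margin(0.01, 2.50, 1_000)
print(f"{m:.0%}")  # prints 75%
```

The diligence question is whether these three inputs survive scrutiny: is the blended price net of discounts, is the GPU-hour cost fully loaded (reservations, idle capacity, redundancy), and does throughput hold at production latency targets?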

Model decay risk. All AI models degrade over time as the world changes — a model trained on 2023 data performs worse on 2026 inputs. Acquirers assess: how quickly does this model decay, what triggers retraining, how long does retraining take, what does it cost, and does the team have a systematic process for monitoring and responding to performance degradation?
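The "systematic process" acquirers look for usually amounts to decay monitoring: score labelled production samples, compare a rolling accuracy window against the launch baseline, and trigger retraining when the gap exceeds a tolerance. A minimal sketch, with illustrative thresholds:

```python
# Minimal performance-decay monitor; window size and tolerance are illustrative.
from collections import deque

class DecayMonitor:
    def __init__(self, baseline_acc: float, window: int = 500,
                 tolerance: float = 0.05):
        self.baseline = baseline_acc
        self.tolerance = tolerance
        self.window = deque(maxlen=window)  # rolling correctness flags

    def record(self, correct: bool) -> None:
        self.window.append(1 if correct else 0)

    def needs_retraining(self) -> bool:
        if not self.window:
            return False
        rolling_acc = sum(self.window) / len(self.window)
        return self.baseline - rolling_acc > self.tolerance

mon = DecayMonitor(baseline_acc=0.92)
for _ in range(400):
    mon.record(True)   # healthy period
for _ in range(100):
    mon.record(False)  # drift: recent predictions failing
print(mon.needs_retraining())
```

Whether something like this exists — and whether anyone actually watches it — is a direct diligence question.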

Security and adversarial robustness. AI models are susceptible to adversarial attacks, prompt injection (for LLM-based systems), and data poisoning. Enterprise acquirers, particularly those in financial services, healthcare, or government-adjacent applications, will assess adversarial robustness testing and red-team results.

Team and Talent Diligence

In AI acquisitions, the team is often the primary asset — and acquirers know it.

Key ML engineers and researcher dependencies. Acquirers map the technical org chart and identify single points of failure: individuals whose departure would materially impair the company’s ability to maintain, retrain, or improve its models. For small AI companies (10–30 people), this is often 2–5 individuals. For each key person, acquirers assess existing retention arrangements, unvested equity, non-compete provisions, and immigration status.

Founder dependency. Is the training methodology documented or does it live in the founder’s head? Can the model be improved without the founder? This is a direct question in AI technical diligence that has no equivalent in standard software M&A.

Non-competes and prior employer IP. AI engineers often come from large tech companies (Google, Meta, Microsoft, OpenAI) or universities with strict IP assignment policies. Acquirers check that key employees’ prior employment agreements do not restrict their ability to work on the AI company’s technology, and that they did not bring proprietary IP from prior employers.

Immigration status in APAC jurisdictions. For AI companies with international teams — common in APAC — acquirers assess visa and work authorisation status for key personnel. An acquirer completing a cross-border transaction needs confidence that key engineers can continue working post-close. In APAC, this includes reviewing Employment Pass (Singapore), Skilled Worker visa (Australia), and other jurisdiction-specific work authorisation arrangements.

Commercial Diligence for AI Companies

ARR quality and AI-specific contract terms. Not all ARR is equal in AI companies. Acquirers examine: what percentage of revenue is genuinely recurring vs. project-based, whether contracts include AI-specific SLAs (accuracy minimums, uptime, model update obligations), whether customers have audit rights over the AI system’s outputs, and whether AI performance SLAs create material liability exposure.

Customer concentration. Standard M&A diligence applies — heavy concentration in one or two customers is a risk. For AI companies, concentration risk compounds because a single customer often contributes a disproportionate amount of training data, creating dependency beyond just revenue.

Churn and model update obligations. AI companies often contractually commit to maintaining or improving model performance. Acquirers assess whether the company has the resources to meet these obligations post-close, particularly if GPU costs will increase or the acquiring company has different infrastructure standards.

Competitive moat assessment. The central commercial diligence question for an AI company: is this a real moat or a temporary performance advantage that will be competed away? Acquirers assess the defensibility of the training data (proprietary or replicable?), the cost and time to replicate the model from scratch, and the network effects (does more usage generate better training data that compounds the performance advantage?).

Regulatory and Compliance Diligence

EU AI Act. The EU AI Act’s General Purpose AI obligations took effect August 2025. High-risk system obligations under Article 6 take effect August 2026. For any AI company with EU customers or EU personal data in its training set, acquirers assess: whether the AI system is classified as high-risk under Annex III, whether required conformity assessments have been completed, and whether the company has adequate documentation and human oversight mechanisms. Fines under the EU AI Act reach up to EUR 35 million or 7% of global turnover for prohibited AI practices. EU AI Act compliance has become a standard deal-room item in 2026.

Singapore IMDA AI Governance. Singapore’s IMDA updated its AI Verify testing framework in May 2025. While Singapore’s approach remains voluntary (unlike the EU’s mandatory framework), IMDA governance alignment is increasingly expected by Singapore-based acquirers and has become a due diligence item for cross-border transactions involving Singapore assets.

China AI regulations. For AI companies with Chinese users or Chinese training data, the Cyberspace Administration of China’s AI regulations (the Generative AI Regulation effective August 2023, and subsequent guidance) impose requirements around algorithmic recommendation, deep synthesis, and generative AI services. Cross-border AI transactions involving China-derived data require specific compliance assessment.

Sector-specific AI regulation. AI companies operating in regulated sectors face additional regulatory diligence: financial services AI (MAS Technology Risk Management Guidelines in Singapore; APRA guidance in Australia), healthcare AI (TGA software as medical device classification in Australia; MFDS in Korea), and government/critical infrastructure AI (national security review in multiple jurisdictions).

Financial Diligence Specifics for AI Companies

GPU capex and cloud compute treatment. AI companies often have significant compute obligations that are accounted for inconsistently. Acquirers normalise financials to properly reflect GPU spend: whether cloud GPU costs are in COGS or R&D, how long-term GPU reservations are treated on the balance sheet, and what the true economic gross margin is after fully-loaded compute costs.

R&D capitalisation. Model development costs may be capitalised as intangible assets under IFRS or US GAAP, but capitalisation criteria in AI (particularly distinguishing the research phase from the development phase) are not always consistently applied. Acquirers review R&D capitalisation policies and may restate financials to reflect a more conservative approach.

Inference cost projection. As AI companies scale, inference costs (the cost of running the model to serve each customer query) become a significant COGS driver. Acquirers build their own inference cost projections at the target’s projected scale, and stress-test whether the company’s gross margin assumptions survive volume growth.
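That stress test is often run as a simple projection: does gross margin hold if per-unit pricing erodes faster than compute efficiency improves? The erosion and efficiency rates below are hypothetical placeholders, not market data.

```python
# Toy gross-margin projection under price erosion; all rates are hypothetical.
def projected_margin(year: int, price0: float = 0.010, cost0: float = 0.004,
                     price_erosion: float = 0.15,
                     efficiency_gain: float = 0.10) -> float:
    """Gross margin in a given year if price falls faster than compute cost."""
    price = price0 * (1 - price_erosion) ** year
    cost = cost0 * (1 - efficiency_gain) ** year
    return (price - cost) / price

for y in range(4):
    print(y, f"{projected_margin(y):.0%}")  # margin compresses each year
```

If the target's model assumes flat pricing and rapidly falling compute costs, the acquirer's version of this projection will disagree with management's — and the gap becomes a valuation discussion.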

Earnout and milestone structuring. Because AI company valuations are heavily dependent on future model performance and team retention, earnout structures are common. Financial diligence includes stress-testing earnout scenarios: what happens to valuation if key technical milestones are missed, or if a key engineer departs in year two of the earnout period?
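Those scenarios are mechanical once the structure is agreed. A toy sketch — the milestone weights and the retention haircut below are hypothetical, not a standard structure:

```python
# Toy earnout stress test; weights and haircut are hypothetical.
def earnout_payout(max_earnout: float, milestones_hit: dict[str, bool],
                   weights: dict[str, float], key_engineer_retained: bool,
                   retention_haircut: float = 0.30) -> float:
    """Payout = weighted milestones achieved, reduced if a key engineer departs."""
    hit_fraction = sum(w for m, w in weights.items() if milestones_hit.get(m))
    payout = max_earnout * hit_fraction
    if not key_engineer_retained:
        payout *= (1 - retention_haircut)
    return payout

scenario = earnout_payout(
    max_earnout=10_000_000,
    milestones_hit={"accuracy_target": True, "arr_target": False},
    weights={"accuracy_target": 0.4, "arr_target": 0.6},
    key_engineer_retained=False,
)
print(f"${scenario:,.0f}")  # 40% of milestones hit, 30% retention haircut
```

Running the downside cases before signing — rather than discovering them in year two — is the point of the exercise.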

How to Prepare Your AI Company for Diligence

Run a pre-diligence IP audit. Before beginning any sale process, commission an IP audit with specialist AI IP counsel. The audit should cover model ownership chain, training data licensing, open-source component register, and employee/contractor IP assignments. Identified problems are fixable before they become deal-breakers; they are far more costly to address under the time pressure of a live diligence process.

Build an AI-specific data room. Beyond standard M&A data room materials, AI company data rooms should include: a model IP schedule, a training data inventory with provenance documentation, technical performance benchmarks (third-party validated where possible), GPU and infrastructure cost schedules, key person agreements with IP assignment and retention provisions, and AI governance documentation (EU AI Act compliance status, internal AI ethics policy, bias testing results).

Document the training pipeline. Reproducibility is diligence-critical. Ensure the training process is documented in sufficient detail that it can be reproduced by a technically competent team without the original developers present. Version-controlled code, documented hyperparameters, and stored model checkpoints are the minimum standard.

Clean up open-source dependencies. Commission an OSS licence audit and remediate any GPL or AGPL contamination issues. This is particularly important for AI companies that adopted open-source components rapidly during R&D without systematic licence tracking.

Prepare team retention packages. Before starting a process, work with your legal team to have retention arrangements for key ML engineers and technical leads ready to present. Acquirers will ask about team retention early, and having a concrete proposal (rather than “we’ll figure it out”) materially accelerates the process.


Amafi Advisory runs sell-side processes for AI companies across Asia Pacific and supports founders through the preparation and diligence phases of M&A transactions. If you are considering a sale and want to understand what acquirers will examine — and how to be prepared — contact us for a confidential discussion.

Related: Korean Chaebol AI Acquisitions: A Founder’s Guide — for founders exploring Korean corporate acquirers. For a broader APAC perspective, see Selling Your AI Company in Asia. The standard M&A diligence framework is covered in our M&A Due Diligence Checklist. For AI-specific safety and security evaluation methods used in technical diligence, see our glossary entry on red-teaming.

ABOUT THE AUTHOR
Daniel Bae

Co-founder & CEO · Amafi

Daniel is an investment banker with 15+ years of experience in M&A, having advised on deals worth over US$30 billion. His career spans Citi, Moelis, Nomura, and ANZ across London, Hong Kong, and Sydney. He holds a combined Commerce/Law degree from the University of New South Wales. Daniel founded Amafi to solve the pain points in M&A, enabling bankers to focus on what matters most — delivering trusted advice to clients.