
AI Training Data Licensing Agreements

When your AI model is trained on data you do not fully control, the risks can multiply overnight.

AI training data licensing agreements govern the rights to use datasets for machine learning model training. Indian businesses need them to clarify intellectual property rights and usage permissions for AI-generated content.

Overview

A fast-growing healthtech startup trained its AI on a public dataset, only to be hit with an unexpected copyright claim and regulatory scrutiny. The product launch was delayed, investor confidence wavered, and the company faced a significant financial setback.

Many businesses assume that publicly available or open datasets are free to use for AI training. They often overlook the fine print around licensing, data provenance, and downstream rights in AI-generated outputs. This can expose them to intellectual property disputes, liability for misuse, and data privacy violations.

The TCL Framework from AMLEGALS breaks this cycle by mapping technical data sources, clarifying commercial uses, and embedding legal permissions for both the training and output layers. We ensure every dataset is tracked, rights are clear, and your use case is protected from unexpected claims.

Indian law is catching up fast. The Copyright Act, 1957 and the IT Act, 2000 both contain provisions that can lead to injunctions and damages for data misuse. With the DPDPA, 2023 soon coming into effect, data subjects' consent and data localization rules must be respected. Recent enforcement shows courts willing to grant heavy penalties, up to INR 2 crore, against infringing AI businesses.

Key Takeaways

  • They specify data usage scope including permitted training and derivative works.
  • Licenses address ownership and rights over AI generated outputs and models.
  • Agreements ensure compliance with data protection laws and third party rights.

Key Considerations

1. Training Use Rights

Explicitly addressing whether data may be used for machine learning training, the scope of permitted training, and any restrictions on the types of models that may be trained.

2. Model Residual Rights

Defining what rights (if any) the data licensor retains in models trained on their data, and what obligations apply to such models after license expiration.

3. Output Ownership Allocation

Establishing clear ownership and usage rights for AI-generated outputs, addressing the human-AI authorship questions that current law leaves unclear.

4. Synthetic Data Rights

Addressing whether and how synthetic data generated from licensed data may be created, used, and shared, including derivative work considerations.

5. Attribution and Provenance

Establishing whether and how the origin of training data must be disclosed or attributed in AI systems or their outputs.

6. Compliance with Data Rights

Ensuring that AI training complies with underlying data rights, including personal data protections, database rights, and copyright in compiled datasets.

Applying the TCL Framework

Technical

  • Understanding the training process and how data influences model behavior
  • Assessing data provenance and underlying rights in training datasets
  • Evaluating whether models can be "unlearned" if required to address data rights issues
  • Understanding the relationship between training data and model outputs
  • Reviewing data quality, bias, and representativeness requirements

Commercial

  • Pricing models for training data—per-use, per-model, revenue share
  • Valuing the contribution of training data to AI system commercial value
  • Allocating risk of IP challenges to training data use
  • Structuring ongoing royalties or use fees for trained models
  • Addressing competitive restrictions on training data use

Legal

  • Drafting explicit training use grants within license scope
  • Addressing the copyright status of AI outputs under Indian law
  • Structuring ownership allocation for human-AI collaborative works
  • Including representations about training data provenance and rights
  • Addressing moral rights and attribution in AI contexts

"The most valuable asset in AI is often the training data. Yet most data agreements predate AI and don't contemplate training use. The gap between what existing licenses permit and what AI development requires creates both risk and opportunity: risk for those who assume permissions that don't exist, and opportunity for those who structure new agreements that capture this value."

Anandaday Misshra
Founder & Managing Partner

Common Pitfalls

Assumed Training Rights

Assuming that standard data or software licenses permit AI training use when many do not contemplate this use case.

Ignoring Model Persistence

Treating data licenses as expiring cleanly without addressing the persistence of data influence in trained models.

Unclear Output Rights

Failing to address ownership of AI-generated outputs, leaving critical IP questions to uncertain legal doctrines.

Data Provenance Gaps

Not verifying the underlying rights in training data, creating liability exposure when data has been improperly sourced.

Oversimplified Ownership

Assigning all AI output IP to one party without considering the various contributions and their legal implications.

Every AI data licensing negotiation has a turning point.

The difference between a contract that protects and one that exposes often comes down to three or four clauses. Identifying those clauses requires experience across the technical, commercial, and legal dimensions.

IP and Data Framework

Indian copyright law requires human authorship; works created without human creative input may not be protectable. The Copyright Act attributes authorship of a computer-generated work to "the person who causes the work to be created", but AI-generated content tests the boundaries of that definition. Database rights in India are less developed than in the EU, affecting protection for compiled datasets. The DPDPA imposes restrictions on using personal data for AI training: consent requirements, purpose limitations, and data subject rights all apply. The intersection of copyright, database rights, contract, and data protection law creates a complex framework requiring careful contractual navigation.

Practical Guidance

  • Explicitly address AI training in all data and content licenses—both as licensor and licensee.
  • Document training data provenance and maintain audit trails of data rights.
  • Include clear ownership allocation for AI outputs, specifying the basis of each party's rights.
  • Address model residual issues—what happens to trained models when data licenses end?
  • Consider whether exclusive training rights or competitive restrictions are appropriate.
  • Include representations about compliance with underlying data rights, including personal data.

Need Assistance with AI Data Licensing?

Our team brings deep expertise in AI and technology matters.

Contact Our Team