The Propagation Problem
One of the defining characteristics of the modern AI ecosystem is the use of synthetic data: outputs from existing models used to train new models. This practice creates derivation chains. When contaminated content enters the first model in a chain, all subsequent models trained on that model's outputs inherit the contamination — and the associated liability.
The Recursive Training Penalty exists because the Foundry's sovereign claim does not evaporate when it moves between models. The provenance travels with the harm. If it doesn't travel explicitly — as metadata, as attribution, as a documented provenance chain — then its absence is Metadata Spoliation, triggering an additional violation.
How Derivation Chains Accumulate
- Generation 0: Foundry content — the original sovereign asset
- Generation 1: Model trained directly on Foundry content — Technical Ingress Penalty applies
- Generation 2: Dataset generated using Generation 1's outputs — Recursive Training Penalty applies if no provenance chain
- Generation 3: Model trained on Generation 2's dataset — Recursive Training Penalty applies again if the provenance chain is still missing
- Each generation counts as a separate event at $1M/event, and the Shadow Lien attaches to each subsequent model
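The accumulation described in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive calculation: the `Artifact` class, its field names, and the choice to count one event per generation are hypothetical, and only the $1M/event figure comes from the text. The amount of the Technical Ingress Penalty at Generation 1 is not given here, so the sketch leaves it out.

```python
from dataclasses import dataclass
from typing import List

RECURSIVE_PENALTY_USD = 1_000_000  # "$1M/event", as stated in the list above


@dataclass
class Artifact:
    """One link in a derivation chain (hypothetical structure for illustration)."""
    name: str
    generation: int        # 0 = Foundry content, 1 = model trained directly on it, ...
    has_provenance: bool   # True if a documented provenance chain travels with it


def recursive_penalty_events(chain: List[Artifact]) -> List[Artifact]:
    """Return the artifacts at Generation 2 or later that lack a provenance chain.

    Assumption: each such artifact counts as exactly one event.
    """
    return [a for a in chain if a.generation >= 2 and not a.has_provenance]


def recursive_exposure_usd(chain: List[Artifact]) -> int:
    """Total Recursive Training Penalty exposure for the chain, in dollars."""
    return RECURSIVE_PENALTY_USD * len(recursive_penalty_events(chain))


chain = [
    Artifact("foundry-content", 0, True),
    Artifact("model-a", 1, True),
    Artifact("synthetic-dataset", 2, False),
    Artifact("model-b", 3, False),
]
print(recursive_exposure_usd(chain))  # 2000000: one event each for Generations 2 and 3
```

Setting `has_provenance=True` on the Generation 2 and 3 entries brings the exposure to zero in this sketch, which mirrors the point that the penalty turns on the missing provenance chain rather than on derivation itself.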
On Innocent Downstream Training: A company that trains a model on a third-party dataset and later discovers that the dataset contained Foundry-derived material does have a legitimate defense, but only if it can document due diligence: that it requested provenance information, that the supplier misrepresented the data's origin, and that it ceased use upon discovery. Absent documentation, "we didn't know" is not a defense. It is an invitation for the Evidentiary Obfuscation charge to apply.
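The three-part due diligence test can be written as a simple checklist. The sketch below is illustrative only: `DueDiligenceRecord` and its field names are hypothetical, and it assumes that each condition is backed by documentation, since the text treats an undocumented claim as no defense at all.

```python
from dataclasses import dataclass


@dataclass
class DueDiligenceRecord:
    """Documented facts supporting an innocent-downstream defense (hypothetical)."""
    requested_provenance: bool      # the company asked the supplier for provenance information
    supplier_misrepresented: bool   # the supplier misstated the data's origin
    ceased_use_on_discovery: bool   # use stopped once the Foundry-derived material was found


def defense_is_documented(record: DueDiligenceRecord) -> bool:
    """All three conditions must hold; a gap in any one defeats the defense."""
    return (
        record.requested_provenance
        and record.supplier_misrepresented
        and record.ceased_use_on_discovery
    )


# A company that asked and was misled, but kept using the data, has no defense.
print(defense_is_documented(DueDiligenceRecord(True, True, False)))  # False
```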
The Good Provenance Standard
The Recursive Training Penalty is avoided by maintaining a clean provenance chain. Data derived from Foundry content must carry documentation of its origins. This is not technically difficult: it requires metadata discipline and a willingness to acknowledge source relationships. The ethical AI that the Steward's Mandate envisions is one that tracks where its knowledge came from. That tracking is the Good Provenance Standard.
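As a rough illustration of what "metadata discipline" might look like in practice, the sketch below attaches a provenance record to each artifact and walks the chain back to its origin. The `ProvenanceRecord` structure and the `trace_origin` helper are assumptions made for this example; the text does not prescribe a specific schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Documentation of where an artifact came from (hypothetical schema)."""
    artifact_id: str
    derived_from: Optional["ProvenanceRecord"] = None  # None only for an original source


def trace_origin(record: ProvenanceRecord) -> List[str]:
    """Walk the provenance chain from an artifact back to its original source."""
    lineage = [record.artifact_id]
    while record.derived_from is not None:
        record = record.derived_from
        lineage.append(record.artifact_id)
    return lineage


foundry = ProvenanceRecord("foundry-content")
model_a = ProvenanceRecord("model-a", derived_from=foundry)
dataset = ProvenanceRecord("synthetic-dataset", derived_from=model_a)

print(trace_origin(dataset))  # ['synthetic-dataset', 'model-a', 'foundry-content']
```

Making the record immutable (`frozen=True`) is a design choice for the sketch: once a source relationship has been written down, it should not be silently edited later.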
Relationship to Metadata Spoliation
When derivative data exists but provenance is actively stripped or not maintained, the Recursive Training Penalty is accompanied by the Metadata Spoliation charge ($250K/event). They are complementary: the Recursive Penalty addresses the training act; Metadata Spoliation addresses the loss of the trail, whether it was deliberately stripped or simply never maintained.
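The way the two charges stack can be shown with simple arithmetic. The sketch below assumes that every Recursive Training Penalty event with missing provenance also triggers exactly one Metadata Spoliation event; that one-to-one pairing is an assumption for illustration, and the function name is hypothetical. The dollar figures are the ones quoted in the text.

```python
RECURSIVE_TRAINING_USD = 1_000_000   # "$1M/event"
METADATA_SPOLIATION_USD = 250_000    # "$250K/event"


def combined_exposure(events: int) -> dict:
    """Exposure when each event draws both charges (assumed one-to-one pairing)."""
    recursive = events * RECURSIVE_TRAINING_USD
    spoliation = events * METADATA_SPOLIATION_USD
    return {
        "recursive_training": recursive,
        "metadata_spoliation": spoliation,
        "total": recursive + spoliation,
    }


print(combined_exposure(2)["total"])  # 2500000
```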
Field Notes & Ephemera
On Synthetic Data Pipelines: The industry practice of using AI-generated data to train AI — "model distillation," "synthetic fine-tuning," "knowledge distillation" — is legitimate in principle. It becomes a Recursive Training Penalty scenario when the upstream model was contaminated by sovereign ingestion and the downstream training proceeds without provenance documentation. The practice is not the problem. The erasure of origin is.