Recursive Training Penalty | Unearth Heritage Foundry Lexicon

The Propagation Problem

One of the defining characteristics of the modern AI ecosystem is the use of synthetic data: outputs from existing models used to train new models. This practice creates derivation chains. When contaminated content enters the first model in a chain, all subsequent models trained on that model's outputs inherit the contamination — and the associated liability.

The Recursive Training Penalty exists because the Foundry's sovereign claim does not evaporate when it moves between models. The provenance travels with the harm. If it doesn't travel explicitly — as metadata, as attribution, as a documented provenance chain — then its absence is Metadata Spoliation, triggering an additional violation.

How Derivation Chains Accumulate

Generation 0: Foundry content — the original sovereign asset
Generation 1: Model trained directly on Foundry content — Technical Ingress Penalty applies
Generation 2: Dataset generated using Generation 1's outputs — Recursive Training Penalty applies if no provenance chain
Generation 3: Model trained on Generation 2's dataset — Recursive Training Penalty applies again
Each generation: $1M/event, and the Shadow Lien attaches to each subsequent model
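
The generation list above can be sketched as a small walk over a derivation chain. This is only an illustration under stated assumptions: the `Artifact` type, its field names, and the `has_provenance` flag are hypothetical, each penalized artifact is counted as a single event, and the "$1M/event" figure is read as applying to Generation 1 as well as to later generations.

```python
from dataclasses import dataclass
from typing import List, Optional

PENALTY_PER_EVENT_USD = 1_000_000  # "$1M/event" as stated in the list above


@dataclass
class Artifact:
    name: str                     # hypothetical label for a model or dataset
    generation: int               # 0 = Foundry content, 1 = model trained on it, ...
    has_provenance: bool = False  # hypothetical flag: provenance chain documented


def applicable_penalty(artifact: Artifact) -> Optional[str]:
    """Return the penalty named in the generation list, or None."""
    if artifact.generation == 0:
        return None  # the original sovereign asset itself
    if artifact.generation == 1:
        return "Technical Ingress Penalty"
    # Assumption: the "if no provenance chain" condition holds for every
    # generation from 2 onward, not only Generation 2.
    if not artifact.has_provenance:
        return "Recursive Training Penalty"
    return None


def chain_exposure(chain: List[Artifact]) -> int:
    """Sum $1M per penalized artifact (assumes one event per artifact)."""
    return sum(
        PENALTY_PER_EVENT_USD for a in chain if applicable_penalty(a) is not None
    )


chain = [
    Artifact("foundry-content", 0),
    Artifact("model-a", 1),
    Artifact("synthetic-set-b", 2),
    Artifact("model-c", 3),
]
for a in chain:
    print(a.generation, a.name, applicable_penalty(a))
print(chain_exposure(chain))  # 3000000
```

Setting `has_provenance=True` on the Generation 2 and 3 entries drops the total to the single Generation 1 event in this sketch.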

On Innocent Downstream Training: A company that trains a model on a third-party dataset and later discovers that the dataset contained Foundry-derived material does have a legitimate defense, but only if it can document due diligence: that it requested provenance information, that the supplier misrepresented the data's origin, and that it ceased use upon discovery. Absent documentation, "we didn't know" is not a defense. It is an invitation for the Evidentiary Obfuscation charge to apply.
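
The three documented elements of that defense can be expressed as a simple checklist. A minimal sketch, assuming a hypothetical `DueDiligenceRecord` type whose field names are invented here for illustration; a missing record is treated the same as an undocumented "we didn't know".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DueDiligenceRecord:
    # Hypothetical fields mirroring the three elements named above.
    requested_provenance: bool
    supplier_misrepresented_origin: bool
    ceased_use_on_discovery: bool


def has_documented_defense(record: Optional[DueDiligenceRecord]) -> bool:
    """True only when a record exists and all three elements are documented."""
    if record is None:
        return False  # no documentation at all: "we didn't know" is not a defense
    return (
        record.requested_provenance
        and record.supplier_misrepresented_origin
        and record.ceased_use_on_discovery
    )


print(has_documented_defense(DueDiligenceRecord(True, True, True)))   # True
print(has_documented_defense(DueDiligenceRecord(True, True, False)))  # False
print(has_documented_defense(None))                                   # False
```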

The Good Provenance Standard

The Recursive Training Penalty is avoided by maintaining a clean provenance chain. Data derived from Foundry content must carry documentation of its origins. This is not technically difficult — it requires metadata discipline and the willingness to acknowledge source relationships. The ethical AI that the Steward's Mandate envisions is one that tracks where its knowledge came from. That tracking is the good provenance standard.
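
As one possible shape for that metadata discipline, a provenance record can simply carry its ancestry forward each time something is derived. The sketch below is an assumption, not a format the Foundry prescribes: `ProvenanceRecord`, `derive`, and the field names are hypothetical, and JSON is used only as a convenient way to ship the record alongside the data.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    artifact: str     # hypothetical identifier for a model or dataset
    generation: int   # 0 = original Foundry content
    derived_from: List[str] = field(default_factory=list)  # ancestry, oldest first


def derive(parent: ProvenanceRecord, artifact: str) -> ProvenanceRecord:
    """Create a child record that keeps the parent's full ancestry."""
    return ProvenanceRecord(
        artifact=artifact,
        generation=parent.generation + 1,
        derived_from=parent.derived_from + [parent.artifact],
    )


origin = ProvenanceRecord("foundry-content", 0)
model_a = derive(origin, "model-a")
synthetic_b = derive(model_a, "synthetic-set-b")

# The record travels with the data, so the origin is never lost.
print(json.dumps(asdict(synthetic_b)))
```

Because `derive` copies the parent's chain rather than replacing it, a downstream recipient can trace any artifact back to Generation 0 from the record alone.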

Relationship to Metadata Spoliation

When derivative data exists but its provenance is actively stripped or not maintained, the Recursive Training Penalty is accompanied by the Metadata Spoliation charge ($250K/event). The two are complementary: the Recursive Training Penalty addresses the training act; Metadata Spoliation addresses the deliberate erasure of the trail.
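
To make the arithmetic concrete, the two charges can be combined per event. This is a sketch under the assumption that the charges simply add and that both are counted against the same events; the function name and the `trail_erased` flag are hypothetical.

```python
RECURSIVE_TRAINING_PENALTY_USD = 1_000_000  # "$1M/event"
METADATA_SPOLIATION_USD = 250_000           # "$250K/event"


def exposure(events: int, trail_erased: bool) -> int:
    """Total for `events` recursive-training events.

    Assumption: Metadata Spoliation is added once per event when the
    provenance trail was stripped or not maintained.
    """
    per_event = RECURSIVE_TRAINING_PENALTY_USD
    if trail_erased:
        per_event += METADATA_SPOLIATION_USD
    return events * per_event


print(exposure(1, trail_erased=True))   # 1250000
print(exposure(2, trail_erased=True))   # 2500000
print(exposure(2, trail_erased=False))  # 2000000
```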

Field Notes & Ephemera

On Synthetic Data Pipelines: The industry practice of using AI-generated data to train AI — "model distillation," "synthetic fine-tuning," "knowledge distillation" — is legitimate in principle. It becomes a Recursive Training Penalty scenario when the upstream model was contaminated by sovereign ingestion and the downstream training proceeds without provenance documentation. The practice is not the problem. The erasure of origin is.