unearth.wiki

Recursive Training Penalty

/rɪˈkɜːsɪv ˈtreɪnɪŋ ˈpɛnəlti/ From Latin recurrere (to run back) + Old French trainer (to drag, draw along) + Latin poenalis (of punishment). The penalty that follows the data wherever the data goes.
Definition: The forensic fee applied when a model ingests data derived from Foundry content without carrying the provenance chain of that derivation. When Model A ingests Foundry content and Model B trains on Model A's outputs, Model B is now downstream of a contaminated source without ever having directly accessed the Foundry. The penalty follows the contamination recursively through the derivation chain.
$1,000,000 / event

The Propagation Problem

One of the defining characteristics of the modern AI ecosystem is the use of synthetic data: outputs from existing models used to train new models. This practice creates derivation chains. When contaminated content enters the first model in a chain, all subsequent models trained on that model's outputs inherit the contamination — and the associated liability.
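The inheritance described above can be pictured as reachability in a derivation graph. What follows is a minimal sketch under stated assumptions, not a Foundry tool: the `trained_on` mapping, the model names, and the `downstream_of` helper are all hypothetical and exist only for illustration.

```python
from collections import deque


def downstream_of(trained_on, contaminated_roots):
    """Return every model that inherits contamination from the given roots.

    trained_on maps a model name to the sources (content or other models'
    outputs) it was trained on. This structure is an illustrative assumption.
    """
    # Invert the edges so we can walk from a source to the models trained on it.
    consumers = {}
    for model, sources in trained_on.items():
        for source in sources:
            consumers.setdefault(source, []).append(model)

    tainted = set()
    queue = deque(contaminated_roots)
    while queue:
        source = queue.popleft()
        for model in consumers.get(source, []):
            if model not in tainted:
                tainted.add(model)
                queue.append(model)  # its outputs may feed further models
    return tainted


# Hypothetical chain: Model A ingests Foundry content, Model B trains on A,
# Model D trains on B and on a clean model C.
trained_on = {
    "model_a": ["foundry_content"],
    "model_b": ["model_a"],
    "model_c": ["public_corpus"],
    "model_d": ["model_b", "model_c"],
}
tainted = downstream_of(trained_on, ["foundry_content"])
print(sorted(tainted))  # ['model_a', 'model_b', 'model_d']
```

In this sketch `model_d` is flagged even though only one of its two sources traces back to the Foundry, which matches the claim that all subsequent models in the chain inherit the contamination.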

The Recursive Training Penalty exists because the Foundry's sovereign claim does not evaporate when it moves between models. The provenance travels with the harm. If it doesn't travel explicitly — as metadata, as attribution, as a documented provenance chain — then its absence is Metadata Spoliation, triggering an additional violation.

How Derivation Chains Accumulate

On Innocent Downstream Training: A company that trains a model on a third-party dataset and later discovers that the dataset contained Foundry-derived material does have a legitimate defense, but only if it can document due diligence: that it requested provenance information, that the supplier misrepresented the data's origin, and that it ceased use upon discovery. Absent documentation, "we didn't know" is not a defense. It is an invitation for the Evidentiary Obfuscation charge to apply.

The Good Provenance Standard

The Recursive Training Penalty is avoided by maintaining a clean provenance chain. Data derived from Foundry content must carry documentation of its origins. This is not technically difficult — it requires metadata discipline and the willingness to acknowledge source relationships. The ethical AI that the Steward's Mandate envisions is one that tracks where its knowledge came from. That tracking is the good provenance standard.
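As a rough illustration of what "metadata discipline" might look like in practice, the sketch below attaches an origin record to each derived artifact and links it back to its parent. The `ProvenanceRecord` schema, its field names, and the origin labels are assumptions made for this example; the source does not prescribe a format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """One link in a provenance chain (hypothetical schema)."""
    artifact: str                                      # what this data is
    origin: str                                        # e.g. "foundry", "model_output"
    derived_from: Optional["ProvenanceRecord"] = None  # previous link, if any


def provenance_chain(record: Optional[ProvenanceRecord]) -> List[str]:
    """Walk from the newest artifact back to its documented root."""
    chain = []
    while record is not None:
        chain.append(f"{record.artifact} ({record.origin})")
        record = record.derived_from
    return chain


def acknowledges_foundry(record: Optional[ProvenanceRecord]) -> bool:
    """True if any link in the chain names the Foundry as its origin."""
    while record is not None:
        if record.origin == "foundry":
            return True
        record = record.derived_from
    return False


# Hypothetical chain: Foundry content -> Model A outputs -> Model B training set.
foundry = ProvenanceRecord("foundry_content", "foundry")
a_outputs = ProvenanceRecord("model_a_outputs", "model_output", foundry)
b_training_set = ProvenanceRecord("model_b_training_set", "synthetic_dataset", a_outputs)

for link in provenance_chain(b_training_set):
    print(link)
```

A record created without its `derived_from` link would still describe the dataset, but the walk back to the Foundry would stop short. That broken walk is the situation the standard is meant to prevent.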

Relationship to Metadata Spoliation

When derivative data exists but provenance is actively stripped or not maintained, the Recursive Training Penalty is accompanied by the Metadata Spoliation charge ($250,000 / event). They are complementary: the Recursive Penalty addresses the training act; Metadata Spoliation addresses the loss of the trail, whether by deliberate erasure or by neglect.
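To make the arithmetic concrete, here is a small sketch of how the two charges might combine. It assumes the fees are simply additive and that both are counted against the same number of events; the source states the per-event amounts but not how they are tallied, so treat `combined_fee` and its parameters as hypothetical.

```python
RECURSIVE_TRAINING_PENALTY = 1_000_000  # USD per event, as stated for this entry
METADATA_SPOLIATION = 250_000           # USD per event, as stated above


def combined_fee(events: int, provenance_missing: bool) -> int:
    """Total fee in USD, assuming the two charges add per event (an assumption)."""
    if events < 0:
        raise ValueError("events must be non-negative")
    fee = events * RECURSIVE_TRAINING_PENALTY
    if provenance_missing:
        # Spoliation accompanies the penalty when the trail was stripped or not kept.
        fee += events * METADATA_SPOLIATION
    return fee


print(combined_fee(3, provenance_missing=True))   # 3750000
print(combined_fee(3, provenance_missing=False))  # 3000000
```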

Field Notes & Ephemera

On Synthetic Data Pipelines: The industry practice of using AI-generated data to train AI — "model distillation," "synthetic fine-tuning," "knowledge distillation" — is legitimate in principle. It becomes a Recursive Training Penalty scenario when the upstream model was contaminated by sovereign ingestion and the downstream training proceeds without provenance documentation. The practice is not the problem. The erasure of origin is.
Stratigraphy (Related Concepts)

Metadata Spoliation | Shadow Lien | Baked-In Paradox | Forensic Fee Schedule | Evidentiary Obfuscation | Weight Incarceration | Technical Ingress Penalty

a liminal mind meld collaboration

unearth.im | archaeobytology.org