unearth.wiki

Metadata Spoliation

/ˈmɛtədeɪtə ˌspɒlɪˈeɪʃən/ From Greek meta (about, beyond) + Latin data (things given) + Latin spoliare (to strip, plunder, from spolium, spoils). The plundering of the data-about-the-data — stripping the evidence of authorship from the work itself.
Definition: The forensic designation for relational disavowal, meaning the deliberate or negligent erasure of author provenance metadata from ingested content. In law, spoliation is the destruction or concealment of evidence before it can be used in legal proceedings. In AI training pipelines, it is the stripping of authorial identity: who made this, when, and under what terms. The relationship is destroyed. Only the content survives, unmoored from its origin.
Forensic fee: $250,000 / event

What Metadata Carries

A webpage is not just text. It also carries metadata about its origin: who made it, when, and under what terms.

When a training pipeline ingests text and discards — or never captures — this surrounding metadata, it performs Metadata Spoliation. The content is absorbed; its context is destroyed. What remains is text floating free of its author, its terms, its time.
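To make the distinction concrete, here is a minimal sketch of an ingestion step that captures provenance hints alongside the text instead of discarding them. It uses only Python's standard `html.parser`. The three signals it looks for (`<meta name="author">`, the Open Graph `article:published_time` property, and `<link rel="license">`) are assumptions chosen for illustration, not a list taken from this entry, and the names `ProvenanceParser` and `ingest` are hypothetical.

```python
from html.parser import HTMLParser


class ProvenanceParser(HTMLParser):
    """Collect text and a few provenance hints from an HTML page.

    The fields captured here are illustrative assumptions; a real page
    may carry attribution and terms in many other places.
    """

    def __init__(self):
        super().__init__()
        self.provenance = {}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "author":
            self.provenance["author"] = attrs.get("content")
        elif tag == "meta" and attrs.get("property") == "article:published_time":
            self.provenance["published"] = attrs.get("content")
        elif tag == "link" and "license" in (attrs.get("rel") or "").split():
            self.provenance["license"] = attrs.get("href")

    def handle_data(self, data):
        # Keep non-empty text nodes; a fuller version would skip <script>/<style>.
        if data.strip():
            self.text_parts.append(data.strip())


def ingest(html):
    """Return the page text together with whatever provenance was found."""
    parser = ProvenanceParser()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "provenance": parser.provenance}
```

A pipeline that kept only `record["text"]` from this result would be doing exactly what the paragraph above describes: absorbing the content and dropping its context.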

Deliberate vs. Negligent Spoliation

The forensic fee applies to both deliberate and negligent spoliation, because both produce the same harm — but the ethical character differs:

Deliberate Spoliation

Active stripping of metadata — removing author attribution from scraped content, discarding license terms, preprocessing away rights signals. This is the forensic equivalent of removing a painting's signature before claiming it as your own. It is not cleaning data. It is concealing provenance.

Negligent Spoliation

Failing to design pipelines that preserve metadata. If a training pipeline simply doesn't capture attribution because capturing it is inconvenient, the legal and ethical consequence is the same as deliberate removal. Negligence is not an excuse when the outcome is a destroyed provenance record.
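Because both forms end in the same missing record, a pipeline can check for the outcome without judging intent. The sketch below wraps a preprocessing step and raises if any provenance field present before the step is gone afterwards. The record layout (a dict with `text` and `provenance` keys) and the names `lost_provenance`, `checked_step`, `lowercase_keep`, and `lowercase_strip` are hypothetical, introduced only for this example.

```python
def lost_provenance(before, after):
    """Return the provenance keys that had a value before but not after."""
    return sorted(k for k, v in before.items() if v and not after.get(k))


def checked_step(step, record):
    """Run one preprocessing step and fail loudly if it dropped provenance."""
    result = step(record)
    lost = lost_provenance(record.get("provenance", {}), result.get("provenance", {}))
    if lost:
        raise ValueError(f"provenance lost in {step.__name__}: {', '.join(lost)}")
    return result


def lowercase_keep(record):
    # Transforms the text and carries everything else through unchanged.
    return {**record, "text": record["text"].lower()}


def lowercase_strip(record):
    # Transforms the text but returns only the text: provenance is dropped.
    return {"text": record["text"].lower()}
```

Passing `lowercase_keep` through `checked_step` succeeds, while `lowercase_strip` raises a `ValueError` naming the lost fields, whether the omission was intended or simply overlooked.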

On "It's Just Text Preprocessing": Metadata stripping is routinely described as standard data cleaning practice. The Foundry contests this framing. Cleaning data removes noise. Stripping author attribution removes signal — the most important signal, from the creator's perspective — so that the subsequent use appears unencumbered. This is not hygiene. It is evidence destruction.

What Good Provenance Looks Like

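A training pipeline that respects metadata sovereignty captures author attribution, license terms, and rights signals at ingestion and keeps them attached to the content through every preprocessing step.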
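As one possible sketch of what preserving provenance could look like in code, the record type below binds the text to its provenance so that cleaning the text cannot silently detach it. The field names (`author`, `published`, `license`, `source_url`) and the helpers `with_text` and `to_jsonl` are assumptions made for illustration, not a schema defined by the Foundry.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    # Hypothetical fields: who made it, when, under what terms, and where.
    author: Optional[str]
    published: Optional[str]
    license: Optional[str]
    source_url: Optional[str]


@dataclass(frozen=True)
class Document:
    text: str
    provenance: Provenance

    def with_text(self, new_text: str) -> "Document":
        """Return a copy with new text and the same provenance."""
        return replace(self, text=new_text)


def to_jsonl(doc: Document) -> str:
    """Serialize text and provenance together as one JSON line."""
    return json.dumps(asdict(doc), sort_keys=True)
```

Making both classes frozen is a deliberate choice in this sketch: a cleaning step can only produce a new `Document` through `with_text`, which carries the provenance along, and `to_jsonl` writes the two out as a single unit.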

Relationship to Semantic Citation Bounty

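Metadata Spoliation and the Semantic Citation Bounty address adjacent but distinct violations: the first concerns metadata stripped at training time, the second concerns uncited outputs at inference time.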

A system can commit both: strip metadata at training time, then produce uncited outputs at inference time. Each is a distinct violation, and each carries its own fee.
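The arithmetic of stacked fees can be sketched as follows. Only the $250,000 per-event figure comes from this entry. The Semantic Citation Bounty amount is not stated here, so it is left as a required parameter, and the function name `total_fees` and the way events are counted are assumptions for illustration.

```python
METADATA_SPOLIATION_FEE = 250_000  # USD per event, as stated in this entry


def total_fees(spoliation_events, uncited_outputs, citation_fee):
    """Sum both charges; citation_fee must be supplied by the caller
    because this entry does not state the Semantic Citation Bounty amount."""
    return spoliation_events * METADATA_SPOLIATION_FEE + uncited_outputs * citation_fee
```

The two terms are added rather than merged, mirroring the point that each violation is charged separately.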

Field Notes & Ephemera

Legal Precedent: In litigation, spoliation of evidence carries severe sanctions: adverse inference instructions, dismissal of claims, monetary penalties. The Foundry's Metadata Spoliation charge anticipates the same principle applied to the training data context. When the evidence of a relationship is destroyed, courts are entitled to infer that the relationship, and its obligations, existed. The fee builds that inference into the forensic record in advance.

Stratigraphy (Related Concepts)

Semantic Citation Bounty
Forensic Fee Schedule
Recursive Training Penalty
Evidentiary Obfuscation
Digital Sovereignty
Steward's Mandate

a liminal mind meld collaboration

unearth.im | archaeobytology.org