The Prompt
Twenty-eight years after creating a homepage on AOL, I wanted it back. Not as nostalgia, exactly—though there's that—but as an archaeobyte, a specimen for study. What does a 1997 personal website look like when excavated from the digital strata? What survives the journey through time and platform death?
The site lived at members.aol.com/ajjvelasco from roughly 1997 to 1999. It was a typical
teenager's homepage: poetry (mine and others'), web rings, interactive JavaScript games, awards won from
other sites, and that particular aesthetic of early web design—background colors, animated GIFs, counter
badges, and the earnest belief that the web was a place for community.
AOL deleted the members.aol.com subdomain years ago. But Archive.org had snapshots. The question was: could I reconstruct it?
And could AI help me do the reconstruction?
Part I: Initial Contact
The first conversation went something like this:
Me: I need to download my old AOL homepage from archive.org. It was at members.aol.com/ajjvelasco. Can you help me write a script to get all the pages?
AI: I can help with that. We'll need to use the Wayback Machine's API to find snapshots and download them. Let me write a PowerShell script that will:
- Query the Wayback Machine CDX API for all captured URLs
- Filter for the most recent/best snapshots of each page
- Download each page while preserving the directory structure
- Handle rate limiting to be respectful to Archive.org's servers
What date range would work best? Sometimes earlier snapshots are more complete.
Me: The site was active from about 1997-1999. Let's try to get the most complete version we can find.
AI: Here's the script. It will create a catalog of what it finds first, then download everything systematically...
The First Script
The initial script was a PowerShell rescue operation. It queried Archive.org's CDX API, which returns a list of all captured URLs for a given domain. The challenge: Archive.org had captured the site multiple times over multiple years, and we needed to pick the "best" snapshot of each page.
# Query the Wayback CDX API for every capture under the site's path
$cdxUrl = "https://web.archive.org/cdx/search/cdx?url=members.aol.com/ajjvelasco/*&output=json"
$snapshots = Invoke-RestMethod -Uri $cdxUrl
# With output=json, the first row is a header of field names; the rest are captures
# For each unique URL, find the best snapshot
# Download from web.archive.org/web/[timestamp]/[original-url]
But this raised immediate questions:
- Which snapshot is "best"? The most recent? The most complete? The one with the fewest broken images?
- How do we handle images vs HTML? Archive.org wraps some content differently.
- What about case sensitivity? AOL usernames could be mixed case, but Archive.org might have normalized them.
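One way to make the "which snapshot is best?" question concrete is to write the heuristic down. The sketch below is not the script we ran; it is a minimal Python illustration that assumes the default CDX field order (urlkey, timestamp, original, mimetype, statuscode, digest, length) and treats "largest successful capture, newest on ties" as a stand-in for "most complete". The function names are hypothetical.

```python
from collections import defaultdict

# Default CDX field order when no fl= parameter is given (assumed here).
FIELDS = ["urlkey", "timestamp", "original", "mimetype",
          "statuscode", "digest", "length"]


def pick_best_snapshots(rows):
    """Return one capture per URL: HTTP 200 only, largest record, then newest."""
    by_url = defaultdict(list)
    for row in rows:
        capture = dict(zip(FIELDS, row))
        # Skips redirects and errors; the JSON header row is skipped too,
        # because its statuscode field is the literal string "statuscode".
        if capture["statuscode"] != "200":
            continue
        # urlkey is the archive's normalized form of the URL, so
        # AJJVelasco and ajjvelasco captures land in the same group.
        by_url[capture["urlkey"]].append(capture)

    def score(capture):
        length = capture["length"]
        size = int(length) if length.isdigit() else 0
        # 14-digit timestamps sort correctly as strings.
        return (size, capture["timestamp"])

    return {key: max(captures, key=score) for key, captures in by_url.items()}


def wayback_url(capture):
    """Build a download URL; the id_ suffix requests the capture as stored."""
    return "https://web.archive.org/web/{timestamp}id_/{original}".format(**capture)
```

Recorded length is only a rough proxy for completeness, and a different dig might reasonably prefer the earliest capture or the one with the fewest missing images. The point is that the choice becomes explicit and repeatable.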
Watching the Download
As the script ran, we watched the downloads in real-time. Files appeared with names like:
members.aol.com_80_AJJVelasco_index.html
members.aol.com_80_ajjvelasco_park_french.html
members.aol.com_AJJVelasco_graphics_info_b_h_1.gif.html
The prefixes were inconsistent. Some files used AJJVelasco, others ajjvelasco.
Some images had .gif.html extensions—Archive.org's wrapper around binary content.
This wasn't just annoying. This was archaeologically significant. The Archive.org capture process itself left traces—artifacts of the archival layer, not the original site.
Part II: The Restoration
Problem 1: The Filename Chaos
The downloaded files needed to be transformed back into a coherent site structure. The AI and I collaborated on a Python script to:
- Strip the Archive.org prefixes: members.aol.com_80_AJJVelasco_ → nothing
- Detect directory structures: park_french.html → park/french.html
- Fix double extensions: info_b_h_1.gif.html → info_b_h_1.gif
- Handle duplicates intelligently: If two files had the same content (checked by size), keep one
PREFIXES = [
    "members.aol.com_80_AJJVelasco_",
    "members.aol.com_80_ajjvelasco_",
    "members.aol.com_AJJVelasco_",
]
KNOWN_DIRS = [
    "graphics", "park", "pets", "mypoems",
    "peoplespoems", "entries", "rings", "awards",
]

def restore_files():
    for filename in files:
        # Strip prefix
        clean_name = strip_prefix(filename)
        # Detect directory
        target_dir = detect_directory(clean_name)
        # Fix extensions
        final_name = fix_extensions(clean_name)
        # Move to proper location
        move_file(filename, target_dir, final_name)
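The helper functions called in that excerpt are not shown. As a rough sketch of what they might look like, the Python below implements the three renaming steps and adds a hash-based duplicate check, since two different files can share a size. Everything here is an assumption for illustration: the list of image extensions, the choice to strip the directory marker inside fix_extensions, and the content_digest helper are mine, not the original script's.

```python
import hashlib

PREFIXES = [
    "members.aol.com_80_AJJVelasco_",
    "members.aol.com_80_ajjvelasco_",
    "members.aol.com_AJJVelasco_",
]
KNOWN_DIRS = [
    "graphics", "park", "pets", "mypoems",
    "peoplespoems", "entries", "rings", "awards",
]
# Assumed set of binary extensions that picked up an .html wrapper suffix.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def strip_prefix(filename):
    """Remove the first matching Archive.org prefix, if any."""
    for prefix in PREFIXES:
        if filename.startswith(prefix):
            return filename[len(prefix):]
    return filename


def detect_directory(clean_name):
    """Return the known directory a name starts with, or '' for the site root."""
    for directory in KNOWN_DIRS:
        if clean_name.startswith(directory + "_"):
            return directory
    return ""


def fix_extensions(clean_name):
    """Drop the leading directory marker and any .html suffix on image names."""
    directory = detect_directory(clean_name)
    if directory:
        clean_name = clean_name[len(directory) + 1:]
    for ext in IMAGE_EXTENSIONS:
        if clean_name.lower().endswith(ext + ".html"):
            return clean_name[:-len(".html")]
    return clean_name


def content_digest(data):
    """Hash file bytes so duplicates are compared by content, not size."""
    return hashlib.sha256(data).hexdigest()
```

Note that fix_extensions only renames. It assumes the bytes inside a .gif.html file really are the image, which is worth verifying before trusting the result.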
Running this script, the chaos resolved into structure.
Problem 2: The Broken Links
Every HTML page still pointed to http://members.aol.com/ajjvelasco/...—absolute URLs that no
longer existed. The AI helped create a link-fixing script that:
- Scanned every HTML file for references to the old AOL domain
- Calculated the correct relative path from the current file to the target
- Replaced absolute links with relative ones
import re

# Match absolute links to the old AOL site, in either capitalization
TARGET_URL_REGEX = re.compile(
    r'http://members\.aol\.com/(AJJVelasco|ajjvelasco)/?([^"\'>\s]*)',
    re.IGNORECASE
)

def fix_links(content, current_file_path):
    def replacement(match):
        # A bare link to the site root means the index page
        target_path = match.group(2) or "index.html"
        relative_path = calculate_relative(current_file_path, target_path)
        return relative_path
    return TARGET_URL_REGEX.sub(replacement, content)
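The calculate_relative helper is not shown in the excerpt. A minimal sketch of one way to write it, assuming both arguments are paths relative to the site root and use forward slashes (for example "park/french.html"):

```python
import posixpath


def calculate_relative(current_file_path, target_path):
    """Return target_path relative to the directory holding current_file_path."""
    # Files at the site root have no directory part, so fall back to ".".
    current_dir = posixpath.dirname(current_file_path) or "."
    # posixpath keeps forward slashes even when the script runs on Windows,
    # which is what HTML links need.
    return posixpath.relpath(target_path, start=current_dir)
```

For a page at park/french.html, a link to the home page becomes ../index.html and a link to a sibling game becomes just its filename. This sketch does not handle query strings or case differences between the link and the file on disk.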
This worked beautifully for internal links. But external links—to GeoCities sites, to web ring hubs, to other teenagers' homepages—those stayed broken. They pointed into the void.
Problem 3: The Scaling
When we opened the recovered site in a modern browser, it looked tiny. 1997 web design assumed 800x600 monitors. Modern screens are 1920x1080 or higher. The site was a postage stamp in the corner.
The AI suggested injecting responsive scaling code into every page:
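<script>
// Scale the fixed-width 1997 layout to fill the current window.
function vintageScale() {
  var baseWidth = 800; // 1997 standard width
  var scale = window.innerWidth / baseWidth;
  // Use transform alone: setting zoom as well scales the page twice
  // in browsers that support both properties.
  document.body.style.width = baseWidth + "px";
  document.body.style.transformOrigin = "top left";
  document.body.style.transform = "scale(" + scale + ")";
}
// addEventListener leaves any onload handler the original page defines intact.
window.addEventListener("load", vintageScale);
window.addEventListener("resize", vintageScale);
</script>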
We wrote another script to inject this into every HTML file. The site now scales to fill modern screens while preserving its 1997 layout integrity.
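The injection step can be sketched in a few lines of Python. This is an illustration rather than the script we used: the marker comment and function name are hypothetical, and it assumes the snippet should go just before the closing body tag, or at the end of the file when that tag is missing, as it often is in hand-written 1990s HTML.

```python
import re

# Hypothetical marker so that running the script twice does not inject twice.
MARKER = "<!-- vintage-scale -->"
CLOSE_BODY = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_snippet(html, snippet):
    """Insert snippet before </body>, or append it if no closing tag exists."""
    if MARKER in html:
        return html  # already injected
    block = MARKER + "\n" + snippet + "\n"
    match = CLOSE_BODY.search(html)
    if match is None:
        return html + "\n" + block
    return html[:match.start()] + block + html[match.start():]
```

Applying it to a whole site means reading each HTML file, calling inject_snippet, and writing the result back. Pages of this era are rarely UTF-8, so the encoding used for reading and writing is a decision to make deliberately.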
Part III: What We Recovered
After the excavation and restoration, here's what we had:
HTML Pages (Complete)
- Main navigation: index.html, home.html, index2.html
- Poetry section: mypoems/ (7 poems), peoplespoems/ (10 submissions)
- Interactive park: park/ (11 JavaScript games/tools)
- Awards section: awards/ (7 pages of web ring awards)
- Web rings: rings/ (7 web ring hub pages)
- Journal entries: entries/ (24 dated entries)
- Info pages: me.html, library.html, archives.html, etc.
Images (Partial)
- Navigation graphics: 12 button/header images
- Decorative elements: Animated GIFs (butterflies, birds, cats)
- Web ring logos: Various web ring badges and graphics
- Counter badges: Site meter images (non-functional)
Interactive Elements (Degraded)
- JavaScript games: Code intact but some reference broken resources
- Forms: HTML structure present, but POST endpoints dead
- Chat links: Point to defunct AOL chat servers
The site is substantially recovered. You can read the poetry. You can browse the journal entries. You can see the web ring affiliations. The JavaScript games mostly work.
But it's not complete.
Part IV: What's Missing (And Why)
This is where archaeobytology becomes essential. Understanding what's missing isn't just about cataloging gaps—it's about understanding the decay patterns of digital artifacts.
1. External Dependencies (Ecosystem Extinction)
What's missing: Counter badges, banner ads, web ring functionality
The site is filled with HTML like this:
<img src="http://www.sitemeter.com/meter.asp?site=jeffcity" border=0>
<a href="http://www.linkexchange.com/">LinkExchange Member</a>
<img src="http://www.smartclicks.com:81/01/B016020/smartimg">
Every one of these external services is dead. Sitemeter.com is defunct. LinkExchange was acquired by Microsoft in 1998 and later shut down. SmartClicks disappeared entirely. The HTML structure survived, but the resources it depends on are extinct.
Archaeobytological significance: This demonstrates dependency risk and ecosystem extinction. Personal websites in the 1990s existed within a web ecosystem of counters, rings, banners, and guestbooks. When that ecosystem collapsed, millions of sites became partial ghosts.
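An inventory of these dependencies can be taken mechanically. The sketch below, which is an illustration and not part of the original toolchain, uses Python's standard HTML parser to count the external hosts a page references through src, href, and action attributes. The class and function names are hypothetical, and it deliberately treats every members.aol.com URL as internal, so links to other AOL members' pages are not counted.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

SITE_HOST = "members.aol.com"


class ExternalRefCollector(HTMLParser):
    """Count hosts referenced by src, href, and action attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href", "action") and value:
                # hostname is None for relative links and drops any port.
                host = urlparse(value).hostname
                if host and host != SITE_HOST:
                    self.hosts[host] += 1


def external_hosts(html):
    """Return a Counter mapping external hostnames to reference counts."""
    collector = ExternalRefCollector()
    collector.feed(html)
    return collector.hosts
```

Run over every recovered page, the combined counts give a rough census of the ecosystem the site depended on, which can then be checked host by host against what still exists.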
2. Server-Side Processing (Platform Death)
What's missing: Form submissions, guestbook entries, poem submissions
The site had forms like this:
<form action="http://members.aol.com/ajjvelasco/submit.cgi" method="POST">
<input type="text" name="poem_title">
<textarea name="poem_content"></textarea>
<input type="submit" value="Submit Your Poem">
</form>
The form HTML survived. But submit.cgi is gone. Archive.org cannot archive server-side
scripts: a crawler only ever receives the output a script sends back, never the code that ran
on the server. We have the interface but not the functionality.
More critically: we don't have the data. Whatever poems people submitted via that form? Lost. Guestbook entries? Gone. The site's interactive layer—the thing that made it a community space rather than a static document—that's almost entirely absent.
Archaeobytological significance: This is platform death combined with the physicalist's fallacy. We assume that if Archive.org "saved" a site, we have the site. But we have the facade. The dynamic, server-side, database-backed aspects are largely unrecoverable unless the site owner had backups.
3. Third-Party Content (Link Rot)
What's missing: Links to friends' sites, web ring navigation, banner exchanges
The site contains hundreds of links to other members.aol.com pages, GeoCities sites, personal domains. Roughly 95% of them are dead. This isn't the fault of Archive.org—they couldn't archive the entire web. But it means the site exists in isolation, severed from the network it was part of.
Web rings are particularly poignant. The site proudly displayed membership in several rings:
<a href="http://www.webring.org/cgi-bin/webring?ring=100_cool&id=45&prev">
<- Previous Site
</a>
Those ring IDs are meaningless now. Webring.org exists, but the old ring databases don't. The site's position in a ring of 100 cool homepages is a phantom limb.
Archaeobytological significance: This demonstrates context collapse. The site as an isolated artifact is recoverable. The site as a node in a social network is not. Archaeobytology asks: what are we actually preserving when we save one node of a dead network?
4. Temporal Elements (Snapshot Limits)
What's missing: The site's evolution over time
Archive.org captured this site sporadically—a snapshot in 1997, another in 1998, a few in 1999. We reconstructed from the "best available" snapshots. But we lost the development history. Did I add the poetry section first or the game park? When did I join each web ring? How did the site change as I learned more HTML?
We have a version of the site, not the story of the site.
Archaeobytological significance: This is the limitation of stratigraphic analysis when your preservation mechanism is discontinuous. Physical archaeology has continuous strata. Digital archaeology often has snapshots with gaps.
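The size of those gaps can be measured directly from capture timestamps. A minimal sketch, assuming the 14-digit YYYYMMDDhhmmss format that the Wayback Machine uses for timestamps; the function name is hypothetical and any timestamps passed to it are for the reader to supply from their own catalog.

```python
from datetime import datetime


def capture_gaps(timestamps):
    """Return (earlier, later, days_between) for each pair of consecutive captures."""
    moments = sorted(
        datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in set(timestamps)
    )
    return [(a, b, (b - a).days) for a, b in zip(moments, moments[1:])]
```

Each tuple marks a stretch of the site's life for which no evidence exists. Any change made and reverted inside one of those windows is invisible to the dig.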
5. The Unarchived (Selection Bias)
What's missing: Possibly entire sections we don't know about
Did the site have a pets/ subdirectory with photos of my childhood dog? The catalog suggests it might
have—there's an empty pets/ folder. But Archive.org never captured it, or captured it so
incompletely that nothing useful remains.
We don't know what we don't know.
Archaeobytological significance: This is the custodial filter at work. Archive.org's crawler made choices about what to prioritize, what to follow, what depth to crawl. Those choices determine what survives. We're not recovering "the site"—we're recovering what Archive.org chose to save.
Part V: Archaeobytological Analysis
What This Dig Teaches Us
1. Preservation is always partial. Even with the best archival efforts, we recover fragments. The question is whether the fragments are meaningful—whether they preserve enough context to be interpreted.
2. The archive is the artifact. We didn't recover my 1997 homepage. We recovered Archive.org's 1997-1999 captures of my homepage, as filtered through their crawler and storage systems. The archival layer is inseparable from the artifact.
3. Websites are ecosystems, not documents. A 1990s homepage depended on counters, webrings, banner exchanges, guestbooks, form processors. When the ecosystem dies, the sites become husks. They look right but don't work right.
4. AI is a valuable excavation partner. In the scripting, the pattern recognition, and the problem-solving, the AI compressed days of work into hours. But the interpretation, understanding what's missing and why, requires human judgment informed by archaeobytological theory.
5. Loss patterns are not random. What survives and what disappears follows predictable patterns:
- Static HTML: High survival rate
- Images on the same server: Medium survival rate
- External dependencies: Low survival rate
- Server-side code: Very low survival rate
- Database content: Essentially zero survival rate without explicit backups
The Role of AI in Digital Archaeology
This excavation demonstrated several ways AI can assist archaeobytological work:
- Script generation: Rapidly prototyping data recovery tools
- Pattern recognition: Identifying filename patterns, link structures, directory conventions
- Iterative refinement: Debugging scripts in real-time as issues emerged
- Documentation: Helping articulate the process for future reference
But AI couldn't replace the archaeologist's judgment about:
- Which snapshot was "best" when multiple captures existed
- Whether to "restore" broken elements or leave them broken as historical evidence
- How to interpret absences—are they archival gaps or things that never existed?
- The cultural and technical context needed to understand what the site meant in 1997
The AI was a tireless research assistant. But the research questions came from human understanding of digital cultural history.
Lessons from the Dig
This excavation recovered approximately 75% of the original site by file count, but perhaps only 40% by functionality. The static content—pages, poetry, journal entries—survives well. The dynamic, networked, interactive elements are largely ghosts.
This is the pattern we see across all archaeobytological digs into 1990s web culture. We can recover the documents. But the web as a social space—the rings, the guestbooks, the webs of linked personal sites—that's much harder to reconstruct.
If there's a lesson here, it's this: preservation must happen at the ecosystem level, not just the artifact level. Saving individual sites isn't enough. We need to save the infrastructure, the services, the relationships between sites. Otherwise, we're building a museum of disconnected fossils, specimens divorced from the environment that gave them meaning.
Appendix: The Recovered Site
The excavated site is preserved in its restored form at
c:\Users\Admin\Documents\rhizome\rescued-homepage\. It stands as both a recovered artifact
and a demonstration of what digital archaeology can—and cannot—achieve.
View the site to see:
- What teenage poetry looked like in 1997
- How JavaScript games were built in the browser's early days
- What "community" meant in the pre-social-media web
- How web rings organized the early internet
- What's lost when platforms die and ecosystems collapse
This case study is part of the Archaeobytology project.
For questions about digital archaeology methodology: contact