Shelly Palmer - We used to hoard pre-nuclear steel

Now we’re hoarding pre-AI content.
A project is quietly cataloguing human-created content from before generative AI flooded the web.

Greetings from Cannes, where the best spots were “created by humans” – at least according to the AI that wrote the press releases.

In the news: Before 1945, steel was just steel. Then, nuclear bomb tests contaminated the atmosphere, embedding trace radiation into all newly smelted metal. Today, when uncontaminated steel is needed for radiation-sensitive instruments (like Geiger counters, particle detectors, or space telescopes), it is salvaged from ships that sank before the blasts. It's called low-background steel, and it is a rare, coveted material.

Now, one man is doing the digital equivalent.

A new project by software engineer John Graham-Cumming is quietly cataloguing human-created content from before generative AI flooded the web. Think of it as a time capsule of authentic expression: text, images, and video produced without algorithmic influence. It includes sources such as the August 2022 full dump of Wikipedia, Project Gutenberg’s public domain book collection, the Library of Congress photo archives, GitHub’s Arctic Code Vault (2020), and wordfreq, a linguistic tool frozen in time.

According to Graham-Cumming, the project isn't anti-AI. It’s pro-human. He compares today's AI-content saturation to post-nuclear atmospheric contamination, writing: “The idea is to point to sources of text, images and video that were created prior to the explosion of AI-generated content.”

This isn’t hypothetical. In September 2024, developer Robyn Speer shut down wordfreq, a widely used Python library for multilingual word frequency analysis. Her reasoning? The internet had become “full of slop generated by large language models, written by no one to communicate nothing.” AI-generated noise had corrupted the tool’s statistical value.

Researchers also worry about "model collapse," a phenomenon that occurs when AI models train on AI-generated output and gradually degrade in quality. But 2024 research by Gerstgrasser et al. suggests that collapse is avoidable if synthetic and real data are properly mixed and curated.

In reality, we don’t know what pre-AI content might be worth in five years, but just as low-background steel now serves a critical role, this material could become essential for future historians, linguists, or even AI developers looking to re-anchor in uncorrupted data. Graham-Cumming has even proposed a “cryptographic ark” to securely timestamp and verify authentic, pre-AI media.

The line between human and machine expression is blurring fast. If you care about authenticity, the project is worth a look – and maybe a submission.

As always, your thoughts and comments are both welcome and encouraged. -s

About Shelly Palmer

Shelly Palmer is the Professor of Advanced Media in Residence at Syracuse University’s S.I. Newhouse School of Public Communications and CEO of The Palmer Group, a consulting practice that helps Fortune 500 companies with technology, media and marketing. He is a regular commentator on CNN.
