The Lost Web Lives

April 14th, 2018

Why is the web being lost, and how do we save it?

The easily editable nature of the Web, combined with the massive amounts of information changed each day, makes it nearly impossible to keep track of every single byte of information.

Earlier hypertext systems addressed link decay by keeping distributed databases of links up to date, but Tim Berners-Lee’s original plan for the Web did not adopt these solutions, so they never took hold.

The Web is the chief primary source for tomorrow’s historians looking to study today’s events. Some may argue that the ephemeral nature of the Web makes it impossible to save everything important. Further arguments concern who gets to decide what is important and how those decisions are made. These questions are part of the larger field of archival and historical ethics.

“In terms of sheer durability, the technology for writing reached a peak five thousand years ago and has been going downhill ever since.”
—Abby Smith Rumsey, When We Are No More

Founded in 1996 by Brewster Kahle, the Internet Archive is the main body in the USA working to preserve the Internet, the Web, and digital artifacts. The Archive uses web crawlers to capture static versions of web pages, which can then be browsed and searched through its free, publicly accessible Wayback Machine website.
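
For readers who want to check whether a page has been captured, the Wayback Machine also exposes a small availability lookup over HTTP. The Python sketch below is illustrative only: the function names are hypothetical, and the response layout (an "archived_snapshots" object holding a "closest" entry) is an assumption based on the public availability endpoint, so it may need adjusting.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_body):
    """Return the archived copy's URL from a JSON response, or None.

    Assumes the response looks like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine for the capture closest to the timestamp."""
    query = build_availability_url(page_url, timestamp)
    with urlopen(query, timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling lookup("example.com", "20180414") would request the capture nearest to this post's date and return its address, or None if no capture is reported.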

The Archive is designated as a library by the state of California. In addition to capturing the Web, it scans and uploads thousands of books per year and hosts hundreds of terabytes of public-domain music, video, and information uploaded to its servers by the public.

Other preservation organizations and tools include:

The Library of Congress
Webrecorder.io
International Internet Preservation Consortium