
One day, the Wayback Machine will start returning 404s too.


I think the Wayback Machine supports either an HTTP status code or a meta tag that causes it to stop serving previously cached content.

Therefore I try to save webpages that I care about, but it's getting harder and harder. Not to mention the space it takes: is it really worth hundreds of GiB of personal archives when I'll likely want (not need) maybe a few KiB of it decades down the line? And even then, will I be able to find it?

Sometimes I think about just archiving a screenshot and the text of websites, instead of markup and related files.


That's... a bit sobering. At that point, we will really have lost a lot of history.


It's possible some other group will buy the data, or they will make it easily downloadable.


Actually, archive.org is already easily downloadable, in that you can already help host the dweb copy.

archive.org, available via webtorrent: https://dweb.archive.org/

It's pretty slow, though.


Like the way Google did with the Usenet archive. Make it available via a searchable interface, then gradually degrade the archive until old posts can't be found any more.


They also degraded the interface so that the posts that once would be found via normal searching now aren't anymore. After the removal of the discussion filter, searches for X started returning more sites selling X than forums talking about X.

https://www.reddit.com/r/google/comments/2b54ux/google_compl...

There was a thread on Google forums about that, and I recall many upset users asking for the option to be reactivated, but (the irony) that discussion was removed as well.


One day there will be no more 404s or 200s.



