Internet Archive played crucial role in tracking shady CDC data removals

[email protected]

This comment from 8 months ago says 152PB: https://www.reddit.com/r/DataHoarder/comments/1cu79ke/the_archiveteam_has_a_cost_shameboard_of_the_top/l4om4m6/

[email protected]

These guys seem cool, but they're not the archive.org from the OP's article.

[email protected]

As I understand it, their data does in fact enter into the Wayback Machine. It's just also available directly as WARC archive files (which IMO sounds useful for exporting in bulk to another backup host). At least that's how their FAQ reads.

And given that they focus on web crawling, and not other arbitrary data formats that IA accepts, 2.8% of over 100 petabytes is still a respectable amount of data.

That said, help is help. If another archival project team wants me to run a worker node so they can distribute load and dodge crawler blocks, let me know, I’ve got space.

[email protected]

It's a team of volunteers who help scrape and upload things to archive.org.

[email protected]

It does go into the Wayback Machine AFAIK.

[email protected]

Need an archive of the archive

[email protected]

It doesn't help that people put silly things onto the IA. I've seen things like YouTube videos that really didn't need to be there (objectively, they have nothing of enough value to warrant taking up space on these servers that could be used for more important material).

[email protected]

I literally posted a comment saying "sure is odd that this is happening right before the election. Not saying it means anything, but maybe it's not a coincidence?" and got downvoted to hell lmao.

[email protected]

If they added download options organized by different taxonomies, I'd try to grab some things to archive.

[email protected]

Yeah, material people can learn from is vital to archive, but some Twitch streamer playing some MMO/shooter/scary game isn't what I would consider imperative to back up.

[email protected]

"As I understand it, their data does in fact enter into the Wayback Machine"

Thanks for the info! It never used to, so I guess that changed at some point.

agnos.is Forums
