Data centers contain 90% crap data
-
Checks out, at least in my case.
I self-host my email and pretty much every other cloud service I'd otherwise be using. My Gmail account is literally a spam catcher address, so everything there and elsewhere I haven't already deleted is 100% crap.
-
Sturgeon's Law in action again.
-
Massive deduplication across all accounts on all servers of image, audio, and video data would theoretically be possible, but ain't gonna happen. Or we could just discourage people from posting cat videos and bad memes (even less likely to happen).
-
Charge customers more for long-term data storage.
-
1980s-2000s : the information age
2000s-present : the data age.
Information implies it's correct, data implies it can be anything.
-
Massive deduplication across all accounts on all servers of image, audio, and video data would theoretically be possible, but ain't gonna happen. Or we could just discourage people from posting cat videos and bad memes (even less likely to happen).
I would argue that duplication of content is a feature, not a bug. It adds resilience, and is explicitly built into systems like CDNs, git, and blockchain (yes I know, blockchains suck at being useful, but nevertheless the point is that duplication of data is intentional and serves a purpose).
-
sudo rm -rf /data
-
I would argue that duplication of content is a feature, not a bug. It adds resilience, and is explicitly built into systems like CDNs, git, and blockchain (yes I know, blockchains suck at being useful, but nevertheless the point is that duplication of data is intentional and serves a purpose).
If the data has value, then yes, duplication is a good thing up to a point. The thesis is that only 10% of the data has value, though, and therefore duplicating the other 90% is a waste of resources.
The real problem is figuring out which 10% of the data has value, which may be more obvious in some cases than others.
-
I would argue that duplication of content is a feature, not a bug. It adds resilience, and is explicitly built into systems like CDNs, git, and blockchain (yes I know, blockchains suck at being useful, but nevertheless the point is that duplication of data is intentional and serves a purpose).
Technically git is a blockchain
-
Charge customers more for long-term data storage.
How do you differentiate old from new? I can just create a fresh copy of whatever I'm storing and it'll look new.
-
Massive deduplication across all accounts on all servers of image, audio, and video data would theoretically be possible, but ain't gonna happen. Or we could just discourage people from posting cat videos and bad memes (even less likely to happen).
Deduplication is trivial when applied at the block level, as long as the data is not encrypted, or is encrypted at rest by the storage system.
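For anyone curious what block-level dedup looks like in practice, here is a minimal sketch. It assumes fixed-size 4096-byte blocks, SHA-256 as the block fingerprint, and an in-memory dict as the block store; the names `dedup_store` and `restore` are made up for illustration and are not from any real storage product.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems vary


def dedup_store(blobs, block_size=BLOCK_SIZE):
    """Split each blob into blocks and keep every unique block only once.

    Returns (store, recipes): store maps digest -> block bytes, and
    recipes holds, per blob, the ordered list of digests to rebuild it.
    """
    store = {}
    recipes = []
    for blob in blobs:
        recipe = []
        for offset in range(0, len(blob), block_size):
            block = blob[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # A block that was already seen is not stored again.
            store.setdefault(digest, block)
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Rebuild a blob from its list of block digests."""
    return b"".join(store[digest] for digest in recipe)
```

Feeding it two blobs that share a block (say, one made of two "A" blocks plus a "B" block, and another made of an "A" block plus a "C" block) means five blocks go in but only three are kept. If each user encrypts with their own key before upload, identical plaintext blocks turn into different ciphertext, so the fingerprints never match; that is the caveat about encryption.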
-
You'll pry my kitten pictures from my cold dead hands!
-
How do you differentiate old from new? I can just create a fresh copy of whatever I'm storing and it'll look new.
If the files are exact copies, MD5 checksums will catch them; tweaking that many files just to get around the check could prove too tedious for people to bother exploiting it.
However, people could write scripts that let others mass-download, edit, and re-upload their files to cut down on that tedium.
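A rough sketch of the exact-copy check, assuming file contents are available as bytes and grouping them by MD5 digest. The helper name `find_exact_duplicates` is hypothetical, and MD5 is used only because it is the checksum mentioned above; it is not collision-resistant, so a real system would likely pick something stronger.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files):
    """files: mapping of name -> bytes.

    Returns a list of groups (sorted name lists) whose contents are
    byte-for-byte identical according to their MD5 digest.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.md5(data).hexdigest()].append(name)
    # Only digests shared by more than one file are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

Changing even a single byte gives a different digest, so a tweaked copy is not grouped with the original. That is exactly the loophole a mass-edit script would exploit.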
-
How do you differentiate old from new? I can just create a fresh copy of whatever I'm storing and it'll look new.
It doesn't matter. We're talking about reducing "crap data", which is data people don't care about long-term. If you care enough about the data to copy it manually, more power to you. If you don't care that much, you'll let it get purged, which is the entire point.
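To make the purge idea concrete, here is a toy sketch. It assumes the store tracks nothing but an upload timestamp per object; the function name `select_for_purge` and the age cutoff are hypothetical and not taken from any real provider's policy.

```python
from datetime import datetime, timedelta


def select_for_purge(objects, now, max_age):
    """objects: mapping of name -> upload time (datetime).

    Returns the sorted names of objects uploaded longer ago than max_age.
    """
    return sorted(
        name for name, uploaded in objects.items()
        if now - uploaded > max_age
    )
```

Re-uploading a file resets its timestamp, so it drops off the purge list. Under this scheme, anything somebody bothers to refresh survives and everything else ages out.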
-
We fully transition to clean energy like nuclear and build more power plants to allow us to store our online stuff.
The author of this article is not a serious person. He's in the same bucket as Greta Thunberg. They just like to scream and blame people instead of providing practical solutions. It's frankly tiring to hear them despite their honorable intentions.
-
1980s-2000s : the information age
2000s-present : the data age.
Information implies it's correct, data implies it can be anything.
The aughts were not bad, but things were already going downhill, and once we got into the teens, ugh. Oh, and an old-man thing: the pre-WWW days were advertisement-free, which was awesome.
-
The aughts were not bad, but things were already going downhill, and once we got into the teens, ugh. Oh, and an old-man thing: the pre-WWW days were advertisement-free, which was awesome.
Sure. The cutoff can be somewhere around there, and the start can be earlier too.