DeepSeek collects keystroke data and more, storing it in Chinese servers
-
[email protected]replied to [email protected] last edited by
I swear people do not understand how the internet works.
Anything you use on a remote server is going to be seen to some degree. They may or may not keep track of you, but you can't be surprised if they do. If you run the model locally, there is no indication it is sending anything anywhere. It runs using the same open source LLM tools that run all the other models you can run locally.
This is very much like someone doing surprised Pikachu when they find out that Facebook saves all the photos they upload to Facebook or that Gmail can read their email.
-
[email protected]replied to [email protected] last edited by
As a queer woman in the US, I currently care infinitely more what the US gov and companies track about me than what China does.
-
[email protected]replied to [email protected] last edited by
Our data's just too valuable for these parasites. Data privacy laws may eventually pass to compel software companies to store everything in US servers only.
-
[email protected]replied to [email protected] last edited by
Exactly. I'm queer. I'm not scared of China, even if they were doing the same thing the US currently is. Because only one of those actually affects the rights I have and what I do in my day-to-day.
I do not understand how the average person does not realize that.
-
[email protected]replied to [email protected] last edited by
Not in the way you think. They aren't constantly training when interacting, that would be way more inefficient than what US AI companies have been doing.
It might be added to the training data, but a lot of training data now is apparently synthetic and generated by other models because while you might get garbage, it gives more control over the type of data and shape it takes, which makes it more efficient to train for specific domains.
-
[email protected]replied to [email protected] last edited by
It doesn't. They run using stuff like Ollama or other LLM tools, all of the hobbyist ones are open source. All the model is is the inputs, node weights and connections, and outputs.
LLMs, or neural nets at large, are kind of a "black box", but there's no actual code that gets executed from the model when you run them; it's just processed by the host software based on the rules for how these work. The "black box" part is mostly because these are so complex we don't actually know exactly what it is doing or how it produces its answers, only that it works to a degree. It's a digital representation of analog brains.
People have also been doing a ton of hacking at it, retraining, and other modifications that would show anything like that if it could exist.
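To make the "model is just data, the runner is the code" point concrete, here is a minimal sketch. Everything in it is made up for illustration: the tiny JSON "model file", its layer names, and the `run_model` helper are assumptions, not the format or API of Ollama or any real runner. It also assumes a data-only weight format; a format that can embed executable objects would be a different story.

```python
import json

# Hypothetical "model file": nothing but numbers for a tiny two-layer network.
MODEL_JSON = """
{
  "layer1": {"weights": [[0.5, -0.25], [0.75, 1.0]], "bias": [0.0, -0.5]},
  "layer2": {"weights": [[1.0, -1.0]], "bias": [0.25]}
}
"""


def dense(inputs, weights, bias):
    """One fully connected layer: weighted sum of the inputs plus a bias per node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Activation function: negative values become zero."""
    return [max(0.0, v) for v in values]


def run_model(model, inputs):
    """The 'runner': all executable logic lives here, not in the model data."""
    hidden = relu(dense(inputs, model["layer1"]["weights"], model["layer1"]["bias"]))
    return dense(hidden, model["layer2"]["weights"], model["layer2"]["bias"])


model = json.loads(MODEL_JSON)  # loading only parses numbers, it runs nothing
output = run_model(model, [1.0, 2.0])
print(output)
```

Swapping in different weights changes the answers, but it can't make this runner open a network connection, because the only code that ever executes is the runner's own.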
-
[email protected]replied to [email protected] last edited by
What would you have preferred? "Most apps sell your data, news at 11"? Would anyone care if it was written like that?
-
[email protected]replied to [email protected] last edited by
It's a Chinese company, where else would they store the data?
-
[email protected]replied to [email protected] last edited by
Sure, but it's open source and doesn't upload it anywhere. It also doesn't have internet permission.
-
[email protected]replied to [email protected] last edited by
and that's what superpowers do, but living in a third world country i'm yet to see the chinese putsch us as the u.s. did during the cold war and beyond, with all due consequences. sorry about my lack of goodwill towards the department of state.
-
[email protected]replied to [email protected] last edited by
Oh yeah, the whole article could be reductively summed up as
"DeepSeek and all the other LLM services are almost as bad as each other, but we think DeepSeek is worse... because the Chinese government are known for doing bad things".
The title is factual, if a little clickbaity.
Obviously keystrokes you submit to a website are submitted to the website.
This, though, isn't technically accurate: a lot of forms and input handling are done client side, and the resulting information is then parceled up and sent to the server.
The actual keystroke data isn't normally sent.
Though this article doesn't go into what kind of keystroke data is sent, if it was something more than just which keys in which order, then that's perhaps an indicator that it's actively being collected for a reason, rather than just incidentally.
If you want to get really paranoid about such things, it's known that you can do interesting things with actual keystroke data.
Also, afaict none of the non-Chinese services have specified that they don't do this.
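As a rough illustration of the difference between "the text you submitted" and "actual keystroke data", here is a small sketch. The `KeyEvent` structure and the timestamps are invented for the example; the article doesn't say what DeepSeek records, so this only shows what per-key timing data would make possible in principle.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str      # which key was pressed
    down_ms: int  # time the key went down, in milliseconds
    up_ms: int    # time the key was released, in milliseconds


def submitted_text(events):
    """What a normal form submit sends: just the resulting text."""
    return "".join(e.key for e in events)


def dwell_times(events):
    """How long each key was held down."""
    return [e.up_ms - e.down_ms for e in events]


def flight_times(events):
    """Gap between releasing one key and pressing the next."""
    return [nxt.down_ms - cur.up_ms for cur, nxt in zip(events, events[1:])]


# Hypothetical recording of someone typing "hi!"
events = [KeyEvent("h", 0, 90), KeyEvent("i", 150, 230), KeyEvent("!", 400, 470)]

print(submitted_text(events))  # the text alone
print(dwell_times(events))     # per-key hold durations
print(flight_times(events))    # pauses between keys
```

The text alone says what you typed. The hold and pause timings form a rhythm that is fairly personal, which is why timing data is more sensitive than the characters themselves.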
-
[email protected]replied to [email protected] last edited by
I shouldn't have anything to hide, but I'm part of a group the current fascist leadership in government wants to eradicate, so hide I shall.
That said, I also feel like people acting like the remote server they are connected to is tracking what you do on it as some kind of surprise is so stupid. "Facebook is keeping track of the pictures I uploaded to it!!!!" There's a lot of stuff to complain about Facebook, Google, or whoever, but them tracking stuff you send to them willingly isn't one of them.
-
[email protected]replied to [email protected] last edited by
Isn't it open source? If so it should be near trivial to get rid of all of that.
If it's closed source I wouldn't touch it with a ten-foot pole. It's the same reason I rarely use ChatGPT: it's just freely giving away your personal data to OpenAI.
-
[email protected]replied to [email protected] last edited by
The runner is open source, and that's what matters in this discussion. If you host the model on your own servers, you can ensure that no corporation (American or Chinese) has access to your data. Access to the training code and data is irrelevant here.
-
[email protected]replied to [email protected] last edited by
HuggingChat is open source and lets you use DeepSeek.
Very misleading, it lets you use the lighter, watered-down version (DeepSeek 32B) rather than the large, impressive model they have (DeepSeek 671B).
-
[email protected]replied to [email protected] last edited by
I'm confused. Isn't "collecting keystroke data" just an alarmist way to describe text entry?
-
[email protected]replied to [email protected] last edited by
Lmaooooo great find. I wonder why exactly they had to clarify that? Maybe a semi Easter egg? Or a genuine concern? Thanks for sharing.
-
[email protected]replied to [email protected] last edited by
Yes, and all evil is secretly him. Just like you say, secretly he's working with everyone bad in the USA, and he's secretly a very important figure in US politics. Just like he's secretly behind me stubbing my toe.
-
[email protected]replied to [email protected] last edited by
So you were lying when you said you couldn't see them, because you've replied to ones that actively refute your statements.
-
[email protected]replied to [email protected] last edited by
I shouldn’t have anything to hide, but I’m part of a group the current fascist leadership in government wants to eradicate, so hide I shall.
I agree, and I think a lot of people who espouse "nothing to hide" as an approach haven't actually thought it all the way through.
Then there's the fascists, dictators, oligarchs and other all around shitbags who just want the control.
That said, I also feel like people acting like the remote server they are connected to is tracking what you do on it as some kind of surprise is so stupid. “Facebook is keeping track of the pictures I uploaded to it!!!” There’s a lot of stuff to complain about Facebook, Google, or whoever, but them tracking stuff you send to them willingly isn’t one of them.
This always surprises me. I originally thought it was because people didn't understand how these things work or how capitalist companies work.
More and more it seems like people don't care until it affects them, which is somewhat understandable: it takes effort to care about this stuff, and a lot of people will never be directly affected by the consequences.
What I do still think is that the general population has no idea of the extent of what can be done with all of the information they are volunteering.
That's very slowly changing, but the uses of the data are also increasing at a much more rapid pace than before.