DeepSeek collects keystroke data and more, storing it in Chinese servers

[email protected]

Is Deepseek Open Source?

Hugging Face researchers are trying to build a more open version of DeepSeek’s AI ‘reasoning’ model

Hugging Face head of research Leandro von Werra and several company engineers have launched Open-R1, a project that seeks to build a duplicate of R1 and open source all of its components, including the data used to train it.

The engineers said they were compelled to act by DeepSeek’s “black box” release philosophy. Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loath to reveal its secret sauce.

[email protected]

"We store the information we collect in secure servers located in the People's Republic of China"

Now you Americans know how we Europeans feel when Google, Amazon and Facebook store our information on American servers. Hint: The protective wall between Chinese servers and their government is about as good as the one between American servers and their government, at least for non-US citizens. The last thin veil of privacy for Europeans was ripped to shreds by Trump last week.

[email protected]

I swear people do not understand how the internet works.

Anything you use on a remote server is going to be seen to some degree. They may or may not keep track of you, but you can't be surprised if they are. If you run the model locally, there is no indication it is sending anything anywhere. It runs using the same open source LLM tools that run all the other models you can run locally.

This is very much like someone doing surprised pikachu when they find out that facebook saves all the photos they upload to facebook or that gmail can read your email.

[email protected]

As a queer woman in the US, I currently care infinitely more what the US gov and companies track about me than what China does.

[email protected]

Our data's just too valuable for these parasites. Data privacy laws may eventually pass to compel software companies to store everything in US servers only.

[email protected]

Exactly. I'm queer. I'm not scared of China, even if they were doing the same thing the US currently is. Because only one of those actually affects the rights I have and what I do in my day-to-day.

I do not understand how the average person does not realize that.

[email protected]

Not in the way you think. They aren't constantly training when interacting, that would be way more inefficient than what US AI companies have been doing.

It might be added to the training data, but a lot of training data now is apparently synthetic and generated by other models because while you might get garbage, it gives more control over the type of data and shape it takes, which makes it more efficient to train for specific domains.

[email protected]

It doesn't. They run using stuff like Ollama or other LLM tools, and all of the hobbyist ones are open source. All the model is, is the inputs, node weights and connections, and outputs.

LLMs, or neural nets at large, are kind of a "black box", but there's no actual code that gets executed from the model when you run them. It's just processed by the host software based on the rules for how these work. The "black box" part is mostly because these are so complex we don't actually know exactly what it is doing or how it outputs answers, only that it works to a degree. It's a digital representation of analog brains.

People have also been doing a ton of hacking at it, retraining, and other modifications that would show anything like that if it could exist.
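To make the "the model is just data" point concrete, here is a minimal sketch in Python. It assumes a weights-only format; the tiny network, the `MODEL` dict and the `run_model` function are all made up for illustration and are not how Ollama or any real runner is implemented. The only thing that executes is the runner; the model is a pile of numbers it reads.

```python
import math

# Hypothetical toy "model": nothing but numbers (weights and biases).
# A real LLM has billions of these, but the idea is the same.
MODEL = {
    "w1": [[0.5, -0.2], [0.1, 0.4]],  # hidden layer: 2 inputs -> 2 hidden nodes
    "b1": [0.0, 0.1],                 # hidden layer biases
    "w2": [0.3, -0.6],                # output layer: 2 hidden nodes -> 1 output
    "b2": 0.05,                       # output bias
}


def run_model(model, inputs):
    """The 'runner': the only code that executes. It treats the model as data."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(model["w1"], model["b1"])
    ]
    return sum(w * h for w, h in zip(model["w2"], hidden)) + model["b2"]


if __name__ == "__main__":
    print(run_model(MODEL, [1.0, 2.0]))
```

Nothing in `MODEL` can open a network connection; whether anything gets sent anywhere depends entirely on what the runner's code does, which is why the runner being open source matters.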

[email protected]

What would you have preferred? "Most apps sell your data, news at 11"? Would anyone care if it was written like that?

[email protected]

It's a Chinese company, where else would they store the data?

[email protected]

Sure, but it's open source and doesn't upload it anywhere. It also doesn't have internet permission.

[email protected]

and that's what superpowers do, but living in a third world country i'm yet to see the chinese putsch us as the u.s. did during the cold war and beyond, with all due consequences. sorry about my lack of goodwill towards the department of state.

[email protected]

Oh yeah, the whole article could be reductively summed up as

"DeepSeek and all the other LLM services are almost as bad as each other, but we think deepseek is worse....because the Chinese government are known for doing bad things".

The title is factual, if a little clickbaity.

Obviously keystrokes you submit to a website are submitted to the website.

That, though, isn't technically accurate: a lot of forms and input are handled client side, and then the resulting information is parceled up and sent to the server.

The actual keystroke data isn't normally sent.

Though this article doesn't go into what kind of keystroke data is sent, if it was something more than just which keys in which order, then that's perhaps an indicator that it's actively being collected for a reason, rather than just incidentally.

If you want to get really paranoid about such things, it's known that you can do interesting things with actual keystroke data.
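As a rough sketch of the difference between "the text you typed" and "actual keystroke data", here is a small Python example. The `KeyEvent` record and both functions are hypothetical names invented for illustration; nothing here is taken from DeepSeek's app or policy. It assumes each key press is logged with a press time and a release time in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical record of one key press (not any real app's format)."""
    key: str
    down_ms: float  # when the key was pressed
    up_ms: float    # when the key was released


def text_only(events):
    """What a normal form submission carries: just which keys, in which order."""
    return "".join(e.key for e in events)


def timing_features(events):
    """What richer keystroke data adds: how long each key was held (dwell)
    and the gap between releasing one key and pressing the next (flight)."""
    dwell = [e.up_ms - e.down_ms for e in events]
    flight = [b.down_ms - a.up_ms for a, b in zip(events, events[1:])]
    return dwell, flight


if __name__ == "__main__":
    events = [KeyEvent("h", 0, 80), KeyEvent("i", 150, 220)]
    print(text_only(events))
    print(timing_features(events))
```

Two people typing the same text produce the same `text_only` result but generally different timing values, which is the kind of extra signal that would suggest deliberate collection.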

Also, afaict none of the non-Chinese services have specified that they don't do this.

[email protected]

I shouldn't have anything to hide, but I'm part of a group the current fascist leadership in government wants to eradicate, so hide I shall.

That said, I also feel like people acting as if it's some kind of surprise that the remote server they are connected to tracks what they do on it is so stupid. "Facebook is keeping track of the pictures I uploaded to it!!!!" There's a lot of stuff to complain about with Facebook, Google, or whoever, but them tracking stuff you send to them willingly isn't one of them.

[email protected]

Isn't it open source? If so it should be near trivial to get rid of all of that.

If it's closed source I wouldn't touch it with a ten-foot pole. It's the same reason I rarely use ChatGPT: it's just freely giving away your personal data to OpenAI.

[email protected]

The runner is open source, and that's what matters in this discussion. If you host the model on your own servers, you can ensure that no corporation (American or Chinese) has access to your data. Access to the training code and data is irrelevant here.
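For what self-hosting looks like in practice, here is a hedged Python sketch that builds (but does not send) a request to a locally running Ollama server. It assumes Ollama's default address of `http://localhost:11434` and its `/api/generate` endpoint; the `build_local_request` helper and the `deepseek-r1:32b` model tag are illustrative assumptions, so substitute whatever you actually pulled.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request

# Assumption: an Ollama server listening on its default local port.
LOCAL_URL = "http://localhost:11434/api/generate"
LOOPBACK_HOSTS = ("localhost", "127.0.0.1", "::1")


def build_local_request(prompt, model="deepseek-r1:32b", url=LOCAL_URL):
    """Hypothetical helper: build a prompt request that can only target this machine."""
    host = urlparse(url).hostname
    if host not in LOOPBACK_HOSTS:
        # Refuse anything that would leave the local machine.
        raise ValueError(f"refusing non-local host: {host!r}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_local_request("Why is the sky blue?")
    print(req.get_method(), req.full_url)
    # To actually send it (requires a running server):
    # from urllib.request import urlopen
    # print(json.load(urlopen(req))["response"])
```

The prompt only ever goes to the loopback address, so the only party that sees it is whoever controls the machine the runner is on.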

[email protected]

HuggingChat is open source and lets you use DeepSeek.

Very misleading: it lets you use the lighter, watered-down version (DeepSeek 32B) rather than the large, impressive model they have (DeepSeek 671B).

[email protected]

I'm confused. Isn't "collecting keystroke data" just an alarmist way to describe text entry?

[email protected]

Lmaooooo great find. I wonder why exactly they had to clarify that? Maybe a semi Easter egg? Or a genuine concern? Thanks for sharing.

[email protected]

Yes, and all evil is secretly him. Just like you say, secretly he's working with everyone bad in the USA, and he's secretly a very important figure in US politics. Just like he's secretly behind me stubbing my toe.

agnos.is Forums
