OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole From Us
-
Hell, Nvidia's stock plummeted as well, which makes no sense at all, considering DeepSeek needs the same hardware as ChatGPT.
It's the same hardware; the problem for them is that DeepSeek found a way to train their AI much more cheaply, using far fewer than the hundreds of thousands of Nvidia GPUs that OpenAI, Meta, xAI, Anthropic, etc. use.
-
Wasn't Zuck the cuck saying "privacy is dead" a few years ago?
-
DeepSeek’s actual trained model is immaterial—they could take it down tomorrow and never provide access again, and the damage to OpenAI’s business would still be done. The point is that any organization with a few million dollars and some (hopefully less-problematical) training data can now make their own model competitive with OpenAI’s.
-
Rob Reiner's dad Carl was best friends with Mel Brooks for almost all of Carl's adult life.
https://www.vanityfair.com/hollywood/2020/06/carl-reiner-mel-brooks-friendship
-
I'm not all that knowledgeable either lol. It is my understanding, though, that what you download, the "model," is the result of their training. You would need some other way to train it, and I'm not sure how you'd go about doing that. The model is essentially the "product" created from the training.
-
How do you know it isn't communicating with their servers? Obviously it needs an internet connection to work, so what's stopping it from sending your data?
-
Why do you think it needs an internet connection? Why are you saying 'obviously'?
-
The way they found to train their AI more cheaply was to steal it from OpenAI (not that I care). They still need GPUs to process the prompts and generate the responses.
-
It's called distillation: you use the big model's outputs as training data, compressing its knowledge into a much smaller model that can be used on its own.
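For the curious, here's a toy sketch of the general knowledge-distillation recipe in PyTorch; the model shapes, temperature, and training loop are made-up illustration values, not DeepSeek's actual pipeline:

```python
import torch
import torch.nn.functional as F

# Stand-ins: a bigger "teacher" and a smaller "student" (the distilled
# model). The sizes here are arbitrary toy values.
teacher = torch.nn.Sequential(
    torch.nn.Linear(128, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
)
student = torch.nn.Linear(128, 10)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's output distribution

for _ in range(200):
    x = torch.randn(32, 128)         # a batch of inputs ("prompts")
    with torch.no_grad():
        teacher_logits = teacher(x)  # query the big model for its answers
    student_logits = student(x)
    # Push the student's output distribution toward the teacher's.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the scenario OpenAI alleges, the "teacher" would be an API you query over the wire rather than a local network, but the training signal is the same idea.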
-
If these guys thought they could out-bootleg the fucking Chinese then I have an unlicensed t-shirt of Nicky Mouse with their name on it.
-
CUDA being taken down a peg is the best part for me. Fuck proprietary APIs.
-
How else does it figure out what to say if it doesn't have access to the internet? Genuine question, I don't imagine you're downloading the entire dataset with the model.
-
Tamaleeeeeeeeesssssss
hot hot hot hot tamaleeeeeeeees
-
They need less powerful hardware, and less hardware in general, tho; they acted like they needed more.
-
I'll just say, it's OK to not know, but saying 'obviously' when you in fact have no clue is a bad look. I think it's a good moment to reflect on how overconfident we can be on the internet, especially about incredibly complex topics that cross multiple disciplines and fields.
To answer your question: the model is in fact run entirely locally, but the model doesn't contain all of the data. The model is the output of the processed training data, kind of like how the math expression 1 + 2 holds more information than its output, 3. The resulting model is orders of magnitude smaller than the data it was trained on.
The model consists of a bunch of variables, like knobs on a panel, and the training process is turning the knobs. The knobs themselves are not that big, but it takes a lot of information to know where to turn them.
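Here's a toy sketch of that idea: a one-knob "model" tuned against a thousand data points (all the numbers are made up for scale):

```python
# The "dataset": 1,000 (input, output) pairs -- far more information
# than the single knob we are about to tune.
data = [(x, 3.0 * x) for x in range(1000)]

w = 0.0     # the entire "model": one knob
lr = 1e-7   # how far each example is allowed to turn the knob

for x, y in data:
    pred = w * x
    grad = 2 * (pred - y) * x  # gradient of the squared error w.r.t. w
    w -= lr * grad             # turn the knob a little

print(w)  # ends up near 3.0: the model is one number; the data was 1,000 pairs
```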
Not having access to the dataset is fine from a privacy standpoint. Even if you don't know how the data was used or where it was obtained, the important thing is that your prompts are not being transmitted anywhere, because the model runs locally.
In short, using the model and training the model are very different tasks.
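And the "not transmitted anywhere" part is checkable. A sketch using Hugging Face's transformers library with hub access disabled; the model path is a placeholder for whatever weights you've already downloaded:

```python
import os
os.environ["HF_HUB_OFFLINE"] = "1"  # refuse all Hugging Face Hub network calls

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: a directory of already-downloaded model weights.
path = "./my-local-model"
tokenizer = AutoTokenizer.from_pretrained(path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(path, local_files_only=True)

# Generation happens entirely in local memory; unplug the network
# and this behaves exactly the same.
inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```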
-
I knew something was wrong with this.
-
No honor among thieves.
-
DeepSeek can't take down the model; it's already been published and is mostly open source. Open-source LLMs are the way, fuck closedAI.
-
Right—by “take it down” I meant take down online access to their running instance of it.