OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole From Us
-
[email protected] replied to [email protected] last edited by
I'm not all that knowledgeable either lol, but it is my understanding that what you download, the "model," is the result of their training. You would need some other way to train it, and I'm not sure how you would go about doing that. The model is essentially the "product" that is created from the training.
-
[email protected] replied to [email protected] last edited by
How do you know it isn't communicating with their servers? Obviously it needs an internet connection to work, so what's stopping it from sending your data?
-
[email protected] replied to [email protected] last edited by
Why do you think it needs an internet connection? Why are you saying 'obviously'?
-
[email protected] replied to [email protected] last edited by
The way they found to train their AI more cheaply was to steal it from OpenAI (not that I care). They still need GPUs to process the prompts and generate the responses.
-
[email protected] replied to [email protected] last edited by
It's called distillation: turning the huge amount of data (in practice, the big model's outputs) into a compact form that can be used to train another, smaller model.
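For anyone curious what that looks like mechanically, here's a toy sketch of distillation (the model sizes, temperature, and loss weighting are illustrative assumptions, not anyone's actual recipe): a small "student" is trained to imitate a big "teacher's" output distribution rather than learning from the raw data itself.

```python
# Toy sketch of knowledge distillation: a small "student" model learns to
# imitate a big "teacher" model's output distribution instead of raw labels.
# Sizes, temperature, and loss weighting are made-up illustrative values.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab = 1000
teacher = nn.Linear(64, vocab)   # stand-in for a huge pretrained model
student = nn.Linear(64, vocab)   # the compact model you actually want to ship
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 64)                  # stand-in for a batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)          # the teacher only runs inference
    student_logits = student(x)
    # Distillation loss: KL divergence between the softened distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The accusation in the headline is basically that the "teacher" in DeepSeek's case was OpenAI's API rather than a model they owned.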
-
[email protected] replied to [email protected] last edited by
If these guys thought they could out-bootleg the fucking Chinese then I have an unlicensed t-shirt of Nicky Mouse with their name on it.
-
[email protected] replied to [email protected] last edited by
CUDA being taken down a peg is the best part for me. Fuck proprietary APIs.
-
[email protected] replied to [email protected] last edited by
How else does it figure out what to say if it doesn't have access to the internet? Genuine question, I don't imagine you're downloading the entire dataset with the model.
-
[email protected] replied to [email protected] last edited by
Tamaleeeeeeeeesssssss
hot hot hot hot tamaleeeeeeeees
-
[email protected] replied to [email protected] last edited by
They need less powerful hardware, and less hardware in general, though; they acted like they needed more.
-
[email protected] replied to [email protected] last edited by
I'll just say, it's ok to not know, but saying 'obviously' when you in fact have no clue is a bad look. I think it's a good moment to reflect on how overconfident we can be on the internet, especially about incredibly complex topics that cross multiple disciplines and fields.
To answer your question: the model is in fact run entirely locally. But the model doesn't contain all of the data. The model is the output of processing the training data, kind of like how the math expression 1 + 2 holds more information than its output '3'; the resulting model is orders of magnitude smaller than what it was trained on.
The model consists of a bunch of variables, like knobs on a panel, and the training process is turning the knobs. The knobs themselves are not that big, but it takes a lot of information to know where they should be turned to.
Not having access to the dataset is ok from a privacy standpoint, even if you don't know how the data was used or where it was obtained from. The important aspect here is that your prompts are not being transmitted anywhere, because the model is being used locally.
In short, using the model and training the model are very different tasks.
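To make the knob picture concrete, here's a toy two-knob model (purely illustrative; the y = 3x + 1 relationship, learning rate, and so on are made up, and a real LLM has billions of knobs). Training needs the whole dataset to find the knob settings; once they're found, running the model is just arithmetic on your input and works completely offline.

```python
# Toy illustration of training vs. inference with a two-"knob" model.
# Purely illustrative: a real LLM has billions of knobs (weights), but the
# split is the same. Training needs the dataset; inference only needs knobs.

# --- Training (needs the data) ---------------------------------------------
# Hidden relationship we want the model to learn: y = 3*x + 1
data = [(x, 3 * x + 1) for x in range(100)]

w, b = 0.0, 0.0          # the two knobs, starting at arbitrary positions
lr = 0.0001              # how hard each example is allowed to turn the knobs
for epoch in range(500):
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x  # nudge the knobs to reduce the error
        b -= lr * err

# --- Inference (needs only the knobs) ---------------------------------------
# At this point you could delete the dataset, save just (w, b) to a file,
# and "run the model" on any machine with no network access at all.
print(f"learned knobs: w={w:.2f}, b={b:.2f}")   # roughly w=3.00, b=0.99
print("model(10) =", w * 10 + b)                # roughly 31, computed offline
```

Downloading DeepSeek's model is downloading a (very large) file of knob positions; the dataset and the training runs stay on their side.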
-
[email protected] replied to [email protected] last edited by
I knew something was wrong with this.
-
[email protected] replied to [email protected] last edited by
No honor among thieves.
-
[email protected] replied to [email protected] last edited by
DeepSeek can't take down the model; it's already been published and is mostly open source. Open-source LLMs are the way, fuck closedAI
-
[email protected] replied to [email protected] last edited by
Right, by "take it down" I meant take down online access to their running instance of it.
-
[email protected] replied to [email protected] last edited by
You made me look ridiculously stupid, and rightfully so. Actually, I take that back: I made myself look stupid and you made it as obvious as it gets! Thanks for the wake-up call.
If I understand correctly, the model is in a way a dictionary of questions with responses, where the journey of figuring out the response is skipped. As in, the answer to the question "What's the point of existence" is "42", but it doesn't contain the thinking process that led to this result.
If that's so, then wouldn't it be especially prone to hallucinations? I don't imagine it would respond adequately to the third "why?" in a row.
-
[email protected] replied to [email protected] last edited by
To add a tiny bit to what was already explained: you do actually download quite a bit of data to run it locally. The "smaller" 14B model I used was a 9 GB download. The 32B one is 20 GB, and being all text, that's a lot of information.
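Those numbers roughly check out with back-of-the-envelope math, assuming those local builds are quantized to around 4-5 bits per weight (an assumption on my part about the specific files, not a published spec):

```python
# Rough back-of-the-envelope estimate of model file sizes.
# Assumes roughly 4-5 bits per weight after quantization, which is a guess
# about these particular downloads, not a spec.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9  # decimal gigabytes

for name, params in [("14B", 14), ("32B", 32)]:
    for bits in (4, 5):
        print(f"{name} @ {bits}-bit: ~{approx_size_gb(params, bits):.1f} GB")
# 14B at ~5 bits lands near 9 GB and 32B near 20 GB, close to the downloads above.
```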
-
[email protected] replied to [email protected] last edited by
I suspect that most usage of the model is going to be companies and individuals running their own instance of it. They have some smaller distilled models based on Llama and Qwen that can run on consumer-grade hardware.
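If anyone wants to try one, something along these lines should work with the Hugging Face transformers library. The repo id below is what I believe the Qwen-based 14B distill is called; double-check it before relying on it, and expect it to still need a decent amount of RAM/VRAM.

```python
# Minimal sketch of running a distilled model locally with Hugging Face
# transformers. The repo id is assumed (verify it on the Hub), and
# device_map="auto" needs the accelerate package installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick a sensible precision
    device_map="auto",    # spread layers across GPU/CPU as available
)

prompt = "Explain in one sentence why a local model needs no internet access."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Runners like Ollama or llama.cpp wrap the same idea behind a one-line command if you'd rather not touch Python.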
-
[email protected] replied to [email protected] last edited by
Imagine if a little bit of the many millions that so many companies are willing to throw away on this shit AI bubble were actually directed toward something useful.