Why Mark Zuckerberg wants to redefine open source so badly
-
If they open source everything they legally can, then do they qualify as "open source" for legal purposes?
No, definitely not! Open source is a binary attribute. If your product is only partially open source, then the product isn't open source; only the parts you open sourced are.
So Llama is not open source, even if some parts are.
-
I suppose that both cases apply here. He's saying that you either comply with an open source license as defined by the OSI or you don't. That includes the source code being available, yes, but the article also mentions that Meta's license has a clause:
If you have an extremely successful AI program that uses Llama code, you'll have to pay Meta to use it. That's not open source. Period.
You can definitely have multiple licenses (Qt, for example, offers a commercial one that allows statically linking it and modifying it without distributing the source code), but that simply isn't an open source license.
-
I don't get it. What would they redefine it to?
-
Ask "OpenAI"
-
I agree with you. What I'm saying is that perhaps the law can differentiate between "not open source", "partially open source", and "fully open source".
Right now it's just a binary yes/no, which again determines whether or not millions of people would have access to something that could be useful to them.
I'm not saying change the definition of open source. I'm saying that for legal purposes, in the EU, there should be some clarification in the law. If there is a financial benefit to having an open source product available, then there should be something for having a partially open source product available,
especially a product that is as open source as it could possibly, legally be without violating copyright.
-
Open source isn't defined legally, only through the OSI. The benefit is only from a marketing perspective as far as I'm aware.
Which is also why it's important that "open source" doesn't get mixed up with "partially open source", otherwise companies will get the benefits of "open source" without doing the actual work.
-
I mean, using proprietary data has been an issue with models for as long as I've worked in the space. It's always been a mixture of open weights, open data, and open architecture.
I admit it became more obvious when images/videos/audio became more accessible, but models for everything from facial recognition to pose estimation have used proprietary datasets.
So this isn't a new issue, and from my perspective not an issue at all. We just need to acknowledge that not all elements of a model may be open.
-
You're right, he's a very complex asshole, indeed!
-
"So this isn't a new issue, and from my perspective not an issue at all. We just need to acknowledge that not all elements of a model may be open."
This is more or less what Zuckerberg is asking of the EU: to acknowledge that parts of it cannot be opened. But the fact that the code is open means it should qualify for certain benefits that open source products qualify for.
-
It is defined legally in the EU:
https://artificialintelligenceact.eu/
https://artificialintelligenceact.eu/high-level-summary/
There are different requirements if the provider falls under "Free and open licence GPAI model providers", which is legally defined in that piece of legislation.
"otherwise companies will get the benefits of 'open source' without doing the actual work."
Meta has done a lot for Open source, to their credit. React Native is my preferred framework for mobile development, for example.
Again, I fully acknowledge they are a large evil megacorp, but there are certain realities we need to accept based on the system we live in. Open source only exists because corporations benefit from this shared infrastructure.
Our laws should encourage this type of behavior, not restrict it. By limiting the scope, the EU gives Meta less incentive to open source the code behind its AI models. We want the opposite: we want to incentivize it.
-
I'm sorry you had to go through this and are suffering. There are people that can (literally) feel your pain, I hope that can give some comfort.
I'm lucky to be in Europe; otherwise I would (very likely) be dead and broke.
-
Because he's a massive douche?
-
I'm begging for far less, like 0.001%.
Very much unsuccessful so far.
-
Looking at any picture of Mark Suckerberg makes you believe that they are very much ahead with AI and robotics.
Either way, fuck Facebook, stop trying to ruin everything good in the world.
-
I've seen quite a few that have restrictions based on your size: if it's 1-5 people, there's no charge; any more and the cost increases as you go up.
-
water the tree of liberty? 🥰
-
Did you read the article?
-
Aww come on. There's plenty to be mad at Zuckerberg about, but releasing Llama under a semi-permissive license was a massive gift to the world. It gave independent researchers access to a working LLM for the first time. DeepSeek started out messing around with Llama derivatives back in the day (though, to be clear, their MIT-licensed V3 and R1 models are not Llama derivatives).
As for open training data, it's a good ideal, but I don't think it's a realistic possibility for any organization that wants to build a workable LLM. These things train on trillions of tokens, and no matter how hard you try to clean the data, there's definitely going to be something lawyers can find to sue you over. No organization is going to open themselves up to that liability. And if you gimp your data set, you get a dumb AI that nobody wants to use.
-
When the data used to train the AI is copyrighted, how do you make it open source? It's a valid question.
It is actually possible to reveal the source of training data without showing the data itself. But I think this goes a bit deeper, since I'll bet all of my teeth that the training data they've used is literally the 20 years of Facebook interactions and entries that they have just chilling on their servers. Literally 3+ billion people's lives are the training data.
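The idea of revealing the source of training data without showing the data itself can be sketched as a provenance manifest: you publish where each document came from plus a content hash, not the text. This is only an illustrative sketch; the document names and URLs below are hypothetical, and nothing here describes what Meta actually does.

```python
import hashlib
import json

def manifest_entry(name: str, source_url: str, data: bytes) -> dict:
    """Describe one training document by provenance and content hash,
    without including the document text itself."""
    return {
        "name": name,
        "source": source_url,
        "sha256": hashlib.sha256(data).hexdigest(),  # verifiable fingerprint
        "bytes": len(data),
    }

# Hypothetical documents from imaginary sources.
docs = [
    ("doc-001", "https://example.org/post/1", b"some crawled text"),
    ("doc-002", "https://example.org/post/2", b"another document"),
]
manifest = [manifest_entry(n, u, d) for n, u, d in docs]
print(json.dumps(manifest, indent=2))
```

Anyone holding a copy of a document can recompute its hash and verify it was (or wasn't) in the set, while the manifest itself discloses no copyrighted content.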
-
Is it for control, money? Of course it is.