Audio Flamingo 3 - Fully Open Large Audio Language Models
-
Now I'm even more confused.
Your professor abused their monopoly. That's the sort of thing I've been condemning. You are basically fine with that. You just think they should adjust their price policy to income. Well, yes, that would be the profit maximizing move. You make everyone pay as much as they are able to. That's what the copyright lobby wants. But I have to point out: There is no reason why they should lower the price for you. After all, you were able to pay. Rather, there seems to be room to raise the price.
Do you actually think this kind of monopoly abuse is a good thing?
Now what is a picture? It’s kind of a summary, a depiction of the outer appearance. And snapping a picture of a book cover would make sense for Fair Use. That’s kind of what it’s made for. If you now snap a picture of each and every one of the 400 pages inside, that’s where law says Fair Use stops.
No, that's not what the law says. I think the problem is that we have different ideas about how Fair Use in the US actually works. I'll have to think about that.
-
Well, it's complicated. And depends on which theoretical option we're talking about. I for example think writing the textbook when you're the professor and selling that to the students is a very bad thing. I'm not fine with that at all. They should be funded mainly by taxpayer money (at least that's what we do). And the fruit of their labour should then be owned by the taxpayer. The US does similar things, like government texts, NASA pictures etc used to be owned by the people. And everyone is "the people" from a random student to a big AI company.
It's a bit of a special example though, and doesn't translate 1:1 to the private book market.
I believe your regular book author does it the other way around. They aren't commissioned by anyone, they generally write it and only after that does the product get monetized. And I believe that's where your "rent-seeking" comes in. Somehow the author managed to feed themselves for the time it took them to write the book, and now they have it as an asset which they can try to turn into as much money as they can. It's two things mushed together. Their valid desire to eat and be compensated for their labour, plus the rent from the asset which might be huge for popular books and doesn't reflect labour cost. And all of this is very different from a university professor with a salary. It could and should be decoupled for them. But it's straight up impossible for the majority of authors, given our current copyright model. I think that's a fundamental limitation of capitalism.
And I wonder if those regulatory mechanisms are even applied correctly. I had that with the textbooks in university to some lesser extent. School was fine. But I heard in the US for example education is a complete rip-off and we get news articles every year on how parents can't afford the several hundred bucks for school textbooks for their children. And that is despite a different copyright doctrine. Maybe our model here leads to better results sometimes, I don't really know.
And concerning the Fair Use: Is there a law which offers an option for compensation? I thought that was contradictory by definition.
-
Hmm. You seem to treat an economic rent as being the same as a return on investment. Any particular reason for that?
I for example think writing the textbook when you’re the professor and selling that to the students is a very bad thing. I’m not fine with that at all.
You aren't fine with that. But why are you fine with the copyright industry doing it to everyone in the country?
-
I don't think my opinion as some random dude matters here. I could uphold arbitrary stupid beliefs. But this is kind of a factual question. So whether I personally, as one person, am fine with something is of no concern here. The question is, how do we arrive at a consistent economic model for immaterial goods...
And I think I wrote like 5 times now that I'm NOT fine with that. I said I view it as a (necessary) evil. It is evil in the sense of bad, I'm not fine with it, it comes with severe issues, we should do better than that. However "is" and "should" are two separate things. We happen to live in a world that came up with copyright. It exists. We made a pact with the devil to address one thing. And I'm merely acknowledging that. Since it does exist, I need to deal with it. That's not agreement from my side. Copyright serves one legitimate purpose. It applies our capitalist economy to immaterial goods. It's supposed to allow individuals and companies to create, and trade with more than just cocoa beans. But it's complicated and we might have come up with a stupid way to do it. And a way that simultaneously has lots of negative side-effects.
And now what? That is the question. Do we abolish it? Do we replace it with something else that handles the one legitimate purpose a better way? Do we retrofit it and try to "patch" it? Do we do that just for AI? Or for more than just one use-case?
And I think I make a point about how return on investment and an economic rent are two distinct things. Yet they're in practice falsely(!) mushed together, which again is bad... Or am I mistaken and I can pay an artist for their investment but not pay a rent? I don't think there is a good way to do it with the current model. That means I get to treat both as the same. You seem to be under the impression I like it. But I don't. It's just that I have to abide by law and that currently mandates me to do it.
-
I see. I think this is the big one:
Or am I mistaken and I can pay an artist for their investment but not pay a rent?
A return on investment is not the same as an economic rent.
Let's go back to the farmer example. You agree that a monopoly on the food supply is a bad thing. It can and will be abused.
Sidenote: You suggested that the government should produce textbooks to prevent abuse. Would that also be your solution here? Would that be preferable to the current arrangement?
Now, let's look at the situation of a farmer more closely. A farmer has to do a lot of work before they can harvest. They also need stuff like seeds, fertilizer, pesticides, fuel, machinery, spare parts and maintenance, and so on.
In the old times, one held back part of a grain harvest as seed grain for next year. That is an investment in the economics sense. You don't consume everything, but keep it so that you have more in the future. The finance meaning is subtly different but never mind.
Farmers get a return on investment. They invest money and labor so that there is a harvest in the future. They could sell the equipment they already own to have more spending money now.
ROI is part of a farmer's income but is not economic rent.
Back to authors. An established author will get an advance before they write the next book. That's investment by the publisher. If they don't get an advance, then the author is making the investment, but let's ignore that for simplicity. Investments are always risky. In this case, some books don't sell well and don't make back the money.
As a publisher, how much money would you invest in future books to maximize your profit? It depends on the expected payout and the cost of money.
Cost of money: You could borrow the money. Then the cost of the money is the interest on the loan. Or you could use the money for something else, e.g. buying safe government bonds. In that case, the cost is an opportunity cost. It's what you miss out on by not investing elsewhere.
Expected payout: It's the average profit/loss on each book. It is something you estimate based on experience.
The more books there are on the market, the lower the average profit. There must be a limit to how much of their income people are willing to spend on books. At some point, you have a lot of similar books chasing the same audience. That lowers the average. To maximize your profit, you invest in the production of more and more books, until the average return on each book is equal to the cost of money.
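To make that stopping rule concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration only: the shape of the `average_return` curve, the `base_return` and `crowding` parameters, and the interest rates are made up, not data from any real book market. It only shows the mechanism: more books lower the average return, and investment stops once that average no longer beats the cost of money.

```python
def average_return(num_books: int, base_return: float = 0.30,
                   crowding: float = 50.0) -> float:
    """Hypothetical average return per book with num_books on the market.

    The curve is invented: returns start at base_return and shrink as
    more books chase the same audience.
    """
    return base_return / (1.0 + num_books / crowding)


def books_funded(cost_of_money: float, max_books: int = 10_000) -> int:
    """Fund one more book as long as the average return would still
    exceed the cost of money (loan interest or opportunity cost)."""
    n = 0
    while n < max_books and average_return(n + 1) > cost_of_money:
        n += 1
    return n


if __name__ == "__main__":
    # Compare cheap money (5%) with expensive money (10%).
    for rate in (0.05, 0.10):
        n = books_funded(rate)
        print(f"cost of money {rate:.0%}: {n} books funded, "
              f"average return {average_return(n):.2%}")
```

With these made-up numbers, a higher cost of money means fewer books get funded, and in both cases the average return ends up just above the cost of money. What is left at that point is a return on investment, not a rent.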
I'll leave it at that for now.
-
Yes. That's the economy and investment as we usually do it today. The conclusion of that is, the "manufacturers" sell their product at the end of the day. I think in the realm of what we're discussing, it means an AI company is then the client of the book authors. And they pay for the books, or rather the content within. That's the traditional model and doesn't make sense unless it results in some product being sold.
You suggested that the government should produce textbooks to prevent abuse. Would that also be your solution here? Would that be preferable to the current arrangement?
Now that's a really interesting question. Some intelligent people have proposed similar things, economy being controlled by the government instead of the free market. And we've tried it. Turns out it's tricky to get it right. When they tried applying it to the entire economy, it often resulted in lots of corruption, an underperforming economy, up to outrageous things like famine and starvation in the population. Though I'm making it sound simpler than it is. Lots of different factors were involved with that.
And then sometimes we get it somewhat right. For example education is done by the government. Public infrastructure like roads, trains... And the government already produces books and TV. One example is public broadcasting like the BBC or ARD/ZDF here. I think what they produce is far superior to news in the USA. On the downside it's a very bloated organization and they waste lots and lots of money doing it.
So... My answer to your question is: yes and no. Yes, government should produce books and other content. Like local news from my region, which is not a profitable business so the private companies regularly fail due to that. And education would be another topic. It'd be great if education were accessible to everyone, at no cost. Maybe some other things.
And no, I don't think government should produce all books and content. That'd be kind of a monopoly on information. It's hard to choose which book should be written and which discarded. Which wannabe author to put on the payroll... We'd need a lot of trust and faith in the government, which we don't have. And it's likely going to fail because of a multitude of reasons. I'd say it's somewhat a nice idea. But I give it zero chance to work as intended in reality.
-
I think in the realm of what we’re discussing, it means an AI company is then the client of the book authors
Ahh. But they are not. That's what we're discussing.
Let me make this clear: All intellectual property is arbitrary. I fear many copyright people have convinced themselves otherwise.
The government could grant the exclusive right to sell coffee in an area. That was done at one point. It could give the exclusive right to make shoes to some corporation. That was normal before the time of the French Revolution. The German constitution explicitly protects the right to choose one's profession. The origin of this lies in such feudal practices.
The US Constitution limits copyright because the founders were quite aware of how these feudal privileges were abused. European copyright descends from agreements between mostly monarchical empires. Rent-seeking was/is an intended feature, which is why Europeans are so easily defrauded by the copyright industry.
When you photograph an image, you have to get permission. Makes sense. When that image is in the background of a video, you may have to get permission. Makes less sense. You rarely have to get permission from makeup artists, hairdressers, and clothes designers. Why not, actually? Isn't that "theft" on a grand scale?
Historically, it makes sense. Originally, copyright was for printing. The only images you could print were engravings. It would have been hard to justify that the tailors, maids, or butlers should get a cut. And also, they were not a demographic that could expect to be favored with an economic rent from the elites.
And today? There are many photos that derive more value from the clothes and general appearance of the model than from anything else. And yet, the photographer owns the copyright and only needs to get permission from the model. How should that work?
By the by. Painters and some intellectuals raged against photography in much the same way that they rage against AI now. There is an essay by Charles Baudelaire that illustrates this nicely.
-
Ahh. But they are not. That's what we're discussing. Let me make this clear: All intellectual property is arbitrary.
I feel we've run into the exact same issue as before. Now we're talking property. But we were just talking about investment and we've just established those two are distinct and not the same. It's a bit confusing. And I agree, that resulting granted monopoly and rent-seeking is an intended feature, and not contributing to society. But my previous comment was addressing the aspect of the author's investment and ROI, not the resulting property from that. And that's not arbitrary at all. The author sat at his desk for 6 months specifically. Sure the resulting product is arbitrary when selling it for money, but that wasn't what we were talking about.
which is why Europeans are so easily defrauded by the copyright industry
I don't think we're easily defrauded by the copyright industry. As I said, school-books seem like 10x cheaper here. Medication with pharma IP in it is mostly cheaper here, I have my library card for like 30€ a year?! And other than that we use the same Spotify and Netflix subscriptions for a similar price. There's no substantial difference with that. I don't see myself in a less favourable position than a US citizen. We also have access to information here, good books, podcasts, journalism, we have culture, concerts... And I don't think any of that is better or cheaper or more accessible in the US. Correct me if I'm wrong...
photograph
Yeah, some photography rules are absurd. I think it's completely mental that people do copyright infringement when they take a picture of a sculpture. Seems US Fair Use sometimes has weird quirks. We also have stupid rules for pictures in Germany.
[...] feudal practices
Considering feudalism... I'd like to re-define that since we haven't had lords and a king for quite some time now. Today's land holders on the internet are companies like Meta, Google etc. They own the platforms we use on a daily basis. They make the rules, shape the place and lease chunks to us peasants as a service. We even let them shape society. For all intents and purposes, they're the feudal lords of today. And that's kind of the reason for my rejection here and why I said early on, all these AI companies are big multi-billion dollar corporations with motivations far from benefit to society. I believe concepts like Fair Use might have been invented as a means to combat feudalism. But looks to me like the situation is now changing and it's more and more used to the opposite effect by the feudal lords themselves to now contribute to their possessions, wealth and dominance.
I'll grant you the copyright industry is a worthy enemy, since they're villains, too. The copyright business model isn't healthy or beneficial to society overall. We've established that. But I really think of feudalism and a de facto monopoly when I think of Google and Meta and OpenAI/Microsoft. And I'd really like to avoid making more concessions to my feudal lords.
-
Hmm. It looks like we are back to narratives again. Systematic analysis does not seem to come easy to you.
Now we’re talking property. But we were just talking about investment and we’ve just established those two are distinct and not the same.
"Investment" and "rent-seeking" are concepts in economics. Like, say, "function" or "variable" are concepts in programming.
"Property" is a legal institution. It relates to "investment" a bit like a machine code instruction relates to programming. They are, sort of, the underlying facts on which higher concepts rest.
And that’s not arbitrary at all. The author sat at his desk for 6 months specifically. Sure the resulting product is arbitrary when selling it for money, but that wasn’t what we were talking about.
I guess you didn't get what I was trying to say. Let me put it like this:
If they wrote a story that takes place in the universe of a video game, then they need to get permission first. They need to ask whoever owns the rights to the video game, or else it is "theft".
Conversely, if the story is original, and anyone wants to make a video game in that universe, then they need the author's permission.
This remains so until 70 years after the death of the creator of the video game/story. At least, it is 70 years now. It may be made longer again at any time.
That is arbitrary, no?
Today’s land holders on the internet are companies like Meta, Google etc.
Not just them, but yes. How do you think they manage that?
And that’s kind of the reason for my rejection here
That seems pretty vibes-based. What do you rationally expect the outcome of your favored policies to be?
-
Systematic analysis [...] That is arbitrary, no?
Yes. That's arbitrary. But we're conflating several very different things here. There is investment in the form of labour. And I'm pretty sure we have to agree that in general, labour needs to be compensated in a capitalist economy. Then there is copyright. And this is intellectual property, which is yet another concept. All of this goes into a book, but they're all very different things. I think IP is the most abstract one (it protects concepts) and kind of moot. I'd be more lax with IP and try to allow everyone to draw a Mickey Mouse, program a Final Fantasy game or write a new Harry Potter book. Patents are a similar thing. Though we have them for a reason.
That's why I say I'm with you with the copyright and the intellectual property. But there's also work going into a book and we're always brushing over that as if it weren't a thing.
How do you think they manage that
It's many factors. Timing, aggressive acquisition strategies, ecosystem building, network effects, then ecosystem lock-in, data harvesting, dominating standards, but also providing genuinely useful services. Economy of scale, massive capital... And I probably forgot dozens of factors, some legitimate, some exploitative.
That seems pretty vibes-based. What do you rationally expect the outcome of your favored policies to be?
- A more level playing field for new players and institutions apart from mega-corporations
- More transparency, since this is a disruptive technology with impact on society
- Expanding on transparency: Mandating transparency in cases like: Why was my loan declined? Why is my insurance now 4x the cost? And is the picture/text on the internet misinformation and fake or real?
- More public research and access to AI. AI shouldn't be just a for-profit service shaped by the tech bros
- Regulation of Black Mirror episode content, like social scoring, total surveillance and mass control, fraud and big-scale manipulation of people, discrimination... And oversight and mandatory standards for dangerous tech, like systems used in healthcare or the arms industry.
- Handle copyright in a way that applies universally. It's unfair and deeply undemocratic to allow Mark Zuckerberg to pirate books because he's rich and has an AI company, while I and other businesses can go to jail for the exact same thing.
- Less ruthless business practices like deliberately abusive data scraping.
- Clarify edge-cases like whether it's okay to impersonate Scarlett Johansson or David Attenborough. Or generate pornography of Emma Watson.
- Incentives to develop open-weights models (ideally more than that) and to contribute to society and progress.
-
Sorry, misunderstanding. I wasn't asking what you hope to happen.
You have ideas on how copyright should work wrt AI training. Make these ideas explicit, and then try to systematically analyze what the economic effects are.
Law can be a little bit like programming. A law has certain conditions. If these conditions are met, then certain legal effects follow.
If certain conditions are met, then someone has the exclusive copyright. If this copyright is violated, then damages must be paid. And of course, there are more rules to determine if copyright was violated or how those damages should be determined.
So under what conditions does AI training violate copyright? What would the legal consequence be? Then, what would that mean for the economic system on the whole?
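To make the "law is a little bit like programming" analogy concrete, here is a minimal sketch of a rule as conditions plus a legal effect. The class and field names are hypothetical and the three conditions are a deliberate oversimplification, not a statement of how any real copyright statute is structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """One use of a work, reduced to three illustrative yes/no facts."""
    work_is_protected: bool   # the conditions for an exclusive copyright are met
    has_permission: bool      # the rights holder granted a licence
    exception_applies: bool   # e.g. a fair-use-style exception covers this use


def infringes(use: Use) -> bool:
    # Conditions: protected work, no permission, no applicable exception.
    return (
        use.work_is_protected
        and not use.has_permission
        and not use.exception_applies
    )


def legal_effect(use: Use) -> str:
    # Effect: if the conditions are met, damages follow; otherwise nothing does.
    return "damages must be paid" if infringes(use) else "no legal effect"


if __name__ == "__main__":
    print(legal_effect(Use(True, False, False)))
    print(legal_effect(Use(True, False, True)))
```

The question about AI training then becomes: which facts would have to be added to `Use`, and how should `infringes` treat them?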
-
That's a tough question. Copyright is showing its age and barely applies in the digital world. Even before AI we had a lot of edge cases and court cases over like a decade to find out how copyright applies to a digital concept. I don't think there is an easy way to retrofit something. At least I can't come up with a good idea. And the general proposal seems to be all or nothing.
What I think doesn't work is saying every normal citizen needs to buy books and Zuckerberg gets to pirate books. In a democracy law has to apply to everyone. And his use-case doesn't matter here. I can also claim I pirated the 10TB of TV shows and movies for transformative or legitimate use. It's still piracy. And other law works the same way. If I steal chocolate in the supermarket, that's also theft no matter what I was planning to do with it. So that's out.
And then we're left with how economy is supposed to work as of today. An AI company needs supplies to manufacture their product, they buy those supplies on the market... In this case that's going to be licensing content. Though, that's going to be hard. A billion dollar company with a service used by millions of people should pay more than a single researcher doing it for 5 people. And implementing that would be impossibly complex. One possible way would be to introduce a collecting society to handle the money and maths. But they're not ideal either.
So it's more or less down to allowing AI companies to use content with some kind of default license. They can take all the public information as they wish. Again, they can not steal in the process. They'll buy one copy of a Terry Pratchett novel at the same price everyone needs to pay.
And to compensate for them not having to contract with the authors and buy special licenses, they need to offer transparency. Tell the authors and everyone what went into the models and if their content is amongst that. And if they scraped my personal data, I need a way to get that deleted from the dataset.
I'd also add an optional opt-out mechanism to appease the people who hate AI. They can add some machine-readable notice, or file a complaint and their content will be discarded.
And since just taking and not contributing back isn't healthy to society, I'd add something about "composite" works. If something like an AI model is just pieced together by other people's content, that doesn't deserve copyright in my opinion. So all generations are automatically public domain and maybe the models as well.
And we need a definition of AI and transformative. Once we get capable models with an ability to recite an entire novel word by word, that's going to run into copyright again. So yeah.
And intellectual property has to be softened. A generative AI model necessarily "contains" a lot of IP, has knowledge about it and can reproduce it. And we need to be alright with that. And in case someone wants to outlaw impersonation and celebrity deepfakes, there needs to be more than a blurry line.
But all of this is more patching copyright and we're going to run into all kinds of issues with that. I think ideally we come up with a grand idea and overhaul the entire thing so it applies to the 21st century.
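One existing convention that could serve as the machine-readable opt-out notice mentioned above is robots.txt. The sketch below uses Python's standard `urllib.robotparser` to check such a notice before using a page. The crawler name `ExampleTrainingBot`, the example URL, and the helper `may_use_for_training` are assumptions made for illustration. Whether honouring robots.txt would satisfy any actual law is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical notice: the site owner opts out of one training crawler
# but leaves the site open to everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_use_for_training(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the notice does not opt this crawler out of this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/post/1"
    print(may_use_for_training(ROBOTS_TXT, "ExampleTrainingBot", page))
    print(may_use_for_training(ROBOTS_TXT, "SomeOtherBot", page))
```

A notice like this only works if crawlers check it voluntarily, which is why the proposal pairs it with a complaint route.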
-
That's a good start.
What I think doesn’t work is saying every normal citizen needs to buy books and Zuckerberg gets to pirate books. In a democracy law has to apply to everyone. And his use-case doesn’t matter here. I can also claim I pirated the 10TB of TV shows and movies for transformative or legitimate use. It’s still piracy.
The laws do apply to everyone equally, though few people are able to litigate for years against the copyright industry.
Your concern is obviously the use case. If the use case doesn't matter, then quotes and parody are illegal, as well as historical archiving and scientific analysis.
I guess you just want AI training to not be fair use. That raises the question of how this should work.
Maybe you think that different standards should be applied to Zuckerberg, after all. Your focus on him makes it seem a little like that.
Perhaps you simply have something more European in mind. Europe and in particular Germany do not have fair use. There is a short list of uses that do not require permission. That means that every time some new use becomes desirable, the law must be changed. This is obviously stifling for progress in science and culture. Think of HipHop with its use of samples. It's hard to imagine some artists successfully petitioning the government to legalize the practice before experimenting with it. You couldn't have developed a search engine that simply copies all web pages for indexing. Something like the Internet Archive, or the Wayback Machine, would be impossible. It would just be a few tech geeks against the copyright industry, including the media.
So, how should this be done?
And other law works the same way. If I steal chocolate in the supermarket, that’s also theft no matter what I was planning to do with it. So that’s out.
Actually, no. Theft is prosecuted by the government; police and courts. Copyright infringement is generally a civil matter. Damages are paid but there is no criminal prosecution.
The government only cares about large-scale, industrial infringement, e.g. operating a Netflix-like streaming service. Small-scale infringement is not even criminal in the US. I believe that even in Europe, people who torrent movies or such are rarely criminally prosecuted.
Maybe you would like to see copyright infringement punished more harshly and enforced more strictly?
A billion dollar company with a service used by millions of people should pay more than a single researcher doing it for 5 people.
That's an interesting idea. It's not how we do anything else. You don't usually have to pay more for the same thing, depending on who you are or how much you use it. I expect, it would be quite devastating if that were the rule.
Should this policy idea apply only to copyright or generally? If only copyright, why?
And if they scraped my personal data, I need a way to get that deleted from the dataset.
Should there be exceptions for celebrities and such, or will they be able to demand licensing fees?
I’d also add an optional opt-out mechanism to appease to the people who hate AI. They can add some machine-readable notice, or file a complaint and their content will be discarded.
Then much public content can't be used, after all. The likes of Reddit, Facebook, or Discord will be able to charge licensing fees for their content, after all. It's very typically European. You rage against Meta's monopoly but you also call for laws to enforce and strengthen it. I think it's the echo of feudalism in the culture.
-
If the use case doesn't matter, then quotes and parody are illegal, as well as historical archiving and scientific analysis.
Well, there is a distinction between use and obtaining it. For stealing, the use doesn't matter. For later use, it does. That's also what licenses are concerned with.
That means that every time some new use becomes desirable, the law must be changes. This is obviously stifling for progress in science and culture.
Yes, that's obviously the wrong way round. Usually things should be allowed by default, unless they're specifically prohibited or handled by law. We got it the wrong way around here. However, I don't think it's the other way around in the USA either. While Fair Use is a broad limitation/exemption, it's still concerned with specific exemptions. For example, AI wouldn't be allowed by default unless it gets incorporated into law, but they're referring back to the already existing, specific exemption to do "transformative" work. Very much like our exemptions, just way broader.
Actually, no. Theft is prosecuted by the government; police and courts. Copyright infringement is generally a civil matter. Damages are paid but there is no criminal prosecution.
Well, it is. In the United States, willful copyright infringement carries a maximum fine of $150,000 per instance. In Germany it seems to be prison sentence up to 3 years or a fine.
I think laws should either be enforced or abolished. The current situation is not healthy.
Maybe you would like to see copyright infringement to be punished more harshly and enforced more strictly?
No, copyright should be toned down. Preferably for regular citizens as well and not just the industry.
That's an interesting idea. It's not how we do anything else. You don't usually have to pay more for the same thing, depending on who you are or how much you use it.
You're wrong here. People do have to pay more if they license a picture to show to their 20 million customers or use it in an advertising campaign, than I do for putting it up in the hallway. Airbus pays like 100x the price for the same set of nuts and bolts than someone else. A kitchen appliance for industrial use costs like 3x the price of an end user kitchen appliance. Because it's more sturdy and made for 24/7 use. A DVD rental business pays more for a DVD than the average customer.
Should there be exceptions for celebrities and such, or will they be able to demand licensing fees?
No exceptions, no licensing, no fees. This is strictly to avoid bad things like doxxing, ruining people's lives...
Then much public content can't be used, after all. The likes of Reddit, Facebook, or Discord will be able to charge licensing fees for their content, after all. [...]
They already do. There's a big war going on on the internet. I've told you how my server was targeted by Alibaba and it nearly took down the database. Everyone else has implemented countermeasures as well. Try scraping Reddit or downloading 5 YouTube videos. It's a thing of the past, you'll get rate-limited and your downloads will quickly start to fail. Unless you pay. So it is de facto that way already and can barely get worse. And the rich can buy their way into things, the monopolists are already in, while I can't do anything any more. My IP addresses get rate-limited or blocked and my accounts banned for "suspicious activity". Which was me making use of my Fair Use rights or the German version of something like that. But I'm prevented from exercising my rights.
It's very typically European. You rage against Meta's monopoly but you also call for laws to enforce and strengthen it. I think it's the echo of feudalism in the culture.
Well, I think taking authors' livelihood in favour of mega corporations is enforcing and strengthening their monopoly and the echo of feudalism. I'd be less concerned if it was some small research institute doing something for the public or progress. Or if a programming book author was making more than 100,000€ a year and they're "the monopoly". But it's the other way around. This application of Fair Use is in favour of the feudal lord companies and to the detriment of the average person. Also, de facto, I as a citizen get none of the Fair Use the big companies get, and that's just different rules for different people.
-
I think I used a bit too much sarcasm. I wanted to take a spin on the idea how the AI industry simultaneously uses copyright, and finds ways to "circumvent" the traditional copyright that was written before we had large language models. An AI is neither a telephone-book, nor should every transformative work be Fair Use, no questions asked. And this isn't really settled as of now. We likely need some more court cases and maybe a few new laws. But you're right, law is complicated, there is a lot of nuance to it and it depends on jurisdiction.
Alas, we have reached the max comment depth. I cannot reply to your latest comment.
Well, there is a distinction between use and obtaining it. For stealing, the use doesn’t matter. For later use, it does. That’s also what licenses are concerned with.
I see what you mean now. It's tricky. It's just another way in which copyright talking points cause problems.
You're saying that using/copying something you have in a database for AI training should always be legal. However, copying something to add it to the database should be judged as if it was done for enjoyment. EG everyone who torrents a movie should be treated the same, regardless of purpose. This will certainly cause problems for some scientific datasets.
Whether you downloaded a legal copy depends on whether the party offering the download had the right to do so. Whether that is the case may not be apparent. The first question is: What duty does someone have to check the provenance of content or data?
Torrents of current movies and the like are very obviously not authorized. For older movies, that becomes less clear. The web contains much unauthorized content. For example, the news stories that people copy/paste on Lemmy. What duty is there to determine the copyright status of the content before using such data?
When researchers and developers share datasets, what duty do they have to check how the contents were obtained by whoever assembled it?
What happens when something was wrongly included in a dataset? Is that a problem only for the original curator, or also for everyone who got a copy?
What about streams, live TV, radio, and such things? Are you allowed to record those for training or not?
While Fair Use is a broad limitation/exemption, it’s still concerned with specific exemptions.
That's not quite right. Ultimately, Fair Use derives from the US Constitution; from the copyright clause but also freedom of speech. Copyright law spells out 4 factors that must be taken into account. But courts may also consider other factors. There is also no set way in which these factors have to be weighed. It's very open.
Well, it is. In the United States, willful copyright infringement
There are minimum conditions before prosecution is possible. I think uploading can always be prosecuted.
No, copyright should be toned down. Preferably for regular citizens as well and not just the industry.
Well, over the last few decades it has only been going in the other direction.
How does this fit together with calling copyright infringement theft?
Let me make a suggestion. This is your real opinion. This is what you believe based on what you see. The rest is just slogans by the copyright industry, which you repeat without thinking. The problem is that you are basically shouting yourself down; your own opinion. The media, a big part of the copyright industry, puts these slogans out. Their lobbyists demand favors and harsher laws from politicians. And when the politicians look at what voters think, they hear these slogans. That's one thing I mean when I say the copyright industry defrauds us.
Airbus pays like 100x the price for the same set of nuts and bolts than someone else. A kitchen appliance for industrial use costs like 3x the price of an end user kitchen appliance. Because it’s more sturdy and made for 24/7 use.
Exactly, they don't pay more for the same thing. It's almost exclusive to the copyright industry.
People do have to pay more if they license a picture to show to their 20 million customers or use it in an advertising campaign, than I do for putting it up in the hallway.
Actually, even in the copyright industry, such terms are far from universal. Of course, you will have to pay more for the right to make copies than for a single copy. And even more for the exclusive copyright. Those things are different. However, it's usually a flat fee. Can you figure out what economic reasons might exist for a creator being paid per copy or per viewer?
No exceptions, no licensing, no fees. This is strictly to avoid bad things like doxxing, ruining people’s lives…
"No exceptions" means, for example, that an LLM would not be able to answer questions about politicians, actors, musicians, maybe not even about historical figures.
You said that there should be a way that you can remove your personal data from the training set. That implies that an AI company can offer money in exchange for people not removing their data. That's basically a licensing fee, however it is framed.
On second thought, I believe many celebrities, business people, politicians, ... will gladly offer more training data that makes them look good. They'd only remove data that makes them look bad. Sort of like how the GDPR works. Far from demanding a licensing fee, they'd pay money to be known by the AI.
I’ve told you how my server was targeted by Alibaba and it nearly took down the database. [...] But I’m prevented from exercising my rights.
I agree that the situation is far from ideal. But let me point out that you do not have a right to other people's computer services. That's the issue with Alibaba hitting your server, right? It's a difficult issue. Mind that an opt-out from AI training does not actually address this.
This application of Fair Use is in favour of the feudal lord companies and to the detriment of the average person.
How so?
-
Alas, we have reached the max comment depth. I cannot reply to your latest comment.
Well, there is a distinction between use and obtaining it. For stealing, the use doesn’t matter. For later use, it does. That’s also what licenses are concerned with.
I see what you mean now. It's tricky. It's just another way in which copyright talking points cause problems.
You're saying that using/copying something you have in a database for AI training should always be legal. However, copying something to add it to the database should be judged as if it was done for enjoyment. EG everyone who torrents a movie should be treated the same, regardless of purpose. This will certainly cause problems for some scientific datasets.
Whether you downloaded a legal copy depends on whether the party offering the download had the right to do so. Whether that is the case may not be apparent. The first question is: What duty does someone have to check the provenance of content or data?
Torrents of current movies and the like are very obviously not authorized. For older movies, that becomes less clear. The web contains much unauthorized content. For example, the news stories that people copy/paste on Lemmy. What duty is there to determine the copyright status of the content before using such data?
When researchers and developers share datasets, what duty do they have to check how the contents were obtained by whoever assembled it?
What happens when something was wrongly included in a dataset? Is that a problem only for the original curator, or also for everyone who got a copy?
What about streams, live TV, radio, and such things? Are you allowed to record those for training or not?
While Fair Use is a broad limitation/exemption, it’s still concerned with specific exemptions.
That's not quite right. Ultimately, Fair Use derives from the US Constitution; from the copyright clause but also freedom of speech. Copyright law spells out 4 factors that must be taken into account. But courts may also consider other factors. There is also no set way in which these factors have to be weighed. It's very open.
Well, it is. In the United States, willful copyright infringement
There are minimum conditions before prosecution is possible. I think uploading can always be prosecuted.
No, copyright should be toned down. Preferably for regular citizens as well and not just the industry.
Well, over the last few decades it has only been going in the other direction.
How does this fit together with calling copyright infringement theft?
Let me make a suggestion. This is your real opinion. This is what you believe based on what you see. The rest is just slogans by the copyright industry, which you repeat without thinking. The problem is that you are basically shouting yourself down; your own opinion. The media, a big part of the copyright industry, puts these slogans out. Their lobbyists demand favors and harsher laws from politicians. And when the politicians look at what voters think, they hear these slogans. That's one thing I mean when I say the copyright industry defrauds us.
Airbus pays like 100x the price for the same set of nuts and bolts than someone else. A kitchen appliance for industrial use costs like 3x the price of an end user kitchen appliance. Because it’s more sturdy and made for 24/7 use.
Exactly, they don't pay more for the same thing. It's almost exclusive to the copyright industry.
Alas, we have reached the max comment depth
Oh, wow.
I mean for some questions, we already have an old way of doing it and it's relatively straightforward to apply it:
Can you figure out what economic reasons might exist for a creator being paid per copy or per viewer?
Selling/Buying something is a very common form of contract. In our economy, the parties themselves decide what's in the contract. I can buy apples, cauliflower or wood screws per piece or per kilogram. That's down to my individual contract between me and the supermarket (or hardware store) and nothing the government is involved in. It's similar with licensing, that's always arbitrary and a matter of negotiation.
What happens when something was wrongly included in a dataset? Is that a problem only for the original curator, or also for everyone who got a copy?
Of course for everyone. If I download a torrented copy of a Hollywood film, that's not "healed" by it being a copy of a copy. It's still the same thing.
It's due diligence. Especially once someone uses (or publishes) something. And it very much depends on circumstances. Did they do it deliberately, specifically ignoring being in violation of something? If they were wrongfully under the assumption it was a legal copy, then it's more analogous to fencing. They're not in trouble for stealing anymore, but can be ordered to let go of the stolen goods. I'd say that's pretty much the same liability as with other things. I kill someone with my car. Now the question is have I been neglectful? Did I know the brakes were faulty but I didn't repair them and used the car nonetheless? Or did the car manufacturer mess up? There might be a case against me, or the manufacturer or both. And both civil and criminal law can be involved in different ways.
When researchers and developers share datasets, what duty do they have to check how the contents were obtained by whoever assembled it?
I'd do it like with shipments in the industry. If you receive a truck load of nuts and bolts, you take 50 of them out and check them before accepting the shipment and integrating the lot into your products.
Whether you downloaded a legal copy depends on whether the party offering the download had the right to do so. Whether that is the case may not be apparent.
Though that is very hypothetical. If the torrent has annas-archive or libgen.is in the title... It's pretty obvious. And that was what happened here. They did it deliberately and we know they knew.
And this week the next lawsuit started, alleging they (Meta) uploaded tons of porn videos (illegally) to be able to download what they were interested in, since Bittorrent has a tit for tat mechanism.
So I believe we first need to address the blatant piracy before talking about hypothetical scenarios. I believe that's going to be easier, though. I proposed to mandate transparency with what a company piled up in a dataset. One of the reasons was to address this. Like with the DMCA and GDPR, this could be a relatively simple mechanism where the provider (or company) gets some leeway, since it indeed is complicated. People will get a procedure to file a complaint and then someone can have a look whether it was wrongly included.
You said that there should be a way that you can remove your personal data from the training set. That implies that an AI company can offer money in exchange for people not removing their data. That's basically a licensing fee, however it is framed.
I wasn't concerned with copyright here. Let's say I'm politically active and someone leaks my address and now people start showing up, throwing eggs at my front door and threatening to kill me. Or someone spreads lies about me and that gets ingested. Or I'm a regular person and someone posted revenge porn of me. Or I'm a victim of a crime and that's always the first thing that shows up when someone puts in my name and it's ruining my life. That needs to be addressed/removed. Free of charge. And that has nothing to do with licensing fees for content or celebrities. When companies use data, they need to have a complaints department and that will immediately check whether the complaint is valid and then act accordingly. There needs to be a distinction between harmful content and copyright violations.
Ultimately, Fair Use derives from the US Constitution; from the copyright clause but also freedom of speech. Copyright law spells out 4 factors that must be taken into account. But courts may also consider other factors.
Thanks for explaining. I didn't know those were only guidelines. But it makes sense and that's generally different between common law countries and whatever we are called, civil law countries?
Exactly, they don't pay more for the same thing. It's almost exclusive to the copyright industry.
And that is for a good reason. Generally physical things can't be copied easily. So handling copying isn't really necessary with physical goods. That's kind of in the word "copyright". Though when licensing for example immaterial goods, you're also buying a different license and different rights, and not the same thing.
Maybe think more in terms of services and licensing, since that's the main point here. In the material world that'd be something like the difference between renting an excavator for 2 weeks or buying the same one. It'll be exactly the same excavator I get. It's going to be a very different number on the bill, and I get different rights and obligations.
Of course, you will have to pay more for the right to make copies than for a single copy. And even more for the exclusive copyright. Those things are different.
Sure. Since I grew up with the German model, I'd open yet another category for AI training so it can be handled specifically. I mean it doesn't really fit into anything existing. AI is neither making copies, nor a copy, but still it uses it. And it's also not an art form or a citation. So I need a good argument why it needs to be mushed together with something else.
And datasets and model-weights are yet different things. Since we agree that AI training is transformative, we can confine copyright to the datasets and it's not much of an issue with the learned model weights. Or at least it shouldn't be. And I mean we have enough other issues to deal with that arise from the models themselves.
"how my server was targeted by Alibaba" I agree that the situation is far from ideal. [...] It's a difficult issue. Mind that an opt-out from AI training does not actually address this.
I think you underestimate the consequences. The AI Fair Use plus the illegitimate scraping has led to a quite substantial war on the internet. Now every entity is fighting for their own. People like me are at the bottom of the chain and we have to protect our servers simply so they don't burn down. Big content platforms wage war as well. They don't want "their" content to be scraped. Leaving it open like before only cuts into their business. They'd rather sell it themselves. So they started making lots of things inaccessible by technical means, and combat the freedom we had before.
And that's the conundrum. In practice, this leads to the opposite. My own Fair Use of content (and that of other normal people and smaller businesses) is collateral damage. I used to archive some videos and I run a PeerTube instance. And now Google blocks all datacenter IPs, so you can only watch YouTube from a residential internet connection. They introduced rate-limiting. Reddit's API debacle in 2023 was largely about this. Countless other services and platforms have become enshittified due to this. And many more will.
Idk if the average consumer already notices. But it's really bad once you look under the hood. And this is not sustainable. And the beneficiaries of this war are mostly big companies. Like Reddit, who found a way to make profit off of it. And Cloudflare, who were way too big of a dominating central internet instance before, and now they're the arms dealer in the war against scraping and that makes them even bigger.
All the while the internet gets more locked down, enshittified... And everyone who isn't the big content industry or already a monopolist, loses.
"favour of the feudal lord companies and to the detriment of the average person" [...] How so?
See my text above. Even if it was a nice idea, it leads to the opposite in the real world. A few big internet companies "win" in this war with technology, disregarding the idea behind the law, and everyone except them loses. Cementing monopolies, not helping with them.
And more generally because most AI companies are billion dollar companies, they own half the internet ("land"), and a random nonfiction book author is a random individual with a very moderate income. And Fair Use now says the labour of the small guy is free of charge for the big company.
This is what you believe based on what you see. The rest is just slogans by the copyright industry, which you repeat without thinking. The problem is that you are basically shouting yourself down; your own opinion.
Ultimately, I'm not set on any ideology here. I'm regularly more concerned with making things work. And that's my goal here, too. I want a world that includes the existence of books and TV shows. So I need people to do that job. Now jobs can't be done if the people doing it starve.
Copyright is just a tool trying to achieve that. And kind of a half-way obsolete one with a lot of negative side-effects. I'm not set on it. We just need a way so books and TV shows are still a thing in 20 years. And that's my concern here and why I talk a lot about labour involved, and never about how they deserve to get rich if they're popular or if they manage the stuff.
And I see roughly 3 options for the future: a) Nobody pays them, or b) people who make use of their labour pay them, or c) some people pay, some get a free pass.
And the way I see it, a) is a future where quality and professional content is likely going to vanish on a big scale. And I'm not sure if the exact pre-copyright model applies to our modern world. Things have changed. For example, copying things was an expensive process back then and required very expensive machinery, whereas in the digital age it's done at no cost and by everyone.
b) is what I'm advocating for. Everyone needs to pay. Preferably not every taxpayer, but people who actually use the stuff.
And c) is what I called a "subsidy" when everyone gets to use it but only a group of people pays for everyone.
I mean what's your idea here? I can't really tell. Let's say we're not set on copyright. How does $90,000 arrive at a book author each year so it's a viable job and they can create something full time? And I'd like a fair solution for society.