OpenAI declares AI race “over” if training on copyrighted works isn’t fair use
-
This post did not contain any content.
Oh no! How could we ever live without AI?
-
This post did not contain any content.
His personal race is over? Oooohhhh, so sorry for him.
AI is not over at all. Maybe he himself will not become the ruler of the world now. No loss.
-
Aaron Swartz was 100% opposed to all copyright laws, you remember that, yeah?
Yes, and he killed himself after the FBI threw the book at him for doing exactly what these AI assholes are doing without repercussion.
-
Copyright has not paid, was not intended to pay, and does not currently pay artists.
Wrong on all points.
Copyright has paid artists (though maybe not enough). Copyright was intended to do that (though maybe not that alone). Copyright does currently pay artists (maybe not in your country; I don't know about that).
Wrong on all points.
No, actually, I'm not at all. In fact, I'm totally right:
https://www.youtube.com/watch?v=mhBpI13dxkI
Copyright originated to protect printers, not artists, by creating a monopoly around a means of distribution.
How many artists do you know? You must know a few. How many of them have received any income through copyright? I dare you, in good faith, to try to identify even one individual you personally know, engaged in creative work, who makes any meaningful amount of money through copyright.
-
This particular vein of "pro-copyright" thought continually baffles me. Copyright has not paid, was not intended to pay, and does not currently pay artists.
It's totally valid to hate these AI companies. But it's absolutely just industry propaganda to think that copyright was protecting your data on your behalf.
Copyright has not paid, was not intended to pay, and does not currently pay artists.
You are correct: copyright is ownership, not income. I own the copyright for all my work (but not work for hire), and what I do with it is at my discretion.
What is income is the content I sell at a price acceptable to the buyer. Copyright (as originally conceived) is my protection, so someone doesn't take my work and use it to undermine my skillset. That's one of the reasons penalties for copyright infringement don't require actual damages, and why Facebook (and other AI companies) are starting to sweat bullets and hire lawyers.
That said, as a creative who relied on artistic income and pays other creatives appropriately, I think modern copyright law is far, far overreaching and in need of a major overhaul. Gatekeeping was never the intent of early copyright and can fuck right off; if I paid for it, they don't get to say no.
-
This post did not contain any content.
Alright, I confess! Almost all of my training in computer programming came from copyrighted material. Put the cuffs on me!
-
This post did not contain any content.
Okay, bye!
-
Sad to see you leave (not really, tho'), love to watch you go!
Edit: I bet that if any AI-developing company stopped acting so damned shady and just ASKED FOR PERMISSION, they'd receive a huge amount of data from all over. There are a lot of people who would like to see AGI become a real thing, but not if it's being developed by greedy and unscrupulous shitheads. As it stands now, I think the only ones actually doing it for the R&D, and not as eye candy to glitz away people's money with aesthetically believable nonsense, are a handful of start-up-likes with (not in a condescending way) kids who've yet to have their dreams and idealism trampled.
But what data would it be?
Part of the "gobble all the data" perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give a $892 billion market cap when your model is a genius about a handful of narrow subjects that you could get deep volunteer support on.
OTOH maybe there's probably a sane business in narrow siloed (cheap and efficient and more bounded expectations) AI products: the reinvention of the "expert system" with clear guardrails, the image generator that only does seaside background landscapes but can't generate a cat to save its life, the LLM that's a prettified version of a knowledgebase search and NOTHING MORE
-
I mean, if they pay for it like everyone else does, I don't think it's a problem. Yes, it will cost you billions and billions to do it correctly, but then you basically have the smartest creature on earth (that we know of), and you can replicate/improve on it in perpetuity. We will still have to pay you licensing fees to use it in our daily lives, so you will be making those billions back.
Now, I would say let them use anything that is old, plus freeware, textbooks, government-owned stuff, etc. We sponsored it with our learning and our taxes, so we get a percentage in all AI companies. Humanity gets a 51% stake in any AI business using humanity's knowledge, so we are free to vote on how the tech is being used and we have a controlling share; also, whatever price is set, we get half of it back in taxes at the end of the year. The more you use it, the more you pay and the more you get back.
The owners of the copyrighted works should be paid in perpetuity too, though, since part of their work goes into everything the AI spits out.
-
Wrong on all points.
No, actually, I'm not at all. In fact, I'm totally right:
https://www.youtube.com/watch?v=mhBpI13dxkI
Copyright originated to protect printers, not artists, by creating a monopoly around a means of distribution.
How many artists do you know? You must know a few. How many of them have received any income through copyright? I dare you, in good faith, to try to identify even one individual you personally know, engaged in creative work, who makes any meaningful amount of money through copyright.
I know several artists living off of selling their copyrighted work, and no one in the history of the Internet has ever watched a 55-minute YouTube video someone linked to support their argument.
-
This post did not contain any content.
How many pages has a human author read and written before they can produce something worth publishing? I’m pretty sure that’s not even a million pages. Why does an AI require a gazillion pages to learn, but the quality is still unimpressive? I think there’s something fundamentally wrong with the way we teach these models.
-
That's a good litmus test. If asking/paying artists to train your AI destroys your business model, maybe you're the arsehole.
Not only that, but their business model wouldn't hold up if they were required to provide their model weights for free because the material that went into it was "free".
-
This post did not contain any content.
If AI gets to use copyrighted material for free and makes a profit off of the results, that means piracy is 1000% Legal.
Excuse me while I go and download a car!!
-
How many pages has a human author read and written before they can produce something worth publishing? I’m pretty sure that’s not even a million pages. Why does an AI require a gazillion pages to learn, but the quality is still unimpressive? I think there’s something fundamentally wrong with the way we teach these models.
The more important question is: Why can a human absorb a ton of material in their learning without anyone crying about them "stealing"? Why shouldn't the same go for AI? What's the difference? I really don't understand the common mindset here. Is it because a trained AI is used for profit?
-
This post did not contain any content.
Sam Altman is a grifter, but on this topic he is right.
The reality is that IP laws in their current form hamper innovation and technological development. Stephan Kinsella has written on this topic for the past 25 years or so and has argued for reforming the system.
Here in the Netherlands, we know it's true. Philips became a great company because it could produce light bulbs here that were patented in the UK. We also had a booming margarine business because we weren't respecting British and French patents, and that business laid the foundation for what became Unilever.
And now China is using those exact same tactics to build up their industry. And it gives them a huge competitive advantage.
A good reform would be to revert to the way copyright and patent law were originally conceived, with much shorter terms and a significant fee required for a one-time extension.
The current terms, lobbied for by Disney, are way too restrictive.
-
But what data would it be?
Part of the "gobble all the data" perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give a $892 billion market cap when your model is a genius about a handful of narrow subjects that you could get deep volunteer support on.
OTOH maybe there's probably a sane business in narrow siloed (cheap and efficient and more bounded expectations) AI products: the reinvention of the "expert system" with clear guardrails, the image generator that only does seaside background landscapes but can't generate a cat to save its life, the LLM that's a prettified version of a knowledgebase search and NOTHING MORE
You've highlighted exactly why I also fundamentally disagree with the current trend of all things AI being for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn't even be an issue in the first place... It'd be like literally giving someone access to a public library.
Edit: but to focus on this specific instance, where we have to deal with the here and now, I could see them receiving, say, 60-75% of what they have now, hassle-free. At the very least, and uniformly distributed. Again, AI development isn't what irks most people; it's calling plagiarism generators and search-engine fuck-ups "AI" and selling them back to the people who generated the databases used for those abhorrences, or, worse, working toward replacing those people entirely with LLMs!
Train the AI to be factually correct instead and sell it as an easy-to-use knowledge base? Aces! Train the AI to write better code and sell it as an on-board stackoverflow Jr.? Amazing! Even having it as a mini-assistant on your phone, so you have someone to pester you to get the damned laundry out of the washing machine before it starts to stink, is a neat thing, but that would require less advertising and shoving it down our throats, and more accepting the fact that you can still do that with five taps and a couple of alarm entries.
Edit 2: oh, and another thing that would require a buttload of humility but would alleviate a lot of tension: getting it to cite and link to its sources every time! Have it be transformative enough to give you the gist without shifting into plagiarism, then send you to the source for the details!
-
The more important question is: Why can a human absorb a ton of material in their learning without anyone crying about them "stealing"? Why shouldn't the same go for AI? What's the difference? I really don't understand the common mindset here. Is it because a trained AI is used for profit?
It is because a human artist is usually inspired and uses knowledge to create new art, while AI is just a mediocre mimic. A human artist doesn't accidentally put six fingers on people on a regular basis. If they draw fewer fingers, it is intentional.
-
This post did not contain any content.
Corporations trying to profit by closing off vast tracts of human output are bumping into other corporations trying to mine it for profit.
-
Not only that, but their business model wouldn't hold up if they were required to provide their model weights for free because the material that went into it was "free".
There's also an argument that if the business is that reliant on free things to start with, then it shouldn't be a business.
No one would bat an eye if the CEO of a real estate company were sobbing that it's the end of the rental market because the company is no longer allowed to get houses for free.
-
Good if AI fails because it can't abuse copyright. Fuck AI.
*except the stuff used for science that isn't trained on copyrighted scraped data; that use is fine
Yeah, unfortunately we've started calling any LLM "AI".