OpenAI declares AI race “over” if training on copyrighted works isn’t fair use
-
That's a good litmus test. If asking/paying artists to train your AI destroys your business model, maybe you're the arsehole.
Not only that, but their business model wouldn't hold up if they were required to provide their model weights for free because the material that went into it was "free".
-
If AI gets to use copyrighted material for free and makes a profit off of the results, that means piracy is 1000% Legal.
Excuse me while I go and download a car!!
-
How many pages has a human author read and written before they can produce something worth publishing? I’m pretty sure that’s not even a million pages. Why does an AI require a gazillion pages to learn, but the quality is still unimpressive? I think there’s something fundamentally wrong with the way we teach these models.
The more important question is: Why can a human absorb a ton of material in their learning without anyone crying about them "stealing"? Why shouldn't the same go for AI? What's the difference? I really don't understand the common mindset here. Is it because a trained AI is used for profit?
-
Sam Altman is a grifter, but on this topic he is right.
The reality is that IP laws in their current form hamper innovation and technological development. Stephan Kinsella has written on this topic for the past 25 years or so and has argued for reforming the system.
Here in the Netherlands, we know that it's true. Philips became a great company because they could produce lightbulbs here, which were patented in the UK. We also had a booming margarine business, because we weren't respecting British and French patents and that business laid the foundation for what became Unilever.
And now China is using those exact same tactics to build up their industry. And it gives them a huge competitive advantage.
A good reform would be to revert to the way copyright and patent law were originally developed, with much shorter terms and a significant fee required for a one-time extension.
The current terms, lobbied by Disney, are way too restrictive.
-
But what data would it be?
Part of the "gobble all the data" perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give a $892 billion market cap when your model is a genius about a handful of narrow subjects that you could get deep volunteer support on.
OTOH maybe there's a sane business in narrow, siloed (cheap, efficient, and with more bounded expectations) AI products: the reinvention of the "expert system" with clear guardrails, the image generator that only does seaside background landscapes but can't generate a cat to save its life, the LLM that's a prettified version of a knowledgebase search and NOTHING MORE.
You've highlighted exactly why I also fundamentally disagree with the current trend of all things AI being for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn't even be an issue in the first place... It'd be like literally giving someone access to a public library.
Edit: but to focus on this specific instance, where we have to deal with the here-and-now, I could see them receiving, say, 60-75% of what they have now, hassle-free. At the very least, and uniformly distributed. Again, AI development isn't what irks most people; it's calling plagiarism generators and search engine fuck-ups AI and selling them back to the people who generated the databases used for those abhorrences, or, worse, working toward replacing those people entirely with LLMs!
Train the AI to be factually correct instead and sell it as an easy-to-use knowledge base? Aces! Train the AI to write better code and sell it as an on-board stackoverflow Jr.? Amazing! Even having it as a mini-assistant on your phone so that you have someone to pester you to get the damned laundry out of the washing machine before it starts to stink is a neat thing, but that would require less advertising and shoving down our throats, and more accepting the fact that you can still do that with five taps and a couple of alarm entries.
Edit 2: oh, and another thing which would require a buttload of humility, but would alleviate a lot of tension would be getting it to cite and link to its sources every time! Have it be transformative enough to give you the gist without shifting into plagiarism, then send you to the source for the details!
-
The more important question is: Why can a human absorb a ton of material in their learning without anyone crying about them "stealing"? Why shouldn't the same go for AI? What's the difference? I really don't understand the common mindset here. Is it because a trained AI is used for profit?
It is because a human artist is usually inspired and uses knowledge to create new art and AI is just a mediocre mimic. A human artist doesn't accidentally put six fingers on people on a regular basis. If they put fewer fingers it is intentional.
-
Corporations trying to profit by closing off vast tracts of human output are bumping into other corporations trying to mine it for profit.
-
Not only that, but their business model doesn't hold up if they were required to provide their model weights for free because the material that went into it was "free".
There's also an argument that if the business was that reliant on free things to start with, then it shouldn't be a business.
No one would bat an eye if the CEO of a real estate company was sobbing that it's the end of the rental market because the company is no longer allowed to get houses for free.
-
Good if AI fails because it can't abuse copyright. Fuck AI.
*except the stuff used for science that isn't trained on copyrighted scraped data, that use is fine
Yeah unfortunately we’ve started calling any LLM “AI”
-
Please, let it be over. Idiotic "ai"....
-
Copyright has not, was not intended to, and does not currently, pay artists.
You are correct, copyright is ownership, not income. I own the copyright for all my work (but not work for hire) and what I do with it is my discretion.
What is income, is the content I sell for the price acceptable to the buyer. Copyright (as originally conceived) is my protection so someone doesn't take my work and use it to undermine my skillset. One of the reasons why penalties for copyright infringement don't need actual damages and why Facebook (and other AI companies) are starting to sweat bullets and hire lawyers.
That said, as a creative who relied on artistic income and pays other creatives appropriately, modern copyright law is far, far overreaching and in need of major overhaul. Gatekeeping was never the intent of early copyright and can fuck right off; if I paid for it, they don't get to say no.
modern copyright law is far, far overreaching and in need of major overhaul.
https://rufuspollock.com/papers/optimal_copyright_term.pdf
This research paper from Rufus Pollock in 2009 suggests that the optimal timeframe for copyright is 15 years. I've been referencing this for, well, 16 years now, a year longer than the optimum copyright range. If I recall correctly I first saw this referenced by Mike Masnick of techdirt.
-
Alright, I confess! Almost all of my training in computer programming came from copyrighted material. Put the cuffs on me!
You were trained and learned and are able to create new things.
AI poorly mimics things it has seen before.
-
Aaron Swartz was 100% opposed to all copyright laws, you remember that yah?
I'm not just a copyright abolitionist, I also abhor all intellectual property. Yes, even trademarks.
-
Wrong in all points.
No, actually, I'm not at all. In fact, I'm totally right:
https://www.youtube.com/watch?v=mhBpI13dxkI
Copyright originated to protect printers, not artists, by creating a monopoly around a means of distribution.
How many artists do you know? You must know a few. How many of them have received any income through copyright? I dare you to try, in good faith, to identify even one individual you personally know, engaged in creative work, who makes any meaningful amount of money through copyright.
I know quite a few people who rely on royalties for a good chunk of their income. That includes musicians, visual artists and film workers.
Saying it doesn’t exist seems very ignorant.
-
Then die. I don't know what else to tell you.
If your business model is predicated on breaking the law then you don't deserve to exist.
You can't send people to prison for 5 years and charge them $100,000 for downloading a movie, then turn around and let big business do it for free because they need to "train their AI model", and call one a thief but not the other...
-
The more important question is: Why can a human absorb a ton of material in their learning without anyone crying about them "stealing"? Why shouldn't the same go for AI? What's the difference? I really don't understand the common mindset here. Is it because a trained AI is used for profit?
I’ve been thinking about that as well. If an author has bought 500 books, and read them, it’s obviously going to influence the books they write in the future. There’s nothing illegal about that. Then again, they did pay for the books, so I guess that makes it fine.
What if they got the books from a library? Well, they probably also paid taxes, so that makes it ok.
What if they pirated those books? In that case, the pirating part is problematic, but I don’t think anyone will sue the author for copying the style of LOTR in their own works.
-
The more important question is: Why can a human absorb a ton of material in their learning without anyone crying about them "stealing"? Why shouldn't the same go for AI? What's the difference? I really don't understand the common mindset here. Is it because a trained AI is used for profit?
What you're talking about is whether AI is actually inventing new work (imo, yes it is), but that's not the issue.
The issue is these models were trained on our collective knowledge & culture without permission, then sold back to us.
Unless they use only proprietary & public training data, every single one of these models should be open sourced/weighted & free for anyone to use, like libraries.
-
He's afraid of losing his little empire.
OpenAI also had no clue how to recreate the happy little accident that gave them ChatGPT3. That's mostly because their whole thing was using a simple model and brute forcing it with more data, more power, more nodes and then even more data and power until it produced results.
As expected, this isn't sustainable. It's beyond the point of diminishing returns. But Sam here has no idea how to fix that with much better models, so he goes back to the one thing he knows: more data needed, just one more terabyte bro, ignore the copyright!
And now he's blaming the Chinese for forcing him to use even more data.
-
Many of you are completely two-faced on copyright laws.