OpenAI Says It’s "Over" If It Can’t Steal All Your Copyrighted Work
-
Obligatory: I'm anti-AI, mostly anti-technology
That said, I can't say that I mind LLMs using copyrighted materials that they access legally/appropriately (lots of copyrighted content may be freely available to some extent, like news articles or song lyrics)
I'm open to arguments correcting me. I'd prefer to have another reason to be against this technology rather than argue on the side of frauds like Sam Altman. Here's my take:
All content created by humans follows consumption of other content. If I read lots of Vonnegut, I should be able to churn out prose that roughly (or precisely) includes his idiosyncrasies as a writer. We read more than one author; we read dozens or hundreds over our lifetimes. Likewise musicians, film directors, etc etc.
If an LLM consumes the same copyrighted content and learns how to copy its various characteristics, how is it meaningfully different from me doing it and becoming a successful writer?
and learns how to copy its various characteristics
Because you are a human. Not an immortal corporation.
-
Our privacy was long gone well before AI companies were even founded. If people cared about their privacy, none of the largest tech companies would exist, because they all spy on you wholesale.
In the US. The EU has proven that you can have perfectly functional privacy laws.
If your reasoning is that the US doesn't regulate its companies, and that therefore makes regulating them impossible, then your reasoning is bad.
My reasoning is based upon observing the current Internet from the perspective of working in cyber security and dealing with privacy issues for global clients.
The GDPR is a step in the right direction, but it doesn't guarantee your digital privacy. It's more of a framework to regulate the trading and collecting of your personal data, not to prevent it.
No matter who or where you are, your data is collected and collated into profiles which are traded between data brokers. Anonymized data is a myth: it's easily deanonymized by data brokers, and data retention limits do essentially nothing.
AI didn't steal your privacy. Advertisers and other data consuming entities have structured the entire digital and consumer electronics ecosystem to spy on you decades before transformers or even deep networks were ever used.
-
Fuck Sam Altman, the fartsniffer who convinced himself & a few other dumb people that his company really has the leverage to make such demands.
Fartsniffer
-
If this passes, piracy websites can rebrand as AI training material websites and we can all run a crappy model locally to train on pirated material.
Fuck it. I'm training my home AI with the world's TV, movies, and books.
-
gosh Ed Zitron is such an anodyne voice to hear, I felt like I was losing my mind until I listened to some of his stuff
Yeah, he has the ability to articulate what I was already thinking about LLMs and bring in hard data to back up his thesis that it’s all bullshit. Dangerous and expensive bullshit, but bullshit nonetheless.
It’s really sad that his willingness to say the tech industry is full of shit is such an unusual attribute in the tech journalism world.
-
Yeah, but I don't sell ripped DVDs and copies of other people's art.
What if I run a filter over it. Transformative works are fine.
-
But when China steals all their (arguably not copyrightable) work...
Surprisingly, Sam Altman hasn't complained; he just said there's competition and it will be harder for OpenAI to compete with open source. I think their small lead is essentially gone, and their plan is now to suckle Microsoft's teat.
-
This post did not contain any content.
OpenAI can open their asses and go fuck themselves!
-
China, the new boogeyman to replace the USSR
-
That information is published freely online.
Do companies have to avoid hiring people who read and were influenced by copyrighted material?
it's ok if you don't know how copyright works. also maybe look into plagiarism. there's a difference between relaying information you've learned and stealing work.
-
This is a tough one
OpenAI is full of shit and should die, but then again, so should copyright law as it currently is
-
What I’m hearing between the lines here is the origin of a legal “argument.”
If a person’s mind is allowed to read copyrighted works, remember them, be inspired by them, and describe them to others, then surely a different type of “person’s” different type of “mind” must be allowed to do the same thing!
After all, corporations are people, right? Especially any worth trillions of dollars! They are more worthy as people than meatbags worth mere billions!
-
it's ok if you don't know how copyright works. also maybe look into plagiarism. there's a difference between relaying information you've learned and stealing work.
Training on publicly available material is currently legal. It is how your search engine was built and it is considered fair use mostly due to its transformative nature. Google went to court about it and won.
-
What I’m hearing between the lines here is the origin of a legal “argument.”
If a person’s mind is allowed to read copyrighted works, remember them, be inspired by them, and describe them to others, then surely a different type of “person’s” different type of “mind” must be allowed to do the same thing!
After all, corporations are people, right? Especially any worth trillions of dollars! They are more worthy as people than meatbags worth mere billions!
This has been the legal basis of all AI training sets since they began collecting datasets. The US copyright office heard these arguments in 2023: https://www.copyright.gov/ai/listening-sessions.html
MR. LEVEY: Hi there. I'm Curt Levey, President of the Committee for Justice. We're a nonprofit that focuses on a variety of legal and policy issues, including intellectual property, AI, tech policy. There certainly are a number of very interesting questions about AI and copyright. I'd like to focus on one of them, which is the intersection of AI and copyright infringement, which some of the other panelists have already alluded to.
That issue is at the forefront given recent high-profile lawsuits claiming that generative AI, such as DALL-E 2 or Stable Diffusion, are infringing by training their AI models on a set of copyrighted images, such as those owned by Getty Images, one of the plaintiffs in these suits. And I must admit there's some tension in what I think about the issue at the heart of these lawsuits. I and the Committee for Justice favor strong protection for creatives because that's the best way to encourage creativity and innovation.
But, at the same time, I was an AI scientist long ago in the 1990s before I was an attorney, and I have a lot of experience in how AI, that is, the neural networks at the heart of AI, learn from very large numbers of examples, and at a deep level, it's analogous to how human creators learn from a lifetime of examples. And we don't call that infringement when a human does it, so it's hard for me to conclude that it's infringement when done by AI.
Now some might say, why should we analogize to humans? And I would say, for one, we should be intellectually consistent about how we analyze copyright. And number two, I think it's better to borrow from precedents we know that assumed human authorship than to invent the wheel over again for AI. And, look, neither human nor machine learning depends on retaining specific examples that they learn from.
So the lawsuits that I'm alluding to argue that infringement springs from temporary copies made during learning. And I think my number one takeaway would be, like it or not, a distinction between man and machine based on temporary storage will ultimately fail maybe not now but in the near future. Not only are there relatively weak legal arguments in terms of temporary copies, the precedent on that, more importantly, temporary storage of training examples is the easiest way to train an AI model, but it's not fundamentally required and it's not fundamentally different from what humans do, and I'll get into that more later if time permits.
The "temporary copy" idea is pretty central for visual models like Midjourney or DALL-E, whose training sets are full of copyrighted works lol. There is a legal basis for temporary copies too:
The "Ephemeral Copy" Exception (17 U.S.C. § 112 & § 117)
U.S. copyright law recognizes temporary, incidental, and transitory copies as necessary for technological processes. Section 117 allows temporary copies for software operation. Section 112 permits temporary copies for broadcasting and streaming.
-
Training on publicly available material is currently legal. It is how your search engine was built and it is considered fair use mostly due to its transformative nature. Google went to court about it and won.
Can you point to the trial they won? I only know about a case that was dismissed.
Because what we've seen from AI so far is hardly transformative.
-
China, the new boogeyman to replace the USSR
Has been since 1991
-
Surprisingly, Sam Altman hasn't complained; he just said there's competition and it will be harder for OpenAI to compete with open source. I think their small lead is essentially gone, and their plan is now to suckle Microsoft's teat.
it will be harder for OpenAI to compete with open source
Can we revoke the word open from their name? Please?
-
I feel like it would be OK if AI-generated images/text were clearly marked (but I don't think that's possible in the case of text)
Who would support something made by stealing the hard work of other people if they could tell instantly?
-
This is a tough one
OpenAI is full of shit and should die, but then again, so should copyright law as it currently is
yes, screw them both. let altman scrape all the copyright material and choke on it
-
Sorry, wasn’t trying to be a dick. Just couldn’t think of it at the time.