
yaosio t1_iuz4uo0 wrote

There is an argument that Copilot outputting open source code without credit or the license breaks the license. It will output stuff from open source projects verbatim (I can't find the link, maybe it was on Twitter? I can't back this up.), so this isn't a case where the output is merely inspired by the code, it really is the code and has to abide by the license.

One solution, without messing with Copilot's training or output, is to have a second program look at the code being generated, check whether it's coming from any of the open source projects on GitHub, and let the user know so they can abide by the license.
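One way such a second-pass checker might work, purely as an illustrative sketch: compare hashed token n-grams of the generated code against an index of licensed code, loosely in the style of fingerprinting plagiarism detectors like MOSS. The function names and the 8-token window below are my own assumptions, not anything Copilot actually does.

```python
# Hypothetical sketch: flag generated code that matches indexed open-source
# snippets by comparing hashed runs of consecutive tokens.
import hashlib
import re


def fingerprints(code: str, n: int = 8) -> set[str]:
    """Hash every run of n consecutive tokens, ignoring whitespace and case."""
    tokens = re.findall(r"\w+|[^\w\s]", code.lower())
    return {
        hashlib.sha1(" ".join(tokens[i:i + n]).encode()).hexdigest()
        for i in range(max(len(tokens) - n + 1, 0))
    }


def overlap(generated: str, corpus_prints: set[str], n: int = 8) -> float:
    """Fraction of the generated code's n-grams found in the indexed corpus."""
    prints = fingerprints(generated, n)
    if not prints:
        return 0.0
    return len(prints & corpus_prints) / len(prints)


# Index licensed code once, then score each suggestion against the index.
licensed = "def quicksort(xs):\n    if len(xs) <= 1: return xs"
corpus = fingerprints(licensed)
print(overlap("def quicksort(xs):\n    if len(xs) <= 1: return xs", corpus))  # 1.0
```

A real matcher would need normalization against identifier renaming and a corpus-scale index, but even this crude ratio would catch the verbatim reproductions being discussed.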

36

CapaneusPrime t1_iuzgsq3 wrote

>There is an argument that Copilot outputting open source code without credit or the license breaks the license. It will output stuff from open source projects verbatim (I can't find the link, maybe it was on Twitter? I can't back this up.), so this isn't a case where the output is merely inspired by the code, it really is the code and has to abide by the license.

There is an argument that this doesn't matter (from GitHub's perspective).

It's already been pretty well established that AI can be trained on copyrighted photos without issue.

That said, image-generating AI can produce works which infringe on copyright. So, Copilot could certainly produce code covered by a license, which could then leave the Copilot user in violation of that license.

That said...

While code is copyrighted, the protections of that copyright aren't absolute.

For instance, I don't think anyone would doubt that there are examples of code under license which includes elements lifted from elsewhere without attribution—Stack Overflow, etc.—for which the author would not have a valid claim of authorship.

But even for people who wrote their own code, 100% from scratch, there are limitations.

If the copying is a very small element of the whole it's less likely to be problematic.

If the code represents a standard method of doing something, or if there are only a few ways to accomplish what the code does, it's not likely to be copyrightable.

Now, the vast majority of my programming work is done in purely functional programming languages—object-oriented languages have much more opportunity for creative expression. I write a lot of code implementing algorithms, most of which are very complex, and I'd be very hard pressed to justify claiming the copyright on most of the code I write.

Regardless of how clever I think some of my code may be, I'm also certain that any other competent person implementing the same algorithm would end up with code >95% essentially identical to mine.

Honestly, I don't see this lawsuit going anywhere; as I understand it, any copied snippets are fairly short and standard.

21

Alikont t1_iv0oi1g wrote

> includes elements lifted from elsewhere without attribution—Stack Overflow

Users of Stack Overflow agree, as part of the Stack Overflow ToS, that their code snippets are public and that no attribution is required.

9

CapaneusPrime t1_iv0t6pr wrote

You missed the point; I'm not making a spurious "whataboutism" claim.

Attribution is required by copyright.

If I take a snippet of code from stackoverflow and put it in my open source project, that's fine.

Nobody is saying it isn't.

What isn't fine is slapping a license on a file which includes that code without specifying that that code isn't subject to the license—that's claiming ownership of something which isn't yours and trying to attach a license to it.

Beyond that, you really need to re-read the Stack Overflow ToS, because they don't quite say what you seem to think.

4

Takahashi_Raya t1_iv0ammn wrote

>It's already been pretty well established that AI can be trained on copyrighted photos without issue.

It hasn't; that is why Getty has blocked AI-generated images, and the art world is incredibly hostile toward AI, moving in the same direction as the creators who started this lawsuit. There is a reason universities have law and ethics classes on AI where students are explicitly told not to train on anything that is not public domain or licensed.

The fact that facial recognition was trained on millions of photos that were present on Facebook is still a sore point in many people's minds. Don't confuse AI startups ignoring ethics and laws with reality.

If this lawsuit is a success, expect the AI tech world to be on fire very quickly. IP lawyers have been frothing at the mouth for a while to get a slice of this.

6

farmingvillein t1_iv1x778 wrote

> It hasn't; that is why Getty has blocked AI

You are right that OP is wrong (re: whether this is a settled legal issue)...but let's not pretend that Getty doing so has to do with anything other than attempted revenue maximization on their part.

Successful, prolific AI art devalues their portfolio of images, and they know that.

2

Takahashi_Raya t1_iv20ewh wrote

I mean, that is very much part of it, but it is indeed not the only reason Getty did that.

1

farmingvillein t1_iv29u6c wrote

Getty and Shutterstock literally turned around and partnered with generative AI companies--who do exactly what you flag as a problem--to sell images on their platforms.

1

Takahashi_Raya t1_iv2c6qi wrote

Getty and Shutterstock partnered with OpenAI (creators of DALL-E) and with BRIA, both companies whose training data has been confirmed to be ethically sourced, containing only public domain images and images they have licenses to.

The ones who are under scrutiny from communities when it comes to image generation are Midjourney, Stable Diffusion, and NovelAI, due to them not adhering to ethics in AI data usage.

OpenAI is mentioned in the current main topic of Copilot as well, due to Microsoft using their Codex model as part of Copilot, but that doesn't change the fact that DALL-E is ethically used.

1

Ronny_Jotten t1_iv0vo01 wrote

That decision wasn't about copyrighted photos. It was about Google creating a books search index, which was allowed as fair use - just like their scanning of books for previews is. That's an entirely different situation than if Google had trained an AI to write books for sale, that contained snippets or passages from the digitized books.

The latter certainly would not be considered fair use under the reasoning given by the judge in the case. He found that the search algorithm maintained:

> consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders

and that its incorporation into the Google Books system works to increase the sales of the copyrighted books by the authors. None of this can be said about Microsoft's product. It would seem to clearly fail the tests for fair use.

3

CapaneusPrime t1_iv1lheh wrote

>That decision wasn't about copyrighted photos.

And every knowledgeable person agrees this protects images as well.

Training a generative AI does not adversely impact the rights of artists.

This really is transformative fair use.

0

waffles2go2 t1_iv1c4z3 wrote

>https://medium.com/@brianjleeofcl/this-piece-should-be-retracted-ca740d9a36fe

Relevant bits - perhaps spouting off with an N=1 isn't the best look...

In practice, when SCOTUS denies the petition, the ruling made by the relevant appellate court is a legal precedent only within the circuit (the Second) where that court has made its ruling. This means that a different court—say, the Ninth, which includes Silicon Valley—could go ahead and issue a ruling that directly opposes that of the Second. At this point, it becomes more likely that SCOTUS would grant cert, since it would be a problem that under the same federal legal code, two opposing versions of case law could exist; after which the court would hear arguments and then finally issue a decision. Until that hypothetical occurs, there is no precedent set by a SCOTUS decision to note in this matter.

So a programmer who doesn't understand the law should take a harder look at what they post on Reddit unless they like being totally owned...

3

killver t1_iv0rejo wrote

> It's already been pretty well established that AI can be trained on copyrighted photos without issue.

This is one of the biggest misconceptions in AI at this point. This is just not true.

4

killver t1_iv0ycz4 wrote

If you trust a random blog, go ahead.

This ruling was for a very specific use case that cannot be generalized, and it only applies to the US, indeed only to a specific circuit. It is also totally unclear how it applies to generative models, which even the cited blog recognizes.

The AI community just loves to trust this as it is the easy and convenient thing to do.

Also see a reply to this post you shared: https://medium.com/@brianjleeofcl/this-piece-should-be-retracted-ca740d9a36fe

5

multiedge t1_iwiakg3 wrote

There's also the problem of this being used to scam Microsoft. I mean, I could license my code and publish it on GitHub, then create several other GitHub accounts and reuse that licensed code so it gets picked up by Copilot. I would then have legal grounds to sue them for using my licensed code.

1

chatterbox272 t1_iv4kbwb wrote

>It will output stuff from open source projects verbatim

I've seen this too, however only in pretty artificial circumstances: usually in empty projects, and with some combination of exact function names/signatures, detailed comments, or trivially easy blocks that will almost never be unique. I've never seen an example posted in context (in an existing project with its own conventions) where this occurred.

>One solution, without messing with Copilot's training or output, is to have a second program look at the code being generated, check whether it's coming from any of the open source projects on GitHub, and let the user know so they can abide by the license.

This kind of exists: there is a setting to block suggestions matching open-source code, although reportedly it isn't especially effective (then again, I've only seen this discussed by people who also report frequent copy-paste behaviour, something I've not been able to replicate in normal use).

2
