Submitted by Exnur0 t3_zwi4jx in MachineLearning

In the wake of all the questions and worries about models that can generate content nearing (or in some cases exceeding) the quality of content made by humans, there are a couple of mechanisms that companies should provide alongside their models. The two vary in feasibility, but both are pretty doable, at least for what we've seen so far.

  1. A hashing-based system to check whether a given piece of content was generated by the model. This can be accomplished by hashing all of the outputs of the model and storing the hashes (a minimal sketch of what I mean follows this list). If it doesn't pose some sort of security risk for the generator, the lookup could also return the date of generation.

  2. A model for discriminating whether a given piece of content was generated by the model, similar to this model for GPT-2. This is necessary in addition to the simpler hashing mechanism, since it's possible for only a portion of the media to be generated. This would be imperfect, of course, but if nothing else, we should press companies enough that they feel obligated to give it a dedicated try.
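
Following up on point 1, here's a minimal sketch of the sort of hash registry I mean (the function names and the in-memory store are just placeholders, not any real provider's API):

```python
import hashlib
from datetime import datetime, timezone

# In-memory stand-in for whatever database a provider would actually use.
_registry: dict[str, str] = {}

def register_output(content: bytes) -> None:
    """Called by the provider right after generating a piece of content."""
    digest = hashlib.sha256(content).hexdigest()
    _registry[digest] = datetime.now(timezone.utc).isoformat()

def was_generated_here(content: bytes) -> str | None:
    """Public lookup: returns the generation timestamp if this exact content
    was produced here, or None if there is no record of it."""
    return _registry.get(hashlib.sha256(content).hexdigest())

# Example
register_output(b"Some model output, byte for byte.")
print(was_generated_here(b"Some model output, byte for byte."))   # timestamp
print(was_generated_here(b"Some model output, byte for byte!"))   # None: one edit breaks the match
```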

These mechanisms need real support - an API for developers, and a UI for less sophisticated users. They should have decent latency, and hopefully be provided for free up to some level of usage - I understand the compute required could be enormous.

Curious what others think here :)

6

Comments

hjmb t1_j1v0l07 wrote

For 1:

This seems feasible to implement but easy to circumvent - a single change to the content and the hash no longer matches. If you instead store an embedding in some semantic space, you might at least be able to say "we generated something a lot like this for someone, once", but that's about as good as you will get.
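
A rough sketch of that weaker "something a lot like this" check, with a toy `embed()` standing in for whatever encoder a provider would actually use (nothing here is a real service's API):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would use a learned sentence/image encoder.
    # Here we just hash character trigrams into a fixed-size, normalized vector.
    vec = np.zeros(256)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

stored: list[np.ndarray] = []  # embeddings of everything we generated

def record(text: str) -> None:
    stored.append(embed(text))

def looks_familiar(text: str, threshold: float = 0.9) -> bool:
    """True if we generated something a lot like this, once."""
    query = embed(text)
    return any(float(query @ vec) >= threshold for vec in stored)

record("The quick brown fox jumps over the lazy dog.")
print(looks_familiar("The quick brown fox jumps over the lazy dog!"))    # likely True
print(looks_familiar("An entirely unrelated sentence about databases."))  # likely False
```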

A similar idea is to embed watermarks in the artifacts themselves. Stable Diffusion does this by default (I believe using this package), but forks down the line have intentionally removed it.
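
For reference, embedding and reading back such a watermark looks roughly like this, assuming the invisible-watermark package that Stable Diffusion's reference scripts use (treat the API details as my best reading of its documentation, not gospel):

```python
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

# Embed a short byte payload into an image using the DWT+DCT method.
bgr = cv2.imread("generated.png")
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", b"SDV2")
bgr_marked = encoder.encode(bgr, "dwtDct")
cv2.imwrite("generated_marked.png", bgr_marked)

# Later: try to read the watermark back out.
decoder = WatermarkDecoder("bytes", 32)  # 32 bits = the 4-byte payload above
payload = decoder.decode(cv2.imread("generated_marked.png"), "dwtDct")
print(payload)  # b"SDV2" if the mark survived; garbage or empty if it was stripped
```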

Unfortunately it seems about as sensible and feasible as cryptographically signing output in other fields (and serves the same purpose of attribution), and we haven't had much luck persuading people to do that either.

It also doesn't apply to models run locally.

For 2:

The page you link to points out that the researchers don't deem the model fit for purpose (at least not on its own), and that they expect this problem to only get more challenging as model sizes increase.

As for someone trying to circumvent the discriminator: if they have access to it, they can adjust their artifacts until it no longer flags them.

I don't believe either solution is robust against bad actors. I also don't think attribution itself solves the problems caused by human-plausible content generation, but that is almost certainly a perspective thing.

Finally: This is not my field. Any corrections to the above please let me know.

11

Exnur0 OP t1_j1v2zrb wrote

Thanks for the insightful comment - I think you helped me get the rest of my thought out.

I definitely agree - both of these have gaping holes in them if anyone with expertise comes along. The second mechanism is meant to plug holes in the first, but people can definitely construct media to get past the second anyway. Ideally it would return a value from 0 to 1 to inform the human's level of suspicion, rather than just a boolean, but still, it's likely that certain tricks would end up helping someone get things past it.

I think these are useful mostly because of what they may be able to accomplish with regard to the scale of the problem - any additional effort required to pass off AI work as human is a good thing, as far as I'm concerned, and some of the scariest implications of these kinds of models come from their scale: moderation is more or less impossible if you have to deal with limitless examples of generated content.

For example, take the problem of misinformation, like what happened on StackOverflow (GPT-generated answers were banned, largely because they're often wrong, is my understanding). Imagine that StackOverflow had access to an API that could reliably point out unedited (or nearly unedited) generated content. In that case, the scope of the problem shrinks to only those people willing to put in the effort to slip things past the discriminator, which will hopefully be a much smaller set.

I also definitely agree that there are other problems that aren't solved by discrimination at all, even if discrimination were perfect - really, the underlying point is that the labs cranking out powerful generative models could be doing much, much more in terms of accompanying tooling to try to decrease the negative impacts of their tech. I don't see what I'm describing as bulletproof or as always useful, but it strikes me as kind of a bare-minimum precaution. If nothing else, I should be able to put completely unedited generated media into an API and get back an answer that it was generated.

1

Featureless_Bug t1_j1uygsw wrote

Why should they do it again?

5

gkaykck t1_j1uzq8f wrote

Personally, I'd like to be able to filter out AI generated content from my feeds sometimes.

1

Featureless_Bug t1_j1v4w14 wrote

Sure, some users might be interested in it. Why would OpenAI do it, though? Especially given a wide range of open source alternatives that you can run on your own cluster

2

gkaykck t1_j1v72ay wrote

I think if this is going to be implemented, it has to be at the model level, not as an extra layer on top. Just thinking out loud with my not-so-great ML knowledge: if we mark every image in the training data with some special and static "noise" that is unnoticeable to human eyes, all the images generated will carry the same "noise". That would also cover open source alternatives run on your own cluster. If this kind of "watermarking" is going to be implemented, it needs to be done in the model itself.

When it comes to "why would OpenAI do it", it would be nice for them to be able to track where their generated pictures/content ends up, for investors etc. This can also help them "license" the images generated with their models instead of charging per run.

1

Exnur0 OP t1_j1vgj6f wrote

You don't actually have to watermark images in order to know that you generated them, at least not if you're checking exactly the same image - you can just hash the image, or store a low-dimensional representation of it as a fingerprint (people sometimes use color histograms; in principle you could use anything). Then you can look images up against that data to see whether a given one is something you produced.
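
For instance, here's a crude image fingerprint along those lines, using nothing but a normalized RGB histogram (purely illustrative - a production system would use something better):

```python
from PIL import Image
import numpy as np

def fingerprint(path: str) -> np.ndarray:
    """768-bin RGB histogram, normalized so it is roughly size-invariant."""
    hist = np.asarray(Image.open(path).convert("RGB").histogram(), dtype=float)
    return hist / hist.sum()

def probably_same_image(path_a: str, path_b: str, tol: float = 0.05) -> bool:
    """True if the two fingerprints are within an L1 distance of `tol`."""
    return float(np.abs(fingerprint(path_a) - fingerprint(path_b)).sum()) <= tol

# Usage: compare an uploaded image against one stored at generation time.
# print(probably_same_image("uploaded.png", "generated_00042.png"))
```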

1

Brudaks t1_j21x8ut wrote

Thing is, we can't really do that for text - natural language doesn't have enough free variation to let you insert a sufficient number of bits of special noise unnoticeable to human readers. Well, you might add some data with various formatting or unicode trickery, but that would be trivially removable by anyone who cared.
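
To make that concrete, here's roughly what such unicode trickery looks like, and how little it takes to strip it (purely illustrative):

```python
ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

def watermark(text: str, bits: str) -> str:
    """Hide a bit string after the first space using zero-width characters."""
    payload = "".join(ZW1 if b == "1" else ZW0 for b in bits)
    return text.replace(" ", " " + payload, 1)

def strip_watermark(text: str) -> str:
    """The 'attack': delete every zero-width character. One line."""
    return text.replace(ZW0, "").replace(ZW1, "")

marked = watermark("This sentence looks completely normal.", "1011")
print(marked == "This sentence looks completely normal.")                   # False: mark is present
print(strip_watermark(marked) == "This sentence looks completely normal.")  # True: mark is gone
```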

1

Featureless_Bug t1_j1veefz wrote

>I think if this is going to be implemented, it has to be at model level, not as an extra layer on top. Just thinking outloud with my not so great ML knowledge, if we mark every image in training data with some special and static "noise" which is unnoticable to human eyes, all the images generated will be marked with the same "noise".

This is already wrong - it might work, it might not work

>So this is for running open source alternatives on your own cluster.

Well, of course the open source models will be trained on data without any noise added, people are not stupid

>When it comes to "why would OpenAI do it", it would be nice for them to be able to track where does their generated pictures/content end up to for investors etc. This can also help them "license" the images generated with their models instead of charging per run.

Well, OpenAI won't do it because no one wants watermarked images. Consequently, if they tried to watermark their outputs, people would be even more likely to use open-source alternatives. That's why OpenAI won't do it.

0

Eggy-Toast t1_j1vhtgg wrote

“This is already wrong — it might work” disingenuous much?

The point of that proposed watermark is that it can be imperceptible to the human eye but perceptible by some algorithm or model nevertheless. It only adds value to the product, but perhaps not as much as it would take to implement.

I think in your comments, though, you entirely overlooked the fact that DALL-E 2 has a watermark implementation; it is in no way subtle, but it can be cropped out.

1

perta1234 t1_j1w1872 wrote

My question too. If a simple request is enough to get the work done, the work was too easy. The standard outputs are quite boring and low quality, but with good additional criteria, limits, and requests, the output begins to become interesting. But at that point it is more like co-creation. In my testing, I found that I can save about 30% of the time compared to writing it myself, not more. Since the AI tends to hallucinate when the subject is challenging, you need to direct it carefully. The main benefit coming from the use of AI is that one needs to think through and describe the requested text's outline, focus, and aim very well. After that is done, writing is very easy anyway. AI just writes better English than some of us.

1

Eggy-Toast t1_j1vg5ly wrote

I agree that something like this would be very useful. I tested out what ChatGPT could do regarding cheating in a very minimal way. It relates closely to your point 2. The thesis was to rate how likely an answer was to be generated by AI, similar to how programs will rate how likely an answer was plagiarized from the internet.

I took a refined response to AP test questions from a teacher and a generated response from ChatGPT, both very good. If I simply asked whether a response was written by a student or by ChatGPT, with the note that some students would be very intelligent or would be very educated teachers acting like students, it would say both were generated by a student, or both by ChatGPT. However, if I asked it to rate the likelihood that a response came from ChatGPT on a scale from 1-100, I got good results, with it ranking the teacher's response lower than ChatGPT's. The score for ChatGPT was typically in the 90s, while the teacher's was in the 70s or so.

I think refining this functionality would be one way to provide a solution which scales, at least theoretically, infinitely.
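
A rough sketch of automating that prompt against the OpenAI completions API (the model name, prompt wording, and score parsing are all assumptions on my part for illustration, not anything OpenAI provides for this purpose):

```python
import os
import re
import requests

def ai_likelihood_score(answer: str) -> int | None:
    """Ask a language model to rate, 1-100, how likely `answer` is AI-generated."""
    prompt = (
        "On a scale from 1 to 100, how likely is it that the following answer "
        "was generated by an AI model such as ChatGPT? Reply with a number only.\n\n"
        f"Answer:\n{answer}\n\nScore:"
    )
    resp = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "text-davinci-003", "prompt": prompt, "max_tokens": 5, "temperature": 0},
        timeout=30,
    )
    resp.raise_for_status()
    text = resp.json()["choices"][0]["text"]
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

# print(ai_likelihood_score("Photosynthesis is the process by which plants ..."))
```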

1

hopbel t1_j1yndbr wrote

#2 runs into the problem where, if a specific style becomes popular for AI art (e.g. Greg Rutkowski), discriminators will start flagging anything matching that style, including the authentic works.

1

dojoteef t1_j1uy04f wrote

Very interesting idea. It could easily be applied to images since digital watermarks already exist. Not sure how feasible it is for AI generated text.

Tbh, I imagine it behooves companies to do this so they are less likely to train on media (text, images, audio, etc.) produced by a model. The more ubiquitous AI generation becomes, the more of an issue this poses. Currently that problem is likely quite minimal and probably acts to inject a small bit of noise into training (and the knowledge distillation effect could even slightly improve training efficiency).

Though I guess a new data cleaning step could be to run a classification model that checks whether the media being trained on is likely AI generated, though that would likely be less efficient than a hash produced at the time of generation.
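
Something like this, say, assuming the publicly released GPT-2 output detector checkpoint on the Hugging Face hub (label names and a sensible threshold would need to be checked against the model card - this is a sketch of the cleaning step, not an endorsement of that particular detector):

```python
from transformers import pipeline

# RoBERTa-based detector released alongside GPT-2 for spotting its outputs.
detector = pipeline("text-classification", model="roberta-base-openai-detector")

def keep_for_training(samples: list[str], threshold: float = 0.9) -> list[str]:
    """Drop samples the detector confidently flags as machine-generated."""
    kept = []
    for text, result in zip(samples, detector(samples, truncation=True)):
        # Label vocabulary depends on the checkpoint config; "fake"/"label_1" is an assumption.
        flagged = result["label"].lower() in {"fake", "label_1"} and result["score"] >= threshold
        if not flagged:
            kept.append(text)
    return kept

# cleaned = keep_for_training(scraped_texts)
```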

0