hjmb t1_j1v0l07 wrote
For 1:
This seems feasible to implement but easy to circumvent: a single change to the output and the hash no longer matches. If you instead store an embedding in some semantic space, you might at least be able to say "we generated something a lot like this for someone, once", but that's about as good as you will get.
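To make the contrast concrete, here's a minimal sketch of the two approaches; `embed()` is a hypothetical stand-in for whatever semantic encoder the provider would actually run over its outputs.

```python
import hashlib
import numpy as np

def exact_fingerprint(text: str) -> str:
    # Exact match: breaks the moment a single character of the output changes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def embed(text: str) -> np.ndarray:
    # Hypothetical semantic encoder; placeholder for whatever embedding model
    # the provider would actually use.
    raise NotImplementedError

def seen_something_like_this(candidate: str,
                             stored: np.ndarray,
                             threshold: float = 0.9) -> bool:
    # Approximate match: "we generated something a lot like this, once".
    v = embed(candidate)
    v = v / np.linalg.norm(v)
    db = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    return bool((db @ v).max() >= threshold)
```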
A similar idea is to embed watermarks in the artifacts themselves. Stable Diffusion does this by default (I believe using this package), but downstream forks have intentionally removed it.
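For reference, here's a rough sketch of how that kind of imperceptible watermarking looks with the invisible-watermark package (the one I believe the Stable Diffusion reference scripts use); the API details here are from memory and may not be current.

```python
# pip install invisible-watermark opencv-python  (API from memory; check the package docs)
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

PAYLOAD = b"SDV2"  # arbitrary identifier; the SD scripts embed a short tag like this

def add_watermark(path_in: str, path_out: str) -> None:
    bgr = cv2.imread(path_in)
    encoder = WatermarkEncoder()
    encoder.set_watermark("bytes", PAYLOAD)
    cv2.imwrite(path_out, encoder.encode(bgr, "dwtDct"))

def read_watermark(path: str) -> bytes:
    bgr = cv2.imread(path)
    decoder = WatermarkDecoder("bytes", len(PAYLOAD) * 8)
    return decoder.decode(bgr, "dwtDct")
```

Which is also the problem: a fork just deletes those few lines and the watermark is gone.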
Unfortunately it seems about as sensible and feasible as cryptographically signing output in other fields (and serves the same purpose of attribution), and we haven't had much luck persuading people to do that either.
It also doesn't apply to models run locally.
For 2:
The page you link to points out that the researchers do not consider the model fit for purpose (at least on its own), and that they expect the problem to only get harder as model sizes increase.
As for someone trying to circumvent the discriminator: if they have query access to it, they can keep adjusting their artifacts until it no longer flags them.
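To illustrate why that access is so damaging, here's a hypothetical hill-climbing loop; `detector_score` and `perturb` are made-up placeholders, not real APIs.

```python
from typing import Optional

def detector_score(text: str) -> float:
    # Hypothetical: the discriminator's 0-1 "probability this is AI-generated".
    raise NotImplementedError

def perturb(text: str) -> str:
    # Hypothetical small edit: paraphrase a sentence, swap a synonym, etc.
    raise NotImplementedError

def evade(generated: str, threshold: float = 0.5, budget: int = 1000) -> Optional[str]:
    # With query access, an attacker can simply hill-climb on the detector's own output.
    best = generated
    for _ in range(budget):
        if detector_score(best) < threshold:
            return best  # no longer flagged
        candidate = perturb(best)
        if detector_score(candidate) <= detector_score(best):
            best = candidate
    return None
```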
I don't believe either solution is robust against bad actors. I also don't think attribution itself solves the problems caused by human-plausible content generation, but that is almost certainly a perspective thing.
Finally: This is not my field. Any corrections to the above please let me know.
Exnur0 OP t1_j1v2zrb wrote
Thanks for the insightful comment, I think you helped me get the rest of my thought out.
I definitely agree: both of these have gaping holes in them if anyone with expertise comes along. The second mechanism is meant to plug holes in the first, but people can definitely construct media to get past the second anyway. Ideally it would return a value from 0 to 1 to inform the human's level of suspicion, rather than just a boolean, but it's still likely that certain tricks would help someone get things past it.
I think these are useful mostly because of what they could accomplish with regard to the scale of the problem. Any additional effort required to pass off AI work as human is a good thing, as far as I'm concerned, and some of the scariest implications of these kinds of models come from their scale: moderation is more or less impossible if you have to deal with a limitless stream of generated content.
For example, take the problem of misinformation, like what happened on StackOverflow (GPT-generated answers were banned, largely because they're often wrong, as I understand it). Imagine that StackOverflow had access to an API that could reliably point out unedited (or nearly unedited) generated content. The scope of the problem then shrinks to only those people willing to put in the effort to slip things past the discriminator, which will hopefully be a much smaller set.
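Something like this is what I have in mind; everything here (the endpoint URL, the field names, the threshold) is made up for illustration, since no such public API exists.

```python
import requests

DETECTOR_URL = "https://example-lab.invalid/v1/detect"  # placeholder, not a real endpoint

def generated_score(text: str) -> float:
    # Hypothetical detection endpoint returning a 0-1 suspicion score
    # rather than a hard yes/no.
    resp = requests.post(DETECTOR_URL, json={"content": text}, timeout=10)
    resp.raise_for_status()
    return float(resp.json()["score"])

def triage_answer(answer: str, flag_threshold: float = 0.8) -> str:
    # A high score isn't a verdict; it just routes the post to human moderators,
    # shrinking the review queue to the suspicious subset.
    return "needs_review" if generated_score(answer) >= flag_threshold else "auto_accept"
```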
I also definitely agree that there are other problems that aren't solved by discrimination at all, even if discrimination were perfect. Really, the underlying point is that the labs cranking out powerful generative models could be doing much, much more in terms of accompanying tooling to try to decrease the negative impacts of their tech. I don't see what I'm describing as bulletproof or as always useful, but it strikes me as a bare-minimum precaution. If nothing else, I should be able to put completely unedited generated media into an API and get back an answer that it was generated.