Submitted by Exnur0 t3_zwi4jx in MachineLearning
In the wake of all the questions and worries about models that can generate content nearing (or, in some cases, exceeding) the quality of content made by humans, there are a couple of mechanisms that companies should provide alongside their models. The two vary in feasibility, but both are pretty doable, at least for the models we've seen so far.
1. A hashing-based system to check whether a given piece of content was generated by the model. This can be accomplished by hashing every output of the model and storing the hashes for later lookup (a minimal sketch follows the list). If it doesn't pose some sort of security risk for the generator, the lookup could also return the date of generation.
2. A model for discriminating whether a given piece of content was generated by the model, similar to this model for GPT-2. This is necessary in addition to the simpler hashing mechanism, since it's possible for only a portion of the media to be generated. It would be imperfect, of course, but if nothing else, we should press companies hard enough that they feel obligated to give it a dedicated try (a sketch of what using such a detector could look like is further below).
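To make point 1 concrete, here's a minimal sketch of the hashing idea, assuming the provider keeps a simple store of SHA-256 digests - the in-memory store, the whitespace normalization, and the function names are just illustrative, not anyone's actual system:

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional

# Illustrative in-memory store; a real provider would use a database keyed by hash.
_generated_hashes = {}

def _digest(text: str) -> str:
    # Normalize whitespace so trivial reformatting doesn't break the lookup.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def record_generation(text: str) -> None:
    """The provider calls this for every output the model produces."""
    _generated_hashes[_digest(text)] = datetime.now(timezone.utc).isoformat()

def check_generation(text: str) -> Optional[str]:
    """Return the generation timestamp if this exact output was recorded, else None."""
    return _generated_hashes.get(_digest(text))

# Example:
record_generation("The quick brown fox jumps over the lazy dog.")
print(check_generation("The quick  brown fox jumps over the lazy dog."))  # found (whitespace-insensitive)
print(check_generation("A lightly edited version of the sentence."))      # None - any real edit breaks an exact-hash lookup
```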
These mechanisms need real support: an API for developers and a UI for less sophisticated users. They should have decent latency and hopefully be provided for free up to some level of usage - I understand the compute required could be enormous.
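For point 2, here's a hedged sketch of what a developer-facing check could look like, assuming the `transformers` library and the publicly mirrored GPT-2 output detector (the model id, label names, and threshold are assumptions on my part, not an official API):

```python
from transformers import pipeline

# Assumed Hugging Face model id for OpenAI's released GPT-2 output detector.
detector = pipeline("text-classification", model="roberta-base-openai-detector")

def looks_generated(text: str, threshold: float = 0.9) -> bool:
    """Flag text the classifier scores as machine-generated above the threshold."""
    result = detector(text, truncation=True)[0]
    # This detector's labels are (I believe) "Fake" (generated) and "Real" (human-written).
    return result["label"] == "Fake" and result["score"] >= threshold

print(looks_generated("Some suspiciously fluent paragraph of text..."))
```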
Curious what others think here :)
hjmb t1_j1v0l07 wrote
For 1:
This seems feasible to implement but easy to circumvent - a single change and the hash no longer matches. If you instead store an embedding in some semantic space you might at least be able to say "we generated something a lot like this for someone, once", but that's as good as you will get.
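Rough sketch of what I mean, assuming the `sentence-transformers` library and cosine similarity over stored embeddings (the model name, threshold, and plain list are arbitrary choices for illustration; a real system would use a vector database):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative model choice; any sentence-embedding model would do.
model = SentenceTransformer("all-MiniLM-L6-v2")

stored_embeddings = []  # in practice: a vector database

def record_output(text: str) -> None:
    stored_embeddings.append(model.encode(text, normalize_embeddings=True))

def similar_to_something_we_generated(text: str, threshold: float = 0.9) -> bool:
    """True if some stored output is semantically very close to this text."""
    if not stored_embeddings:
        return False
    query = model.encode(text, normalize_embeddings=True)
    sims = np.stack(stored_embeddings) @ query  # cosine similarity (embeddings are normalized)
    return float(sims.max()) >= threshold

record_output("The meeting has been moved to Thursday at 3 pm.")
print(similar_to_something_we_generated("The meeting was moved to 3 pm on Thursday."))
```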
A similar idea is to embed watermarks in the artifacts. Stable Diffusion does this by default (I believe using this package), but downstream forks have intentionally removed it.
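For reference, the encode/decode usage of that package looks roughly like this, as far as I can tell (the watermark string, file names, and image handling are illustrative):

```python
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

WATERMARK = "StableDiffusionV1"  # the string I believe Stable Diffusion's reference scripts embed

# Embed the watermark into an image (OpenCV loads it as a BGR array).
bgr = cv2.imread("generated.png")
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", WATERMARK.encode("utf-8"))
bgr_marked = encoder.encode(bgr, "dwtDct")
cv2.imwrite("generated_marked.png", bgr_marked)

# Later, try to recover the watermark from a (possibly re-saved) copy.
decoder = WatermarkDecoder("bytes", len(WATERMARK) * 8)  # expected length in bits
recovered = decoder.decode(cv2.imread("generated_marked.png"), "dwtDct")
print(recovered.decode("utf-8", errors="replace"))
```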
Unfortunately it seems about as sensible and feasible as cryptographically signing output in other fields (and serves the same purpose of attribution), and we haven't had much luck persuading people to do that either.
It also doesn't apply to models run locally.
For 2:
That page you link to points out that the researchers themselves don't deem the model fit for purpose (at least on its own), and that they expect the problem to only get more challenging as model sizes increase.
As for someone trying to circumvent the discriminator: if they have access to it, they can adjust their artifacts until it no longer flags them.
I don't believe either solution is robust against bad actors. I also don't think attribution itself solves the problems caused by human-plausible content generation, but that is almost certainly a perspective thing.
Finally: this is not my field, so if you have any corrections to the above, please let me know.