Submitted by scarynut t3_10ijzi2 in MachineLearning
So LLMs like GPT-3 have understandably raised concerns about the disruptiveness of faked text, faked images and video, faked speech and so on. While this may well change soon, as of now OpenAI controls the most accessible and capable LLM. And OpenAI's stated agenda, in their own words, is to benefit mankind.
If so, wouldn't it make sense to add a sort of watermark to the output? A watermark built into the model parameters, so that it could not easily be removed but would still be detectable with some key or some other model. While it may not matter in the long run, it would set a precedent for further development and demonstrate some kind of responsibility for the disruptive nature of LLMs/GPTs.
Would it be technically possible, and would it make sense?
adt t1_j5erdiz wrote
Already in the works (Scott Aaronson is a scientist with OpenAI):
>we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT.
>Now, this can all be defeated with enough effort. For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that. On the other hand, if you just insert or delete a few words here and there, or rearrange the order of some sentences, the watermarking signal will still be there. Because it depends only on a sum over n-grams, it’s robust against those sorts of interventions.
https://scottaaronson.blog/?p=6823
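To make the "sum over n-grams" idea concrete: the detection side of such a scheme can be sketched as scoring every n-gram in the text with a keyed pseudorandom function and checking whether the average score is higher than chance (the generator would have nudged sampling toward high-scoring tokens). This is only a toy illustration under those assumptions, not OpenAI's actual implementation; the hash, key, and n-gram length here are placeholder choices.

```python
import hashlib
import math

def ngram_score(ngram, key):
    """Pseudorandom score in [0, 1) derived from the n-gram and a secret key."""
    digest = hashlib.sha256((key + "|" + " ".join(ngram)).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def detect_watermark(tokens, key, n=4):
    """Average the keyed scores over all n-grams in the text.

    Unwatermarked text gives scores that are roughly uniform (mean 0.5),
    while text whose generator biased sampling toward high-scoring tokens
    will have a noticeably higher mean. Returns a z-score against the
    uniform null hypothesis (mean 0.5, variance 1/12).
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    mean = sum(ngram_score(g, key) for g in ngrams) / len(ngrams)
    return (mean - 0.5) * math.sqrt(12 * len(ngrams))
```

Because the statistic is just a sum over local n-grams, inserting, deleting, or reordering a few words only removes a small fraction of the contributing n-grams, which is why the signal survives light edits but not a full paraphrase.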