Viewing a single comment thread. View all comments

leliner t1_izno4vq wrote

As other people have been pointing out, myself included on twitter, anecdotal evidence on one example tells us nothing. We try 9 different prompts on 600 examples of implicature, we do few-shot prompting including up to 30 examples in-context (filling the context window), we try a contrastive framing of the question. I think you are misunderstanding the paper. Already at the time of publishing the paper the introductory examples in the abstract were properly answered by OpenAI's models, does not change the story. Additionally, chatGPT does much better than Davinci-2 (and -3), but still has a gap with humans, especially on the particularised examples subset (last paragraph section 4.1 in the paper).

4