leliner t1_iznom12 wrote

We did test against ChatGPT. We can't fully compare it to humans or to the experimental setup used in the paper (certainly not as comprehensively as with 9 prompts on 600 examples). Preliminary results show there's still a gap with humans, especially on particularised examples (see the last paragraph of Section 4.1 in the paper). Feel free to try CoT (chain-of-thought prompting); it's definitely something we have thought about, and for our response on that I refer you to Ed's comment: https://www.reddit.com/r/MachineLearning/comments/zgr7nr/comment/iznhuqz/?context=1.