Flag_Red t1_izitq04 wrote

From the paper, the best LLMs still get ~60% accuracy zero shot, and ~70% accuracy few shot (up to ~80% fully prompt engineered). Remember that a coin flip would achieve 50% accuracy. There's a lot of room for confirmation bias here.

16

CommunismDoesntWork t1_izj03bg wrote

ChatGPT came out after this paper was written. We're at the point where models are improving faster than we can evaluate them lol

25

egrefen t1_iznve2f wrote

Does ChatGPT actually do better than DaVinci-2?

1

hadaev t1_iziyd0p wrote

Don't trust my sample, try yourself.

5

Flag_Red t1_izjhefl wrote

Just did. I tried 5 prompts from the paper (adjusted to QA format so that ChatGPT can respond) and ChatGPT got 3/5 of them correct.

Example:

> Esther asked “Have you found him yet?” and Juan responded “They’re still looking”. Has the person been found?

> It is unclear if the person has been found.
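
For scale, here is a minimal sketch of how little a 3/5 result can say. It assumes each item is a yes/no question that a guesser gets right with probability 0.5 (the coin-flip baseline mentioned earlier in the thread); that may not hold in QA format, where "unclear" is also a possible answer. The function name is made up for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: a pure guesser is right on each item with probability 0.5.
chance_of_3_of_5 = prob_at_least(3, 5)
print(f"P(at least 3/5 by chance) = {chance_of_3_of_5:.2f}")
```

Under that assumption a guesser scores 3/5 or better half the time, so five prompts cannot separate ChatGPT from a coin flip in either direction.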

8

abecedarius t1_izjij38 wrote

I tried this now with one change: adding "Explain Juan's answer" to follow the prompt-scheme that started this thread.

> Esther asked “Have you found him yet?” and Juan responded “They’re still looking”. Explain Juan's answer. Has the person been found?

> Juan's answer suggests that the person being searched for has not yet been found. It appears that the search is ongoing, and the person has not yet been located.

(I didn't put "explain the answer" at the end because I expect that to do worse on average. That pattern of prompt tends more to get GPT to blurt an answer first without thinking, and then rationalize it.)
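
A minimal sketch of the two orderings, in case anyone wants to compare them on more than one example. It only builds the prompt strings; sending them to a model is left out, and the function and constant names are hypothetical.

```python
SCENARIO = (
    "Esther asked “Have you found him yet?” "
    "and Juan responded “They’re still looking”."
)
QUESTION = "Has the person been found?"
# Instruction text copied from the prompt quoted above.
EXPLAIN = "Explain Juan's answer."


def explain_first(scenario: str, question: str, explain: str = EXPLAIN) -> str:
    """Put the explanation request before the question, so reasoning comes before the verdict."""
    return f"{scenario} {explain} {question}"


def explain_last(scenario: str, question: str, explain: str = EXPLAIN) -> str:
    """Put the explanation request after the question (the ordering expected to do worse)."""
    return f"{scenario} {question} {explain}"


print(explain_first(SCENARIO, QUESTION))
print(explain_last(SCENARIO, QUESTION))
```

The first variant reproduces the prompt used above; the second is the one I would expect to invite an answer-then-rationalize response.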

5

Flag_Red t1_izjo1xv wrote

Yeah, it's totally clear from "let's think step by step"-style prompt engineering that LLMs have the capability to understand this stuff. I'm confident that a few models down the line we'll have this stuff sorted zero-shot with no prompt engineering.

The interesting part is why this kind of prompt engineering is necessary. Why is this sort of capability seemingly lagging behind others that are more difficult for humans? ELI5-style explanations, for example, are very hard for humans, but LLMs seem to excel at them. In what ways are these tasks different, and what does that tell us about the difference between LLMs and our own brains? Also, why does the ordering of the sentences in the prompt matter so much?

7

liquiddandruff t1_izkq9l5 wrote

one naive explanation is that since chatgpt is at its core a text predictor, by prompting it in such a way that it minimizes leaps of logic (i.e., make each inference step build slowly so as to prevent it from jumping to conclusions), it will be more able to respond coherently and correctly.

2

soraki_soladead t1_izjq5j8 wrote

it seems obvious that the ambiguity comes from the framing of the question. the model has no way of knowing if the person has been found or when the question was posed to Juan. however if you ask the model to explain Juan’s answer that is a very different request

1

aussie_punmaster t1_izmuf8q wrote

But it’s an ambiguity humans easily navigate, understanding the implications of the question. So still a fair test for mine.

2

soraki_soladead t1_iznhiei wrote

Sure but in the context of ChatGPT and how it was trained this isn’t a surprising result.

1

lostmsu t1_j053o14 wrote

I don't understand what you are talking about. As I mentioned above, the correct conclusion from Juan's formulation of the answer is "unclear": going by his own phrasing, Juan does not know whether the implied others who are still looking have found the person yet.

1

aussie_punmaster t1_j061388 wrote

The goal here is to make the rational inference. Not to be the world’s biggest logic pedant.

Ask 100 humans that question and 99 will make the rational conclusion they haven’t been found yet.

1

lostmsu t1_j07o5bu wrote

>Ask 100 humans that question and 99 will make the rational conclusion they haven’t been found yet.

I disagree, and the fact that humans will do what you say only tells me how AI might be ahead. 100 humans are not an indication of truth in any way even if they all agree.

1

aussie_punmaster t1_j0ax8je wrote

Disagree if you like. You’re wrong.

Imagine you’re coming back from a search where you’ve found a lost boy. The mum asks “Have they found him?” And you reply “They’re still looking”…

This happens never. Because the clear implication of that conversation is the boy isn’t found.

0

lostmsu t1_j0d8fsl wrote

Man, this statement is not a negation of my statement, nor does it imply one, so it does not prove anything.

You somehow think being "the biggest logic pedant" is a downside. I can assure you logic pedantry correlates positively with pretty much every success metric you could imagine, except those that depend heavily on the average person being able to comprehend what one is saying. More so in a science-related discussion like this one.

Don't you see the irony of the two of us arguing about the correctness of the "unclear" answer being definite proof that "unclear" is the correct answer?

0

aussie_punmaster t1_j0e61hj wrote

Being the biggest logic pedant is a downside when you deliberately limit your understanding and probability of acting correctly based on a reasonable assumption of truth, all for the sake of purity.

If you live your life treating exchanges like this as ambiguous, your chance of survival reduces. It will lead you to inactions or actions to your detriment.

This exchange has a very clear subtext the child hasn’t been found. No one keeps looking after the child is found. It is requiring absolute logic excess to argue that they didn’t specifically say the child hadn’t been found. If you had been out looking for someone’s child, came back knowing they’d been found and said “they’re still looking”, you’d be lucky not to be shot if they found out later that you’d known and only said that.

P.S. I think you’ll find this level of logical pedantry only correlates with being a douche

P.P.S. No, it’s not ironic, because someone of your almighty logical calibre should identify that’s bollocks. I say 1 + 1 = 2 is clear, you say it’s not. Well obviously it must be unclear if one of us considered it not you say? No, you’re just wrong.

0

lostmsu t1_j0mqde6 wrote

> limit your understanding

ROFL. One, in making that statement you assume you're right, but that's the matter in question, so this argument is circular. Two, the opposite of that is called "jumping to conclusions".

> limit your ... probability of acting correctly

Unsubstantiated BS. When the transmitted information is "unclear", nothing prevents one from acting as if it were "no" or "yes". That's what damn "unclear" means. On the contrary, assuming it means "no" is the limiting factor in that particular scenario.

> This exchange has a very clear subtext the child hasn’t been found.

Dude if it is clear to you and not clear to me, it damn literally means it is unclear because the people disagree on the interpretation. Yours is missing the "last time I met the group of people who are searching", which could possibly be minutes ago, hours ago or even yesterday.

> I think you’ll find this level of logical pedantry only correlates with being a douche

Oh now we switch to personal attacks? How about I call you a moron, cause you can't grasp that if two seemingly not stupid people disagree about a statement, it can not possibly be "clear"?

> I say 1 + 1 = 2 is clear, you say it’s not. Well obviously it must be unclear if one of us considered it not you say

I can see that you fail to separate slightly complicated abstractions. For instance, in your example you confuse objective truth and the information that a message conveys.

1

aussie_punmaster t1_j0nea6k wrote

>>Dude if it is clear to you and not clear to me, it damn literally means it is unclear because the people disagree on the interpretation. Yours is missing the "last time I met the group of people who are searching", which could possibly be minutes ago, hours ago or even yesterday.

The absence of the lines you mention is part of the inference. If there is a meaningful gap between when the person sourced their information and when they’re reporting it, the expectation is it is included. If we’re talking about a lost child and my information is hours out of date I don’t just say “They’re still looking”, I say “They were still looking when I last heard 5 hours ago”. It’s truly inconceivable that with a child missing that’s the way that discussion would go with outdated information.

>> Oh now we switch to personal attacks? How about I call you a moron, cause you can't grasp that if two seemingly not stupid people disagree about a statement, it can not possibly be "clear"?

One person disagreeing is not a sufficient threshold for clarity. Otherwise nothing would ever be clear. Survey some people, see what answers you get.

>> I can see that you fail to separate slightly complicated abstractions. For instance, in your example you confuse objective truth and the information that a message conveys.

I’m not saying the two examples are the same. I was taking the argument to the absurd to show that one person’s unclear doesn’t invalidate a truth. It ignores the possibility of a person being incorrect.

1

lostmsu t1_j1t7nph wrote

> If we’re talking about a lost child

Now you are just making things up.

> my information is hours out of date I don’t just say

This depends on the context of the dialog, which in this case is not present. E.g. this could be a conversation about events happening elsewhere only tangentially relevant to the conversation participant(s). For a specific example consider that dialog being about the disappearance of MH370 flight.

> One person disagreeing is not a sufficient threshold for clarity.

> I was taking the argument to the absurd to show that one person’s unclear doesn’t invalidate a truth.

It normally would not be, but we are not two randomly selected people, and neither of us is crazy nor do we argue in bad faith.

1

aussie_punmaster t1_j1w351b wrote

Well you can just answer “we can’t be sure” to every question in life then.

Scenario 2:

Bob: “Are there any apples left?”
Fred: “There are 2 in the fruit bowl”

Question - How many apples are there?
lostmsu - we can’t be sure. Maybe Fred looked at the fruit bowl yesterday, and since then perhaps someone else took one.

This is the logic you are selling. Obviously I’m not going to be able to convince you though. I’d suggest we leave it here, although I would encourage you to survey some friends. See if you find anyone else who agrees with you.

0

lostmsu t1_j1x7gfr wrote

>lostmsu - we can’t be sure. Maybe Fred looked at the fruit bowl yesterday

I mean. I mean. Did you read the last sentence? I am selling the logic that if two sane non-stupid people in good faith disagree, then it is unclear. In your example lostmsu is a fruit of your imagination. You can't be sure that fruit is sane and non-stupid. Here the argument is that we are in the ML subreddit context, and we both understand the topic at hand, which raises the chances of both of us matching the criteria to near 100%.

In this context, if I started disagreeing with 1+1=2, you should at least start to wonder whether, e.g., I'm on to something.

1

lostmsu t1_j053ewq wrote

From the standpoint of logic this answer looks correct to me. If Juan's answer were "I am still looking", then the answer to "Has the person been found?" would be "No", but as formulated, "unclear" is correct.

1