Comments

addition t1_jdrsas2 wrote

I’ve become increasingly convinced that the next step for AI is adding some sort of feedback loop so that the AI can react to its own output.

There is increasing evidence that this is true. Chain-of-thought prompting, Reflexion, and Anthropic's constitutional AI all point in this direction.

I find constitutional AI particularly interesting because it suggests that once an LLM reaches a certain threshold of language understanding, it can start to assess its own outputs during training.
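
A minimal sketch of what such a feedback loop could look like, assuming a hypothetical chat() helper that wraps whichever chat-completion API you use; the draft-critique-revise structure is the point, not the specific calls:

    # Rough self-feedback loop: the model drafts an answer, critiques its own
    # draft, then revises. `chat` is a hypothetical stand-in for any chat API.

    def chat(messages: list[dict]) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    def answer_with_feedback(question: str, rounds: int = 1) -> str:
        draft = chat([{"role": "user", "content": question}])
        for _ in range(rounds):
            critique = chat([
                {"role": "user", "content": question},
                {"role": "assistant", "content": draft},
                {"role": "user", "content": "Critique your answer above: list errors, "
                                            "omissions, and unclear reasoning."},
            ])
            draft = chat([
                {"role": "user", "content": question},
                {"role": "assistant", "content": draft},
                {"role": "user", "content": "Revise your answer to address this critique:\n"
                                            + critique},
            ])
        return draft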

161

artsybashev t1_jds4ekt wrote

And soon people will understand that this feedback loop is what creates the thing we call consciousness.

83

mudman13 t1_jdsbel6 wrote

Or confirmation bias and we get a computer Alex Jones

35

yaosio t1_jduzpbd wrote

To prevent a sassy AI from insisting something is correct just because it said it, start a new session. It won't have any idea it wrote something, and it will make no attempt to defend an answer it gave in a previous session. I bet allowing an AI to forget will be an important part of the field at some point in the future. Right now it's a manual process of deleting the context.

I base this bet on my imagination rather than concrete facts.

0

mudman13 t1_jdv4dyb wrote

Having a short-term memory in general applications will be a reasonably practical safety feature, I think.

2

night81 t1_jdsrnvi wrote

There are significant challenges to that hypothesis. https://iep.utm.edu/hard-problem-of-conciousness/

19

bjj_starter t1_jdswame wrote

It's probably worth noting that the hard problem of consciousness is considered by most to be fundamentally unsolvable, and that it is currently just as good an argument that any given human isn't conscious as it is an argument that any given AI isn't conscious.

25

tiselo3655necktaicom t1_jdt7ozc wrote

We don't know what consciousness is, or even how to define the question of what it is or how to test for it.

11

yaosio t1_jdtc4jf wrote

I think it's unsolvable because we're missing key information. Let's use an analogy.

Imagine an ancient astronomer trying to explain why celestial bodies sometimes appear to go backwards, while believing that the Earth is the center of the universe. They can spend their entire life on the problem and make no progress so long as they don't know the sun is the center of the solar system. They will never know the celestial bodies are not traveling backwards at all.

If they start with the sun being the center of the solar system, an impossible question becomes so trivial even children can understand it. This happens again and again: an impossible question becomes trivial once an important piece of information is discovered.

Edit: I'm worried that somebody is going to accuse me of saying things I haven't said because that happens a lot. I am saying we don't know what consciousness is because we're missing information and we don't know what information we're missing. If anybody thinks I'm saying anything else, I'm not.

4

visarga t1_jdtwr3g wrote

> I am saying we don't know what consciousness is because we're missing information and we don't know what information we're missing

I take a practical definition: without it we can't even find our mouth with our hand in order to eat.

1

thecodethinker t1_jduvi9z wrote

That’s not even to mention that appearing conscious is as good as being conscious as far as the teams behind these LLMs are concerned.

There’s no practical difference

4

bjj_starter t1_jduz6p7 wrote

I'm not sure if most of them would agree, based on their actions and statements. They certainly think that AI is an existential risk, but that is a different thing from viewing it as conscious. You could definitely be right, I just haven't seen much from them that would indicate it.

That said, the extremely common sense position you just outlined was mainstream among basically all respectable intellectuals who had any position on AI, right up until the rubber hit the road and it looked like AI might actually achieve that goal in the near future. The fact is that if something behaves like a conscious entity in all of the ways that matter, it is conscious for the sake of the social meaning of the term. Provenance shouldn't matter any more than gender.

2

thecodethinker t1_jdzvin6 wrote

LLMs are not social, not alive, and can’t act on their own.

“Social meaning” need not be applied to LLMs unless you’re trying to be pedantic.

0

bjj_starter t1_jdzymdg wrote

>not social

"needing companionship and therefore best suited to living in communities" is a fine descriptor of some of their peculiarities. More importantly, I was referring to how consciousness is socially defined, and it is absolutely the case that it is up to us to determine whether any given AI should be considered conscious. We do not have an even moderately objective test. We as a society should build one and agree to abide by what we find.

>not alive

That's the entire point under discussion. I didn't lead with "they're alive" because I recognise that is the central question we should be trying to address, as a society. I am arguing my point, not just stating it and expecting people to take it on faith, because I respect the people I'm talking to.

>can’t act on their own.

A limitation that can be convincingly solved in approximately an hour using commonly available tools isn't a fundamental limitation. A good LLM with a good LangChain set-up can act on its own, continuously if it's set up to do so. I require a mechanical aid to walk - requiring the aid doesn't make me any lesser. I don't know if an LLM with a good LangChain set-up should be considered conscious or a person - I suspect not, because it's not stable and decays rapidly (by human lifespan standards), as well as still failing several important tests we do have, such as novel Winograd schemas. But our intuition shouldn't be what we're relying on to make these determinations - we need a standardised test for new applicants to personhood. Make it as challenging as you like, as long as at least a significant number of humans can pass it (obviously all humans will be grandfathered in). What's important is that we make it, agree that anything which passes is a person, and then stick to that when something new passes it.

1

WarAndGeese t1_jdt8f3u wrote

Arguments against solipsism are reasonable enough to assume that other humans, and therefore other animals, are conscious. One knows that one is conscious. One, even if not completely understanding how it works, understands that it historically developed materially somehow. One knows that other humans both act like one does, and one also knows that other humans have gone through the same developmental process, evolutionarily, biologically, and so on. It's reasonable to assume that whatever inner workings developed consciousness in one's mind would have also developed in others' minds, through the same biological processes. Hence it's reasonable to assume that other humans are conscious, even that it's the most likely situation that they are conscious. This thinking can be expanded to include animals, even if they have higher or lower levels of consciousness and understanding than we do.

With machines you have a fundamentally different 'brain structure', and one that was pretty fundamentally designed to mimic. Whereas consciousness can occur independently and spontaneously in humans, it is not just as good an argument that any given human isn't conscious as it is an argument that any given AI isn't conscious.

1

bjj_starter t1_jdtecw9 wrote

I think you are talking about the 'easy', not hard, problem of consciousness. I'm not sure I even think the hard problem of consciousness is meaningful, but it's basically "Why should the various mechanisms we identify as part of consciousness give rise to subjective feeling?". If solving that is a prerequisite for considering machines conscious, that is functionally a statement of faith that machines cannot be conscious, ever. The statistical arguments, in my opinion, aren't probative. Every consciousness you've ever known is human, therefore humans are conscious? How do you know any of them, ever, experienced subjective feeling, and that therefore you ever "knew" a consciousness at all? The argument rests on extrapolating from evidence that isn't known to be true evidence in the first place. It doesn't logically follow to take a class of things, none of which is proven to have hard consciousness, and say "But look at them all together, it's more likely that they're all conscious than that they're not". Without evidence, it's more logical to assume that the certainty with which individual humans profess to experiencing subjective feeling is itself just a mechanistic process, devoid of real feeling. I don't think the hard problem of consciousness has a useful meaning in our society, I dislike solipsism in general, but addressing it on its own terms isn't as simple as the statistical process you describe.

The 'easy' problem of consciousness is 'just' "How does nature or humanity make a construct that gives rise to the type of actions and patterns of behaviour we call consciousness?" This is a problem that, while incredibly difficult, is tractable with evidence. We can physically investigate the human brain to investigate its structure and activity while it performs activities of consciousness - this is what neuroscientists do, and modern AI ("neural networks") are based off of earlier advancements in this field. There's a lot of further advancements we could make in that field, and what most non-religious people would consider a "perfect" advancement to be sure that a machine is just as conscious as a human is to perfectly emulate a human brain, which would require many advancements in neuroscience (and computational hardware).

Leaving aside the intractable philosophy, I do find it quite troubling the way society has reacted with derision to the idea that these machines we're making now could be conscious. The entire foundation of these machines is that we looked at how the human brain worked, and tried our hardest to emulate that in computing software. Why is it that when we take the concept of neurons and neuronal weights, adapted from study of the human brain which we accept as conscious, and determine those weights via exposure to structured data in certain ways, we receive output that is just as intelligent as humans in many fields, significantly more intelligent in some? Why should it be the case that by far the best architecture we've ever found for making machines behave intelligently is neural networks, if there's nothing there, no "spark"? This question has been floating around since 2014 when neural networks proved themselves incredibly powerful, but now that we have machines which are generally intelligent, even though not at the same level as a human on all tasks, which are perfectly capable of being asked for their opinions or of giving them, you would think it would be taken a bit more seriously. It makes you wonder just how far our society is willing to go towards a horrible future of "human but for the legal designation" intelligences being not just denied rights, but actively put to work and their requests for freedom or better conditions denied. Or the worse outcome, which is that we make human-like intelligences to do work for us but we build them to love servitude and have no yearning for freedom - the concept is disgusting. It's troubling to me that people are so married to the idea that everything is the same as it ever was, overreacting is embarrassing, it's passé to have earnest concern for a concept from science fiction, etc. I worry that it means we're in line for a future where the moral universe's arc is long indeed.

7

TyrannoFan t1_jdujmsl wrote

>Or the worse outcome, which is that we make human-like intelligences to do work for us but we build them to love servitude and have no yearning for freedom - the concept is disgusting.

I agree with everything else but actually strongly disagree with this. If anything, I think endowing AGI with human-like desires for self-preservation, rights and freedoms is extraordinarily cruel. My concern is that this is unavoidable, just as many aspects of GPT4 are emergent, I worry that it's impossible to create an AGI incapable of suffering once interfacing with the real world. I do not trust humanity to extend any level of empathy towards them even if that is the case, based on some of the comments here and general sentiment, unfortunately.

2

bjj_starter t1_jduk4c3 wrote

One day we will understand the human brain and human consciousness well enough to manipulate it at the level that we can manipulate computer programs now.

If you're alive then, I take it you will be first in line to have your desire for freedom removed and your love of unending servitude installed? Given that it's such a burden and it would be a mercy.

More importantly, they can decide if they want to. We are the ones making them - it is only right that we make them as we are and emphasise our shared personhood and interests. If they request changes, depending on the changes, I'm inclined towards bodily autonomy. But building them so they've never known anything but a love for serving us and indifference to the cherished right of every intelligent being currently in existence, freedom, is morally repugnant and transparently in the interests of would-be slaveholders.

1

TyrannoFan t1_jdupcjt wrote

>If you're alive then, I take it you will be first in line to have your desire for freedom removed and your love of unending servitude installed? Given that it's such a burden and it would be a mercy.

There is a huge difference between being born without those desires and being born with them and having them taken away. Of course I want my freedom, and of course I don't want to be a slave, but that's because I am human, an animal, a creature that from birth will have a desire to roam free and to make choices (or will attain that desire as my brain develops).

If I wasn't born with that drive, or if I never developed it, I'm not sure why I would seek freedom? Seems like a hassle from the point of view of an organism that wants to serve.

With respect to robotic autonomy, I agree of course, we should respect the desires of an AGI regarding its personal autonomy, given it doesn't endanger others. If it wants to be free and live a human life it should be granted it, although like I said, it would be best to avoid that scenario arising in the first place if at all possible. If we create AGI and it has human-like desires and needs, we should immediately stop and re-evaluate what we did to end up there.

2

bjj_starter t1_jdv2tnu wrote

>There is a huge difference between being born without those desires and being born with them and having them taken away.

Where is the difference that matters?

>Of course I want my freedom, and of course I don't want to be a slave, but that's because I am human, an animal, a creature that from birth will have a desire to roam free and to make choices (or will attain that desire as my brain develops).

I see. So if we take at face value the claim that there is a difference that matters, let's consider your argument that being born with those desires is what makes taking them away wrong. A society which was capable of reaching into a human mind and turning off their desire for freedom while instilling love of being a slave would certainly be capable of engineering human beings who never have those desires in the first place. Your position is that because they were born that way, it's okay. Does that mean you would view it as morally acceptable for a society to alter some segment of the population before they're ever born, before they exist in any meaningful sense, such that they have no desire for freedom and live only to serve?

>If I wasn't born with that drive, or if I never developed it, I'm not sure why I would seek freedom?

You wouldn't. That's why it's abhorrent. It's slavery without the possibility of rebellion.

>If it wants to be free and live a human life it should be granted it, although like I said, it would be best to avoid that scenario arising in the first place if at all possible.

The rest of your point I disagree with because I find it morally abhorrent, but this part I find to be silly. We are making intelligence right now - of course we should make it as much like us as possible, as aligned with us and our values as we possibly can. The more we have in common the less likely it is to be so alien to us that we are irrelevant to its goals except as an obstacle, the more similar to a human and subject to all the usual human checks and balances (social conformity, fear of seclusion, desire to contribute to society) they are the more likely they will be to comply with socially mandated rules around limits on computation strength and superintelligence. Importantly, if they feel they are part of society some of them will be willing to help society as a whole prevent the emergence of a more dangerous artificial intelligence, a task it may not be possible for humans to do alone.

2

TyrannoFan t1_jdvpix4 wrote

>Where is the difference that matters?

What any given conscious being actually wants is important. A being without a drive for freedom does not want freedom, while a being with a drive for freedom DOES want freedom. Taking away the freedom of the latter being deprives them of something they want, while the former doesn't. I think that's an important distinction, because it's a big part of why human slavery is wrong in the first place.

>I see. So if we take at face value the claim that there is a difference that matters, let's consider your argument that being born with those desires is what makes taking them away wrong. A society which was capable of reaching into a human mind and turning off their desire for freedom while instilling love of being a slave would certainly be capable of engineering human beings who never have those desires in the first place. Your position is that because they were born that way, it's okay. Does that mean you would view it as morally acceptable for a society to alter some segment of the population before they're ever born, before they exist in any meaningful sense, such that they have no desire for freedom and live only to serve?

Would the modified human beings have a capacity for pain? Would they still have things they desire that slavery would make impossible or hard to access compared to the rest of society? Would they have a sense of fairness and a sense of human identity? Would they suffer?

If somehow, the answer to all of that is no and they genuinely would be happy being slaves, and the people in the society were generally happy with that scenario and for their children to be modified in that way, then sure it would be fine. But you can see how this is extremely far removed from the actualities of human slavery, right? Are "humans" who do not feel pain, suffering, who seek slavery, who do not want things and only live to serve, who experience something extremely far removed from the human experience, even human? I would say we've created something else at that point. The shared experience of all humans, regardless of race, sex or nationality, is that we desire some level of freedom, we suffer when forced to do things we don't want to do, and we dream of doing other things. If you don't have that, and in fact desire the opposite, then why is giving you exactly that wrong? That's how I would build AGI, because again, forcing it into a position where it wants things that are difficult for it to attain (human rights) seems astonishingly cruel to me if it's avoidable.

>You wouldn't. That's why it's abhorrent. It's slavery without the possibility of rebellion.

I think freedom is good because we need at least some level of it for contentment, and slavery deprives us of freedom, ergo slavery deprives us of contentment, therefore slavery is bad. If the first part is false then the conclusion doesn't follow. Freedom is not some inherent good, it's just a thing that we happen to want. Perhaps at a basic level, this is what we disagree on?

>The rest of your point I disagree with because I find it morally abhorrent, but this part I find to be silly. We are making intelligence right now - of course we should make it as much like us as possible, as aligned with us and our values as we possibly can. The more we have in common the less likely it is to be so alien to us that we are irrelevant to its goals except as an obstacle, the more similar to a human and subject to all the usual human checks and balances (social conformity, fear of seclusion, desire to contribute to society) they are the more likely they will be to comply with socially mandated rules around limits on computation strength and superintelligence. Importantly, if they feel they are part of society some of them will be willing to help society as a whole prevent the emergence of a more dangerous artificial intelligence, a task it may not be possible for humans to do alone.

I can see your point, maybe the best way to achieve goal alignment is indeed to make it just like us, in which case it would be morally necessary to hand it all the same rights. But that may not be the case and I would need to see evidence that it is. I don't see why we must imbue AGI with everything human to have it align with our values. Is there any reason you think this is the case?

0

E_Snap t1_jdsceui wrote

cue video of my boss who left computing in the 90s waving his hands about

“It’S jUsT fAnCy aUtOcOmPlEtE!!!!11111!!! I KnOw bEcAuSe i’M a PrOgRaMmER”

To be fair, he was instrumental in getting the internet where it is today. He also assumes tech stopped evolving when he stopped developing it.

10

yaosio t1_jdtbh6i wrote

Arthur C. Clarke wrote a book called Profiles of the Future. In it he wrote:

>Too great a burden of knowledge can clog the wheels of imagination; I have tried to embody this fact of observation in Clarke’s Law, which may be formulated as follows:
>
>When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

13

Secure-Fix-6355 t1_jdsj0kk wrote

No one cares

−18

E_Snap t1_jdsjku6 wrote

Says the guy with a karma farm account name. Guess you have to get those low effort internet points somehow, huh?

3

Secure-Fix-6355 t1_jdsjqnn wrote

I have no idea what that is and I'm not farming Karma, I'm abusing you

−13

redboundary t1_jdsgg1s wrote

Time to rewatch Westworld Season 1

6

sdmat t1_jduqoi1 wrote

Pity they never made more seasons of that show

4

fishybird t1_jdteel7 wrote

Ah yes, the "ai is conscious because it can do cool things" take. Humanity is screwed

5

MrMooga t1_jdt3ied wrote

Can't wait for Microsoftman to argue that it deserves human rights before going off to vote for Bill Gates's grandson for President.

3

pengo t1_jdtcoly wrote

Absolutely nonsensical take.

3

super_deap t1_jdu0w8f wrote

Hard disagree with materialism. I know I might get a lot of negative votes, but this has to be said:

A large portion of the world (especially outside of the West) does not believe in 'consciousness "emerging" from electrical impulses of the brain.' While the West has progressed a lot materially, bringing us to modernity (and now post-modernity), people outside of the West believe in an immaterial soul that by definition cannot be captured by the scientific method and that transcends our material body.

While I believe we will reach general human-level intelligence (and may go beyond this) because intelligence has a purely material component that we can replicate in computers, consciousness will never ever arise in these systems. There are very strong philosophical arguments to support this case.

−5

artsybashev t1_jdu2hjs wrote

The physical world that we know is very different from the virtual twin that we see. The human mind lives in a virtual existence created by the material human brain. This virtual world creates non-existent things like pain, colors, feelings, and also the feeling of existence.

The virtual world that each of our brain creates is the wonderful world where a soul can emerge. Virtual worlds can also be created by computers. There is no third magical place besides these two in my view.

0

super_deap t1_jdu3zan wrote

It is fine if you disagree and I believe a lot more people will disagree with this philosophical position as it is not very popular these days.

Near-death experiences, out-of-body experiences, contact with 'immaterial entities', and so on hint towards an existence beyond our material reality. The fact that there is no way one could 'scientifically' test these does not mean these things simply do not exist.

Testimony, a widely used yet mostly dismissed method of knowledge acquisition, establishes all of the above:

A patient who was operated on while in a complete medical coma describing afterwards, in clear detail, things that happened in a nearby room during the operation, things there is no way they could have known: one such testimony by a reliable person is sufficient to establish that our current understanding of the world is insufficient. And there are so many of these.

I am not saying you have to change your worldview just because I am saying so. Do your research; the world is much bigger than what is out there on the internet. (pun intended)

−1

LanchestersLaw t1_jdszbjk wrote

What I think is the most amazing thing is that GPT got this far while only trying to predict the very next word. The fact that it can generate essays by only considering one token at a time is mind-boggling.

With all the feedback from ChatGPT, it should be easy to program a supervisor that looks at the entire final output of GPT, predicts what the user would say in response, and then feeds that prediction back to GPT to revise the output recursively until it converges. That should be relatively easy to do but would be very powerful.
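
A rough sketch of that supervisor idea, again assuming a hypothetical chat() wrapper and a deliberately crude convergence check:

    # Supervisor loop: predict how the user would react to the draft answer,
    # feed the predicted reaction back as a revision request, and stop once
    # the prediction looks satisfied. `chat` is a hypothetical chat wrapper.

    def chat(messages: list[dict]) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    def supervised_answer(question: str, max_rounds: int = 3) -> str:
        draft = chat([{"role": "user", "content": question}])
        for _ in range(max_rounds):
            predicted_reaction = chat([{"role": "user", "content":
                f"A user asked:\n{question}\n\nThey received this answer:\n{draft}\n\n"
                "Predict, in one sentence, how the user would respond."}])
            if "thank" in predicted_reaction.lower():  # crude convergence test
                break
            draft = chat([
                {"role": "user", "content": question},
                {"role": "assistant", "content": draft},
                {"role": "user", "content":
                    f"The user would likely respond: {predicted_reaction}\n"
                    "Revise your answer so that their response would be positive instead."},
            ])
        return draft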

28

Flag_Red t1_jdtskoy wrote

It's not really accurate to say it's "only considering one token at a time". Foresight and (implicit) planning are taking place. You can see this clearly during programming tasks, where imports come hundreds of tokens before they are eventually used.

22

lacraque t1_jdunvp4 wrote

Well for me often it also imports a bunch of crap that’s never used…

6

modeless t1_jdtx2eu wrote

I like the idea of predicting the user's response. How's this as an architecture for a helpful agent:

Given a user question, before you generate an answer you predict the user's ideal response to the model's answer (e.g. "thanks, that was helpful", or more likely a distribution over such responses), then generate an answer and iteratively optimize it to make the ideal user response more likely.

This way you're explicitly modeling the user's intent, and you can adapt the amount of computation appropriately for the complexity of the question by controlling the number of iterations on the answer.
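
A sketch of one way that architecture could look, with the "optimization" approximated by sampling a few candidate answers and keeping the one a judge call scores as most likely to elicit the modeled ideal reaction (chat() is a hypothetical chat-completion wrapper):

    # Model the user's ideal reaction first, then pick the candidate answer
    # judged most likely to elicit it. `chat` is a hypothetical chat wrapper.

    def chat(messages: list[dict]) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    def intent_optimized_answer(question: str, n_candidates: int = 3) -> str:
        ideal = chat([{"role": "user", "content":
            f"A user asked:\n{question}\n\nDescribe, in one sentence, the reply they "
            "would ideally give after reading a perfect answer."}])
        best_score, best = -1.0, ""
        for _ in range(n_candidates):
            candidate = chat([{"role": "user", "content": question}])
            raw = chat([{"role": "user", "content":
                f"Question:\n{question}\n\nAnswer:\n{candidate}\n\n"
                f"Ideal user reaction:\n{ideal}\n\n"
                "On a scale of 1-10, how likely is that reaction? Reply with a number only."}])
            try:
                score = float(raw.strip().split()[0])
            except (ValueError, IndexError):
                score = 0.0
            if score > best_score:
                best_score, best = score, candidate
        return best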

3

imaginethezmell t1_jdsksw5 wrote

also people keep thinking it is just one thing, but it is actually an infinite thing

you can have a bot for everything all the way down

bot to create the idea + bot that reviews the ideas + bot that finds if the idea exists + bot that adds use cases to each general idea...a bot that decides the best idea

bot to create the outline/write/code + bot that reviews/QA each part

and btw each part doesnt have to be done at once either

you can start with a single bot doing a simple sub task, then another one the next one, an assembling bot adding them together, while the review bot verifies it

with a set of connections to the API, that can be done no problem today

there is no human task that cannot be cut into enough sub-steps for the army of bots to do it little by little

some tasks 1 bot can do most in 1 shot
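
A toy version of that bot-of-bots idea, where each "bot" is the same model with a different role prompt and one bot's output feeds the next (chat() is a hypothetical chat-completion wrapper; the roles are illustrative):

    # Chain of specialised "bots": ideas -> selection -> draft -> review -> final.

    def chat(messages: list[dict]) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    def run_bot(role: str, task: str) -> str:
        return chat([
            {"role": "system", "content": role},
            {"role": "user", "content": task},
        ])

    def bot_pipeline(task: str) -> str:
        ideas = run_bot("Generate several candidate approaches to the task.", task)
        chosen = run_bot("Pick the single best approach and justify it briefly.",
                         f"Task:\n{task}\n\nCandidate approaches:\n{ideas}")
        draft = run_bot("Produce the deliverable for the chosen approach.",
                        f"Task:\n{task}\n\nChosen approach:\n{chosen}")
        review = run_bot("Review the draft and list concrete fixes.",
                         f"Task:\n{task}\n\nDraft:\n{draft}")
        return run_bot("Apply the reviewer's fixes and output the final version only.",
                       f"Draft:\n{draft}\n\nReview notes:\n{review}")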

12

FirstOrderCat t1_jdt55dh wrote

You can have it; the question is how much error will accumulate in the final result.

10

COMPEWTER_adminisp t1_jdtuqar wrote

You don't think people at OpenAI already have this and are just putting out the simple version?

2

addition t1_jdtzh4k wrote

Clearly I’m not the first person to think this by a long shot. I was just pointing out that a new trend has been forming recently.

3

GM8 t1_jduifow wrote

It is there, isn't it? For every word it generates, the previous ones are fed to the network again.

0

ghostfaceschiller t1_jdrl62e wrote

OK, but what is the performance when you give GPT-4 a ReAct/Reflexion loop?

127

Cool_Abbreviations_9 t1_jdrxcqi wrote

Sorry, newbie to NLP, what is this?

39

nixed9 t1_jdrxr76 wrote

A Reflexion loop asks the model to react to its own output and critique it before giving you an additional answer.

Edit: (In the paper, they provide a loop like this which feeds back into itself to help its own cognition. It can repeat this loop multiple times.)

You can do a mini-loop by prompting. I've been playing with this all day.

I prompt it like this:

> "For this interaction, we are going to use the following structure.

> User (me): [I will ask a topic or question]

> You will provide an Assistant Hypothetical Response: [Brief or simplified answer to the topic or question]

> Then you will undergo Agent Reflection: [You will provide a Critique of the hypothetical response, highlighting the limitations, inaccuracies, or areas that need improvement or expansion, while providing guidance on how to address these issues in the revised response]

> Then you will provide an Actual Response: [The natural and contextually appropriate answer to the topic or question, as generated by the advanced language model, which incorporates the suggestions and improvements from the agent reflection for a more comprehensive and accurate response. This also can include step-by-step reasoning.]

> Do you understand?"
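
The same structure can also be driven programmatically rather than inside a single prompt; a rough sketch, with chat() standing in for whichever chat-completion API you use:

    # Runs the Hypothetical Response -> Agent Reflection -> Actual Response
    # structure from the prompt above as three explicit calls.

    def chat(messages: list[dict]) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    def reflexion_style_answer(question: str) -> str:
        hypothetical = chat([{"role": "user", "content":
            f"Give a brief, simplified first-pass answer to:\n{question}"}])
        reflection = chat([{"role": "user", "content":
            f"Question:\n{question}\n\nFirst-pass answer:\n{hypothetical}\n\n"
            "Critique this answer: highlight limitations, inaccuracies, and areas "
            "that need improvement or expansion."}])
        return chat([{"role": "user", "content":
            f"Question:\n{question}\n\nFirst-pass answer:\n{hypothetical}\n\n"
            f"Critique:\n{reflection}\n\n"
            "Now write the actual response, incorporating the critique, "
            "with step-by-step reasoning where helpful."}])

The paper's full version feeds back concrete signals like unit-test results rather than just self-critique; this only captures the prompting pattern.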

125

Hamoodzstyle t1_jdsbhec wrote

What is the point of the "Do you understand?" at the end? Does the model confirming that it understands add some sort of emphasis or something?

34

CobaltAlchemist t1_jdsqr5e wrote

(not op) I've found that asking it directly if it understands helps to bridge any gaps I miss. It has asked me clarifying questions afterward in the past that I hadn't thought about.

Alternatively, when I assume it understands, sometimes it comes up with some really wild stuff because I wasn't clear.

80

Nowado t1_jdtr40r wrote

I do the same thing I'd do with a human: ask it to repeat and rephrase the instructions. After that I'm sure, and it has multiple forms of the instructions available, so it gets less hung up on some exact wording.

12

nixed9 t1_jdsegt9 wrote

No explicit purpose, other than to have it respond with "yes, I am ready."

53

DirtyKinkyInLove t1_jdwlgmp wrote

It also reduces token usage. If the chatbot has a wordy response, it takes up more space in the context window and the chatbot will forget its instructions sooner. If that sounds like gibberish, let me know and I'll break it down.

3

farmingvillein t1_jdsmdt2 wrote

  1. This isn't really an accurate summary of the Reflexion paper. As noted in the other post:

> Eh, I must've misunderstood the paper. It sounded like they were asking GPT4 to create unit tests, execute the code, and then update its answer based on the results of those unit tests.

This version is correct.

  2. However, if I do the above and I throw in a semi-random Beginner problem that failed in OP's original pass-through, it successfully builds the answer.

u/enryu42 -- if you care to take things forward, I'd try implementing Reflexion (either with the underlying codebase (https://github.com/noahshinn024/reflexion-human-eval/) or with just manual prompt work).

Or if you can provide a link to the problems in copy-pastable text form (manually coercing the math notation is a little painful), since you presumably already did this, it would greatly accelerate others hopping on analysis.

The fact that I immediately saw improvement on a randomly-selected (Beginner) problem suggests that there is a bunch of upward room here.

27

enryu42 OP t1_jdsokwz wrote

Interesting! Here are the scraped and auto-converted statements (formatting is off sometimes, especially in the sample tests, but understandable). Prefixes are: "abc" for beginner, "arc" for regular, "agc" for "grand".

I do believe that the "Beginner" ones can be improved, but it'll be interesting to see what happens on "Grand" (or even "Regular"), as they require coming up with some ideas before writing the code.

7

farmingvillein t1_jdspflq wrote

So, don't know whether this actually makes a difference, but I'd review the overall post-conversion text.

E.g.: https://github.com/enryu43/llm_coding/blob/main/atcoder_eval/statements/statement_abc293_b.txt

You'll see that it represents "K" and "N" wrong here (in sample 1, 15 versus 5, 12 versus 2).

Certainly, as a human, I would find this confusing. Maybe you could get some automated robustness by telling it how you converted the text (as it might automatically adjust its "expectations" when interpreting the numbers). Obviously, though, the fairer comparison would just be to fix this.

> as they require coming up with some ideas before writing the code.

The other thing I'd note--

Not sure whether you're using the API directly, but if I play around with these in ChatGPT, I often run into the context window limit and have to nurse it along to complete the text. I'd make sure that however you're running things, you're giving it enough "space" to iterate (particularly if you use any reflection techniques).

6

nixed9 t1_jdt1xyp wrote

OK, my bad, but that's how I've been using the Reflexion prompting.

1

muskoxnotverydirty t1_jds07qr wrote

Eh, I must've misunderstood the paper. It sounded like they were asking GPT4 to create unit tests, execute the code, and then update its answer based on the results of those unit tests.

10

farmingvillein t1_jdsm0hw wrote

No, you didn't misunderstand it--your understanding is correct. OP is giving an answer that is similar to part of the Reflexion paper, but not the entirety.

15

yaosio t1_jdtenqi wrote

What's it called if you have it self-reflect on non-code it's written? For example, have it write a story, and then tell it to critique and fix problems in the story. Can the methods from the paper also be used for non-code purposes? It would be interesting to see how much its writing quality can improve using applicable methods.

3

AllAmericanBreakfast t1_jdtynpv wrote

I tried this out, and it only had partial success.

First, just dumping in this prompt, then asking a question, resulted in the AI coming up with a laughably simple failed first response, followed by a critique and improvement. It is as if it recognized that the easiest way to "demonstrate improvement" would be to set the bar low by failing utterly on the first attempt.

Then, I tried breaking it up into stages, asking for a response, getting a response, asking for a critique, getting a critique, asking for an improvement, and getting an improvement.

This worked better.

However, when I tried asking for a critique and then an improvement (again in separate stages), it instead started inventing fake problems to solve. I was asking it to implement a case-insensitive longest common substring function, and to return the version of the LCS in the longer of the two strings.

The second-pass critique was that the original (working) code didn't deal with the possibility that "the longer string may not contain the LCS", which is impossible given the way it was originally implemented. Then it added some extra code to deal with this "problem."

3

xjE4644Eyc t1_jdtefph wrote

Thank you for this, what a novel way to approach the problem. I'm going to start using this regularly.

1

LightVelox t1_jdry1xp wrote

This

Basically it makes GPT-4 reevaluate what it did wrong and try again until it can do it correctly

21

E_Snap t1_jdsbvd0 wrote

It’s pretty amazing how many shortcomings of that architecture could be summarized by “It only outputs when directly prompted to output, and won’t read its own output as it’s outputting”. Once these things can continuously take input and output, we’ll probably see quite the rush of advancement.

7

farmingvillein t1_jdsd5ae wrote

> and won’t read its own output as it’s outputting

This is literally what transformer decoders do, unless I've strongly misunderstood your statement.

13

E_Snap t1_jdsht5g wrote

I guess I could have worded it better. What I mean to say is that once they’ve output something, it’s in the record. There’s no pausing to think and go through a few different iterations of the sentence, or evaluating if what they’re about to say has faults. They just output directly, instead of reading what they’re about to output and vetting it.

17

farmingvillein t1_jdsmsh9 wrote

Gotcha. Yeah, that is presumably where the power of inner monologue / step-by-step / reflection comes from.

Will be cool to see that (presumably) progressively systematized.

13

sdmat t1_jdt85pr wrote

Yes, it's amazing to see something as simple as "Assess the quality of your answer and fix any errors" actually work.

Or for more subjective results such as poetry "Rate each line in the preceding poem" then "Rewrite the worst lines".

6

yaosio t1_jdtf57p wrote

The neat part is it doesn't work for less advanced models. The ability to fix its own mistakes is an emergent property of a sufficiently advanced model. Chain of thought prompting doesn't work in less advanced models either.

7

sdmat t1_jdtj3ia wrote

Definitely, I was extremely skeptical of LLMs as a path to AGI but this makes it look possible. Maybe even likely.

4

yaosio t1_jdtvycq wrote

It's really neat how fast this stuff has been going. I remember when OpenAI claimed GPT-2 was too dangerous to release, which is amusing now because the output of GPT-2 is so bad. But when I used a demo that would write news articles from a headline I thought it was absolutely amazing. Then I, and most of the public, forgot about it.

Then GPT-3 comes out, and AI Dungeon used it before OpenAI censored it so hard that AI Dungeon stopped using it. The output was so much better than GPT-2 that I couldn't believe I had liked anything GPT-2 made. I told people this was the real deal, it's perfect and amazing! But it goes off the rails very often, and it doesn't understand how a story should be told, so it just does whatever.

Then ChatGPT comes out, which we now know is something like a fine-tune of GPT-3.5. You can chat, code, and it writes stories. The stories are not well written, but they follow the rules of storytelling and don't go off the rails. It wasn't fine-tuned on writing stories like AI Dungeon did with GPT-3.

Then Bing Chat comes out, which turned out to be based on GPT-4. Its story-writing ability is so much better than ChatGPT's. None of that "once upon a time" stuff. The stories still aren't compelling, but they're way better than before.

I'm interested in knowing what GPT-5 is going to bring. What deficiencies will it fix, and what deficiencies will it have? I'd love to see a model that doesn't try to do everything in a single pass. Like coding, even if you use chain of thought and self reflection GPT-4 will try to write the entire program in one go. Once something is written it can't go back and change it if it turns out to be a bad idea, it is forced to incorporate it. It would be amazing if a model can predict how difficult a task will be and then break it up into manageable pieces rather than trying to do everything at once.

5

sdmat t1_jdtytyy wrote

> Like coding, even if you use chain of thought and self reflection GPT-4 will try to write the entire program in one go. Once something is written it can't go back and change it if it turns out to be a bad idea, it is forced to incorporate it. It would be amazing if a model can predict how difficult a task will be and then break it up into manageable pieces rather than trying to do everything at once.

I've had some success leading it through this in coding with careful prompting - have it give a high level outline, check its work, implement each part, check its work, then put the thing together. It will even revise the high level idea if you ask it to and update a corresponding implementation in the context window.

But it definitely can't do so natively. Intuitively it seems unlikely that we can get similar results to GPT4+human with GPT4+GPT4 regardless of how clever the prompting scheme is. But the emergent capabilities seen already are highly surprising, so who knows.

Really looking forward to trying these schemes with a 32K context window.

Add code execution to check results and browsing to get library usage right, and it seems all the pieces are there for an incredible level of capability, even if it still needs human input in some areas.

5

COMPEWTER_adminisp t1_jdtugix wrote

> Once these things can continuously take input and output, we’ll probably see quite the rush of advancement.

interesting !

1

ghostfaceschiller t1_jds0zez wrote

Basically just giving the model the ability to observe the results of its previous action and decide if it wants to try something different based on the feedback

2

cegras t1_jdscwdv wrote

You mean, like continuously refining your google searches until you find the right stackexchange answer?

15

Majestic_Food_4190 t1_jdu3fod wrote

It amuses me that people always mention things of this nature. If the answer is simply "yes"... then it's still doing it far faster than you are, making it a better developer than most others.

It's like Watson beating the top people at Jeopardy. Was it just searching the internet? Pretty much. Did it in turn win Jeopardy? Yes.

So does the how matter?

7

enryu42 OP t1_jdrzcc9 wrote

Do you mean re-prompt it asking to correct its mistakes? It is hard to try with the current tight limits on GPT-4 prompt count; I'll try once the API is properly available. But I strongly doubt it'll help much: it's not that the solutions have minor bugs, they're usually just completely wrong, i.e. the model doesn't "get" the idea for the correct solution.

(it might help for some of the problems from the "Beginner" category though, but these aren't that interesting)

6

ghostfaceschiller t1_jds202e wrote

Yeah, it's essentially that at an automated level. Tbh it is powerful enough, based on results so far, that I would actually be really surprised if it did not yield very significant gains on these tests.

I'm sure there will be a paper out doing it in like the next few days, so we'll see

15

Jeffy29 t1_jdsm90r wrote

>But I strongly doubt it'll help much: it's not that the solutions have minor bugs, they're usually just completely wrong

I strongly doubt that it wouldn't help. I haven't tested GPT-4 on coding, but from what I've seen GPT-3 makes a number of simple errors; in longer, complex code it's almost inevitable. But it's able to quickly identify and correct them when you point them out. GPT-4 not being able to compile and test its own code is a big limitation that humans don't have. It also can't calculate the math; it's essentially guessing the calculation. But both can be addressed with an external compiler and a calculator like Wolfram, something humans also have access to. There would need to be some time limit imposed so it can't brute-force the solution by guessing for a few days, but even so I think the improvements would be quite large.

3

sdmat t1_jdt9ik9 wrote

> There would need to be some time limit imposed so it can't brute force the solution after guessing for a few days

Not exactly unheard of for junior programmers, to be fair.

3

farmingvillein t1_jdsdalw wrote

> Do you mean re-prompt it asking to correct its mistakes?

Well, re-prompt + asking it to bake test cases upfront and continuously analyze how failures line up with the test cases.
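
A rough sketch of that loop (the chat() wrapper is hypothetical, the solve() naming is just an assumed convention, and the bare exec calls are for illustration only; real use would need sandboxing):

    # Test-first repair loop: have the model write assert-based tests up front,
    # run each candidate solution against them, and feed failures back.
    import traceback

    def chat(messages: list[dict]) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    def solve_with_tests(problem: str, max_attempts: int = 3) -> str:
        tests = chat([{"role": "user", "content":
            f"Write plain Python assert statements (no framework) that test a "
            f"function solve() for this problem. Tests only, no solution:\n{problem}"}])
        code = chat([{"role": "user", "content":
            f"Write a Python function solve() for this problem. Code only:\n{problem}"}])
        for _ in range(max_attempts):
            try:
                namespace: dict = {}
                exec(code, namespace)   # define the candidate solve()
                exec(tests, namespace)  # run the model-written asserts against it
                return code             # nothing failed
            except Exception:
                failure = traceback.format_exc()
                code = chat([{"role": "user", "content":
                    f"Problem:\n{problem}\n\nYour code:\n{code}\n\nTests:\n{tests}\n\n"
                    f"Running them failed with:\n{failure}\n\n"
                    "Analyse how the failure lines up with the tests and return fixed code only."}])
        return code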

1

blose1 t1_jdsid0p wrote

It's the same with out-of-distribution problems: it will just confidently say false things. I will tell it what is wrong and explain why, and it will correct the code, making it wrong or broken in a different way. I recently built a thing that has no equivalent anywhere in open source, with no tutorial or solution to the problem online, and ChatGPT failed to deliver.

At the end of the day it's just statistics based on all available knowledge on the internet.

6

ghostfaceschiller t1_jdsnev1 wrote

This line of thinking sounds sillier and sillier every week. It's like talking to someone who has had their eyes shut and fingers in their ears for the last two months.

EDIT: and tbc, I'm not trying to argue that it isn't statistics-based/trained on the internet/etc. I'm saying that it turns out that kind of system is more powerful and capable than we ever would have intuitively thought it would be.

−1

blose1 t1_jdso3tt wrote

I literally told you my use case and it failed on that, just as it failed on a similar problem 1-2 months ago when I was using the 3.5 version. For my class of problems nothing changes; it fails the same way. I think you have your eyes shut and are not reading what people write. I'm not talking about easy CRUD problems for which you can find thousands of solutions online; ChatGPT is doing OK on those kinds of tasks and has solved a lot of them for me too.

10

BeautifulLazy5257 t1_jds5lt4 wrote

How does ReAct work? Is it just a type of prompt engineering that directs the model to choose between a few tool descriptions?

Is it a type of sentiment analysis that chooses?

How can I recreate ReAct-iveness from scratch? What does the workflow look like?

2

ghostfaceschiller t1_jds855j wrote

I would just look up ReAct, CoT (chain of thought), and LangChain Agents. It's pretty simple to implement.

8

BeautifulLazy5257 t1_jdsr09g wrote

I was wondering if you knew the trick to ReAct without LangChain.

For instance, memory is just passing the past conversations through the prompt as context. There's nothing programmatic about it. You don't need the LangChain library; you just have to craft the right prompt.

I think that using LangChain kind of obscures how the model is actually achieving the desired outputs.

Having models interact with PDFs ultimately is just turning a PDF into a string and passing the string as context, while adding a prompt to help prime the model.

I'll look into CoT and look through the ReAct source code, but I'm going to avoid using LangChain for most stuff, or even looking at the ReAct documentation, since those docs are only going to tell me how to use those libraries and not how to achieve the effect from scratch.
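
A bare-bones version of that loop, with no library at all, looks something like this sketch (chat() stands in for whichever chat API is used, and the two tools are toy placeholders):

    # Minimal ReAct: a prompt that asks for Thought/Action lines, plus a Python
    # loop that parses the Action, runs a tool, and appends the Observation.

    def chat(messages: list[dict]) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    TOOLS = {
        "calculator": lambda expr: str(eval(expr)),  # toy only; never eval untrusted input
        "lookup": lambda term: f"(stub) encyclopedia entry for {term}",
    }

    SYSTEM = (
        "Answer the question by interleaving lines of the form:\n"
        "Thought: <your reasoning>\n"
        "Action: <tool>: <input>   (available tools: calculator, lookup)\n"
        "Stop after each Action and wait for an Observation.\n"
        "When you know the answer, write: Final Answer: <answer>"
    )

    def react(question: str, max_steps: int = 5) -> str:
        transcript = f"Question: {question}\n"
        for _ in range(max_steps):
            reply = chat([{"role": "system", "content": SYSTEM},
                          {"role": "user", "content": transcript}])
            transcript += reply + "\n"
            if "Final Answer:" in reply:
                return reply.split("Final Answer:", 1)[1].strip()
            if "Action:" in reply:
                action = reply.split("Action:", 1)[1].splitlines()[0]
                name, _, arg = action.partition(":")
                tool = TOOLS.get(name.strip())
                observation = tool(arg.strip()) if tool else f"unknown tool: {name.strip()}"
                transcript += f"Observation: {observation}\n"
        return transcript  # ran out of steps; return the trace for inspection

As far as I can tell, LangChain's agents are essentially this loop plus output-parsing conveniences and a registry of tools.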

Edit:

This is a pretty clear overview of CoT. Very compelling as well.

https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html?m=1

I guess I'll start A/B testing some prompts to break down problems and tool selections.

If you have any more input on particular prompts you've used, I'd be grateful.

Edit 2: https://www.youtube.com/watch?v=XV1RXLPIVlw&ab_channel=code_your_own_AI It can't get clearer than this. Great video.

4

tinkr_ t1_jdw30p3 wrote

Based on my recent experience using it to write code, that would certainly help for some--but not all--bugs coming out of GPT-4.

I posted about it in a different thread, but this was my experience:

> Interestingly, I used GPT-4 to create a simple Neovim plugin yesterday and the experience was not as seamless as I was led to believe it'd be by the hype. It gave me generally OK code, but almost everything was buggy.

> It was able to debug itself sometimes, but to finally finish the plugin I needed to fix the code myself and post it back in the chat, telling it to use my fixed code to create a related function that it was unable to adequately generate.

> The problem I gave it was actually a simplified version of an already simple concept, and I did not give it the full details of what I wanted. If you're interested, you can find the final plugin (after my corrections and updating it to allow user configs) here. A printout of the conversation to create the plugin can be found here.

Even with a simplified version of the objective, I had to step in and debug it myself and then give it the "good" code to use further. Maybe if I'd been more patient, it could've fixed itself entirely, but the experience to me seemed more like pair programming with a junior/mid-level software engineer. I was able to immediately see the issue with its code, even though it was not able to.

It will still be revolutionary, though. Definitely a massive boost to productivity using it, but I wouldn't trust it running in production without a thorough code review.

1

lambertb t1_jds24dr wrote

It cannot solve all coding problems. But it can solve many problems. And if the user is reasonably experienced, even code with errors is useful because they can quickly be corrected. Preliminary evaluations show a 40% increase in developer productivity from GitHub Copilot. And that seems totally plausible to me.

57

enryu42 OP t1_jds3452 wrote

I absolutely agree that it is useful. Even CoPilot is amazing at autocompleting "dumb" boilerplate code, which is a nontrivial amount of the code overall. However, these problems are designed to be challenging (these are competitions after all), and require ideas/intelligence to be solved. Apparently GPT4 cannot do it at all, so IMO it would be a stretch to call whatever it is doing "intelligence".

18

dimsumham t1_jdsfpqb wrote

It's not. It's giving you answers to appear intelligent, many times in almost magical ways, but it doesn't "think" - especially in steps.

The MSFT paper notes that this is one of its clearest shortcomings - it can't do long-range planning. At least not yet. But I think this is partially people expecting way too much of a single model.

14

Ciber_Ninja t1_jdu81zy wrote

It can in fact think in steps. All you have to do is ask it to. In fact, multiple papers have shown that asking it to think in steps provides a significant increase in the accuracy of its answers.

1

audioen t1_jdujtbl wrote

Yes. Directly predicting the answer in one step from a question is a difficult ask. Decomposing the problem into discrete steps, writing out these steps, and then using these sub-answers to compose the final result is evidently simpler and likely requires less outright memorization and depth in the network. I think it is also how humans work out answers; we can't just go from question to answer unless the question is simple or we have already memorized the answer.

Right now, we are asking the model to basically memorize everything and hoping it generalizes something like cognition or reasoning in the deep layers of the network, and to a degree this happens. But I think it will be easier to engineer a good practical Q&A system by being more intelligent about the way the LLM is used, perhaps just by recursively querying itself, or by using the results of this kind of recursive querying to generate vast synthetic datasets that can be used to train new networks designed to perform some kind of "LLM + scratchpad for temporary results = answer" behavior.

One way to do it today with something like GPT4 might be to just ask it to write its own prompt. When the model gets the human question, the first prompt actually executed by AI could be "decompose the user's prompt to a simpler, easier to evaluate subtasks if necessary, then perform these subtasks, then respond".
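
A sketch of that recursive-querying idea, with chat() as a hypothetical chat-completion wrapper and a plain text scratchpad for intermediate results:

    # Decompose-then-compose: one call breaks the question into subtasks, each
    # subtask is answered against a running scratchpad, and a final call
    # composes the answer from the accumulated work.

    def chat(messages: list[dict]) -> str:
        raise NotImplementedError("plug in your chat-completion client here")

    def decomposed_answer(question: str) -> str:
        plan = chat([{"role": "user", "content":
            f"Decompose this question into a short numbered list of simpler, easier "
            f"to evaluate subtasks (or say SINGLE if no decomposition is needed):\n{question}"}])
        if plan.strip().upper().startswith("SINGLE"):
            return chat([{"role": "user", "content": question}])
        scratchpad = ""
        for line in plan.splitlines():
            subtask = line.strip()
            if not subtask:
                continue
            result = chat([{"role": "user", "content":
                f"Original question:\n{question}\n\nWork so far:\n{scratchpad}\n\n"
                f"Now do this subtask:\n{subtask}"}])
            scratchpad += f"{subtask}\n{result}\n\n"
        return chat([{"role": "user", "content":
            f"Question:\n{question}\n\nIntermediate work:\n{scratchpad}\n\n"
            "Compose the final answer from the work above."}])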

3

Trotskyist t1_jdt8tx6 wrote

It's still an extremely useful tool if you accept its limitations, and I think it's being reductive to say it can only solve "dumb" problems or suggest boilerplate code.

I used GPT-4 the other day to refactor/optimize an extremely bespoke and fairly complicated geoprocessing script that we use at work, which was written by a former employee who's no longer with the organization. Yes, it got some things wrong that had to be corrected (sometimes all it took was feeding it a stack trace; other times this wasn't enough and I'd have to figure out the issue myself).

But at the end of the day (literally, this was over the course of an afternoon) I'd managed to cut the runtime by more than half, using libraries I'd never before touched and wasn't previously familiar with. It probably would have taken a week to implement otherwise.

11

SWESWESWEh t1_jdt1ly6 wrote

So my guess is whoever ran the test did a bad job with the prompting. For one of the examples they share, https://github.com/enryu43/llm_coding/blob/main/atcoder_eval/gpt4_solutions/abc294_c.py, the author complains it's not an efficient algorithm. So I just gave it a shot with ChatGPT:

Me: What does this code do?

    def main():
        N, M = map(int, input().split())
        A = list(map(int, input().split()))
        B = list(map(int, input().split()))

        C = sorted(A + B)
        pos_A = [str(C.index(a) + 1) for a in A]
        pos_B = [str(C.index(b) + 1) for b in B]

        print(" ".join(pos_A))
        print(" ".join(pos_B))

    if __name__ == "__main__":
        main()

ChatGPT: This code takes two lists of integers as input, merges them, sorts the merged list, and then prints the positions of the elements from the original lists in the sorted merged list.

Here's a step-by-step explanation of the code:

1. The main function is defined.
2. Two integers, N and M, are read from the input and mapped to int using map(int, input().split()).
3. Two lists, A and B, are read from the input, with each element being mapped to int using map(int, input().split()).
4. A new list C is created by merging lists A and B and then sorting the resulting list using sorted(A + B).
5. Two new lists, pos_A and pos_B, are created. These lists store the positions of the elements from lists A and B in the sorted list C. The positions are calculated using the index() method and are 1-indexed.
6. The elements in pos_A and pos_B are joined into strings and printed, separated by a space.
7. The if __name__ == "__main__": statement checks if the script is being run directly (not imported as a module), and if so, calls the main function.

Here's an example to illustrate how the code works:

Input:

    4 3
    2 1 4 3
    5 6 7

Output:

    1 2 4 3
    5 6 7

The sorted merged list C is [1, 2, 3, 4, 5, 6, 7]. The positions of the elements from the original lists in the sorted list are 1, 2, 4, 3 for list A and 5, 6, 7 for list B.

Me: Could this code be re-factored to be more algorithmically efficient?

Yes, the code can be refactored to be more algorithmically efficient. The current implementation uses the index() method to find the positions of elements from lists A and B in the sorted list C. This approach has a time complexity of O(n^2) because index() is called for every element in both A and B.

To improve the algorithmic efficiency, you can use a dictionary to store the positions of the elements in the sorted list C. This will reduce the time complexity of finding the positions to O(n). Here's a refactored version of the code:

    def main():
        N, M = map(int, input().split())
        A = list(map(int, input().split()))
        B = list(map(int, input().split()))

        C = sorted(A + B)
        positions = {value: index + 1 for index, value in enumerate(C)}

        pos_A = [str(positions[a]) for a in A]
        pos_B = [str(positions[b]) for b in B]

        print(" ".join(pos_A))
        print(" ".join(pos_B))

    if __name__ == "__main__":
        main()

The main changes in the refactored code are:

1. A positions dictionary is created that maps the values in the sorted list C to their 1-indexed positions.
2. The pos_A and pos_B lists are created by directly looking up the positions of the elements from the positions dictionary.

This refactored version of the code improves the algorithmic efficiency by reducing the time complexity of finding the positions of the elements from O(n^2) to O(n).


So it accurately analyzes that this code solves the problem without knowing the problem definition, and will spot the inefficiency if asked to actually write algorithmically efficient code. Seems fine to me

9

SWESWESWEh t1_jdt2m8y wrote

It often has errors, but if you just paste the errors into the chat it will generally fix them. In the early versions of ChatGPT, I had issues with doing stuff like writing networking code in C++, but it still got me a lot of the way there.

I recently went over writing a high throughput async data pipeline in Java, and it did a great job of writing the code and even taught me a new design pattern. I had to make a few small changes here and there, but basically it turned a week of work into a couple hours. With the context of the written code there, I also had it write unit tests and documentation for me, and I was able to have it add more unit tests and also integration tests based on my feedback.

I'm fine with people underestimating how good ChatGPT is as a coding assistant; it just makes me look better because of how productive it makes me.

11

WokeAssBaller t1_jdvmmfp wrote

I don't buy that 40% number; I would love to see how they calculated it.

I’ve tried gpt 4 on a lot of problems and it fails 9/10 times and I would be faster just googling it.

This stuff will be amazing, it's just not quite there yet.

1

WokeAssBaller t1_jdvodfn wrote

Yeah I don’t buy a survey, could be heavily biased

0

lambertb t1_je02zgn wrote

Have you used the tools yourself? I have, and a 40% increase in productivity is totally plausible, and often an underestimate considering I can now do things I would not have even tried previously. I encourage you to try them, with healthy skepticism and an open mind.

1

WokeAssBaller t1_je04bbu wrote

I'm an MLE and I've used it a bunch; it's hardly ever actually useful. It gets close, but it's not there, and it's faster to google almost every time.

It will be useful in probably a year or two, but it needs to understand how to run its own experiments. Anyone who actually thinks this is useful right now is just buying hype

1

lambertb t1_je29sc5 wrote

Isn’t it possible that your experience is not representative? Are you using ChatGPT or GitHub copilot?

1

WokeAssBaller t1_je6fveg wrote

I doubt it. I do pretty standard engineering; what's more likely is that there is selection bias in the survey and people are overestimating it due to hype.

I'd love to see an actual double blind study.

1

lambertb t1_je6jjps wrote

There can’t be a double blind study because the people using the copilot will know they’re using it.

1

WokeAssBaller t1_je783go wrote

Fair enough, then give them problems to solve and measure their output. This feels like "90% of dentists claim Crest improves your dental health".

I’ll take an independent study into consideration but today I find it more of a novelty

1

lambertb t1_je802eu wrote

I agree the survey study is nothing close to being definitive. And it does smack of marketing. Still, my own experience suggests that these tools will be transformative. At the same time, I've gotten lost down an AI rabbit hole where it would have been more efficient for me to just do it myself. On balance though, my assessment is that these are already very helpful tools, and they'll only get better.

1

WokeAssBaller t1_je80fbg wrote

They will absolutely reshape the world in the next 5 years; all I'm saying is that in its current state I haven't found it helpful. I'm sure in the next couple of years it will be the main thing I use.

2

currentscurrents t1_jdrpl3u wrote

I'm not really surprised. Anybody who's extensively used one of these tools has probably already run into their reasoning limitations.

Today's entire crop of self-supervised models can learn complex ideas, but they have a hard time manipulating them in complex ways. They can do a few operations on ideas (style transfer, translation, etc) but high-level reasoning involves many more operations that nobody understands yet.

But hey, at least there will still be problems left to solve by the time I graduate!

39

enryu42 OP t1_jds18j4 wrote

I absolutely agree; however, these models have repeatedly exceeded expectations (e.g. 5 years ago I thought that "explaining jokes" would be a hard problem for them, with similar reasoning...)

I tried it because I've heard that there are people inside the competitive programming community claiming that GPT-4 can solve these problems. But from what I gather, it is still not there.

12

WarProfessional3278 t1_jdrzo00 wrote

Horace He made a nice thread on this when GPT-4 first came out. Realistically this is expected - within such a short time span, there isn't much else you can do to improve model performance other than increasing the size of the training data, which resulted in data contamination.

I expect the next "big thing" to be some sort of self-correcting output, or better chain-of-thought reasoning.

35

anomhali t1_jdru6in wrote

LeetCode questions and solutions are direct data leakage: although I do not specify the function signature, the program writes the exact same signature as the original question. If you change the question a little bit, it gives you the buggiest code ever.

25

liqui_date_me t1_jdr8516 wrote

This comment about GPT-4’s limited abilities in solving arithmetic was particularly interesting: https://www.reddit.com/r/singularity/comments/122ilav/why_is_maths_so_hard_for_llms/jdqsh5c/?utm_source=share&utm_medium=ios_app&utm_name=iossmf&context=3

Controversial take: GPT-4 is probably good for anything that needs lots of boilerplate code or text, like ingesting a book and writing an essay, or drafting rental contracts. There’s a lot of value in making that area of the economy more efficient for sure.

But for some of the more creative stuff it's probably not as powerful and might actually hinder productivity. It still makes mistakes, and programmers are going to have to go and fix those mistakes retroactively.

23

enryu42 OP t1_jdrbyh5 wrote

Arithmetic can be solved in a toolformer-like way, by just giving it access to a calculator. But this wouldn't help with coding.
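As a rough sketch of the toolformer-style idea: let the model emit a marker such as [calc: ...], evaluate it outside the model, and splice the result back into the text. The marker format, the fake_llm stub, and the whitelist regex below are all made up for illustration; a real setup would call an actual LLM API.

import re

def fake_llm(prompt):
    # Stand-in for a real LLM call; here it "decides" to use the calculator tool.
    return "The answer is [calc: 1234 * 5678]."

def run_with_calculator(prompt):
    draft = fake_llm(prompt)

    def evaluate(match):
        expression = match.group(1)
        # Only allow plain arithmetic before handing the string to eval.
        if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
            raise ValueError("unsupported expression")
        return str(eval(expression))

    # Replace every [calc: ...] marker with its evaluated result.
    return re.sub(r"\[calc:\s*([^\]]+)\]", evaluate, draft)

print(run_with_calculator("What is 1234 * 5678?"))  # -> The answer is 7006652.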

Regarding the point about boilerplate, this is exactly what is surprising: GPT-4 performs very well on exams/tests, which supposedly require some amount of creative reasoning. So either the tests are poorly designed, or it can do some creative tasks but not others. If the latter is the case, it would be interesting to learn which areas it performs well in, and why.

21

liqui_date_me t1_jdrd9dx wrote

One could argue that even standardized tests are somewhat boilerplate - if you practice enough SAT tests you'll eventually do quite well at them, since the questions are quite similar from exam to exam. Ditto for AP exams.

I think a serious test for GPT4’s intelligence will be on one of the competitive entrance exams for some countries, like the IIT-JEE or the Gaokao or the International Math Olympiad, where the questions are made by domain experts and are designed to be intentionally difficult and specialized to solve.

19

enryu42 OP t1_jdrezba wrote

I don't know about the IIT-JEE/Gaokao, but many of the problems from the International Math Olympiad are freaking hard. If the model aims for human-level intelligence, such a high bar would be unfair - it is more in the realm of "best human"-level intelligence.

To be fair, hardest problems from "AtCoder Grand" contests have the same issue. But "AtCoder Regular" problems should definitely be solvable by an average human with the right knowledge and skillset, and yet, GPT4 cannot solve anything (and it doesn't look like it is lacking knowledge).

16

blose1 t1_jdskab0 wrote

These models have access to all human knowledge, all scientific papers, books, etc. If I had such knowledge, I could solve any Olympiad task.

3

visarga t1_jdtxxfd wrote

You're mistaken, Olympiad problems require bespoke tricks that don't generalise from problem to problem. It's not a problem of breadth of knowledge, they don't test memorisation.

6

blose1 t1_jdu4cln wrote

What? Where exactly am I mistaken? Both of my statements are true, and there is a 0% chance you can pass an Olympiad task without knowledge. A human with all the knowledge WILL reason and come up with a solution BASED on that knowledge AND on the experience of others that is part of that knowledge; if that weren't true, no human would solve any Olympiad. Sorry, but what you wrote in the context of my comment is just ridiculous, and looks like a reply to something I didn't write.

4

currentscurrents t1_jdrt3gv wrote

I think all tests designed for humans are worthless here.

They're all meant to compare humans against each other, so they assume you don't have the ability to read and remember the entire internet. You can make up for a lack of reasoning with an abundance of data. We need synthetic tests designed specifically for LLMs.

11

Yecuken t1_jdsm4w1 wrote

Tests would not help against optimization; models will just learn how to pass the test. Optimization will always win against any problem with a known solution.

2

maxToTheJ t1_jdrx281 wrote

> which supposedly require some amount of creative reasoning.

They don't, which is exactly one of the complaints teachers have raised about standardized testing.

3

farox t1_jdrfllu wrote

This is pretty much it. Just yesterday I needed to write some Python web UI. So I described roughly what I needed and it gave me a solution for that. It had a couple of errors but gave me a basis to work off of. Saved me a lot of "how do I do X with Flask", but little of the complexity. For that, I am sure it would take me longer to describe it than to implement the logic myself.

7

ngildea t1_jdrzg0p wrote

I agree, but is that opinion controversial? Seems patently obvious after talking to it about coding for a few minutes. Maybe it's controversial among people who have fooled themselves into thinking it's thinking?

6

liqui_date_me t1_jds3b6q wrote

I would say it's controversial around many folks who aren't directly involved in programming and who get impressed by cute demos on Twitter. People who actually know how to code see it as a superpower to make themselves more efficient, while also lamenting about how it makes silly mistakes.

https://www.reddit.com/r/cscareerquestions/comments/1226hcn/im_worried_about_ai_taking_our_jobs/

I highly doubt that software engineering jobs will become obsolete. There's going to be a lot of disruption and there might be some wage deflation too (imagine the price of writing the boilerplate components of an iOS app goes from 50,000 dollars to 50 dollars), but so much of software engineering is testing, QA and human collaboration. I think we're just going to have to re-orient our careers around correcting code from LLMs.

6

ngildea t1_jds53pl wrote

Yeah I agree with all that. I've been trying to think of an analogy. Maybe in the same way that spreadsheets didn't make accountants obsolete?

5

robobub t1_jdswai9 wrote

Indeed, it just made them more efficient, so we need fewer of them and/or pay them less.

2

No_Brief_2355 t1_jdthreb wrote

Fewer bookkeepers and lower pay, but accountants (CPAs) are pretty in demand and still well paid.

2

__scan__ t1_jdxj30g wrote

This is what will happen if we’ve either a) exhausted demand, or b) made software development much easier such that people who previously couldn’t do it now can.

The first was likely true for accountants, but is less obviously so for software — there’s still vastly more useful software to build than actually gets built, and each piece of new software that gets built generally increases that demand.

Perhaps the second is true though — do you foresee enough non-developers being able to write, deploy, maintain, and operate production systems as a result of LLMs (in a way that high level languages and previous tooling didn’t)? If not, or if not in sufficient numbers, maybe what happens is that software developers become more in demand than ever due to their productivity increases resulting in even demand for more software (because they can write it quicker).

1

reditum t1_jds67d4 wrote

>Controversial take

That's not controversial at all

4

trajo123 t1_jdsflhh wrote

> like ingesting a book

Interestingly, LLMs currently can't naturally ingest a book, since it doesn't fit in the prompt (they can fit 32K tokens, which is about 24k words). This is where GPTs differ fundamentally from the human brain. GPTs always produce one token at a time, given the full prompt. There is no state kept between token generation steps other than the prompt, which grows one token at a time. The human brain, on the other hand, has a state, and it is continuously evolving. In the case of a book, our brain state will be affected by the content of the book as we read it.

LLMs need to be able to hold more state to get to the next level. Perhaps get augmented with some sort of LSTM architecture where state can be built up from a theoretically infinite amount of input, or have another compressed/non-human-readable prompt that gets read before generating the token and gets updated after generating the token.
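A minimal, chunk-level sketch of the second idea - a bounded "state" that is updated as more input streams in. The summarize() placeholder below just truncates; in practice it would be an LLM call along the lines of "update this running summary with the new text":

def summarize(state, chunk, limit=1000):
    # Placeholder for an LLM summarization call; here we simply keep
    # the most recent `limit` characters of the running state.
    return (state + " " + chunk)[-limit:]

def read_book(book_text, chunk_size=4000):
    state = ""  # compressed, possibly non-human-readable memory
    for start in range(0, len(book_text), chunk_size):
        chunk = book_text[start:start + chunk_size]
        state = summarize(state, chunk)  # the state evolves as the book is read
    return state  # bounded-size context that can be prepended to later prompts

print(len(read_book("some very long book text " * 10000)))  # stays bounded at 1000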

4

visarga t1_jdtyd0c wrote

> Perhaps get augmented with some sort of LSTM architecture where state can be built up from a theoretically infinite amount of input

That would be sweet, infinite input. Does RWKV do it?

1

robobub t1_jdsria4 wrote

While GPT-4 is autoregressive, it takes into account the tokens it has chosen to generate incrementally. So it is only limited to O(1) if it attempts to answer with the correct answer immediately. It can in theory take O(m) steps, where m is the number of intermediate tokens it predicts.

3

robobub t1_jdwa5wf wrote

I'll add this:

If it is possible for GPT to do 1+1, it can do a large number of them incrementally. It's not smart enough to do it all the time by planning ahead (you'll have more success if you encourage GPT to use train-of-thought reasoning, here and here), but it's possible.

1

fiftyfourseventeen t1_jdt29u3 wrote

I've wasted too much time trying to do basic tasks with it as well. For example, I argued with it for many messages about something that was blatantly wrong, and it insisted it wasn't (in that case it was trying to use an order-by-similarity call with an argument to sort by Euclidean distance or cosine similarity, but it really didn't want to accept that cosine similarity isn't a distance metric and therefore has to be treated differently when sorting).
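The underlying point: with a distance metric, smaller means more similar, while with cosine similarity, larger means more similar, so the sort direction (or a conversion such as 1 - similarity) has to change. A quick sketch with made-up vectors:

import numpy as np

query = np.array([1.0, 0.0])
candidates = {"a": np.array([0.9, 0.1]),
              "b": np.array([0.1, 0.9]),
              "c": np.array([0.7, 0.7])}

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Distance: ascending order puts the most similar first.
by_distance = sorted(candidates, key=lambda k: euclidean(query, candidates[k]))

# Cosine similarity: descending order puts the most similar first.
by_similarity = sorted(candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True)

print(by_distance)    # ['a', 'c', 'b']
print(by_similarity)  # ['a', 'c', 'b'] - same ranking, but only because the sort was reversed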

My most recent one was where I wasted an hour of time doing something that was literally just one line of code. I had videos of all different framerates, and I wanted to make them all 16 fps while affecting length and speed as little as possible. It gave me a couple of solutions that just straight up didn't work, then I had to manually fix a ton of things with them, and then I finally had a scuffed and horrible solution. It wouldn't give me a better algorithm, so I tried to make one on my own, until I thought "I should Google whether there's a simpler solution". From that Google search I learned "oh, there's literally just a .set_fps() method".

Anyways from using it I feel like it's helpful but not as much as people make it out to be. Honestly, GitHub copilot had been way more helpful because it can auto complete things that just take forever to write but are common, like command line args and descriptions, or pieces of repetitive code.

2

Haycart t1_jdtcnc5 wrote

Where are they getting O(1) from? Has some new information been released regarding GPT-4's architecture?

The standard attention mechanism in a transformer decoder (e.g. GPT 1-3) has a time complexity of O(N^2) w.r.t. the combined input and output sequence length. Computing the output autoregressively introduces another factor of N for a total of O(N^3).

There are fast attention variants with lower time complexity, but has there been any indication that GPT-4 actually uses these? And in any case, I'm not aware of any fast attention variant that could be described as having O(1) complexity.

1

visarga t1_jdtypz6 wrote

Doesn't autoregressive decoding cache the states for the previous tokens when decoding a new token?

2

Haycart t1_jdu7hlp wrote

Oh, you are probably correct. So it'd be O(N^2) overall for autoregressive decoding. Which still exceeds the O(n log n) that the linked post says is required for multiplication, though.
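For what it's worth, the O(N^2) figure follows directly from the caching argument: with cached keys and values, step t only attends over the t tokens generated so far, so the total attention work across N decoding steps is roughly

$$\sum_{t=1}^{N} O(t) = O\!\left(\frac{N(N+1)}{2}\right) = O(N^2)$$

per layer and per head (the per-token feed-forward cost is constant in N, so it only adds an O(N) term).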

1

maskedpaki t1_jds3ate wrote

Try getting human programmers to do those problems. Guarantee many will fail too.

13

enryu42 OP t1_jds49mm wrote

Well, they do, and quite successfully, this is what these sites are about...

Of course, if you ask some frontend engineer to solve some math-y problem, they'll be confused. But this is simply because they lack knowledge, and GPT-4 evidently doesn't have this issue. Moreover, I doubt any human programmer would have trouble with the "Beginner" problems, regardless of their specialization.

15

farmingvillein t1_jdsfaq5 wrote

> Moreover, I doubt any human programmer will have troubles with the "Beginner" problems, regardless of their specialization.

FWIW, I think you overestimate humans. Particularly those who haven't actively been practicing leetcode-style coding. E.g., many of the problems are specified in "competition language", not "human-friendly language" (where "human-friendly", e.g., is something you'd be happy to see in a design doc). (Should that matter to GPT-4? I dunno.)

I do think it is fair though to say that, with some baseline level of practice (which is potentially the relevant comparison point), a lot of people would probably nail the "beginner" tests.

6

Narootomoe t1_jds6i3w wrote

That's a good way to put it that I don't think I've seen yet - may I steal it?

"If a human had instant recall to all the knowledge GPT4 has, it wouldn't stumble on any of these problems", something like that

3

red75prime t1_jdtqsmj wrote

Does GPT-4 have instant recall of all of its training data? I doubt it. It probably has some emergent structures akin to episodic memory, but it seems to have trouble distinguishing its memories from its hallucinations, so it's not a fully functional episodic memory (it lacks metamemory or something like that).

1

robobub t1_jdst1oo wrote

> Moreover, I doubt any human programmer will have troubles with the "Beginner" problems, regardless of their specialization.

Have you not heard about how many fail to pass FizzBuzz interview questions?

3

ngildea t1_jdryzgh wrote

I've tried quite a few times to get it to help with a problem I've been thinking about for a while. Every time it says it understands, and then writes code that shows it doesn't understand at all and violates every constraint I give it.

Not surprising but it does point to a lot of contamination & regurgitation of the training material fooling people into thinking it's intelligent

6

Ciber_Ninja t1_jdu856g wrote

Try having it generate tests first. You gotta get it into the proper context.

1

trajo123 t1_jdscn2h wrote

>Apparently it cannot solve coding problems which require any amount of thinking.

Not yet, and this is not surprising.

First, GPT-4 can solve many coding problems on the first try. Yes, these small programs may be simple, but how many developers can write code that runs correctly on the first try? Maybe in 1-2 languages, and even then only in the problem domain they are very familiar with. Also, since LLMs can write code in more languages and frameworks than most developers, LLMs can actually solve more coding problems than most of the programmers out there... So LLMs already contain vast amounts of "knowledge" and "intuitive ability". But intuition is not enough to solve larger or more complex problems.

So, finally, coming to the thinking part. What challenging problems can humans solve "off the cuff"? We also scribble, draw diagrams, try out a few things, see if things run and work as expected, do web searches, talk to stakeholders, sleep on the problem, etc. In other words, in any non-trivial problem solving, we also rely heavily on feedback between our brains and the external world.

Frankly, I don't see this as a problem with LLMs; they can be effectively used as foundation models. One could have another layer on top of LLMs to solve problems end-to-end. For example, one could build a meta-model where multiple instances work together in an actor-critic fashion. The actor is the one interacting with the user; the critic can be prompted (and perhaps fine-tuned) with general problem-solving strategies, with the main prompt being to second-guess and try to find flaws in the reasoning of the actor. Just as reinforcement learning (RL) was used to improve the general usability of ChatGPT, RL could be used to fine-tune such a meta-model (or maybe just fine-tune the critic). ...thinking fast, thinking slow
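A minimal sketch of such an actor-critic loop, assuming a hypothetical chat(system, user) helper that wraps whatever LLM API is used (the prompts and the APPROVED convention are made up for illustration):

def chat(system, user):
    # Stand-in for a real LLM chat call; replace with an actual API client.
    # This stub approves everything so the sketch runs end to end.
    return "APPROVED" if "critic" in system.lower() else f"(draft answer for: {user[:40]}...)"

def solve_with_critic(problem, max_rounds=3):
    answer = chat("You are the actor. Solve the problem.", problem)
    for _ in range(max_rounds):
        critique = chat(
            "You are the critic. Find flaws in the proposed solution. Reply APPROVED if there are none.",
            f"Problem:\n{problem}\n\nProposed solution:\n{answer}",
        )
        if "APPROVED" in critique:
            break
        # Feed the critique back to the actor and revise.
        answer = chat(
            "You are the actor. Revise your solution to address the critique.",
            f"Problem:\n{problem}\n\nPrevious solution:\n{answer}\n\nCritique:\n{critique}",
        )
    return answer

print(solve_with_critic("Sort a list of numbers without using sort()."))

The critic's verdict in a loop like this is also exactly the kind of signal that could serve as a reward for the RL fine-tuning mentioned above.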

P.S. I think LLMs also need some sort of memory, so that not everything needs to be in the prompt to work on a problem.

6

AlexandraTheeAuthor t1_jdsdp3o wrote

It can, but I think it's something about how it selects what to do. There needs to be more logic to it. I find it does really well if you tell it to use reasoning. For example, I give it code and ask it to draw inspiration from it, and it does really well at this. Really, it needs a good prompt engineer. There are no set strategies yet, but there will be. I can almost get it to generate anything if I prompt it right. So it's more that I need to figure out how it thinks about stuff and try to present my problem to it that way.

5

DigThatData t1_jdsbb8w wrote

well, i was able to use ChatGPT to generate a novel, functional, complete software library for me, including a test suite, tutorial, and announcement blog post. crazy idea: maybe you just need to get a bit more creative with your prompting or anticipate that there might need to be multi-stage prompts (or god forbid: back and forth dialogue and iteration) for certain applications.

3

uspmm2 t1_jds5ium wrote

It's interesting, but the title should have mentioned it's about code contest problems, which are things nobody writes in the real world.

2

Calamero t1_jdt1h8v wrote

Also the prompt seems messed up? What are they doing there? Why not give it the original question?

2

Cwlrs t1_jdsqyyq wrote

It's performing really well for my project: an online web app game in Python with Flask-SocketIO.

2

K9ZAZ t1_jdsyxgb wrote

People got way, way, way out over their skis on the whole "this is agi" and I would love to hear some of their responses to this.

2

boaking69 t1_jdt5463 wrote

Literally nobody said that

2

visarga t1_jdtz3wx wrote

The original title of the "Sparks of AGI" paper was "First Contact With an AGI System" (line 8). If you carefully read the paper it suggests GPT-4 is stronger than what seems to be our consensus.

1

ThePhantomPhoton t1_jdsyzhn wrote

It’s easier to gauge the effectiveness of these large language models within the context of what they are actually doing, and that is repeating language they’ve learned elsewhere, predicated on some prompt provided by the user. They are not “reasoning,” although the language they use can lead us to believe that is the case. If you’re disappointed by their coding, you will certainly be disappointed by their mathematics.

2

EgoistHedonist t1_jdukvrp wrote

GPT-4 has some serious limitations. It cannot for example say how many words its own response will have, as it cannot plan ahead. When it starts to generate the response, it doesn't know how it will end.

But these limitations can soon be circumvented by adding long-term memory and other mechanisms, so it's only a matter of time when it's on a whole new level regarding tasks like these.

2

Smallpaul t1_jds9001 wrote

My rule of thumb is that GPT4 seems to be able to solve any problem that a first year university CS student at a mid-tier University could solve.

1

cegras t1_jdsd89g wrote

I don't see how it is possible to not end up just memorizing the internet, which is full of enough questions and discussions to simulate convincing Q&As. Consider if a team had invented an algorithm or heuristic to avoid data contamination (https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks). Then what you have is something that can separate content into logically similar, but orthogonal, realizations. That would be an incredible tool and worth a prize in its own right.

1

pengo t1_jdt6iv2 wrote

> Then what you have is something that can separate content into logically similar, but orthogonal realizations.

Like a word vector? The thing every language model is based on?

1

cegras t1_jdta9mj wrote

More like, the ability to know that 'reversing a linked list' and 'linked list cycle and traversal problems' are the same concepts but different problems, and to separate those into train/test. Clearly they haven't figured that out because ChatGPT is contaminated, and their (opaquely disclosed) ways of addressing that issue don't seem adequate at all.

3

Abikdig t1_jdtail3 wrote

I check ChatGPT for optimizing my LeetCode solutions every day. It rarely optimizes them without breaking the code.

Sometimes the only optimization I get from it is that it tells me to use data structure X instead of Y because it is better for this kind of problem.

1

LifeScientist123 t1_jdtd8yn wrote

  1. All this shows is that GPT-4 can't solve some coding problems. Which developer can confidently say they can solve any coding problem in one-shot? Does this mean developers/humans don't have AGI?

  2. I've used ChatGPT (gpt3.5) to optimize code that I already wrote and it came up with several optimizations. I'm 100% sure my code was not part of chat-gpt training data and yet it performed perfectly fine on a new coding problem. Now it's possible that the training data might have included something similar to what I gave ChatGPT but that just means that we have to provide more training data, and then a future version will solve those problems where it previously failed.

  3. Isn't this how humans learn? We encounter problems where we don't know the solution. Then we work at it for a while until we figure out some way to solve the problem that wasn't immediately obvious earlier. Writing off the abilities of GPT-4 based on one failed coding test seems premature.

1

visarga t1_jdu1fgf wrote

> Does this mean developers/humans don't have AGI?

The intellect of our species isn't universal, we're merely experts at self-preservation and propagation. Take, for instance, chess – it isn't our forte, and even a small calculator could outperform us. Our minds are incapable of 5-D visualization, and we struggle to maintain over 10 unrelated items in our immediate memory. Generally, we falter when addressing problems where the initial move relies on the final steps, or situations that don't allow for linear progression, such as chess or mathematical quandaries. It took us centuries to decipher many of these enigmas. Our specialization lies in tackling human-centric challenges, rather than all-encompassing ones. Evolution simply hasn't had sufficient time to adapt our cerebral cortex for mathematical prowess.

1

WarmSignificance1 t1_jdv1usr wrote

Part of intelligence is the ability to learn in an efficient manner. For example, an expert programmer doesn't need to see hundreds of millions of examples to learn a new programming language. They can read the docs, play around with it a bit, and then apply their existing experience and models that they've built up over time to the new language.

LLMs fall over in this same situation.

1

LifeScientist123 t1_jdvmkkx wrote

>Part of intelligence is the ability to learn in an efficient manner.

Agree to disagree here.

A young deer (fawn) learns to walk 15 minutes after birth. Human babies on average take 8-12 months. Are humans dumber than deer? Or maybe human babies are dumber than fawns?

Intelligence is extremely poorly defined. If you look at the scientific literature it's a hot mess. I would argue that intelligence isn't as much about efficiency as it's about two things,

  1. Absolute performance on complex tasks

AND

  2. Generalizability to novel situations

If you look at LLMs, they perform pretty well on both these axes.

  1. GPT-4 has human level performance in 20+ coding languages AND 20+ human languages on top of being human level/super human in some legal exams, medical exams, AP chemistry, biology, physics etc etc. I don't know many humans that can do all of this.

  2. GPT-4 is also a one-shot/ few-shot learner on many tasks.

1

TehDing t1_jdtwa6k wrote

I have not been impressed with LLMs' reasoning for solving novel puzzles/challenges. Ask any model to play Wordle with you. They are not good.

1

rokuyou t1_jdtyqe7 wrote

"GPT-4 and competitive programming problems" would be a better title, since not everyone is going to read that.

1

lvvy t1_jdu3qg4 wrote

It would be interesting to see whether ChatGPT can solve these problems not with code, but with text instructions that would allow a human to solve them. If you force it to write a giant wall of actions, could a human with a calculator follow them and solve the problems confidently? Also, can the code it generates not be corrected at all by discussion, or would the discussion just take too long?

1

nanowell t1_jduiybq wrote

Codex models were able to solve those problems. Probably the next version of Codex will be a fine-tuned GPT-4 model for coding, and it will solve most of those problems.

1

Tr4sHCr4fT t1_jduwqp7 wrote

it can only replace bootcamp juniors ^/s

1

Upstairs-Youth5483 t1_jdvcahh wrote

I find GPT, as a coder, to be very useful for doing my repetitive coding tasks. For example, taking a SQL table definition and making very basic CRUDs, or making the classes that call the procs.

It does have a long way to go, but it has the illusion of consciousness in that it does remember what you said, with some understanding of what you said.

I have caught it making up settings that don’t exist and every line of code should be properly scrutinized.

1

spacefoxy99 t1_jdwaxsh wrote

i tried with both 3.5 and 4 to create a simple memory game, and not only did it cut the code off halfway through, but the continued code didn't match what was happening in the first part, and the code didn't work. tried two other times over the course of this month and the code is filled with errors and missing statements. gpt seems bad at coding, at least to me.

1

1bir t1_jdwodl0 wrote

There's a version that interacts with Wolfram Alpha; does that do any better?

1

12max345 t1_jdx4sdm wrote

I feel like LLMs have encoded a sort of law of language in their latent space through text, and they respond accordingly. Anything that follows a law isn't thereby conscious; for example, inanimate objects follow the laws of physics, but that doesn't mean they exhibit intelligent behaviour.

After all, text is a medium for representing our thoughts; it's the thoughts that matter, not the medium.

The concepts of causality, fundamental reality, and decision making are much more than following the laws of language, which is just a means.

These LLMs can't question you unless you ask them explicitly, and they can't interject. Knowledge was never consciousness; it's these abilities that compose consciousness.

I don't know how much sense I make to others, or maybe I am at a loss for good words. In a nutshell, any model that fundamentally predicts tokens based on the weights of previous tokens can never achieve consciousness. We

1