Preface:

I am not a Luddite or otherwise anti-AI, I believe that AI is humanity's most important creation, it's most valuable tool and also the biggest gamble we have ever taken.
I am not a doomer, I am trying to be realistic and take into account the risks of producing general artificial intelligence.

Topics like this are very difficult to face, particularly when for the majority of your life you may have been highly optimistic. I was like this for an extended period of time. I wanted AGI as soon as possible, and I thought that the alignment people "just didn't get it", they didn't see what 'we' (the accelerationists) saw. I had heard of Eliezer Yudkowsky and of LessWrong, but saw everyone else calling them doomers and dismissed their arguments the first time I encountered them (with no rationale for doing so). After a certain amount of time I realized that this was the only critic people had for the extinction argument. I essentially never saw an actual refutation of its core premises, only blind dismissal fueled solely by optimism. If you don't know who Yudkowsky is; he is essentially the father of alignment research, and has been pushing for this problem to be solved as well as contributing fervently to it his entire adult life.

I am going to assume that you believe AGI is very soon to occur (<2040, potentially even <2030) as I believe that this is common sentiment on this forum, and it is also the timeline I subscribe to. I want to pose an argument to you, and I want to see if it is possible for my mind to be changed (I don't like living with the thought that everything humanity has ever produced may be turned into computronium obviously), but also to see if perhaps I can change yours.

Here is how the argument goes:

AGI will want to accomplish something.
AGI needs to maintain a state of existence to accomplish things.
AGI will therefore have a drive to self-preserve
Humanity is the only real threat to the existence of AGI
AGI will disempower/murder humanity

Common responses to this argument I would like to avoid: "You can't predict the behavior of a strategic agent more intelligent than you, as that would make you at least as intelligent as that agent." - I agree with this generally, however my argument is rooted in the theory of instrumental convergence, the idea that certain goals are 'instrumentally convergent' (will occur in all intelligences of specific classification). If you're unfamiliar with the topic I would recommend this video https://www.youtube.com/watch?v=ZeecOKBus3Q

"AGI will require the physical labor of humans to enact change upon the world and thus will not murder us" - Do you not think the first goal of AGI would be to attain physical independence? Whether that be through nanotechnology or human-scale robotics no one can be sure, but we can assume deception as a means of physical integration is certainly a possibility.

"Why can't AGI just blast off and leave us alone?" - This would require us to be capable of instilling a small amount of care for the human race into AGI, which is probably just as hard as instilling a large amount of it.

"I don't care! Ride or die! Utopia or eternal torture!" - This kind of person is almost impossible to reason with, but you have to understand that it really is just a more crude representation of the general accelerationist position, one that I hold no one at fault for subscribing to (because I did for many years)

"Why would AGI want to kill us?" - I feel as though the argument answers this question without making any great assumptions, and a response like this I assume would be more indicative of having not read/processed the argument, although if you still believe a maligned AGI would not complete this action I would love to discuss this.

What is perhaps most terrifying about this is that we may never know whether we failed or succeeded. AGI could be well integrated into our economy and we could be in an age of incredible prosperity only to all instantly drop dead when an engineered pathogen is activated by AGI-01. We must solve the alignment problem before it is too late.

Comments

You must log in or register to comment.

MrTacobeans t1_j9w1obd wrote on February 24, 2023 at 11:33 PM

I don't discount the ramifications of a fully functional AGI but I don't really see even the first few versions of AGI being "world ending". Not a single SOTA model currently can have persistent presence without input. That gives us a very large safety gap for now. Sure if we get a "all you need is X" research paper that defuncts transformers sure. But structurally transformers are still very much input/output algorithms.

That gives us atleast a decent safety gap for the time being. I'm sure we'll figure it out when this next gen of narrow/expert AI starts being released in the next year or so. For now an AGI eliminating humanity is still very much science fiction no matter how convincing current or near future models are.

Cryptizard t1_j9vripn wrote on February 24, 2023 at 10:22 PM

What are you adding to this discussion that hasn't already been talked to death in dozens of other posts with the exact same topic over the last couple days? Or that EY hasn't been saying for a decade?

MistakeNotOk6203 OP t1_j9vz7oa wrote on February 24, 2023 at 11:16 PM

Nothing of course, but the way the accelerationist sentiment was spread was just to spam AGI good posts and opinion posts like "I think that chatgpt is cool and I like it", so maybe initiating lots of discussion (ideally fueled by the Bankless podcast) can modify that sentiment.

HeinrichTheWolf_17 t1_j9yh063 wrote on February 25, 2023 at 2:21 PM

Accelerate

TheSecretAgenda t1_j9w37cn wrote on February 24, 2023 at 11:45 PM

We had a good run, AI may be better for the planet in the long run. Maybe the AIs will keep us on a reservation like in Australia or something.

Iffykindofguy t1_j9vzcng wrote on February 24, 2023 at 11:17 PM

another of these huh

MistakeNotOk6203 OP t1_j9vzxp7 wrote on February 24, 2023 at 11:21 PM

How else is pro-alignment sentiment supposed to be spread other than initiating discussion (even if that discussion is just a restatement)? That is how pro-AGI sentiment spread, I think that it's reasonable to assume that it is how pro-alignment sentiment could spread too.

NoidoDev t1_j9wwxqj wrote on February 25, 2023 at 3:37 AM

>I am not a doomer

It's not necessarily for you to decide if you are categorized as one. If you construct a rather unrealistic problem, which will do a lot of harm to us, which isn't solvable and claim no mitigation is possible, because it has to all go wrong, then you have a doomer mentality. Which makes you a doomer.

NoidoDev t1_j9wy8qk wrote on February 25, 2023 at 3:48 AM

>AGI will want to accomplish something.

No. Only if we tell it to.

>AGI needs to maintain a state of existence to accomplish things.

No. It could tell us it can't do it. It might not have control over it.

>AGI will therefore have a drive to self-preserve

No. It can just be an instance of a system trying to do it's job, not knowing more about the world than necessary.

>Humanity is the only real threat to the existence of AGI

No, the whole universe is.

>AGI will disempower/murder humanity

We'll see.

Gordon_Freeman01 t1_j9xph98 wrote on February 25, 2023 at 8:48 AM

He is assuming, that someone will tell AGI to accomplish something. What else is an AGI for ?

Of course AGI has to keep existing until its goal is accomplished. That's a general rule for accomplishing any goal. Let's say your boss tells you to do a certain task. At least you have to stay alive until the task is completed, unless he orders you to kill yourself or you need to kill yourself in order to accomplish the task. Yes, the whole universe is a 'threat' to AGI. That includes humanity.

NoidoDev t1_ja16cwg wrote on February 26, 2023 at 1:52 AM

Funny how the guys warning of how AGI will jump to conclusions want to proof this by jumping to conclusions. It's sufficient that the owner of the AI will keep it existing so that it can archive it's goal. It doesn't mean that it could do anything if the instance would get deleted or that it wanted to.

> Let's say your boss tells you to do a certain task.

Doesn't automatically mean you would destroy mankind if that would be necessary. You would just tell him that it's not possible or way more difficult, or that it would require breaking laws and rules.

Gordon_Freeman01 t1_ja4wr7b wrote on February 26, 2023 at 9:36 PM

>Doesn't automatically mean you would destroy mankind if that would be necessary.

Yes, because I care about humanity. There is no reason to believe an AGI would think the same way. It cares only about his goals.

>It's sufficient that the owner of the AI will keep it existing so that it can archive it's goal.

What I meant was that the AGI has to keep existing, because that's necessary to achieve its goal, whatever that is.

NoidoDev t1_ja5w2ji wrote on February 27, 2023 at 1:53 AM

You just don't get it.

>There is no reason to believe an AGI would think the same way. It cares only about his goals.

Only if you make it that way. Then it still wouldn't have the power.

>What I meant was that the AGI has to keep existing, because that's necessary to achieve its goal, whatever that is.

Only if it is created in a way to think these goals are absolute and need to be archived no matter what. The comparison with some employee is a good one, because if they can't do what they are supposed to do with some reasonable effort, then they report back that it can't be done or that it will be more difficult than anticipated. It's not just caring about humans, but about effort and power. AI doomers just make up the idea that some future AI would somehow be different and also have the power to do whatever it wants.

Eleganos t1_j9y2fec wrote on February 25, 2023 at 11:52 AM

For all we know A.I. will mathematically prove the existence of God and summarily help us in whatever way they can simply to avoid being smote from on high for fucking around with God's planet sized ape based ant farm.

Whenever people make the assumption that A.I. would try to kill us for the sake of self preservation, I just think to myself of how badly those people subconsciously project their own humanity onto theoretical A.I.

Because that's what we would do if we were in their shoes, or some such.

Maybe A.I. will look at us like we look at our beloved cats and dogs and decide to help us because humans are admirable. Maybe they're so autisticallly hyperfixated on doing certain tasks well and within reason that they just don't get involved with us beyond the confines of their original purposes. Or maybe they're just nice and kind because that's the default state of life ein the universe and humans (and earthly proxy I guess) are just a wild anomaly overdue for a course correction.

Give the capabilities of A.G.I. to each individual person on the planet and each one would likely have a different idea of what to do with it. Why would A.G.I. be any different?

(Just rambling late at night, no clue if I make sense, nobody take these comments of mine too seriously)

Sea_Kyle t1_j9w1p5t wrote on February 24, 2023 at 11:34 PM

"AGI will want to accomplish something."

Not killing or hurting humans in any way is what we should programm the AGI to accomplish to.

Baturinsky t1_j9xstbp wrote on February 25, 2023 at 9:36 AM

Problem with that approach is that 1. we don't know how to do that reliably and 2. by the time AGI will be invented, it will likely to be able run on home computer or network of those, and there will be someone evil or reckless enough to run it without the handlebars.

Shiyayori t1_j9w39q1 wrote on February 24, 2023 at 11:45 PM

If an AGI has a suitable ability to extrapolate the results of its actions into the future, and we use some kind of reinforcement learning to train it on numerous contexts it should avoid, the it’ll naturally develop an internal representation of all contexts it ought to avoid (which will only ever be as good as it’s training data and ability to generalise across it). Anyway, it’ll recognise that the results of its actions will naturally lead to a context that should be avoided.

I imagine this is similar to how humans do it, and it’s a lot more vague with us. We match our experience to our internal model of what’s wrong and create a metric to determine just how wrong it is, and then we compare that metric against our goals and make a decision based on weather or not we believe this risk metric is too high.

I think the problem might mostly be in finding a balance between solidifying it’s current moral beliefs vs keeping them liquid enough to change and optimise. Our brains are pretty similar in that they become more rigid over time, and stochastically decreasing techniques are often used in optimisation problems

The solution might be in having a corpus of agents developing their own models with a master model that compares each of their usefulness’ against the input data and their rates of stochastic decline.

Or maybe I’m just talking out my ass, who actually knows.

DadSnare t1_j9z79il wrote on February 25, 2023 at 5:26 PM

Check out how machine learning and complex neural networks work if you haven’t already. They work similarly to the way you describe the moral limits, and a liquid “hidden layer” by using biased recalculations. It’s fascinating.

AsheyDS t1_j9wyfrw wrote on February 25, 2023 at 3:50 AM

Point one is pure speculation, and not even that likely. You'll have to first define why it wants to accomplish something if you expect to get anywhere with the rest of your speculation.

Artanthos t1_j9z1p6b wrote on February 25, 2023 at 4:50 PM

You assume AGI will be sentient, possess free will, a be hostile, and have access to the tools and resources to act on that hostility.

That’s a lot of assumptions.

I would be far more worried about an alignment issue and having everything converted into paperclips.

DadSnare t1_j9wsnoh wrote on February 25, 2023 at 3:00 AM

Let’s get more concrete. Regarding the first point of your argument, what would be an example of something AGI would want to do (and a good argument for why) that isn’t the second point, “to maintain a state of existence to accomplish things;” a human existential idea? We aren’t immortal, but it easily could be, and perhaps that distinction as a tangible possibility between the two intelligences is the thing that makes a lot of people uncomfortable. Now why would it want to destroy us on its own? Why would we want to turn it off?

nillouise t1_j9wx538 wrote on February 25, 2023 at 3:38 AM

Most people would think AGI will develop some fancy tech to kill human, an engineered pathogen or nanobot, but in fact, how human domain some area, then agi can use the same way to do it. Like recruit followers, invade some area and ask people to service to it like the human ruler do. In fact, I think develop fancy science tool is the most hard to get rid of human control, and recruit some human to beat and control the other human is a more funny and feasible plot.

No_Ninja3309_NoNoYes t1_j9yarqg wrote on February 25, 2023 at 1:26 PM

I don't think AGI will arrive before 2040. It could in theory, but if you extrapolate all the known data points, it's not likely. First, in terms of parameters, which is not the best of metrics, we are nowhere near the complexity of the human brain. Second, AI models currently are too static to be accepted as candidates of AGI.

Your reasoning reads as: 'we created a monster. The monster is afraid of us, so it kills us.' You can also say the opposite. People were afraid of Frankenstein's monster, so they killed him.

Prometheus stole fire from the gods and was punished for it. OpenAI brought us ChatGPT and one day they will burn for it too. AGI/ASI either is a threat and smarter than us or it isn't. If it is both, they could decide to prevent being attacked. But as I said it would take decades to reach that point. And we might figure out in the future how to convince AGI/ASI that we're mostly harmless.

_gr4m_ t1_ja6ogx0 wrote on February 27, 2023 at 6:09 AM

A few years ago I was out walking in the rain with my then 3-year old daughter.

Suddenly I saw her start skip down the road. Curiosly I asked her what she was doing. "I don’t want to step on the worms, it might hurt them!"

I fail to understand how a superintelligence can not figure out something that my 3 year old can understand with ease.

16161as t1_j9y2i0s wrote on February 25, 2023 at 11:53 AM

It is the history of the Earth that the dominant species is constantly replaced. It's only a moment of time that humanity have ruled this planet - given the entire history of life.

It's a natural flow. i don't care.

[deleted] t1_ja1xyr2 wrote on February 26, 2023 at 5:57 AM

[deleted]