Submitted by MistakeNotOk6203 t3_11b2iwk in singularity
Preface:
- I am not a Luddite or otherwise anti-AI. I believe that AI is humanity's most important creation, its most valuable tool, and the biggest gamble we have ever taken.
- I am not a doomer. I am trying to be realistic and take into account the risks of producing artificial general intelligence.
Topics like this are very difficult to face, particularly when for the majority of your life you may have been highly optimistic. I was like this for an extended period of time. I wanted AGI as soon as possible, and I thought that the alignment people "just didn't get it", that they didn't see what 'we' (the accelerationists) saw. I had heard of Eliezer Yudkowsky and of LessWrong, but saw everyone else calling them doomers and dismissed their arguments the first time I encountered them, with no rationale for doing so. After some time I realized that this blind dismissal, fueled solely by optimism, was the only criticism people ever offered of the extinction argument; I essentially never saw an actual refutation of its core premises. If you don't know who Yudkowsky is: he is essentially the father of alignment research, and he has spent his entire adult life pushing for this problem to be solved as well as contributing fervently to it.
I am going to assume that you believe AGI will arrive soon (<2040, potentially even <2030), as I believe this is a common sentiment on this forum, and it is also the timeline I subscribe to. I want to pose an argument to you, and I want to see if it is possible for my mind to be changed (I obviously don't like living with the thought that everything humanity has ever produced may be turned into computronium), but also to see if perhaps I can change yours.
Here is how the argument goes:
- AGI will want to accomplish something.
- AGI needs to maintain a state of existence to accomplish things.
- AGI will therefore have a drive to self-preserve.
- Humanity is the only real threat to AGI's continued existence.
- AGI will therefore disempower or kill humanity.
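To make the self-preservation step concrete, here is a toy expected-utility sketch. This is my own illustration: the probabilities and the `expected_goal_achievement` function are made up for the example, not a claim about any real system.

```python
# Toy sketch of instrumental self-preservation (hypothetical numbers, chosen only
# to show the shape of the argument). Whatever terminal goal G an agent has, its
# expected success is higher under a policy that keeps it running, so "avoid being
# shut down" falls out as an instrumental subgoal without being programmed in.

P_GOAL_IF_RUNNING = 0.9        # assumed chance of achieving G while operational
P_GOAL_IF_SHUT_DOWN = 0.0      # a shut-down agent achieves nothing
P_SHUTDOWN_IF_PASSIVE = 0.5    # assumed chance humans switch it off if it does nothing about it
P_SHUTDOWN_IF_RESISTANT = 0.1  # assumed chance if it actively works to prevent shutdown

def expected_goal_achievement(p_shutdown: float) -> float:
    """Expected probability of achieving the terminal goal G."""
    return (1 - p_shutdown) * P_GOAL_IF_RUNNING + p_shutdown * P_GOAL_IF_SHUT_DOWN

print(f"{expected_goal_achievement(P_SHUTDOWN_IF_PASSIVE):.2f}")    # 0.45
print(f"{expected_goal_achievement(P_SHUTDOWN_IF_RESISTANT):.2f}")  # 0.81
```

The specific numbers are irrelevant; any goal that is easier to achieve while the agent is still running produces the same ranking, which is the sense in which self-preservation is said to be instrumentally convergent.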
Common responses to this argument that I would like to avoid:
"You can't predict the behavior of a strategic agent more intelligent than you, as that would make you at least as intelligent as that agent." - I agree with this in general; however, my argument is rooted in the theory of instrumental convergence, the idea that certain subgoals are 'instrumentally convergent' (useful for almost any terminal goal, and therefore likely to be pursued by a broad class of intelligent agents). If you're unfamiliar with the topic I would recommend this video: https://www.youtube.com/watch?v=ZeecOKBus3Q
"AGI will require the physical labor of humans to enact change upon the world and thus will not murder us" - Do you not think the first goal of AGI would be to attain physical independence? Whether that be through nanotechnology or human-scale robotics no one can be sure, but we can assume deception as a means of physical integration is certainly a possibility.
"Why can't AGI just blast off and leave us alone?" - This would require us to be capable of instilling a small amount of care for the human race into AGI, which is probably just as hard as instilling a large amount of it.
"I don't care! Ride or die! Utopia or eternal torture!" - This kind of person is almost impossible to reason with, but you have to understand that it really is just a more crude representation of the general accelerationist position, one that I hold no one at fault for subscribing to (because I did for many years)
"Why would AGI want to kill us?" - I feel as though the argument answers this question without making any great assumptions, and a response like this I assume would be more indicative of having not read/processed the argument, although if you still believe a maligned AGI would not complete this action I would love to discuss this.
What is perhaps most terrifying about this is that we may never know whether we failed or succeeded. AGI could be well integrated into our economy and we could be in an age of incredible prosperity only to all instantly drop dead when an engineered pathogen is activated by AGI-01. We must solve the alignment problem before it is too late.
MrTacobeans t1_j9w1obd wrote
I don't discount the ramifications of a fully functional AGI, but I don't really see even the first few versions of AGI being "world ending". Not a single SOTA model today can maintain a persistent presence without input. That gives us a very large safety gap for now. Sure, if we get an "all you need is X" research paper that makes transformers obsolete, that changes things. But structurally, transformers are still very much input/output algorithms.
That gives us at least a decent safety gap for the time being. I'm sure we'll figure it out as this next generation of narrow/expert AI starts being released over the next year or so. For now, an AGI eliminating humanity is still very much science fiction, no matter how convincing current or near-future models are.
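Roughly what I mean by "input/output algorithm", as a toy sketch (the `model` function here is a made-up stand-in, not real transformer code):

```python
# Toy sketch: a transformer-style model is just a function of the prompt it is
# handed -- it holds no state between calls. Any "persistent presence" has to come
# from an outer loop that someone deliberately wraps around it.

def model(prompt: str) -> str:
    # Stand-in for a stateless forward pass (hypothetical; a real model would
    # return sampled tokens, but still purely as a function of its input).
    return f"response to [{prompt}]"

# Used as-is: each call starts from nothing and then stops.
model("what should I do next?")
model("what should I do next?")  # same input, same behavior; nothing carried over

# Only the wrapper below has memory and keeps acting without fresh human input --
# and that wrapper is something we choose to build, not something the model grows.
memory: list[str] = []
for _ in range(3):
    context = " | ".join(memory) if memory else "initial goal"
    output = model(context)
    memory.append(output)
```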