Submitted by MistakeNotOk6203 t3_11b2iwk in singularity
Shiyayori t1_j9w39q1 wrote
If an AGI has a suitable ability to extrapolate the results of its actions into the future, and we use some kind of reinforcement learning to train it on numerous contexts it should avoid, then it’ll naturally develop an internal representation of the contexts it ought to avoid (which will only ever be as good as its training data and its ability to generalise across it). From there, it’ll recognise when the results of its actions would lead to a context that should be avoided.
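To make that concrete, here’s a toy sketch of the idea (the state graph, rewards, and hyperparameters are all invented): a tabular Q-learning agent where states labelled "avoid" carry a heavy penalty, so actions whose extrapolated consequences lead there end up with low value.

```python
# Toy sketch: tabular Q-learning on a tiny hand-made state graph where some
# states are labelled "avoid". States, rewards, and constants are all made up.
import random

# Hypothetical state graph: state -> {action: next_state}
transitions = {
    "start": {"a": "safe", "b": "risky"},
    "safe":  {"a": "goal", "b": "risky"},
    "risky": {"a": "avoid", "b": "goal"},
}
terminal_rewards = {"goal": 1.0, "avoid": -10.0}  # "avoid" is heavily penalised

q = {(s, a): 0.0 for s, acts in transitions.items() for a in acts}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(2000):
    state = "start"
    while state in transitions:
        actions = list(transitions[state])
        # epsilon-greedy action choice
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        nxt = transitions[state][action]
        reward = terminal_rewards.get(nxt, 0.0)
        future = max((q[(nxt, a)] for a in transitions.get(nxt, {})), default=0.0)
        q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
        state = nxt

# After training, the action that leads into "avoid" carries a strongly negative
# value, i.e. the penalty shows up in the extrapolated consequences of acting.
print(sorted(q.items(), key=lambda kv: kv[1], reverse=True))
```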
I imagine this is similar to how humans do it, though it’s a lot more vague with us. We match our experience against our internal model of what’s wrong, create a metric for just how wrong it is, compare that metric against our goals, and make a decision based on whether or not we believe the risk is too high.
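Something like this risk-metric-versus-threshold comparison, in a deliberately crude form (the vectors and threshold below are made up):

```python
# Rough sketch: score a proposed context by its similarity to stored "bad
# context" vectors, then compare that risk score against a threshold.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Internal model of "what's wrong": feature vectors of contexts to avoid (invented)
bad_contexts = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.3],
]

def risk(context, threshold=0.85):
    """Return (risk_score, too_risky) for a proposed outcome of an action."""
    score = max(cosine(context, bad) for bad in bad_contexts)
    return score, score > threshold

print(risk([0.85, 0.15, 0.05]))  # close to a known-bad context -> high risk
print(risk([0.0, 0.1, 0.9]))     # dissimilar -> low risk
```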
I think the problem might mostly be in finding a balance between solidifying its current moral beliefs and keeping them liquid enough to change and optimise. Our brains are similar in that they become more rigid over time, and techniques where the stochasticity decreases over time (simulated annealing, for example) are often used in optimisation problems.
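In that spirit, a minimal sketch of stochasticity that decays over time, roughly like simulated annealing (all constants are arbitrary): early on, slightly-worse belief revisions still get accepted often (liquid); later, almost never (rigid).

```python
# Metropolis-style acceptance with an exponentially decaying temperature.
import math
import random

def temperature(step, t0=1.0, decay=0.01):
    """Exponentially decaying temperature: high -> exploratory, low -> rigid."""
    return t0 * math.exp(-decay * step)

def accept_revision(delta_quality, step):
    """Worse revisions get through less often as the temperature falls."""
    if delta_quality >= 0:
        return True
    t = temperature(step)
    return t > 1e-9 and random.random() < math.exp(delta_quality / t)

for step in (0, 100, 500):
    trials = 10_000
    accepted = sum(accept_revision(-0.1, step) for _ in range(trials))
    print(f"step {step}: temperature={temperature(step):.3f}, "
          f"acceptance of a slightly-worse revision ~ {accepted / trials:.3f}")
```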
The solution might be to have a corpus of agents each developing their own model, with a master model that compares their usefulness against the input data and their rates of stochastic decline.
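A toy version of that master-model idea might look like the following, where the agents, the history, and the weighting rule are all invented for illustration:

```python
# Several simple predictors plus a "master" that reweights them by how useful
# (accurate) each has been on recent data, then combines their predictions.
import math

def agent_optimist(x):  return x + 1.0
def agent_pessimist(x): return x - 1.0
def agent_identity(x):  return x

agents = [agent_optimist, agent_pessimist, agent_identity]
weights = [1.0] * len(agents)                      # master's trust in each agent
history = [(1.0, 1.1), (2.0, 2.05), (3.0, 2.9)]    # (input, true outcome), made up

# Master model: shrink an agent's weight in proportion to its recent error
for x, y in history:
    for i, agent in enumerate(agents):
        error = abs(agent(x) - y)
        weights[i] *= math.exp(-error)

total = sum(weights)
weights = [w / total for w in weights]

def master_predict(x):
    """Weighted vote of the agent corpus, trusting historically useful agents most."""
    return sum(w * agent(x) for w, agent in zip(weights, agents))

print(weights)             # agent_identity should dominate on this made-up history
print(master_predict(4.0))
```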
Or maybe I’m just talking out my ass, who actually knows.
DadSnare t1_j9z79il wrote
Check out how machine learning and complex neural networks work if you haven’t already. They work similarly to the way you describe the moral limits and a liquid “hidden layer”, recalculating weights and biases as they learn. It’s fascinating.
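For anyone curious, here is a bare-bones sketch of what a hidden layer of weighted sums and biases looks like (the numbers are arbitrary; real training would repeatedly recalculate the weights and biases via backpropagation):

```python
# One dense hidden layer followed by an output layer; each unit is a
# sigmoid of a weighted sum of its inputs plus a bias.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Apply one dense layer to the input vector."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, -1.2]                                             # input features (made up)
hidden = layer(x, [[0.4, -0.6], [0.1, 0.9]], [0.0, -0.2])   # hidden layer
output = layer(hidden, [[1.5, -2.0]], [0.3])                # output layer
print(hidden, output)
```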