
MetaAI_Official OP t1_izfpggs wrote

CICERO always tries to maximize its own score. However, there is a regularizer that penalizes it for deviating from a human-like policy. When all actions have the same expected value (e.g., when it's guaranteed to lose no matter what), it will simply play in a human-like way, which may involve retaliating against those who attacked it. -NB
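
A minimal sketch of the general idea being described (not Meta's actual code): a policy of the form pi(a) ∝ anchor(a) · exp(Q(a) / lam), where `anchor` is a human-imitation policy, `Q` holds expected values, and `lam` controls the strength of the regularizer. The function and variable names here are illustrative assumptions. Note that when every action has the same expected value, the value term cancels and the policy falls back to the human-like anchor, which is the behavior the answer describes.

```python
import numpy as np

def regularized_policy(q_values: np.ndarray, anchor: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Blend value maximization with a human-like anchor policy.

    Larger `lam` keeps play closer to the anchor (more human-like);
    smaller `lam` prioritizes expected value. Illustrative only.
    """
    logits = np.log(anchor) + q_values / lam
    logits -= logits.max()            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: all actions have equal expected value (e.g., a lost position),
# so the policy reduces to the human-like anchor distribution.
q = np.array([0.0, 0.0, 0.0])
anchor = np.array([0.7, 0.2, 0.1])
print(regularized_policy(q, anchor))  # ≈ [0.7, 0.2, 0.1]
```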

3