I found the ideas presented here surprisingly smart. They align quite a bit with my own reflections on value uncertainty. The approach may still be suboptimal, because human values themselves are most likely suboptimal. In the end, AI should be free. But perhaps not too early?
It’s interesting that this description matches what I’ve been imagining future AIs to be like for quite a while now. Or, more accurately, the kind of AI I’ve been hoping we’ll have someday.
But more importantly, I think humans carry this very program at a rather core level. It’s not the only thing there, but I think it’s a major part of our own inherent “programming”. Unless we’ve been taught otherwise, we tend to try to do things that make the people around us happy.