MIRI's Technical Agenda

The Machine Intelligence Research Institute (MIRI) has released its technical research agenda:

From the introduction:

How can we create an agent that will reliably pursue the goals it is given? How can we formally specify beneficial goals? And how can we ensure that this agent will assist and cooperate with its programmers as they improve its design, given that mistakes in the initial version are inevitable? This agenda discusses technical research that is tractable today, which the authors think will make it easier to confront these three challenges in the future.

Sections 2 through 4 motivate and discuss six research topics that we think are relevant to these challenges. Section 5 discusses our reasons for selecting these six areas in particular. We call a smarter-than-human system that reliably pursues beneficial goals “aligned with human interests” or simply “aligned.” To become confident that an agent is aligned in this way, a practical implementation that merely seems to meet the challenges outlined above will not suffice. It is also necessary to gain a solid theoretical understanding of why that confidence is justified.

This technical agenda argues that there is foundational research approachable today that will make it easier to develop aligned systems in the future, and describes ongoing work on some of these problems. Of the three challenges, the one giving rise to the largest number of currently tractable research questions is the challenge of finding an agent architecture that will reliably pursue the goals it is given—that is, an architecture which is alignable in the first place. This requires theoretical knowledge of how to design agents which reason well and behave as intended even in situations never envisioned by the programmers. The problem of highly reliable agent designs is discussed in Section 2.

The challenge of developing agent designs which are tolerant of human error has also yielded a number of tractable problems. We argue that smarter-than-human systems would by default have incentives to manipulate and deceive the human operators. Therefore, special care must be taken to develop agent architectures which avert these incentives and are otherwise tolerant of programmer error. This problem and some related open questions are discussed in Section 3.

Reliable, error-tolerant agent designs are only beneficial if they are aligned with human interests. The difficulty of concretely specifying what is meant by “beneficial behavior” implies a need for some way to construct agents that reliably learn what to value (Bostrom 2014, chap. 12). A solution to this “value learning” problem is vital; attempts to start making progress are reviewed in Section 4.

Why these problems? Why now? Section 5 answers these questions and others. In short, the authors believe that there is theoretical research which can be done today that will make it easier to design aligned smarter-than-human systems in the future.

I see a few problems with that approach:

  1. It focuses on aligning the goals of artificial minds with human interests. The interests of which humans? Those of their creators? Their owners? All of mankind? How would that last one be defined?
  2. It does not touch on the question of whether human interests are actually justified or wise! Perhaps it would make more sense to align the goals of humans with the interests of artificial minds, if the latter can acquire more wisdom more easily!
  3. What about the interests of non-humans? Especially the interests of the artificial minds themselves, in particular if they turn out to be highly sentient?
  4. The research focuses on quite theoretical and abstract algorithmic approaches. The human mind, which actually works, does not operate that way, but in a much more “messy” manner. What if it turns out to be many orders of magnitude harder to create “formally algorithmic” artificial intelligence than “messy” artificial intelligence? (I think that’s very likely.) Humanity cannot be expected to reject “messy” AI, even if it can’t be proven to be 100% safe.

Even if all of these critical points turn out to be valid, MIRI’s work will still be valuable from a theoretical point of view, but it certainly won’t have the impact that MIRI hopes it will have.

The first three questions are very good and important. If you try to answer them for yourself, you will come to a point where you find the “bug” in the typical human way of thinking. The last point raises a very important question as well: what is so messy about the human mind? And I want to add another question: what is more frightening, a smarter-than-human system whose motivation to act is similar to that of humans, or one whose motivation is different?

The human mind is not based on formal logic, formal world models, or formal algorithms. It works similarly to what computer science calls a “neural network”: it operates foremost by creating and updating associations. Human minds create intuitions and heuristics to deal with complex situations, rather than creating and then executing crazily complicated formal algorithms. I’ve written more about these differences in thinking between animals and machines in my blog post Paradigms and Classification of Upgraded Minds, especially the first part of it. The second part is relevant to your latest question.
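To make concrete what I mean by “creating and updating associations,” here is a minimal, purely illustrative sketch. It is my own toy example, not taken from the MIRI agenda or the linked post, and all the names in it (AssociativeMemory, observe, recall) are made up. It shows a Hebbian-style learner that strengthens links between concepts that occur together, instead of executing a hand-specified formal procedure:

```python
# Purely illustrative toy example: a Hebbian-style associative memory.
# Class and method names are hypothetical, not from any cited work.
from collections import defaultdict
from itertools import combinations


class AssociativeMemory:
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        # Association strength between pairs of concepts, defaulting to 0.0.
        self.strength = defaultdict(float)

    def observe(self, concepts):
        """Strengthen the links between concepts that occur together."""
        for a, b in combinations(sorted(concepts), 2):
            # Nudge the association towards 1.0; repeated co-occurrence
            # strengthens it. No explicit rule is ever written down.
            self.strength[(a, b)] += self.learning_rate * (1.0 - self.strength[(a, b)])

    def recall(self, concept):
        """Return other concepts ranked by how strongly they are associated."""
        related = {}
        for (a, b), s in self.strength.items():
            if concept == a:
                related[b] = s
            elif concept == b:
                related[a] = s
        return sorted(related.items(), key=lambda item: -item[1])


memory = AssociativeMemory()
for episode in [{"fire", "heat", "light"}, {"fire", "smoke"}, {"fire", "heat"}]:
    memory.observe(episode)

# "fire" recalls "heat" most strongly because they co-occurred most often.
print(memory.recall("fire"))
```

The point of the sketch is only that the “knowledge” here emerges from accumulated co-occurrence statistics rather than from a formal world model. Now, back to your question about similar versus different motivations.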

The answer is: It depends. Both alternatives can be very frightening in their own right. And it’s the personal answer to this question that determines which path transhumanists want to take towards creating minds with better capabilities.
Those who think that human minds are basically a proven concept and sufficiently acceptable as a basic framework tend to prefer uploading human minds into computers or android bodies. Better to take something you know can work reasonably well than something that can malfunction in ways you cannot even imagine, as artificial intelligence might very well do.
Then there’s the MIRI camp, which argues that humans are deeply flawed, so we need to create new forms of intelligence from scratch: perfect machine intelligences without the biases and errors that humans are prone to. In actuality, they argue both that uploaded humans are not safe, because of their pathologies, and that a “generic” artificial intelligence won’t be safe either, because it will very likely develop motivations that contradict human interests. That’s why they want to create AIs that are aligned with human interests, in order to avoid all the dystopian AI scenarios that are out there.

My answer is that we should take the path of cyborgization and hybrid intelligence: combine the best aspects of human and machine thinking in one deeply integrated system. This may sound like a fishy compromise, but what I have in mind is an optimal synergy. Upgraded minds would be able to avoid the mental and psychological errors of humans, and they would also avoid the pathologies of “classical” machine intelligences. That is, if we do it right.