Let me just elaborate on the ‘complex motivations’ idea, because I certainly think that ‘orthogonality’ is the weak point in the AGI doomsday story.
Orthogonality is defined by Bostrom as the postulate that a super-intelligence can have nearly any goal whatsoever. Here is a short argument as to why ‘orthogonality’ may be false:
In so far as an AGI has a precisely defined goal, it is likely that the AGI cannot be super-intelligent. The reason is that there is always a certain irreducible amount of fuzziness or ambiguity in the definition of some types of concepts (‘non-trivial’ concepts associated with values have no exact, necessary definitions). Let us call these fuzzy concepts (or f-concepts).
Now imagine that you are trying to specify precisely the goals that you want an AGI to pursue, but it turns out that for certain goals there is an unavoidable trade-off: trying to increase the precision of the definitions reduces the cognitive power of the AGI. This is because non-trivial goals need the aforementioned ‘f-concepts’, and you can’t define these precisely without oversimplifying them.
The only way to deal with f-concepts is by using a ‘concept cloud’ – instead of a single crisp definition, you would need to have a ‘cloud’ or ‘cluster’ of multiple slightly different definitions, and it’s the totality of all these that specifies the goals of the AGI.
So, for example, such an f-concept f would need a whole set of slightly differing definitions d:

f = {d1, d2, d3, d4, d5, d6, …}
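To make the ‘concept cloud’ idea concrete, here is a minimal toy sketch in Python. All the names (FuzzyConcept, the little ‘harm’ definitions) are hypothetical illustrations of the idea, not anything from Bostrom or an actual AGI design: the f-concept is represented not by one crisp predicate but by a collection of slightly different candidate definitions, each returning its own verdict about a situation.

```python
# Toy sketch (all names hypothetical): an f-concept as a "concept cloud" --
# a collection of slightly different candidate definitions rather than one
# crisp predicate.
from typing import Callable, List

Definition = Callable[[str], bool]  # one candidate definition: situation -> verdict


class FuzzyConcept:
    """A fuzzy concept is specified by the totality of its definitions."""

    def __init__(self, name: str, definitions: List[Definition]):
        self.name = name
        self.definitions = definitions

    def verdicts(self, situation: str) -> List[bool]:
        """Ask every candidate definition for its (possibly conflicting) verdict."""
        return [d(situation) for d in self.definitions]


# Example: "harm" as a cloud of slightly differing definitions d1, d2, d3
harm = FuzzyConcept("harm", [
    lambda s: "injury" in s,    # d1: physical damage
    lambda s: "distress" in s,  # d2: psychological damage
    lambda s: "loss" in s,      # d3: material damage
])

print(harm.verdicts("emotional distress after financial loss"))  # [False, True, True]
```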
But now the AGI needs a way to integrate all the slightly conflicting definitions into a single coherent set. Let us designate the methods that do this as <integration-methods>.
But finding better <integration-methods> is itself an instrumental goal (needed in the service of whatever other goals the AGI has). So, unavoidably, extra goals must emerge to handle these f-concepts, in addition to whatever original goals the programmer was trying to specify. And if these ‘extra’ goals conflict too badly with the original ones, then the AGI will be cognitively handicapped.
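Continuing the toy sketch (again purely hypothetical names, not a real design), an <integration-method> could be something as simple as a weighted vote over the cloud’s conflicting verdicts. The point of the illustration is that the weights are an extra adjustable layer sitting alongside whatever goal originally referred to the f-concept, which is the sense in which ‘extra’ goals emerge.

```python
# Toy sketch of one possible <integration-method> (hypothetical): a weighted
# vote that reconciles the cloud's conflicting verdicts into a single judgement.
# Choosing and revising the weights is a separate objective from the original
# goal that merely *used* the f-concept.
from typing import List


def integrate(verdicts: List[bool], weights: List[float], threshold: float = 0.5) -> bool:
    """Weighted vote over the cloud's definitions."""
    score = sum(w for v, w in zip(verdicts, weights) if v) / sum(weights)
    return score >= threshold


# The original goal ("avoid harm") now depends on this extra, revisable layer:
weights = [0.5, 0.3, 0.2]  # how much each definition counts -- itself up for revision
print(integrate([False, True, True], weights))  # True: the integrated concept applies
```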
This falsifies orthogonality: f-concepts can only be handled via the emergence of additional goals to perform the internal conflict-resolution procedures that integrate the multiple differing definitions of goals in a ‘concept cloud’.
In so far as an AGI has goals that can be precisely specified, orthogonality is trivially true, but such an AGI probably can’t become super-intelligent. It’s cognitively handicapped.
In so far as an AGI has fuzzy goals, it can become super-intelligent, but orthogonality is likely falsified, because ‘extra’ goals need to emerge to handle ‘conflict resolution’ and integration of multiple differing definitions in the concept cloud.
All of this just confirms that goal-drift of our future descendants is unavoidable. The irony is that this is the very reason why ‘orthogonality’ may be false.