Desiderata for a Model of Human Values
Soares (2015) defines the value learning problem as: "By what methods could an intelligent machine be constructed to reliably learn what to value and to act as its operators intended?" There have been a few attempts to formalize this question. Dewey (2011) started from the notion of building an AI that maximizes a given utility function, and then went on to suggest that a value learner should instead maintain uncertainty over utility functions, taking "the action with the highest expected value, calculated by a weighted average over the agent's pool of possible utility functions." This is a reasonable starting point, but a very general one: in particular, it gives us no criteria by which we or the AI could judge the correctness of a utility function that it is considering. To improve on Dewey's definition, we would…
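To make the quoted decision rule concrete, here is a minimal sketch in our own notation (the symbols $A$, $U_i$, and $w_i$ are ours, not Dewey's; his actual formalism is stated over interaction histories rather than single actions): an agent holding a pool of candidate utility functions $U_1, \dots, U_n$ with credences $w_1, \dots, w_n$, where $\sum_i w_i = 1$ and the $w_i$ are updated as the agent gathers evidence about its operators' values, would choose

\[
a^{*} \;=\; \operatorname*{arg\,max}_{a \in A} \; \sum_{i=1}^{n} w_i \, \mathbb{E}\left[ U_i \mid a \right].
\]

Written out this way, the generality noted above is visible directly: the rule specifies how to act given a pool of utility functions and weights over them, but says nothing about where that pool comes from or how to judge whether any $U_i$ actually captures what the operators intended.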
Link to Full Article: Desiderata for a Model of Human Values