An AI Safety Threat from Learned Planning Models


Historically, planning problems have often been constructed by hand, with the domain model and the goal developed together. As a result, the model and the goal were in harmony, in the sense that the goal described exactly which parts of the modelled state were to be changed (and which were not) by executing the plan. With models learned from data, human goal specifiers may not know every aspect of the model, nor have spent much time thinking about the real-world situation being modelled. Naive users may also expect the automated planning system to interpret the goals they specify in a commonsensical way. These factors may lead human goal specifiers to create incomplete goal specifications more often, failing to account for all the ways the environment can be changed, that is, the potential side effects of plans. This could threaten safety. However, learned models may in some cases also have detailed state representations, giving symbolic planning algorithms the opportunity to recognize side effects that their human users did not think of, and to help avoid them. In this position paper, we propose that researchers in symbolic planning take up the challenge of developing planning algorithms that can deal safely with underspecified objectives, i.e., with problem goals that fail to specify everything that people want.

ICAPS Workshop on Reliable Data-Driven Planning and Scheduling