A truly autonomous agent does not live in a decision process with a single task and a fixed environment;
it receives neither a clearly specified task nor well-shaped reward signals. It is left to figure out on its
own how to prepare for unknown future challenges, ideally in an open-ended learning fashion.
In this work we equip an RL agent with different abilities that support this self-organized learning
process and make it efficient. The goal is to have an agent that explores its environment and thereby
figures out how to solve a number of tasks, each of which requires it to manipulate a different part of
the state space. Once the agent has learned all tasks sufficiently well, we can ask it to solve a particular
task by manipulating the corresponding part of the state space until a desired goal state is reached. In
the end, the agent should be capable of controlling all controllable parts of its state.
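To fix ideas, here is a minimal sketch of this goal-reaching setup, assuming a factored state vector in which each task corresponds to a fixed subset of state dimensions. The mapping `TASK_DIMS`, the tolerance `eps`, and the `env`/`policy` interfaces are illustrative assumptions, not the actual implementation.

```python
import numpy as np

# Hypothetical mapping from a task index to the state dimensions
# that this task manipulates (an assumption for illustration).
TASK_DIMS = {0: [0, 1], 1: [2, 3, 4]}

def goal_reached(state, goal, task, eps=0.05):
    """A task counts as solved once the task-relevant slice of the
    state is within `eps` of the desired goal configuration."""
    dims = TASK_DIMS[task]
    return np.linalg.norm(state[dims] - goal) < eps

def attempt(env, policy, task, goal, max_steps=500):
    """Roll out a goal-conditioned policy until the goal state of the
    requested task is reached or the step budget is exhausted."""
    state = env.reset()
    for _ in range(max_steps):
        state, _, done, _ = env.step(policy(state, goal, task))
        if goal_reached(state, goal, task):
            return True
        if done:
            break
    return False
```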
The main abilities we equip the agent with are (compare with the figure):

- (3) A task selector that allows the agent to distribute its available resource budget among all possible tasks it could learn (1), such that, at any given time, most of the budget is spent on the tasks in which the agent can currently make the most learning progress.
- (4) A task planner that learns potential dependencies between tasks, i.e. whether one task can be solved faster, or at all, only after some other task has been solved first.
- (5) A dependency graph that the agent uses to plan sub-task sequences that allow it to solve a final desired task.
- (6) A sub-goal generator that, for each sub-task in the plan, generates a goal state which, once reached by the agent, makes it easier to solve the next sub-task.

All components are learned concurrently from an intrinsic motivation signal (2) that is computed from the experience the agent collects while autonomously interacting with the environment (7, 8). A sketch of how the task selector and the planner could interact is given below.
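To make the interplay of the task selector (3), the intrinsic motivation signal (2), and the planning components (4, 5) more concrete, the following is a rough sketch rather than the paper's implementation: learning progress is approximated as the difference between a fast and a slow exponentially smoothed success rate per task, the selector samples tasks in proportion to that progress, and the planner backward-chains through the learned dependency graph. All names and constants are illustrative assumptions.

```python
import random
from collections import defaultdict

class TaskSelector:
    """Bandit-style budget allocation over tasks, driven by an
    assumed learning-progress signal (a sketch, not the paper's code)."""

    def __init__(self, tasks, alpha=0.1, eps=0.1):
        self.tasks = list(tasks)
        self.alpha = alpha               # smoothing rate for competence
        self.eps = eps                   # exploration floor so no task starves
        self.fast = defaultdict(float)   # fast-moving success estimate
        self.slow = defaultdict(float)   # slow-moving success estimate

    def update(self, task, success):
        # Two exponential averages of task success; their difference
        # approximates the derivative of competence, i.e. learning progress.
        self.fast[task] += self.alpha * (float(success) - self.fast[task])
        self.slow[task] += 0.5 * self.alpha * (float(success) - self.slow[task])

    def progress(self, task):
        return abs(self.fast[task] - self.slow[task])

    def select(self):
        # Mostly sample in proportion to learning progress,
        # occasionally uniformly at random.
        if random.random() < self.eps:
            return random.choice(self.tasks)
        weights = [self.progress(t) + 1e-6 for t in self.tasks]
        return random.choices(self.tasks, weights=weights)[0]

def plan(dependencies, target):
    """Backward-chain through a learned (assumed acyclic) dependency
    graph: `dependencies[t]` lists tasks that must be solved before t.
    Returns a sub-task sequence ending in the desired target task."""
    order, seen = [], set()

    def visit(task):
        if task in seen:
            return
        seen.add(task)
        for dep in dependencies.get(task, ()):
            visit(dep)
        order.append(task)

    visit(target)
    return order
```

In a training loop, `select()` would pick the next practice task and the outcome of every attempt would be fed back via `update()`; `plan()` would only be called once a final desired task is requested, with the sub-goal generator (6) then proposing a concrete goal state for each sub-task in the returned sequence.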
The code, the poster presented at NeurIPS 2019, and a 3-minute summary video can be found here.