ProTo: Program-Guided Transformer for Program-Guided Tasks

Programs, which carry both semantic and structural information, play an important role in communication between humans and agents. Towards learning general program executors that unify perception, reasoning, and decision making, we formulate program-guided tasks, which require learning to execute a given program on the observed task specification. Furthermore, we propose the Program-guided Transformer (ProTo), which integrates both the semantic and the structural guidance of a program by using cross-attention and masked self-attention to pass messages between the specification and the routines in the program. ProTo executes a program in a learned latent space and has stronger representational power than previous neural-symbolic approaches. We demonstrate that ProTo significantly outperforms the previous state-of-the-art methods on the GQA visual reasoning and 2D Minecraft policy learning datasets. Additionally, ProTo generalizes better to unseen, complex, and human-written programs.
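To make the message-passing idea concrete, here is a minimal NumPy sketch of one program-guided layer. It is an illustrative assumption, not ProTo's actual implementation. Routine embeddings first read from the task specification through cross-attention (semantic guidance), then exchange messages with each other through self-attention restricted by a boolean mask derived from the program's structure (structural guidance). The function names (`attention`, `program_guided_layer`), the residual updates, and the example mask (a hypothetical three-routine chain where each routine sees itself and its predecessor) are all hypothetical. Learned projections, multiple heads, feed-forward blocks, and normalization are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # Scaled dot-product attention; mask[i, j] == True means query i may attend to key j.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked pairs get ~zero weight
    return softmax(scores) @ v

def program_guided_layer(routines, spec, struct_mask):
    # Hypothetical layer: semantic guidance, then structural guidance.
    # 1) Cross-attention: each routine gathers information from the specification.
    routines = routines + attention(routines, spec, spec)
    # 2) Masked self-attention: routines pass messages only along allowed program edges.
    routines = routines + attention(routines, routines, routines, mask=struct_mask)
    return routines

# Usage with random features (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
d = 8
spec = rng.standard_normal((5, d))      # e.g. 5 specification tokens (image regions, map cells)
routines = rng.standard_normal((3, d))  # 3 program routines
mask = np.array([[True, False, False],  # routine 0 sees only itself
                 [True, True, False],   # routine 1 sees routine 0 and itself
                 [False, True, True]])  # routine 2 sees routine 1 and itself
out = program_guided_layer(routines, spec, mask)
print(out.shape)  # (3, 8)
```

Because the mask blocks routine 0 from attending to later routines, its output is unaffected by changes to them. This is the sense in which the program's structure, rather than full self-attention, governs how information flows between routines in this sketch.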