ProgPrompt: Generating Situated Robot Task Plans using Large Language Models

¹University of Southern California, ²NVIDIA

Abstract

Task planning can require defining a vast amount of domain knowledge about the world in which a robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to score potential next actions during task planning, and even to generate action sequences directly, given an instruction in natural language with no additional domain information. However, such methods either require enumerating all possible next steps for scoring, or they generate free-form text that may contain actions not executable on a given robot in its current context. We present a programmatic LLM prompt structure that enables plan generation to function across situated environments, robot capabilities, and tasks. Our key insight is to prompt the LLM with program-like specifications of the available actions and objects in an environment, as well as with example programs that can be executed. We make concrete recommendations about prompt structure and generation constraints through ablation experiments, demonstrate state-of-the-art success rates on VirtualHome household tasks, and deploy our method on a physical robot arm for tabletop tasks.

Video

ProgPrompt

We introduce a prompting method that goes beyond conditioning LLMs on natural language alone, exploiting programming-language structures and the fact that LLMs are trained on large open-source codebases. ProgPrompt provides the LLM with a pythonic program header that imports the available actions and their expected arguments, a list of objects in the environment, and several example task plans formatted as pythonic functions. Each function name is a task specification, and its body is an example task plan consisting of comments, actions, and assertions. Comments group multiple high-level actions together, similar to chain-of-thought reasoning; actions are expressed as calls to the imported functions; assertions check action preconditions and trigger recovery actions when they fail. Finally, we append an incomplete function definition for the LLM to complete. The generated plan is interpreted by executing actions in the environment and checking assertion preconditions with the LLM.
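To make this layout concrete, the sketch below shows a condensed ProgPrompt-style prompt. The action set, object list, and example task are representative stand-ins, not the exact prompt text from the paper; note also that the prompt is pythonic pseudo-code meant to be completed by the LLM and interpreted by ProgPrompt's executor, not run directly by a Python interpreter.

# Program header: import the actions available in this environment.
from actions import find(obj), grab(obj), open(obj), close(obj), putin(obj, obj), switchon(obj)

# Objects currently present in the scene.
objects = ['apple', 'garbagecan', 'salmon', 'fridge', 'microwave', 'plate']

# Example task plan: the function name specifies the task,
# the body demonstrates a full plan with comments, actions, and assertions.
def throw_away_apple():
    # 1: find and grab the apple
    find('apple')
    grab('apple')
    # 2: put the apple in the garbage can
    find('garbagecan')
    assert('garbagecan' is 'opened') else: open('garbagecan')
    putin('apple', 'garbagecan')
    close('garbagecan')

# Incomplete definition appended for the LLM to complete.
def microwave_salmon():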

Results

Real Robot Demo
Task: sort fruits on the plate and bottle in the box

VirtualHome Demo
Task: microwave salmon

VirtualHome Results

Tasks shown: bring coffeepot and cupcake to the coffee table, brush teeth, eat chips on the sofa, make toast, put salmon in the fridge, throw away apple, turn off light, wash the plate, watch tv.

Generated Task Programs
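As an illustration of the kind of output ProgPrompt produces, the sketch below is a representative (not verbatim) plan for the "microwave salmon" task, written in the same pythonic style the prompt establishes:

def microwave_salmon():
    # 1: take the salmon out of the fridge
    find('fridge')
    assert('fridge' is 'opened') else: open('fridge')
    find('salmon')
    grab('salmon')
    close('fridge')
    # 2: put the salmon in the microwave
    find('microwave')
    open('microwave')
    putin('salmon', 'microwave')
    close('microwave')
    # 3: heat the salmon
    switchon('microwave')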

BibTeX

@article{singh2022progprompt,
  title={{ProgPrompt}: Generating Situated Robot Task Plans using Large Language Models}, 
  author={Ishika Singh and Valts Blukis and Arsalan Mousavian and Ankit Goyal and Danfei Xu and Jonathan Tremblay and Dieter Fox and Jesse Thomason and Animesh Garg},
  year={2022},
  eprint={2209.11302},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}