NeurIPS 2022

🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

453 citations

Abstract

Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose PROCTHOR, a framework for procedural generation of Embodied AI environments. PROCTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks. We demonstrate the power and potential of PROCTHOR via a sample of 10,000 generated houses and a simple neural model. Models trained using only RGB images on PROCTHOR, with no explicit mapping and no human task supervision, produce state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the presently running Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. We also demonstrate strong zero-shot results on these benchmarks, via pre-training on PROCTHOR with no fine-tuning on the downstream benchmark, often beating previous state-of-the-art systems that access the downstream training data.

PROCTHOR

PROCTHOR is a framework to procedurally generate Embodied AI environments. It extends AI2-THOR and thereby inherits AI2-THOR's large asset library, robotic agents, and accurate physics simulation. Just as in the scenes painstakingly created by designers in AI2-THOR, environments in PROCTHOR are fully interactive and support navigation, object manipulation, and multi-agent interaction.

Contributions

Matt Deitke designed and implemented the procedure to generate houses, implemented ObjectNav pre-training experiments and fine-tuning experiments, built the website, advised and implemented parts of the Unity backend, built the platform to visualize assets and create semantic asset groups, contributed to visuals, and wrote the paper.
Kiana Ehsani implemented ArmPointNav experiments and wrote parts of the paper. Ali Farhadi advised on the research direction. Alvaro Herrasti led the Unity backend development that creates a house from a JSON specification. Aniruddha Kembhavi advised on the research direction, the ARCHITECTHOR development, and the house generation process, and wrote the paper. Eric Kolve advised on the Unity backend development. Roozbeh Mottaghi advised on the research direction, the Unity backend, the ARCHITECTHOR development, and the house generation process, and wrote the paper. Jordi Salvador implemented rearrangement experiments, advised on multi-node training experiments, and wrote parts of the paper. Eli VanderBilt standardized AI2-THOR's asset and material database to make it usable with PROCTHOR, led the development of ARCHITECTHOR, implemented parts of the Unity backend, created new 3D assets and skyboxes, advised on lighting the houses, and contributed to visuals. Winson Han implemented parts of ARCHITECTHOR, implemented parts of the Unity backend, and contributed to visuals. Luca Weihs advised on the experiments, assisted with rearrangement experiments, implemented ObjectNav fine-tuning on HM3D-Semantics, and wrote parts of the paper.