From Spatial AI to Embodied AI: The Path to Autonomous Systems
Alessio Del Bue
Alessio Del Bue is the director of the PAVIS (Pattern Analysis and computer VISion) research unit of the Italian Institute of Technology (IIT) in Genova, Italy. Previously, he was a researcher in the Institute for Systems and Robotics at the Instituto Superior Técnico (IST) in Lisbon, Portugal. Before that, he obtained his Ph.D. under the supervision of Dr. Lourdes Agapito in the Department of Computer Science at Queen Mary University of London. His current research interests are related to 3D scene understanding from multi-modal input (images, depth, audio) to support the development of assistive Artificial Intelligence systems. He is a co-author of more than 150 scientific publications in refereed journals and international conferences, a member of the technical committees of major computer vision conferences (CVPR, ICCV, ECCV, etc.), and an associate editor of the IEEE Transactions on Image Processing (TIP). Dr. Del Bue is also an IEEE member and a member of ELLIS in its recently formed Genoa unit.
This talk presents our research progression from Spatial AI, focused on environmental perception, to Embodied AI, where autonomous systems actively interact with and learn from their surroundings. With advances in visual scene representation through novel view synthesis techniques (e.g., NeRF and 3DGS), we now have the tools to create highly realistic and semantically rich replicas of our world. These detailed visual models can then be used to train systems for the higher-level tasks that autonomy requires, such as navigation, object detection and search, possibly with these skills self-learned in dynamic environments. Finally, we will show how visual and multi-modal data enable the creation of Embodied AI models that inject autonomous behaviours into the fully humanoid robotic platforms at IIT.