MaLGa Colloquia: Cross-modal generation and understanding of multimodal content
Speaker
Niculae Sebe - Università di Trento
Abstract
In the first part of the presentation, we will present our work on video generation without annotations or prior object-specific information. Trained on videos of similar objects (e.g. faces, bodies), our method generalizes across the category. Building on this, we introduce a Learnable Game Engine (LGE), trained from monocular annotated videos, that maintains scene and object states and renders environments from controllable viewpoints. Like a game engine, it simulates physics and logic, allowing users to control the gameplay or use a director mode to guide agents via high-level language and goals, enabled by learned game AI. The second part will investigate the safety and fairness of current generative models. While most existing research focuses on detecting closed sets of biases defined a priori, we tackle the challenge of open-set bias detection in text-to-image generative models. To this end, we propose OpenBias, a new pipeline that agnostically identifies biases and quantifies their severity without access to any precompiled set. We study the behavior of Stable Diffusion 1.5, 2, and XL, highlighting new biases that have not been investigated before. Via quantitative experiments, we demonstrate that OpenBias agrees with current closed-set bias detection methods and with human judgement.
Bio
Nicu Sebe is a professor at the University of Trento, Italy, where he leads research in the areas of multimedia information retrieval and human-computer interaction in computer vision applications. He received his PhD from the University of Leiden, The Netherlands, and was previously with the University of Amsterdam, The Netherlands, and the University of Illinois at Urbana-Champaign, USA. He has been involved in organizing major conferences and workshops addressing the computer vision and human-centered aspects of multimedia information retrieval, including as General Co-Chair of the IEEE Automatic Face and Gesture Recognition Conference (FG) 2008, the ACM International Conference on Multimedia Retrieval (ICMR) 2017 and ACM Multimedia 2013. He was a program chair of ACM Multimedia 2011 and 2007, ECCV 2016, ICCV 2017 and ICPR 2020, and a general chair of ACM Multimedia 2022. He is a fellow of ELLIS and IAPR and a senior member of ACM and IEEE. He is the co-editor-in-chief of the journal Computer Vision and Image Understanding.
When
Monday, May 19th, 16:00
Where
Room 322, UniGe DIBRIS/DIMA, Via Dodecaneso 35