Cognitively inspired motion understanding

We are interested in the strong interplay between visual computation and cognitive science, and we apply our research in particular to human-robot interaction (HRI).


  • Francesca Odone - DIBRIS, Università di Genova

  • Nicoletta Noceti - DIBRIS, Università di Genova

  • Vito Paolo Pastore - DIBRIS, Università di Genova

The MoCA project

The goal of the project is to acquire and maintain a multi-modal, multi-view dataset of upper-body actions in a cooking scenario, collecting motion capture (MoCap) data together with video sequences recorded from multiple viewpoints. The acquisition is specifically designed to investigate view-invariant action properties in both biological and artificial systems. Besides classical action recognition tasks, the dataset enables research on different nuances of action understanding, from the segmentation of action primitives robustly across sensors and viewpoints to the detection of action categories based on their dynamic evolution or their goal.

E Nicora, G Goyal, N Noceti, A Vignolo, A Sciutti, F Odone “The MoCA dataset, kinematic and multi-view visual streams of fine-grained cooking actions”, Scientific Data 7 (1), 1-15, 2021

The MoCA dataset webpage

Methods: Efficient approaches for motion detection and representation

We investigate efficient approaches to salient motion detection guided by psychological studies of human motion perception. We explore motion features that capture known regularities of biological motion, in order to derive a motion detection approach that is robust to scene variability and occlusion.
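One well-known regularity of biological motion is the two-thirds power law, which links the tangential velocity V of a movement to the curvature C of its path as V = K·C^(-1/3). The sketch below is only an illustration of how such a regularity can be measured on a sampled 2D trajectory, not the group's implementation; the function name `power_law_fit` and the `eps` threshold are hypothetical choices. It estimates the exponent beta and gain K by linear regression in log-log space.

```python
import numpy as np

def power_law_fit(x, y, dt, eps=1e-9):
    """Fit V = K * C**beta to a sampled 2D trajectory (x(t), y(t)).

    For movements obeying the two-thirds power law, beta is close to -1/3.
    Samples with near-zero speed or curvature are discarded, because the
    logarithm is undefined there.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # First and second time derivatives via finite differences.
    dx = np.gradient(x, dt, edge_order=2)
    dy = np.gradient(y, dt, edge_order=2)
    ddx = np.gradient(dx, dt, edge_order=2)
    ddy = np.gradient(dy, dt, edge_order=2)
    speed = np.hypot(dx, dy)  # tangential velocity V
    # Curvature of a parametric curve; the max() guards against division by zero.
    curvature = np.abs(dx * ddy - dy * ddx) / np.maximum(speed, eps) ** 3
    valid = (speed > eps) & (curvature > eps)
    # Linear regression in log-log space: log V = log K + beta * log C.
    beta, log_k = np.polyfit(np.log(curvature[valid]), np.log(speed[valid]), 1)
    return beta, np.exp(log_k)
```

For an elliptic harmonic trajectory such as x = 3·cos(t), y = sin(t), which satisfies the law exactly, the fitted beta is close to -1/3; motion that violates the law (for instance, constant speed along a curved path) yields an exponent far from that value, which is what makes such a feature useful as a cue for biological motion.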

Funded by AFOSR, project “Cognitively-inspired architectures for human motion understanding”, grant n. FA8655-20-1-7035.

A Vignolo, N Noceti, F Rea, A Sciutti, F Odone, G Sandini, “Detecting biological motion for human–robot interaction: A link between perception and action”, Frontiers in Robotics and AI 4, 14, 2017

E Nicora, N Noceti, “On the use of efficient projection kernels for motion-based visual saliency estimation”, Frontiers in Computer Science

Methods: Space-time hierarchical motion representations

We explore hierarchical approaches to motion representation and understanding, in which actions and activities are composed from motion primitives through compositional rules. We also investigate the effect of using 2D and 3D information, where the latter can serve as weak supervision during training.
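As a minimal sketch of the bottom of such a hierarchy, the code below assumes a common heuristic (not necessarily the one used in the cited works): a 1D speed profile is split into primitives at local speed minima, each primitive is described by hypothetical features (duration in samples and peak speed), and an action is then represented as a normalised histogram of primitives over a given dictionary. All function names and the `min_len` parameter are illustrative.

```python
import numpy as np

def segment_at_speed_minima(speed, min_len=3):
    """Split a speed profile into primitives at interior local minima."""
    speed = np.asarray(speed, dtype=float)
    idx = np.arange(1, len(speed) - 1)
    is_min = (speed[idx] < speed[idx - 1]) & (speed[idx] <= speed[idx + 1])
    cuts = [0, *idx[is_min].tolist(), len(speed)]
    # Keep only segments long enough to count as a primitive.
    return [(s, e) for s, e in zip(cuts[:-1], cuts[1:]) if e - s >= min_len]

def primitive_features(speed, segments):
    """Describe each primitive by (duration in samples, peak speed)."""
    speed = np.asarray(speed, dtype=float)
    return np.array([[e - s, speed[s:e].max()] for s, e in segments])

def bag_of_primitives(features, dictionary):
    """Assign each primitive to its nearest dictionary atom; return a normalised histogram."""
    dictionary = np.asarray(dictionary, dtype=float)
    if len(features) == 0:
        return np.zeros(len(dictionary))
    dists = np.linalg.norm(features[:, None, :] - dictionary[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()
```

In practice the dictionary would be learned from training data (for example by clustering primitive features) and the features normalised to comparable scales; higher levels of the hierarchy could then compose these histograms over longer time windows.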

Funded by AFOSR, project “Cognitively-inspired architectures for human motion understanding”, grant n. FA8655-20-1-7035.

A Vignolo, N Noceti, A Sciutti, F Odone, G Sandini, “Learning dictionaries of kinematic primitives for action classification”, 2020 25th International Conference on Pattern Recognition (ICPR), 5965-5972

V Nair et al., “Action similarity judgment based on kinematic primitives”, 2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), IEEE, 2020

Methods: Cross-view action recognition

Cross-view action recognition is the task of recognizing actions observed from viewpoints that are unfamiliar to the system. The task is natural for humans but a well-known challenge for computer vision algorithms, which have to cope with variations in the geometry and overall appearance of the signal. Inspired by the abilities of human perception, we explore whether deep learning approaches can implicitly learn view-invariant features for action recognition. To address the complexity of the problem, state-of-the-art methods often rely on large-scale datasets in which the variability of viewpoints is adequately represented. However, this comes at a significant price in computational power, time, cost and energy, both for gathering and annotating the data and for training the model. We rely on domain adaptation strategies to efficiently generalise action recognition to unseen viewpoints, focusing on small-scale datasets and working in a limited-resources regime.
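To make the idea of domain adaptation concrete, the sketch below implements CORAL (correlation alignment), a classic and lightweight unsupervised technique that aligns the second-order statistics of features from a labelled source domain (here, a seen viewpoint) to those of an unlabelled target domain (an unseen viewpoint). CORAL is swapped in purely as an illustration; it is not claimed to be the method of the works cited below, and the function names are hypothetical.

```python
import numpy as np

def _matrix_power(c, p):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(c)
    return (v * w ** p) @ v.T

def coral(source, target, lam=1.0):
    """Re-colour source features so their covariance matches the target's.

    source: (n_s, d) features from the labelled (seen) viewpoint.
    target: (n_t, d) unlabelled features from the unseen viewpoint.
    lam: ridge term added to both covariances (the original CORAL adds the identity).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    d = source.shape[1]
    c_s = np.cov(source, rowvar=False) + lam * np.eye(d)
    c_t = np.cov(target, rowvar=False) + lam * np.eye(d)
    # Whiten with the source covariance, then re-colour with the target covariance.
    return source @ _matrix_power(c_s, -0.5) @ _matrix_power(c_t, 0.5)
```

A classifier trained on the aligned source features can then be applied directly to target features. The appeal in a limited-resources regime is that the alignment needs no target labels and costs only two eigendecompositions of a d×d matrix.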


G Goyal, N Noceti, F Odone, “Cross-view action recognition with small-scale datasets”, Image and Vision Computing, 2022

G Goyal, N Noceti, F Odone, “Single view learning in action recognition”, 2020 25th International Conference on Pattern Recognition (ICPR), 3690-3697

Applications: Video-based imitation learning

We investigate imitation learning strategies based purely on RGB videos in the context of human-robot interaction. In our work we explore synthetic data as a rich source of information from which knowledge can be extracted and transferred to real, more challenging conditions. In particular, we address two challenges: the sim-to-real gap and the embodiment mismatch. We use generative adversarial learning to learn, starting from the third-person view, a visual representation of the scene and of the action from the robot's egocentric viewpoint. We also derive an embedded action representation that captures not only the dynamics but also subtler properties, such as the effect of object properties (e.g. weight and fragility) on the action itself.
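For reference, a common form of the conditional adversarial objective used in image-to-image translation (as in pix2pix, written here without the noise input and without any additional reconstruction term) is shown below. Read in this setting, x would be a third-person frame, y the corresponding robot ego-view frame, G the generator and D the discriminator; this is the generic objective, not necessarily the exact loss used in the cited work.

```latex
\mathcal{L}_{\text{cGAN}}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x}\left[\log\left(1 - D\big(x, G(x)\big)\right)\right],
\qquad
G^{*} = \arg\min_{G}\,\max_{D}\,\mathcal{L}_{\text{cGAN}}(G, D)
```

The discriminator learns to tell real ego-view frames from generated ones given the same third-person input, while the generator learns to produce ego-view frames the discriminator cannot distinguish from real ones.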

Collaboration with

  • Alessandra Sciutti and Francesco Rea - IIT Contact Unit Genova

  • Fulvio Mastrogiovanni - DIBRIS, Università di Genova

  • Yiannis Demiris - Imperial College London


L Garello, L Lastrico, F Rea, F Mastrogiovanni, N Noceti, A Sciutti, “Property-aware robot object manipulation: a generative approach”, 2021 IEEE International Conference on Development and Learning (ICDL), 1-7