Automatic 3D reconstruction from images
This research focuses on new methodologies for reconstructing the 3D geometry of urban environments from images. Recent developments in related fields suggest that learning-based methods can produce more complete point clouds and 3D models than traditional approaches. The goal of this research is to build a robust 3D reconstruction pipeline using modern AI-based, data-driven methods.
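At the core of any image-based reconstruction pipeline is triangulation: recovering a 3D point from its 2D projections in two calibrated views. As an illustration (not the project's actual pipeline), the sketch below uses the standard linear (DLT) method with NumPy; the camera matrices and point are made-up test values.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices; x1, x2: 2D pixel coordinates
    of the same scene point in each view."""
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: x * (P[2] @ X) = P[0] @ X, etc.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right null vector of A (last row of V^T).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenise

# Two hypothetical cameras: one at the origin, one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_est = triangulate(P1, P2, x1, x2)
```

In a full pipeline, learned components typically replace the brittle parts (feature matching, depth completion) around this geometric core.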
Scene understanding from 3D point clouds
This research focuses on two fundamental tasks in 3D scene understanding: semantic segmentation and instance segmentation. Both tasks serve as crucial prerequisites for modelling and analysing the urban environment. The former assigns a class label (e.g., building, vegetation, water, road) to each point, while the latter detects and segments individual 3D object instances. The aim of the project is to develop a generic framework for 3D scene understanding by incorporating human knowledge into learning-based approaches.
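To make "assigning a class label to each point" concrete, the sketch below shows the shape of a PointNet-style per-point classifier: a shared per-point MLP, an order-invariant max-pooled scene feature, and a classification head. The weights here are random (untrained) stand-ins, so the labels are meaningless; only the data flow is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 4  # e.g. building, vegetation, water, road

def segment_points(points, W_local, W_fuse, W_cls):
    """PointNet-style per-point classification (illustrative, untrained).

    points: (N, 3) xyz coordinates; returns one class label per point."""
    local = np.maximum(points @ W_local, 0.0)       # per-point features (N, 32)
    global_feat = local.max(axis=0)                 # permutation-invariant scene feature
    fused = np.concatenate(                         # each point sees local + global context
        [local, np.tile(global_feat, (len(points), 1))], axis=1)
    logits = np.maximum(fused @ W_fuse, 0.0) @ W_cls   # (N, NUM_CLASSES)
    return logits.argmax(axis=1)

points = rng.normal(size=(100, 3))                  # a toy 100-point cloud
W_local = rng.normal(size=(3, 32))                  # random stand-in weights
W_fuse = rng.normal(size=(64, 16))
W_cls = rng.normal(size=(16, NUM_CLASSES))
labels = segment_points(points, W_local, W_fuse, W_cls)
```

Instance segmentation additionally groups points of the same class into separate objects, e.g. by clustering learned per-point embeddings.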
Machine learning for 3D visual localisation
Determining where 2D images and other sensor measurements were captured, relative to each other or to an existing map, is a key task for scene reconstruction and outdoor vehicle self-localisation. This project aims to improve localisation accuracy using data-driven representation learning, for example for Visual Place Recognition and pose regression. We seek robustness against different sensing conditions, varying outdoor conditions (day, night, weather), perceptual aliasing (i.e., different locations with a similar visual appearance), and small viewpoint variations.
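Visual Place Recognition is commonly cast as retrieval: a learned model maps each image to a global descriptor, and a query is localised by finding its nearest neighbour among the descriptors of mapped places. A minimal sketch of that retrieval step, using random vectors in place of learned descriptors:

```python
import numpy as np

def retrieve(query, database):
    """Return the index of the database descriptor most similar to the
    query under cosine similarity (the standard VPR retrieval step)."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    return int(np.argmax(db @ q))

rng = np.random.default_rng(1)
database = rng.normal(size=(500, 128))   # 500 mapped places, 128-D descriptors
# Simulate revisiting place 42 under slightly changed conditions
# by perturbing its descriptor.
query = database[42] + 0.05 * rng.normal(size=128)
match = retrieve(query, database)
```

Representation learning enters in how the descriptors are produced: a good embedding keeps the same place nearby despite day/night or weather changes, while pushing visually similar but distinct places apart (countering perceptual aliasing).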
Multi-modal self-supervised learning
This project aims to make environment perception available and robust across many different sensing modalities, such as vision and lidar, but also audio or radar. Heterogeneous sensor modalities are often available in self-driving applications, or when combining distinct geomatic resources. We seek to exploit the strengths of each sensor, the availability of rich annotated data for only some sensors (e.g., vision), and the 3D geometric and temporal constraints between sensor measurements. A key challenge is to develop self-supervised learning for multi-modal data, to facilitate model optimisation even when few annotations are available in certain sensing modalities.
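One widely used self-supervised objective for paired multi-modal data is a contrastive (InfoNCE-style) loss: embeddings of measurements captured together (e.g. an image and the co-registered lidar sweep) are pulled together, while mismatched pairs in the batch are pushed apart, with no labels required. A NumPy sketch of the symmetric form, on synthetic embeddings:

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss aligning two modalities.

    z_a, z_b: (B, D) embedding batches; row i of z_a is the true
    partner of row i of z_b (e.g. image i and lidar sweep i)."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature   # (B, B) pairwise similarities
    # Cross-entropy with the diagonal (true pairs) as targets, both directions.
    ls_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ls_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -0.5 * (np.mean(np.diag(ls_ab)) + np.mean(np.diag(ls_ba)))

rng = np.random.default_rng(2)
z_img = rng.normal(size=(8, 64))            # stand-in "image" embeddings
aligned = info_nce(z_img, z_img)            # correctly paired: low loss
shuffled = info_nce(z_img, z_img[::-1])     # mismatched pairs: higher loss
```

In practice the two inputs come from separate encoders (one per modality); the loss lets a richly annotated modality such as vision transfer structure to modalities with few or no annotations.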