Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Joan Lasenby

LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans

Jul 03, 2025

Zhening Huang, Xiaoyang Wu, Fangcheng Zhong, Hengshuang Zhao, Matthias Nießner, Joan Lasenby

Abstract:We propose LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic, and interactive 3D virtual replicas. LiteReality not only reconstructs scenes that visually resemble reality but also supports key features essential for graphics pipelines -- such as object individuality, articulation, high-quality physically based rendering materials, and physically based interaction. At its core, LiteReality first performs scene understanding and parses the results into a coherent 3D layout and objects with the help of a structured scene graph. It then reconstructs the scene by retrieving the most visually similar 3D artist-crafted models from a curated asset database. Next, the Material Painting module enhances realism by recovering high-quality, spatially varying materials. Finally, the reconstructed scene is integrated into a simulation engine with basic physical properties to enable interactive behavior. The resulting scenes are compact, editable, and fully compatible with standard graphics pipelines, making them suitable for applications in AR/VR, gaming, robotics, and digital twins. In addition, LiteReality introduces a training-free object retrieval module that achieves state-of-the-art similarity performance on the Scan2CAD benchmark, along with a robust material painting module capable of transferring appearances from images of any style to 3D assets -- even under severe misalignment, occlusion, and poor lighting. We demonstrate the effectiveness of LiteReality on both real-life scans and public datasets. Project page: https://litereality.github.io; Video: https://www.youtube.com/watch?v=ecK9m3LXg2c

* Project Page: https://litereality.github.io; Video: https://www.youtube.com/watch?v=ecK9m3LXg2c&feature=youtu.be

Via

Access Paper or Ask Questions

SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos

Apr 22, 2025

Yuxin Yao, Yan Zhang, Zhening Huang, Joan Lasenby

Abstract:Dynamic videos with small baseline motions are ubiquitous in daily life, especially on social media. However, these videos present a challenge to existing pose estimation frameworks due to ambiguous features, drift accumulation, and insufficient triangulation constraints. Gaussian splatting, which maintains an explicit representation for scenes, provides a reliable novel view rasterization when the viewpoint change is small. Inspired by this, we propose SmallGS, a camera pose estimation framework that is specifically designed for small-baseline videos. SmallGS optimizes sequential camera poses using Gaussian splatting, which reconstructs the scene from the first frame in each video segment to provide a stable reference for the rest. The temporal consistency of Gaussian splatting within limited viewpoint differences reduced the requirement of sufficient depth variations in traditional camera pose estimation. We further incorporate pretrained robust visual features, e.g. DINOv2, into Gaussian splatting, where high-dimensional feature map rendering enhances the robustness of camera pose estimation. By freezing the Gaussian splatting and optimizing camera viewpoints based on rasterized features, SmallGS effectively learns camera poses without requiring explicit feature correspondences or strong parallax motion. We verify the effectiveness of SmallGS in small-baseline videos in TUM-Dynamics sequences, which achieves impressive accuracy in camera pose estimation compared to MonST3R and DORID-SLAM for small-baseline videos in dynamic scenes. Our project page is at: https://yuxinyao620.github.io/SmallGS

* 10 pages, 4 figures, Accepted by CVPR workshop

Via

Access Paper or Ask Questions

Continuous non-contact vital sign monitoring of neonates in intensive care units using RGB-D cameras

Dec 08, 2024

Silas Ruhrberg Estévez, Alex Grafton, Lynn Thomson, Joana Warnecke, Kathryn Beardsall, Joan Lasenby

Abstract:Neonates in intensive care require continuous monitoring. Current measurement devices are limited for long-term use due to the fragility of newborn skin and the interference of wires with medical care and parental interactions. Camera-based vital sign monitoring has the potential to address these limitations and has become of considerable interest in recent years due to the absence of physical contact between the recording equipment and the neonates, as well as the introduction of low-cost devices. We present a novel system to capture vital signs while offering clinical insights beyond current technologies using a single RGB-D camera. Heart rate and oxygen saturation were measured using colour and infrared signals with mean average errors (MAE) of 7.69 bpm and 3.37%, respectively. Using the depth signals, an MAE of 4.83 breaths per minute was achieved for respiratory rate. Tidal volume measurements were obtained with a MAE of 0.61 mL. Flow-volume loops can also be calculated from camera data, which have applications in respiratory disease diagnosis. Our system demonstrates promising capabilities for neonatal monitoring, augmenting current clinical recording techniques to potentially improve outcomes for neonates.

Via

Access Paper or Ask Questions

STAResNet: a Network in Spacetime Algebra to solve Maxwell's PDEs

Aug 24, 2024

Alberto Pepe, Sven Buchholz, Joan Lasenby

Abstract:We introduce STAResNet, a ResNet architecture in Spacetime Algebra (STA) to solve Maxwell's partial differential equations (PDEs). Recently, networks in Geometric Algebra (GA) have been demonstrated to be an asset for truly geometric machine learning. In \cite{brandstetter2022clifford}, GA networks have been employed for the first time to solve partial differential equations (PDEs), demonstrating an increased accuracy over real-valued networks. In this work we solve Maxwell's PDEs both in GA and STA employing the same ResNet architecture and dataset, to discuss the impact that the choice of the right algebra has on the accuracy of GA networks. Our study on STAResNet shows how the correct geometric embedding in Clifford Networks gives a mean square error (MSE), between ground truth and estimated fields, up to 2.6 times lower than than obtained with a standard Clifford ResNet with 6 times fewer trainable parameters. STAREsNet demonstrates consistently lower MSE and higher correlation regardless of scenario. The scenarios tested are: sampling period of the dataset; presence of obstacles with either seen or unseen configurations; the number of channels in the ResNet architecture; the number of rollout steps; whether the field is in 2D or 3D space. This demonstrates how choosing the right algebra in Clifford networks is a crucial factor for more compact, accurate, descriptive and better generalising pipelines.

Via

Access Paper or Ask Questions

OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Feb 23, 2024

Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari(+18 more)

Figure 1 for OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Figure 2 for OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Figure 3 for OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Figure 4 for OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Abstract:This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and mapping. We provide an overview of the challenge hosted at the workshop, present the challenge dataset, the evaluation methodology, and brief descriptions of the winning methods. For additional details, please see https://opensun3d.github.io/index_iccv23.html.

* Our OpenSUN3D workshop website for ICCV 2023: https://opensun3d.github.io/index_iccv23.html

Via

Access Paper or Ask Questions

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Sep 04, 2023

Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby

Abstract:Current 3D open-vocabulary scene understanding methods mostly utilize well-aligned 2D images as the bridge to learn 3D features with language. However, applying these approaches becomes challenging in scenarios where 2D images are absent. In this work, we introduce a completely new pipeline, namely, OpenIns3D, which requires no 2D image inputs, for 3D open-vocabulary scene understanding at the instance level. The OpenIns3D framework employs a "Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic mask proposals in 3D point clouds. The "Snap" module generates synthetic scene-level images at multiple scales and leverages 2D vision language models to extract interesting objects. The "Lookup" module searches through the outcomes of "Snap" with the help of Mask2Pixel maps, which contain the precise correspondence between 3D masks and synthetic images, to assign category names to the proposed masks. This 2D input-free, easy-to-train, and flexible approach achieved state-of-the-art results on a wide range of indoor and outdoor datasets with a large margin. Furthermore, OpenIns3D allows for effortless switching of 2D detectors without re-training. When integrated with state-of-the-art 2D open-world models such as ODISE and GroundingDINO, superb results are observed on open-vocabulary instance segmentation. When integrated with LLM-powered 2D models like LISA, it demonstrates a remarkable capacity to process highly complex text queries, including those that require intricate reasoning and world knowledge. Project page: https://zheninghuang.github.io/OpenIns3D/

* 24 pages, 16 figures, 13 tables. Project page: https://zheninghuang.github.io/OpenIns3D/

Via

Access Paper or Ask Questions

CGA-PoseNet: Camera Pose Regression via a 1D-Up Approach to Conformal Geometric Algebra

Feb 10, 2023

Alberto Pepe, Joan Lasenby

Abstract:We introduce CGA-PoseNet, which uses the 1D-Up approach to Conformal Geometric Algebra (CGA) to represent rotations and translations with a single mathematical object, the motor, for camera pose regression. We do so starting from PoseNet, which successfully predicts camera poses from small datasets of RGB frames. State-of-the-art methods, however, require expensive tuning to balance the orientational and translational components of the camera pose.This is usually done through complex, ad-hoc loss function to be minimized, and in some cases also requires 3D points as well as images. Our approach has the advantage of unifying the camera position and orientation through the motor. Consequently, the network searches for a single object which lives in a well-behaved 4D space with a Euclidean signature. This means that we can address the case of image-only datasets and work efficiently with a simple loss function, namely the mean squared error (MSE) between the predicted and ground truth motors. We show that it is possible to achieve high accuracy camera pose regression with a significantly simpler problem formulation. This 1D-Up approach to CGA can be employed to overcome the dichotomy between translational and orientational components in camera pose regression in a compact and elegant way.

Via

Access Paper or Ask Questions

Sky-image-based solar forecasting using deep learning with multi-location data: training models locally, globally or via transfer learning?

Nov 03, 2022

Yuhao Nie, Quentin Paletta, Andea Scotta, Luis Martin Pomares, Guillaume Arbod, Sgouris Sgouridis, Joan Lasenby, Adam Brandt

Abstract:Solar forecasting from ground-based sky images using deep learning models has shown great promise in reducing the uncertainty in solar power generation. One of the biggest challenges for training deep learning models is the availability of labeled datasets. With more and more sky image datasets open sourced in recent years, the development of accurate and reliable solar forecasting methods has seen a huge growth in potential. In this study, we explore three different training strategies for deep-learning-based solar forecasting models by leveraging three heterogeneous datasets collected around the world with drastically different climate patterns. Specifically, we compare the performance of models trained individually based on local datasets (local models) and models trained jointly based on the fusion of multiple datasets from different locations (global models), and we further examine the knowledge transfer from pre-trained solar forecasting models to a new dataset of interest (transfer learning models). The results suggest that the local models work well when deployed locally, but significant errors are observed for the scale of the prediction when applied offsite. The global model can adapt well to individual locations, while the possible increase in training efforts need to be taken into account. Pre-training models on a large and diversified source dataset and transferring to a local target dataset generally achieves superior performance over the other two training strategies. Transfer learning brings the most benefits when there are limited local data. With 80% less training data, it can achieve 1% improvement over the local baseline model trained using the entire dataset. Therefore, we call on the efforts from the solar forecasting community to contribute to a global dataset containing a massive amount of imagery and displaying diversified samples with a range of sky conditions.

Via

Access Paper or Ask Questions

Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Jun 16, 2022

Hanchen Wang, Jean Kaddour, Shengchao Liu, Jian Tang, Matt Kusner, Joan Lasenby, Qi Liu

Figure 1 for Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Figure 2 for Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Figure 3 for Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Figure 4 for Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Abstract:Graph Self-Supervised Learning (GSSL) paves the way for learning graph embeddings without expert annotation, which is particularly impactful for molecular graphs since the number of possible molecules is enormous and labels are expensive to obtain. However, by design, GSSL methods are not trained to perform well on one downstream task but aim for transferability to many, making evaluating them less straightforward. As a step toward obtaining profiles of molecular graph embeddings with diverse and interpretable attributes, we introduce Molecular Graph Representation Evaluation (MolGraphEval), a suite of probe tasks, categorised into (i) topological-, (ii) substructure-, and (iii) embedding space properties. By benchmarking existing GSSL methods on both existing downstream datasets and MolGraphEval, we discover surprising discrepancies between conclusions drawn from existing datasets alone versus more fine-grained probing, suggesting that current evaluation protocols do not provide the whole picture. Our modular, automated end-to-end GSSL pipeline code will be released upon acceptance, including standardised graph loading, experiment management, and embedding evaluation.

Via

Access Paper or Ask Questions

Omnivision forecasting: combining satellite observations with sky images for improved intra-hour solar energy predictions

Jun 07, 2022

Quentin Paletta, Guillaume Arbod, Joan Lasenby

Figure 1 for Omnivision forecasting: combining satellite observations with sky images for improved intra-hour solar energy predictions

Figure 2 for Omnivision forecasting: combining satellite observations with sky images for improved intra-hour solar energy predictions

Figure 3 for Omnivision forecasting: combining satellite observations with sky images for improved intra-hour solar energy predictions

Figure 4 for Omnivision forecasting: combining satellite observations with sky images for improved intra-hour solar energy predictions

Abstract:Integration of intermittent renewable energy sources into electric grids in large proportions is challenging. A well-established approach aimed at addressing this difficulty involves the anticipation of the upcoming energy supply variability to adapt the response of the grid. In solar energy, short-term changes in electricity production caused by occluding clouds can be predicted at different time scales from all-sky cameras (up to 30-min ahead) and satellite observations (up to 6h ahead). In this study, we integrate these two complementary points of view on the cloud cover in a single machine learning framework to improve intra-hour (up to 60-min ahead) irradiance forecasting. Both deterministic and probabilistic predictions are evaluated in different weather conditions (clear-sky, cloudy, overcast) and with different input configurations (sky images, satellite observations and/or past irradiance values). Our results show that the hybrid model benefits predictions in clear-sky conditions and improves longer-term forecasting. This study lays the groundwork for future novel approaches of combining sky images and satellite observations in a single learning framework to advance solar nowcasting.

* Submitted to Renewable Energy

Via

Access Paper or Ask Questions