Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dan Casas

Learning Extremely High Density Crowds as Active Matters

Mar 15, 2025

Feixiang He, Jiangbei Yue, Jialin Zhu, Armin Seyfried, Dan Casas, Julien Pettré, He Wang

Abstract:Video-based high-density crowd analysis and prediction has been a long-standing topic in computer vision. It is notoriously difficult due to, but not limited to, the lack of high-quality data and complex crowd dynamics. Consequently, it has been relatively under studied. In this paper, we propose a new approach that aims to learn from in-the-wild videos, often with low quality where it is difficult to track individuals or count heads. The key novelty is a new physics prior to model crowd dynamics. We model high-density crowds as active matter, a continumm with active particles subject to stochastic forces, named 'crowd material'. Our physics model is combined with neural networks, resulting in a neural stochastic differential equation system which can mimic the complex crowd dynamics. Due to the lack of similar research, we adapt a range of existing methods which are close to ours for comparison. Through exhaustive evaluation, we show our model outperforms existing methods in analyzing and forecasting extremely high-density crowds. Furthermore, since our model is a continuous-time physics model, it can be used for simulation and analysis, providing strong interpretability. This is categorically different from most deep learning methods, which are discrete-time models and black-boxes.

* Accepted by CVPR 2025

Via

Access Paper or Ask Questions

TexTile: A Differentiable Metric for Texture Tileability

Mar 19, 2024

Carlos Rodriguez-Pardo, Dan Casas, Elena Garces, Jorge Lopez-Moreno

Abstract:We introduce TexTile, a novel differentiable metric to quantify the degree upon which a texture image can be concatenated with itself without introducing repeating artifacts (i.e., the tileability). Existing methods for tileable texture synthesis focus on general texture quality, but lack explicit analysis of the intrinsic repeatability properties of a texture. In contrast, our TexTile metric effectively evaluates the tileable properties of a texture, opening the door to more informed synthesis and analysis of tileable textures. Under the hood, TexTile is formulated as a binary classifier carefully built from a large dataset of textures of different styles, semantics, regularities, and human annotations.Key to our method is a set of architectural modifications to baseline pre-train image classifiers to overcome their shortcomings at measuring tileability, along with a custom data augmentation and training regime aimed at increasing robustness and accuracy. We demonstrate that TexTile can be plugged into different state-of-the-art texture synthesis methods, including diffusion-based strategies, and generate tileable textures while keeping or even improving the overall texture quality. Furthermore, we show that TexTile can objectively evaluate any tileable texture synthesis method, whereas the current mix of existing metrics produces uncorrelated scores which heavily hinders progress in the field.

* CVPR 2024. Project page: https://mslab.es/projects/TexTile/

Via

Access Paper or Ask Questions

SMPLitex: A Generative Model and Dataset for 3D Human Texture Estimation from Single Image

Sep 07, 2023

Dan Casas, Marc Comino-Trinidad

Abstract:We propose SMPLitex, a method for estimating and manipulating the complete 3D appearance of humans captured from a single image. SMPLitex builds upon the recently proposed generative models for 2D images, and extends their use to the 3D domain through pixel-to-surface correspondences computed on the input image. To this end, we first train a generative model for complete 3D human appearance, and then fit it into the input image by conditioning the generative model to the visible parts of the subject. Furthermore, we propose a new dataset of high-quality human textures built by sampling SMPLitex conditioned on subject descriptions and images. We quantitatively and qualitatively evaluate our method in 3 publicly available datasets, demonstrating that SMPLitex significantly outperforms existing methods for human texture estimation while allowing for a wider variety of tasks such as editing, synthesis, and manipulation

* Accepted at BMVC 2023. Project website: https://dancasas.github.io/projects/SMPLitex

Via

Access Paper or Ask Questions

How Will It Drape Like? Capturing Fabric Mechanics from Depth Images

Apr 13, 2023

Carlos Rodriguez-Pardo, Melania Prieto-Martin, Dan Casas, Elena Garces

Abstract:We propose a method to estimate the mechanical parameters of fabrics using a casual capture setup with a depth camera. Our approach enables to create mechanically-correct digital representations of real-world textile materials, which is a fundamental step for many interactive design and engineering applications. As opposed to existing capture methods, which typically require expensive setups, video sequences, or manual intervention, our solution can capture at scale, is agnostic to the optical appearance of the textile, and facilitates fabric arrangement by non-expert operators. To this end, we propose a sim-to-real strategy to train a learning-based framework that can take as input one or multiple images and outputs a full set of mechanical parameters. Thanks to carefully designed data augmentation and transfer learning protocols, our solution generalizes to real images despite being trained only on synthetic data, hence successfully closing the sim-to-real loop.Key in our work is to demonstrate that evaluating the regression accuracy based on the similarity at parameter space leads to an inaccurate distances that do not match the human perception. To overcome this, we propose a novel metric for fabric drape similarity that operates on the image domain instead on the parameter space, allowing us to evaluate our estimation within the context of a similarity rank. We show that out metric correlates with human judgments about the perception of drape similarity, and that our model predictions produce perceptually accurate results compared to the ground truth parameters.

* 12 pages, 12 figures. Accepted to EUROGRAPHICS 2023. Project website: https://carlosrodriguezpardo.es/projects/MechFromDepth/

Via

Access Paper or Ask Questions

PERGAMO: Personalized 3D Garments from Monocular Video

Oct 26, 2022

Andrés Casado-Elvira, Marc Comino Trinidad, Dan Casas

Abstract:Clothing plays a fundamental role in digital humans. Current approaches to animate 3D garments are mostly based on realistic physics simulation, however, they typically suffer from two main issues: high computational run-time cost, which hinders their development; and simulation-to-real gap, which impedes the synthesis of specific real-world cloth samples. To circumvent both issues we propose PERGAMO, a data-driven approach to learn a deformable model for 3D garments from monocular images. To this end, we first introduce a novel method to reconstruct the 3D geometry of garments from a single image, and use it to build a dataset of clothing from monocular videos. We use these 3D reconstructions to train a regression model that accurately predicts how the garment deforms as a function of the underlying body pose. We show that our method is capable of producing garment animations that match the real-world behaviour, and generalizes to unseen body motions extracted from motion capture dataset.

* Published at Computer Graphics Forum (Proc. of ACM/SIGGRAPH SCA), 2022. Project website http://mslab.es/projects/PERGAMO/

Via

Access Paper or Ask Questions

HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow

Oct 04, 2022

Jiayi Wang, Diogo Luvizon, Franziska Mueller, Florian Bernard, Adam Kortylewski, Dan Casas, Christian Theobalt

Figure 1 for HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow

Figure 2 for HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow

Figure 3 for HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow

Figure 4 for HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow

Abstract:Reconstructing two-hand interactions from a single image is a challenging problem due to ambiguities that stem from projective geometry and heavy occlusions. Existing methods are designed to estimate only a single pose, despite the fact that there exist other valid reconstructions that fit the image evidence equally well. In this paper we propose to address this issue by explicitly modeling the distribution of plausible reconstructions in a conditional normalizing flow framework. This allows us to directly supervise the posterior distribution through a novel determinant magnitude regularization, which is key to varied 3D hand pose samples that project well into the input image. We also demonstrate that metrics commonly used to assess reconstruction quality are insufficient to evaluate pose predictions under such severe ambiguity. To address this, we release the first dataset with multiple plausible annotations per image called MultiHands. The additional annotations enable us to evaluate the estimated distribution using the maximum mean discrepancy metric. Through this, we demonstrate the quality of our probabilistic reconstruction and show that explicit ambiguity modeling is better-suited for this challenging problem.

* VMV 2022 - Symposium on Vision, Modeling, and Visualization

Via

Access Paper or Ask Questions

SNUG: Self-Supervised Neural Dynamic Garments

Apr 05, 2022

Igor Santesteban, Miguel A. Otaduy, Dan Casas

Figure 1 for SNUG: Self-Supervised Neural Dynamic Garments

Figure 2 for SNUG: Self-Supervised Neural Dynamic Garments

Figure 3 for SNUG: Self-Supervised Neural Dynamic Garments

Figure 4 for SNUG: Self-Supervised Neural Dynamic Garments

Abstract:We present a self-supervised method to learn dynamic 3D deformations of garments worn by parametric human bodies. State-of-the-art data-driven approaches to model 3D garment deformations are trained using supervised strategies that require large datasets, usually obtained by expensive physics-based simulation methods or professional multi-camera capture setups. In contrast, we propose a new training scheme that removes the need for ground-truth samples, enabling self-supervised training of dynamic 3D garment deformations. Our key contribution is to realize that physics-based deformation models, traditionally solved in a frame-by-frame basis by implicit integrators, can be recasted as an optimization problem. We leverage such optimization-based scheme to formulate a set of physics-based loss terms that can be used to train neural networks without precomputing ground-truth data. This allows us to learn models for interactive garments, including dynamic deformations and fine wrinkles, with two orders of magnitude speed up in training time compared to state-of-the-art supervised methods

* CVPR 2022 (Oral). Project website: http://mslab.es/projects/SNUG/

Via

Access Paper or Ask Questions

A Survey on Intrinsic Images: Delving Deep Into Lambert and Beyond

Dec 07, 2021

Elena Garces, Carlos Rodriguez-Pardo, Dan Casas, Jorge Lopez-Moreno

Figure 1 for A Survey on Intrinsic Images: Delving Deep Into Lambert and Beyond

Figure 2 for A Survey on Intrinsic Images: Delving Deep Into Lambert and Beyond

Figure 3 for A Survey on Intrinsic Images: Delving Deep Into Lambert and Beyond

Figure 4 for A Survey on Intrinsic Images: Delving Deep Into Lambert and Beyond

Abstract:Intrinsic imaging or intrinsic image decomposition has traditionally been described as the problem of decomposing an image into two layers: a reflectance, the albedo invariant color of the material; and a shading, produced by the interaction between light and geometry. Deep learning techniques have been broadly applied in recent years to increase the accuracy of those separations. In this survey, we overview those results in context of well-known intrinsic image data sets and relevant metrics used in the literature, discussing their suitability to predict a desirable intrinsic image decomposition. Although the Lambertian assumption is still a foundational basis for many methods, we show that there is increasing awareness on the potential of more sophisticated physically-principled components of the image formation process, that is, optically accurate material models and geometry, and more complete inverse light transport estimations. We classify these methods in terms of the type of decomposition, considering the priors and models used, as well as the learning architecture and methodology driving the decomposition process. We also provide insights about future directions for research, given the recent advances in neural, inverse and differentiable rendering techniques.

* Accepted at International Journal of Computer Vision (to appear in 2022) http://www.elenagarces.es/projects/SurveyIntrinsicImages/

Via

Access Paper or Ask Questions

RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

Jun 22, 2021

Jiayi Wang, Franziska Mueller, Florian Bernard, Suzanne Sorli, Oleksandr Sotnychenko, Neng Qian, Miguel A. Otaduy, Dan Casas, Christian Theobalt

Figure 1 for RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

Figure 2 for RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

Figure 3 for RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

Figure 4 for RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

Abstract:Tracking and reconstructing the 3D pose and geometry of two hands in interaction is a challenging problem that has a high relevance for several human-computer interaction applications, including AR/VR, robotics, or sign language recognition. Existing works are either limited to simpler tracking settings (e.g., considering only a single hand or two spatially separated hands), or rely on less ubiquitous sensors, such as depth cameras. In contrast, in this work we present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera that explicitly considers close interactions. In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN that regresses multiple complementary pieces of information, including segmentation, dense matchings to a 3D hand model, and 2D keypoint positions, together with newly proposed intra-hand relative depth and inter-hand distance maps. These predictions are subsequently used in a generative model fitting framework in order to estimate pose and shape parameters of a 3D hand model for both hands. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline through an extensive ablation study. Moreover, we demonstrate that our approach offers previously unseen two-hand tracking performance from RGB, and quantitatively and qualitatively outperforms existing RGB-based methods that were not explicitly designed for two-hand interactions. Moreover, our method even performs on-par with depth-based real-time methods.

* ACM Transactions on Graphics (TOG) 39 (6), 1-16, 2020
* SIGGRAPH Asia 2020

Via

Access Paper or Ask Questions

Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera

Jun 15, 2021

Franziska Mueller, Micah Davis, Florian Bernard, Oleksandr Sotnychenko, Mickeal Verschoor, Miguel A. Otaduy, Dan Casas, Christian Theobalt

Figure 1 for Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera

Figure 2 for Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera

Figure 3 for Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera

Figure 4 for Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera

Abstract:We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands. Our approach is the first two-hand tracking solution that combines an extensive list of favorable properties, namely it is marker-less, uses a single consumer-level depth camera, runs in real time, handles inter- and intra-hand collisions, and automatically adjusts to the user's hand shape. In order to achieve this, we embed a recent parametric hand pose and shape model and a dense correspondence predictor based on a deep neural network into a suitable energy minimization framework. For training the correspondence prediction network, we synthesize a two-hand dataset based on physical simulations that includes both hand pose and shape annotations while at the same time avoiding inter-hand penetrations. To achieve real-time rates, we phrase the model fitting in terms of a nonlinear least-squares problem so that the energy can be optimized based on a highly efficient GPU-based Gauss-Newton optimizer. We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work, including tight two-hand grasps, significant inter-hand occlusions, and gesture interaction.

* ACM Transactions on Graphics (Proceedings SIGGRAPH 2019)

Via

Access Paper or Ask Questions