Abstract:Patch deformation-based methods have recently exhibited substantial effectiveness in multi-view stereo, due to the incorporation of deformable and expandable perception to reconstruct textureless areas. However, such approaches typically focus on exploring correlative reliable pixels to alleviate match ambiguity during patch deformation, but ignore the deformation instability caused by mistaken edge-skipping and visibility occlusion, leading to potential estimation deviation. To remedy the above issues, we propose DVP-MVS, which innovatively synergizes depth-edge aligned and cross-view prior for robust and visibility-aware patch deformation. Specifically, to avoid unexpected edge-skipping, we first utilize Depth Anything V2 followed by the Roberts operator to initialize coarse depth and edge maps respectively, both of which are further aligned through an erosion-dilation strategy to generate fine-grained homogeneous boundaries for guiding patch deformation. In addition, we reform view selection weights as visibility maps and restore visible areas by cross-view depth reprojection, then regard them as cross-view prior to facilitate visibility-aware patch deformation. Finally, we improve propagation and refinement with multi-view geometry consistency by introducing aggregated visible hemispherical normals based on view selection and local projection depth differences based on epipolar lines, respectively. Extensive evaluations on ETH3D and Tanks & Temples benchmarks demonstrate that our method can achieve state-of-the-art performance with excellent robustness and generalization.
Abstract:Tailoring XAI methods to individual needs is crucial for intuitive Human-AI interactions. While context and task goals are vital, factors like user personality traits could also influence method selection. Our study investigates using personality traits to predict user preferences among decision trees, texts, and factor graphs. We trained a Machine Learning model on responses to the Big Five personality test to predict preferences. Deploying these predicted preferences in a navigation game (n=6), we found users more receptive to personalized XAI recommendations, enhancing trust in the system. This underscores the significance of customization in XAI interfaces, impacting user engagement and confidence.
Abstract:Optimization for robot control tasks, spanning various methodologies, includes Model Predictive Control (MPC). However, the complexity of the system, such as non-convex and non-differentiable cost functions and prolonged planning horizons often drastically increases the computation time, limiting MPC's real-world applicability. Prior works in speeding up the optimization have limitations on solving convex problem and generalizing to hold out domains. To overcome this challenge, we develop a novel framework aiming at expediting optimization processes. In our framework, we combine offline self-supervised learning and online fine-tuning through reinforcement learning to improve the control performance and reduce optimization time. We demonstrate the effectiveness of our method on a novel, challenging Formula-1-track driving task, achieving 3.9\% higher performance in optimization time and 3.6\% higher performance in tracking accuracy on challenging holdout tracks.
Abstract:Reconstructing textureless areas in MVS poses challenges due to the absence of reliable pixel correspondences within fixed patch. Although certain methods employ patch deformation to expand the receptive field, their patches mistakenly skip depth edges to calculate areas with depth discontinuity, thereby causing ambiguity. Consequently, we introduce Multi-granularity Segmentation Prior Multi-View Stereo (MSP-MVS). Specifically, we first propose multi-granularity segmentation prior by integrating multi-granularity depth edges to restrict patch deformation within homogeneous areas. Moreover, we present anchor equidistribution that bring deformed patches with more uniformly distributed anchors to ensure an adequate coverage of their own homogeneous areas. Furthermore, we introduce iterative local search optimization to represent larger patch with sparse representative candidates, significantly boosting the expressive capacity for each patch. The state-of-the-art results on ETH3D and Tanks & Temples benchmarks demonstrate the effectiveness and robust generalization ability of our proposed method.
Abstract:In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS), a method that can effectively tackle challenges in 3D reconstruction of textureless areas. We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes and further leverage these constraints for pixelwise patch deformation on both matching cost and propagation. Concurrently, we propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths, significantly improving the completeness of reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM) algorithm to alternately optimize the aggregate matching cost and hyperparameters, effectively mitigating the problem of parameters being excessively dependent on empirical tuning. Evaluations on the ETH3D high-resolution multi-view stereo benchmark and the Tanks and Temples dataset demonstrate that our method can achieve state-of-the-art results with less time consumption.
Abstract:Interpretability in machine learning is critical for the safe deployment of learned policies across legally-regulated and safety-critical domains. While gradient-based approaches in reinforcement learning have achieved tremendous success in learning policies for continuous control problems such as robotics and autonomous driving, the lack of interpretability is a fundamental barrier to adoption. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, reinforcement learning approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning policies that parity or outperform baselines by up to 33% in autonomous driving scenarios while achieving a 300x-600x reduction in the number of parameters against deep learning baselines. We prove that ICCTs can serve as universal function approximators and display analytically that ICCTs can be verified in linear time. Furthermore, we deploy ICCTs in two realistic driving domains, based on interstate Highway-94 and 280 in the US. Finally, we verify ICCT's utility with end-users and find that ICCTs are rated easier to simulate, quicker to validate, and more interpretable than neural networks.
Abstract:The reconstruction of textureless areas has long been a challenging problem in MVS due to lack of reliable pixel correspondences between images. In this paper, we propose the Textureless-aware Segmentation And Correlative Refinement guided Multi-View Stereo (TSAR-MVS), a novel method that effectively tackles challenges posed by textureless areas in 3D reconstruction through filtering, refinement and segmentation. First, we implement joint hypothesis filtering, a technique that merges a confidence estimator with a disparity discontinuity detector to eliminate incorrect depth estimations. Second, to spread the pixels with confident depth, we introduce a iterative correlation refinement strategy that leverages RANSAC to generate superpixels, succeeded by a median filter for broadening the influence of accurately determined pixels.Finally, we present a textureless-aware segmentation method that leverages edge detection and line detection for accurately identify large textureless regions to be fitted using 3D planes. Experiments on extensive datasets demonstrate that our method significantly outperforms most non-learning methods and exhibits robustness to textureless areas while preserving fine details.
Abstract:Multi-view stereo is an important research task in computer vision while still keeping challenging. In recent years, deep learning-based methods have shown superior performance on this task. Cost volume pyramid network-based methods which progressively refine depth map in coarse-to-fine manner, have yielded promising results while consuming less memory. However, these methods fail to take fully consideration of the characteristics of the cost volumes in each stage, leading to adopt similar range search strategies for each cost volume stage. In this work, we present a novel cost volume pyramid based network with different searching strategies for multi-view stereo. By choosing different depth range sampling strategies and applying adaptive unimodal filtering, we are able to obtain more accurate depth estimation in low resolution stages and iteratively upsample depth map to arbitrary resolution. We conducted extensive experiments on both DTU and BlendedMVS datasets, and results show that our method outperforms most state-of-the-art methods.
Abstract:Accurate recovery of 3D geometrical surfaces from calibrated 2D multi-view images is a fundamental yet active research area in computer vision. Despite the steady progress in multi-view stereo reconstruction, most existing methods are still limited in recovering fine-scale details and sharp features while suppressing noises, and may fail in reconstructing regions with few textures. To address these limitations, this paper presents a Detail-preserving and Content-aware Variational (DCV) multi-view stereo method, which reconstructs the 3D surface by alternating between reprojection error minimization and mesh denoising. In reprojection error minimization, we propose a novel inter-image similarity measure, which is effective to preserve fine-scale details of the reconstructed surface and builds a connection between guided image filtering and image registration. In mesh denoising, we propose a content-aware $\ell_{p}$-minimization algorithm by adaptively estimating the $p$ value and regularization parameters based on the current input. It is much more promising in suppressing noise while preserving sharp features than conventional isotropic mesh smoothing. Experimental results on benchmark datasets demonstrate that our DCV method is capable of recovering more surface details, and obtains cleaner and more accurate reconstructions than state-of-the-art methods. In particular, our method achieves the best results among all published methods on the Middlebury dino ring and dino sparse ring datasets in terms of both completeness and accuracy.